Russian Digital Libraries
Journal - 2001 -
Vol. 4 - No 5
New Information Technologies
in Systematization and Analysis of Literary Texts
(On the example of the Russian short story of the twentieth century)
G.Y. Martynenko
St. Petersburg State University
Address: St. Petersburg, 199034, Universitetskaya nab., 11
Phone: +7 (812) 328-9519
Fax: +7 (812) 312-2246
E-mail: gymart@ts4306.spb.edu
The Digital Anthology of the Russian Short Stories of the XX Century is
currently being created at the Department of Computational, Applied and
Mathematical Linguistics of St. Petersburg State University. The corpus is
a full-text database of Russian short stories, divided into so-called
synchronic groups (sub-anthologies) that follow the traditional division
of Russian literature into periods. Each sub-anthology includes selected
works by as many writers as possible who were active in the corresponding
literary period. For the most prominent writers (Chekhov, Bunin, Kuprin,
Gorkij, Sologub, Platonov, Bulgakov, Zoschenko, Shukshin and others),
individual author anthologies are being compiled. A system of frequency dictionaries
is built for the whole Anthology, for specific literary periods, and for
the works of individual writers. Each dictionary is then structured
according to a system of parameters. From the information contained in a
frequency dictionary, statistical distributions of a particular type can
then be constructed (the type depends on which parameters serve as the
dependent and independent variables). Analysis of recent
scientific works and the results of our own investigations allowed us
to determine a fairly complete list of parameters that may be used to
describe the lexicostatistical structure of a text (or corpus). All
parameters were tested, and as a result we obtained a list of statistically
consistent parameters that can be recommended for text systematisation
and analysis. The Digital Anthology of the Russian Short Stories and the results
of textual analysis carried out on its material are of significant
interest both for traditional research in Russian literature, linguistic
poetics and the stylistics of literary texts, and for experts on cultural
heritage and specialists in new information technologies. The original
methodology for systematising texts and investigating them at the
lexicostatistical level can also be used successfully to analyse texts
in any language and of various genres (not only literary, but also
business, journalistic, scientific texts, etc.).
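The three-level system of frequency dictionaries described above (whole
Anthology, literary period, individual writer) and the distributions derived
from it can be illustrated with a minimal Python sketch. Everything in it is
an assumption made for illustration: the miniature corpus is English
placeholder text, not material from the Anthology; the names CORPUS,
build_dictionaries, rank_frequency, frequency_spectrum and basic_parameters
are hypothetical; and the three parameters shown (tokens, types, hapax
legomena) are common lexicostatistical measures, not necessarily those
selected by the author.

```python
import re
from collections import Counter, defaultdict

# Hypothetical miniature corpus of (period, author, text) records.
# Placeholder English text stands in for the Russian stories.
CORPUS = [
    ("period_1", "author_a", "the sea was calm and the sky was grey"),
    ("period_1", "author_b", "the house stood by the sea"),
    ("period_2", "author_c", "a man walked to the house and the man sang"),
]


def tokenize(text):
    """Split text into lowercased word forms (no lemmatisation; an assumption)."""
    return re.findall(r"\w+", text.lower())


def build_dictionaries(corpus):
    """Build frequency dictionaries for the whole corpus, each period, each author."""
    whole = Counter()
    by_period = defaultdict(Counter)
    by_author = defaultdict(Counter)
    for period, author, text in corpus:
        tokens = tokenize(text)
        whole.update(tokens)
        by_period[period].update(tokens)
        by_author[author].update(tokens)
    return whole, dict(by_period), dict(by_author)


def rank_frequency(freq_dict):
    """Rank-frequency distribution: (rank, frequency) pairs, rank 1 = most frequent."""
    freqs = sorted(freq_dict.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def frequency_spectrum(freq_dict):
    """Frequency spectrum: maps a frequency f to the number of words occurring f times."""
    return dict(sorted(Counter(freq_dict.values()).items()))


def basic_parameters(freq_dict):
    """A few illustrative parameters computed from one frequency dictionary."""
    frequencies = list(freq_dict.values())
    return {
        "tokens": sum(frequencies),                     # text length in word forms
        "types": len(frequencies),                      # dictionary size
        "hapax": sum(1 for f in frequencies if f == 1), # words occurring once
    }


if __name__ == "__main__":
    whole, by_period, by_author = build_dictionaries(CORPUS)
    print(basic_parameters(whole))
    print(frequency_spectrum(whole))
    for period, dictionary in by_period.items():
        print(period, basic_parameters(dictionary))
```

The same functions apply unchanged to a period or author dictionary, which
is what makes the levels comparable. Choosing rank as the independent
variable gives the rank-frequency distribution, while choosing frequency
gives the spectrum; this is one reading of the remark about dependent and
independent variables, not a statement of the author's exact procedure.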
Martynenko Gregory Y. -
Doctor of Science in Computational Linguistics, Professor at the Department
of Mathematical, Computational and Applied Linguistics of St. Petersburg
State University, founder and head of the St. Petersburg Stylometrics School.