Russian Digital Libraries Journal - 2000 - Vol 3 - Issue 1
Digital Audio Collections on the Internet
Liya V. Bondarko, Pavel A. Skrelin, Nina B. Volskaya, Tatyana Y. Sherstinova
Saint Petersburg State University
Introduction
Since 1998 the Phonetics Department of the Philological Faculty of St. Petersburg University has been cataloguing the scientific and folklore sound record collections stored in St. Petersburg and providing Internet access to them. Our tasks are to preserve the unique records by transferring them from tapes to electronic media (CD-ROM), to create complete electronic catalogues of the sound record collections, to design Internet sites that describe the collections and include search facilities, and to publish the most interesting records and their fragments on the global network. The regional and national importance of such projects lies in the preservation of Russian national heritage, the combination of the sound record collections of St. Petersburg into a unified regional system, the integration of methods and concepts for presenting sound databases on the Web, and the standardisation and development of a user interface model for specialists in the humanities, most of whom are not very knowledgeable about computer technologies.
The following collections are currently scheduled for cataloguing: the collection of Pushkin's House and the collection of the Phonetics Department of St. Petersburg University. These sound records are unique because they contain national folklore, dialects and rare languages of Russia. The earliest records were produced at the beginning of the 20th century.
Catalogue of sound records in St. Petersburg collections
http://www.speech.nw.ru/phonetics/homepage.html
The main idea of the Catalogue is to present information on the sound archives of St. Petersburg in the global network. Currently the collections of Pushkin's House are being catalogued. They are the largest archive of folklore records in Russia with a total playing time of 100 000 hours. This work is carried out within the framework of the international project "Sound Archives on the World Wide Web with Sound Recordings from Saint-Petersburg Collections". The project is supported by the INTAS programme. General information about the collections is given on the Pushkin's House web site:
http://www.pushkinhouse.spb.ru/structure/unit11.shtml
The card catalogue of the sound archive is used to create a computer database, which is later converted into HTML format for Internet presentation. The sound records are restored and transferred to modern digital media (CD-ROM). Samples of different types of records are selected for the Internet presentation. The records are then deciphered, segmented into phrases and words, transcribed and analysed by specialists in phonetics. The results of this processing, together with the records themselves, are available to users via an acoustic database hosted on the site.
The main page contains four main links: 1) technical requirements; 2) general information about the project; 3) text catalogue of records; and 4) acoustic database of record samples.
Text catalogue of records
The text catalogue is a table generated from a relational database, which the staff of the sound archive compiled from the information in the card catalogues. The database runs in Microsoft Access 97/2000. Each row of the table describes an individual recording, and each column holds one parameter of the description.
The main parameters (fields) of a record are: 1) collection number, 2) consecutive number, 3) archive code, 4) title of recording, 5) first line of recording, 6) genre, 7) number of performers, 8) main performers (up to 12), 9) date of recording, 10) place of recording (city, village), 11) region, 12) nationality of performers, 13) language, 14) recording quality, 15) mono/stereo, 16) recording speed, 17) annotation.
The current Internet catalogue includes the following fields: 1) recording number in the search results list, 2) collection number, 3) archive code, 4) title of recording, 5) first line of recording, 6) genre, 7) number of performers, 8) main performer (the first one), 9) date of recording, 10) place of recording (city, village), 11) region, 12) annotation.
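The relationship between the two field lists above can be sketched in code. The following Python fragment is only an illustration under stated assumptions: the class name `ArchiveRecord`, the attribute names and the function `to_web_row` are hypothetical and are not taken from the project's Access database. It shows how a full 17-field record could be reduced to the 12 fields published on the Internet, keeping only the first of the main performers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ArchiveRecord:
    """One row of the full catalogue table (hypothetical attribute names)."""
    collection_number: str
    consecutive_number: int
    archive_code: str
    title: str
    first_line: str
    genre: str
    number_of_performers: int
    main_performers: List[str] = field(default_factory=list)  # up to 12 names
    date_of_recording: str = ""
    place: str = ""        # city or village
    region: str = ""
    nationality: str = ""
    language: str = ""
    quality: str = ""
    channels: str = ""     # "mono" or "stereo"
    speed: str = ""
    annotation: str = ""


def to_web_row(record: ArchiveRecord, position: int) -> Dict[str, object]:
    """Project a full record onto the 12 fields of the Internet catalogue."""
    performers = record.main_performers
    return {
        "result_number": position,  # number in the search results list
        "collection_number": record.collection_number,
        "archive_code": record.archive_code,
        "title": record.title,
        "first_line": record.first_line,
        "genre": record.genre,
        "number_of_performers": record.number_of_performers,
        "main_performer": performers[0] if performers else "",  # first one only
        "date_of_recording": record.date_of_recording,
        "place": record.place,
        "region": record.region,
        "annotation": record.annotation,
    }
```

Fields such as language, nationality and the technical recording parameters stay in the full database and are simply not copied into the web row.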
The first page of the text catalogue of sound recordings lists the collections and their parts available on the Internet in electronic form. Currently the cataloguing of collections 002, 004, 005 and 009 (recordings made in the Arkhangelskaya and Leningradskaya regions of Russia) is completed. Through a hyperlink the users can access the search facility of a collection selected by them or the search mode for all the collections.
Search facilities
Currently the search facility provides on-line search by genre (prose, epics, songs etc.), by place of recording and by a character string (1 to 50 characters) that normally occurs in the title of the record or in its first line.
To make loading of search results from the web more efficient, the results are divided into pages of 20 records each.
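The search and paging behaviour described above might be approximated as follows. This is a minimal sketch, not the site's actual implementation: records are assumed to be plain dictionaries with the hypothetical keys `genre`, `place`, `title` and `first_line`, and the character string is assumed to be matched case-insensitively against the title and the first line. Only the limit of 1 to 50 characters and the page size of 20 come from the text.

```python
from typing import Dict, List, Optional

PAGE_SIZE = 20  # records per results page, as stated in the text


def search(records: List[Dict[str, str]],
           genre: Optional[str] = None,
           place: Optional[str] = None,
           text: Optional[str] = None) -> List[Dict[str, str]]:
    """Filter records by genre, place of recording and a character string."""
    if text is not None and not 1 <= len(text) <= 50:
        raise ValueError("search string must be 1 to 50 characters long")
    needle = text.lower() if text is not None else None
    hits = []
    for rec in records:
        if genre is not None and rec["genre"] != genre:
            continue
        if place is not None and rec["place"] != place:
            continue
        if needle is not None:
            haystack = (rec["title"] + "\n" + rec["first_line"]).lower()
            if needle not in haystack:
                continue
        hits.append(rec)
    return hits


def paginate(hits: List[Dict[str, str]], page: int) -> List[Dict[str, str]]:
    """Return one page of results; pages are numbered from 1."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * PAGE_SIZE
    return hits[start:start + PAGE_SIZE]
```

With 45 matching records, for example, pages 1 and 2 would hold 20 records each and page 3 the remaining 5.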
Acoustic database
In order to access the acoustic database of collection samples the user has to fill in an on-line registration form and enter a personal password (http://www.speech.nw.ru/phonetics/reg-form.html). A page structured like the main catalogue then opens. In the middle of the page there is a table describing each record by the standard parameters. In addition, each line of the table has a hyperlink to the full list of documents about the recording and to the recording itself. The acoustic database pages look like traditional Internet pages with two vertical frames: the smaller left frame serves as a menu and the right frame contains the information. By default the system displays the orthographic transcription of the recorded text. The menu allows switching to the phonetic transcription, which can be displayed either as symbols or as a graphic file.
The recordings can be listened to in one of the following modes: full text, large sections, separate phrases, or separate words. The last two modes allow dynamic work with the materials. To listen to a recording the user specifies the playing mode, after which the orthographic transcription of the text, with corresponding hyperlinks, is displayed in the main window. The user can then select a fragment (word, phrase etc.); its hyperlink activates the corresponding sound file, which is played via a browser plug-in or the default audio player.
A separate menu option displays a phonetic commentary, which lists and analyses the specific features of the local pronunciation and the general trends characterising the particular regional dialect, genre or historical period. This material already serves as the basis for research on the coexistence and mutual influence of the Russian, Nenets and Komi languages in the Arkhangelsk region of Russia.
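The hyperlinked transcription can be pictured with a small sketch. It assumes, purely for illustration, that each fragment (word or phrase) is available as a pair of its orthographic text and the address of its sound file; the function name `render_transcript` and the file names in the usage note are hypothetical and do not describe the site's real page generator.

```python
from html import escape
from typing import List, Tuple


def render_transcript(segments: List[Tuple[str, str]]) -> str:
    """Turn (text, sound_file_url) pairs into a line of HTML hyperlinks.

    Each fragment becomes a link; following it hands the sound file to the
    browser, which plays it with whatever audio player is configured.
    """
    links = [
        '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(text))
        for text, url in segments
    ]
    return " ".join(links)
```

Calling `render_transcript([("first", "w001.wav"), ("second", "w002.wav")])` would produce two adjacent links, one per word. In phrase mode the same function would be given one pair per phrase.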
Regional variants of the Russian language
The project aims to create an interactive research database of regional variants of the Russian language, covering all the regions of Russia and the former USSR. At the same time the system can be regarded as an interactive information complex that can serve as a model for the creation of various multi-function library systems. The recordings are provided by the Phonetics Department of St. Petersburg University.
The database users have the following on-line options:
- to select a regional variant, switch to the orthographic transcription of the text and then listen to the whole text or any of its fragments (to do that the user needs to specify the initial and final elements);
- to access the database describing the phonetic features of different segments (words and phrases) for each particular regional variant and listen to sound examples of those features;
- alternatively, to choose certain phonetic features and find all the regional variants that contain those features.
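The last option, finding every regional variant that shows a chosen set of phonetic features, amounts to a lookup in an inverted index. The sketch below illustrates that idea only. The system was still under development when the paper was written, so the function names and the placeholder variant and feature labels in the usage note are assumptions, not project data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_feature_index(variants: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each phonetic feature to the set of regional variants showing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for variant, features in variants.items():
        for feature in features:
            index[feature].add(variant)
    return dict(index)


def variants_with(index: Dict[str, Set[str]], wanted: Iterable[str]) -> List[str]:
    """Return the variants that contain every one of the wanted features."""
    sets = [index.get(feature, set()) for feature in wanted]
    if not sets:
        return []
    return sorted(set.intersection(*sets))
```

Given an index built from `{"variant_a": ["feature_x", "feature_y"], "variant_b": ["feature_y"]}`, a query for `feature_y` would return both variants, while a query for `feature_x` and `feature_y` together would return only `variant_a`.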
Each meaningful segment of the database (phrase or word) is a potential subject for research and is therefore accompanied by a table of descriptive characteristics. The list of such characteristics is open-ended. When a user selects one of the meaningful segments, all the information on that segment is displayed (for example, non-normative pronunciation of a word or a syllable, or non-standard intonation of a sentence or part of a sentence). Future plans include the possibility for users to add their own commentaries and descriptions on-line. All user additions will be entered into the database.
The Internet system is currently in the development stage. Technical solutions for the implementation of some logical components and their integration into the system have been found. The text corpus is in preparation and the system modules are being developed. It is planned that the site will become available on the web at the end of 2000 (the URL information will be given at http://www.speech.nw.ru/).
Technical requirements
Since the sound recording catalogues available now contain recordings in the Russian, Nenets and Komi languages, they are aimed at users who have at least some knowledge of Russian, and most of the supporting information is given in Russian without English translation. The sites use the CP1251 encoding with standard Cyrillic fonts, except for phonetic transcription, which requires the specialised Times Trn3 font. Users can download and install the font from the Web. The technical requirements page of the Internet site gives complete information about this.
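For readers who process the catalogue pages programmatically, the practical consequence of the CP1251 encoding is that the bytes must be decoded explicitly. The short sketch below uses Python's built-in `cp1251` codec; the two function names are hypothetical helpers introduced for illustration.

```python
def decode_catalogue_page(raw: bytes) -> str:
    """Decode bytes served in the CP1251 (Windows-1251) Cyrillic encoding."""
    return raw.decode("cp1251")


def encode_for_site(text: str) -> bytes:
    """Encode text as CP1251; raises UnicodeEncodeError for other characters."""
    return text.encode("cp1251")


# Six CP1251 bytes that spell a Cyrillic word
print(decode_catalogue_page(b"\xcf\xf3\xf8\xea\xe8\xed"))  # prints "Пушкин"
```

CP1251 is a single-byte encoding and has no code points for most phonetic symbols, so `encode_for_site("ə")` fails with `UnicodeEncodeError`. This is presumably one reason why the phonetic transcription relies on a dedicated font rather than on the standard Cyrillic ones.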
Recordings and their fragments are stored in WAV format. To listen to them a user needs a SoundBlaster card in the PC and a corresponding browser plug-in or system player. The recordings will shortly be converted to RealAudio, which will allow listening to them in real time, provided that the user has a RealAudio Player installed. Some versions of the RealAudio Player are distributed free of charge by the supplier, RealNetworks Inc.
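Because fragments as well as whole recordings are stored as WAV files, one conceivable way to produce a word or phrase fragment is to cut it out of the full recording by its start and end times. The sketch below does this with Python's standard `wave` module. It is an assumption about how such fragments could be prepared, not a description of the tools used by the project, and the function name `cut_fragment` is hypothetical.

```python
import io
import wave


def cut_fragment(wav_bytes: bytes, start_s: float, end_s: float) -> bytes:
    """Return a new WAV file holding the audio between start_s and end_s."""
    if start_s < 0 or end_s < start_s:
        raise ValueError("expected 0 <= start_s <= end_s")
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        first = min(int(start_s * rate), src.getnframes())
        count = max(0, int(end_s * rate) - first)
        src.setpos(first)                 # jump to the first frame wanted
        frames = src.readframes(count)    # stops early at end of file
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)             # same channels, width and rate
        dst.writeframes(frames)           # header frame count is corrected
    return out.getvalue()
```

The fragment keeps the channel count, sample width and sampling rate of the source, so it plays in any player that can handle the original file.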
Access characteristics
There is free Internet access to the text catalogues. Since the recordings themselves are part of the national heritage of Russia, access to them is granted on entry of a personal password, which can be assigned to frequent users of the site, or on completion of an on-line user registration form. The recording samples can only be used for scientific and research purposes. Any unauthorised use or copying of the recordings, as well as any commercial use, will be prosecuted.
The materials are made accessible to the majority of users through a simple, standardised interface, which is essential for philologists, most of whom are still very wary and mistrustful of modern computer technologies.
Conclusion
The Internet site "The Catalogue of sound recordings from St. Petersburg collections" is under construction. The "Regional variants of Russian" web site will become available at the end of 2000. There are plans to publish a number of other collections on the web, such as Russian Norm of Pronunciation (Moscow and St. Petersburg variants), Sound Dictionary of the Nenets Language, Folktales of the North of Russia, Folk poems of the North of Russia, Folklore of the Volga Germans etc.
During the implementation of the Internet projects it became clear that we need to join forces with other working groups involved in such projects in order to agree on a joint concept of sound recording presentation and on standard formats for storing recordings and text catalogues, and to develop a user-friendly interface, in both on-line and off-line modes, for users specialising in the humanities.
Additionally, as sound recordings take up a lot of space, there are some technical and financial problems with regard to the maintenance of sound sites on the web. For instance, the "Russian Folklore in modern recordings" web site was opened in September 1999 and was closed several months later due to technical difficulties of the Internet provider. As the complete digitisation of sound recordings and their catalogues is a long process, we find it advisable to create a separate physical server for the integration of different projects on Internet publication of sound catalogues and recordings. The server will be located in the Experimental Phonetics Laboratory of St. Petersburg University, which is one of the leaders in domestic speech technology development.
The projects presented in the paper address scientific, educational and cultural tasks. Our products are designed for linguists, folklore researchers, specialists in phonetics, librarians, multimedia developers, ethnographers, psychologists, sociologists and other Internet users interested in sound recordings of speech as cultural heritage.
About the authors
Liya Bondarko, PhD, professor, head of the Phonetics Department of St. Petersburg State University, leading expert in general and experimental phonetics, author of over 200 publications. Current interests include anthropomorphism of linguistic theories, applied phonetics (automatic speech recognition and synthesis), statistical characteristics of sounds, and phonetic data organisation for phonological interpretation.
Pavel Skrelin, head of the Experimental Phonetics Laboratory, St. Petersburg State University.
Nina Volskaya, Phonetics Department of St. Petersburg State University.
Tatyana Sherstinova, specialist in information systems and Internet technologies, Phonetics Department of St. Petersburg State University.