Russian Digital Libraries Journal - 2000 - Vol 3 - Issue 5
The Retroconversion Process in the Tacis Project: the Library's View
Olga A. Lavrenova Russian State Library
Marie-Elise Freon Jouve SA
1. The retroconversion of traditional catalogues will remain extremely urgent so long as the catalogues of the largest libraries have not gone electronic. Once they have, the retroconversion of other library catalogues becomes a simple matter of selecting the necessary records and adding local information.
The collections of the largest Russian libraries are the main constituents of the country's union catalogues. For example, the General Alphabetical Catalogue of the Russian State Library (RSL) describes domestic documents, documents of the CIS countries, and the richest collections of foreign publications in all languages of the world. The catalogues contain unique information on rare and valuable printed and handwritten books from the 15th century onwards. The alphabetical catalogues alone hold 15 million cards, far more than in any other library in Russia. In addition, the RSL maintains a number of union catalogues with information on different kinds of publications from the collections of several large libraries (for example, the union catalogue of Russian books 1826-1917). Only the RSL maintains national union catalogues of foreign maps and musical scores. The catalogues of the main library of the country cover 90-95% of all national publications.
Currently the RSL, together with a number of other libraries, is engaged in projects such as "Digital Library Creation" and "Electronic Document Delivery". These projects are made more difficult by the absence of a complete electronic catalogue, which is needed both as a search facility for electronic documents and as a tool for the remote search of bibliographic records when ordering documents on various media. Therefore, alongside the digitisation of documents, machine-readable bibliographic records must be created for them if such records are not yet in the electronic catalogue. This emphasises the need for retroconversion even more. Libraries that begin to implement reader service subsystems will also run into problems with registering the circulation status of an item if the electronic catalogue holds no bibliographic record for that publication.
2. Main factors defining the importance of retroconversion:
- Preservation of traditional catalogues: catalogues of the majority of the largest Russian libraries only exist in paper form, therefore if a paper card is lost the path to the corresponding document may be lost forever.
- Access: electronic (machine-readable) catalogues can be made accessible to anyone in any region of Russia or worldwide, either from the office or from home. Currently this is possible only for certain types of publications received by the RSL since May 1998.
- Search: a card catalogue allows only one path of search (determined by the order of cards in the catalogue box), whereas an electronic database is searchable by different parameters.
Other considerations are less important.
The retrospective conversion of catalogues is carried out to ensure the preservation and accessibility of bibliographic information. The creation of electronic catalogues of the leading libraries should form the basis for the retrospective conversion of the catalogues of other Russian libraries.
3. This presentation has been prepared jointly with Marie-Elise Freon, an expert from the French company Jouve, under the Tacis project "Creation of an information system for the Russian State Library". One of the project activities is a pilot retroconversion of the library catalogues. It has been successfully implemented on a batch of 10,000 records from the Union Catalogue of Russian Books 1826-1917. The issues of preparing and organising the retroconversion and choosing a supplier were tackled by a team of RSL and European experts, with the aim of creating a framework for future retroconversion projects.
It should be noted that retroconversion in our context means not only transforming traditional records into electronic form, but also transferring them into a new format.
4. The retroconversion includes the following main tasks:
- Choice of the catalogue, description of data structure, bibliographic record elements and format of their presentation;
- Choice of the optimal methodology, technology, software and equipment;
- Selection of supplier, definition of financial arrangement and the project timescale;
- Procurement of equipment;
- Preparation of catalogue for retroconversion;
- Preparation of instructions on information tagging, development of dictionaries, and codes for different record elements;
- Transfer of information from the cards into machine-readable form, quality control, editing;
- Provision of access to the database via the local network and Internet.
5. General principles of catalogue retroconversion:
- The data are entered the same way they are presented on the original cards;
- No additional information is entered into the electronic record;
- Some elements of an electronic record can be derived from the traditional record;
- This is not retrospective cataloguing, nor re-cataloguing, but transformation of records into a different form;
- It is recommended to structure information in accordance with international standards.
It is possible to organise the process of retroconversion in several ways. The choice of an optimal method for the library depends on the skills and number of staff, available funding, size of collections and requirements of the library.
6. A pre-project study of the retroconversion object is carried out to specify the targets, assess possible difficulties and define the required quality level.
Decisions are taken on the necessity and feasibility of:
- Tidying up the catalogue or converting all the records;
- Carrying out stock taking of the collections: there is no point in converting cards for books that are no longer in the library collections;
- Bar coding of books prior to the record conversion.
The library catalogue analysis includes checking the correspondence between the catalogues and collections, types of catalogues (alphabetical, subject, systematic, topographic), types of cards in the catalogues (hand written, printed, type written etc.).
In the selection of the catalogues priority should be given to the catalogues which:
- Contain the most complete and accurate information,
- Reflect the most important parts of the collections,
- Contain records for the most frequently used publications,
- Have frequent overlaps with other catalogues (if records for the same book can be found in different catalogues).
It is advisable to start with a meaningful part of the catalogue. If it is clear that the retroconversion of the complete catalogue will not be possible within the foreseeable future, it makes sense to choose a part that yields a final product of commercial value. For instance, if the part chosen is all the records under the letters A-D, the resulting batch of records will hardly be of any use, whereas if the records relate to a number of individual or corporate authors, the result is a complete and useful database. Ideally, an attempt could be made to market such a database on CD and use the revenue to continue the retroconversion. If a successfully converted fragment of a catalogue is made available on the Internet, it can be used as publicity to obtain further funding for the retroconversion project.
7. Preparation of card catalogues consists in the selection of cards and their preparation.
In particular the following things are necessary:
- To strike out all the information which shouldn't be entered into the database or list such information in the operator instructions (e.g. reference to other catalogues, initials of cataloguers etc.);
- To check that the same information is not duplicated in several places (e.g. different forms of the name of the author);
- To group all the copies of the same publication on one card.
The following should be done for printed catalogues:
- To check that the records follow the same scheme (delete duplicated titles, number them if the hierarchic structure is not defined);
- If records contain cross-references, a decision should be taken on whether or not they need to be included in the database;
- If not all the records should be converted, unnecessary records should be deleted, and indices analysed.
8. The retroconversion management includes preparation of requests for proposals, quality control and standardisation.
It is advisable to hold a competition (tender) for the best proposal from potential suppliers. Even without such a competition, however, it is possible to make sure that the future supplier understands the problem and is capable of solving it within a reasonable time scale and at an optimal price-quality ratio.
9. The request for proposals should include the description of:
- The library as an object of automation (collection and reader statistics, information flows etc.);
- Catalogues to be converted (number of cards, physical and logical structure);
- Method of conversion (alternatively the supplier can be asked to prove that the method chosen by them is the best one with regard to the library requirements);
- Bibliographic information on cards, ways of its presentation and standardisation of fields;
- Fields filled in with the information taken from other fields or directories (e.g. country codes, language codes);
- Exceptions;
- Required quality level (accepted number of errors per batch of records, number of errors which will mean that the batch should be re-entered);
- Any kinds of additional processing of converted records (e.g. getting rid of duplications, standardisation of authority records and links, bar coding);
- Mandatory procedures of test loading, entry testing etc.;
- Required system and standards (ISO 2709, ISO 1001 etc.).
Sample cards should be attached to such a document.
It also makes sense to ask the supplier to carry out a test run of the conversion.
If large amounts of information are converted, it is impossible to check each record, so quality control should be based on a random selection of records.
ISO 2859 (parts 0, 2, 3 and 4) describes random sampling procedures for checking batches of data and the principles for defining the acceptable quality level, i.e. the percentage of records with errors relative to the total number of records. Batches are formed on the basis of the number of converted records and the time period, e.g. all the records produced in one month.
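The sampling approach can be illustrated with a minimal Python sketch. The function names, the sample size and the acceptance limit are assumptions made for illustration only; in practice the sample size and acceptance number would be taken from the ISO 2859 tables for the agreed quality level, which are not reproduced here.

```python
import random

def sample_records(record_ids, sample_size, seed=None):
    """Draw a simple random sample of record identifiers without replacement."""
    ids = list(record_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(sample_size, len(ids)))

def judge_batch(sampled_ids, has_error, accept_limit):
    """Accept the batch if the number of faulty records in the sample
    does not exceed accept_limit (a hypothetical value, not an ISO figure);
    otherwise the batch is returned to the supplier for re-entry."""
    faulty = sum(1 for rid in sampled_ids if has_error(rid))
    return {
        "checked": len(sampled_ids),
        "faulty": faulty,
        "accepted": faulty <= accept_limit,
    }
```

Here `has_error` stands for the manual comparison of a converted record with its original card; the sketch only automates the drawing of the sample and the accept/reject decision.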
10. This is how the pilot retroconversion within the RSL-Tacis project was organised.
Clear and detailed specifications were prepared for the retroconversion with the description of processed information for each field. There was also a description of how that information is presented on cards and the rules for the conversion of the information into electronic form.
The specification contained:
- rules of character entry (e.g., capital and small letters);
- how to recognise unnecessary information on cards;
- rules for the distribution of information between fields and sub-fields on the basis of formal characteristics;
- rules of standard transformation of codes in machine-readable form (e.g. shelf-marks);
- rules for the entry of information on different copies of the same document and their location;
- specific cases (periodicals, multi-volume publications etc.).
The specification contained examples of different fields and several examples of complete records with original cards appended.
11. Once the retroconversion is complete it is recommended to carry out control by authority files and bar-code the items.
12. On the basis of the pre-project study and definition of tasks the retroconversion methodology is chosen.
There are different methodologies and ways of their combination for the conversion of card and printed catalogues:
- direct capture of information from the cards or printed catalogues:
- complete conversion into MARC format with manual entry of linear text and MARC tagging (in parallel with the text entry tags can be inserted into the text on cards to be manually entered later);
- manual capture of text information with subsequent automatic tagging (recognition of punctuation marks, character strings, special dictionaries and lexicons);
- if good quality printed catalogue cards are used, OCR (the catalogue should be scanned beforehand) and transformation of records in the standard format manually or automatically;
- use of records from other catalogues;
- alternatively the cards can first be copied to microfiche with further processing and scanning done as described above.
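Automatic tagging on the basis of punctuation can be sketched as follows. This is only an illustration under strong assumptions: the card text is taken to follow a single hypothetical layout, "Author. Title. - Place : Publisher, Year.", and the sample card in the usage note is invented. Real cards, especially older ones, vary far more and would also need the dictionaries and lexicons mentioned above.

```python
import re

# Hypothetical imprint layout: "Place : Publisher, Year."
IMPRINT = re.compile(
    r"^(?P<place>[^:]+?)\s*:\s*(?P<publisher>.+?),\s*(?P<year>\d{4})\.?$"
)

def tag_linear_text(text):
    """Split a linear card description into named elements using punctuation only."""
    fields = {}
    # " - " is assumed to separate the heading/title area from the imprint.
    head, _, imprint = text.partition(" - ")
    # The first ". " is assumed to separate the author from the title.
    author, _, title = head.partition(". ")
    fields["author"] = author.strip()
    fields["title"] = title.strip().rstrip(".")
    imprint = imprint.strip()
    match = IMPRINT.match(imprint)
    if match:
        fields.update(match.groupdict())
    elif imprint:
        # Anything that does not fit the pattern is kept for manual editing.
        fields["unparsed_imprint"] = imprint
    return fields
```

For the invented card text `"Ivanov, Ivan. Sample title. - Moscow : Sample Press, 1850."` the function separates the author, title, place, publisher and year. A card that does not match the assumed layout keeps its imprint in `unparsed_imprint` so that an editor can deal with it by hand.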
Russian libraries involved in retroconversion projects tend to key the card text manually into their databases. The information is entered either from the paper cards themselves or from images of the cards. In some cases, though, this method is not very efficient: for example, when the text on the cards is well printed and OCR could be used instead, or when the operators are low-qualified, low-paid staff.
Manual keying requires considerable investment even when the manual effort is minimised. The low productivity of manual data entry has been demonstrated in many cases, both domestically and internationally, and the method may also give a high percentage of errors. The opposite of manual entry is card scanning with subsequent character recognition. This method saves considerable resources provided that the cards are well printed and the software is powerful, with well-developed directories and dictionaries. Nevertheless, the automated technology may not work well with the RSL catalogues, which contain both printed and handwritten information. Moreover, even with good quality cards the use of OCR normally results in 2-5% errors. Therefore the total cost of automatic processing of the cards plus checking of the records may exceed the cost of manual keying within the same time scale.
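The cost argument can be made concrete with a small calculation. The sketch below is a simplified model, not a costing method used in the project: all unit costs are hypothetical inputs, and the error rate is read as the share of records that need correction, which is an assumption because the unit of the 2-5% figure is not stated.

```python
def total_cost(cards, capture_cost, check_cost, error_rate, fix_cost):
    """Total cost of converting `cards` records with one method.

    capture_cost: cost of keying or OCR per card (hypothetical unit)
    check_cost:   cost of proofreading per card
    error_rate:   assumed share of records that need correction (0..1)
    fix_cost:     cost of correcting one faulty record
    """
    capture = cards * capture_cost
    checking = cards * check_cost
    fixing = cards * error_rate * fix_cost
    return capture + checking + fixing

def cheaper_method(cards, manual, ocr):
    """Compare two parameter sets given as (capture, check, error_rate, fix) tuples."""
    manual_total = total_cost(cards, *manual)
    ocr_total = total_cost(cards, *ocr)
    return "manual" if manual_total <= ocr_total else "ocr"
```

With such a model a library can enter its own figures and see at what checking and correction cost the automated route stops being the cheaper one.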
When the conversion method depends on the quality of the catalogue, mixed technologies are used. There can be different methods for each catalogue within the same library.
The two suppliers that participated in the test run for the RSL pilot retroconversion (the ProSoft-M and Giper companies) offered two different approaches, which in a way represented two opposite options.
One option was maximum automation of character recognition, MARC tagging and spell checking, with minimal human involvement. This was the offer of Giper, a company with some experience of this methodology. ProSoft-M, on the other hand, relies on highly qualified specialists who create bibliographic records by keying from card images. The company applies fully automated control of the technological process, automates individual routine operations, provides a comfortable working environment and has its operators specialise in different operations. Neither company captured information manually straight from the paper cards: scanning was an intermediate stage in both cases, and the technologies differed only in how the machine-readable records were created.
We also made use of information on the technologies implemented by Jouve in France for benchmarking purposes.
Once the test run specifications were ready, we tested the suppliers in May 1999 on a batch of 500 catalogue cards. The test run allowed the potential suppliers to demonstrate their skills and knowledge in creating records in the required format and to quote a price for the complete pilot conversion. The library and EU experts evaluated the quality and technology offered by the suppliers. The catalogue chosen for the pilot conversion was the Catalogue of Russian Books of the 19th century, whose cards are of low quality, and for such cards the technology offered by ProSoft-M proved to be more appropriate.
13. The library provided the supplier with catalogue cards as input. The expected output was not graphic images of the cards, but a batch of machine-readable bibliographic records in which all the characters had been entered manually. Furthermore, the record elements had to be tagged by the fields of the chosen format, which in the RSL case was USMARC.
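What "tagged by the fields of the chosen format" means can be shown with a simplified sketch. It uses three common USMARC fields (100 for the personal author, 245 for the title, 260 for the imprint) and a plain-text rendering with `$` as subfield delimiter. Indicators, the leader, prescribed punctuation and the ISO 2709 exchange structure are deliberately omitted, and the function names and sample values are hypothetical.

```python
def to_marc_fields(author, title, place, publisher, year):
    """Map keyed card elements to (tag, subfields) pairs.
    Simplified: indicators, leader and control fields are left out."""
    return [
        ("100", {"a": author}),
        ("245", {"a": title}),
        ("260", {"a": place, "b": publisher, "c": year}),
    ]

def render(fields):
    """Render the fields as readable text lines, skipping empty subfields."""
    lines = []
    for tag, subfields in fields:
        body = "".join(
            f"${code}{value}" for code, value in subfields.items() if value
        )
        if body:
            lines.append(f"{tag} {body}")
    return "\n".join(lines)
```

Calling `render(to_marc_fields("Ivanov, Ivan", "Sample title", "Moscow", "Sample Press", "1850"))` on these invented values gives one text line per field, which is a convenient form for proofreading before the records are loaded into the library system.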
We can suppose that the majority of older catalogues are similar to the RSL catalogue of 19th century Russian books, i.e.:
- part of the text is printed and part is written by hand in black or coloured pencil;
- information is found on both sides of the card;
- the physical dimensions of the cards are 75x125 mm with a hole in the lower part.
One of the conditions was that the original cards could not be taken away from the library and had to be scanned on the library premises.
During the scanning a special identification/control number was assigned to each card image, so that the images of the front and back of the card could be linked together.
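The linking of front and back images through a control number can be sketched as below. The file naming scheme (a seven-digit control number followed by `f` or `b`) is a hypothetical convention chosen for the example, not the one used in the project.

```python
from collections import defaultdict

def image_name(control_number, side):
    """Build an image file name; side is 'f' (front) or 'b' (back).
    The naming pattern is an assumption made for this sketch."""
    if side not in ("f", "b"):
        raise ValueError("side must be 'f' or 'b'")
    return f"{control_number:07d}{side}.tif"

def pair_images(filenames):
    """Group image files by control number so both sides of a card stay linked."""
    cards = defaultdict(dict)
    for name in filenames:
        stem = name.rsplit(".", 1)[0]
        number, side = int(stem[:-1]), stem[-1]
        cards[number][side] = name
    return dict(cards)

def incomplete_cards(cards):
    """Return control numbers for which one side has not been scanned."""
    return sorted(n for n, sides in cards.items() if set(sides) != {"f", "b"})
```

A check such as `incomplete_cards` fits the scanning completeness control: it lists the cards whose second side is missing from the batch, so that they can be rescanned before keying starts.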
The retroconversion cycle was divided into the following stages:
- scanning of the cards;
- graphic card image processing;
- generation of MARC records.
14. The quality control includes the following:
- permanent control of completeness and quality of scanning on the basis of random sampling;
- constant spell checking prior to MARC format tagging on the basis of random sampling;
- constant control of information tagging by MARC fields on the basis of random sampling.
In order to check the information entry into the local fields a test loading of records into the library system was carried out. The library evaluated the results of the test loading, and the company made corresponding changes in its working procedures. At the end of the information capture the non-standard cases were sent to the library both in electronic and printed form.
There was also automatic formatting of authority records on the basis of information found on the cards.
The records for quality control were sent to the library monthly. Their quality was checked against the cards used for the information entry. The quality control was done on the basis of random sampling, and the number of records checked depended on the total size of the batch and the desired quality level.
15. The retroconversion issues are discussed in more detail in the article submitted by O. Lavrenova for publication during 2000 in "On Friendly Terms with Computer", an annex of the "Biblioteka" (Library) journal.