Russian Digital Libraries Journal - 2000 - Vol 3 - Issue 2
Standardisation of Format Software Tools in Electronic Document Delivery
Viktor P. Zakharov
Library of the Russian Academy of Sciences
In recent years library collections have received far fewer new acquisitions of both domestic and foreign publications. On the one hand, the price of publications and transportation has grown considerably. On the other hand, government funding for culture and science in Russia has practically stopped. During the last 8 years (1991-1999) the number of foreign periodicals acquired by the Library of the Russian Academy of Sciences has fallen tenfold, and the number of foreign monographs twelvefold. At the same time, owing to the wide spread of information technologies, readers are becoming increasingly aware of the existing information sources, and their expectations of libraries are changing accordingly. This situation raises the issue of the disclosure of library holdings and their integration via intensive inter-library document exchange. One of the most efficient ways out of the situation is the organisation of electronic document exchange between libraries.
Currently there are no standard formats for the presentation of electronic copies, nor unified software and hardware tools. This can be explained in part by the novelty of the technology and in part by the huge diversity of technical tools and technological approaches used in different libraries. Some libraries have document order functions integrated in the library management software they use. One example of such a system in Russia is the one used by INION. Another reason for the technological diversity is that electronic document delivery (EDD) usually comprises several functions: receiving and processing orders, scanning, delivery of electronic copies to customers, print-out of the electronic copy by the customer, statistics and financial settlement.
At present we see mass implementation of EDD systems in libraries. The issue is widely discussed in professional publications and the number of publications is increasing. A review of publications and detailed analysis of the development history and different EDD models is given in the paper by V.A. Glukhov and O.L. Lavrik "Electronic Document Delivery" (Moscow, INION, 1999).
The first Russian EDD services were created quite recently. The main networking functions are implemented via Internet protocols and software. The standard scanning programmes supplied with scanners are used for scanning. In most cases electronic copies are sent to clients by e-mail as TIFF files, then printed out and given to readers on paper. The PDF format is also used, and FTP servers are used for file transfer as well.
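As a rough illustration of the e-mail route described above, the following Python sketch wraps a scanned TIFF file as a MIME attachment, using the image/tiff media type registered in RFC 2302. The function name, addresses, order number and file name are hypothetical, and the four bytes stand in for a real scan. It shows the packaging step only and is not the method of any particular service.

```python
from email.message import EmailMessage


def build_delivery_message(sender, recipient, order_id, tiff_bytes, filename):
    """Wrap one scanned copy as a MIME e-mail with an image/tiff attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"EDD order {order_id}"
    msg.set_content(f"Electronic copy for order {order_id} is attached.")
    # image/tiff is the media type registered for TIFF in RFC 2302.
    msg.add_attachment(tiff_bytes, maintype="image", subtype="tiff",
                       filename=filename)
    return msg


# Hypothetical values; the bytes below are only a TIFF byte-order header,
# not a complete image.
message = build_delivery_message(
    "edd@library.example", "reader@institute.example",
    "2000-0001", b"II*\x00", "order-2000-0001.tif",
)
attachment = next(message.iter_attachments())
print(attachment.get_content_type())  # media type of the attached copy
```

The resulting message could then be handed to any SMTP client for dispatch; that step is left out here because it depends on the local mail set-up.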
The choice of formats is often defined on the basis of bilateral agreements: either the supplier agrees to the customer's demands or the customer adapts to the supplier. In this process a number of problems arise, which are solved in one way or another by negotiation or trial and error, but all this requires time and effort. Such solutions and agreements are usually bilateral, and there are situations when one library sends electronic copies to other libraries in different formats and with different technology. Clearly this cannot continue: unification and standardisation are needed in the EDD area. With the further spread of EDD services this issue becomes increasingly topical. In 1999 several corporate projects (in particular projects of the Open Society Institute and a project for the creation of an EDD Service Association) were aimed at the unification of project solutions, including those concerning information support and software.
The Ariel system is widely used in the West as both server and client software for EDD (over 500 libraries worldwide). It was developed by the RLG consortium and is delivered in client and server versions. An Ariel workstation is a computer running Windows 95/98/NT (MS-DOS 5.0 or higher for Ariel 1.0), equipped with a scanner (for the server version) and a printer, and connected to the Internet or e-mail (the FTP and MIME protocols are supported). The document to be delivered is scanned and given a name, the address of another workstation is entered, and the document is placed in the dispatch queue. The Ariel workstation periodically connects to other workstations and sends and receives information. Incoming documents can be previewed on screen in a user-friendly built-in viewer and then printed out. A wide range of scanners is supported, including special non-contact scanners for thick books and fragile paper. The scanning resolution is 150-300 dpi, which is 2-4 times higher than that of an ordinary fax machine. The images are transmitted without loss of quality and with a good degree of compression. Electronic copies produced by the Ariel system are multi-page TIFF files with Group 4 fax compression, preceded by a GEDI (Group on Electronic Document Interchange) header. Data can also be imported into and exported from the Ariel system for further processing by other software.
The participants of the St. Petersburg project, including the Library of the Academy of Sciences, chose the Ariel system. Nevertheless Ariel can hardly be regarded as a typical project solution for all Russian libraries. We need to develop a similar domestic product. There have been some attempts to do this, e.g. the D2 software complex, but none of them was brought to completion. This delay has both a positive and a negative side. The positive side is that, with the existing experience of working with the Ariel system, our libraries can better formulate the requirements for such a software product. These requirements should, of course, draw on international experience.
In fact we should probably speak about several software products. What is meant by that?
Firstly there need to be several functional complexes each performing one or several closely related technological operations (one sub-system) such as:
- Receipt and processing of requests;
- Search for document addresses;
- Scanning and dispatch;
- Receipt and print-out (or other types of processing);
- Statistics and financial settlement;
- Accumulation and storage of full text documents.
Secondly, it's quite likely that there will be several multi-functional complexes depending on working conditions (technical facilities, operating systems, capacity etc.).
Last but not least, EDD systems are part of library systems and should be integrated with them, which can also lead to the use of multiple technologies. Therefore we should start with the selection and adaptation of protocols for the interaction of software/hardware complexes at different levels:
- Within one service – between different functional components of EDD system;
- Between EDD and the library software;
- Between the customer and the service provider;
- Between applications and network management processes (protocols).
One of the characteristic features of present EDD services is their autonomy, i.e. a low degree of integration with general library technologies such as ILL. According to the above-mentioned survey by Glukhov and Lavrik, only 2 libraries out of the 10 surveyed had EDD integrated in the ILL module. We find such separation inappropriate, as it entails duplication, irrational use of human resources, poor co-ordination, and a lack of adequate information for librarians and users about new possibilities.
We think that the main concept should be the creation of a sub-system of document services for remote users. We should aim at the integration of EDD and ILL. We also need to re-define the notion of a delivered document. Documents should include any information material which a library can make available to users, be it a book, a bibliography or an article from an electronic journal. Programs can also be considered a type of electronic document. It should be noted that many publications nowadays have annexes in machine-readable form, which may even contain software, and all these publications become part of library collections.
Two ILL protocols have been known and developed for a long time internationally: ISO 10160 Information and documentation -- Open Systems Interconnection -- Interlibrary Loan Application Service Definition and ISO 10161 Information and documentation -- Open Systems Interconnection -- Interlibrary Loan Application Protocol Specification -- Part 1: Protocol specification; Part 2: Protocol implementation conformance statement (PICS) proforma.
Unfortunately I have not managed to obtain the full text of these documents in electronic form. It seems advisable that these protocols should become the basis for the information support of document delivery systems that include EDD as one of their parts. At present the ILL protocol is only an abstract notion, but I am sure that in the near future it will have a great impact on both ILL and EDD. Current foreign experience suggests the same. In particular, in April 2000 the RLG consortium is expected to launch new ILL software, RLG's ILL Manager, which is based on this protocol and favours the Ariel system for EDD. There are similar developments in Australia (the CILLA and JEDDS projects) and other countries.
There is a considerable number of digital library projects which aim to solve similar tasks, e.g. British projects eLib (Electronic Library) and EDDIS (Electronic Document Delivery: the Integrated Solution).
As was said above, for a number of reasons it is difficult to introduce western software in Russian libraries on a large scale. I can give at least two reasons here: different working practices in Russian and western libraries, including copyright issues, and the price of western products (Ariel: 400-890 USD; projected price for RLG's ILL Manager: 2000-5000 USD). Even if some libraries can use those products, the majority of libraries will have to look for alternative solutions. Therefore we should be looking into developing our own software. At the current stage the most important things are formats and protocols, which should closely correspond to the international standards.
International organisations such as CCITT, ISO, IEC have developed, discussed and ratified several sets of international standards: ISO/DIS 10160/1 for the exchange of information between ILL systems, Z39.50 protocol, standards X.400 (CCITT X.400 - X.430. Data Communication Networks: Message Handling Systems: Recommendations), X.500 and LDAP (CCITT X.500 - X.521. Data Communication Networks Directory: Recommendations), Internet standards and many others. I attach the list of standards which in my view should be taken into consideration or used in the development of an EDD (electronic ILL) system.
There has been some work on the standardisation of information exchange between libraries in Russia. For instance, the "Open Library Systems" centre has developed the LXP protocol (Library eXchange Protocol) for the transmission and exchange of bibliographic information and the synchronisation of library servers. The centre has a long record of work on the use of the Z39.50 protocol for co-operative cataloguing. The Natural Science Library of the Academy of Sciences has developed a simple tag language for e-mailing ILL orders on the basis of the standard postal forms used in this country.
Let's consider the functional requirements of the ILL and EDD software.
One general requirement is the maximal automation of the main processes and flexible approach to user requests. The users are both libraries and end users. Another requirement is that automated systems in different libraries should be compatible. Furthermore, all types of document delivery should be supported: electronic copies, photocopies or delivery of the original via ILL, and all types of delivery: e-mail, FTP, other data transmission protocols, dispatch by fax or regular mail. The formats and coding of electronic copies can also be different:
- Simple text (structured and non-structured) (ISO 646);
- SGML-text (ISO 8879, ISO 12083);
- ODA-text (ISO 8613);
- Graphic images of pages in CCITT Group 3/Group 4 Facsimile;
- PostScript-files;
- PDF-files;
- Multimedia documents with text and non-text components (graphics, audio, video).
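One way a dispatch module might handle this variety is to keep a table from copy format to MIME media type. The sketch below is an assumption made for illustration: the dictionary keys and the function name are invented labels, and Group 3/Group 4 page images are assumed to be wrapped in TIFF, as they are in Ariel. The media types themselves are registered ones.

```python
# Media types for the formats listed above. The dictionary keys are
# labels invented for this sketch.
MEDIA_TYPES = {
    "plain-text": "text/plain",
    "sgml": "application/sgml",
    "oda": "application/oda",
    "fax-image": "image/tiff",  # assumes Group 3/4 images carried in TIFF
    "postscript": "application/postscript",
    "pdf": "application/pdf",
}


def media_type_for(format_label):
    """Return the media type for a copy format, or a generic binary type."""
    return MEDIA_TYPES.get(format_label, "application/octet-stream")


print(media_type_for("fax-image"))
print(media_type_for("multimedia"))  # not in the table: falls back to binary
```

Multimedia documents are left to the generic fallback here, since their parts would normally each carry their own media type.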
The general requirements can also include:
- User friendly interface;
- Tools for the interaction with other library subsystems;
- Implementation of international standards.
The general system requirements also include confidentiality, security, preservation of information etc.
The concrete requirements include the provision of necessary functionality at all the stages of technological process.
In particular the module for the formation and processing of orders should ensure:
- User registration;
- User category definition;
- Formation of orders in different ways, including:
  - by searching the electronic catalogue;
  - by filling in a special form;
  - by using old orders stored in the archive;
- Entry of desired time for the order delivery;
- Copyright declaration with electronic signature;
- Dispatch (initialisation) of order.
Ideally there should be two variants for the implementation of this module: via Web interface and via Z39.50-client program. The complex of standards and software for the EDD should support both order processing and connection of the EDD system as a user to other systems.
Processing of a user order includes the following stages:
- Receipt of orders via ISO 10160/1 protocol from one of the electronic channels (e-mail, FTP-server or HTTP-server);
- Or receipt of order in arbitrary form as a normal electronic letter;
- Checking the information in the order;
- Transformation of the order into request form (automated or manual);
- Checking the user status (reader registration, priority etc.);
- Generation of a refusal if the user doesn't meet the basic requirements;
- Generation of a request to specify the order, if necessary;
- Transfer of search request into the search subsystem;
- Diverting the request into another system (via ISO 10160/1) if the requested document is not found;
- etc.
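The decision points in the stages above can be sketched as a single function. Everything in this Python example is hypothetical: the `Order` fields, the `Outcome` names, and the use of plain sets for the reader register and the catalogue. A real system would receive orders over ISO 10160/1 or e-mail and query an actual search subsystem.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    REFUSED = "refused"
    NEEDS_CLARIFICATION = "needs clarification"
    FOUND_LOCALLY = "found locally"
    FORWARDED = "forwarded to another system"


@dataclass
class Order:
    user_id: str
    title: str


def process_order(order, registered_users, catalogue):
    """Apply the checks from the list above and return an Outcome."""
    # Check the user status; generate a refusal for unregistered users.
    if order.user_id not in registered_users:
        return Outcome.REFUSED
    # Check the information in the order; ask the user to specify it
    # if the title is missing.
    if not order.title.strip():
        return Outcome.NEEDS_CLARIFICATION
    # Pass the request to the search subsystem (here: a set of titles).
    if order.title in catalogue:
        return Outcome.FOUND_LOCALLY
    # Not held locally: divert the request to another system.
    return Outcome.FORWARDED


users = {"reader-17"}
catalogue = {"Electronic Document Delivery"}
print(process_order(Order("reader-17", "Electronic Document Delivery"),
                    users, catalogue))
```

Returning an explicit outcome for every order keeps refusals, clarification requests and forwarded requests visible to the statistics module rather than silently dropping them.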
The requested original may be found in electronic or paper form. In the latter case the document is passed to the scanning unit. The result of scanning is an electronic copy of the document (if no other copy is requested – photocopy, microfilm) and supplementary information in electronic form. The quality of scanning should be appropriate for subsequent printer output and text recognition by special software. Coloured images should be digitised with the preservation of colours. Recognised text should include all the illustrations.
Then the document is dispatched via the appropriate channel (e-mail, FTP, ICQ).
The system should also include statistics and financial settlement modules; the final registration of an order will happen after the receipt of the electronic "confirmation of delivery". The order statistics should be accumulated and stored for a specified period of time. Accordingly there should be tools for processing of statistical data and their conversion into other formats (e.g. Excel) if necessary. The statistical data will be used for a regular printout of reports. If a decision is taken to store the electronic copies there should be a separate subsystem of storage and browsing.
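A minimal sketch of such a statistics step is given below, assuming that each order is stored as a record with a date, a delivery channel and a flag for the received confirmation of delivery. The field names and sample records are invented for illustration. The counts are written as CSV, a format that spreadsheet programs such as Excel can open.

```python
import csv
import io
from collections import Counter


def monthly_report(orders):
    """Count completed orders per (month, delivery channel)."""
    counts = Counter()
    for order in orders:
        # Only orders with a confirmation of delivery count as completed.
        if order["confirmed"]:
            counts[(order["date"][:7], order["channel"])] += 1
    return counts


def to_csv(counts):
    """Render the counts as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["month", "channel", "orders"])
    for (month, channel), n in sorted(counts.items()):
        writer.writerow([month, channel, n])
    return buffer.getvalue()


# Invented sample records.
sample = [
    {"date": "2000-03-02", "channel": "e-mail", "confirmed": True},
    {"date": "2000-03-15", "channel": "e-mail", "confirmed": True},
    {"date": "2000-03-20", "channel": "FTP", "confirmed": False},
    {"date": "2000-04-01", "channel": "FTP", "confirmed": True},
]
print(to_csv(monthly_report(sample)))
```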
This is only part of the requirements which should be formulated for the information and software components of the EDD system.
In conclusion it should be emphasised that EDD is a kind of "bridge" between the traditional library technologies and new information technologies. On the one hand, EDD should be considered a part of a unified library system. On the other hand, EDD technology creates a real basis for the creation and operation of digital libraries. I do not find the term "electronic delivery" particularly successful (English-language publications use both "delivery" and "supply", and I find the latter more appropriate). We should consider not only the ways and channels of delivery, but a new paradigm of library service, in which libraries are no longer closed institutions and turn into open computer-library networks serving users regardless of their location. This is the essence of electronic document delivery and a new function of libraries.
Annex: Internet documents (RFC) for EDD
RFC-2503 MIME Types for Use with the ISO ILL Protocol. R. Moulton, M. Needleman. February 1999.
RFC-2302 Tag Image File Format (TIFF) - image/tiff MIME Sub-type Registration. G. Parsons, J. Rafferty, S. Zilles. March 1998.
RFC-2045 Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. N. Freed and N. Borenstein. November 1996.
RFC-2046 Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. N. Freed and N. Borenstein. November 1996.
RFC-2047 MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text. K. Moore. November 1996.
RFC-2048 Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures. N. Freed, J. Klensin and J. Postel. November 1996.
RFC-2049 Multipurpose Internet Mail Extensions (MIME) Part Five: Conformance Criteria and Examples. N. Freed and N. Borenstein. November 1996.
RFC-1740 MIME Encapsulation of Macintosh Files - MacMIME. P. Faltstrom, D. Crocker and E. Fair. December 1994.
RFC-1741 MIME Content Type for BinHex Encoded Files. P. Faltstrom, D. Crocker and E. Fair. December 1994.
RFC-1341 MIME (Multipurpose Internet Mail Extensions): Mechanisms for Specifying and Describing the Format of Internet Message Bodies. N. Borenstein, N. Freed. June 1992.
RFC-2110 MIME E-mail Encapsulation of Aggregate Documents, such as HTML (MHTML). J. Palme, A. Hopmann. March 1997.
RFC-2084 Considerations for Web Transaction Security. G. Bossert, S. Cooper, W. Drummond. January 1997.
RFC-1807 A Format for Bibliographic Records. R. Lasher and D. Cohen. June 1995.
RFC-2068 Hypertext Transfer Protocol -- HTTP/1.1. R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T. Berners-Lee. January 1997.
RFC-2428 FTP Extensions for IPv6 and NATs. M. Allman, S. Ostermann, C. Metz. September 1998.
RFC-2228 FTP Security Extensions. M. Horowitz, S. Lunt. October 1997. (Updates RFC0959)
RFC-2305 A Simple Mode of Facsimile Using Internet Mail. K. Toyoda, H. Ohno, J. Murai, D. Wing. March 1998.
RFC-1767 MIME Encapsulation of EDI Objects. D. Crocker. March 1995.
RFC-2159 A MIME Body Part for FAX. H. Alvestrand. January 1998.
RFC-2161 A MIME Body Part for ODA. H. Alvestrand. January 1998.
RFC-2164 Use of an X.500/LDAP directory to support MIXER address mapping. S. Kille. January 1998.