Russian Digital Libraries Journal

Russian Digital Libraries Journal - 2000 - Vol 3 - Issue 3


The Role of Electronic Document Delivery Services in the Creation of Digital Libraries

Olga V. Barysheva
Russian National Library


The purposes and tasks of document delivery and electronic libraries

It can be stated with confidence today that document delivery has become a reality in Russia. Despite some remaining financial and legal problems, libraries, scientific information centres and commercial companies accept and fulfil document delivery orders. Such document delivery services are widely advertised on the Internet.

Each organization that provides electronic document delivery (EDD) services holds a certain number of document copies, stored as files in different formats. Many services prefer to delete these copies, not because they do not know what to do with them but because they lack the storage space to keep them. This problem was known even before the appearance of digital libraries.

The digital library concept is based on the principle of timely provision of the necessary information via high-speed telecommunication channels. The selection of material is also important, as it determines the content of the digital library. Documents in digital libraries should contain reliable information published in authoritative sources (for example, in scientific publications containing summaries and abstracts). As in traditional libraries, there should be no limit to the subject coverage. The largest existing electronic libraries can be divided into private (such as Moshkov's library), state (for example, OREL) and commercial (for example, the National Electronic Library). Works of fiction, scientific journals and reference sources make up the main holdings of most of these libraries. The majority of electronic libraries are either cultural and entertainment-oriented or scientific and educational in coverage.

As for document delivery, the distribution of orders by subject has not been determined precisely, but some tendencies can be traced. According to British Library data [1], 69 % of orders relate to the natural and engineering sciences, 19 % to the social sciences, 9 % to the humanities, and 3 % are not classified.

We can conclude that the collections of document delivery services and of electronic libraries coincide where scientific literature is concerned, because their purpose, the quick provision of information, is also the same.

Storage of electronic documents

Another feature common to electronic libraries and delivery services is that their documents are created by the same means. The documents do not originate in electronic form but are digitised from traditional media using devices such as scanners and digital cameras. Strictly speaking, one type of document delivery, preliminary scanning, can also be regarded as digital library creation.

From the technical point of view, each digital library is a set of files together with the programs that interpret them. Just as in document delivery, the files created for it can differ in their characteristics (image quality, format, encoding, etc.). A set of documents can be held either in a file system or in a database, and the most appropriate solution is chosen for each case separately. The same applies to data exchange protocols: some prefer FTP, others HTTP or Z39.50. None of these technical details should in any way affect access by the end user.
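The point that storage details should not affect end-user access can be illustrated with a minimal Python sketch. It assumes a hypothetical DocumentStore interface with two interchangeable back ends, one on the file system and one in an SQLite database; the class names, the table layout and the fetch_for_user function are illustrative assumptions, not part of any system described in this article.

```python
import sqlite3
from pathlib import Path
from typing import Protocol


class DocumentStore(Protocol):
    """What the access layer needs from any storage back end (hypothetical)."""

    def put(self, doc_id: str, data: bytes) -> None: ...
    def get(self, doc_id: str) -> bytes: ...


class FileSystemStore:
    """Keeps each document as one file in a directory."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, doc_id: str, data: bytes) -> None:
        (self.root / doc_id).write_bytes(data)

    def get(self, doc_id: str) -> bytes:
        return (self.root / doc_id).read_bytes()


class DatabaseStore:
    """Keeps each document as a BLOB row in an SQLite table."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, data BLOB)"
        )

    def put(self, doc_id: str, data: bytes) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO docs (id, data) VALUES (?, ?)",
            (doc_id, data),
        )
        self.conn.commit()

    def get(self, doc_id: str) -> bytes:
        row = self.conn.execute(
            "SELECT data FROM docs WHERE id = ?", (doc_id,)
        ).fetchone()
        if row is None:
            raise KeyError(doc_id)
        return bytes(row[0])


def fetch_for_user(store: DocumentStore, doc_id: str) -> bytes:
    """End-user access path: the same call whichever store is behind it."""
    return store.get(doc_id)
```

The calling code never learns whether a copy came from a directory or from a database table, which is the property the paragraph above asks for. A real service would also need to sanitise document identifiers before using them as file names.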

Access to electronic documents

Access to documents through a delivery service is rather simple: an ordering module lets the user choose the preferred copy format and method of transmission. The user then receives the requested document, for example, by e-mail.

Access to documents in a digital library is even easier: it is enough to press a button and the required copy will appear on the screen.

But all this applies only when the customer, user or reader knows all the characteristics of the document precisely, i.e. has its full description. This does not happen very often in real life. Accordingly, a search mechanism is required in addition to the storage module and the information access module (i.e. the network interface). There are in practice only two options: search based on unified descriptions, or search through the complete texts of the documents. In a number of cases full-text search is simply impossible: for example, when we are looking for a diagram, or when texts are stored on the server as images that have not been through character recognition. That is why the main task, both for Internet search engines and for digital libraries, is to create schemes for describing electronic documents (these are usually called metadata). Let us look at some examples:

  • The FGDC metadata standard [2] was introduced by the US Federal Geographic Data Committee on June 8, 1994. It contains 334 different elements, 119 of which exist only to contain other elements; these compound elements are needed to describe the relationships between other elements.
  • In 1996 ISO technical committee 211 (ISO/TC 211) [3] started the development of the ISO Metadata Standard (project 15046-15).
  • In addition, there are the so-called HTML or HTTP metadata, as they are called in the specifications (RFC 1866 and RFC 2616 respectively), that is, the well-known <meta> tags. For example, "http-equiv" can take up to 50 values. Only 35 of them are really used, and more than 20 are used in less than 1 % of cases. A similar scheme is used by the majority of Internet search engines, and also by the digital libraries included in the Compulib [4] project for search via AltaVista.
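To make the <meta> tag scheme in the last item more concrete, here is a minimal Python sketch of how an indexing program might collect such tags from a page. It uses only the standard html.parser module; the class name MetaTagCollector, the function extract_meta and the sample page are hypothetical, and real search engines certainly do more than this.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects name/content and http-equiv/content pairs from <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.named: dict[str, str] = {}
        self.http_equiv: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None for bare attributes; normalise to "".
        fields = {key: (value or "") for key, value in attrs}
        content = fields.get("content", "")
        if "name" in fields:
            self.named[fields["name"].lower()] = content
        elif "http-equiv" in fields:
            self.http_equiv[fields["http-equiv"].lower()] = content


def extract_meta(html_text: str) -> MetaTagCollector:
    """Parse a page and return the collector holding its <meta> values."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    collector.close()
    return collector


# Hypothetical catalogue page used only for illustration.
SAMPLE_PAGE = """<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=koi8-r">
<meta name="keywords" content="document delivery, digital library">
<meta name="description" content="Hypothetical catalogue page">
</head><body></body></html>"""

meta = extract_meta(SAMPLE_PAGE)
print(meta.named["keywords"])
print(meta.http_equiv["content-type"])
```

Keys are lower-cased so that "Keywords" and "keywords" are treated as the same field, which is one simple way of coping with inconsistent page markup.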

There is also an international metadata initiative, the Dublin Core, which is developing a model for the description of electronic resources. Thirteen projects are already being carried out on its basis, and the Dublin Core itself has been translated into more than 20 languages. It is based on 15 elements defined in RFC 2413 [5].

Name of the element | Identifier | Definition | Comments
Title | Title | Name assigned to the resource | The title is usually the name by which the resource is known
Creator | Creator | Person(s) primarily responsible for creating the contents of the resource | Examples of a creator include a person, an organisation or a service. The name of the creator should usually be used to indicate that entity
Subject and keywords | Subject | Subject area of the contents of the resource | The subject is usually expressed by means of keywords, key phrases or classification codes describing the subject coverage of the resource
Description | Description | Note on the contents of the resource | A description can include (but is not limited to) an abstract, a table of contents, a reference to a graphical representation of the contents, or a simple textual description
Publisher | Publisher | Person(s) responsible for the publication of the resource | Examples of a publisher include a person, an organisation or a service. The name of the publisher should usually be used to indicate that entity
Contributor | Contributor | Person(s) assisting in the creation of the contents of the resource | Examples of a contributor include a person, an organisation or a service. The name of the contributor should usually be used to indicate that entity
Date | Date | Date connected with an event in the life cycle of the resource | The date is usually associated with the creation or availability of the resource. The recommended encoding of the date value is defined in ISO 8601 and follows the YYYY-MM-DD format
Type of resource | Type | Nature or genre of the contents of the resource | Type includes terms describing general categories, functions, genres or aggregation levels of the contents. In practice it is recommended to choose a value from a controlled vocabulary (e.g. DCT [6]). The physical or digital representation of the resource is described by the Format element
Format | Format | Physical or digital representation of the resource | Format usually includes the media type or the size of the resource. It can be used to determine the hardware, software or other equipment needed to display or operate the resource. In practice it is recommended to choose a value from a controlled vocabulary (e.g. MIME [7])
Identifier of resource | Identifier | A unique reference to the resource within a given context | In practice it is recommended to identify the resource by means of a string or number conforming to a formal identification system (URI [8], URL [9], DOI [10], ISBN [11])
Source | Source | Reference to the original source from which the resource was derived | The resource can be derived from the original source in whole or in part. In practice it is recommended to identify the source by means of a string or number conforming to a formal identification system
Language | Language | Language of the contents of the resource | In practice it is recommended to use the values defined by RFC 1766: a two-letter language code (from ISO 639), optionally followed by a two-letter country code (from ISO 3166 [12]). For example, "en" for English, "fr" for French, "en-uk" for British English
Relation | Relation | Reference to a related resource | In practice it is recommended to identify the related resource by means of a string or number conforming to a formal identification system
Coverage | Coverage | Extent and limits of the contents of the resource | Coverage usually includes a spatial location (a place name or geographical co-ordinates), a time interval (a period label, a date or a range of dates) or a jurisdiction (such as an administrative division). It is recommended to choose a value from a controlled vocabulary (for example, a thesaurus of geographical names); where possible, named places and time periods are preferable to numeric identifiers (such as sets of co-ordinates or ranges of dates)
Rights management | Rights | Restrictions on access to the resource and its protection | The Rights element usually contains a statement of the rights governing the use of the resource, or a reference to a service providing this information. Rights information usually covers intellectual property rights, copyright and other property rights. The absence of the Rights element must not be taken as grounds for any assumption about the legal status of the resource

Each element is defined by a set of 10 attributes, in accordance with the ISO/IEC 11179 [13] standard for the description of data elements.
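The recommendations for the Date and Language elements in the table above lend themselves to a mechanical check. The following Python sketch is one possible way to do it under simplifying assumptions: the function names are hypothetical, only the year-month-day form of ISO 8601 is accepted, and the language check tests only the shape of the tag (two letters, optionally a hyphen and two more letters), not whether the codes are actually registered in ISO 639 or ISO 3166.

```python
import re
from datetime import date

# Shape of a language tag as described in the table: a two-letter language
# code, optionally followed by a hyphen and a two-letter country code.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{2}(-[A-Za-z]{2})?")

# Four-digit year, two-digit month, two-digit day, separated by hyphens.
DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valid_dc_date(value: str) -> bool:
    """True if value is a real calendar date in year-month-day form."""
    if not DATE_SHAPE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects e.g. 30 February
    except ValueError:
        return False
    return True


def is_valid_dc_language(value: str) -> bool:
    """True if value has the two-letter (plus optional country) shape."""
    return LANGUAGE_TAG.fullmatch(value) is not None


print(is_valid_dc_date("2000-09-15"), is_valid_dc_date("15.09.2000"))
print(is_valid_dc_language("fr"), is_valid_dc_language("french"))
```

A check of this kind is useful when descriptions arrive from several sources, since it catches locally formatted dates before they enter a shared collection.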

It is hard to tell which metadata set is better (they are hardly comparable), which one will be used more widely, and which one will prove the most efficient for search. At present the Dublin Core seems to have the best prospects, because it is applicable to practically all kinds of electronic documents and can be interpreted both by machines and by humans. In addition, it is international. Moreover, the possibility of, and the need for, metadata profiles (a role for which the Dublin Core is well suited) is stated in the specification of HTML version 4.01, which became a World Wide Web Consortium (W3C) recommendation on December 24, 1999 [14].
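As an illustration of how a Dublin Core description might be carried inside an HTML page, the Python sketch below renders a record as <meta> tags inside a <head> element that carries the profile attribute of HTML 4.01. The "DC." prefix for element names is a widely used convention rather than something this article prescribes, and the profile address, the function name dc_to_html_head and the sample record are hypothetical.

```python
from html import escape

# The 15 element identifiers listed in the table above.
DC_ELEMENTS = (
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
)

# Hypothetical profile address, used only for illustration.
PROFILE_URI = "http://example.org/profiles/dublin-core"


def dc_to_html_head(record: dict[str, str], profile: str = PROFILE_URI) -> str:
    """Render a Dublin Core record as an HTML <head> with <meta> tags."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    lines = [f'<head profile="{escape(profile, quote=True)}">']
    for element in DC_ELEMENTS:  # emit in the canonical element order
        if element in record:
            value = escape(record[element], quote=True)
            lines.append(f'<meta name="DC.{element}" content="{value}">')
    lines.append("</head>")
    return "\n".join(lines)


print(dc_to_html_head({
    "Title": "Sample scanned article",
    "Format": "image/tiff",
    "Language": "en",
}))
```

Because every Dublin Core element is optional, the function simply skips the ones that are absent; only names outside the 15-element set are rejected.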

In any case, electronic documents cannot be used without descriptions, because without them they cannot be found. Documents prepared for delivery are already supplied with descriptions, so they can easily be included in a digital library. The main requirement is that the metadata scheme within one digital library should be the same for all documents, regardless of their acquisition source, format and place of storage.
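The benefit of a uniform description scheme can be shown with a small Python sketch. It assumes a hypothetical Record type whose metadata keys are Dublin Core identifiers, and two sample documents from different sources: a scanned page image with no recognised text and a networked HTML article. All names, locations and values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """One stored document with its description (hypothetical structure)."""
    location: str
    full_text: Optional[str]  # None for page images without recognised text
    metadata: dict[str, str] = field(default_factory=dict)


def search_metadata(records, element, query):
    """Case-insensitive substring search in one metadata element."""
    needle = query.casefold()
    return [r for r in records
            if needle in r.metadata.get(element, "").casefold()]


def search_full_text(records, query):
    """Full-text search; documents without text can never match."""
    needle = query.casefold()
    return [r for r in records
            if r.full_text is not None and needle in r.full_text.casefold()]


COLLECTION = [
    Record("edd-scans/0001.tif", None, {
        "Title": "Electronic document delivery in libraries",
        "Subject": "document delivery; libraries",
        "Format": "image/tiff",
    }),
    Record("publisher/article-17.html",
           "Full text about document delivery and metadata.", {
        "Title": "Metadata for digital libraries",
        "Subject": "metadata; Dublin Core",
        "Format": "text/html",
    }),
]

by_subject = search_metadata(COLLECTION, "Subject", "Document Delivery")
by_text = search_full_text(COLLECTION, "document delivery")
print([r.location for r in by_subject])
print([r.location for r in by_text])
```

The scanned image is found only through its description, while the HTML article is found only through its text, which is why the same metadata scheme has to cover every document in the collection.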

The use of electronic documents

We will not deal with the main problems of digital library operation: economic and copyright issues. We will speak only about the interaction of documents.

The main problem is how to combine traditional and electronic documents, especially where libraries are concerned. Linking documents on the basis of mutual citation, as is already done by the Institute for Scientific Information (ISI) in Philadelphia [15], makes it possible to provide references to descriptions of traditional (paper) documents, if those descriptions exist in machine-readable form. For the moment only the Institute of Scientific Information on Social Sciences is trying to create a similar mechanism in this country. Yet it would be useful to apply this concept in the creation of digital libraries that bring together networked electronic documents, electronic copies created by publishers, delivery services or scanning services, and descriptions of the collections of traditional libraries. The question of whether traditional libraries should acquire publications and documents or information and contents would then lose its urgency.

We think that libraries and document delivery services should preserve the documents they digitise. All these documents need is metadata conforming to one of the existing schemes. At the same time, it will not be necessary for each traditional library to create its own digital library: by uniting smaller collections described with the same metadata, the libraries of Russia can contribute fully to the development of digital library collections.

References

  1. British Library facts and figures. http://www.bl.uk/
  2. FGDC standards.
  3. ISO catalogue. http://www.iso.ch/infoe/catinfo.html
  4. Compulib. http://www.citycat.ru/compulib/#Kluch
  5. Dublin Core Metadata for Resource Discovery. http://www.ietf.org/rfc/rfc2413.txt
  6. DCT - List of Resource Types: Dublin Core Draft Working Group Report. http://purl.org/DC/documents/wd-typelist.htm
  7. MIME - Internet Media Types.
  8. URI - Naming and Addressing: URIs, URLs, ... http://www.w3.org/Addressing/; Uniform Resource Identifiers: Generic Syntax, Internet Draft Standard. http://www.ics.uci.edu/pub/ietf/uri/rfc2396.txt
  9. URL - Uniform Resource Locator Specification. http://www.w3.org/Addressing/URL/Overview.html
  10. DOI - The Digital Object Identifier. http://www.doi.org/
  11. ISBN - International Standard Book Numbering. http://www.reedref.com/standards/
  12. ISO 3166 2-letter country codes. http://www.w3.org/International/O-misc-iso3166.html
  13. ISO 11179 - Specification and Standardization of Data Elements, Parts 1-6. ftp://sdct-sunsrv1.ncsl.nist.gov/x3l8/11179/
  14. HTML 4.01 Specification. http://www.w3.org/TR/html4/cover.html#minitoc
  15. ISI Web of Science. http://www.isinet.com/products/citation/wos.html

About the Author

Olga Vladimirovna Barysheva, Candidate of Philological Sciences with a speciality in "computer science", is a leading engineer in the Department of Automation of the Russian National Library.


© Olga V. Barysheva, 2000


Last update: 2003-12-09

Please address your comments and suggestions to rdlp@iis.ru