On-line information resources for economics researchers: RePEc database and RuPEc Web portal [1]
Thomas Krichel
| document | collection |
| individual | organization |
A "document" can be a preprint, a published article, a book, a piece of software, a data set, etc. The most widely spread types at the moment are preprints and published papers, but we already have examples of software pieces in the RePEc database.
A "collection" is a set of documents assembled into one subject group. At present a collection usually consists of series of preprints and journals with published articles. Each document, incidentally, belongs to a certain series from the very beginning. In principle, the collection concept can also be used for grouping reviewed articles (e.g., into a separate collection). Alternatively, an additional field can be added to a template to reflect the new status of a reviewed document.
At the time of writing, personal information is integrated in a document template. However, we are working to specify personal data separately. Soon it will look as follows:
Template-Type: ReDIF-Person 1.0
Name: Thomas Krichel
Email: T.Krichel@surrey.ac.uk
Author-Paper: RePEc:sur:surrec:9404
Author-Paper: RePEc:sur:surrec:9601
Homepage: http://gretel.econ.surrey.ac.uk
Handle: RePEc:per:1965-06-05:thomas_krichel
As a result, we shall be able to replace information about the author (as shown in the first example from the RePEc:sur:surrec:9601 document template) with the following:
Author-Name: Thomas Krichel
Author-Person: RePEc:per:1965-06-05:thomas_krichel
The advantages of this system are obvious. It reduces the system administration workload: when an author changes his telephone number, for example, the modification is made at a single point within the system. RePEc service users will also be able to find the author even if the personal data recorded in a document have become outdated.
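The single point of modification can be illustrated with a minimal sketch. It assumes that a template is a sequence of "Field: value" lines, as in the examples above, and it ignores any continuation-line rules the full ReDIF format may have. The function names parse_template, values, and author_contact are hypothetical and are not part of RePEc software.

```python
def parse_template(text):
    """Parse 'Field: value' lines into a list of (field, value) pairs.

    A list is used instead of a dict because a field such as
    Author-Paper may occur several times in one template.
    """
    fields = []
    for line in text.splitlines():
        # Split at the first colon only, so URLs and handles stay intact.
        name, sep, value = line.partition(":")
        if sep and name.strip():
            fields.append((name.strip(), value.strip()))
    return fields


def values(fields, name):
    """Return all values of the given field, in template order."""
    return [v for n, v in fields if n == name]


PERSON = """\
Template-Type: ReDIF-Person 1.0
Name: Thomas Krichel
Email: T.Krichel@surrey.ac.uk
Author-Paper: RePEc:sur:surrec:9404
Author-Paper: RePEc:sur:surrec:9601
Homepage: http://gretel.econ.surrey.ac.uk
Handle: RePEc:per:1965-06-05:thomas_krichel
"""

DOCUMENT_AUTHOR = """\
Author-Name: Thomas Krichel
Author-Person: RePEc:per:1965-06-05:thomas_krichel
"""

# Index person records by their handle (illustrative in-memory store).
person = parse_template(PERSON)
people = {values(person, "Handle")[0]: person}


def author_contact(document_fields, field_name):
    """Follow Author-Person handles and read a field from the person record."""
    results = []
    for handle in values(document_fields, "Author-Person"):
        record = people.get(handle)
        if record is not None:
            results.extend(values(record, field_name))
    return results
```

With this layout, a changed e-mail address is edited once in the person template, and every document that carries the Author-Person handle picks up the new value through author_contact.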
Finally, an "organization" can be represented as a set of individuals (much like a collection, which is a set of documents). When an author is registered, his personal data will automatically be complemented with data about his organization if the latter is loaded in the RePEc database.
3.3. User request processing services
A key feature of RePEc is its inherent capability to support numerous services related to user requests. The negative side of this approach is a certain vagueness of the RePEc concept in comparison with an XXX-type archive where user data and services are merged together. However, our approach has one more benefit: a potential provider realizes that sending his data to RePEc means a simultaneous connection of the data to all user services which were developed independently and continue operating on different RePEc servers in different countries.
Below is a list of user services in chronological order:
- BibEc at http://netec.mcc.ac.uk/BibEc.html: static HTML pages with information about working papers available on paper only.
- WoPEc at http://netec.mcc.ac.uk/WoPEc.html: static HTML pages for all working papers available in electronic format.
Both of the above databases use a common search mechanism. Three search options are available: a full text WAIS search; a field search based on mSQL; and a field search based on the ROADS system. Both databases are mirrored in the USA and Japan as part of the NetEc project.
- EDIRC at http://ideas.uqam.ca/EDIRC offers data on, and search tools for, economics-related academic institutions and research centers all over the world. This service is also mirrored on the NetEc project servers.
- IDEAS at http://ideas.uqam.ca/ offers an Excite-type index of static HTML pages represented for all "document", "article", and "software" templates from the RePEc database. This site is one of the most popular user interfaces providing access to RePEc data.
- NEP: New Economics Papers at http://netec.wustl.edu/NEP is a set of reports on newly arrived documents stored in the RePEc database. Each report is reviewed by an expert in the given subject area. The system defines a few dozen subject areas, which enables users to choose only those new papers that are related to their sphere of interest. The reviewing experts are usually PhDs, students and young researchers who work on a voluntary basis.
- INOMICS at http://www.inomics.com/query/search offers an index of RePEc data as well as enables parallel search in indices of other Web pages related to economic research.
A concluding observation: a Z39.50 search server for all documents stored in RePEc is available at dbiref.kub.nl:9997. The database is called repref. The attribute set is Bib-1. The supported record syntaxes are USMARC, SUTRS, and GRS-1 (line tags only, Type 3 tags).
4. The RuPEc project
In 1997 the Russian Virtual Laboratory for Economists and Sociologists (RVLES) won an RGNF grant (see http://www.ieie.nsc.ru/). The project involved experts from three organizations located in Novosibirsk: IEOPP SO RAN, GPNTB SO RAN, and ISI SO RAN. One of the strands of the project was to establish conditions for developing electronic working paper archives among Russian research institutions dealing with economics and sociology. The subproject was titled RuPEc because it was technically based upon RePEc standards and protocols. The initial goals of RuPEc were as follows:
- building a Russian mirror of the full-scale RePEc database and its main services;
- developing RuPEc's own search procedure for the RePEc database (SWISH-E - freely distributed text indexing software - was used);
- setting up a Web interface for adding new documents to the RePEc database;
- creating an independent Russian database of working papers with a Web interface to support remote loading of new documents.
Implementation of the last two points gave Russian researchers a choice between placing their papers in the common international database (this required, as a minimum, bibliographic data in English) or in the Russian section of the database only (in the latter case all data should be provided in Russian).
From early 1998, test operation of all four components started at http://www.ieie.nsc.ru/r-archive/. The independent Russian database was called RAWPES (Russian Archive of Working Papers in Economics and Sociology). To avoid misunderstanding, note that RuPEc means a family of services designed to process user requests to the RePEc database located at the RVLES server (http://www.ieie.nsc.ru/), while the term "RAWPES" refers to the Russian-language database and to the Russian-language user services associated with that database.
The goals of the next stage of RuPEc development include the following:
- popularization and promotion of RePEc concept and standards of data exchange between electronic archives of Russian economics research organizations in order to create a Russian language network of interlinked archives in line with the international RePEc network;
- development of various services to process requests of Russian and international users of the mentioned databases.
In Subsections 4.1. and 4.2. below we describe capabilities of individual researchers and administrators of existing electronic archives in adding new data about papers and entire collections to the RuPEc database. Subsection 4.3. describes the concept of a user interface which looks like a Web portal and makes it possible for users to keep personal sections at the RuPEc server with personal profiles, selections from the database matching the user profile, customized methods of visualizing new entries in the database, etc.
4.1. The rules of adding new papers to archive
The procedure for adding a paper works as follows. At http://www.ieie.nsc.ru/r-archive/add.html there is a form where one has to indicate a mandatory minimum of data about the document to be added. The form has an English part and a Russian part. If a user fills in the Russian part only, information about the document remains in the RAWPES database and is not sent to RePEc. If the English part or both parts are filled in, the information is added to both databases (in any case RePEc captures only the English part of the form). See the current contents of RAWPES at http://www.ieie.nsc.ru/cgi-bin/ar-search.cgi. Some of the papers which were also sent to the international RePEc database can be seen at http://www.ieie.nsc.ru/~rupec/data/noseconom.html.
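The routing rule for the two parts of the form can be written down as a short sketch. The function name target_databases and the representation of each form part as a single string are assumptions made for illustration; the actual form processing code is not described in this article.

```python
def target_databases(english_part, russian_part):
    """Decide where a submitted description is stored.

    Each argument is the text a user typed into that part of the
    form (an empty or blank string means the part was left unfilled).
    """
    has_english = bool(english_part.strip())
    has_russian = bool(russian_part.strip())
    if has_english:
        # English only, or both parts: stored in both databases.
        # RePEc itself receives only the English part.
        return ["RAWPES", "RePEc"]
    if has_russian:
        # Russian only: stays in the Russian archive.
        return ["RAWPES"]
    return []  # nothing was filled in
```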
Note that the international database should contain paper descriptions in English, while the bibliographic data can refer to the full text of a paper in Russian. When filling in an archive entry form, a user should indicate, along with the general bibliographic data, the directory and file name of the paper on his local computer; the file will be automatically copied to the RAWPES server for loading into the database. If the full text of the paper has already been published on the Web and there is no need to keep it on the server (because a user can simply indicate its URL), a special version of the form is used, which can be seen at http://www.ieie.nsc.ru/r-archive/add-url.html.
By the middle of 1999 the procedure had the following peculiarities:
- Only one method (mentioned above) was used to place full text papers on a server. However, limited bandwidth of Russian communication channels requires an off-line mode for that process.
- Availability of a full text file is mandatory for all archived papers (to successfully complete the process of placing the paper on a server a user has to indicate a file name to be uploaded or a URL address if the paper is already on the Web).
- A RAWPES administrator has the right to delete a document from the archive if he reasonably believes that thematic or other rules of the archive have been violated.
- After being placed on the RVLES server, a file with the paper and its bibliographic data can be modified only by a RAWPES administrator.
In the near future the project is to expand capabilities of the archive in all areas mentioned above.
4.2. The rules of adding new collections to the archive
Many Russian economics research organizations (e.g., TsEMI, IEPPP and others) have their own electronic archives to store working papers. Some Russian economics journals (e.g., EMM, The Problems of Forecasting, etc.) review Web publications and refer to their bibliographic data; academic institutions and publishing houses announce new books on the Web, etc. In most cases the collections (archives) contain descriptions of papers only in Russian, which effectively prevents their immediate loading into RePEc. The existence of a large Russian-speaking community and some other national features call for a purely Russian network of mutually linked Russian electronic archives and a common database to integrate metadata about their contents. The protocols and templates developed for the RePEc project could become a convenient methodological and technical basis for such a network.
In fact, the RAWPES server already has all the tools needed for organizing metadata about all electronic archives devoted to economics. To add a new collection to the database its administrator should perform the following actions:
1. Send a request to rupec@ieie.nsc.ru asking for a unique symbolic identifier to be assigned to the new archive in the common meta-database (for example, RAWPES has the identifier nos).
2. Having obtained an identifier, prepare files in accordance with the templates described in Subsection 3.1 above. The only addition to those templates is a Charset: field indicating the Cyrillic encoding used in the files, e.g. Charset: Windows-1251.
3. Create a server directory accessible via FTP or HTTP and place the ???arch.rdf and ???seri.rdf files in that directory, replacing ??? with the symbolic identifier obtained at Step 1.
4. In the URL field of ???arch.rdf, indicate the full Internet address of the directory containing the remaining .rdf files with bibliographic information about the archived publications (there is a separate .rdf file for each publication).
5. To show that the archive is ready for addition to the meta-database, send a message to rupec@ieie.nsc.ru indicating the URL of the directory created at Step 3.
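The file naming and Charset: conventions above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the fallback to ASCII when no Charset: field is present is an assumption, not a documented RuPEc rule.

```python
def archive_file_names(identifier):
    """Return the two top-level file names for an archive identifier."""
    return [f"{identifier}arch.rdf", f"{identifier}seri.rdf"]


def decode_template(raw, default="ascii"):
    """Decode the bytes of an .rdf file using its Charset: field.

    Field names are plain ASCII, so the Charset: line can be located
    in the raw bytes before the Cyrillic values are decoded.
    """
    charset = default  # assumption: files without Charset: are ASCII
    for line in raw.splitlines():
        if line.lower().startswith(b"charset:"):
            charset = line.split(b":", 1)[1].strip().decode("ascii")
            break
    return raw.decode(charset)
```

For the RAWPES identifier mentioned above, archive_file_names("nos") yields nosarch.rdf and nosseri.rdf.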
When the RuPEc administrator receives the message, the indicated address is added to the list used on a daily basis by a software daemon, which checks whether the source archives have changed and transfers the changes to the common database.
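How the daemon detects changes is not specified here. One plausible approach, shown below purely as an assumption, is to keep a fingerprint of every .rdf file from the previous run and compare it with the current contents. The names fingerprint and detect_changes are hypothetical, and fetching the files over FTP or HTTP is left out of the sketch.

```python
import hashlib


def fingerprint(content):
    """Return a stable digest of a file's bytes."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(previous, current_files):
    """Compare the stored state with the files fetched today.

    previous      -- dict: file name -> fingerprint from the last run
    current_files -- dict: file name -> bytes fetched in this run
    Returns (changed_or_new, removed, new_state).
    """
    new_state = {name: fingerprint(data) for name, data in current_files.items()}
    changed = sorted(n for n, fp in new_state.items() if previous.get(n) != fp)
    removed = sorted(n for n in previous if n not in new_state)
    return changed, removed, new_state
```

Only the files reported as changed or removed would then need to be transferred to, or deleted from, the common database.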
The user interface of this database is fully identical to the interface used by the RePEc international database located at the RVLES server.
Classification codes used to define subjects of archived papers are to be taken from the JEL classifier. This enables inclusion of Russian publications into a uniform subject classification common to all databases.
To simplify the conversion of large volumes of bibliographic data from a user format to the ReDIF format, a customizable converter has been developed. The converter can work with data source files both locally and remotely. Using a user-prepared description of the bibliographic data format (the description is written in a language with a Perl-like syntax), the converter can, for instance, check the contents published at a user Web site at user-defined intervals, convert the captured bibliographic data into the ReDIF format, and send a message about the conversion results to the user via e-mail. This service radically simplifies the work of a collection administrator who maintains parallel descriptions of the collection at his own site (in any format) and in a meta-database (in the ReDIF format).
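The idea of a user-supplied format description can be illustrated in Python rather than in the converter's own Perl-like language, which is not reproduced in this article. In the sketch below the description is a regular expression with named groups plus a mapping from groups to template fields. The local record format ("author | title | year"), the field names other than Author-Name, the template type ReDIF-Paper 1.0, and the handle RePEc:nos:wpaper:001 are all assumptions chosen for illustration.

```python
import re

# Hypothetical description of a local format: "author | title | year".
PATTERN = re.compile(r"^(?P<author>.+?) \| (?P<title>.+?) \| (?P<year>\d{4})$")

# Mapping from named groups to template fields (assumed field names).
FIELD_MAP = [
    ("author", "Author-Name"),
    ("title", "Title"),
    ("year", "Creation-Date"),
]


def to_redif(record, handle):
    """Convert one local record into template text, or None if it does not parse."""
    match = PATTERN.match(record.strip())
    if match is None:
        return None  # a real converter would report this to the user
    lines = ["Template-Type: ReDIF-Paper 1.0"]
    for group, field in FIELD_MAP:
        lines.append(f"{field}: {match.group(group)}")
    lines.append(f"Handle: {handle}")
    return "\n".join(lines)
```

Changing PATTERN and FIELD_MAP is all that is needed to support a different local format, which is the point of keeping the description separate from the conversion code.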
4.3. Web portal of publications in economics and other trends of development
One of the trends for RuPEc services aimed at processing user requests is linked with developing more convenient means of visualizing RePEc database contents and especially new entries coming to the database. The problem of convenient visualization of new entries stems from gradual growth of the number of incoming papers. The incoming paper flow consists of daily collection of new publications in a great number of archives (over 60 archives by early 1999), regular (monthly or quarterly) publications in electronic archives, and of new archives added to the RePEc database on a nearly weekly basis.
Orientation in the flow of new papers and monitoring of new publications in the subjects of interest require tools that are both convenient and customizable, to filter the incoming paper flow and to display the resulting data on a PC screen. This problem is not a new one. The concept used for its solution is called "a Web portal". Sites which demonstrate successful implementation of the Web portal concept include Excite (http://www.excite.com/), MyYahoo (http://my.yahoo.com/), InfoArt (http://www.infoart.ru/) and some others.
In this particular case, the problem of convenient visualization of incoming data flows on a computer screen has the following aspects:
- compact reflection on a screen of basic categories of information contained in the RePEc database;
- personal customization of the list of categories to remain on the screen; and definition of the database contents filtering rules through a predefined subset of categories.
As a result, all these capabilities create an individual user-defined Web portal for publications in economics customized to specific user interests. A software daemon will regularly search the incoming data flow and build a Web page with references to all new papers matching the defined user profile.
The contents of the RePEc database make it possible to select the following data categories, which a user can, fully or partially, include in or exclude from the list of data to be displayed:
- A list of organizations (universities, research centers, publishing houses, etc.) whose collections of electronic data are included in the database. A user can include only part of those organizations in his display list.
- A list of archives and collections devoted to a specific subject (each organization may have several archives or collections of this type). The list can be further divided by types of publications (working papers, electronic journal publications, annotations of books, software, etc.). In his Web portal a user can keep handles pointing to the electronic contents of certain journals, certain collections of working papers, annotations of new books, etc.
- Thematic sections of the JEL classifier. With the help of handles pointing to publications marked with specific JEL codes by the authors, one can obtain new publications in specific areas of knowledge.
- Publications of leading researchers. In principle, names of the authors which a user would like to monitor on a regular basis can be selected from the complete list of authors published in RePEc. Having marked those names, a user will obtain references to their publications (if any) in his Web portal.
- Key words. Having selected sets of key words, a user will obtain references to corresponding search results in his Web portal. A software daemon will update search results after each modification of the database.
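The categories listed above can be combined into a simple filter over incoming papers. The sketch below is one possible reading, not the RuPEc implementation: the Paper and Profile classes, the rule that a match in any one category is enough, and the example handles are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    handle: str
    archive: str
    jel_codes: set
    authors: set
    text: str  # title and abstract, used for key word matching


@dataclass
class Profile:
    archives: set = field(default_factory=set)
    jel_codes: set = field(default_factory=set)
    authors: set = field(default_factory=set)
    keywords: set = field(default_factory=set)


def matches(paper, profile):
    """True if the paper falls into at least one category of the profile."""
    words = set(paper.text.lower().split())
    wanted = {k.lower() for k in profile.keywords}
    return bool(
        paper.archive in profile.archives
        or paper.jel_codes & profile.jel_codes
        or paper.authors & profile.authors
        or wanted & words
    )


def select_new(papers, profile):
    """Return handles of new papers to show on the user's portal page."""
    return [p.handle for p in papers if matches(p, profile)]
```

A stricter portal could require a match in every non-empty category instead; the choice depends on how narrow the user wants the page to be.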
As soon as the user completes his Web portal, the system will be able to identify the profile of his scientific interests on the basis of user-defined parameters.
Such a profile provides a number of benefits:
- consolidated displaying of the profile facilitates its modification by the user;
- the system can compare profiles of different users and generate recommendations about setting up interest groups, about the availability of publications on similar subjects which might be interesting to the user, etc. (the structure of this kind of service has been developed in the "recommendation systems" concept).
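The article does not say how profiles would be compared. One common measure, used below only as an assumption, is the Jaccard similarity of the sets of categories two users have selected; pairs above a threshold become candidates for an interest group. The user names, the JEL codes, and the threshold value are illustrative.

```python
def jaccard(first, second):
    """Share of common elements among all elements of two sets."""
    if not first and not second:
        return 0.0
    return len(first & second) / len(first | second)


def suggest_groups(profiles, threshold=0.5):
    """Return pairs of users whose profiles overlap strongly enough.

    profiles -- dict: user name -> set of selected categories
                (for example, JEL codes)
    """
    names = sorted(profiles)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if jaccard(profiles[first], profiles[second]) >= threshold:
                pairs.append((first, second))
    return pairs
```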
The above capabilities of the Web portal are an extension of one of the user services (a "reader" service). The other direction is to develop tools that could support a user in the process of preparing publications (a "writer" service).
In the framework of the "writer" service RePEc can, in principle, simplify the process of quoting. For this purpose it is necessary to use an algorithm which provides a reference (a direct jump) to the quoted text contained in the file with the full text of the quoted document. If such quotation references are inserted in an electronic document, selecting one of them would display the required section of the quoted text directly on the user's screen. In fact, the electronic quotation algorithm should make it possible to create bookmark tags which are external to the file with the quoted document (since it is impossible to insert the tag physically, it operates at a logical level). The algorithm creates an intermediate reference database. Selecting an electronic quotation reference in a document launches a procedure on a server which logically inserts a bookmark tag in the required place of the quoted document on the basis of a unique context defined in the process of generating the reference. The browser is offered a version of the quoted document with inserted logical bookmarks, and it scrolls the document on the screen to the required bookmark.
There is a similar algorithm for HTML files (see a description of Web page section comments at CritSuite http://crit.org/). An algorithm of this type can be developed for any file format.
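For text-based formats the logical bookmark can be sketched in a few lines. The sketch assumes that the reference database stores, for each quotation, a context string that occurs exactly once in the quoted document, and that the server inserts an HTML anchor into the copy it sends to the browser. The function name serve_with_bookmark and the anchor syntax are assumptions, and the case of a context string that falls inside HTML markup is ignored.

```python
def serve_with_bookmark(document, context, anchor_id):
    """Return a copy of the document with an anchor before the quoted context.

    The stored file is never modified; the anchor exists only in the
    copy delivered to the browser.
    """
    first = document.find(context)
    if first == -1:
        raise ValueError("context not found in the quoted document")
    if document.find(context, first + 1) != -1:
        raise ValueError("context is not unique in the quoted document")
    tag = f'<a name="{anchor_id}"></a>'
    return document[:first] + tag + document[first:]
```

A quotation reference would then point to the served copy with the fragment identifier appended, so that the browser scrolls to the inserted anchor.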
Massive use of electronic quotations and quotation references in the RePEc database creates an opportunity to set up another user service, i.e. electronic quotation index. The electronic index can be obtained by direct counting of database-stored electronic quotations for each document contained in RePEc.
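The direct counting described above amounts to a tally over the reference database. In the minimal sketch below the database is assumed to be a sequence of (quoting document, quoted document) handle pairs, one per stored quotation; the function name quotation_index and the handle RePEc:aaa:wpaper:1 are hypothetical.

```python
from collections import Counter


def quotation_index(references):
    """Count stored quotations for each quoted document.

    references -- iterable of (quoting_handle, quoted_handle) pairs,
                  one pair per electronic quotation.
    """
    return Counter(quoted for _quoting, quoted in references)


REFERENCES = [
    ("RePEc:sur:surrec:9601", "RePEc:sur:surrec:9404"),
    ("RePEc:aaa:wpaper:1", "RePEc:sur:surrec:9404"),
    ("RePEc:aaa:wpaper:1", "RePEc:sur:surrec:9601"),
]

index = quotation_index(REFERENCES)
```

Every stored quotation is counted here, so a document quoted twice by the same paper scores two; counting distinct quoting documents instead would be an equally simple variant.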
5. Conclusion
Broad international coverage, a wide spectrum of incoming documents, and the availability of personalization and filtering tools make the RePEc database and RuPEc user interfaces an important working instrument for researchers in the field of economics and sociology.
References:
[1]. Creation of the RePEc database was supported with a grant provided by the Joint Information Systems Committee of the UK Higher Education Funding Councils in the framework of the Electronic Library Programme (RePEc/WoPEc). Creation of RuPEc user services related to the RePEc database and creation of the Russian RAWPES archive was supported by the Russian State Research Fund (№ 96-02-12039в). The authors are thankful to Mrs. Yevgenia Stupina for comments which helped us to improve this text.
[2]. Some of them are home pages of the authors, others have been established by academic institutions or research centers. In rare cases we found joint directories belonging to several organizations. For example, the US Federal Reserve has a Fed in Print directory which includes publications of all its territorial branches.
[3]. Note that sometimes these aims are in conflict with each other.
[4]. We omit the Abstract field to save space.
© Krichel T., Lyapunov V.M., Parinov S.I., 1999