Namespaces in XML and SXML
Kirill Lisovsky, Dmitry Lizorkin
Institute for System Programming RAS, Moscow State University
XML namespaces provide a simple method for qualifying element and attribute names used in Extensible Markup Language documents by associating them with namespaces identified by URI references. SXML is a representation of the XML Infoset in the form of S-expressions.
Namespace support in SXML may be considered a superset of namespace support in XML. In this paper we discuss its design and implementation, and compare it with namespaces in XML. The discussion is illustrated by Dublin Core and Resource Description Framework examples, which are actively used in digital library applications.
1 Introduction
While the XML Recommendation was introduced without the notion of namespaces, most XML-related technologies rely on them. The Namespaces in XML Recommendation was published by the World Wide Web Consortium in January 1999, shortly after the XML Recommendation itself.
Namespaces provide a mechanism to distinguish names used in XML documents, allowing these names to remain simple and meaningful while being unique.
Section 2 explains the reason for introducing namespaces in XML and gives an overview of the XML namespace mechanism. Section 3 considers the implementation of namespaces in SXML, a representation of the XML Information Set in the form of S-expressions. Section 4 introduces the Dublin Core, a metadata element set intended to facilitate the discovery of electronic resources, including digital libraries (DL). Section 5 describes the representation of Dublin Core in XML/RDF, giving an illustrative example of the practical application of XML namespaces in XML-based digital libraries.
2 Namespaces in XML
In the data model implied by XML, an XML document contains a tree of elements. Each element has an element type name (sometimes called the tag name) and a set of attributes; each attribute consists of a name and a value. The element type name is generally intended to express the semantic meaning of the element. As James Clark discusses in his note on XML Namespaces, applications typically use the element type name and the attributes of an element to determine how to process the element.
In XML 1.0 without namespaces, element type names and attribute names are unstructured strings using a restricted set of characters, similar to identifiers in programming languages. This is problematic in a distributed environment like the Web, because there is no way of guaranteeing the uniqueness of the element type name, i.e. semantically different elements can appear to have equivalent element type names. For example, one XML document may use part elements to describe parts of books, another may use part elements to describe parts of cars:
<book>
  <part>...</part>
  <part>...</part>
</book>

<car>
  <part>...</part>
</car>
An XML application has no way of knowing how to process a part element unless it has some additional information external to the document. In our simple example, part elements can be distinguished by analyzing their parent's type names (book vs. car), but in general this approach is not convenient.
A similar name collision can occur with XML attributes. Suppose one XML document developer uses the color attribute to state that the title element should be displayed in red on a computer screen:
<title color="red">...</title>
and a second developer uses the color attribute to specify the color of the element's content when printed on paper:
<title color="red" color="gray16">...</title>
Not only are these two attributes indistinguishable; the element in the latter example is not even well-formed XML, since the XML Recommendation does not allow multiple attributes with the same name in one element.
The XML Namespaces Recommendation tries to improve this situation by extending the data model to allow element type names and attribute names to be qualified with a Uniform Resource Identifier (URI). Thus a document that describes parts of cars can use part qualified by one URI, and a document that describes parts of books can use part qualified by another URI. As James Clark notes, the role of the URI in a name is purely to allow applications to recognize the name. There are no guarantees about the resource identified by the URI. The XML Namespaces Recommendation does not require element type names and attribute names to be qualified names; they are also allowed to be non-qualified names.
The XML Namespaces Recommendation qualifies names with URIs indirectly, through prefixes. If an element type name or attribute name contains a colon, the part of the name before the colon is treated as a prefix and the part after the colon as a local name. A prefix foo refers to the URI specified in the value of the xmlns:foo attribute.
For example, all XML elements that describe books may have their element type names qualified with the common URI "http://www.books.com/xml". Then the element that describes parts of books (i.e. whose local name is part) is represented in XML as:
<books:part xmlns:books="http://www.books.com/xml">...</books:part>
The situation with attribute names is exactly the same as with element type names. For example:
<title xmlns:display="http://www.computer-displays.org"
       xmlns:printer="http://www.bwprinters.net"
       display:color="red"
       printer:color="gray16">...</title>
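As a minimal sketch of how a namespace-aware parser sees these two attributes, the following Python fragment parses the element with the standard library module xml.etree.ElementTree, which reports every qualified name in the form {URI}local-name. The choice of Python and of this parser is an assumption made for illustration only; it is not part of the SXML toolset discussed in this paper.

```python
import xml.etree.ElementTree as ET

DOC = (
    '<title xmlns:display="http://www.computer-displays.org" '
    'xmlns:printer="http://www.bwprinters.net" '
    'display:color="red" printer:color="gray16">...</title>'
)

title = ET.fromstring(DOC)

# Attribute keys are universal names of the form "{namespace URI}local name".
# The xmlns:* declarations themselves are not reported as attributes.
attributes = dict(title.attrib)

for name, value in sorted(attributes.items()):
    print(name, "=", value)
```

The two color attributes no longer collide, because the parser keys them by namespace URI rather than by prefix.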
Writing out all these xmlns attributes is rather cumbersome, so the XML Namespaces Recommendation allows them to be inherited: if a prefix foo is used in a tag but the element has no xmlns:foo attribute, the value of its parent element's xmlns:foo attribute is used; if the parent has no xmlns:foo attribute either, the value of the grandparent element's xmlns:foo attribute is used, and so on. For example, in
<books:book xmlns:books="http://www.books.com/xml">
  <books:part>...</books:part>
  <books:part>...</books:part>
</books:book>
the books: prefix is declared only once, at the top level, and is then used several times in child elements without further declaration.
In many cases, most of the elements in a document have universal element type names that have the same URI. The XML Namespaces Recommendation has a special syntax to make this more convenient. An attribute xmlns specifies a URI that qualifies all unprefixed element type names. The xmlns attribute is inherited just like the xmlns: prefixed attributes. So, the previous example can be rewritten as just:
<book xmlns="http://www.books.com/xml">
  <part>...</part>
  <part>...</part>
</book>
It is worth noting that the xmlns attribute does not affect unprefixed attribute names.
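The equivalence of the prefixed and the default-namespace forms can be checked with a short sketch, again using Python's standard xml.etree.ElementTree as an illustrative assumption. The edition attribute is hypothetical and is added only to show that an unprefixed attribute stays unqualified under a default namespace.

```python
import xml.etree.ElementTree as ET

PREFIXED = (
    '<books:book xmlns:books="http://www.books.com/xml">'
    '<books:part/><books:part/></books:book>'
)
# "edition" is a hypothetical attribute used only for this illustration.
DEFAULTED = (
    '<book xmlns="http://www.books.com/xml" edition="2">'
    '<part/><part/></book>'
)


def universal_names(xml_text):
    """Return the universal names of all elements in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


prefixed_names = universal_names(PREFIXED)
defaulted_names = universal_names(DEFAULTED)

# The default namespace applies to element names only, so the
# unprefixed attribute keeps its plain, unqualified name.
attribute_names = list(ET.fromstring(DEFAULTED).attrib)
```

Both documents yield the same list of universal element names, while the unprefixed attribute is reported under its plain name.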
3 Namespaces in SXML
As SXML itself was introduced in our previous article, this section discusses the design and implementation of namespaces in SXML.
XML has to qualify names with URIs in an indirect way (based on the idea of a prefix), since a URI can generally contain characters that are not allowed in a correct XML name. However, every character that can occur in a URI is also permitted within an SXML name, since an SXML name is a Scheme symbol. This leads to a very important feature of SXML: unlike XML, SXML is able to qualify names with URIs in a direct way, using URIs (instead of prefixes) as local name qualifiers in universal names. For example:
(http://www.books.com/xml:book
  (http://www.books.com/xml:part ...)
  (http://www.books.com/xml:part ...))
The rightmost colon in an SXML name separates the local name from the namespace URI.
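The rule can be sketched in a few lines of Python. The function name split_sxml_name is a hypothetical helper introduced here for illustration; SXML names are modelled as plain strings rather than Scheme symbols.

```python
def split_sxml_name(name):
    """Split an SXML name at its rightmost colon.

    Returns a (namespace, local_name) pair; namespace is None when the
    name contains no colon, i.e. when it is not qualified.
    """
    namespace, colon, local_name = name.rpartition(":")
    if not colon:
        return None, name
    return namespace, local_name


print(split_sxml_name("http://www.books.com/xml:part"))
```

Splitting at the rightmost colon matters because the namespace URI itself normally contains colons, as in the "http:" scheme part.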
Although such long SXML element names look cumbersome when written out, they are memory-efficient as a data structure, since they are Scheme symbols. No matter how long the name of a symbol is, the name is stored just once, in a symbol table. All other occurrences of the symbol are just references to the corresponding slot in the symbol table.
Such a representation also agrees with the Namespaces Recommendation, which says: "Note that the prefix functions only as a placeholder for a namespace name. Applications should use the namespace name, not the prefix, in constructing names whose scope extends beyond the containing document."
In SXML we can use either placeholders or namespaces themselves.
Besides the direct way to qualify names with URIs, SXML supports the concept of namespace-ids, which are quite similar to XML namespace prefixes. Like a prefix, a namespace-id stands for a namespace URI. The distinctive feature of a namespace-id is that there is a 1-to-1 correspondence between namespace-ids and the corresponding namespace URIs. This is generally not true for XML namespace prefixes and namespace URIs. For example, different XML prefixes may specify the same namespace URI, and XML namespace prefixes may be redefined in child elements.
A namespace-id is thus a shortcut for a namespace URI in SXML names. The association between namespace-ids and namespace URIs is defined in the administrative node *NAMESPACES*, which is located before the document element. An example of an RDF description represented as a namespace-intensive SXML document is shown in Table 1.
(*TOP*
  (@@ (*NAMESPACES*
        (rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
        (dc "http://purl.org/dc/elements/1.1/")))
  (*PI* xml "version=\"1.0\"")
  (rdf:RDF
    (rdf:Description
      (dc:creator "Karl Mustermann")
      (dc:title "Algebra")
      (dc:subject "mathematics")
      (dc:date "2000-01-23")
      (dc:language "EN")
      (dc:description "An introduction to algebra"))))
Table 1: RDF description represented as SXML document with namespaces
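The relationship between namespace-ids and directly qualified names can be illustrated with a small Python sketch. It is not the SXML implementation: the SXML tree is modelled as nested Python lists whose first item is the node name, and the helpers read_namespaces and expand are hypothetical names introduced here. The sketch also assumes that each entry of *NAMESPACES* is a simple (namespace-id URI) pair, as in Table 1.

```python
# A shortened version of Table 1, modelled as nested lists:
# the first item of each list is the node name, other strings are text.
DOC = [
    "*TOP*",
    ["@@", ["*NAMESPACES*",
            ["rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"],
            ["dc", "http://purl.org/dc/elements/1.1/"]]],
    ["rdf:RDF",
     ["rdf:Description",
      ["dc:creator", "Karl Mustermann"],
      ["dc:title", "Algebra"]]],
]


def read_namespaces(top):
    """Collect the namespace-id -> URI map from the (@@ (*NAMESPACES* ...)) node."""
    for child in top[1:]:
        if isinstance(child, list) and child[0] == "@@":
            for aux in child[1:]:
                if isinstance(aux, list) and aux[0] == "*NAMESPACES*":
                    return {entry[0]: entry[1] for entry in aux[1:]}
    return {}


def expand(node, namespaces):
    """Replace namespace-ids in node names with the URIs they stand for."""
    if not isinstance(node, list):
        return node  # character data is left untouched
    name = node[0]
    if name == "@@":
        return node  # administrative nodes are kept as they are
    ns_id, colon, local_name = name.rpartition(":")
    if colon and ns_id in namespaces:
        name = namespaces[ns_id] + ":" + local_name
    return [name] + [expand(child, namespaces) for child in node[1:]]


expanded = expand(DOC, read_namespaces(DOC))
```

Because the correspondence between namespace-ids and URIs is 1-to-1, the expansion is unambiguous and could be reversed by inverting the same map.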
Table 2 illustrates the relationship between XML prefixes, SXML with directly qualified names, and SXML with namespace-ids. The sample document contains a resource description expressed in the Resource Description Framework, which is discussed in detail in section 5. The Resource Description Framework has its own namespace, and the rdf: prefix is typically used for it. Suppose that another namespace URI, "http://www.resources-of-different-family.com", occurs in this document. As "rdf" is a natural abbreviation for this URI, it is possible that rdf: will be used as a prefix for this URI as well. In XML, this situation causes confusion, since the same prefix is used for two different URIs and prefix re-declarations are required.
In SXML with directly qualified names, the document reads clearly, although at the price of long names.
SXML with namespace-ids requires two different namespace-ids, because of the 1-to-1 relationship between namespace-ids and URIs.
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar">
    <rdf:editor xmlns:rdf="http://www.resources-of-different-family.com">
      <rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
        <rdf:fullName xmlns:rdf="http://www.resources-of-different-family.com">Dave Beckett</rdf:fullName>
      </rdf:Description>
    </rdf:editor>
  </rdf:Description>
</rdf:RDF>

(*TOP*
  (*PI* xml "version=\"1.0\"")
  (http://www.w3.org/1999/02/22-rdf-syntax-ns#:RDF
    (http://www.w3.org/1999/02/22-rdf-syntax-ns#:Description
      (@ (http://www.w3.org/1999/02/22-rdf-syntax-ns#:about "http://www.w3.org/TR/rdf-syntax-grammar"))
      (http://www.resources-of-different-family.com:editor
        (http://www.w3.org/1999/02/22-rdf-syntax-ns#:Description
          (http://www.resources-of-different-family.com:fullName "Dave Beckett"))))))

(*TOP*
  (@@ (*NAMESPACES*
        (rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
        (rodf "http://www.resources-of-different-family.com")))
  (*PI* xml "version=\"1.0\"")
  (rdf:RDF
    (rdf:Description
      (@ (rdf:about "http://www.w3.org/TR/rdf-syntax-grammar"))
      (rodf:editor
        (rdf:Description
          (rodf:fullName "Dave Beckett"))))))
Table 2: A sample XML document containing prefix re-declarations (top), and its SXML representation using directly qualified names (middle) and using namespace-ids (bottom).
In a generic XML document, one namespace prefix may be associated with different URIs. However, XML documents without prefix re-declarations (that is, documents in which the correspondence between namespace prefixes and URIs is 1-to-1) constitute a significant subset of "practical" XML documents. Since it is good style to make all namespace prefix declarations in the document element, XML's concept of prefixes is in practice very similar to SXML's concept of namespace-ids.
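Whether a given XML document falls into this "practical" subset can be tested mechanically. The following sketch is an illustration only: it uses the "start-ns" events of Python's standard xml.etree.ElementTree.iterparse, and the function name redeclared_prefixes and the two abbreviated sample documents are assumptions introduced here. It checks one direction of the correspondence, namely that no prefix is bound to more than one URI.

```python
import io
import xml.etree.ElementTree as ET

# Abbreviated variant of the document in Table 2 (prefix re-declared).
WITH_REDECLARATION = (
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:editor xmlns:rdf="http://www.resources-of-different-family.com"/>'
    b'</rdf:RDF>'
)
# All prefixes declared once, in the document element.
WITHOUT_REDECLARATION = (
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    b'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b'<rdf:Description><dc:title>Algebra</dc:title></rdf:Description>'
    b'</rdf:RDF>'
)


def redeclared_prefixes(xml_bytes):
    """Return the set of prefixes bound to more than one URI in the document."""
    bindings = {}
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
    for _event, (prefix, uri) in events:
        bindings.setdefault(prefix, set()).add(uri)
    return {prefix for prefix, uris in bindings.items() if len(uris) > 1}


print(redeclared_prefixes(WITH_REDECLARATION))
print(redeclared_prefixes(WITHOUT_REDECLARATION))
```

A document for which the function returns an empty set can have its prefixes reused as namespace-ids, provided that no two prefixes share a URI.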
4 Dublin Core
Finding relevant information on the World Wide Web has become increasingly problematic due to the explosive growth of networked resources. Current Web indexing evolved rapidly to fill the demand for resource discovery tools, but that indexing, while useful, is a poor substitute for richer varieties of resource description.
An invitational workshop held in March of 1995 brought together librarians, digital library researchers, and text-markup specialists to address the problem of resource discovery for networked resources. This activity evolved into a series of related workshops and ancillary activities that have become known collectively as the Dublin Core Metadata Workshop Series.
One of the primary deliverables of this effort is a set of elements that are judged by the collective participants of these workshops to be the core elements for cross-disciplinary resource discovery. The term "Dublin Core" applies to this core of descriptive elements.
The metadata elements fall into three groups which roughly indicate the class or scope of information stored in them:
- elements related mainly to the Content of the resource,
- elements related mainly to the resource when viewed as Intellectual Property, and
- elements related mainly to the Instantiation of the resource.
Content       Intellectual Property   Instantiation
Title         Creator                 Date
Subject       Publisher               Format
Description   Contributor             Identifier
Type          Rights                  Language
Source
Relation
Coverage
Table 3: Dublin Core metadata elements
Each element is optional and repeatable. Metadata elements may appear in any order. The ordering of multiple occurrences of the same element (e.g., Creator) may have a significance intended by the provider, but ordering is not guaranteed to be preserved in every environment. For instance, RDF/XML supports ordering, but HTML does not.
We give a short description of the Dublin Core elements that are of particular interest in the context of electronic libraries:
- Title: The name given to the resource.
- Author or Creator: The person or organization primarily responsible for creating the intellectual content of the resource, for example, authors in the case of written documents.
- Subject and Keywords: The topic of the resource. Typically, subject is expressed as keywords or phrases that describe the subject or content of the resource.
- Description: A textual description of the content of the resource, for example, abstracts in the case of document-like objects.
- Resource Type: The category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary.
- Resource Identifier: A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs. Other globally unique identifiers, such as International Standard Book Numbers (ISBN) or other formal names, are also candidates for this element.
Metadata elements are typically small relative to the resource they describe and may, if the resource format permits, be embedded in it. Two such formats are the Hypertext Markup Language (HTML) and the Extensible Markup Language (XML). HTML is currently in wide use, but once standardized, XML in conjunction with the Resource Description Framework promises a significantly more expressive means of encoding metadata.
In section 5 of this paper, we will consider the encoding of Dublin Core in XML/RDF.
5 Dublin Core in RDF
The Resource Description Framework (RDF), developed under the auspices of the World Wide Web Consortium (W3C), is an infrastructure that enables the encoding, exchange, and reuse of structured metadata. RDF uses XML (eXtensible Markup Language) as a common syntax for the exchange and processing of metadata.
RDF is the result of a number of metadata communities bringing together their needs to provide a robust and flexible architecture for supporting metadata on the web. RDF is a collaborative design effort. Several W3C Member companies are contributing intellectual resources. It is drawing upon the XML design as well as proposals submitted by Microsoft's XML-Data and Netscape. Other metadata efforts, including the Dublin Core and the Warwick Framework, have also influenced the design of RDF.
RDF provides a model for describing resources. RDF defines a resource as any object that is uniquely identifiable by a Uniform Resource Identifier (URI). Resources have properties (attributes or characteristics). A collection of properties that refers to the same resource is called a description. At the core of RDF is a syntax-independent model for representing resources and their corresponding descriptions.
The underlying structure of any expression in RDF can be viewed as a directed labeled graph, which consists of nodes and labeled directed arcs that link pairs of nodes. The RDF graph is a set of triples: Subject Node, Property Arc and Object Node. The triple is shown in Figure 1.
Each property arc represents a statement of a relationship between the things denoted by the nodes that it links, having three parts:
- a property that describes some relationship (also called a predicate),
- a value that is the subject of the statement, and
- a value that is the object of the statement.
The direction of the arc is significant: it always points toward the object of a statement.
For example, the data model corresponding to the statement "the author of Document 1 is John Smith" has a subject Document 1, an arc called author, and a corresponding object John Smith. The data model corresponding to this statement is shown graphically in Figure 2.
The meaning of an RDF graph is the conjunction (i.e. logical AND) of all the statements that it contains.
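The triple model can be sketched in Python as a set of (subject, predicate, object) tuples. This is only an illustration of the data model under simplifying assumptions: nodes and arcs are plain strings instead of URIs, and the names Triple and objects_of are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; the arc points from the subject to the object."""
    subject: str
    predicate: str
    obj: str


# The graph for "the author of Document 1 is John Smith".
# A graph is a set of statements, all of which are asserted together.
graph = {Triple("Document 1", "author", "John Smith")}


def objects_of(graph, subject, predicate):
    """Return all objects reached from subject by arcs labeled predicate."""
    return {
        triple.obj
        for triple in graph
        if triple.subject == subject and triple.predicate == predicate
    }


print(objects_of(graph, "Document 1", "author"))
```

Since the graph is a set, stating the same triple twice adds nothing, which matches reading the graph as a logical conjunction of its statements.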
In order to encode the graph in XML, the nodes and arcs are represented by XML element names, attribute names, element content and attribute content.
A graph can be considered a collection of paths of the form Node, Arc, Node, Arc, Node, Arc, ... Node which cover the entire graph. In RDF/XML these turn into sequences of elements inside elements which alternate between elements for Nodes and Arcs. This has been called a series of Node/Arc stripes. The Node at the start of the sequence turns into the outermost element, the next arc turns into a child element, and so on. The stripes generally start at the top of an RDF/XML document and always begin with nodes.
To create a complete RDF/XML document, the serialization of the graph into XML must be contained inside an rdf:RDF XML element, which becomes the top-level XML document element. The rdf: prefix is bound to the RDF namespace, "http://www.w3.org/1999/02/22-rdf-syntax-ns#", so that applications can recognize that this is an RDF/XML document.
RDF provides the ability for resource description communities to define semantics. It is important, however, to disambiguate these semantics among communities. The property-type "author", for example, may have a broader or narrower meaning depending on different community needs. As such, it is problematic if multiple communities use the same property-type to mean very different things. To prevent this, RDF uniquely identifies property-types by using the XML namespace mechanism. XML namespaces provide a method for unambiguously identifying the semantics and conventions governing the particular use of property-types by uniquely identifying the governing authority of the vocabulary. For example, the property-type "author" is defined by the Dublin Core Initiative as the "person or organization responsible for the creation of the intellectual content of the resource" and is specified by the Dublin Core CREATOR element. An XML namespace is used to unambiguously identify the Schema for the Dublin Core vocabulary by pointing to the definitive Dublin Core resource that defines the corresponding semantics.
The fifteen basic elements of the Dublin Core Element Set may be considered to comprise a single namespace, which is available for reference on the World Wide Web at "http://purl.org/dc/elements/1.1/".
Each Dublin Core element is represented in RDF as the XML element
- whose namespace URI is "http://purl.org/dc/elements/1.1/", and
- whose local name is equal to the name of the Dublin Core element it represents.
The rdf:RDF XML element is the top-level XML document element in the RDF/XML document, and each resource described is enclosed in an rdf:Description container element. Table 4 gives an example of a Dublin Core description expressed in RDF/XML.
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description>
    <dc:creator>Karl Mustermann</dc:creator>
    <dc:title>Algebra</dc:title>
    <dc:subject>mathematics</dc:subject>
    <dc:date>2000-01-23</dc:date>
    <dc:language>EN</dc:language>
    <dc:description>An introduction to algebra</dc:description>
  </rdf:Description>
</rdf:RDF>
Table 4: Dublin Core description expressed in RDF/XML
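An application can pick out the Dublin Core elements of such a description by their namespace URI alone, independently of the prefixes used. The sketch below assumes Python's standard xml.etree.ElementTree as the parser; the helper name dublin_core is hypothetical. Since Dublin Core elements are repeatable, the result is kept as a list of pairs rather than a dictionary.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

TABLE_4 = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description>
    <dc:creator>Karl Mustermann</dc:creator>
    <dc:title>Algebra</dc:title>
    <dc:subject>mathematics</dc:subject>
    <dc:date>2000-01-23</dc:date>
    <dc:language>EN</dc:language>
    <dc:description>An introduction to algebra</dc:description>
  </rdf:Description>
</rdf:RDF>
"""


def dublin_core(description):
    """Return (element name, value) pairs for the Dublin Core children."""
    qualifier = "{" + DC_NS + "}"
    return [
        (child.tag[len(qualifier):], child.text)
        for child in description
        if child.tag.startswith(qualifier)
    ]


root = ET.fromstring(TABLE_4)
description = root.find("{" + RDF_NS + "}Description")
metadata = dublin_core(description)

for name, value in metadata:
    print(name + ":", value)
```

The same code would work unchanged if the document bound the Dublin Core namespace to a prefix other than dc, since only the URI is compared.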
6 Conclusion
XML Namespaces extends XML with universal names whose scope extends beyond their containing document. This feature is vital for documents that combine multiple markup vocabularies, since it resolves problems of recognition and collision, and it plays a key role in a score of XML technologies and applications.
Namespace support in SXML may be considered a superset of that in XML. Along with support for namespace prefixes, SXML provides the ability to use URIs directly in universal names, and a mechanism of namespace identifiers (namespace-ids).
References
- Extensible Markup Language (XML) 1.0 (Second Edition). W3C Recommendation 6 October 2000. http://www.w3.org/TR/REC-xml
- Namespaces in XML. World Wide Web Consortium 14-January-1999. http://www.w3.org/TR/REC-xml-names/
- James Clark. XML Namespaces. February 4, 1999. http://www.jclark.com/xml/xmlns.htm
- Kirill Lisovskiy, Dmitry Lizorkin. SXML: an XML document as an S-expression. Russian Digital Libraries Journal, 2003, Vol. 6, No 2. http://www.elbib.ru/journal/2003/200302/LK/LK.en.html
- Oleg Kiselyov. SXML, Revision 2.5. August 9, 2002. http://okmij.org/ftp/Scheme/SXML.html
- Request for Comments: 2413. Dublin Core Metadata for Resource Discovery. Network Working Group. September 1998. http://rfc-2413.rfclist.org/rfc-2413.htm
- Request for Comments: 2731. Encoding Dublin Core Metadata in HTML. Network Working Group. December 1999. http://www.ietf.org/rfc/rfc2731.txt
- Eric Miller. An Introduction to the Resource Description Framework. D-Lib Magazine. May 1998. http://www.dlib.org/dlib/may98/miller/05miller.html
- Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Working Draft 23 January 2003. http://www.w3.org/TR/rdf-concepts/
- RDF/XML Syntax Specification (Revised). W3C Working Draft 23 January 2003. http://www.w3.org/TR/rdf-syntax-grammar
- Stefan Kokkelink, Roland Schwanzl. Expressing Qualified Dublin Core in RDF / XML. DCMI Proposed Recommendation. 2002-04-14. http://dublincore.org/documents/2002/04/14/dcq-rdf-xml/
Dmitry Lizorkin is a Ph.D. student at Moscow State University. His M.Sc. thesis, defended in 2002, was dedicated to the implementation of the XML Linking Language (XLink) using functional methods.
e-mail: email@example.com
Kirill Lisovskiy, PhD, is an IT consultant and Senior Researcher at the Institute for System Programming of the Russian Academy of Sciences. His primary research interest is functional and logic techniques for semistructured data management. Since 1999 he has participated in a number of research and development projects related to the implementation and application of XML data management techniques based on the Scheme programming language.
e-mail: firstname.lastname@example.org
http://pair.com/lisovsky
© K. Lisovsky, D. Lizorkin, 2003