XSLT and XLink and their implementation with functional techniques
Kirill Lisovsky, Dmitry Lizorkin
Institute for System Programming RAS, Moscow State University
XSLT is a language for transforming XML documents into other XML documents. XLink is a language for describing links between resources using XML attributes and namespaces. This paper gives an overview of XSLT and XLink and discusses their application for Digital Libraries.
Powerful XSLT and XLink processors can be implemented with functional methods. This paper introduces the XSLT processor STX and the XLink processor SXLink, both implemented in the functional programming language Scheme. These processors are based on the SXML data model -- the XML Information Set represented in the form of S-expressions. STX and SXLink can be applied as effective transformation and linking tools for XML Digital Libraries.
XSLT is a declarative language for transforming XML documents into another format, often for presentation. Output formats are generally XML or HTML.
XML technologies are now used in virtually every area of information technology. Digital Libraries (DL) are no exception, and XML is generally considered a key technology for modern Digital Libraries.
XML is a convenient format for the internal representation of data, but it is not intended for visual presentation. To present information to a Digital Library user, XML data has to be transformed into a different form, e.g. into an HTML document that can be viewed in a web browser. XSLT, developed by the W3 Consortium, is a declarative language for describing such transformations. XSL transformations are described by so-called stylesheets, which are themselves represented in XML.
Modern Digital Libraries store different kinds of information: text, pictures, sound, video, etc. In the context of Digital Library applications we often have to deal with vast resources of great variety, multiple representation formats, rendering techniques and so on. This makes it difficult (or even impossible) to represent such data as a single information unit (e.g. as a single file) in a reasonable way. Typically, DL data has to be represented as a set of multiple resources. Since these resources are semantically related, they should not exist as entirely independent ones, and it is often desirable to specify their relationships using links. The XML Linking Language (XLink), developed by the W3 Consortium, can be used effectively for this purpose: it is a language for describing links between resources using XML attributes from a separate namespace.
Besides associating resources into a group, XLink allows specifying the role played by each resource participating in a link. Additionally, if traversals are intended between different resources in the Digital Library (e.g. the ability to view a picture by activating some spot in the text), XLink links can also specify the semantics of these traversals.
Sections 2 and 3 give an overview of XSLT and XLink respectively. Section 4 discusses the suitability of functional techniques for XSLT implementation and introduces STX -- an XSLT implementation in the functional programming language Scheme. Section 5 considers SXLink, an XLink implementation in Scheme. Section 6 discusses the integration of XSLT and XLink by functional methods.
2 XSLT overview
XSLT is one of the most powerful and actively used technologies of the XML family. It has earned wide recognition and is being actively used in industry, usually as a language for transforming XML documents into another format, often for presentation.
While XSLT is a Turing-complete programming language, it was never intended as a general-purpose XML transformation language, let alone an XML application development language. As clearly stated in the XSLT Requirements, turning XSLT into a general-purpose programming language is explicitly not a goal.
A transformation expressed in XSLT (a stylesheet) describes a set of rules for transforming a source tree into a result tree. The result tree is separate from the source tree. The structure of the result tree can be completely different from the structure of the source tree. In the process of constructing the result tree, elements from the source tree can be filtered and reordered, and an arbitrary structure can be imposed.
A stylesheet contains a set of template rules. A template rule consists of two parts: a pattern which is matched against nodes in a source tree and a transformation template which can be instantiated to form a part of a new result tree. A transformation is performed by pattern matching and invocation of associated templates:
- A pattern is an XPath expression and is matched against elements in a source tree. A node matches the pattern if the node is a member of the nodeset that results from evaluating the pattern as an XPath expression in the context of the node being matched or one of its ancestors. For any given node in the source tree, a template associated with a successfully matched pattern is instantiated.
- A template is instantiated for this particular node to create part of a result tree. A template can contain arbitrary XML data which will be used as fragments of a result tree, as well as XML elements from the XSLT namespace that are instructions for creating result tree fragments. When a template is instantiated, each instruction is executed and replaced by the result tree fragment that it evaluates to. The result tree is constructed by instantiating the template matched for the root node.
For a simple transformation a stylesheet can often consist of only a single template, which is used as a template for the complete result tree. This approach is especially popular for a transformation of data-centric XML documents.
Figure 1 illustrates a sample XSLT stylesheet. This stylesheet transforms a digital document represented as XML data into a presentational form in XHTML. The stylesheet is denoted by the document element xsl:stylesheet from the XSLT namespace "http://www.w3.org/1999/XSL/Transform". A stylesheet usually consists of several XSLT templates, each specified by its own xsl:template element. For the sake of brevity, Figure 1 shows just a single template. A template generally has a pattern, specified as an XPath expression in the value of the template's match attribute, which is used for pattern matching against XML nodes in the source tree. The pattern in Figure 1 matches all title elements which have doc parents. For every element successfully matched, the template is instantiated (with respect to this element) to create a part of the result tree, in accordance with the content of xsl:template. In the example in Figure 1, the transformation is recursively applied to the children of the element that matched the pattern (the XSLT instruction xsl:apply-templates does that job) and the result is enclosed in a level 1 heading. Supposing that the title element with the doc parent contains the title of the document, this template will create a level 1 heading containing this title.
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="doc/title">
    <h1>
      <xsl:apply-templates/>
    </h1>
  </xsl:template>
  <!-- Some more templates -->
</xsl:stylesheet>
Figure 1: A sample stylesheet for transformation of an electronic document into XHTML. The template shown transforms a document title into a level 1 heading.
3 XLink overview
XML Linking Language (XLink) is a language for describing links between resources using XML syntax.
XLink provides all the functionality of HTML hyperlinks and much more: it allows asserting linking relationships among more than two resources, associating metadata with a link, and linking resources without modifying them.
XLink can associate arbitrary resources, not just XML-encoded ones. A resource is defined by IETF RFC 2396 as any addressable unit of information or service. If a resource is a well-formed XML document, a portion of it specified by a fragment identifier in the XML Pointer Language (XPointer) is also treated as a resource. Such an XPointer fragment identifier can additionally be attached to the URI that addresses the XML document.
XLink offers two kinds of links:
Extended link. Extended links offer full XLink functionality, such as inbound and third-party arcs, and also can associate an arbitrary number of participating resources. The participating resources may be any combination of local and remote.
In terms of XLink, a local resource is an XML element that participates in a link by virtue of having a linking element as its parent. A resource that participates in a link by virtue of being addressed with a URI reference is considered a remote resource, even if it is in the same XML document as the link, or even inside the same linking element.
As a result, extended links can have a fairly complex structure, including elements for pointing to remote resources, elements for containing local resources, elements for specifying arc traversal rules, and elements for specifying human-readable resource and arc titles.
Typically, extended linking elements are stored separately from the resources they associate (for example, in entirely different documents). Thus, extended links are important for situations where the participating resources are read-only, or where it is expensive to modify and update them but inexpensive to modify and update a separate linking element, or where the resources are in formats with no native support for embedded links (such as many multimedia formats).
XLink defines a way to give an extended link special semantics for finding linkbases. Used in this fashion, an extended link helps an XLink application process other links.
Figure 2 gives a use case for an XLink extended link. In this example, we link a book in the Digital Library, the book's author and a publisher. We give the description of the author as an XLink local resource, with all the necessary markup residing within the linking element. The book and the publisher are referred to by their URIs, so they are XLink remote resources. Each of these three resources is marked with its own label. These labels are used to specify traversals. We define two traversals: from the author to the book and from the book to the publisher. Each traversal is specified by its own XLink arc element. Note that the arc from the book to the publisher connects two remote resources.
<!--XLink extended link. The element can have an arbitrary name-->
<MyExtendedLink xmlns:xlink="http://www.w3.org/1999/xlink"
                xlink:type="extended">
  <!--XLink resource element. Specifies the local resource in terms of XLink-->
  <author xlink:type="resource" xlink:label="A">
    <!--Some markup related to the local resource-->
    <name>John</name>
    <surname>Smith</surname>
  </author>
  <!--XLink locator element. Specifies the remote resource by addressing its URI-->
  <book xlink:type="locator" xlink:label="B"
        xlink:href="http://library.com/book.xml"/>
  <publisher xlink:type="locator" xlink:label="P"
             xlink:href="http://publisher.com"/>
  <!--XLink arc element. Specifies traverses between resources labeled "A" and "B"-->
  <MyArcElement xlink:type="arc" xlink:from="A" xlink:to="B"/>
  <MyArcElement xlink:type="arc" xlink:from="B" xlink:to="P"/>
</MyExtendedLink>
Figure 2: XLink extended link example. Establishes the relationship between the book (a remote resource), its author (a local resource) and a publisher (a remote resource).
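The structure of Figure 2 can be read mechanically. The following is a minimal sketch in Python (chosen only for illustration; SXLink itself is written in Scheme and its API is not shown here). The function name read_extended_link and the tuple layout are assumptions of this sketch. It collects the labeled resources of an extended link, classifies them as local or remote, and expands each arc into the traversals it defines.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

def xl(name):
    """Qualified attribute name in the XLink namespace (ElementTree notation)."""
    return "{%s}%s" % (XLINK_NS, name)

def read_extended_link(link):
    """Hypothetical helper: return (resources by label, arcs) for one extended link."""
    resources = {}  # label -> list of ("local", tag) or ("remote", href)
    arcs = []       # list of (from-label, to-label)
    for child in link:
        kind = child.get(xl("type"))
        label = child.get(xl("label"))
        if kind == "resource":
            resources.setdefault(label, []).append(("local", child.tag))
        elif kind == "locator":
            resources.setdefault(label, []).append(("remote", child.get(xl("href"))))
        elif kind == "arc":
            arcs.append((child.get(xl("from")), child.get(xl("to"))))
    return resources, arcs

# The markup of Figure 2, with comments omitted.
figure2 = """<MyExtendedLink xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
  <author xlink:type="resource" xlink:label="A"><name>John</name><surname>Smith</surname></author>
  <book xlink:type="locator" xlink:label="B" xlink:href="http://library.com/book.xml"/>
  <publisher xlink:type="locator" xlink:label="P" xlink:href="http://publisher.com"/>
  <MyArcElement xlink:type="arc" xlink:from="A" xlink:to="B"/>
  <MyArcElement xlink:type="arc" xlink:from="B" xlink:to="P"/>
</MyExtendedLink>"""

resources, arcs = read_extended_link(ET.fromstring(figure2))

# Each arc connects every resource carrying the "from" label
# to every resource carrying the "to" label.
traversals = [(start, end)
              for source, target in arcs
              for start in resources.get(source, [])
              for end in resources.get(target, [])]
```

Several resources may share one label, which is why each label maps to a list. The sketch does not handle arcs that omit the from or to attribute.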
Simple link. A simple link is a link that associates exactly two resources, one local and one remote, with an arc going from the former to the latter. Thus, a simple link is always an outbound link. An outbound link with exactly two participating resources is the most commonly used kind of link (for instance, HTML-style A and IMG links fall into this category). Because simple links offer less functionality than extended links, they have no special internal structure.
While simple links are conceptually a subset of extended links, they are syntactically different. For example, to convert a simple link into an extended link, several structural changes would be needed.
The purpose of a simple link is to be a convenient shorthand for the equivalent extended link. A single simple linking element combines the basic functions of an extended-type element, a locator-type element, an arc-type element, and a resource-type element. In cases where only a subset of these elements' features is required, the XLink simple linking element is available as an alternative to the extended linking element.
A simple link is illustrated in Figure 3. It is an HTML-like A hyperlink leading to the book's table of contents. However, unlike an HTML A hyperlink, an XLink simple link doesn't require a named anchor to address a fragment of the document, since an XPointer fragment identifier provides more flexible addressing.
<!--XLink simple link. The element can have an arbitrary name-->
<MySimpleLink xmlns:xlink="http://www.w3.org/1999/xlink"
              xlink:type="simple"
              xlink:href="http://library.com/book.xml#xpointer(doc/contents)">
  Go to table of contents
  <!--Some more complicated markup can be described here-->
</MySimpleLink>
Figure 3: XLink simple link example.
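To make the addressing in Figure 3 concrete, here is a small Python sketch (an illustration only, not part of SXLink) that separates the href of a simple link into the URI of the remote document and the expression carried by an xpointer() fragment identifier. The helper name read_simple_link is hypothetical, and only the single xpointer(...) form used in the figure is recognized.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

XLINK_NS = "http://www.w3.org/1999/xlink"

def read_simple_link(element):
    """Hypothetical helper: return (document URI, XPointer expression or None)."""
    href = element.get("{%s}href" % XLINK_NS)
    uri, fragment = urldefrag(href)
    expression = None
    # Only the xpointer(...) scheme used in Figure 3 is recognized here.
    if fragment.startswith("xpointer(") and fragment.endswith(")"):
        expression = fragment[len("xpointer("):-1]
    return uri, expression

figure3 = ('<MySimpleLink xmlns:xlink="http://www.w3.org/1999/xlink" '
           'xlink:type="simple" '
           'xlink:href="http://library.com/book.xml#xpointer(doc/contents)">'
           'Go to table of contents</MySimpleLink>')

target = read_simple_link(ET.fromstring(figure3))
```

The linking element itself plays the part of the local resource, so the pair returned here fully describes the remote end of the single outbound arc.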
It is worth noting that the XLink specification is mostly focused on linking data structures, not on a link behavior model. Any kind of sophisticated link processing and rendering is the responsibility of higher-level applications layered on top of XLink.
4 STX: XSLT implementation in Scheme
In this paper we consider STX (Scheme-enabled Transformation of XML data), an XML transformation tool based on XSLT and Scheme which combines a processor for the most common XSLT stylesheets with a framework for their extension in Scheme. STX provides an environment for general-purpose transformation of XML data. It integrates two functional languages, Scheme and an XSLT-like transformation language, on the basis of a common data model, SXML.
STX is based on SXSLT, and may be considered its specialization toward compatibility with XSLT. In particular, it employs the pre-order traversal algorithm of SXSLT, specialized and optimized for XSLT-like transformation of XML/SXML data.
4.1 Motivation for XSLT implementation in Scheme
XSLT is designed primarily for the kinds of transformations that are required when XSLT is used as part of the XSL stylesheet language. For more sophisticated processing of XML data, XSLT is usually used in conjunction with a general-purpose programming language, such as Java or Python, which leads to the well-known problem of impedance mismatch due to the differences in programming paradigms and data models.
XSLT may be considered a functional language, and the XML data model is very close to Scheme's S-expressions. Scheme is widely used as an extension language and as an XML-processing language, which makes it a natural candidate for an XSLT extension language.
XSLT is a Turing-complete programming language, and many impressive examples prove its applicability to a number of different tasks. Still, the majority of "real life" applications use a small subset of XSLT for relatively simple transformations of XML documents. Some XSLT tutorials even claim that just a few XSLT commands are sufficient for most practical needs.
STX (Scheme-enabled Transformation of XML data) is intended for extending typical simple XSLT stylesheets with program code in Scheme. In such a system, XSLT-compatible templates describe the presentation, while more sophisticated data transformation, data analysis, or even business logic is expressed in Scheme.
Such an approach makes it possible to reuse customers' XSLT skills and protects the investment in presentational XSLT stylesheets. It also provides advanced data processing and application capabilities, which make it possible to implement a complete business solution inside the proposed framework.
4.2 STX architecture
XSLT uses a slightly modified version of the XPath data model, which may be considered a labeled-node tree representation of the XML Information Set. STX is based on the SXML data model, since SXML is an implementation of the XML Information Set in the form of S-expressions.
SXML was designed primarily with the goal of effective evaluation of XPath queries. SXML has a formal specification and is accompanied by a stack of major XML-related technologies implemented in Scheme. Two of these technologies, namely the XML parser SSAX and the SXML query language SXPath, constitute the core of STX. Both the SXML and XPath data models are based on the XML Information Set represented as a tree-like structure. The similarities between these data models are obvious, and their mutual mapping is straightforward, which simplifies their integration significantly. For these reasons, STX is based on the SXML data model.
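The closeness of the two data models can be illustrated with a short Python sketch that renders a parsed XML element as nested lists shaped like SXML, where an element is a list headed by its name and attributes are grouped in a sublist headed by "@". This is only an approximation written for this illustration: the function to_sxml is hypothetical, whitespace-only text is dropped, and namespaces, comments and processing instructions are ignored.

```python
import xml.etree.ElementTree as ET

def to_sxml(element):
    """Hypothetical helper: nested-list rendering of an element in SXML shape."""
    node = [element.tag]
    if element.attrib:
        # Attributes form a sublist headed by "@", as in (@ (name "value") ...).
        node.append(["@"] + [[name, value] for name, value in element.attrib.items()])
    if element.text and element.text.strip():
        node.append(element.text.strip())
    for child in element:
        node.append(to_sxml(child))
        if child.tail and child.tail.strip():
            node.append(child.tail.strip())
    return node

sample = ET.fromstring('<book id="b1"><title>Sample</title></book>')
sxml_like = to_sxml(sample)
```

In Scheme notation the same tree would be written as (book (@ (id "b1")) (title "Sample")), which is why tree-processing code over S-expressions applies to XML data directly.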
Pattern matching in STX is performed in accordance with the XSLT Recommendation. Template rules identify the nodes to which they apply by using a pattern, which is an XPath expression. STX uses a straightforward algorithm for pattern matching based on SXPath, an XPath implementation in Scheme. SXPath is applied to the pattern, transforming it into a nodeset selection function, and this function is applied sequentially to the node being matched and to each of its ancestors until either the node is a member of the resulting nodeset or the root node has been tried without success. In the first case the node matches the pattern; in the second case it does not.
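The matching procedure just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the STX code: ElementTree's limited path language stands in for SXPath, a synthetic element named document-root stands in for the root node, and the function matches and the parent_of map are names introduced here.

```python
import xml.etree.ElementTree as ET

def matches(pattern, node, parent_of):
    """True if `node` is selected by `pattern` evaluated at the node or an ancestor."""
    context = node
    while context is not None:
        # Evaluate the pattern with `context` as the context node and
        # test membership of `node` by identity.
        if any(found is node for found in context.findall(pattern)):
            return True
        context = parent_of.get(context)  # None once the root has been tried
    return False

doc = ET.fromstring(
    "<doc><title>Main</title><section><title>Part</title></section></doc>")

# ElementTree has no document node, so a synthetic one is placed above <doc>.
document = ET.Element("document-root")
document.append(doc)
parent_of = {child: parent for parent in document.iter() for child in parent}

main_title = doc.find("title")
part_title = doc.find("section/title")

main_matches = matches("doc/title", main_title, parent_of)
part_matches = matches("doc/title", part_title, parent_of)
```

Here the title that is a child of doc matches the pattern doc/title when the pattern is evaluated at the synthetic root, while the title inside section is never a member of any resulting nodeset.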
Seamless XSLT and Scheme integration was the primary objective for the design of STX. It is achieved using a common data model and computational paradigm for both components of STX.
XSLT templates are essentially pure functions: each template defines a fragment of the output tree as a function of a fragment of the input tree and produces no side effects. STX transforms its templates into lists of Scheme functions. In every list, the first function is an SXPath-generated function which is used for pattern matching, and the second one is a template transformation function. This transformation function may be written in Scheme (in the case of stx:template) as well as in XSLT (in the case of xsl:template). In this context, XSLT may be considered syntactic sugar for the Scheme code it is transformed to. This idea is illustrated by Picture 1.
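The representation of a stylesheet as lists of functions can be approximated in Python. The sketch below is not STX and makes several assumptions: all names (child_of, wrap_in, word_count, apply_templates, transform) are invented for this illustration, matching is reduced to a parent/child test, and output is built as a string. It shows one rule that mimics the template of Figure 1 and one rule written directly in the host language, both stored as [matching function, transformation function] pairs.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def child_of(parent_tag, tag):
    """Build a matching function for the pattern 'parent_tag/tag'."""
    def match(node, parent):
        return node.tag == tag and parent is not None and parent.tag == parent_tag
    return match

def wrap_in(html_tag):
    """Build a template that wraps the transformed children in `html_tag`."""
    def template(node, stylesheet):
        inner = apply_templates(node, stylesheet)
        return "<%s>%s</%s>" % (html_tag, inner, html_tag)
    return template

def word_count(node, stylesheet):
    """A template written directly in the host language."""
    words = len("".join(node.itertext()).split())
    return "<p>%d words</p>" % words

def apply_templates(node, stylesheet):
    """Transform the children of `node` in document order."""
    parts = [escape(node.text or "")]
    for child in node:
        parts.append(transform(child, node, stylesheet))
        parts.append(escape(child.tail or ""))
    return "".join(parts)

def transform(node, parent, stylesheet):
    """Use the first rule whose matcher accepts the node, else recurse."""
    for match, template in stylesheet:
        if match(node, parent):
            return template(node, stylesheet)
    return apply_templates(node, stylesheet)

stylesheet = [
    [child_of("doc", "title"), wrap_in("h1")],  # counterpart of Figure 1
    [child_of("doc", "body"), word_count],      # host-language template
]

source = ET.fromstring("<doc><title>Report</title><body>one two three</body></doc>")
output = transform(source, None, stylesheet)
```

Because both kinds of rules are ordinary functions with the same signature, the processor does not need to distinguish a rule derived from stylesheet markup from one written by hand.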
Picture 1: STX processing model. STX accepts stylesheets described in XSLT, in Scheme, as well as in their combination.
STX emulates a simple "presentational" subset of XSLT. Less popular XSLT constructs are not supported, but may be replaced by templates in Scheme. This category consists mostly of "programmatic" constructs such as conditional processing instructions and variable and parameter handling. We believe that these constructs are more suitable for a programming language (Scheme) than for a "stylesheet" language (XSLT).
XSLT may be considered a pure functional language, but it has a serious limitation: in XSLT, functions are not first-class objects. This limitation is especially severe for an XML transformation language because it dramatically reduces the class of implementable tree-processing algorithms. Naturally, there is no such limitation in Scheme, which makes Scheme-coded functions and templates much more effective and flexible in this application domain.
5 XLink implementation in Scheme as SXLink
SXLink is the XLink implementation and Scheme Application Program Interface (API). SXLink fully supports XLink syntax and semantics and provides the API for processing multiple documents connected by means of XLink.
5.1 Motivation for XLink implementation in Scheme
XLink is not a general-purpose programming language and is thus not sufficient for developing a complete XML application. For writing a complete XML application, XLink has to be used in conjunction with some other language, which often leads to the already mentioned problem of impedance mismatch.
There is no such problem in Scheme, because S-expressions provide a uniform framework for representing documents with XLink elements, in the form of SXML. The ability to use the same high-level programming language for both XML processing and writing applications is one of the important advantages of the approach implemented in SXLink.
Moreover, the task of processing documents containing XLink elements is the task of tree walk and transformation. This task fits the paradigm of transforming hierarchical S-expressions perfectly, and standard algorithms and Scheme tools can be applied here.
5.2 SXLink architecture
An important part of SXLink is the specialized SSAX parser, which recognizes all XLink constructs present in an XML document and transforms the document into extended SXML.
Since XLink links generally involve multiple documents being linked, SXLink provides convenient means of working with a set of documents as a whole.
As discussed in section 3, the XLink specification focuses mainly on providing linking data structures, and does not provide any API for link processing. We have designed such an API and implemented it in SXLink. The following operations provided by SXLink are worth mentioning:
Automatic link validation. This operation checks whether each XLink link is semantically meaningful, i.e.
- all resources participating in links are available;
- all portions of resources defined by XPointer fragment identifiers are available;
- (if required) XLink roles and/or arcroles are URIs of existing resources.
Link resolution. This operation transforms all XLink links to outbound ones. The resulting document (called the "resolved document") is equivalent to the initial one from the point of view of XLink semantics, but is simpler for further processing by other applications, because it is easy to recognize the start of the traversal for an outbound link. For example, the resolved document can then be easily transformed to HTML and viewed in a browser.
Node inclusion. This operation is a further development of link resolution. Node inclusion replaces every node of the document that is a starting resource of a link with the corresponding ending resource of this link. This operation is particularly useful when the ending resources of links specify some detailed information and it is desirable to construct a new document with all this information explicitly included.
SXLink makes the operations above more flexible by using Scheme functions as first-class objects.
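As a rough illustration of node inclusion, the Python sketch below replaces every simple linking element in a document with a copy of the resource its href addresses. It is not the SXLink implementation: the function include_nodes is hypothetical, only simple links without fragment identifiers are handled, and the way resources are retrieved is passed in as a function argument (fetch), which mirrors the role of first-class functions mentioned above.

```python
import copy
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
TYPE = "{%s}type" % XLINK_NS
HREF = "{%s}href" % XLINK_NS

def include_nodes(element, fetch):
    """Return a copy of `element` with simple links replaced by their targets.

    `fetch` maps a URI to a parsed element, or to None if it is unavailable.
    """
    if element.get(TYPE) == "simple" and element.get(HREF) is not None:
        target = fetch(element.get(HREF))
        if target is not None:
            included = copy.deepcopy(target)
            included.tail = element.tail  # keep the text that followed the link
            return included
    result = ET.Element(element.tag, dict(element.attrib))
    result.text, result.tail = element.text, element.tail
    for child in element:
        result.append(include_nodes(child, fetch))
    return result

# An in-memory stand-in for the remote resources of a digital library.
library = {
    "http://library.com/book.xml":
        ET.fromstring("<book><title>Sample</title></book>"),
}

catalogue = ET.fromstring(
    '<catalogue xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<entry xlink:type="simple" xlink:href="http://library.com/book.xml"/>'
    '</catalogue>')

expanded = include_nodes(catalogue, library.get)
```

Passing library.get as the fetch argument keeps the traversal independent of where documents come from; a function that loads and parses files or network resources could be supplied instead.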
6 Functional integration of XSLT and XLink
This section describes several use cases of integrating XSLT and XLink, aimed at improving the expressive power and flexibility of XML transformation and linking. This approach to XSLT and XLink integration benefits significantly if implemented using functional techniques, as a further extension to STX and SXLink.
6.1 Linking a document and its stylesheet
As discussed in section 2, a single XSLT stylesheet can be applicable to a wide class of documents that have similar source tree structures. However, the XSLT specification doesn't provide any mechanism for describing the association between a stylesheet and an XML document to which this stylesheet can meaningfully be applied.
XLink is a natural candidate for describing these associations as links. With their ability to reside in a location separate from the linked resources, these XLink links can be specified in the most convenient location for a particular practical application:
- in an XSLT stylesheet,
- or in an XML document for which this stylesheet is designed,
- or even as a separate XML document (called a linkbase).
An XSLT processor extended with XLink support will consult this linking information to choose an appropriate XSLT stylesheet for a given XML document. Such an intelligent XSLT processor can be conveniently implemented as an integration of STX and SXLink, since both of them are based on a common data model (SXML) and common algorithms (such as SXPath).
6.2 XSLT transformation as a view for an ending resource in XLink traversal
For a given XLink link, it may be desirable to specify the view of the link's ending resource that is obtained as the result of link traversal.
Although XLink specification does not provide this feature explicitly, it provides a mechanism that allows us to make such an extension.
The XLink specification provides a global attribute arcrole in the XLink namespace, intended for specifying the semantics of the link's ending resource relative to its starting resource. The arcrole attribute corresponds to the RDF notion of a property, where the role can be interpreted as stating that "starting-resource has arc-role ending-resource." In accordance with the XLink specification, the value of an arcrole attribute must be a URI reference which identifies some resource that describes the intended property.
We suggest that one of the possible applications of the XLink arcrole attribute is to specify the URI of an XSLT stylesheet. If an XLink arcrole attribute identifies a stylesheet, then this stylesheet should be used for producing a presentation of the link's ending resource when the link is traversed.
The idea is illustrated by Figure 4. The XLink link leads to a book in a digital library, and the arcrole attribute identifies the stylesheet that specifies the view of the book obtained when following the link. For example, the stylesheet might transform the book into HTML. It is worth noting that such a view depends on the link traversed. Another link leading to the same book might use another XSLT stylesheet, and thus different presentations of the same book will be obtained when following different links. This behavior fully corresponds with the semantics of the XLink arcrole attribute defined in the XLink specification.
<link-to-a-book xmlns:xlink="http://www.w3.org/1999/xlink"
                xlink:type="simple"
                xlink:href="http://library.com/book.xml"
                xlink:arcrole="http://library.com/transform.xsl"/>
Figure 4: XLink arcrole for specifying the XSLT stylesheet which serves for a view for the ending resource (the book in a digital library)
SXLink can be extended with such functionality, since it fully recognizes XLink syntax and thus the arcrole attribute, and the integration of SXLink and STX is straightforward.
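The proposal of this section can be sketched in Python as follows. The sketch is purely illustrative and rests on assumptions: stylesheets are stood in for by Python functions registered under their URIs, documents are held in memory, the function follow is a name introduced here, and the second stylesheet URI (brief.xsl) is hypothetical. It shows two links to the same book producing different views because their arcrole values differ.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = "{%s}href" % XLINK_NS
ARCROLE = "{%s}arcrole" % XLINK_NS

def follow(link, documents, views):
    """Traverse a simple link and present its ending resource.

    If the link's arcrole names a registered view, that view is applied;
    otherwise the ending resource is serialized unchanged.
    """
    target = documents[link.get(HREF)]
    view = views.get(link.get(ARCROLE))
    if view is None:
        return ET.tostring(target, encoding="unicode")
    return view(target)

documents = {
    "http://library.com/book.xml":
        ET.fromstring("<book><title>Sample</title></book>"),
}

# Python functions standing in for XSLT stylesheets, keyed by stylesheet URI.
views = {
    "http://library.com/transform.xsl":
        lambda book: "<h1>%s</h1>" % book.findtext("title"),
    "http://library.com/brief.xsl":  # hypothetical second stylesheet
        lambda book: book.findtext("title"),
}

def make_link(arcrole):
    """Build a link like the one in Figure 4 with the given arcrole."""
    return ET.fromstring(
        '<link-to-a-book xmlns:xlink="http://www.w3.org/1999/xlink" '
        'xlink:type="simple" xlink:href="http://library.com/book.xml" '
        'xlink:arcrole="%s"/>' % arcrole)

full_view = follow(make_link("http://library.com/transform.xsl"), documents, views)
brief_view = follow(make_link("http://library.com/brief.xsl"), documents, views)
```

The ending resource is the same in both traversals; only the arcrole of the link traversed decides which presentation is produced.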
6.3 XSL Transformation of multiple linked XML documents
An XSL transformation generally involves a single XML document being processed. This is not surprising, taking into account that the XSLT Recommendation appeared two years before the XLink Recommendation.
Transforming an XML document which is linked to other documents, with due respect to these links, is a challenging problem. Performing a full recognition of XLink markup in XSLT patterns is a cumbersome solution, since XLink syntax is quite verbose and XLink semantics is rather sophisticated. Instead, we suggest that XLink markup be processed by an XLink processor integrated with the XSLT processor. In this case, just a minor extension of XSLT syntax is required for the transformation of linked XML documents.
Namely, we suggest adding one more axis to XPath (used as a part of XSLT): the traverse axis, which contains all nodes that can be traversed from the context node by XLink links. Using this additional axis we can express XSL transformations involving multiple connected documents in an understandable and concise way.
Figure 5 illustrates the idea of such an extended XSL template which transforms some bibliography expressed in XML. The traverse axis can be used in a match-pattern of the template to specify that a template matches any node in the bibliography that is an XLink link to a book. Note that this book can be located in a different XML document. In the template body, the traverse axis allows us to construct text from information located in a different document (xsl:value-of), and even to apply templates to a part of the remote document (xsl:apply-templates).
<xsl:template match="node()[traverse::book]">
  ...
  <xsl:value-of select="traverse::book/title"/>
  ...
  <xsl:apply-templates select="traverse::book/chapter/section"/>
  ...
</xsl:template>
Figure 5: XSL template extended with XLink support
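The intended meaning of the traverse axis can be approximated with a small Python sketch. It is a simplification made for illustration: the function traverse is hypothetical, only simple links whose href has no fragment identifier are followed, and the remote documents are held in an in-memory dictionary. The final expression plays the part of traverse::book/title from Figure 5.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
TYPE = "{%s}type" % XLINK_NS
HREF = "{%s}href" % XLINK_NS

def traverse(node, documents):
    """Nodes reachable from `node` by following its own simple link, if any."""
    if node.get(TYPE) != "simple":
        return []
    target = documents.get(node.get(HREF))
    return [target] if target is not None else []

documents = {
    "http://library.com/book.xml":
        ET.fromstring("<book><title>Sample</title></book>"),
}

bibliography = ET.fromstring(
    '<bibliography xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<item xlink:type="simple" xlink:href="http://library.com/book.xml"/>'
    '<note>not a link</note>'
    '</bibliography>')

# Counterpart of match="node()[traverse::book]" followed by
# select="traverse::book/title".
titles = [book.findtext("title")
          for item in bibliography
          for book in traverse(item, documents)
          if book.tag == "book"]
```

Only the item element contributes a title, since the note element is not a link and therefore has an empty traverse axis.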
Both XSLT and XLink occupy their own unique niches in the stack of XML technology. These two languages are quite important in the context of XML-based Digital Libraries.
Functional techniques are well suited to XML processing tasks such as XML transformation and linking. In this paper we considered some applications of functional techniques to the implementation of the XSLT-like XML transformation tool STX and the XLink processor SXLink. STX and SXLink provide a powerful and flexible technology for XML-based Digital Libraries. XSLT and XLink may benefit a lot from mutual integration, and the integration of STX and SXLink using functional techniques provides a practical approach to it.
- XSL Transformations (XSLT) Version 1.0. W3C Recommendation 16 November
- Cuneiform Digital Library Initiative to Use XML Encoding for Third Millennium
- XML Linking Language (XLink) Version 1.0. W3C Recommendation 27 June 2001.
- XSLT Requirements Version 2.0. W3C Working Draft, February 2001.
- Extensible Markup Language (XML) 1.0 (Second Edition). W3C Recommendation 6 October 2000.
- Namespaces in XML. World Wide Web Consortium 14-January-1999.
- XML XLink Requirements Version 1.0. W3C Note 24-Feb-1999.
- Document Style Semantics and Specification Language (DSSSL). ISO/IEC 10179:1996(E).
- Kiselyov O. XML and Scheme. Workshop on Scheme and Functional Programming 2000, Montreal, 2000.
- Oleg Kiselyov, Shriram Krishnamurthi. SXSLT: Manipulation Language for XML. Practical Aspects of Declarative Languages, 5th International Symposium,
- O. Kiselyov, K. Lisovsky. XML, XPath, XSLT Implementation as SXML, SXPath and SXSLT. International Lisp Conference ILC 2002, San Francisco. October,
- Oleg Kiselyov. SXML, Revision 2.5. August 9, 2002.
- D. Jacobs. Rescuing XSLT from Niche Status.
- XML Path Language (XPath) Version 1.0. W3C Recommendation 16 November 1999.
- XML Information Set. W3C Recommendation 24 October 2001.
- Kirill Lisovsky. STX: Scheme-enabled XSLT processor.
- Kirill Lisovsky, Dmitry Lizorkin. XML Path Language (XPath) and its functional implementation SXPath. Russian Digital Libraries Journal, 2003, Vol. 6, Issue
- M. Kay. What kind of language is XSLT? February 2001.
- Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Working Draft 23 January 2003.
Dmitry Lizorkin - a Ph.D. student at Moscow State University. His M.Sc. thesis, defended in 2002, was dedicated to the implementation of the XML Linking Language (XLink) using functional methods.
Kirill Lisovskiy - PhD, IT Consultant and Senior Researcher at the Institute for System Programming of the Russian Academy of Science. His primary area of research interest is functional and logic techniques for semistructured data management. Since 1999 he has participated in a number of research and development projects related to the implementation and application of XML data management techniques based on the Scheme programming language.