XML tutorial
http://www.brics.dk/~amoeller/XML/
Anders Møller
http://www.brics.dk/~amoeller
and
Michael I. Schwartzbach
http://www.brics.dk/~mis
at the BRICS research center at University of Aarhus, Denmark.
A PDF version suitable for printing and off-line browsing is available upon request.
See also our tutorial Interactive Web Services with Java covering Web
programming with Java, JSP, Servlets, and JWIG.
Contents
1. HTML and XML - structuring information for the future (21 pp.)
5. XSL and XSLT - stylesheets and document transformation (21 pp.)
7. DOM, SAX, and JDOM - programming for XML (15 pp.)
8. W3C - some background on the World Wide Web Consortium (5 pp.)
Markup Languages: HTML and XML
HTML - original motivation, development, and inherent limitations:
Selected links:
Hyper-Text Markup Language
HTML: Hyper-Text Markup Language
What is hyper-text?
● a document that contains links to other documents (and text, sound, images...)
● links may be actuated automatically or on request
● linked documents may replace, be inlined, or create a new window
● most combinations are supported by HTML
The start of the HTML for this page, with text, tags, and attributes:
<table width="99%">
<tr>
<td align=left>
<a href="../index.html">
<img src="../home.gif" border=0>
</a>
<a href="../info.html">
<img src="../info.gif" border=0>
</a>
<td align=right>
<a href="index.html">
<img src="../left.gif" border=0>
</a>
<a href="index.html">
<img src="../up.gif" border=0>
</a>
<a href="motivation.html">
<img src="../right.gif" border=0>
</a>
</td>
</tr>
</table>
<p>
<h1>Hyper-Text Markup Language</h1>
What is <b>hyper-text</b>?
<ul>
<li>a document that contains <b>links</b> to other documents
(and text, sound, images...)
<li>links may be <b>actuated</b> automatically or on request
<li>linked documents may <b>replace</b>, be <b>inlined</b>,
or create a <b>new</b> window
<li>most combinations are supported by HTML
</ul>
Original motivation for HTML
Exchange data on the Internet:
HTML was created by Tim Berners-Lee and Robert Cailliau at CERN in 1991:
Compact and human readable
Many document formats are very bulky:
Furthermore, HTML documents can be written and modified with any raw-text
editor.
From logical to physical structure
Originally, HTML tags described logical structure:
The early hack for commercial pages was to make everything a huge image:
The HTML developers responded with more and more physical layout tags.
Stylesheets
Cascading Style Sheets (CSS):
Using stylesheets, all tags become logical - however, CSS stylesheets only
address superficial properties of documents.
B {color:red;}
B B {color:blue;}
B.foo {color:green;}
B B.foo {color:yellow;}
B.bar {color:maroon;}
In the HTML document, the most specific properties are chosen, so:
<b class=foo>Hey!</b>
<b>Wow!!
<b>Amazing!!!</b>
<b class=foo>Impressive!!!!</b>
<b class=bar>k00l!!!!!</b>
<i>Fantastic!!!!!!</i>
</b>
When properly used, the physical layout (a CSS file) is separated from the logical
structure and the actual contents (an HTML file).
With CSS stylesheets, any tag can be made to look like any other tag.
Different versions of HTML
HTML has been developed extensively over the years:
1992
HTML is first defined
1993
HTML+ (some physical layout, fill-out forms, tables, math)
1994
HTML 2.0 (standard for core features)
HTML 3.0 (an extension of HTML+ submitted as a draft standard)
1995
Netscape-specific non-standard HTML appears
1996
Competing Netscape and Explorer versions of HTML
HTML 3.2 (standard based on current practices)
1997
HTML 4.0 (separates structure and presentation with stylesheets)
1999
HTML 4.01 (slight modifications only)
2000
XHTML 1.0 (XML version of HTML 4.01)
2001
XHTML 1.1 (modularization to allow different subsets)
2002
XHTML 2.0 (simplifying and generalizing several tags)
Syntax and validation
HTML 4.01 has a precise and formal syntax definition.
Browsers are forgiving
Most HTML documents are in fact not valid:
<h2>Lousy HTML</h1>
<li><a>This is not very</b> good.
<li><i>In fact, it is quite <g>bad</g></em>
</ul>
But the browser does <a naem=suck>something.
is rendered as:
Lousy HTML
● This is not very good.
● In fact, it is quite bad But the browser does something.
This is problematic:
Structuring general information
Consider the following recipe collection published in HTML:
<h1>Rhubarb Cobbler</h1>
<h2>Maggie.Herrick@bbs.mhv.net</h2>
<h3>Wed, 14 Jun 95</h3>
<table>
<tr><td> 2 1/2 cups <td> diced rhubarb (blanched with boiling water, drain)
<tr><td> 2 tablespoons <td> sugar
<tr><td> 2 <td> fairly ripe bananas sliced 1/4" round
<tr><td> 1/4 teaspoon <td> cinnamon
<tr><td> dash of <td> nutmeg
</table>
Problems with HTML
● The language is by design hardwired to describe hypertext:
❍ there is a fixed collection of tags with a fixed semantics
What is XML?
XML: eXtensible Markup Language
● there is no fixed collection of markup tags - we may define our own tags,
tailored for our kind of information
● each XML language is targeted at its own application domain, but the
languages will share many features
● there is a common set of generic tools for processing documents
HTML vs. XML
Consider the HTML recipe collection again:
<h1>Rhubarb Cobbler</h1>
<h2>Maggie.Herrick@bbs.mhv.net</h2>
<h3>Wed, 14 Jun 95</h3>
<table>
<tr><td> 2 1/2 cups <td> diced rhubarb
<tr><td> 2 tablespoons <td> sugar
<tr><td> 2 <td> fairly ripe bananas
<tr><td> 1/4 teaspoon <td> cinnamon
<tr><td> dash of <td> nutmeg
</table>
With XML, we can instead define our own "recipe markup language" where the markup tags
directly correspond to concepts in the world of recipes:
<description>
Rhubarb Cobbler made with bananas as the main sweetener.
It was delicious.
</description>
<ingredients>
<item><amount>2 1/2 cups</amount><type>diced rhubarb</type></item>
<item><amount>2 tablespoons</amount><type>sugar</type></item>
<item><amount>2</amount><type>fairly ripe bananas</type></item>
<item><amount>1/4 teaspoon</amount><type>cinnamon</type></item>
<item><amount>dash of</amount><type>nutmeg</type></item>
</ingredients>
<preparation>
Combine all and use as cobbler, pie, or crisp.
</preparation>
</recipe>
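Because the tags name recipe concepts directly, a generic XML parser is enough to get at the data. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the sample string is an abbreviated copy of the markup above wrapped in a recipe root element, and the function name ingredients is our own choice, not part of any XML standard.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the recipe markup shown above (two ingredients only).
RECIPE_XML = """<recipe>
  <ingredients>
    <item><amount>2 1/2 cups</amount><type>diced rhubarb</type></item>
    <item><amount>2 tablespoons</amount><type>sugar</type></item>
  </ingredients>
  <preparation>Combine all and use as cobbler, pie, or crisp.</preparation>
</recipe>"""

def ingredients(xml_text):
    """Return (amount, type) pairs for every item in the ingredients list."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("amount"), item.findtext("type"))
            for item in root.findall("./ingredients/item")]

for amount, kind in ingredients(RECIPE_XML):
    print(amount, "-", kind)
```

Doing the same with the HTML table version would mean guessing which column holds the amount and which the ingredient.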
Later:
● XML Schema will be used to define our class of recipe documents
● XSLT will be used to transform the XML document into XHTML (or HTML), including
automatic construction of index, references, etc.
● XLink, XPointer, and XPath could be used to create cross-references
● XQuery will be used to express queries
A conceptual view of XML
An XML document is an ordered, labeled tree:
● character data leaf nodes contain the actual data (text strings)
❍ usually, character data nodes must be non-empty and non-adjacent
Unfortunately, XML is not as simple as it could be, and there is still no agreement on
XML tree terminology :-(
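The tree view can be made concrete with a short traversal. This is a sketch only, assuming Python's xml.etree.ElementTree; note that this particular library stores character data in the text and tail fields of elements rather than as separate leaf nodes, which is one example of the terminology differences mentioned above. The function name outline is ours.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return one indented line per node, in document order."""
    lines = ["  " * depth + "element: " + element.tag]
    if element.text and element.text.strip():
        lines.append("  " * (depth + 1) + "text: " + element.text.strip())
    for child in element:
        lines.extend(outline(child, depth + 1))
        # Character data that follows a child belongs to the parent's contents.
        if child.tail and child.tail.strip():
            lines.append("  " * (depth + 1) + "text: " + child.tail.strip())
    return lines

DOC = "<item><amount>2</amount><type>fairly ripe bananas</type></item>"
print("\n".join(outline(ET.fromstring(DOC))))
```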
A concrete view of XML
An XML document is a (Unicode) text with markup tags and other
meta-information.
● <![CDATA[<greeting>Hello, world!</greeting>]]>
White-space (blanks, newlines, etc.) is used both for indentation and actual
contents. (xml:space attribute provides some control.)
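A small sketch of how a parser reports the CDATA section and white-space shown above, assuming Python's xml.etree.ElementTree. The wrapper element names doc and a are hypothetical.

```python
import xml.etree.ElementTree as ET

# The markup inside a CDATA section is delivered as plain character data.
doc = ET.fromstring("<doc><![CDATA[<greeting>Hello, world!</greeting>]]></doc>")
cdata_text = doc.text
child_count = len(doc)          # no greeting element was created

# White-space inside an element is passed on as part of the contents.
spaced_text = ET.fromstring("<a>  x\n</a>").text

print(cdata_text, child_count, repr(spaced_text))
```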
Other meta-information:
<?target data...?>
an instruction for a processor, target identifies the processor for which it
is directed, data is a string containing the instruction
<!-- comment -->
a comment, will be ignored by all processors
<!DOCTYPE ...>
document type declaration (described later...)
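Processing instructions and comments are visible to programs that ask for them. The sketch below assumes Python's xml.dom.minidom, which keeps both kinds of node in its tree; the document string and the element name doc are made up for illustration.

```python
from xml.dom import minidom

DOC = '<?xml version="1.0"?><?target data...?><!-- comment --><doc/>'
dom = minidom.parseString(DOC)

kinds = []
for node in dom.childNodes:
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
        kinds.append(("pi", node.target, node.data))
    elif node.nodeType == node.COMMENT_NODE:
        kinds.append(("comment", node.data))
    elif node.nodeType == node.ELEMENT_NODE:
        kinds.append(("element", node.tagName))

print(kinds)
```

An application that does not recognize the target simply skips the instruction.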
Applications of XML
There are already hundreds of serious applications of XML.
XHTML
CML
WML
ThML
<h3 class="s05" id="One.2.p0.2">Having a Humble Opinion of Self</h3>
<p class="First" id="One.2.p0.3">EVERY man naturally desires knowledge
<note place="foot" id="One.2.p0.4">
<p class="Footnote" id="One.2.p0.5"><added id="One.2.p0.6">
<name id="One.2.p0.7">Aristotle</name>, Metaphysics, i. 1.
</added></p>
</note>;
but what good is knowledge without fear of God? Indeed a humble
rustic who serves God is better than a proud intellectual who
neglects his soul to study the course of the stars.
<added id="One.2.p0.8"><note place="foot" id="One.2.p0.9">
<p class="Footnote" id="One.2.p0.10">
Augustine, Confessions V. 4.
</p>
</note></added>
</p>
The recipe example
Consider again recipes, such as in this example (raw text file).
● recipes consist of ingredients, steps for preparation, possibly some comments, and a specification
of its nutrition
● an ingredient can be simple or composite
● a simple ingredient has a name, an amount (possibly unspecified), and a unit (unless the amount is
dimensionless)
● a composite ingredient is recursively a recipe
This example (formatted XML file) contains five recipes. Abbreviated version:
From SGML to SML
- DocHeads vs. Simpletons, a process of simplification
application of SGML.
|
V
XML
❍ W3C Recommendation 1998
❍ a simple subset of SGML, targeted for Web applications
❍ now de facto standard
|
V
Canonical XML
❍ W3C Recommendation, March 2001
etc.
Occam's razor: "one should not increase, beyond what is necessary, the number of entities
required to explain anything"
SGML relics
- only a fool does not fear "external general parsed entities"
<?xml version="1.0"?>
<!DOCTYPE greeting [
<!ELEMENT greeting (#PCDATA)>
<!ATTLIST greeting style (big|small) "small">
<!ENTITY hi "Hello">
]>
<greeting> &hi; world! </greeting>
(described later...)
● entity declarations (ENTITY) - a simple macro mechanism
● notation declarations (NOTATION) - data format specifications
Unfortunately, they cannot always be ignored - all XML processors (even
non-validating ones) are required to:
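One such obligation, expanding internal entities, can be observed with a non-validating parser. A minimal sketch, assuming Python's xml.etree.ElementTree (which is built on the non-validating Expat parser); it demonstrates entity expansion only and makes no claim about the other obligations.

```python
import xml.etree.ElementTree as ET

# The greeting document from above, with its internal DTD subset.
GREETING = """<?xml version="1.0"?>
<!DOCTYPE greeting [
  <!ELEMENT greeting (#PCDATA)>
  <!ATTLIST greeting style (big|small) "small">
  <!ENTITY hi "Hello">
]>
<greeting> &hi; world! </greeting>"""

greeting = ET.fromstring(GREETING)
# The parser never validates, yet &hi; has been replaced by its declared text.
print(repr(greeting.text))
```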
XML technologies
XML is:
● hot ($$$)
● the standard for representation of Web information
● by itself, just a notation for hierarchically structured text
To "use XML":
1. define your XML language (use e.g. XML Schema to define its syntax)
2. exploit the generic XML tools (e.g. XSLT and XQuery processors), the
generic protocols, and the generic programming frameworks (e.g. DOM or
SAX) to build application tools
● XML Information Set
attempt to define common terminology for XML document concepts
("information set"=tree, "information item"=node, ...)
● XML-Signature
digital signatures of Web resources
● XML Encryption
encryption of Web resources
● XML Fragment Interchange
for dealing with fragments of XML documents
● XML Protocol and SOAP (Simple Object Access Protocol)
information exchange protocol
● XForms
a common sublanguage for input forms (with XHTML forms as a special
case)
● RDF (Resource Description Framework)
a framework for metadata (statements about properties and relationships)
Basic XML tools
Parsers
Editors
● Xeena (www.alphaWorks.ibm.com/tech/xeena)
From alphaWorks, in Java, with tree-view syntax directed editing
● XMLSpy (www.xmlspy.com)
Popular, but not free :-(
● + 1000 others...
Links to more information
www.w3.org/TR/REC-xml.html
the XML 1.0 specification
www.w3.org/TR/xml11
the XML 1.1 draft specification, minor changes to reflect Unicode revisions
www.w3.org/XML
W3C's XML homepage
www.xml.com
XML information by O'Reilly: articles, software, tutorials
www.oasis-open.org/cover
The XML Cover Pages: comprehensive online reference
www.xmlhack.com
<?xmlhack?>: concise XML news
news:comp.text.xml
XML newsgroup
www.ucc.ie/xml
XML FAQ
www.xml.com/axml/testaxml.htm
the Annotated XML Specification, by Tim Bray
metalab.unc.edu/xml
Cafe con Leche XML News and Resources
inf2.pira.co.uk/top011a.htm
El.pub's markup language section
wdvl.internet.com/Authoring/Languages/XML
links to XML information
www.w3schools.com/xml
XML School: an XML tutorial
www.garshol.priv.no/download/xmltools
a list of free XML tools
Namespaces, XInclude, and XML Base
- common extensions to the core XML specification
Selected links:
Mixing XML languages
Consider an XML language WidgetML which uses XHTML as a sublanguage for
help messages:
<widget type="gadget">
<head size="medium"/>
<big><subwidget ref="gizmo"/></big>
<info>
<head>
<title>Description of gadget</title>
</head>
<body>
<h1>Gadget</h1>
A gadget contains a big gizmo
</body>
</info>
</widget>
This complicates things for processors and might even cause ambiguities.
Qualifying names
Simple solution: qualify names with URIs (Uniform Resource Identifiers)
<{http://www.w3.org/TR/xhtml1}head>
  \_________________________/ \__/
        qualifying URI     local name
This is the idea - the actual solution is less verbose but slightly more
complicated...
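Some tools expose exactly this {URI}local-name form to the programmer. A small sketch, assuming Python's xml.etree.ElementTree, which reports qualified element names in that notation; the sample document with its foo prefix is ours.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/TR/xhtml1"
doc = ET.fromstring(
    '<foo:head xmlns:foo="%s"><foo:title>Gadget</foo:title></foo:head>' % XHTML
)

qualified = doc.tag                      # the "{URI}local-name" form
uri, local = qualified[1:].split("}")    # split it into its two parts

print(qualified)
print(uri, local)
```

The prefix foo does not appear in the reported name; only the URI and the local name do.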
Namespace declarations
Namespaces are declared by special attributes and associated prefixes:
<... xmlns:foo="http://www.w3.org/TR/xhtml1">
...
<foo:head>...</foo:head>
...
</...>
● declaration: xmlns="URI"
● default value: "" (means: treat as unqualified name)
● does not affect unprefixed attribute names (they belong to the containing
elements)
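The difference between element names and unprefixed attribute names under a default namespace can be checked directly. This is a sketch assuming Python's xml.etree.ElementTree; the namespace URI http://widget.example/ns is hypothetical.

```python
import xml.etree.ElementTree as ET

WIDGET = ('<widget xmlns="http://widget.example/ns" type="gadget">'
          '<head size="medium"/></widget>')
widget = ET.fromstring(WIDGET)

# Unprefixed element names pick up the default namespace ...
element_names = [el.tag for el in widget.iter()]
# ... but unprefixed attribute names stay unqualified.
attribute_names = sorted(widget.attrib) + sorted(widget[0].attrib)

print(element_names)
print(attribute_names)
```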
</widget>
This innocent question spawned a controversy that resulted in leaving the matter
undefined (by deprecating such namespaces).
Other controversies:
(Unfortunately, according to the spec, the choice of prefix may matter, and an
unqualified attribute generally does not just inherit the namespace from its
element.)
Combining XML documents
To enhance reuse and modularity, a technique for constructing new XML
documents from existing ones is desirable.
An XInclude example
A document containing:
<foo xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="somewhere.xml"/>
</foo>
where somewhere.xml contains:
<bar>...</bar>
is equivalent to:
<foo xmlns:xi="http://www.w3.org/2001/XInclude">
<bar>...</bar>
</foo>
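The inclusion step can be tried out with Python's standard xml.etree.ElementInclude module. This is a sketch under assumptions: instead of reading somewhere.xml from disk, a hypothetical in-memory table DOCS and loader function stand in for the file system.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-in for the file system: href -> document text.
DOCS = {"somewhere.xml": "<bar>...</bar>"}

def loader(href, parse, encoding=None):
    """Return a parsed element for parse="xml", raw text otherwise."""
    if parse == "xml":
        return ET.fromstring(DOCS[href])
    return DOCS[href]

root = ET.fromstring(
    '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="somewhere.xml"/></foo>'
)
ElementInclude.include(root, loader=loader)   # replaces xi:include in place
result = ET.tostring(root, encoding="unicode")
print(result)
```

The serialized result omits the xmlns:xi declaration because this library only writes namespace declarations that are still in use.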
XInclude details
How is the included resource denoted?
Other issues:
Many XInclude processors support only whole-document URIs, not full XPointer.
XML Base
A URI identifies a resource:
<... xml:base="http://www.daimi.au.dk/">
<... href="~mis/mn/index.html" .../>
</...>
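A sketch of how an application might resolve the relative reference above against xml:base, assuming Python's xml.etree.ElementTree and urllib.parse.urljoin. ElementTree does not apply xml:base by itself, so the helper resolved_hrefs is our own, and the element names doc and link are hypothetical replacements for the elided names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# The xml prefix is predeclared and bound to this namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

DOC = ('<doc xml:base="http://www.daimi.au.dk/">'
       '<link href="~mis/mn/index.html"/></doc>')

def resolved_hrefs(element, base=""):
    """Resolve every href attribute against the nearest enclosing xml:base."""
    base = urljoin(base, element.get(XML_BASE, ""))
    found = []
    if "href" in element.attrib:
        found.append(urljoin(base, element.get("href")))
    for child in element:
        found.extend(resolved_hrefs(child, base))
    return found

print(resolved_hrefs(ET.fromstring(DOC)))
```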
Examples of applications:
Future XML parsers will support Namespaces, XInclude, and XML Base.
Links to more information
Namespaces:
www.w3.org/TR/REC-xml-names
the W3C XML Namespace Recommendation
www.jclark.com/xml/xmlns.htm
an explanation of the recommendation by James Clark
www.xml.com/xml/pub/1999/01/namespaces.html
an XML.com article on Namespaces
www.rpbourret.com/xml/NamespacesFAQ.htm
comprehensive Namespace FAQ
XInclude:
www.w3.org/TR/xinclude
XInclude, W3C Candidate Recommendation
www.ibiblio.org/xml/XInclude
a Java XInclude processor
XML Base:
www.w3.org/TR/xmlbase
the W3C XML Base Recommendation
DTD, XML Schema, and DSD
- defining language syntax with schemas
Overview:
● Schemas and schema languages - defining the syntax of your own XML
language
● Choosing a schema language - lots of alternatives
DTD - the insufficient schema language defined in the XML 1.0 spec:
DSD - the next generation of schema languages:
Selected links:
Schemas and schema languages
A schema is a definition of the syntax of an XML-based language (i.e. a class of
XML documents).
● checks for validity, i.e. that the document conforms to the schema
requirements
Why bother formalizing the syntax with a schema?
Choosing a schema language
There have been many schema language proposals.
W3C proposals:
● DTD
● XML-Data, January 1998
● DCD (Document Content Description), July 1998
● DDML (Document Definition Markup Language), January 1999
● SOX (Schema for Object-oriented XML), July 1999
● XML Schema
Non-W3C proposals:
Unlike for many other XML technologies, it has proved difficult to reach a
consensus - probably because:
We shall look at W3C's DTD and XML Schema proposals and at the DSD
proposal developed by BRICS and AT&T.
DTD - Document Type Definition
Recall from earlier that XML 1.0 contains a built-in schema language: Document
Type Definition
content models:
■ sequence: (...,...,...)
■ optional: ...?
attribute types:
these... (consider them deprecated)
attribute defaults:
Example DTD
A DTD for our recipe collections, recipes.dtd:
By inserting:
in the headers of recipe collection documents, we state that they are intended to conform to
recipes.dtd.
Alternatively, the DTD can be given locally with <!DOCTYPE collection [ ... ]>.
Problems with DTD
Top 15 reasons for avoiding DTD:
1. not itself using XML syntax (the SGML heritage can be very unintuitive +
if using XML, DTDs could potentially themselves be syntax checked with a
"meta DTD")
2. mixed into the XML 1.0 spec (would be much less confusing if specified
separately + even non-validating processors must look at the DTD)
5. cannot mix character data and regexp content models (and the content
models are generally hard to use for complex requirements)
6. no support for Namespaces (of course, XML 1.0 was defined before
Namespaces)
7. very limited support for modularity and reuse (the entity mechanism is
too low-level)
13. only defaults for attributes, not for elements (but that would often be
convenient)
14. cannot specify "any element" or "any attribute" (useful for partial
specifications and during schema development)
Design requirements
Quotes from the W3C Note "XML Schema Requirements" (Feb. 1999):
Design principles:
1. be prepared quickly
2. be precise, concise, human-readable, and illustrated with examples
Structural requirements:
data types
Unfortunately, their own XML Schema Recommendation does not fulfil all
requirements (self-describing, simple, concise, human-readable, ...)
XML Schema
W3C Recommendation, May 2001.
1. Structures
2. Datatypes
Main features:
Yes, it is big and complicated! (Part 1 of the spec alone is around 200 pages...)
A small example
Assume we want to create an XML-based language for business cards.
<card xmlns="http://businesscard.org">
<name>John Doe</name>
<title>CEO, Widget Inc.</title>
<email>john.doe@widget.com</email>
<phone>(202) 456-1414</phone>
<logo url="widget.gif"/>
</card>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:b="http://businesscard.org"
targetNamespace="http://businesscard.org">
<complexType name="card_type">
<sequence>
<element ref="b:name"/>
<element ref="b:title"/>
<element ref="b:email"/>
<element ref="b:phone" minOccurs="0"/>
<element ref="b:logo" minOccurs="0"/>
</sequence>
</complexType>
<complexType name="logo_type">
<attribute name="url" type="anyURI"/>
</complexType>
</schema>
<card xmlns="http://businesscard.org"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://businesscard.org
business_card.xsd">
...
</card>
By inserting this, the author claims that the document is intended to be valid with
respect to the schema (not that it necessarily is valid).
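To give a feeling for what a validity check involves, here is a hand-written sketch of just the card_type sequence (name, title, email, optional phone, optional logo), assuming Python's xml.etree.ElementTree and re modules. It is not an XML Schema processor: the names CARD_MODEL and matches_card_type are ours, and attribute and datatype checks are left out.

```python
import re
import xml.etree.ElementTree as ET

NS = "{http://businesscard.org}"
# card_type content model: name, title, email, phone?, logo?
CARD_MODEL = re.compile(r"name title email( phone)?( logo)?")

def matches_card_type(card):
    """Check only the order and presence of the card's sub-elements."""
    names = []
    for child in card:
        if not child.tag.startswith(NS):
            return False
        names.append(child.tag[len(NS):])
    return CARD_MODEL.fullmatch(" ".join(names)) is not None

CARD = """<card xmlns="http://businesscard.org">
  <name>John Doe</name>
  <title>CEO, Widget Inc.</title>
  <email>john.doe@widget.com</email>
  <phone>(202) 456-1414</phone>
  <logo url="widget.gif"/>
</card>"""

NO_TITLE = """<card xmlns="http://businesscard.org">
  <name>John Doe</name>
  <email>john.doe@widget.com</email>
</card>"""

print(matches_card_type(ET.fromstring(CARD)))
print(matches_card_type(ET.fromstring(NO_TITLE)))
```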
Overview of XML Schema
The most central top-level constructs:
❍ element references: describe which sub-elements may or must appear
If all elements are valid, the whole document is called valid. (Unlike DTD, there is
no way to require a specific root element.)
Naming conflicts: two types or two elements cannot be defined with the same
name, but an element declaration and a type definition may use the same name.
Constructing complex types
A complexType can contain:
● attribute declarations:
where type refers to a simple type definition and use is either required,
optional, or prohibited
Example:
<complexType name="order_type" mixed="true">
<choice>
<element ref="n:address"/>
<sequence>
<element ref="n:email"/>
<element ref="n:phone"/>
</sequence>
</choice>
</complexType>
Grouping of definitions:
Constructing simple types
Simple types can be:
❍ by a restriction:
A lot of often-used simple types (all the primitive and some derived) are
predefined:
● integer
● date
● anyURI
● unsignedLong
● language
● ...
<simpleType name="may_date">
<restriction base="date">
<pattern value="\d{4}-05-\d{2}"/>
</restriction>
</simpleType>
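The pattern facet of may_date can be tried out with an ordinary regular expression library. A sketch assuming Python's re module; XML Schema patterns must match the whole value, hence fullmatch. Only the pattern is checked here, not whether the value is a real calendar date as the date base type also requires. The function name is_may_date is ours.

```python
import re

# Same expression as the pattern facet above.
MAY_DATE = re.compile(r"\d{4}-05-\d{2}")

def is_may_date(value):
    """True if the whole string matches the may_date pattern."""
    return MAY_DATE.fullmatch(value) is not None

for candidate in ["2001-05-17", "2001-06-17", "x2001-05-17"]:
    print(candidate, is_may_date(candidate))
```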
Local definitions
Instead of writing all element declarations and type definitions at top-level
(globally), they may be inlined (locally):
Example:
<complexType name="card_type">
<sequence>
<element ref="b:name"/>
<element ref="b:title"/>
<element ref="b:email" maxOccurs="unbounded"/>
<element ref="b:phone" minOccurs="0"/>
<element ref="b:background" minOccurs="0"/>
</sequence>
</complexType>
<element name="card">
<complexType>
<sequence>
<element name="name" type="string"/>
<element ref="b:title"/>
<element ref="b:email" maxOccurs="unbounded"/>
<element ref="b:phone" minOccurs="0"/>
<element ref="b:background" minOccurs="0"/>
</sequence>
</complexType>
</element>
(where the complex type card_type and the description of name have been
inlined)
except that:
● inlined type definitions are anonymous, so they cannot be referred to for
reuse
● inlined element declarations can be overloaded, i.e. they need not have
unique names
Inheritance and substitution groups
XML Schema contains an incredibly complicated type system.
As in many programming languages, XML Schema allows (complex) types to be declared as
sub-types of existing types.
● inheritance by extension:
<complexType name="car">
<complexContent>
<extension base="n:vehicle">
<element name="wheel" minOccurs="3" maxOccurs="4"/>
</extension>
</complexContent>
</complexType>
creates a car type from a vehicle type by extending it with 3 or 4 wheel sub-elements
● inheritance by restriction:
<complexType name="small_car">
<complexContent>
<restriction base="n:car">
<element name="wheel" maxOccurs="3"/>
</restriction>
</complexContent>
</complexType>
creates a small_car type from the car type by restricting it to 3 wheel sub-elements
Subsumption:
meaning that myVehicle elements are valid if they match the vehicle type.
Since car is a sub-type of vehicle, myVehicle elements are also valid if they match
car - provided that we add xsi:type="n:car" to the elements.
Substitution groups: - another (simpler and better) way of achieving basically the same
then we may always use myCar elements whenever myVehicle elements are required
(without using xsi:type).
Annotations
Schemas can be annotated with human or machine readable documentation and other
information:
<xsd:element name="author">
<xsd:annotation>
<xsd:documentation xmlns:xhtml="http://www.w3.org/1999/xhtml">
the author of the recipe,
see <xhtml:a href="authors.xml">this list</xhtml:a> of authors
</xsd:documentation>
<xsd:appinfo xmlns:fp="http://foodprocessor.org">
<fp:process type="117"/>
</xsd:appinfo>
</xsd:annotation>
...
</xsd:element>
Note that annotations can be structured, as opposed to simple <!-- ... --> XML
comments.
Schema inclusion and redefinition
No fewer than 3 mechanisms are available:
It ought to also be possible to use XInclude, but that is not mentioned in the XML Schema spec.
Example:
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:b="http://businesscard.org"
targetNamespace="http://businesscard.org">
<redefine schemaLocation="phone.xsd">
<element name="phone">
...
</element>
</redefine>
...
</schema>
Here, a schema for XHTML is imported together with phone.xsd (which is assumed to contain a description
of phone numbers) and its description of phone is redefined.
Namespaces
When defining a new XML-based language, we usually want to assign it a unique
namespace.
XML Schema
Example:
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:b="http://businesscard.org"
targetNamespace="http://businesscard.org">
<complexType name="card_type">
<sequence>
<element ref="b:name"/>
...
</sequence>
</complexType>
...
</schema>
● prefixes in attribute values (e.g. ref="b:name") - the namespace spec
does not tell how to resolve this
● a notion of "unqualified locals" (which is even a default) - allowing prefixes
to be omitted from locally declared elements in instance documents
Attribute and element defaults
Side-effect of validation: insertion of default values
● attribute defaults: are inserted (before validation) if the attribute is absent (in
elements of the type containing the declaration)
● element defaults: are inserted as character data in empty elements (of the type
of the declaration)
For some strange design reason, element defaults cannot contain markup.
Example:
<widget/>
into:
Identity constraints
XPath can be used to specify uniqueness requirements.
Example:
<unique name="uniqueness-requirement-87">
<selector xpath=".//personlist"/>
<field xpath="person/@ssn"/>
</unique>
Similarly, we can define keys (with key) and references (with keyref), which
generalize the ID/IDREF mechanism from DTD in a straightforward way.
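The intent of the uniqueness requirement above can be approximated by hand. This is a rough sketch assuming Python's xml.etree.ElementTree; it reads the example as "no two person elements inside the selected personlist elements share an ssn" and ignores the finer rules of XML Schema identity constraints. The function name duplicate_ssns and the sample data are ours.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ssns(scope):
    """Return the ssn values that occur more than once among the persons."""
    values = [person.get("ssn")
              for plist in scope.findall(".//personlist")   # the selector
              for person in plist.findall("person")         # the field path
              if person.get("ssn") is not None]
    return sorted(v for v, n in Counter(values).items() if n > 1)

PEOPLE = """<db><personlist>
  <person ssn="1"/><person ssn="2"/><person ssn="1"/>
</personlist></db>"""

print(duplicate_ssns(ET.fromstring(PEOPLE)))
```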
A larger example
An XML Schema description of our recipe collections, recipes.xsd:
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:r="http://recipes.org"
targetNamespace="http://recipes.org"
elementFormDefault="qualified"
attributeFormDefault="unqualified">
<element name="collection">
<complexType>
<sequence>
<element name="description" type="r:anycontent"/>
<element ref="r:recipe" minOccurs="0" maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
<element name="recipe">
<complexType>
<sequence>
<element name="title" type="string"/>
<element ref="r:ingredient" minOccurs="0" maxOccurs="unbounded"/>
<element ref="r:preparation"/>
<element name="comment" minOccurs="0" type="string"/>
<element name="nutrition">
<complexType>
<attribute name="protein" type="r:nonNegativeDecimal" use="required"/>
<attribute name="carbohydrates" type="r:nonNegativeDecimal"
use="required"/>
<attribute name="fat" type="r:nonNegativeDecimal" use="required"/>
<attribute name="calories" type="r:nonNegativeDecimal" use="required"/>
<attribute name="alcohol" type="r:nonNegativeDecimal" use="optional"/>
</complexType>
</element>
</sequence>
</complexType>
</element>
<element name="preparation">
<complexType>
<sequence>
<element name="step" type="string" minOccurs="0" maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
<element name="ingredient">
<complexType>
<sequence>
<element ref="r:ingredient" minOccurs="0" maxOccurs="unbounded"/>
<element ref="r:preparation" minOccurs="0"/>
</sequence>
<attribute name="name" use="required"/>
<attribute name="amount" use="optional">
<simpleType>
<union>
<simpleType>
<restriction base="string">
<enumeration value="*"/>
</restriction>
</simpleType>
<simpleType>
<restriction base="r:nonNegativeDecimal"/>
</simpleType>
</union>
</simpleType>
</attribute>
<attribute name="unit" use="optional"/>
</complexType>
</element>
<simpleType name="nonNegativeDecimal">
<restriction base="decimal">
<minInclusive value="0"/>
</restriction>
</simpleType>
</schema>
Note that:
By inserting:
<collection xmlns="http://recipes.org"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://recipes.org recipes.xsd">
...
</collection>
into our recipe collection recipes.xml, we state that the document is intended to be valid according to
recipes.xsd.
Problems with XML Schema
The general problem:
● it is not 100% self-describing (as a trivial example, see the previous point),
even though that was an initial design requirement
● defaults cannot be specified separate from the declarations (this makes it
hard to make families of schemas that only differ in the default values)
Technical problems:
names"
❍ in schemas, elements are described by "element definitions" which
Non-minimalistic design:
● the set of built-in data types is not minimalistic (a minimalistic set + some
data type libraries would lower the learning burden)
● the use of Perl-style regular expressions violates the principle of using
XML syntax to describe XML syntax
For other comments about the design of XML Schema, see for instance
www.xml.com/pub/a/2000/07/05/specs/lastword.html, www.ibiblio.org/xql/tally.html
, and www.xml.com/pub/a/2002/07/31/wxstypes.html.
Document Structure Description 2.0
- a successor to DSD 1.0, a schema language developed in cooperation by
BRICS and AT&T Labs Research.
DSD 1.0 was announced in November 1999. A draft spec for DSD 2.0 is now
available!
Example
A DSD 2.0 description of our recipe collections:
<dsd xmlns="http://www.brics.dk/DSD/2.0"
namespace="http://recipes.org">
<if><root/>
<element name="collection"/>
</if>
<if><element name="collection"/>
<declare><contents>
<element name="description"/>
<repeat><element name="recipe"/></repeat>
</contents></declare>
</if>
<if><element name="description"/>
<rule ref="ANYCONTENT"/>
</if>
<if><element name="recipe"/>
<declare><contents>
<sequence>
<element name="title"/>
<repeat><element name="ingredient"/></repeat>
<element name="preparation"/>
<element name="nutrition"/>
</sequence>
<optional><element name="comment"/></optional>
</contents></declare>
</if>
<if><element name="ingredient"/>
<declare>
<attribute name="name"/>
<attribute name="amount">
<union>
<string value="*"/>
<stringtype ref="NUMBER"/>
</union>
</attribute>
<attribute name="unit"/>
</declare>
<require><attribute name="name"/></require>
<if><not><attribute name="amount"/></not>
<require><not><attribute name="unit"/></not></require>
<declare><contents>
<repeat min="1"><element name="ingredient"/></repeat>
<element name="preparation"/>
</contents></declare>
</if>
</if>
<if><element name="preparation"/>
<declare><contents>
<repeat><element name="step"/></repeat>
</contents></declare>
</if>
<if>
<or>
<element name="step"/>
<element name="comment"/>
<element name="title"/>
</or>
<declare><contents>
<string/>
</contents></declare>
</if>
<if><element name="nutrition"/>
<declare>
<attribute name="protein"><stringtype ref="NUMBER"/></attribute>
<attribute name="carbohydrates"><stringtype ref="NUMBER"/></attribute>
<attribute name="fat"><stringtype ref="NUMBER"/></attribute>
<attribute name="calories"><stringtype ref="NUMBER"/></attribute>
<attribute name="alcohol"><stringtype ref="NUMBER"/></attribute>
</declare>
<require>
<attribute name="protein"/>
<attribute name="carbohydrates"/>
<attribute name="fat"/>
<attribute name="calories"/>
</require>
</if>
<stringtype id="DIGITS">
<repeat min="1">
<char min="0" max="9"/>
</repeat>
</stringtype>
<stringtype id="NUMBER">
<sequence>
<stringtype ref="DIGITS"/>
<optional>
<sequence>
<string value="."/>
<stringtype ref="DIGITS"/>
</sequence>
</optional>
</sequence>
</stringtype>
<rule id="ANYCONTENT">
<declare><contents>
<repeat><union><element/><string/></union></repeat>
</contents></declare>
</rule>
</dsd>
Notice in particular:
This DSD is more precise than the DTD and the XML Schema descriptions:
One can check that this is indeed a DSD by validating it with the meta-DSD.
Rules
- a closer look at the central DSD 2.0 construct
Example:
<if><element name="preparation"/>
<declare><contents>
<repeat><element name="step"/></repeat>
</contents></declare>
</if>
(In addition, there are unique and pointer for generalized IDs/IDREFs,
normalize for whitespace and case normalization, and default for default
attributes and contents.)
Boolean expressions
Boolean logic for expressing properties of elements:
Example:
<or>
<and>
<attribute name="a"><stringtype ref="b"/></attribute>
<ancestor><element name="c"><attribute name="d"/></element></ancestor>
</and>
<contents><sequence><element name="e"/><element name="f"/></sequence></contents>
</or>
means: "either the current element has an a attribute with a b value and also a c ancestor element
with a d attribute, or - if only looking at e and f elements - its contents consist of one e element
followed by one f element."
Boolean expressions are used both as conditions in conditional constraints and as requirements (in
require).
As with the other syntactic categories, boolean expressions can be defined for modularity.
Regular expressions
Both attributes and character data are described by regular expressions over
the Unicode alphabet.
Example:
...
<union>
<string value="*"/>
<stringtype ref="NUMBER"/>
</union>
...
<stringtype id="DIGITS">
<repeat min="1">
<char min="0" max="9"/>
</repeat>
</stringtype>
<stringtype id="NUMBER">
<sequence>
<stringtype ref="DIGITS"/>
<optional>
<sequence>
<string value="."/>
<stringtype ref="DIGITS"/>
</sequence>
</optional>
</sequence>
</stringtype>
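For readers more used to textual regular expressions, the DIGITS and NUMBER definitions above correspond roughly to the following. This is a sketch assuming Python's re module; the names AMOUNT and is_amount are ours, and the translation is by hand, not produced by any DSD tool.

```python
import re

DIGITS = r"[0-9]+"                            # repeat min="1" of char 0..9
NUMBER = DIGITS + r"(\." + DIGITS + r")?"     # DIGITS, optionally "." DIGITS
AMOUNT = re.compile(r"\*|" + NUMBER)          # union of "*" and NUMBER

def is_amount(value):
    """True if the whole string is "*" or matches NUMBER."""
    return AMOUNT.fullmatch(value) is not None

for candidate in ["*", "2", "0.25", "2.", ".5", "1/4"]:
    print(repr(candidate), is_amount(candidate))
```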
Libraries of common expressions can be made with the import feature described
later...
If more than one regular expression is declared for the contents of an element,
they are implicitly merged in an unordered manner.
Inclusion and extension
To enhance reusability, maintainability, and readability, DSD descriptions can
consist of several XML documents.
DSD 2.0 simply relies on XInclude for composing DSD fragments into complete
specifications. (However, full XPointer is not used - only simple URLs that denote
whole documents.)
This, combined with the notion of conditional rules, makes it easy to write
modular specifications, reuse and extend existing schemas, and create
families of related schemas.
85
Links to more information
www.w3.org/TR/xmlschema-0
XML Schema Part 0: Primer (a non-normative introduction)
www.w3.org/TR/xmlschema-1
XML Schema Part 1: Structures
www.w3.org/TR/xmlschema-2
XML Schema Part 2: Datatypes
www.brics.dk/DSD
the DSD 1.0 homepage
www.oasis-open.org/cover/schemas.html
Robin Cover's XML schema information
www.xml.com/pub/1999/12/dtd
XML.com article on schema languages
www.xml.com/pub/a/2000/11/29/schemas/part1.html
XML.com introduction to XML Schema
www.xfront.com/BestPracticesHomepage.html
"best practices" of XML Schema
www.ibiblio.org/xml/books/bible2/chapters/ch24.html
chapter from "XML Bible" on XML Schema
www.xmlhack.com/read.php?item=1097
"W3C XML Schema still has big problems", article on <?xmlhack?>
www.cobase.cs.ucla.edu/tech-docs/dongwon/ucla-200008.html
"Comparative Analysis of Six XML Schema Languages"
www.redrice.com/schemavalid/faq/xml-schema.html
XML Schema FAQ
xml.apache.org
Apache's Xerces parser and validator
86
XLink, XPointer, and XPath
- linking and addressing
Overview:
XLink:
XPath:
87
XPointer, Part II - how XPointer uses XPath:
● Context initialization - filling out the gap between XPath and XLink
● Extra XPointer features - generalizing XPath
Selected links:
● Tools
● Links to more information
88
XLink, XPointer, and XPath
- imagine a Web without links...
Three layers:
● XLink
❍ a generalization of the HTML link concept
❍ higher abstraction level (intended for general XML - not just
hypertext)
❍ more expressive power (multiple destinations, special behaviors,
linkbases, ...)
❍ uses XPointer to locate resources
● XPointer
❍ an extension of XPath suited for linking
● XPath
❍ a declarative language for locating nodes and fragments in XML
trees
❍ used in XPointer (for addressing), XSL (for pattern matching),
XML Schema (for uniqueness and scope descriptions), and
XQuery (for selection and iteration)
These technologies are standardized but not all widely implemented yet.
89
Problems with HTML links
The HTML link model:
Construction of a hyperlink:
● Link recognition:
● Limitations:
90
The XLink linking model
Basic XLink terminology:
One linking element defines a set of traversable arcs between some resources.
Third-party links can be used to construct shared link bases for browsers.
91
An example
A linking element defining a third-party "extended" link involving two remote resources:
❍ host language: elements and attributes not belonging to this namespace are ignored by
XLink processors
❍ all XLink information is defined in attributes (in host language elements)
92
Linking elements
- defining links
xlink:label attributes
❍ an arc element has an xlink:from and an xlink:to attribute
❍ the "arc" element defines a set of arcs: from each resource having
XPointer is described later - just think of XPointer expressions as URIs for now...
93
Behavior
- link semantics
Arcs can be annotated with abstract behavior information using the following
attributes:
Note: these notions of link behavior are rather abstract and do not make sense for
all applications.
94
Semantic attributes: describe the meaning of link resources and arcs
xlink:title
provide human readable descriptions (also available as
xlink:type="title" to allow markup)
xlink:role and xlink:arcrole
URI references to descriptions
95
Simple vs. Extended links
- for compatibility and simplicity
is equivalent to:
<mylink xlink:type="extended">
<myresource xlink:type="resource"
xlink:role="local"/>
<myresource xlink:type="locator"
xlink:role="remote" xlink:href="..."/>
<myarc xlink:type="arc"
xlink:from="local" xlink:to="remote" xlink:show="..." .../>
</mylink>
Many XLink properties (e.g. xlink:type and xlink:show) can conveniently be specified as
defaults in the schema definition!
96
XPointer: Why, what, and how?
● an extension of XPath which is used by XLink to locate remote link resources
● relative addressing: allows links to places with no anchors
● flexible and robust: XPointer/XPath expressions often survive changes in the target
document
● can point to substrings in character data and to whole tree fragments
Example of an XPointer:
http://www.foo.org/bar.xml#xpointer(article/section[position()<=5])
● URI: the whole string
● XPointer fragment identifier: xpointer(article/section[position()<=5])
● XPointer expression: article/section[position()<=5]
(points to the first five section elements in the article root element.)
In HTML, fragment identifiers may denote anchor IDs - XPointer generalizes that.
97
XPointer vs. XPath
XPointer is based upon XPath:
98
XPointer fragment identifiers
An XPointer fragment identifier (the substring to the right of # in the URI) is either
● the value of some ID attribute in the document (ID attributes are specified
by the schema),
Recently, the XPointer spec has been split into four (tiny) parts:
● framework
● xmlns() scheme
● element() scheme
● xpointer() scheme
Next: We will now look into XPath and then later describe what additional
features XPointer adds to XPath...
99
XPath: Location paths
XPath is a declarative language for:
The central construct is the location path, which is a sequence of location steps separated
by /, e.g.:
❍ each node resulting from evaluation of one step is used as context for
● a context node
● a context position and size (two integers)
● variable bindings, a function library, and a set of namespace declarations
Note: in the XPath data model, the XML document tree has a special root node above the
root element.
There is a strong analogy to directory paths (in UNIX). As an example, the directory path
/*/d/*.txt selects a set of files, and the location path /*/d/*[@ext="txt"] selects a
set of XML elements.
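The location path from the directory analogy can be tried out with the XPath support in the JDK (javax.xml.xpath). The following is a minimal sketch: the sample document, its element names, and the class LocationPathDemo are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LocationPathDemo {
    // Hypothetical document; element names and the ext attribute are invented.
    static final String XML =
        "<top>"
      + "<d><file ext='txt'>one</file><file ext='pdf'>two</file><note ext='txt'>three</note></d>"
      + "<e><file ext='txt'>four</file></e>"
      + "</top>";

    // Evaluates a location path on XML and joins the text of the selected nodes.
    static String select(String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(path, doc, XPathConstants.NODESET);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                if (i > 0) sb.append(",");
                sb.append(nodes.item(i).getTextContent());
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // step 1: any root element, step 2: its d children,
        // step 3: their element children having ext="txt"
        System.out.println(select("/*/d/*[@ext=\"txt\"]"));   // prints one,three
    }
}
```

The element with text "four" is not selected, because its parent is e and the second step only keeps d elements.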
100
Location steps
A single location step has the form
● The axis selects a rough set of candidate nodes (e.g. the child nodes of the context
node).
● The predicates (zero or more) cause a further, potentially more complex, filtration.
Only candidates for which the predicates evaluate to true are kept.
This structure of location steps makes implementation rather easy and efficient, since the
complex predicates are only evaluated on relatively few nodes.
101
Axes
Available axes:
Note that attributes and namespace declarations are considered special kinds
of nodes here.
102
Some of these axes assume a document ordering of the tree nodes. The
ordering is the left-to-right preorder traversal of the document tree - which is the
same as the order in the textual representation.
The resulting sets are ordered intuitively, either forward (in document order) or
reverse (reverse document order).
For instance, following is a forward axis, and ancestor is a reverse axis.
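The direction of an axis matters as soon as positions are used: position 1 on a reverse axis is the node closest to the context node. A small sketch using javax.xml.xpath illustrates this, assuming a made-up document and a hypothetical helper class AxisOrderDemo.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AxisOrderDemo {
    // Hypothetical document: a contains b, d, e; b contains c.
    static final String XML = "<a><b><c/></b><d/><e/></a>";

    // Evaluates a string-valued expression with the c element as context node.
    static String fromC(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node c = (Node) xpath.evaluate("//c", doc, XPathConstants.NODE);
            return xpath.evaluate(expr, c);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // ancestor is a reverse axis: position 1 is the nearest ancestor
        System.out.println(fromC("name(ancestor::*[1])"));       // prints b
        System.out.println(fromC("name(ancestor::*[last()])"));  // prints a
        // following is a forward axis: position 1 is the first node in document order
        System.out.println(fromC("name(following::*[1])"));      // prints d
        System.out.println(fromC("name(following::*[last()])")); // prints e
    }
}
```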
103
Node tests
Testing by node type:
Warning: There is a bug in the XPath 1.0 spec! Default namespaces are
required to be handled incorrectly, so, if using Namespaces together with XPath
(or XSLT), all elements must have an explicit prefix.
104
Predicates
- expressions coerced to type boolean
A predicate filters a node-set by evaluating the predicate expression on each node in the set with
Example:
child::section[position()<6] / descendant::cite[attribute::href="there"]
selects all cite elements with href="there" attributes in the first 5 sections of an article
document.
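The predicate example above can be checked with the JDK's XPath engine. In this sketch the article document is generated on the fly, with one matching and one non-matching cite in every section; the class PredicateDemo and its method are invented for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PredicateDemo {
    static final String EXPR =
        "child::section[position()<6]/descendant::cite[attribute::href=\"there\"]";

    // Builds an article with the given number of sections and counts the selected cites.
    static int countCites(int sections) {
        try {
            StringBuilder xml = new StringBuilder("<article>");
            for (int i = 0; i < sections; i++) {
                xml.append("<section><p><cite href='there'/><cite href='elsewhere'/></p></section>");
            }
            xml.append("</article>");
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml.toString())));
            // The article element is the context node for the relative location path
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(EXPR, doc.getDocumentElement(), XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countCites(7));  // prints 5: only the first five sections count
        System.out.println(countCites(3));  // prints 3
    }
}
```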
105
Expressions
Available types:
Coercion may occur at function arguments and when expressions are used as
predicates.
106
Core function library
Node-set functions:
last() returns the context size
position() returns the context position
count(node-set) number of nodes in node-set
name(node-set) string representation of first node in node-set
... ...
String functions:
Boolean functions:
Number functions:
107
Abbreviations
Syntactic sugar: convenient notation for common situations
Example:
.//@href
selects all href attributes of the context node and its descendants.
foo[3]
refers to the third foo child element of the context node (because 3 is coerced to
position()=3).
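That the abbreviations are pure syntactic sugar can be verified by evaluating an abbreviated expression and its expanded form side by side. The sketch below does this with javax.xml.xpath; the sample document and the class AbbreviationDemo are assumptions made for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AbbreviationDemo {
    // Hypothetical document with href attributes at several depths.
    static final String XML =
        "<r href='0'><foo href='1'/><foo><x href='2'/></foo><foo>third</foo><foo/></r>";

    // Evaluates an expression as a string, with the root element r as context node.
    static String eval(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // .//@href expands to self::node()/descendant-or-self::node()/attribute::href
        System.out.println(eval("count(.//@href)"));
        System.out.println(eval("count(self::node()/descendant-or-self::node()/attribute::href)"));
        // foo[3] expands to child::foo[position()=3]
        System.out.println(eval("string(foo[3])"));
        System.out.println(eval("string(child::foo[position()=3])"));
    }
}
```

Both counts are 3 (the href on r itself is included), and both string expressions yield "third".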
108
XPath visualization
Using Explorer 6 or an updated version of Explorer 5 it is easy to experiment with
XPath expressions.
This tool is implemented as an ordinary HTML page that makes heavy use of
XSLT and JavaScript.
109
XPath examples
The following XPath expressions point to sets of nodes in the recipe collection:
//ingredient[@name="flour"]/@amount
4
0.5
3
0.25
//ingredient[@name="stock"]/preparation/step[position()=2]/text()
When the liquid is relatively clear, add the carrots, celery, whole onion,
bay leaf, parsley, peppercorns and salt. Reduce the heat, cover and let
simmer at least 2 hours to make a hearty stock.
110
XPath 2.0
- currently a Working Draft, developed to capture the common subset of XSLT 2.0
and XQuery 1.0
● now using XML Schema primitive types instead of the four in 1.0
❍ new type operators: cast, treat, assert, instance of
111
XPointer: Context
initialization
An XPointer is basically an XPath expression occurring in a URI.
112
Extra XPointer features
XPointer provides more fine-grained addressing than XPath.
Example:
/descendant::text()/point()[position()=0]
selects the locations right before the first character of all character data nodes in
the document.
Example:
/section[1] / range-to(/section[3])
selects everything from the beginning of the first section to the end of the third.
113
Tools
Kinds of tools supporting XLink/XPointer:
● browsers
● parsers
● link bases
but XLink is still not widely implemented (and not even supported by XHTML 2.0).
www.labs.fujitsu.com/free/HyBrick/en
the HyBrick browser
www.stepuk.com/products/prod_X2X.asp
the X2X link base
114
Links to more information
www.w3.org/TR/xlink
W3C's XLink Recommendation
www.w3.org/TR/xptr
W3C's XPointer Candidate Recommendation
www.w3.org/TR/xptr-framework/
W3C's XPointer Framework (Working Draft)
www.w3.org/TR/xptr-xmlns/
W3C's XPointer xmlns() Scheme (Working Draft)
www.w3.org/TR/xptr-element/
W3C's XPointer element() Scheme (Working Draft)
www.w3.org/TR/xptr-xpointer/
W3C's XPointer xpointer() Scheme (Working Draft)
www.w3.org/TR/xpath
W3C's XPath 1.0 Recommendation
www.w3.org/TR/xpath20/
W3C's XPath 2.0 (Working Draft)
www.stg.brown.edu/~sjd/xlinkintro.html
a brief introduction to XML linking
www.ibiblio.org/xml/books/bible2/chapters/ch19.html
a chapter from "The XML Bible" on XLink
www.ibiblio.org/xml/books/bible2/chapters/ch20.html
a chapter from "The XML Bible" on XPointer (and XPath)
115
XSL and XSLT
- stylesheets and document transformation
❍ Recursive processing
❍ Conditional processing
❍ Sorting
❍ Numbering
❍ Keys
116
XSLT - XSL Transformations
XSL (eXtensible Stylesheet Language) consists of two parts:
● a stylesheet separates contents and logical structure from presentation (as with CSS)
● an XSLT stylesheet is an XML document defining a transformation from one class of XML documents
into another
● XSLT is not intended as a completely general-purpose XML transformation language - it is designed for
XSL Formatting Objects as transformation target language - nevertheless: XSLT is generally useful
● XSL-FO is an XML language for specifying formatting in a more low-level and detailed way than possible
with HTML+CSS
An XSLT stylesheet:
● is declarative and uses pattern matching and templates to specify the transformation
● is vastly more expressive than a CSS stylesheet
● may perform arbitrary computations (it is Turing complete!)
Tools:
● XSLT transformation can be done either on the client (e.g. Explorer 5), or on the server (e.g. Apache
Xalan) - either as pre-processing or on-the-fly
● in the future, Web browsers only need to understand XSLT and XSL-FO (rendering HTML/XHTML can
be done using a standard stylesheet)
● today, the target language is typically XHTML which is understood by current browsers
● XSLT is widely implemented - XSL-FO is not yet...
117
Processing model
An XSLT stylesheet consists of a number of template rules:
118
Structure of a stylesheet
An XSLT stylesheet is itself an XML document:
Newer browsers contain an XSLT processor. (Older versions of Explorer 5 require an update.)
119
A tiny example
The following XSLT stylesheet transforms XML business cards into XHTML:
<xsl:template match="name">
<xsl:value-of select="text()"/>
</xsl:template>
<xsl:template match="title">
<xsl:value-of select="text()"/>
</xsl:template>
<xsl:template match="email">
<xsl:value-of select="text()"/>
</xsl:template>
<xsl:template match="phone">
<xsl:value-of select="text()"/>
</xsl:template>
</xsl:stylesheet>
120
<card>
<name>John Doe</name>
<title>CEO, Widget Inc.</title>
<email>john.doe@widget.com</email>
<phone>(202) 555-1414</phone>
<logo url="widget.gif"/>
</card>
looks like:
John Doe
CEO, Widget Inc.
john.doe@widget.com
Phone: (202) 555-1414
121
A CSS example
The following CSS stylesheet also makes business cards visible in the browser:
<card>
<name>John Doe</name>
<title>CEO, Widget Inc.</title>
<email>john.doe@widget.com</email>
<phone>(202) 555-1414</phone>
<logo url="widget.gif"/>
</card>
looks like:
John Doe
CEO, Widget Inc.
john.doe@widget.com
(202) 555-1414
The CSS2 language has some XML extensions, but is not supported by existing browsers.
122
Patterns
Patterns are simple XPath expressions evaluating to node-sets.
the node is a member of the result of evaluating the pattern with respect to
some context.
match="section/subsection | appendix//subsection"
123
Templates
There are many different kinds of template constructs:
124
Literal result fragments
A literal result fragment is:
● <xsl:text ...> ... </...> (as raw text, but with white-space and
character escaping control)
Since literal fragments are part of the stylesheet XML document, only well-formed
XML will be generated.
Example:
<xsl:template match="...">
this text is written directly to output
</xsl:template>
125
Recursive processing
Recursive processing instructions:
● <xsl:call-template name="..."/>
invoke template by name (where xsl:template has name="..."
attribute)
● <xsl:copy-of select="..."/>
copy selected nodes to output
Example:
<xsl:template match="article">
<h1><xsl:apply-templates select="title"/></h1>
</xsl:template>
126
Computed result fragments
Result fragments can be computed using XPath expressions:
● <xsl:value-of select="..."/>
construct character data or attribute value (expression converted to string)
Example:
<xsl:template match="section">
<xsl:element name="sec{@level}">
<xsl:attribute name="kind">
<xsl:value-of select="kind"/>
</xsl:attribute>
</xsl:element>
</xsl:template>
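To see what the template above generates, it can be run with the XSLT 1.0 processor that ships with the JDK (javax.xml.transform). This is a sketch only: the surrounding stylesheet element, the xsl:output setting, the input document, and the class ComputedFragmentDemo are assumptions added for the illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ComputedFragmentDemo {
    // The template from the example, wrapped in a minimal stylesheet.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='section'>"
      + "<xsl:element name='sec{@level}'>"
      + "<xsl:attribute name='kind'>"
      + "<xsl:value-of select='kind'/>"
      + "</xsl:attribute>"
      + "</xsl:element>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The element name is computed from @level, the attribute value from the kind child
        System.out.println(transform("<section level='2'><kind>intro</kind></section>"));
    }
}
```

For the input shown in main, the result is an empty sec2 element with the attribute kind="intro".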
127
Conditional processing
Processing can be conditional:
● <xsl:choose>
<xsl:when test="expression"> ... </...>
...
<xsl:otherwise> ... </...>
</...>
test conditions in turn, apply template for the first that is true
Example:
<xsl:template match="nutrition">
<xsl:if test="@alcohol">
<td align="right"><xsl:value-of select="@alcohol"/>%</td>
</xsl:if>
</xsl:template>
128
Sorting
Sorting chooses an order for xsl:apply-templates and xsl:for-each
(default: document order):
● order="ascending/descending"
● lang="..."
● data-type="text/number"
● case-order="upper-first/lower-first"
Example:
<xsl:template match="personlist">
<xsl:apply-templates select="person">
<xsl:sort select="name/family"/>
<xsl:sort select="name/given"/>
</xsl:apply-templates>
</xsl:template>
This template rule processes a list of persons, sorted with family name as primary
key and given name as secondary key.
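The sorting rule can be tried out with the XSLT processor in the JDK. In the sketch below, the personlist template is the one from the example; the person template, the text output method, the sample input, and the class SortDemo are assumptions added so that the effect becomes visible.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SortDemo {
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='personlist'>"
      + "<xsl:apply-templates select='person'>"
      + "<xsl:sort select='name/family'/>"   // primary key
      + "<xsl:sort select='name/given'/>"    // secondary key
      + "</xsl:apply-templates>"
      + "</xsl:template>"
      // Hypothetical rule that prints "given family;" for each person
      + "<xsl:template match='person'>"
      + "<xsl:value-of select='name/given'/><xsl:text> </xsl:text>"
      + "<xsl:value-of select='name/family'/><xsl:text>;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String person(String given, String family) {
        return "<person><name><given>" + given + "</given><family>" + family
             + "</family></name></person>";
    }

    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<personlist>" + person("John", "Doe") + person("Zoe", "Adams")
                   + person("Anna", "Doe") + "</personlist>";
        System.out.println(transform(xml));  // prints Zoe Adams;Anna Doe;John Doe;
    }
}
```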
129
Numbering
- for automatic numbering of sections, item lists, footnotes, etc.
level="..." any/single/multiple
count="..." select what to count
from="..." select where to start counting
lang="..."
letter-value="..."
grouping-separator="..."
grouping-size="..."/>
siblings
(example use: numbering ordered list items)
❍ level="multiple": generates whole list of numbers
Example:
<xsl:template match="footnote">
(<xsl:number level="any" count="footnote" from="chapter" format="1"/>)
</xsl:template>
130
Variables and parameters
- for reusing results of computations and parameterizing templates and whole
stylesheets
Declaration:
Use:
● $name
returns XPath value in expressions, e.g. attribute value templates
● xsl:with-param
passes parameters in xsl:call-template and xsl:apply-
templates
Example:
131
<xsl:template match="foo">
<first>
<xsl:value-of select="$X"/>
<xsl:copy-of select="$Y"/>
</first>
<second>
<xsl:value-of select="$X"/>
<xsl:copy-of select="$Y"/>
</second>
</xsl:template>
132
Keys
- advanced node IDs for automatic construction of links
A key is a triple (node, name, value) associating a name-value pair to a tree node.
Example:
<xsl:template match="section">
<h1>
<a name="{generate-id()}">
<xsl:number format="1. "/>
<xsl:apply-templates select="title"/>
</a>
</h1>
<xsl:apply-templates select="body"/>
</xsl:template>
<xsl:template match="ref[@section]">
<a href="#{generate-id(key('mykeys',@section))}">
<xsl:for-each select="key('mykeys',@section)">
Section <xsl:number level="single" count="section" format="1"/>
</xsl:for-each>
133
</a>
</xsl:template>
134
Other issues
Things not covered here:
135
XSL Formatting Objects
● XSL-FO provides exact and detailed layout control
● it resembles e.g. LaTeX, but is XML based
● recall that HTML/XHTML has different goals: the exact look is decided by the
browser - not by the author
A small example:
<?xml version="1.0"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:layout-master-set>
<fo:simple-page-master master-name="my-page">
<fo:region-body margin="1in"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="my-page">
<fo:flow flow-name="xsl-region-body">
<fo:block font-family="Times" font-size="14pt">
<fo:inline font-weight="bold">Hello</fo:inline>, world!
</fo:block>
</fo:flow>
</fo:page-sequence>
</fo:root>
XSL-FO is not supported by existing browsers, but can be tried out using FOP, which
translates it into PDF.
136
Examples
The following XSLT stylesheet produces an XHTML version of the recipe XML example and illustrates
many XSLT features:
<xsl:template match="collection">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title><xsl:apply-templates select="description"/></title>
<link href="../style.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<table border="1">
<xsl:apply-templates select="recipe"/>
</table>
</body>
</html>
</xsl:template>
<xsl:template match="description">
<xsl:value-of select="text()"/>
</xsl:template>
<xsl:template match="recipe">
<tr>
<td>
<h1>
<xsl:apply-templates select="title"/>
</h1>
<ul>
<xsl:apply-templates select="ingredient"/>
</ul>
<xsl:apply-templates select="preparation"/>
<xsl:apply-templates select="comment"/>
<xsl:apply-templates select="nutrition"/>
</td>
</tr>
</xsl:template>
<xsl:template match="ingredient">
<xsl:choose>
<xsl:when test="@amount">
<li>
<xsl:if test="@amount!='*'">
<xsl:value-of select="@amount"/>
<xsl:text> </xsl:text>
<xsl:if test="@unit">
<xsl:value-of select="@unit"/>
<xsl:if test="number(@amount)>number(1)">
<xsl:text>s</xsl:text>
</xsl:if>
<xsl:text> of </xsl:text>
137
</xsl:if>
<xsl:text> </xsl:text>
</xsl:if>
<xsl:value-of select="@name"/>
</li>
</xsl:when>
<xsl:otherwise>
<li><xsl:value-of select="@name"/></li>
<ul>
<xsl:apply-templates select="ingredient"/>
</ul>
<xsl:apply-templates select="preparation"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="preparation">
<ol><xsl:apply-templates select="step"/></ol>
</xsl:template>
<xsl:template match="step">
<li><xsl:value-of select="text()|node()"/></li>
</xsl:template>
<xsl:template match="comment">
<ul>
<li type="square"><xsl:value-of select="text()|node()"/></li>
</ul>
</xsl:template>
<xsl:template match="nutrition">
<table border="2">
<tr>
<th>Calories</th><th>Fat</th><th>Carbohydrates</th><th>Protein</th>
<xsl:if test="@alcohol">
<th>Alcohol</th>
</xsl:if>
</tr>
<tr>
<td align="right"><xsl:value-of select="@calories"/></td>
<td align="right"><xsl:value-of select="@fat"/>%</td>
<td align="right"><xsl:value-of select="@carbohydrates"/>%</td>
<td align="right"><xsl:value-of select="@protein"/>%</td>
<xsl:if test="@alcohol">
<td align="right"><xsl:value-of select="@alcohol"/>%</td>
</xsl:if>
</tr>
</table>
</xsl:template>
</xsl:stylesheet>
138
Different views
The following XSLT stylesheet:
<xsl:template match="recipe">
<dish name="{title/text()}"
calories="{nutrition/@calories}"
fat="{nutrition/@fat}"
carbohydrates="{nutrition/@carbohydrates}"
protein="{nutrition/@protein}"
alcohol="{number(concat(0,nutrition/@alcohol))}"/>
</xsl:template>
</xsl:stylesheet>
<nutrition>
<dish alcohol="0"
protein="32"
carbohydrates="45"
fat="23" calories="1167"
name="Beef Parmesan with Garlic Angel Hair Pasta"/>
<dish alcohol="0"
protein="18"
carbohydrates="64"
fat="18"
calories="349"
name="Ricotta Pie"/>
<dish alcohol="0"
protein="29"
carbohydrates="59"
fat="12"
calories="532"
name="Linguine Pescadoro"/>
<dish alcohol="2"
protein="4"
carbohydrates="45"
fat="49"
calories="612"
name="Zuppa Inglese"/>
<dish alcohol="0"
protein="39"
carbohydrates="28"
fat="33"
calories="8892"
name="Cailles en Sarcophages"/>
</nutrition>
139
which validates according to the DSD2 schema:
<dsd root="nutrition">
<if><element name="nutrition"/>
<allow><element name="dish"/></allow>
</if>
<if><element name="dish"/>
<allow>
<attribute name="name"/>
<attribute name="calories"/>
<attribute name="fat"/>
<attribute name="carbohydrates"/>
<attribute name="protein"/>
<attribute name="alcohol"/>
</allow>
<require><attribute name="name"/></require>
</if>
</dsd>
<xsl:template match="nutrition">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link href="../style.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<table border="1">
<tr>
<th>Dish</th>
<th>Calories</th>
<th>Fat</th>
<th>Carbohydrates</th>
<th>Protein</th>
</tr>
<xsl:apply-templates select="dish"/>
</table>
</body>
</html>
</xsl:template>
<xsl:template match="dish">
<tr>
<td><xsl:value-of select="@name"/></td>
<td><xsl:value-of select="@calories"/></td>
<td><xsl:value-of select="@fat"/>%</td>
<td><xsl:value-of select="@carbohydrates"/>%</td>
<td><xsl:value-of select="@protein"/>%</td>
</tr>
</xsl:template>
140
</xsl:stylesheet>
141
XSLT 2.0
- currently a Working Draft
● using XPath 2.0 (which implies: using sequences and XML Schema
primitive types)
● the notion of "result tree fragments" is eliminated (now using sequences)
● multiple output documents (result-document)
● functions
● XML Base support
● XHTML output method
142
Links to more information
www.w3.org/Style/XSL/
W3C's XSL homepage, contains lots of links
www.w3.org/TR/xslt
the XSLT 1.0 specification
www.w3.org/TR/xslt20/
working draft for XSLT 2.0
www.w3.org/TR/xsl
the XSL 1.0 (defines the Formatting Objects XML language)
www.mulberrytech.com/xsl/xsl-list/
XSL-List - mailing list
www.ibiblio.org/xml/books/bible2/chapters/ch17.html
a chapter from "The XML Bible" on XSL Transformations
www.ibiblio.org/xml/books/bible2/chapters/ch18.html
a chapter from "The XML Bible" on XSL Formatting Objects
nwalsh.com/docs/tutorials/xsl/
an XSL tutorial by Paul Grosso and Norman Walsh
www.dpawson.co.uk/xsl/sect2/nono.html
"Things XSLT can't do", collected by Dave Pawson
www.alphaworks.ibm.com/tech/LotusXSL
LotusXSL, a Java XSLT implementation from IBM alphaWorks
saxon.sourceforge.net
SAXON, another Java implementation
www.jclark.com/xml/xt.html
XT, an early Java implementation by the editor of the XSLT spec
xml.apache.org/fop
an XSL Formatting Objects to PDF converter
143
XQuery
- information extraction and transformation
❍ Element constructors
❍ FLWR expressions
❍ List expressions
❍ Conditional expressions
❍ Quantified expressions
❍ Datatype expressions
144
Queries on XML documents
XML documents generalize relational data in a very straightforward manner:
Here, we see:
relations (tables)
tuples (records)
attributes (entries)
The database community has been looking for a richer data model than relations.
Hierarchical, object-oriented, and multi-dimensional databases have emerged, but none
has reached consensus.
145
Usage scenarios
XML querying is relevant for:
● human-readable documents
to retrieve individual documents, to provide dynamic indexes, to perform
context-sensitive searching, and to generate new documents
● data-oriented documents
to query (virtual) XML representations of databases, to transform data into
new XML representations, and to integrate data from multiple
heterogeneous data sources
● mixed-model documents
to perform queries on documents with embedded data, such as catalogs,
patient health records, employment records, or business analysis
documents
146
Query language
requirements
The W3C Query Working Group has identified many technical requirements:
147
The XQuery language
The query language developed by W3C is called XQuery and is currently at the
level of a Working Draft.
● XML-QL
● YATL
● Lorel
● Quilt
Only a prototype implementation is available so far, and many details about the
language may still change.
148
XQuery concepts
A query in XQuery is an expression that:
● path expressions
● element constructors
● FLWR ("flower") expressions
● list expressions
● conditional expressions
● quantified expressions
● datatype expressions
● namespaces
● variables
● functions
● date and time
● context item (current node or atomic value)
● context position (in the sequence being processed)
● context size (of the sequence being processed)
149
Path expressions
The simplest kind of query is just an XPath expression.
● the result is all figures with caption Tree Frogs in the second chapter of the document
zoo.xml
● the result is given as a list of XML fragments, each rooted with a caption element
● the order of the fragments respects the document order (order matters! - as opposed to SQL)
The initial context for the path expression is given by document("zoo.xml") (similarly to
XPointer).
An XQuery specific extension of XPath allows location steps to follow a new IDREF axis:
document("zoo.xml")//chapter[title = "Frogs"]//figref/@refid=>fig/caption
● the result is all captions in figures referenced in the chapter with title Frogs
● the => operator follows an IDREF attribute to its unique destination
150
Element constructors
An XQuery expression may construct new XML elements:
<employee empid="12345">
<name>John Doe</name>
<job>XML specialist</job>
</employee>
<employee empid={$id}>
<name>{$name}</name>
{$job}
</employee>
Here the variables $id, $name, and $job must be bound to appropriate
fragments.
151
FLWR expressions
The main engine of XQuery is the FLWR expression:
● FOR-LET-WHERE-RETURN
● pronounced "flower"
● generalizes SELECT-FROM-HAVING-WHERE from SQL
FOR $p IN document("bib.xml")//publisher
LET $b := document("bib.xml")//book[publisher = $p]
WHERE count($b) > 100
RETURN $p
The combined result is in this case an ordered list of publishers that publish
more than 100 books.
FOR $p IN distinct(document("bib.xml")//publisher)
LET $b := document("bib.xml")//book[publisher = $p]
WHERE count($b) > 100
RETURN $p
FOR $x in /library/book
152
LET $x := /library/book
FOR $p IN document("www.irs.gov/taxpayers.xml")//person
FOR $n IN document("neighbors.xml")//neighbor[ssn = $p/ssn]
RETURN
<person>
<ssn> { $p/ssn } </ssn>
{ $n/name }
<income> { $p/income } </income>
</person>
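Since XQuery implementations are still scarce, it may help to see the FOR-LET-WHERE-RETURN steps of the publisher query spelled out in Java on top of javax.xml.xpath. This is merely an imitation of the semantics, not an XQuery processor: the class FlwrDemo, the method bigPublishers, and the threshold parameter (100 in the query) are invented for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FlwrDemo {
    // Returns the distinct publisher names having more than threshold books.
    static List<String> bigPublishers(String xml, int threshold) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // FOR $p IN distinct(//publisher): distinct names, in document order
            NodeList all = (NodeList) xpath.evaluate("//publisher", doc, XPathConstants.NODESET);
            Set<String> distinct = new LinkedHashSet<>();
            for (int i = 0; i < all.getLength(); i++) {
                distinct.add(all.item(i).getTextContent());
            }

            List<String> result = new ArrayList<>();
            for (String p : distinct) {
                // LET $b := //book[publisher = $p]; $p is bound through a variable resolver
                xpath.setXPathVariableResolver(name -> p);
                NodeList b = (NodeList) xpath.evaluate(
                    "//book[publisher = $p]", doc, XPathConstants.NODESET);
                // WHERE count($b) > threshold   RETURN $p
                if (b.getLength() > threshold) {
                    result.add(p);
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String bib = "<bib>"
            + "<book><publisher>A</publisher></book><book><publisher>B</publisher></book>"
            + "<book><publisher>A</publisher></book><book><publisher>A</publisher></book>"
            + "</bib>";
        System.out.println(bigPublishers(bib, 2));  // prints [A]
    }
}
```

FOR iterates and produces one binding per item, whereas LET binds the whole list at once; WHERE then filters the bindings before RETURN builds the result.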
153
List expressions
XQuery expressions manipulate lists of values, for which many operators are
supported.
For example, the avg(...) function computes the average of a list of integers.
The following query lists each publisher and the average price of their books:
FOR $p IN distinct(document("bib.xml")//publisher)
LET $a := avg(document("bib.xml")//book[publisher = $p]/price)
RETURN
<publisher>
<name>{ $p/text() }</name>
<avgprice>{ $a }</avgprice>
</publisher>
Lists can be sorted, as in the following where books costing more than 100$ are listed
in sorted order:
154
Conditional expressions
XQuery supports a general IF-THEN-ELSE construction.
FOR $h IN document("library.xml")//holding
RETURN
<holding>
{ $h/title,
IF ($h/@type = "Journal")
THEN $h/editor
ELSE $h/author
}
</holding>
extracts from the holdings of a library the titles and either editors or authors.
155
Quantified expressions
XQuery allows quantified expressions, which decide properties for all elements in
a list:
● SOME-IN-SATISFIES
● EVERY-IN-SATISFIES
The following example finds the titles of all books which mention both sailing and
windsurfing in the same paragraph:
FOR $b IN document("bib.xml")//book
WHERE SOME $p IN $b//paragraph SATISFIES
(contains($p,"sailing") AND contains($p,"windsurfing"))
RETURN $b/title
The next example finds the titles of all books which mention sailing in every
paragraph:
FOR $b IN document("bib.xml")//book
WHERE EVERY $p IN $b//paragraph SATISFIES
contains($p,"sailing")
RETURN $b/title
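The two quantifiers correspond closely to anyMatch and allMatch on Java streams. The following sketch imitates both queries with DOM and streams; it is not XQuery, it assumes that every book has a title, and the class QuantifierDemo with its methods is invented for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class QuantifierDemo {
    // Returns the titles of all books whose paragraph texts satisfy the given test.
    static List<String> titles(String xml, Predicate<List<String>> test) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList books = doc.getElementsByTagName("book");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                NodeList ps = book.getElementsByTagName("paragraph");  // $b//paragraph
                List<String> paragraphs = new ArrayList<>();
                for (int j = 0; j < ps.getLength(); j++) {
                    paragraphs.add(ps.item(j).getTextContent());
                }
                if (test.test(paragraphs)) {
                    result.add(book.getElementsByTagName("title").item(0).getTextContent());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // SOME $p IN $b//paragraph SATISFIES (contains sailing AND contains windsurfing)
    static List<String> someSailingAndWindsurfing(String xml) {
        return titles(xml, ps -> ps.stream()
            .anyMatch(p -> p.contains("sailing") && p.contains("windsurfing")));
    }

    // EVERY $p IN $b//paragraph SATISFIES contains sailing
    static List<String> everySailing(String xml) {
        return titles(xml, ps -> ps.stream().allMatch(p -> p.contains("sailing")));
    }
}
```

As with EVERY in XQuery, allMatch is true for an empty list, so a book without paragraphs is included in the second result.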
156
Datatype expressions
XQuery supports all datatypes from XML Schema, both primitive and complex
types.
157
Other issues
Things not covered here:
● the XQuery language definition has 146 outstanding issues - stay tuned for
changes
158
Examples
The following XQuery expressions extract information from the recipe collection:
FOR $t IN document("recipes.xml")//title
RETURN $t
<?xml version="1.0"?>
<xql:result xmlns:xql="http://metalab.unc.edu/xql/">
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>
<title>Ricotta Pie</title>
<title>Linguine Pescadoro</title>
<title>Zuppa Inglese</title>
<title>Cailles en Sarcophages</title>
</xql:result>
<floury>
{ FOR $r IN document("recipes.xml")//recipe[.//ingredient[@name="flour"]]
RETURN <dish>{$r/title/text()}</dish>
}
</floury>
<?xml version="1.0"?>
<floury xmlns:xql="http://metalab.unc.edu/xql/">
<dish>Ricotta Pie</dish>
<dish>Zuppa Inglese</dish>
<dish>Cailles en Sarcophages</dish>
</floury>
FOR $i IN distinct(document("recipes.xml")//ingredient/@name)
RETURN <ingredient name={$i}>
{ FOR $r IN document("recipes.xml")//recipe
WHERE $r//ingredient[@name=$i]
RETURN $r/title
}
</ingredient>
159
<?xml version="1.0"?>
<xql:result xmlns:xql="http://metalab.unc.edu/xql/">
<ingredient name="beef cube steak">
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>
</ingredient>
<ingredient name="onion, sliced into thin rings">
<title>Beef Parmesan with Garlic Angel Hair Pasta</title>
</ingredient>
...
</xql:result>
160
Links to more information
www.w3.org/TR/xquery
XQuery 1.0 Working Draft
www.w3.org/TR/xmlquery-req
W3C XML Query Requirements
www.w3.org/TR/xmlquery-use-cases
XML Query Use Cases
www.w3.org/TR/query-semantics
XQuery 1.0 Formal Semantics
www.softwareag.com/developer/quip
XQuery prototype implementation
161
DOM, SAX, and JDOM
- XML support in programming languages
162
XML and programming
XSLT, XPath and XQuery provide tools for specialized tasks.
DOM and SAX are corresponding APIs that are language independent and
supported by numerous languages.
163
The DOM API
DOM is the official W3C proposal.
It views an XML tree as a data structure, similar to the DOM from JavaScript.
164
A simple DOM example
The following Java program uses DOM to read the recipe collection and cut it down
to the first recipe:
import java.io.*;
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.*;
165
      len = children.getLength();
      for (int i=0; i<len; i++)
        print(children.item(i), out);
      out.print("</" + node.getNodeName() + ">");
      break;
    case Node.ENTITY_REFERENCE_NODE:
      out.print("&" + node.getNodeName() + ";");
      break;
    case Node.CDATA_SECTION_NODE:
      out.print("<![CDATA[" + node.getNodeValue() + "]]>");
      break;
    case Node.TEXT_NODE:
      out.print(escapeXML(node.getNodeValue()));
      break;
    case Node.PROCESSING_INSTRUCTION_NODE:
      out.print("<?" + node.getNodeName());
      String data = node.getNodeValue();
      if (data!=null && data.length()>0)
        out.print(" " + data);
      out.println("?>");
      break;
  }
}
Note that:
166
● we need to make our own print method
● when using DOM in Java, one actually uses the Java language binding of the
language-independent DOM specification (the org.w3c.dom package)
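As a rough sketch of the same task using only the JDK, the program below parses a
small inline document with the JAXP DocumentBuilder, rather than the Xerces
DOMParser class imported in the original program, and removes all but the first
recipe. The sample document, the class name DomFirstRecipe, and its helper methods
are assumptions made for illustration; the real recipes.xml is not reproduced here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomFirstRecipe {

    // Hypothetical stand-in for recipes.xml (assumption, not the real file).
    static final String SAMPLE =
        "<collection>"
      + "<description>Some recipes</description>"
      + "<recipe><title>First</title></recipe>"
      + "<recipe><title>Second</title></recipe>"
      + "</collection>";

    // Parses XML text into a DOM tree using the JAXP factory API.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Removes every <recipe> child of the root element except the first one.
    static void keepFirstRecipe(Document doc) {
        Element root = doc.getDocumentElement();
        boolean seenFirst = false;
        Node child = root.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // fetch before a possible removal
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && child.getNodeName().equals("recipe")) {
                if (seenFirst) root.removeChild(child);
                seenFirst = true;
            }
            child = next;
        }
    }

    // Counts the <recipe> elements anywhere in the document.
    static int countRecipes(Document doc) {
        return doc.getElementsByTagName("recipe").getLength();
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        keepFirstRecipe(doc);
        System.out.println("recipes left: " + countRecipes(doc));
    }
}
```

The next sibling is fetched before a node is removed, because a removed node no
longer has siblings to continue from.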
167
The SAX API
SAX (Simple API for XML) started as a grassroots movement, but has since gained
official standing.
As the parser scans the XML file from start to end, each event (such as the start
of an element or a run of character data) invokes a corresponding callback method
that the programmer writes.
An XML tree can be built in response to the events, but constructing a data
structure is not required.
This is sometimes much more efficient, if the document can be piped through the
application.
168
A simple SAX example
The following Java program reads the recipe collection and outputs the total amount of flour being used (assuming
the unit is always cup):
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import org.apache.xerces.parsers.SAXParser;
float amount = 0;
7.75
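A minimal sketch of such a callback-based program, using the SAX parser bundled
with the JDK (obtained through JAXP) instead of the Xerces SAXParser class. The
attribute names "name" and "amount" on the ingredient element, the inline sample
data, and the class name FlourCounter are assumptions for illustration, so the
total differs from the figure printed for the real recipe collection.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class FlourCounter extends DefaultHandler {

    // Running total, updated by the callback below.
    float amount = 0;

    // Called by the parser once for every start tag it encounters.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        if (qName.equals("ingredient") && "flour".equals(atts.getValue("name"))) {
            String a = atts.getValue("amount"); // assumed attribute name
            if (a != null) amount += Float.parseFloat(a);
        }
    }

    // Streams the XML text through a fresh handler and returns the total.
    static float totalFlour(String xml) {
        FlourCounter handler = new FlourCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.amount;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not the real recipes.xml.
        String sample =
            "<collection>"
          + "<recipe><ingredient name='flour' amount='0.5' unit='cup'/>"
          + "<ingredient name='sugar' amount='1' unit='cup'/></recipe>"
          + "<recipe><ingredient name='flour' amount='2.25' unit='cup'/></recipe>"
          + "</collection>";
        System.out.println(totalFlour(sample));
    }
}
```

No tree is built at any point: the handler keeps only the running total, which is
why this style suits documents that are too large to hold in memory.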
169
SAX events
The following Java program traces all SAX events generated by parsing the recipe collection:
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import org.apache.xerces.parsers.SAXParser;
int indent;
void printIndent() {
  for (int i=0; i<indent; i++) System.out.print("-");
}
170
public void characters(char[] ch, int start, int length){
  printIndent();
  System.out.println("character data, length " + length);
}
start document
processing instruction: dsd
starting element: collection
-character data, length 3
-starting element: description
--character data, length 47
-end element: description
-character data, length 3
-starting element: recipe
--character data, length 5
...
-end element: recipe
-character data, length 1
end element: collection
end document
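A trace in this format can be produced by a handler that keeps a depth counter,
incremented on each start tag and decremented on each end tag. The sketch below
is one possible shape for such a handler, again using the JDK's bundled SAX
parser; the class name SaxTracer and the choice to collect the trace in a string
rather than print it directly are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxTracer extends DefaultHandler {

    private final StringBuilder log = new StringBuilder();
    private int indent = 0; // current element nesting depth

    // Appends one trace line, prefixed with one "-" per nesting level.
    private void line(String text) {
        for (int i = 0; i < indent; i++) log.append('-');
        log.append(text).append('\n');
    }

    @Override public void startDocument() { line("start document"); }

    @Override public void endDocument() { line("end document"); }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        line("starting element: " + qName);
        indent++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        indent--;
        line("end element: " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        line("character data, length " + length);
    }

    // Parses the XML text and returns the collected event trace.
    static String trace(String xml) {
        SaxTracer tracer = new SaxTracer();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), tracer);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return tracer.log.toString();
    }

    public static void main(String[] args) {
        System.out.print(trace("<a><b>hi</b></a>"));
    }
}
```

Note that a parser is free to deliver one run of text in several characters
calls, so the number of "character data" lines may vary between parsers.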
171
The JDOM API
DOM is too complicated to suit many programmers.
Since it is a language-independent API, it does not use special Java features - for
example, the existing collection classes are ignored.
JDOM is a small (124K) library because it runs on top of an existing DOM or SAX
parser rather than parsing XML itself
- a full XML parser is complex, dealing with encodings, namespaces, and entities
172
A simple JDOM example
The following Java program uses JDOM to read the recipe collection and cut it down to the first recipe:
import java.io.*;
import org.jdom.*;
import org.jdom.input.*;
import org.jdom.output.*;
173
The JDOM packages
JDOM contains five Java packages:
174
The JDOM tree model
JDOM has a class for every kind of XML tree node (in the general sense):
● Document
● Element
● Attribute
● Namespace
● Text
● CDATA
● Comment
● DocType
● EntityRef
● ProcessingInstruction
Each node has a parent pointer. There are no sibling pointers but several
methods for accessing the child nodes.
175
JDOM input and output
Parsing XML documents into JDOM:
176
JAXP
JAXP (Java API for XML Processing) is Sun's official API for XML processing.
It supports DOM, SAX, and XSLT (which may be run inside Java applications) -
and also a number of other Java/XML technologies.
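To illustrate running XSLT inside a Java application, the following sketch applies
a stylesheet to a document through the javax.xml.transform part of JAXP. The
stylesheet, the sample input, and the class name JaxpTransform are hypothetical
and chosen only to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class JaxpTransform {

    // Hypothetical stylesheet: emits each recipe title on its own line.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//recipe/title'>"
      + "<xsl:value-of select='.'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given stylesheet to the given document, both passed as text.
    static String transform(String xml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not the real recipes.xml.
        String sample =
            "<collection>"
          + "<recipe><title>First</title></recipe>"
          + "<recipe><title>Second</title></recipe>"
          + "</collection>";
        System.out.print(transform(sample, STYLESHEET));
    }
}
```

The factory classes hide which XSLT processor is actually used, so the same code
can run against any JAXP-compliant implementation on the classpath.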
177
A Business Card editor
A typical Java application is a domain-specific XML editor:
<cards>
  <card>
    <name>John Doe</name>
    <title>CEO, Widget Inc.</title>
    <email>john.doe@widget.com</email>
    <phone>(202) 456-1414</phone>
    <logo url="widget.gif" />
  </card>
  <card>
    <name>Michael Schwartzbach</name>
    <title>Associate Professor</title>
    <email>mis@brics.dk</email>
    <phone>+45 8610 8790</phone>
    <logo url="http://www.brics.dk/~mis/portrait.gif" />
  </card>
  <card>
    <name>Anders Møller</name>
    <title>Research Assistant Professor</title>
    <email>amoeller@brics.dk</email>
    <phone>+45 8942 3475</phone>
    <logo url="http://www.brics.dk/~amoeller/am.jpg"/>
  </card>
</cards>
class Card {
  public String name, title, email, phone, logo;
  public Card(String name, String title, String email, String phone, String logo) {
    this.name = name;
    this.title = title;
    this.email = email;
    this.phone = phone;
    this.logo = logo;
  }
}
178
Vector doc2vector(Document d) {
  Vector v = new Vector();
  Iterator i = d.getRootElement().getChildren().iterator();
  while (i.hasNext()) {
    Element e = (Element)i.next();
    String phone = e.getChildText("phone");
    if (phone==null) phone="";
    Element logo = e.getChild("logo");
    String url;
    if (logo==null) url = ""; else url = logo.getAttributeValue("url");
    Card c = new Card(e.getChildText("name"),  // exploit schema,
                      e.getChildText("title"), // assume validity
                      e.getChildText("email"),
                      phone,
                      url);
    v.add(c);
  }
  return v;
}
Document vector2doc() {
  Element cards = new Element("cards");
  for (int i=0; i<cardvector.size(); i++) {
    Card c = (Card)cardvector.elementAt(i);
    if (c!=null) {
      Element card = new Element("card");
      Element name = new Element("name");
      name.addContent(c.name);
      card.addContent(name);
      Element title = new Element("title");
      title.addContent(c.title);
      card.addContent(title);
      Element email = new Element("email");
      email.addContent(c.email);
      card.addContent(email);
      if (!c.phone.equals("")) {
        Element phone = new Element("phone");
        phone.addContent(c.phone);
        card.addContent(phone);
      }
      if (!c.logo.equals("")) {
        Element logo = new Element("logo");
        logo.setAttribute("url",c.logo);
        card.addContent(logo);
      }
      cards.addContent(card);
    }
  }
  return new Document(cards);
}
179
Compile with: javac -classpath xerces.jar:jdom.jar BCedit.java
● XML documents are parsed via JDOM into domain-specific data structures
● if the input is known to validate according to some schema, then many runtime errors can be assumed
never to occur
● how do we ensure that the output of vector2doc is valid according to the schema? (well-formedness is for
free) - that's a current research challenge!
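The open question above concerns a static guarantee. A much weaker, purely runtime
check is to validate the generated document against a schema before writing it
out. The sketch below does this with the JDK's javax.xml.validation package. The
XML Schema text is a hypothetical description of the card format inferred from the
code above (name, title, and email required; phone and logo optional), not the
schema used by the original application, and the class name CardValidator is
likewise an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class CardValidator {

    // Hypothetical schema for the business card format (assumption).
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='cards'><xs:complexType><xs:sequence>"
      + "<xs:element name='card' minOccurs='0' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='title' type='xs:string'/>"
      + "<xs:element name='email' type='xs:string'/>"
      + "<xs:element name='phone' type='xs:string' minOccurs='0'/>"
      + "<xs:element name='logo' minOccurs='0'><xs:complexType>"
      + "<xs:attribute name='url' type='xs:string' use='required'/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true if the XML text is well-formed and valid against XSD.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema itself is broken", e);
        }
        try {
            schema.newValidator()
                  .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document is malformed or invalid
        } catch (IOException e) {
            throw new IllegalStateException("could not read document", e);
        }
    }

    public static void main(String[] args) {
        String card = "<cards><card><name>John Doe</name>"
                    + "<title>CEO, Widget Inc.</title>"
                    + "<email>john.doe@widget.com</email></card></cards>";
        System.out.println(isValid(card));
    }
}
```

Such a check only reports an invalid document after it has been built; it does not
prove that every document the program can produce is valid.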
180
Problems with JDOM
JDOM is not (yet) perfect:
181
Links to more information
www.w3.org/DOM/
DOM homepage
sax.sourceforge.net
SAX project website
java.sun.com/xml/
Sun's Java/XML page
java.sun.com/xml/jaxp/
JAXP, Sun's Java APIs for XML Processing
xmlsoft.org
libxml, Gnome project's XML C library
xml.apache.org
Apache's XML page
xml.apache.org/xerces2-j/
Apache's XML parser
www.brics.dk/~amoeller/WWW/
the tutorial Interactive Web Services with Java
182
Background: W3C
- a look at the organization behind most of the XML-related specifications
183
W3C - The World Wide Web Consortium (www.w3.org)
- the de facto leader in defining Web standards
Consists of more than 500 companies and organizations, led by Tim Berners-Lee,
creator of the World Wide Web.
"To lead the World Wide Web to its full potential by developing common
protocols that promote its evolution and ensure its interoperability."
184
Organization
W3C's organizational structure:
185
Activities
Activities carried out by groups:
● query
● schema
● linking
● core
● coordination
Other (current or former) W3C activities: HTML, HTTP, PNG, Amaya, ...
Organization of events:
186
Policies
● consensus - reach "substantial agreement"
● dissemination - limit intellectual property rights, ensure availability
+ the unofficial:
● better too soon than too late - otherwise someone else will take over
● greatest common denominator - every interested member is allowed one
favourite feature in each spec
187
Technical Reports
- the central activity of W3C
Recommendation track:
188