XML Basics
Shashi Banzal
This publication, portions of it, or any accompanying software may not be reproduced in any way,
stored in a retrieval system of any type, or transmitted by any means, media, electronic display or
mechanical display, including, but not limited to, photocopy, recording, Internet postings, or scanning,
without prior permission in writing from the publisher.
The publisher recognizes and respects all marks used by companies, manufacturers, and developers
as a means to distinguish their products. All brand names and product names mentioned in this book
are trademarks or service marks of their respective companies. Any omission or misuse (of any kind) of
service marks or trademarks, etc. is not an attempt to infringe on the property of others.
Our titles are available for adoption, license, or bulk purchase by institutions, corporations, etc. For
additional information, please contact the Customer Service Dept. at 800-232-0223 (toll free).
All of our titles are available in digital format at www.academiccourseware.com and other digital vendors.
The sole obligation of Mercury Learning and Information to the purchaser is to replace the book,
based on defective materials or faulty workmanship, but not based on the operation or functionality of
the product.
Instance Section
Elements
Character Data
CDATA
Comment
Processing Instruction
Entities
General Entities
Parameter Entities
Entity References
Attributes
Entities’ References and Constants
Unparsed Data
Character Data (CDATA)
Processing Instructions (PIs)
Questions for Discussion
Chapter 3: Document Type Definition (DTD)
Physical Structure in XML
Parsed and Unparsed Entities
Predefined Entities
Internal and External Entity
XML General Syntax
Attributes
Valid Documents
Well-Formed Documents
Well-Formed XML Documents
XML Documents
The XML Declaration
Processing Instructions
Comments
Document Type Declaration
XML Application Classification
Parsers
XML Processing-Attribute Values
XML Processing
Event-Driven Parsers
Tree-Based Parsers
XML Parser
Parse an XML Document
Parse an XML String
Document Type Definitions (DTDs)
Example DTD
DTD <!DOCTYPE>
DOCTYPE Syntax
XML Syntax Rules
DTDs (Well-Formed vs. Valid)
General Principles in Writing DTDs
Document Validation
Validating an XML Document with a DTD
The Purpose of DTDs
Creating DTDs
Code Sample: DTDs/Demos/Beatles.DTD
Internal DTD
Example Internal DTD
External DTD
Example External DTD
Combined DTD
DTD Elements
Basic Syntax
Plain Text
Unrestricted Elements
Empty Elements
Child Elements
Other Elements
Choice of Elements
Empty Elements
Mixed Content
Multiple Child Elements (Sequences)
An XML Application without a DTD
DTD Element Operators
DTD Operators with Sequences
Subsequences
The Document Element
Location of Modifier
Using Parentheses for Complex Declarations
XML CDATA
PCDATA-Parsed Character Data
CDATA-(Unparsed) Character Data
Notes on CDATA Sections
Internal & External Subsets
Standalone Attribute
DOCTYPE Declaration
Internal DTD Subset Declarations
External DTDs
Basic Markup Declarations
Formal DTD Structure-Entities
Predefined Entities
General Entities
Parameter Entities
Formal DTD Structure-Elements
Content Model
Cardinality Operators
Attributes
Default Values
Attribute Types
CDATA
ID
IDREF
Entity
Entity, Entities
NMTOKEN, NMTOKENS
Notation
Enumerations
Declaring Attributes
Conditional Sections
Limitations of DTDs
Designing XML Documents
XML for Messages
XML for Persistent Data
Mapping the Information Model to XML
A Document Type Declaration
Elements
Empty Elements
Attributes
CDATA
White Space
Special Characters
Questions for Discussion
Chapter 4: Namespaces
Namespaces
Purpose of Namespaces
Declaring a Namespace
Scope
Qualified
XML Namespace
Example Namespace
XML Local Namespace
Example Local Namespace
Chapter 1: Understanding XML
MARKUP LANGUAGES
The term markup is a concatenation of the words “mark up.” This refers to
the traditional way of marking up a document in the print and design worlds.
Markup is used to modify the look and formatting of text or to establish
the structure and meaning of the document for output to some medium, such
as a printer or the World Wide Web. Markup consists of codes, or tags, that
are added to text to change the look or meaning of the tagged text. The tagged
text for a document is usually called the source code for that document. Most
word processors use some sort of markup language to produce formatted
text. There are two types of markup languages: specific markup languages
and generalized markup languages.
be formatted, but conveys nothing about the kind of text data included in the
document.
When using specific markup languages, authors are limited to a particular
set of tags. If a set of tags does not meet a need, authors must find an
alternative way to meet that need. A document might not be portable to
other applications, as the data is not self-describing. The data cannot be used
for any purpose other than that for which it was originally intended. The language
probably has a proprietary way of marking up text that is not compatible
with other markup languages. This can create confusion and additional
work for authors who must use several languages to accommodate different
applications.
SGML - A METALANGUAGE
SGML has added provisions for identifying the characters to be used in a
document. This makes it easier to ensure that a processor can understand
everything in a document by allowing a document to specify the character set
that it uses.
SGML provides a way to identify objects that will be used throughout a
document. These objects, called entities, are convenient to use when a text
fragment or any other data appears in several places in a document. If an entity
is declared in one place of the document, any changes to that declaration will
be reflected in all occurrences of the entity throughout the document.
SGML – Example
<!DOCTYPE CARS PUBLIC "//EXT/DTD CATALOG//EN">
<CAR>
<COLOR> Red
<PRICE> $20,000
</CAR>
INTRODUCTION TO XML
XML (eXtensible Markup Language) was invented to provide a standard and
powerful way of describing any kind of data. XML offers
a widely adopted standard way of representing text and data in a format that
can be processed without much human or machine intelligence. Information
formatted in XML can be exchanged across platforms, languages, and applications,
and can be used with a wide range of development tools and utilities.
XML is a meta-language; that is, it is a language in which other languages
are created. In XML, data is “marked up” with tags similar to HTML tags. In
fact, XHTML, a reformulation of HTML, is an XML-based language,
which means that XHTML follows the syntax rules of XML.
XML is used to store data or information. This data might be intended to
be read by people or by machines. It can be highly structured data, such as
data typically stored in databases or spreadsheets, or loosely structured data,
such as data stored in letters or manuals.
XML is all about preserving useful information: information that computers
can use to be more intelligent about what they do with our data. The
best part of XML is that it liberates information from the shackles of a fixed
tag set.
EXTENSIBLE
XML is extensible. It lets you define your own tags, the order in which they
occur, and how they should be processed or displayed. Another way to think
about extensibility is to consider that XML allows us to extend our notion of
what a document is: it can be a file that lives on a file server, or it can be a
transient piece of data that flows between two computer systems (as in the
case of Web Services).
MARKUP
The most recognizable feature of XML is its tags, or elements (to be more
accurate). In fact, the elements you’ll create in XML will be very similar to the
elements you’ve already been creating in your HTML documents. However,
XML allows you to define your own set of tags.
LANGUAGE
XML is a language that’s very similar to HTML. It’s much more flexible than
HTML because it allows you to create your own custom tags. However, it’s
important to realize that XML is not just a language. XML is a meta-language:
a language that allows us to create or define other languages. For example,
with XML we can create other languages, such as RSS, MathML (a mathematical
markup language), and even tools like XSLT.
HISTORY OF XML
SGML (Standard Generalized Markup Language) grew out of the Generalized
Markup Language (GML), which was developed at IBM in the late 1960s.
SGML is a semantic and structural language for text documents, but it is very
complicated. HTML is an application of SGML.
In 1996, the XML Working Group was formed under the W3C. The World Wide
Web Consortium (W3C) is an international consortium where member organizations,
a full-time staff, and the public work together to develop Web standards.
The W3C was founded in 1994 by Tim Berners-Lee, who invented the
World Wide Web in 1989. In 1998, the W3C published XML 1.0.
XML (Extensible Markup Language) is a simplified subset of SGML. XML is not a
programming language. Rather, it is a set of rules that allows you to represent
data in a structured manner. Since the rules are standard, XML documents
can be automatically generated and processed.
XML was designed to describe data and is a cross-platform, software- and
hardware-independent tool for transmitting or exchanging information. It is
an open-standards-based technology that is both human and machine readable.
XML is best suited for use in documents that share a similar structure. In future Web
development, it is most likely that XML will be used to describe the data,
while HTML will be used to format and display the same data. The XML specification
includes the syntax and grammar of XML documents as well as DTDs.
Website creation is a fast-growing sector. In the early days, Website
design consisted primarily of creating fancy graphics and nice-looking, easy-
to-read Web pages.
As today’s Websites are interactive, the steps in Website design have
changed. Although creating a pleasant-looking Website is still important, the
primary focus has shifted from graphical design to programmatic design.
Consider a company wanting to sell its product on the Web. In such cases,
the Webpages will collect and store a user’s billing information. This calls for
storing and manipulating such data in a database. This is where XML comes
into the picture.
XML is the solution for the problems that arise when using database
Webpages.
language that has been an ISO standard since 1986. SGML is an early attempt
to combine the metadata (data about the data) with the data, and it was used
primarily in large document management systems. Because SGML is a very
complex language, it has limited mass appeal.
HTML is the most recognized application of SGML, and it allows any
Web browser or application that understands HTML to display information
in a consistent form. An HTML document is effective when it comes to laying
out and displaying data, but it is a fixed set of tags, and it does not have the
flexibility to describe different document and data types. HTML, in conjunction
with Cascading Style Sheets (CSS), is reasonably good at displaying data,
but it is not as good as XML at transporting data that is meant to be viewed or
parsed in dozens of different ways by a variety of devices. In essence,
HTML is a presentation language, whereas we require a richer means of communication
for exchanging information from one computer to another.
The need to extract data and put a structure around information led to the
creation of XML. Since XML 1.0 was released in 1998, XML use has been growing
rapidly. There are two fundamental differences between HTML and
XML:
● Separation of form and content: HTML mostly consists of tags defining
the appearance of text; in XML, the tags generally define the structure
and content of the data, with the actual appearance specified by a specific
application or associated stylesheet.
● XML is extensible: tags can be defined by individuals or organisations for
some specific application, whereas the HTML standard tagset is defined
by the World Wide Web Consortium (W3C).
XML is not intended as a replacement for HTML; the two are complementary
technologies. XML is a more general and better solution to the
problem of sharing data on the Web than extending HTML.
XML STRUCTURE
One of XML’s best features is its ability to provide structure to a document.
Every XML document includes both a logical and a physical structure. The
logical structure is like a template that details the elements to be included in
a document and the order in which they have to be included. The physical
structure contains the actual data used in a document.
LOGICAL STRUCTURE
Logical structure refers to the organization of the different parts of a document.
It indicates how a document is built, as opposed to what a document
contains. The first structural part of an XML document is an optional prolog.
The prolog is the base for the logical structure of an XML document.
It consists of two basic components, the XML declaration and
the document type declaration, both of which are also optional.
XML DECLARATION
The XML declaration identifies the version of the XML specification to
which the document conforms. Although the XML declaration is
optional, we should always include it in the XML document.
The code snippet here gives an example of a basic XML declaration. The
keywords xml and version must be written in lowercase letters.
<?xml version="1.0"?>
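The lowercase requirement can be checked with a parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed is hypothetical and not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The same document with a lowercase and an uppercase declaration keyword.
lowercase_ok = is_well_formed('<?xml version="1.0"?><cats/>')
uppercase_ok = is_well_formed('<?XML version="1.0"?><cats/>')

print("lowercase declaration accepted:", lowercase_ok)
print("uppercase declaration accepted:", uppercase_ok)
```

With this parser, only the lowercase form should be accepted; other parsers may report the problem with a different error message.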
XML SYNTAX
The first thing that you’ll need to do is open up your text editor of choice. At
this point, your document is going to look something like this (if you’re using
XML version 1.0):
<?xml version="1.0"?>
Once you’ve typed your declaration, it’s time to start adding some content to
the page. Information in an XML document is handled in a very precise and structured
format, using tags to define your data. White space can be included in
the document to make it more easily readable, though you should be careful
not to use that white space inside of your tag names, as it can cause errors when
the document is read by a browser.
Let’s say that you’ve decided to create a new XML document to tell the
world about your two favorite cats. You want to use the tag <cats>. Your document
now looks a little something like this:
<?xml version="1.0"?>
<cats>Tooter and Shade are the best cats in the world!</cats>
Note the white space in between the declaration and the first tag. You could
also have put each of the tags on its own line, with the content of the tags
between them, as long as you don’t add additional white space within the tags.
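To see what a program receives from this document, here is a small sketch that assumes Python's standard xml.etree.ElementTree module. It parses the cats document and a second copy with each tag on its own line; the second layout is an illustration added here, not a listing from the text.

```python
import xml.etree.ElementTree as ET

CATS_XML = """<?xml version="1.0"?>
<cats>Tooter and Shade are the best cats in the world!</cats>
"""

# Same content, with each tag on its own line.
SPREAD_OUT_XML = """<?xml version="1.0"?>
<cats>
Tooter and Shade are the best cats in the world!
</cats>
"""

root = ET.fromstring(CATS_XML)
spread_root = ET.fromstring(SPREAD_OUT_XML)

print(root.tag)
print(root.text)
# The line breaks around the content are kept as part of the text,
# so strip them before comparing the two layouts.
print(spread_root.text.strip() == root.text)
```

Both layouts are well-formed; the only difference a program sees is the extra white space around the text.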
Of course, the <cats> tags don’t do anything. If you load this page into a
Web browser, you’ll end up with more or less a copy of the file contents displayed
on the screen with the tags in some pretty colors. You’ll have to tell the
browser how to present the tags, which can be done in one of four ways:
● Using Cascading Style Sheets (CSS)
● Using eXtensible Stylesheet Language (XSL) style sheets
● Using a Data Island plus Script
● Using the Document Object Model (DOM) plus Script or Client-Side Program
All of this might sound complicated, but it’s really not. It does involve creating
and referencing other files, though for now we’re still working on just
the basic structure of XML. Save the document (in Text-Only mode) under
the name cats.xml (making sure to use the .xml extension).
Unfortunately, your file is still missing a few vital elements. The <cats>
tags have no presentation attached, and the browser has no idea how to display them. If
you load the file in a browser, you’ll just see a copy of it, with the various
elements in different colors. This is actually useful, however; as long as you
see this, your code is good. The browser doesn’t know what else to do
with it, because no style information is supplied, but the lack of
error messages tells you that the document is at least well-formed.
Go into the file, between your declaration and the content, and get ready to
add another vital element to your page. Type the following:
<?xml-stylesheet type="text/css" href="cats.css"?>
Of course, this doesn’t mean much to you right now. In time, though, it’s
going to be a vital part of your page. What you just typed is a processing
instruction that tells the browser where to find the style sheet, the file that tells it how it
should present the information in the XML document. The line tells the
browser to find the file called cats.css, and that the file is a Cascading
Style Sheet. It also tells it that it’s the stylesheet that it needs for this
page. Now your cats.xml file should look like the following, which looks a lot
more like an XML file.
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cats.css"?>
<cats>Tooter and Shade are the best cats in the world!</cats>
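A program can read the stylesheet line as well. The sketch below assumes Python's standard xml.dom.minidom module and lists the processing instructions that appear before the root element; the variable names are illustrative only.

```python
from xml.dom import Node, minidom

CATS_XML = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="cats.css"?>
<cats>Tooter and Shade are the best cats in the world!</cats>
"""

document = minidom.parseString(CATS_XML)

# Collect (target, data) for each processing instruction at document level.
# The XML declaration on the first line is not reported as one of them.
instructions = [
    (node.target, node.data)
    for node in document.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]

print(instructions)
print(document.documentElement.tagName)
```

The parser only hands over the target (xml-stylesheet) and its data; fetching cats.css and applying it is left to the browser or application.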
<td>Author</td><td>John Milton</td>
</tr></table>
<!-- Book snippet in XML -->
<BooksForSale>
<Title>Paradise Lost</Title>
<Author>John Milton</Author>
</BooksForSale>
XML BENEFITS
Initially, XML received a lot of excitement, but that has now died down some.
This isn’t because XML is not as useful, but rather because it doesn’t provide
the “Wow! factor” that other technologies, such as HTML, do. When you
write an HTML document, you see a nicely formatted page in a browser—
instant gratification. When you write an XML document, you see an XML
document—not so exciting. However, with a little more effort, you can make
that XML document sing.
XML is Everywhere
XML is now as important for the Web as HTML was to the foundation of the
Web. XML is one of the most common tools for data transmission between all sorts
of applications. XML is used in many aspects of Web development, often to
simplify data storage and sharing.
XML DISADVANTAGES
XML is useful for developing future Web applications, and it almost defines
the future of Web development. However, XML also has some drawbacks.
One of the biggest drawbacks of XML is that it lacks adequate applications
for processing.
Strictly speaking, specifications layered on top of XML are not supposed to disable or ban
the document type declaration, yet SOAP does exactly that.
If your job is to implement a Web application that is based on XML, you
may need to configure the parser not to perform DTD-based validation
and not to resolve external entities. This could be an answer to
some problems, so taking precautionary measures is worthwhile. Publishing
documents on the Web requires the same precaution: the document type
declaration should not be included.
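As an illustration of such a parser configuration, the sketch below assumes Python's standard xml.sax package, whose expat-based parser is non-validating. It switches off retrieval of external general and parameter entities explicitly. The document and the file name signature.txt are hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)


class ElementNameCollector(ContentHandler):
    """Record the name of every element the parser reports."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


# Hypothetical document that declares and references an external entity.
DOCUMENT = b"""<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY signature SYSTEM "signature.txt">
]>
<note><body>Hello &signature;</body></note>
"""

parser = xml.sax.make_parser()
# Do not retrieve external general or parameter entities.
parser.setFeature(feature_external_ges, False)
parser.setFeature(feature_external_pes, False)

collector = ElementNameCollector()
parser.setContentHandler(collector)
parser.parse(io.BytesIO(DOCUMENT))

print(collector.names)
```

With these settings the parser should report the elements without trying to open signature.txt. Other parsers expose similar switches under different names.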
A document may not be valid in the way XML defines validity, and some
people even believe that document validation in XML is overrated. Document
type definitions are not very powerful when it comes to validation, and they
are written in their own language and grammar rather than in XML itself,
which makes them awkward to process. There is also the problem of other programs
not trusting the DTD that a document declares. The doctype in HTML is much different
from the doctype in XML. You may not be able to use the doctype in XML
as an indicator that helps programs understand what type of document they
are dealing with.
If an application exists that can handle multiple XML vocabularies
and knows how to dispatch each document to the appropriate
handler by checking the namespace of the root element, then
you can consider yourself lucky. If a vocabulary is not identified by a
namespace, then you can look for it in the MIME type. In some cases,
the vocabulary is identified neither by a namespace nor by a specific
MIME type. Such a language is certainly a bad example and will create problems
because you will have to use the root element name.
The XML specification allows for three kinds of processing. The first
does not perform DTD-based validation and does not retrieve external entities.
The second does not perform DTD-based validation but does
retrieve external entities, so that the information set and the entity references
can be expanded. The third performs DTD-based validation and
retrieves the external entities, so that the information set and the entity references
can be expanded.
The point of having several profiles is that an application can
choose the one that suits it. Named character entities are considered unsafe for Web
applications, because they depend on a DTD that the processor may not read. This is a
disadvantage because it makes some characters harder to enter in an editor. On the
World Wide Web, there may be other options
available when there is such a problem. The situation need not be so unfortunate,
because an input method can solve the problem in the editor. If the XHTML entities
were predefined, then there wouldn’t be many problems.
Simplicity
Information coded in XML is easy to read and understand, and it can be processed
easily by computers.
Self-Describing
Unlike records in traditional database systems, XML data does not require
relational schemata, file description tables, or external data type definitions
because the data itself contains this information. XML also helps keep
data usable, which is imperative for business applications whose
tasks extend beyond the mere presentation of content.
XML documents use a self-describing and simple syntax:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Susan</to>
<from>Sullivan</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The first line is the XML declaration. It defines the XML version (1.0)
and the encoding used (ISO-8859-1 = Latin-1/West European character set).
The next line describes the root element of the document (like saying:
“this document is a note”):
<note>
The next 4 lines describe 4 child elements of the root (to, from, heading,
and body):
<to>Susan</to>
<from>Sullivan</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
And finally the last line defines the end of the root element:
</note>
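The walk-through above can be reproduced in code. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, that parses the note document and lists the root element and its four children.

```python
import xml.etree.ElementTree as ET

# Bytes are used so that the parser applies the declared encoding itself.
NOTE_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Susan</to>
<from>Sullivan</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
"""

root = ET.fromstring(NOTE_XML)

# Map each child element's name to its text, in document order.
children = {child.tag: child.text for child in root}

print(root.tag)
for name, text in children.items():
    print(f"{name}: {text}")
```

Because the element names travel with the data, the program needs no separate description of the record layout.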
APPLICATION INDEPENDENCE
Using XML, data is no longer dependent on a specific application for creation,
viewing or editing. In this sense, XML is to data what Java is to applications. Java
allows programs to run anywhere—XML allows data to be used by any application.
INTERNATIONALIZATION
Internationalization is important for electronic worldwide business applications.
XML supports multilingual documents and the Unicode standard.
FUTURE-ORIENTED
XML is the endorsed industry standard of the World Wide Web Consortium
(W3C) and is supported by all leading software providers. Furthermore, XML
is also the standard today in an increasing number of other industries, such as
health care.
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
The terms parent, child, and sibling are used to describe the relationships
between elements. Parent elements have children. Children on the same level
are called siblings (brothers or sisters). All elements can have text content and
attributes (just like in HTML).
<bookstore>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>G.Ram</author>
<year>2009</year>
<price>13.95</price>
</book>
</bookstore>
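The parent, child, and sibling relationships, along with attributes and text content, can be explored in code. The sketch below assumes Python's standard xml.etree.ElementTree module and a complete bookstore document built around the book element shown above.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>G.Ram</author>
    <year>2009</year>
    <price>13.95</price>
  </book>
</bookstore>
"""

bookstore = ET.fromstring(BOOKSTORE_XML)   # parent of <book>
book = bookstore.find("book")              # child of <bookstore>
title = book.find("title")                 # child of <book>

category = book.get("category")            # attribute on <book>
language = title.get("lang")               # attribute on <title>
siblings = [child.tag for child in book]   # children of <book> are siblings
price = float(book.findtext("price"))      # text content converted to a number

print(category, language, siblings, price)
```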
In XML, it is illegal to omit the closing tag. All elements must have a
closing tag:
<p>This is a paragraph</p>
<p>This is another paragraph</p>
You might have noticed from the previous example that the XML declaration
did not have a closing tag. This is not an error. The declaration is not an
element, so it has no closing tag.
Opening and closing tags are often referred to as start and end tags. Use
whichever terms you prefer.
In the example above, “properly nested” simply means that since the <i>
element is opened inside the <b> element, it must be closed inside the <b>
element.
The error in the first document is that the date attribute in the note element
is not quoted.
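The rules in this section (closing tags, proper nesting, and quoted attribute values) can be tested against a parser. The following is a rough sketch, assuming Python's standard xml.etree.ElementTree module; the sample strings, including the date value, are hypothetical stand-ins for the examples discussed here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


SAMPLES = {
    "closing tag present": "<p>This is a paragraph</p>",
    "closing tag omitted": "<p>This is a paragraph",
    "properly nested": "<b><i>Bold and italic</i></b>",
    "improperly nested": "<b><i>Bold and italic</b></i>",
    "quoted attribute": '<note date="12/11/2007"><to>Susan</to></note>',
    "unquoted attribute": "<note date=12/11/2007><to>Susan</to></note>",
}

results = {label: is_well_formed(text) for label, text in SAMPLES.items()}

for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```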
XML IS FREE
XML doesn’t cost anything to use. It can be written with a simple text editor
or one of the many freely available XML authoring tools, such as XML
Notepad. In addition, many Web development tools, such as Dreamweaver
and Visual Studio .NET, have built-in XML support. There are also many free
XML parsers, such as Microsoft’s MSXML (downloadable from microsoft.com)
and Xerces (downloadable at apache.org).
XML TECHNOLOGY
The structured data is contained in an XML document, a text file with .xml
as the extension. You can use CSS as in HTML to provide style sheets for
XML data display. For more advanced features, power, and flexibility for the
presentations, you could use XSL (Extensible Stylesheet Language) to build the
style sheets.
To enforce the structural constraints and rules on the data contained in an
XML document, you could code a DTD (Document Type Definition). Due
to certain limitations that are inherent in DTDs, the W3C came up with a
specification to serve the same purpose as DTDs: XML Schema. Schemas
are contained in .xsd files, and DTDs in .dtd files. XML Schema is an XML-based
alternative to DTDs.
USES
XML is widely used for the following purposes.
● Storing configuration information: typically data in an application that
is not stored in a database. Much server software keeps its configuration files in
XML formats.
● Serving as a mini data store: the data in XML documents can be
presented on a variety of targets, including browsers and print
media.
● Transmitting data between applications: XML overcomes problems in client-server
applications that are cross-platform in nature, for example a Windows
program talking to a mainframe, little- and big-endian byte order problems, and
data type size variations across platforms.
When XML data is transferred across different systems, the data contained
in an XML document can be read using a software entity called a parser. Most
of the popular databases (Oracle, MS SQL Server, Sybase, and DB2) provide
their own mechanisms to store and retrieve data as XML. Some of them also
provide parsers to work with the XML documents programmatically. XML is
a key technology when it comes to Web Services. .NET uses XML extensively.
It is used as a data format for everything—configuration files, metadata, RPC,
and object serialization.
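As an example of the configuration use case, here is a short sketch that assumes Python's standard xml.etree.ElementTree module as the parser. The configuration format, element names, and attribute names are hypothetical and do not belong to any real server product.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document; the names are for illustration only.
CONFIG_XML = """<?xml version="1.0"?>
<server>
  <listen host="localhost" port="8080"/>
  <logging level="info"/>
</server>
"""

config = ET.fromstring(CONFIG_XML)

listen = config.find("listen")
host = listen.get("host")
# Attribute values are always text, so convert numbers explicitly.
port = int(listen.get("port"))
log_level = config.find("logging").get("level")

print(f"listening on {host}:{port}, log level {log_level}")
```

Because the values are plain text, the same file can be read unchanged by a program on any platform, which is the portability point made in the list above.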
Content Management
Almost all of the leading content management systems use XML in one way or
another. A typical use would be to store a company’s marketing content in one
or more XML documents. These XML documents could then be transformed
for output on the Web, as Word documents, as PowerPoint slides, in plain text,
or in audio format. The content can also easily be shared with partners who can
then output the content in their own formats. Storing the content in XML
makes it much easier to manage content for two reasons.
Content changes, additions, and deletions are made in a central location
and the changes will cascade out to all formats of presentation. There is no
need to be concerned about keeping the Word documents in sync with the
Website, because the content itself is managed in one place and then transformed
for each output medium.
Formatting changes are made in a central location. To illustrate, suppose
a company had many marketing Web pages, all of which were produced from
XML content being transformed to HTML. The format for all of these pages
could be controlled from a single XSLT stylesheet, and a sitewide formatting change
could be made by modifying that stylesheet.
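The single-source idea can be sketched in a few lines. The example below uses plain Python functions as a stand-in for an XSLT stylesheet, since Python's standard library does not include an XSLT processor. The product document and the function names to_html and to_plain_text are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical marketing content kept in one place.
CONTENT_XML = """<product>
  <name>Widget</name>
  <tagline>Small &amp; sturdy</tagline>
</product>
"""


def to_html(product: ET.Element) -> str:
    """Render the content for the Web, escaping characters special to HTML."""
    name = escape(product.findtext("name"))
    tagline = escape(product.findtext("tagline"))
    return f"<h1>{name}</h1><p>{tagline}</p>"


def to_plain_text(product: ET.Element) -> str:
    """Render the same content as a single line of plain text."""
    return f"{product.findtext('name')}: {product.findtext('tagline')}"


product = ET.fromstring(CONTENT_XML)
html_page = to_html(product)
text_page = to_plain_text(product)

print(html_page)
print(text_page)
```

A change to the content is made once in the XML, and a change to the formatting is made once in the relevant function, which mirrors the two reasons given above.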
Web Services
XML Web services are small applications or pieces of applications that are
made accessible on the Internet using open standards based on XML. Web
services generally consist of three components:
● SOAP: an XML-based protocol used to exchange Web service messages over the
Internet.
● WSDL (Web Services Description Language): an XML-based language
for describing a Web service and how to call it.
● UDDI (Universal Description, Discovery, and Integration): the
yellow pages of Web services. UDDI directory entries are XML documents
that describe the Web services a group offers. This is how people
find available Web services.
It might be a bit easier to list what you can’t use XML for! In addition to
making simple Webpages about your cats, you can use XML to create more
complex applications such as online databases, custom-built pages, and more.
By combining XML with Style Sheets and dynamic elements, you can even
create a storefront for online shopping! The possibilities are nearly endless.
RDF/RSS Feeds
RDF (Resource Description Framework) is a framework for writing XML-based
languages to describe information on the Web (e.g., Web pages). RSS
(RDF Site Summary) is an implementation of this framework; it is a language
that adheres to RDF and is used to describe Web content. Website publishers
can use RSS to make content available as a “feed,” so that Web users can
access some of their content without actually visiting their site. Often, RSS is
used to provide summaries with links to the company’s Website for additional
information.
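To show how a feed reader might pull summaries and links out of a feed, here is a small sketch that assumes Python's standard xml.etree.ElementTree module. The feed is hypothetical and uses the simpler RSS 2.0 layout (rss, channel, item) rather than the RDF-based form described above; the URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed in the RSS 2.0 layout.
FEED_XML = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>https://example.com/</link>
    <description>Hypothetical company news</description>
    <item>
      <title>First story</title>
      <link>https://example.com/first</link>
      <description>Summary of the first story.</description>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/second</link>
      <description>Summary of the second story.</description>
    </item>
  </channel>
</rss>
"""

channel = ET.fromstring(FEED_XML).find("channel")

# Each item carries a summary plus a link back to the publisher's site.
summaries = [
    (item.findtext("title"), item.findtext("link"))
    for item in channel.findall("item")
]

for title, link in summaries:
    print(f"{title} -> {link}")
```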
Limitations
Not surprisingly, there are limits to what you can do with style sheets.
Languages for style sheets are optimized for different purposes. You need to
be aware of how a style sheet language works to use it most effectively.
CSS, for example, is designed to be compact and efficient. Documents
have to be rendered quickly because people don’t want to wait a long time for
something to read. The style sheet processor is on the client end, and doesn’t
have a lot of computing power at its disposal. So the algorithm for applying
styles needs to be very simple. Each rule that matches an element can only
apply a set of styles. There is no other processing allowed, no looking backward
or forward in the document for extra information. You have only one
pass through the document to get it right.
Sometimes, information is stored in an order other than the way you want
it to be rendered. If that is the case, then you need something more powerful
than CSS. XSLT works on a tree representation of the document. It provides
the luxury of looking ahead or behind to pull together all the data you need to
generate output. This freedom comes at the price of increased computational
requirements. Although some browsers support client-side XSLT processing
(e.g., Internet Explorer), it’s more likely you’ll want transformations to be
done on the server side, where you have more control and can cache the
results.
Property sets are finite, so no matter how many features are built into a
style sheet language, there will always be something lacking, some effect you
want to achieve but can’t. When that happens, you should be open to other
options, such as post-processing with custom software.
Unquestionably, implementation among clients has been the biggest
obstacle. The pace of standards development was much faster than actual
implementation. Browsers either didn’t support them or had buggy and
incomplete implementations. This is quite frustrating for designers who want
Syntax
Below is a sample CSS style sheet:
/* A simple example */
addressbook {
  display: block;
  font-family: sans-serif;
  font-size: 12pt;
  background-color: white;
  color: blue;
}
entry {
  display: block;
  border: thin solid black;
  padding: 5em;
  margin: 5em;
}
name, phone, email, address {
  display: block;
  margin-top: 2em;
  margin-bottom: 2em;
}
This style sheet has three rules. The first matches any addressbook
element. The name to the left of the opening brace is a selector, which
tells the processor what element this rule matches. The items inside the
braces are the property declarations, a list of properties to apply.
CSS also has a syntax for comments. Anything inside a comment is ignored
by the processor. The start delimiter is /* and the end delimiter is */. A
comment can span multiple lines and may be used to enclose CSS rules to
remove them from consideration:
/* this part will be ignored
gurble { color: red }
burgle { color: blue; font-size: 12pt; }
*/
White space is generally ignored and provides a nice way to make style sheets
more readable. The exception is when spaces act as delimiters in lists. Some
properties take multiple arguments separated with spaces like border below:
sidebar {
border: thin solid black
}
Qualitatively, the first rule in the sample style sheet is like saying,
“for every addressbook element, display it like a block, set the font
family to any sans serif typeface with size 12 point, set the background
color to white, and make the foreground (text) blue.” Whenever the CSS
processor encounters an addressbook element, it will apply these
properties to the current formatting context.
To understand how it works, think of painting-by-numbers. In front of
you is a canvas with outlines of shapes and numbers inside the shapes. Each
number corresponds to a paint color. You go to each shape, find the paint that
corresponds to the number inside it, and fill it in with that color. In an hour
or so, you’ll have a lovely stylized pastoral scene with a barn and wildflowers.
In this analogy, the rule is a paint can with a numbered label. The color is the
property and the number is the selector.
The selector can be more complex than just one element name. It can be
a comma-separated list of elements. It could be qualified with an attribute, as
in this example, which matches a foo element with class=“flubber”:
foo.flubber { color: green; }
Now suppose a style sheet contains three rules whose selectors are p.big,
p, and *. The first rule matches a p with attribute class=“big”. The second
matches any p regardless of attributes, and the last matches any element at
all. Suppose the next element to process is a p with the attribute
class=“big”. All three rules match this element.
How does CSS decide which properties to apply? The solution to this
dilemma has two parts. The first is that all rules that match are used. It’s as
if the property declarations for all the applicable rules were merged into one
set. That means all of these properties potentially apply to the element:
font-size: 18pt;
font-family: garamond, serif;
font-size: 12pt;
color: black;
font-size: 10pt;
The second part is that redundant property settings are resolved according
to an algorithm. As you can see, there are three different font-size
property settings. Only one of the settings can be used, so the CSS
processor has to weed out the other two using a property clash resolution
system. As a rule of thumb, you can assume that the property from the rule
with the most specific selector will win out. The first font-size property
originates from the rule with selector p.big, which is more descriptive
than p or *, so it’s the winner.
In the final analysis, these three properties will apply:
font-size: 18pt;
font-family: garamond, serif;
color: black;
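The merge-then-resolve behavior described above can be sketched in a few
lines of Python. This is only an illustration, not how any browser is
implemented: the RULES table, the resolve function, and the (ids, classes,
element names) specificity tuples are assumptions made for the sketch, and
the three rules are reconstructed from the property list shown above.

```python
# Each rule: (selector, specificity, declarations).
# Specificity is modeled as (ids, classes, element names); this is an
# assumption of the sketch that mirrors the usual CSS rule of thumb.
RULES = [
    ("p.big", (0, 1, 1), {"font-size": "18pt"}),
    ("p", (0, 0, 1), {"font-family": "garamond, serif", "font-size": "12pt"}),
    ("*", (0, 0, 0), {"color": "black", "font-size": "10pt"}),
]


def resolve(matching_rules):
    """Merge declarations so the most specific selector wins each clash."""
    merged = {}
    # Apply the least specific rule first; later updates overwrite earlier ones.
    for _selector, _specificity, declarations in sorted(
        matching_rules, key=lambda rule: rule[1]
    ):
        merged.update(declarations)
    return merged


styles = resolve(RULES)
print(styles)
```

Because the sort is stable, two rules with equal specificity keep their
source order, so the later one would win a tie.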
PROPERTY INHERITANCE
XML documents have a hierarchy of elements. CSS uses that hierarchy to pass
along properties in a process called inheritance. Going back to our DocBook
example, a sect1 contains a para. Consider the following style sheet:
sect1 {
margin-left: 25pt;
margin-right: 25pt;
font-size: 18pt;
color: navy;
}
para {
margin-top: 10pt;
margin-bottom: 10pt;
font-size: 12pt;
}
COMBINING STYLESHEETS
A very powerful feature of CSS is its ability to combine multiple style sheets by
importing one into another. This lets you borrow predefined style definitions
so you don’t have to continuously reinvent the wheel. Any style settings that
you want to redefine or don’t need can be overridden in the local style sheet.
One reason to combine style sheets is modularity. It may be more man-
ageable to break up a large style sheet into several smaller files. For example,
we could store all the styles pertaining to math equations in math.css and all
the styles for regular text in text.css. The command @import links the current
style sheet to another and causes the style settings in the target to be imported:
@import url(http://www.example.org/mystyles/math.css);
@import url(http://www.example.org/mystyles/text.css);
Some of the imported style rules may not suit your taste, or they may not
fit the presentation. You can override those rules by redefining them in your
own style sheet. Here, we’ve decided that the rule for h1 elements defined in
text.css needs to be changed:
@import url(http://www.example.org/mystyles/text.css);
h1 { font-size: 3em; } /* redefinition */
11. Can you walk us through the steps necessary to parse XML
documents?
12. What is the difference between XML and C or C++ or Java?
13. Does XML replace HTML?
14. Is it necessary to know HTML or SGML before learning XML?
15. What does an XML document actually look like (inside)?
16. How can XML data be displayed using HTML?
17. How do you define the tree structure in XML?
18. What are the disadvantages of XML?
19. What are the advantages of XML?
20. Why is XML referred to as having self-describing data?
21. How do we define an empty XML element?
22. How do we use xml.onload?
23. Where is data stored in XML?
24. How can we say that XML is extensible?
25. How do you create an XML document? Explain it with an example.
2
XML SYNTAX
In this case, the <NAME> and </NAME> tags comprise the markup and
“Selena Sol” comprises the character data. As you can imagine, there are few
rules that manage your data (content) other than what type of data is allowed
(binary or ASCII, for example). On the other hand, there are many rules that
define how you must code your markup.
To begin an XML document, it is a good idea to include the XML decla-
ration as the very first line of the document. I say “good idea” because, though
the XML declaration is optional, it is suggested by the W3C specification.
Essentially, the XML declaration is a processing instruction that notifies
the processing agent that the following document has been marked up as an
XML document. It will look something like the following:
<?xml version = "1.0"?>
We’ll talk more about the details of processing instructions later, but we
can at least explain how the XML declaration works.
All processing instructions, including the XML declaration, begin with
<? and end with ?>. Following the initial <?, you will find the name of the
processing instruction, which in this case is “xml.”
The XML declaration requires that you specify a “version” attribute and
allows you to specify optional “encoding” and “standalone” attributes.
When present, they must appear in that order: version, then encoding, then
standalone.
In its full regalia, the XML declaration might look like the following:
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes"?>
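A quick way to check a declaration is to hand it to a parser. The sketch
below uses Python's standard ElementTree module, which relies on the expat
parser; the helper name declaration_is_accepted and the sample documents
are made up for illustration. It suggests that a parser accepts the order
version, encoding, standalone, and rejects a declaration that swaps the
last two or omits the version.

```python
import xml.etree.ElementTree as ET


def declaration_is_accepted(document: bytes) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Pseudo-attributes in the order version, encoding, standalone.
good = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><NAME>Selena Sol</NAME>'
# standalone placed before encoding.
bad = b'<?xml version="1.0" standalone="yes" encoding="UTF-8"?><NAME>Selena Sol</NAME>'
# The required version attribute is missing.
no_version = b'<?xml encoding="UTF-8"?><NAME>Selena Sol</NAME>'

for label, doc in [("good", good), ("bad", bad), ("no_version", no_version)]:
    print(label, declaration_is_accepted(doc))
```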
PROLOG SECTION
The document prolog must be the first thing in an XML document—it is the
introduction to the document. Here is a sample prolog of an XML document:
<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "DTD/book.dtd">
The specification states that both parts of the prolog are optional. The
first part is called the XML declaration and the second part the document
type declaration, which points to or contains the Document Type Definition.
A Document Type Definition (DTD) sets all the rules for the document
regarding elements, attributes, and other components. This DTD may be
either an external DTD or an internal DTD.
●● Internal DTD—An internal DTD document is contained completely
within the XML document.
●● External DTD—An external DTD document is a separate document, ref-
erenced from within the XML document.
The example prolog above refers to an external DTD that can be found at
the local system path “DTD/book.dtd”. Any time you use a relative or
absolute file path or a URL, you must use the SYSTEM keyword. The other
option is to use the PUBLIC keyword and follow it with a public identifier
and then a system identifier. This means that the W3C or another
consortium has defined a standard DTD that is associated with that public
identifier. For example,
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
XML defines the text between the start and end tags to be “character
data” and the text within the tags to be “markup.”
INSTANCE SECTION
The instance contains the remaining parts of the XML document, including
the actual contents of the document, such as characters, paragraphs, pages,
and graphics.
ELEMENTS
Elements are the most important part of an XML document. An element con-
sists of content enclosed in an opening tag and a closing tag. An element can
contain several different types of content:
●● Element Content—Contains only other elements. Example: the <name>
element in
<name><firstname>Tom</firstname><lastname>Smith</lastname></name>
●● Mixed Content—Contains both text and other elements. Example: the
<para> element in
<para>This point is a <emphasis>very important</emphasis> point.</para>
XML documents must be well-formed. First, this means that you must
follow the rules regarding case-sensitivity and always include closing tags.
Additionally, you cannot mix the order of the nested tags: the first opened ele-
ment must always be the last closed element. If any of the rules for XML syn-
tax are not followed in an XML document, the document is not well-formed.
The following is an example of an XML fragment that is not well-formed:
<tag1>
<tag2>
</tag1>
</tag2>
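One way to test a fragment for well-formedness is simply to try parsing it.
The following is a minimal sketch using Python's standard ElementTree
module; the helper name is_well_formed and the two extra sample strings
are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


crossed = "<tag1><tag2></tag1></tag2>"   # the fragment shown above
nested = "<tag1><tag2></tag2></tag1>"    # same tags, correctly nested
wrong_case = "<tag1></TAG1>"             # tag names are case-sensitive

for sample in (crossed, nested, wrong_case):
    print(sample, is_well_formed(sample))
```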
characteristics (called attributes). The containers hold the content (or data) of
the document. The start and end tags define the boundaries of the container.
CHARACTER DATA
Character data may be any legal (Unicode) character with the exception of
“<” and “&”. The “<” character is reserved for the start of a tag, and “&”
is reserved for the start of an entity reference. XML also provides useful
entity references that you can use so as not to create any doubt whether
you are specifying character data versus markup. Table 2.1 shows the entity
references in XML.
Table 2.1 Character Data and its Entity References
Character    Entity Reference
>            &gt;
<            &lt;
&            &amp;
"            &quot;
'            &apos;
Note that values are not typed. That is, they are all considered strings.
Thus, if you were to process the tag
you would have to convert “10” and “13” to their numeric values outside of
the XML environment.
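The point about untyped values can be illustrated with a short Python
sketch. The POINT element and its X and Y children are hypothetical; only
the values 10 and 13 come from the text. The parser hands back strings, and
the conversion to numbers is the application's job.

```python
import xml.etree.ElementTree as ET

# Hypothetical element used only to carry the two values from the text.
point = ET.fromstring("<POINT><X>10</X><Y>13</Y></POINT>")

raw_x = point.findtext("X")  # a string, not a number
raw_y = point.findtext("Y")

concatenated = raw_x + raw_y      # string concatenation
total = int(raw_x) + int(raw_y)   # explicit conversion, then arithmetic

print(type(raw_x).__name__, concatenated, total)
```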
CDATA
It is a pretty good rule of thumb to consider anything outside of tags to be
character data and anything inside of tags to be considered markup. But alas,
in one case this is not true. In the special case of CDATA blocks, all tags and
entity references are ignored by an XML processor that treats them just like
character data.
CDATA blocks serve as a convenience measure when you want to include
large blocks of special characters as character data, but you do not want
to have to use entity references all the time. What if you wanted to write
about an XML document in XML? Consider the following example in which you
would have an example tag in your XML Guide written in XML:
<EXAMPLE>
&lt;DOCUMENT&gt;
&lt;NAME&gt;Coleen Merriman&lt;/NAME&gt;
&lt;EMAIL&gt;cm@mydomain.com&lt;/EMAIL&gt;
&lt;/DOCUMENT&gt;
</EXAMPLE>
As you can see, you would be forced to use entity references for all the
tags.
To avoid the inconvenience of translating all special characters, you can
use a CDATA block to specify that all character data should be considered
character data whether or not it “looks” like a tag or entity reference.
Consider the following example:
<EXAMPLE>
<![CDATA[
<DOCUMENT>
<NAME>Coleen Merriman</NAME>
<EMAIL>cm@mydomain.com</EMAIL>
</DOCUMENT>
]]>
</EXAMPLE>
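To see that the two spellings carry the same character data, we can parse
both and compare the results. This is a small sketch using Python's
standard ElementTree module; the shortened sample strings and the variable
names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The markup-like text written with entity references.
escaped = (
    "<EXAMPLE>"
    "&lt;NAME&gt;Coleen Merriman&lt;/NAME&gt;"
    "</EXAMPLE>"
)

# The same text wrapped in a CDATA block.
cdata = (
    "<EXAMPLE><![CDATA["
    "<NAME>Coleen Merriman</NAME>"
    "]]></EXAMPLE>"
)

escaped_text = ET.fromstring(escaped).text
cdata_text = ET.fromstring(cdata).text

print(escaped_text)
print(cdata_text == escaped_text)
```

In both cases the EXAMPLE element ends up with text only and no child
elements, because the inner tags were never treated as markup.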
COMMENT
Not only will you sometimes want to include tags in your XML document
that you want the XML processor to treat as character data rather than
markup, but sometimes you will also want to put text in your document that
you want the XML processor to ignore (not display at all). This type of
text is called comment text.
You will be familiar with comments from HTML. In HTML, you speci-
fied comments using the <!-- and --> syntax. In XML, comments are done in
just the same way. So the following would be a valid XML comment:
<!-- Begin the Names -->
<NAME>Jim Nelson</NAME>
<NAME>Jim Sanger</NAME>
<NAME>Les Moore</NAME>
<!-- End the names -->
PROCESSING INSTRUCTION
We have already seen a processing instruction. The XML declaration is a pro-
cessing instruction. And if you recall, when we introduced the XML decla-
ration, we promised to return to the concept of processing instructions to
explain them as a category.
A processing instruction is a bit of information meant for the application
using the XML document. That is, it is not really of interest to the XML
parser. Instead, the instruction is passed intact straight to the
application using the parser.
As you might imagine, you cannot use “xml”, in any combination of upper
and lower case, as the NAME_OF_APPLICATION_INSTRUCTION_IS_FOR since “xml”
is reserved. However, you might have something like
<?JAVA_OBJECT JAR_FILE = "/java/myjar.jar"?>
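To see how a parser hands a processing instruction to the application, here
is a minimal sketch using Python's standard minidom module. The ROOT
element is an assumption added so that the sample is a complete document.
The parser does not interpret the instruction; it only splits it into a
target name and an opaque data string.

```python
from xml.dom import Node, minidom

document = minidom.parseString(
    '<?JAVA_OBJECT JAR_FILE = "/java/myjar.jar"?><ROOT/>'
)

# The processing instruction is the first child of the document node.
instruction = document.firstChild

print(instruction.nodeType == Node.PROCESSING_INSTRUCTION_NODE)
print(instruction.target)  # the name that follows "<?"
print(instruction.data)    # everything else, passed through untouched
```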
ENTITIES
To a large degree, much of the discussion of entities is more relevant in
the next section, writing “valid” documents, rather than in this section,
writing “well-formed” documents.
Entities are essentially aliases that allow you to refer to large sections of
text without having to type them out every time you want to use them.
Suppose you have your letterhead saved as an entity in a shared file. Then,
every time you write a letter in XML, you might say something like
<LETTER>
&letterhead;
<TO>Bobby Rosy</TO>
<BODY>
blah blah blah
</BODY>
<FROM>Shashi Banzal</FROM>
</LETTER>
However, instead of typing the letterhead out in every letter, you just use
&letterhead;.
There are two types of entities, general entities and parameter entities,
and each entity has two parts: the declaration and the entity reference.
GENERAL ENTITIES
General entities look something like
<!ENTITY NAME "text that you want to be represented by the entity">
PARAMETER ENTITIES
Parameter entities, which can also be either internal or external, are only
used within the DTD that we will discuss in the next section, so we will
defer a serious discussion until then. However, we will mention that a
well-formed parameter entity declaration looks the same as a general entity
declaration except that it includes the “%” specifier. Consider the
following example:
<!ENTITY % NAME "text that you want to be represented by the entity">
Thus, you might have something like the following (Consider how much
easier changing office addresses is when you use entities!):
<?xml version="1.0"?>
<!DOCTYPE CLIENTS [
<!ENTITY ninthFloorAddress "2345 Broadway St Floor 9">
<!ENTITY eighthFloorAddress "2345 Broadway St Floor 8">
<!ENTITY seventhFloorAddress "2345 Broadway St Floor 7">
]>
<CLIENTS>
<CLIENT>
<NAME>Fred Jenkins</NAME>
<ADDRESS>&ninthFloorAddress;</ADDRESS>
<PHONE>x345</PHONE>
</CLIENT>
<CLIENT>
<NAME>Ravi Gupta</NAME>
<ADDRESS>&ninthFloorAddress;</ADDRESS>
<PHONE>x111</PHONE>
</CLIENT>
<CLIENT>
<NAME>Natalia Kinski</NAME>
<ADDRESS>&ninthFloorAddress;</ADDRESS>
<PHONE>x346</PHONE>
</CLIENT>
<CLIENT>
<NAME>Mary Smith</NAME>
<ADDRESS>&seventhFloorAddress;</ADDRESS>
<PHONE>x289</PHONE>
</CLIENT>
<CLIENT>
<NAME>Kristin Mancuso</NAME>
<ADDRESS>&eighthFloorAddress;</ADDRESS>
<PHONE>x945</PHONE>
</CLIENT>
</CLIENTS>
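The effect of the entity declarations can be observed by parsing a
shortened version of the document. The sketch below uses Python's standard
ElementTree module, which expands internal general entities while parsing;
the two-client document and the addresses dictionary are assumptions made
to keep the example small.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<!DOCTYPE CLIENTS [
<!ENTITY ninthFloorAddress "2345 Broadway St Floor 9">
<!ENTITY seventhFloorAddress "2345 Broadway St Floor 7">
]>
<CLIENTS>
  <CLIENT><NAME>Fred Jenkins</NAME><ADDRESS>&ninthFloorAddress;</ADDRESS></CLIENT>
  <CLIENT><NAME>Mary Smith</NAME><ADDRESS>&seventhFloorAddress;</ADDRESS></CLIENT>
</CLIENTS>"""

root = ET.fromstring(document)

# Map each client name to the address text the application actually sees.
addresses = {
    client.findtext("NAME"): client.findtext("ADDRESS")
    for client in root.findall("CLIENT")
}

print(addresses)
```

Changing the replacement text in a single ENTITY declaration changes the
address for every client that references it.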
ENTITY REFERENCES
Entity references refer to the key that unlocks an entity which has been
declared in an entity declaration. Entity references follow the simple syntax of
&ENTITY_NAME;
such as
&letterhead;
Now, you have already seen that entity references can take the place of
regular character data and you have seen how useful that is. You could also use
entity references within tag attributes. For example, consider the following:
<INVOICE CLIENT = "&IBM;" PRODUCT = "&PRODUCT_ID_8762;" QUANTITY = "5">
ATTRIBUTES
In addition to content, elements may have attributes. XML attributes are
much like HTML attributes, allowing you to attach characteristics to an
element. For example, in HTML:
<IMG SRC="images/test.jpg">
and XML
<image src="images/test.jpg" />
Attributes have a name and a value and are placed within the start tag. In
the document type definition (DTD), you define the legal attributes for an
element and what values are legal for that attribute.
An element can have multiple attributes. While you can get away with
omitting quotes for attributes in HTML, in XML the value must be
surrounded by single or double quotes. When you use one type of quotes,
the other type is legal within the quotes, for example
<topic name="Brian O'Sullivan">
or
<topic name='The Use of "s in Popular Literature'>
and
<phone>
<intcode>+353</intcode>
<localcode>1</localcode>
<prefix>800</prefix>
<extension>8583</extension>
</phone>
Using attributes in this case is obviously far simpler to write and less ver-
bose. However, it would make searching our data for all phone numbers with
an 800 prefix quite difficult. Equally, the multiple element format would make
it easy to generate an internal phone book only showing the local extensions.
Both formats are correct data formats. Essentially, which you use comes
down to your own preference.
The problem here is that XML parsers will attempt to handle this data as
an XML tag, and then generate an error because there is no closing tag. This
is a common problem, as any use of angle brackets results in this behavior.
Entity references provide a way to overcome this problem. An entity refer-
ence is a special data type in XML used to refer to another piece of data. The
entity reference consists of a unique name, preceded by an ampersand and
followed by a semicolon: &[entityname];. When an XML parser sees an entity
reference, the specified substitution value is inserted and no processing of
that value occurs. XML defines five special entities to address this problem:
&lt; for <, &gt; for >, &amp; for &, &quot; for " and &apos; for '. Using
these entities, it is possible to define the above example:
<chapter>
<sect1>
<title>Using HTML</title>
<para>
Then, when we wish to use this text within our XML document at any
subsequent stage, we simply use the entity: &rspca; to represent our con-
stant. Likewise, the variable representing the author’s current email could be
defined as an entity and referenced throughout the rest of the document. If
the author’s email address changes at a later date, then a simple change to the
entity would modify the data throughout the rest of the document.
UNPARSED DATA
In XML, there are three kinds of data that are ignored by the parser: com-
ments, processing instructions (PIs), and character data (CDATA). When
the parser encounters one of these, normal operation is suspended while the
parser looks for the end marker.
Comments in XML are exactly like comments in HTML. Typically, they
are ignored by most XML parsers.
<!-- this is a comment -->
3
DOCUMENT TYPE DEFINITION (DTD)
PREDEFINED ENTITIES
In XML, certain characters are used specifically for marking up the docu-
ment. For example, in the following element, the angle brackets (< >) and
forward slash (/) are interpreted as markup and not as actual character data.
The characters that are reserved for markup cannot be used as content.
If we intend to use these characters as displayed data, they must be escaped.
To escape a character, we must use an entity to insert the character into a
document. So, if the text <bookname> is entered in the document, we use
the following sequence.
&lt;BOOKNAME&gt;
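In practice the escaping is usually left to a library routine rather than
typed by hand. The sketch below uses the escape and unescape helpers from
Python's standard xml.sax.saxutils module; the para wrapper element and the
variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

literal = "<BOOKNAME>"

# escape() replaces &, < and > with their predefined entity references.
escaped = escape(literal)

# unescape() reverses the substitution.
round_trip = unescape(escaped)

# A parser performs the same reversal when it reads the escaped content.
parsed = ET.fromstring("<para>" + escaped + "</para>").text

print(escaped, round_trip, parsed)
```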
ATTRIBUTES
Attributes provide a method of associating values to an element without mak-
ing the attributes a part of the content of that element.
<PRICE CURRENCY="USD">315.00</PRICE>
The code snippet provides an example. Here, we can see that a currency
attribute can be added to the price element of the book document instead of
adding a separate currency element to the document.
The attribute in XML is used in the same way as an HTML attribute, but
we can define our own attribute names. One important point is that the value
of the attribute must be within single or double quotes.
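From the application's side, the attribute and the content arrive
separately. A minimal sketch with Python's standard ElementTree module
shows the PRICE element from the snippet above being read; the variable
names are our own.

```python
import xml.etree.ElementTree as ET

price = ET.fromstring('<PRICE CURRENCY="USD">315.00</PRICE>')

currency = price.get("CURRENCY")  # the attribute value
amount = price.text               # the element content

# An attribute that is not present simply yields None.
missing = price.get("DISCOUNT")

print(currency, amount, missing)
```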
VALID DOCUMENTS
The DTD (Document Type Definition) specified in the prolog outlines all
the rules for the document. A valid document must obey the rules specified
in the DTD. A valid document also obeys all the validity constraints identified
in the XML specification.
The processor must understand the validity constraints of the XML
specification and check the document for possible violations. If the
processor finds any errors, it must report them to the XML application. The
processor must also read the DTD, validate the document against it, and
again report any violations to the XML application.
As all the above-mentioned processing and checking take time, and
because validation might not always be necessary, XML supports the concept
of a well-formed document.
WELL-FORMED DOCUMENTS
A document is described as well-formed if it meets the well-formedness con-
straints of the XML recommendation. Principally, this means it must have a
single root element and all the other elements must be correctly nested. If a
document is well formed, it can be correctly parsed by a computer program.
Well-formedness can reduce the amount of work a client has to do.
For example, if the server has already validated a document, it is not nec-
essary to burden the client with validating the document again. As a result,
well-formedness can save download time because the client does not need to
download the DTD, and it can save processing time as the DTD need not be
processed again.
In many cases, authoring a DTD or validating a document is unnecessary.
For example, someone at a small company might want to use XML to provide
structure to a departmental Website, but all the features that validation pro-
vides are not needed for the site.
According to the XML specifications, a well-formed document must meet
the following criteria:
●● A well-formed document must match the definition of a document. The
definition of a document is that it should contain one or more elements. It
contains exactly one root element, also called the document element, and
all other elements must be properly nested.
XML DOCUMENTS
An XML document is made up of the following parts:
●● An optional prolog
●● A document element, usually containing nested elements
●● Optional comments or processing instructions
The Prolog
The prolog of an XML document can contain the following items:
●● An XML declaration
●● Processing instructions
●● Comments
●● A Document Type Declaration
The XML declaration declares that the document is an XML document. The
version attribute is required, but the encoding and standalone attributes
are not. If the XML document relies on any external markup declarations
that set defaults for attributes or declare entities used in the document,
then standalone must be set to “no”.
PROCESSING INSTRUCTIONS
Processing instructions are used to pass parameters to an application. These
parameters tell the application how to process the XML document. For exam-
ple, the following processing instruction tells the application that it should
transform the XML document using the XSL stylesheet beatles.xsl.
<?xml-stylesheet href="beatles.xsl" type="text/xsl"?>
As shown above, processing instructions begin with <? and end with ?>.
COMMENTS
Comments can appear throughout an XML document. Like in HTML, they
begin with <!-- and end with -->.
<!-- This is a comment -->
The code snippet here conveys to the XML processor that the document
is of the class Catalog and conforms to the rules formed in the DTD file
named “book.dtd.”
The second structural element in an XML document is the document
element, where the actual content lies. Each XML document must have only
one root element, and all other elements must be completely enclosed in that
element. The document element contains all the data in an XML document.
This element can comprise any number of nested sub-elements and external
entities.
<?xml version="1.0"?>
<!DOCTYPE Book SYSTEM "Book.dtd">
<Book>
<Bookname>Paradise Lost</Bookname>
<Authorname>John Milton</Authorname>
</Book>
The code snippet given here shows the book element in Book.dtd. Here,
we can see that the element tags can include one or more optional or manda-
tory attributes that give further information about the elements they delimit.
Attributes can only be specified in the start tag.
<element.type.name attribute.name="attribute value">
The code snippet here gives the syntax for specifying an attribute. In
direct contrast to SGML and HTML, in which multiple declarations are
considered as errors, XML deals with multiple declarations of attributes in
a unique manner. If the DTD declares an element’s attribute list once with
one set of attributes and then again with a different set of attributes,
the two sets of attributes are merged. If the same attribute is declared
more than once, the first declaration for that attribute is the only one
that counts, and any later declarations are ignored.
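The merging rule can be observed through attribute defaults. The sketch
below assumes a parser that applies defaults declared in the internal DTD
subset, as the expat parser behind Python's ElementTree does; the lang and
edition attributes and their values are hypothetical. Both attribute lists
contribute, and for the attribute declared twice the first default is the
one that is used.

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE Book [
<!ATTLIST Book lang CDATA "en">
<!ATTLIST Book lang CDATA "fr" edition CDATA "1">
]>
<Book/>"""

book = ET.fromstring(document)

# The start tag specifies no attributes, so everything here is a default.
defaults = dict(book.attrib)

print(defaults)
```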
PARSERS
The W3C Recommendation also describes the behavior of parsers (XML
processors), the lower tier of XML’s architecture. This behavior has been
defined with the objective of easing the burden on the applications that
handle the XML data.
XML PROCESSING
The AttValue is then processed by removing any leading or trailing spaces and
converting the multiple spaces into single spaces. The exception to this rule
arises if the attribute value is declared as CDATA in the DTD and a validating
parser is used.
There are two approaches in implementing an XML parser. They are the
event-driven parsers and the tree-based parsers.
EVENT-DRIVEN PARSERS
In this approach of XML processing, namely the event-driven parser—the
model which is familiar to the programmers of modern GUIs and operating
systems—the parser executes a call-back to the application for each class of
XML data that includes an element with attributes, character data, processing
instructions, notation, or comments.
Data handling in XML depends on the application as data is provided
through the call-backs. The XML parser does not maintain the element tree
structure or any of the data after it has been parsed.
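The call-back style can be sketched with the SAX interface in Python's
standard library. The EventLogger class and the sample document are
assumptions for illustration; the handler method names are the ones that
xml.sax.ContentHandler defines. The parser keeps nothing itself, so the
handler records whatever the application wants to remember.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Collects the call-backs the parser makes, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Text may arrive in more than one chunk; skip whitespace-only chunks.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


handler = EventLogger()
xml.sax.parseString(
    b'<Book lang="en"><Bookname>Paradise Lost</Bookname></Book>', handler
)

for event in handler.events:
    print(event)
```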
TREE-BASED PARSERS
The most widely used structure in software engineering is the simple hierar-
chical tree.
In this approach, the well-formed documents are defined as a tree, and
common and mature algorithms are used to traverse the nodes of an XML
document.
This approach conforms to the Document Object Model as specified
by W3C. The DOM is a platform and language neutral interface that allows
manipulation of tree-structured documents.
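For comparison, here is a minimal tree-based sketch using minidom, a small
DOM implementation in Python's standard library. The walk function and the
replacement author string are assumptions for illustration. Unlike the
event-driven approach, the whole tree stays in memory, so it can be
traversed in any order and modified after parsing.

```python
from xml.dom import minidom

document = minidom.parseString(
    "<Book><Bookname>Paradise Lost</Bookname>"
    "<Authorname>John Milton</Authorname></Book>"
)
root = document.documentElement


def walk(node, depth=0, lines=None):
    """Depth-first traversal that records element names with indentation."""
    if lines is None:
        lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
    for child in node.childNodes:
        walk(child, depth + 1, lines)
    return lines


outline = walk(root)
print("\n".join(outline))

# Edit the text node under Authorname, then serialize the modified tree.
root.getElementsByTagName("Authorname")[0].firstChild.data = "J. Milton"
print(root.toxml())
```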
MSXML, a Java-based XML parser, was developed by Microsoft. XML support
was later included as part of Internet Explorer 5 with a different parser.
XML PARSER
All modern browsers have a built-in XML parser. An XML parser converts an
XML document into an XML DOM object—which can then be manipulated
with JavaScript.
EXAMPLE DTD
The following example demonstrates what a DTD could look like:
<!ELEMENT tutorials (tutorial)+>
<!ELEMENT tutorial (name, url)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT url (#PCDATA)>
<!ATTLIST tutorials type CDATA #REQUIRED>
DTD <!DOCTYPE>
If you’ve had the opportunity to view some XML documents, you may have
noticed a line starting with <!DOCTYPE appearing near the top of the docu-
ment. For example, if you’ve viewed the source code of a (valid) XHTML file,
you may have seen a line like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.abc.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
DOCTYPE SYNTAX
To use a DTD within your XML document, you need to declare it. The DTD
can either be internal or external (located in another document). You
declare a DTD at the top of your XML document (in the prolog) using the
<!DOCTYPE declaration. The basic syntax is
<!DOCTYPE rootname [DTD]>
where rootname is the root element, and [DTD] is the actual definition.
Actually, there are slight variations depending on whether your DTD is
internal or external (or both), public or private.
Table 3.2 DOCTYPE Declaration Syntax
DOCTYPE Variation: <!DOCTYPE rootname [DTD]>
Description: This is an internal DTD (the DTD is defined between the
square brackets within the XML document).
Example:
<!DOCTYPE tutorials [
<!ELEMENT tutorials (tutorial)+>
<!ELEMENT tutorial (name,url)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT url (#PCDATA)>
<!ATTLIST tutorials type CDATA #REQUIRED>
]>
DOCUMENT VALIDATION
A well-formed document written using implicit rules cannot be checked for
errors. We rely on the integrity of the applications that create and
consume the XML for the integrity of the overall system. Errors in the code
cannot be caught. They could either cause the program to break or lead to
serious errors. This is the reason that the W3C specifies the behavior of a
validating parser. If an XML document refers to a DTD, a validating parser
is required to retrieve the DTD and ensure that the document conforms to
the grammar that the DTD describes.
To check errors, simply use DTDs and a validating parser. The parser will
check for errors in the document syntax, vocabulary, and any specified values.
After the parser has validated the document, the document can be passed
on to the application logic. Validation does not protect the document from
faulty application logic, but it filters out the bad data. This is
particularly important in the case of Internet applications.
One cannot assume that quality control over the application and the code
written for it is the same everywhere. A programming team working for one
organization might be implementing a public XML vocabulary for a
particular business. Its interpretation of the vocabulary may not be the
same as another team’s. The same applies to testing. But with a DTD and a
validating parser, we can have an immediate and effective check of the
document’s integrity. This check depends on the DTD. With this in mind, we
now delve into the principles needed to write effective DTDs.
<name>
<firstname>George</firstname>
<lastname>Harrison</lastname>
</name>
</beatle>
<beatle link="http://www.ringostarr.com">
<name>
<firstname>Ringo</firstname>
<lastname>Starr</lastname>
</name>
</beatle>
<beatle link="http://www.webucator.com" real="no">
<name>
<firstname>Nat</firstname>
<lastname>Dunn</lastname>
</name>
</beatle>
</beatles>
CREATING DTDS
DTDs are simple text files that can be created with any basic text editor.
Although they look a little cryptic at first, they are not terribly complicated
once you get used to them.
A DTD outlines what elements can be in an XML document and the
attributes and sub-elements that they can take. Let’s start by taking a look at
a complete DTD and then dissecting it.
INTERNAL DTD
Whether you use an external or internal DTD, the actual syntax for the DTD
is the same—the same code could just as easily be part of an internal DTD or
an external one. The only difference between internal and external is in the
way it’s declared with DOCTYPE.
Using an internal DTD, the code is placed between the DOCTYPE tags
(i.e., between <!DOCTYPE tutorials [ and ]>).
EXTERNAL DTD
An external DTD is one that resides in a separate document. To use the DTD,
you need to link to it from your XML document by providing the URI of the
DTD file. This URI is typically in the form of a URL. The URL can point to
a local file using a relative reference or a remote one (i.e., using HTTP) using
an absolute reference.
<tutorials>
<tutorial>
<name>XML Tutorial</name>
<url>http://www.abc.com/xml/tutorial</url>
</tutorial>
<tutorial>
<name>HTML Tutorial</name>
<url>http://www.abc.com/html/tutorial</url>
</tutorial>
</tutorials>
COMBINED DTD
You can use both an internal DTD and an external one at the same time. This
could be useful if you need to adhere to a common DTD, but also need to
add your own definitions locally.
Example
This is an example of using both an external DTD and an internal one for the
same XML document. The external DTD resides in “tutorials.dtd” and is called
first in the DOCTYPE declaration. The internal DTD follows the external
one, but still resides within the DOCTYPE declaration:
<?xml version="1.0" standalone="no"?>
<!DOCTYPE tutorials SYSTEM "tutorials.dtd" [
<!-- Assumes tutorials.dtd declares tutorials, name, and url, but not tutorial -->
<!ELEMENT tutorial (name, url, summary)>
<!ELEMENT summary (#PCDATA)>
]>
<tutorials>
<tutorial>
<name>XML Tutorial</name>
<url>http://www.abc.com/xml/tutorial</url>
<summary>Best XML tutorial on the web!</summary>
</tutorial>
<tutorial>
<name>HTML Tutorial</name>
<url>http://www.abc.com/html/tutorial</url>
<summary>Best HTML tutorial on the web!</summary>
</tutorial>
</tutorials>
DTD ELEMENTS
Creating a DTD is straightforward. It’s really just a matter of declaring your
elements, attributes, and/or entities.
To define an element in your DTD, you use the <!ELEMENT> declaration.
The actual contents of your <!ELEMENT> declaration will depend on the syntax
rules you need to apply to your element.
BASIC SYNTAX
The <!ELEMENT> declaration has the following syntax:
<!ELEMENT element_name content_model>
PLAIN TEXT
If an element should contain plain text, you define the element using
#PCDATA. PCDATA stands for Parsed Character Data and it is the way you
specify non-markup text in your DTDs.
Using this example - <name>XML Tutorial</name> - the “XML Tutorial”
part is the PCDATA. The other part consists of the markup.
Syntax
<!ELEMENT element_name (#PCDATA)>
Example
<!ELEMENT name (#PCDATA)>
The above line in your DTD allows the “name” element to contain non-
markup data in your XML document:
<name>XML Tutorial</name>
UNRESTRICTED ELEMENTS
If it doesn’t matter what your element contains, you can create an element
using the content_model of ANY. Note that doing this removes all checking of
the element’s content (any mix of text and declared elements is accepted), so
you should avoid using this if possible. You’re better off defining
a specific content model.
Syntax
<!ELEMENT element_name ANY>
Example
<!ELEMENT tutorials ANY>
EMPTY ELEMENTS
You might remember that an empty element is one that has no content, so it
needs no separate closing tag. For example, in XHTML, the <br/> and <img/>
tags are empty elements. Here’s how you define an empty element:
Syntax
<!ELEMENT element_name EMPTY>
Example
<!ELEMENT header EMPTY>
The above line in your DTD defines the following empty element for
your XML document:
<header/>
CHILD ELEMENTS
You can specify that an element must contain another element by providing
the name of the element it must contain. Here’s how you do that:
Syntax
<!ELEMENT element_name (child_element_name)>
Example
<!ELEMENT tutorials (tutorial)>
The above line in your DTD requires the “tutorials” element to contain exactly
one instance of the “tutorial” element in your XML document:
<tutorials>
<tutorial></tutorial>
</tutorials>
When defining child elements in DTDs, you can specify how many times
those elements can appear by adding a modifier after the element name. If
no modifier is added, the element must appear once and only once. The other
options are shown in the table below.
Table 3.3 List of Modifiers
Modifier Description
? Zero or one times.
+ One or more times.
* Zero or more times.
OTHER ELEMENTS
The other elements are declared in the same way as the document element—
with the <!ELEMENT> declaration. The Beatles DTD declares four addi-
tional elements.
Each beatle element must contain a child element name, which must
appear once and only once.
<!ELEMENT beatle (name)>
CHOICE OF ELEMENTS
It is also possible to indicate that one of several elements may appear as a
child element. For example, the declaration below indicates that an img ele-
ment may have a child element name or a child element id, but not both.
<!ELEMENT img (name|id)>
EMPTY ELEMENTS
Empty elements are declared as follows.
<!ELEMENT img EMPTY>
MIXED CONTENT
Sometimes elements can have elements and text intermingled. For example,
the following declaration is for a body element that may contain text in addi-
tion to any number of link and img elements.
<!ELEMENT body (#PCDATA | link | img)*>
Syntax
<!ELEMENT element_name (child_element_name, child_element_name,...)>
Example
<!ELEMENT tutorial (name, url)>
The above line in your DTD allows the “tutorial” element to contain one
instance of the “name” element and one instance of the “url” element in your
XML document:
<tutorials>
<tutorial>
<name></name>
<url></url>
</tutorial>
</tutorials>
The first rule declares property to be inline, in italic, and navy; the second
rule, with its comma-separated list of selectors, declares all the other ele-
ments to be block-level (with a bit of a margin added in the end).
Finally, the title is given a larger font size than the rest of the text. The
presentation of the document can be further improved by adding more rules
to the stylesheet.
Operator Syntax Description
, a, b a followed by b
| a | b either a or b, but not both
() (expression) An expression surrounded by parentheses is
treated as a unit and can take any one of the
following suffixes: ?, *, or +.
Zero or More
To allow zero or more of the same child element, use an asterisk (∗).
Syntax
<!ELEMENT element_name (child_element_name*)>
Example
<!ELEMENT tutorials (tutorial*)>
One or More
To allow one or more of the same child element, use a plus sign (+).
Syntax
<!ELEMENT element_name (child_element_name+)>
Example
<!ELEMENT tutorials (tutorial+)>
Zero or One
To allow either zero or one of the same child element, use a question mark (?).
Syntax
<!ELEMENT element_name (child_element_name?)>
Example
<!ELEMENT tutorials (tutorial?)>
Choices
You can define a choice between one or another element by using the pipe (|)
operator. For example, if the “tutorial” element requires a child called either
“name,” “title,” or “subject” (but only one of these), you can declare it as
follows.
Syntax
<!ELEMENT element_name (choice_1|choice_2|choice_3)>
Example
<!ELEMENT tutorial (name|title|subject)>
Mixed Content
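You can use the pipe (|) operator to specify that an element can contain both
PCDATA and other elements. In this case, #PCDATA must come first in the group,
and the group must be followed by an asterisk (*).
Syntax
<!ELEMENT element_name (#PCDATA | child_element_name)*>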
Example
<!ELEMENT tutorial (#PCDATA|name|title|subject)*>
Syntax
<!ELEMENT element_name (child_element_name dtd_operator, child_
element_name dtd_operator,...)>
Example
<!ELEMENT tutorial (name+, url?)>
The above example allows the “tutorial” element to contain one or more
instances of the “name” element, and zero or one instance of the “url” element.
SUBSEQUENCES
You can use parentheses to create a subsequence (i.e., a sequence within a
sequence). This enables you to apply DTD operators to a subsequence.
Syntax
<!ELEMENT element_name ((sequence) dtd_operator, sequence)>
Example
<!ELEMENT tutorial ((author, rating?)+, name, url*)>
The above example specifies that the “tutorial” element can contain one
or more “author” elements, with each occurrence having an optional “rating”
element, followed by exactly one “name” element and zero or more “url”
elements.
In the same way, a declaration such as <!ELEMENT beatles (beatle+)> states
that the “beatles” element must contain one or more “beatle” elements.
LOCATION OF MODIFIER
The location of modifiers in a declaration is important. If the modifier is out-
side of a set of parentheses, it applies to the group; if the modifier is immedi-
ately next to an element name, it applies only to that element. The following
examples illustrate.
In the example below, the body element can have any number of inter-
spersed child link and img elements.
<!ELEMENT body (link|img)*>
In the example below, the body element can have any number of child
link elements or any number of child img elements, but it cannot have both
link and img elements.
<!ELEMENT body (link*|img*)>
In the example below, the body element can have any number of child
link and img elements, but they must come in pairs, with the link element
preceding the img element.
<!ELEMENT body (link, img)*>
In the example below, the body element can have any number of child
link elements followed by any number of child img elements.
<!ELEMENT body (link*, img*)>
XML CDATA
All text in an XML document is parsed by the parser, but text inside a CDATA
section is ignored by the parser.
The parser does this because XML elements can contain other elements,
as in this example, where the <name> element contains two other elements
(first and last):
<name><first>Bill</first><last>Gates</last></name>
A “<” character in element text will generate an error because the parser
interprets it as the start of a new element. “&” will generate an error because
the parser interprets it as the start of a character entity.
Some text, like JavaScript code, contains a lot of “<” or “&” characters.
To avoid errors, script code can be defined as CDATA. Everything inside
a CDATA section is ignored by the parser. A CDATA section starts with
"<![CDATA[" and ends with "]]>":
<script>
<![CDATA[
function matchwo(a,b)
{
if (a < b && a < 0)
{
return 1;
}
else
{
return 0;
}
}
]]>
</script>
STANDALONE ATTRIBUTE
There is one further variation to be considered before we further discuss how
to provide declarations. The XML declaration can have a standalone attribute,
whose value must be either “yes” or “no” in lowercase. The value “no” signals
that the document may depend on declarations in an external DTD. The standalone
attribute is, however, seldom seen in practice. The declaration below
shows the standalone attribute.
<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE Catalog…
DOCTYPE DECLARATION
The DOCTYPE declaration formally consists of the keyword DOCTYPE followed by
the name of the document’s root element (in our example, Catalog). This is
followed by an optional external identifier, which is again followed by an
optional block of markup declarations.
The external identifier locates the external DTD (external subset). The
markup declaration block contains the markup declarations of the internal subset.
Internal DTDs are very useful for simple vocabularies and when prototyping
a markup. An internal DTD, however, adds substantial size to the document. The
declarations must be transmitted with the document even if the consumer of the
document does not intend to validate it.
Sometimes, programmers might feel the need to use both the internal
as well as external DTD. In such cases, the internal DTD adds declarations.
Nonetheless, when an internal DTD declares an entity or attribute that is also
declared in the external DTD, the internal declaration takes precedence, because
the internal subset is read first. This permits some fine-tuning of the
declarations for a particular document’s needs, but care must be taken: the more
we override the external DTD, the more it starts to lose relevance, which is a
sign of poor initial design.
EXTERNAL DTDS
An external DTD is more flexible in certain aspects. In this case, the
DOCTYPE declaration comprises the usual keyword and the root element
name, followed by another keyword denoting the source of the external DTD,
which is then followed by the location of that DTD.
The keyword can either be SYSTEM or PUBLIC.
If the keyword is SYSTEM, a URL directly and explicitly locates
the DTD. Thus, the parser should be able to find the DTD given the URL
alone. Hence, what follows SYSTEM is a quoted URL naming the DTD file. The
URLs used to locate DTDs should not contain fragment identifiers, that is,
the character # followed by a name, as XML 1.0 indicates that parsers may
signal an error if the URL contains such an identifier.
<!DOCTYPE Catalog SYSTEM "http://myserver/Catalog.dtd">
<!DOCTYPE Catalog SYSTEM "http://www.universallibrary.org/Catalog.dtd">
The keywords associated with these declarations and their meanings are
shown in the table. The first two declarations deal with the information found
in an XML document element, namely ELEMENTS and ATTRIBUTES.
The last two types could be considered supporting players. Entities in
particular are designed to make an XML vocabulary designer’s life easier. They
normally consist of content that recurs in the DTD or document often enough to
warrant creating a special declaration. Notations deal with content other than XML.
A notation is used to declare a particular class of data and associate it with
an external program. That external program becomes the handler for the
declared class of data.
it is text, it need not be XML. If the replacement content is not XML, there is
no need to run a parser on it. On the other hand, a parsed entity is XML that
is pasted into the document content, so it must be passed through the parser.
PREDEFINED ENTITIES
XML reserves some characters, such as the angle brackets, for its own use.
In addition, some characters are unprintable. XML therefore provides
some predefined entities so that authors can use these characters in their
documents without conflict. Hence, in the text content of an element, for
example, characters that might otherwise be confused with the markup by the
document processor during parsing can be referred to without typing them directly.
Any character can be referred to by a numeric reference. This is done
by writing the characters &# followed immediately by the decimal numeric value
of the character and a semicolon. For example, the “greater than” symbol is
written as &#62;.
Some characters are so prevalent in XML that XML provides predefined
entities for them: &lt; (<), &gt; (>), &amp; (&), &quot; ("), and &apos; (').
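The short document below is a sketch of both kinds of reference in use. The element names note and line are placeholders, not names used elsewhere in this chapter.

```xml
<?xml version="1.0"?>
<!-- The element names "note" and "line" are placeholders for this sketch -->
<note>
  <!-- Predefined entities stand in for the reserved characters -->
  <line>5 &lt; 7 &amp;&amp; 7 &gt; 5</line>
  <!-- Numeric references: decimal 62 and hexadecimal 3E both mean the same character -->
  <line>7 &#62; 5 and 7 &#x3E; 5</line>
</note>
```

After parsing, an application sees the literal characters <, >, and & in the text of each line element.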
GENERAL ENTITIES
General entities allow us to declare a piece of parsed text associated with a
name by which we shall refer to the text. The entity is declared with the key-
word ENTITY, a name, and a replacement value.
With this declaration in place, we can plug in the copyright text anywhere
in a document’s content when we need it simply by referring to the name
“copyright.” Of course, the parser needs to be told when we are making an
entity reference so that it will not confuse the entity name with the markup
text. To signal this intent, we delimit the name with an ampersand in front of
the name and a semicolon following. There cannot be any white space between
the name and its delimiters.
The ampersand character is reserved for this role in XML. If we need to
use an ampersand for something else in a document, we must use the pre-
defined entity for the character.
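The declaration this passage refers to is not reproduced here, so the following is only a sketch of what an internal general entity named copyright, and a reference to it, might look like. The replacement text and the element names are assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE Catalog [
  <!ELEMENT Catalog (Notice)>
  <!ELEMENT Notice (#PCDATA)>
  <!-- General entity: a name plus its replacement text (the text is assumed) -->
  <!ENTITY copyright "Copyright 2020. All rights reserved.">
]>
<Catalog>
  <!-- The entity reference is replaced by its text; the predefined amp entity yields a literal ampersand -->
  <Notice>&copyright; Terms &amp; conditions apply.</Notice>
</Catalog>
```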
General entities also have an external form, where the replacement text is
given in an external file. The declaration takes the following form.
<!ENTITY Entity1 SYSTEM "http://www.vvco.com/boilerplate/copyrighttext.txt">
PARAMETER ENTITIES
Parsed entities that are used solely within the DTD are called parameter
entities.
Parameter entities allow the user to easily reference or change commonly
used constructs in the DTD by keeping them in one place.
This is easier than changing a construct everywhere it appears in a DTD,
but it still must be edited when a construct is extended.
The keyword CDATA refers to character data. In the example used here, the
replacement text of the parameter entity is part of an attribute list
declaration containing three common attributes. It is processed as if it had
been written into the DTD. Whenever this set of attributes turns up in the DTD,
we can simply refer to the entity peopleParameters.
All the parameter entities must be declared before they are referred to in
the DTD.
This means that the parameter entity declared in the external subset of
the DTD cannot be referred to in the internal subset as the latter is read first
by the parser, thus, the reference will be seen before the declaration.
A parameter entity reference consists of the name delimited by a percent
sign in front of the name and a semicolon following. There cannot be any
white space between the delimiters and the name.
<!ATTLIST InsuredPerson
age CDATA #IMPLIED
weight CDATA #IMPLIED
height CDATA #REQUIRED
carrier CDATA #REQUIRED>
Thus, once the reference is expanded, the InsuredPerson element is declared
to have four attributes: carrier, which is explicitly declared, and the other
three, namely age, weight, and height, which appear in the parameter entity and
are declared when the parser substitutes the replacement text for the entity
reference.
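The parameter entity declaration itself is not reproduced in this passage. As a sketch, assuming the entity is named peopleParameters as in the text and carries the age, weight, and height definitions, an external DTD file could read as follows. The sketch also assumes an external DTD on purpose: XML 1.0 does not allow a parameter entity reference inside a declaration in the internal subset. The EMPTY content model for InsuredPerson is an assumption.

```xml
<!-- Parameter entity: declared with a percent sign between ENTITY and the name -->
<!ENTITY % peopleParameters
  "age    CDATA #IMPLIED
   weight CDATA #IMPLIED
   height CDATA #REQUIRED">

<!ELEMENT InsuredPerson EMPTY>
<!-- The reference below expands to the three attribute definitions above -->
<!ATTLIST InsuredPerson
  %peopleParameters;
  carrier CDATA #REQUIRED>
```

The declaration comes before the reference, which satisfies the rule that parameter entities must be declared before they are used.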
All the rules for well-formed documents apply to parameter entities. The
document must be well-formed after the replacement text has been substi-
tuted for the entity reference.
Just as in the case of general entities, parameter entities can also have a
replacement text that resides in an external file.
CONTENT MODEL
A content model is a specification of the internal structure of an element’s
content.
CARDINALITY OPERATORS
The operators seen so far say nothing about cardinality, that is, how many
instances of an element type are permitted. The table shows the cardinality
operators.
Table 3.7 List of Cardinality Operators
Cardinality Operators Meaning
? Optional; may or may not appear
∗ Zero or more
+ One or more
This content model group says that the basket contains one or more
instances of the element type Cherry, followed by zero or more instances
of the choice between Apple and Orange. Note that all the Cherry elements must
appear together, before any Apple or Orange element. This would lead to an
instance as shown.
<FruitBasket>
<Cherry>…</Cherry>
<Cherry>…</Cherry>
<Apple>…</Apple>
<Orange>…</Orange>
<Orange>…</Orange>
</FruitBasket>
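The content model described above is not shown in the text. Reconstructed from the description, it would plausibly be declared as follows; the content models of the three child elements are assumptions.

```xml
<!-- One or more Cherry, then any number of Apple or Orange in any order -->
<!ELEMENT FruitBasket (Cherry+, (Apple | Orange)*)>
<!-- Child content models below are assumed for the sketch -->
<!ELEMENT Cherry (#PCDATA)>
<!ELEMENT Apple (#PCDATA)>
<!ELEMENT Orange (#PCDATA)>
```

The asterisk sits outside the parentheses, so it applies to the whole choice group, which is why Apple and Orange may be interleaved.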
ATTRIBUTES
Attributes complement and modify elements by means of associating simple
properties with elements. Attributes are a rich feature in XML that allows us
to include a significant amount of information.
In HTML, SRC is an attribute in the IMG tag. Attributes are declared in
XML using the ATTLIST declaration. Each element that has attributes declared
for it will have at least one ATTLIST that declares the attributes for the
element.
The ATTLIST declaration consists of the ATTLIST keyword followed by
the element to which the attribute applies, followed by zero or more attribute
definitions. For readability purposes, it is better to place the attribute defini-
tion on a separate line.
Each attribute definition consists of the name of the attribute, its type,
and a default definition.
DEFAULT VALUES
There are four defaults for attribute declarations. They are shown in the table.
Table 3.8 List of Attribute Defaults
Attribute Defaults Meaning
#REQUIRED Attribute must appear on every instance of the element.
#IMPLIED Attribute may optionally appear on an instance of the element.
#FIXED plus default value Attribute must have the default value; if the attribute does not appear, the value is assumed by the parser.
Default value only If the attribute does not appear, the default value is assumed by the parser. If the attribute appears, it may have another value.
From the example, we can see that the declaration of the color attribute
gave us a default value that is blue. In the first instance, this has been explicitly
declared, but left off in the second instance of the element. A parser would
treat both as having a value of blue for the attribute color.
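The example referred to is not reproduced here. A sketch of the situation described, with a color attribute whose default is blue, might look like this; the element names Garage and Car are assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE Garage [
  <!ELEMENT Garage (Car+)>
  <!ELEMENT Car EMPTY>
  <!-- Default value only: color is "blue" when the attribute is omitted -->
  <!ATTLIST Car color CDATA "blue">
]>
<Garage>
  <Car color="blue"/>  <!-- value given explicitly -->
  <Car/>               <!-- value supplied by the parser from the DTD -->
</Garage>
```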
ATTRIBUTE TYPES
The attribute type specifies what kind of value the attribute may take. The
table shows the various attribute types and their meanings.
Table 3.9 List of Attribute Types and Meanings
Attribute Types Meaning
CDATA Character data (string)
ID Name unique within a given document
IDREF Reference to some element bearing an ID attribute possessing the same value as the IDREF attribute
IDREFS Series of IDREFs delimited by white space
ENTITY Name of an unparsed external entity declared in the DTD
ENTITIES Series of ENTITY names delimited by white space
NMTOKEN A name token
NMTOKENS A series of NMTOKENs delimited by white space
NOTATION Accepts one of a set of names indicating notation types declared in the DTD
[Enumerated Value] Accepts one of a series of explicitly user-defined values that the attribute can take on
CDATA
Eventually, all the content turns up as text. When there is an attribute type
whose value consists of just text, it may be declared as CDATA.
The value of the attribute could be any character data string of any length.
The only restriction is that the attribute value cannot contain markup. An
example is shown. As long as the attribute value is simple text, the parser will
accept it as valid.
<!ATTLIST SomeCol someText CDATA #IMPLIED>
<SomeCol someText="This is a valid text">…</SomeCol>
ID
The ID attribute type will have a value that is a unique identifying name.
The value of the ID attribute must be a valid XML name and must be unique
throughout the document. This allows us to uniquely name an element. No element
type can have more than one attribute of type ID.
The attribute default must be #IMPLIED or #REQUIRED but never
#FIXED or a default value. It makes no sense to provide a default value,
especially a fixed default, for an ID, as that would violate the uniqueness
constraint.
What can we do with an ID type attribute to make it useful? Refer to it,
of course. It can be used to model a one-to-one relationship between two
objects modeled by elements in our vocabulary. As the example shows, the
declaration attaches a personal identification number to a person’s details
within a file as a unique identifier.
<!ATTLIST Person
PIN ID #REQUIRED>
IDREF
The IDREF type allows us to create links and cross references within the
document. The values of IDREF must meet the same conditions as ID. They must
also be the same as the value of some ID attribute within the document.
We cannot use an IDREF to point to an element that is not within the
document. Within a document, we can use ID and IDREF to cross reference
information instead of repeating it. If a DTD contained the declaration of the
Person element shown earlier, we can have the declarations shown below
elsewhere in the DTD.
<!ELEMENT AccountHolder EMPTY>
<!ATTLIST AccountHolder
id IDREF #REQUIRED>
…
</Person>
…
<AccountHolder>
The IDREFS type is used when we want to link to many other elements.
It allows us to model one-to-many relationships. The value of this attribute
is a series of ID values separated by white space. The individual IDs must
meet the ID type constraints and must match up with ID attribute values
elsewhere in the document.
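The three types can be seen working together in the sketch below. Person and AccountHolder follow the declarations in the text; the Bank root element, the JointAccount element, and its holders attribute are assumptions added for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE Bank [
  <!ELEMENT Bank (Person+, AccountHolder+, JointAccount*)>
  <!ELEMENT Person (#PCDATA)>
  <!ATTLIST Person PIN ID #REQUIRED>
  <!ELEMENT AccountHolder EMPTY>
  <!ATTLIST AccountHolder id IDREF #REQUIRED>
  <!-- JointAccount and its holders attribute are assumed for this sketch -->
  <!ELEMENT JointAccount EMPTY>
  <!ATTLIST JointAccount holders IDREFS #REQUIRED>
]>
<Bank>
  <!-- ID values must be XML names, so they cannot begin with a digit -->
  <Person PIN="p1001">Paul</Person>
  <Person PIN="p1002">John</Person>
  <AccountHolder id="p1001"/>              <!-- one-to-one link -->
  <JointAccount holders="p1001 p1002"/>    <!-- one-to-many link -->
</Bank>
```

A validating parser would reject the document if any id or holders value did not match a PIN somewhere in the same document.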
ENTITY
Entities are used within the declarations of the attributes for efficiency and
reuse. If there is a construction that appears many times, we can declare an
entity representing the construction and then refer to it, whenever the con-
struction is needed. ENTITY is therefore referred to as replaceable content.
Entities may also be used to include unparsed entities as valid attribute
values. This is exactly the mechanism by which a document’s author can point
to data other than the XML markup.
For example, if we want to include some non-XML data, such as an image, we can
do this with an unparsed entity as shown. We start by declaring the entity;
the NDATA keyword names the notation that identifies its format.
<!ENTITY Turnover_chart SYSTEM "Turnover_chart.gif" NDATA gif>
ENTITY, ENTITIES
To use an ENTITY as an attribute type, four things need to be done. Of these
four, three are declarations in the DTD. The fourth involves a specific docu-
ment instance. They are declaring a notation, declaring one or more entities
for use with the attribute, declaring an attribute of type ENTITY for some
element, and creating an instance of the element type in a document, and
providing the attribute and an entity name as the value.
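The four steps can be sketched in one small document. The entity name Turnover_chart comes from the text; the handler identifier in the notation, the Report and Chart elements, and the source attribute are assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE Report [
  <!-- 1. Declare the notation (the handler identifier is an assumption) -->
  <!NOTATION gif SYSTEM "image/gif">
  <!-- 2. Declare the unparsed entity that uses the notation -->
  <!ENTITY Turnover_chart SYSTEM "Turnover_chart.gif" NDATA gif>
  <!-- 3. Declare an attribute of type ENTITY (element and attribute names assumed) -->
  <!ELEMENT Report (Chart)>
  <!ELEMENT Chart EMPTY>
  <!ATTLIST Chart source ENTITY #REQUIRED>
]>
<!-- 4. Use the bare entity name, with no ampersand or semicolon, as the attribute value -->
<Report>
  <Chart source="Turnover_chart"/>
</Report>
```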
NMTOKEN, NMTOKENS
Sometimes we might want to treat the value of an attribute as a distinct token
rather than text and want to leave the list of values open. In such a case, we
can use name tokens, declared with the types NMTOKEN and NMTOKENS.
Similar to IDREFS and ENTITIES, attributes of type NMTOKENS can
be declared, and they have values comprised of multiple name tokens. Each
item must be a valid name token and items must be separated by white space.
Name tokens are made up of the same characters as the names discussed for
elements: letters, digits, and punctuation marks like colon, underscore,
period, and hyphen. They are free of one restriction, however: unlike element
and attribute names, any of these characters can be used as the first character
of an NMTOKEN. The following shows an example of an NMTOKEN attribute.
<!ATTLIST Employee
security_level NMTOKEN #REQUIRED>
<Employee security_level="trusted">…
NOTATION
An XML parser is not set to deal with binary data formats. To overcome this
problem, notations are used which identify the format of external data items
that we would want to link to XML documents.
We need the notation declaration to declare a name for the notation and
associate the name with an external handler. The parser refers the foreign
data to the handler for processing.
The notation declaration locates the handler in the same manner as the
DOCTYPE declaration locates a DTD file. The identifier can be PUBLIC or SYSTEM,
and it must identify the external handler. The figure shows an example of
a NOTATION declaration.
With notations, XML documents can be used as the unifying document
of a collection of dissimilar data types. This is useful for legal reports, medi-
cal reports, and multimedia presentations. XML, however, only provides the
minimal set of tools. Considerable effort is needed to build the proper presen-
tation semantics into an application.
<!ATTLIST Image type NOTATION (gif | jpg) "gif">
<Image type="jpg">…
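For an attribute of type NOTATION to be valid, every name in its list must be declared as a notation. A minimal sketch follows, assuming the element is called Image and using media-type strings as handler identifiers; both the identifiers and the element content are assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE Image [
  <!-- Each notation name is tied to a handler; the identifiers here are assumed -->
  <!NOTATION gif SYSTEM "image/gif">
  <!NOTATION jpg SYSTEM "image/jpeg">
  <!ELEMENT Image (#PCDATA)>
  <!-- type must name one of the declared notations; "gif" is the default -->
  <!ATTLIST Image type NOTATION (gif | jpg) "gif">
]>
<Image type="jpg">paul.jpg</Image>
```

The element is deliberately not declared EMPTY, since XML 1.0 does not permit a NOTATION attribute on an element declared EMPTY.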
ENUMERATIONS
Name tokens are open ended. The format of values of NMTOKEN and
NMTOKENS is restricted by name rules; otherwise, the set of permissible
values is open. In many cases, we have a small set of character string values
that we want to be permitted, such as yes or no. Enumerations are useful
for such cases.
The enumeration attribute is declared by placing a group of values where
the type keyword appears. The group consists of parentheses enclosing the
permitted values separated by the pipe symbol (|). The values are not enclosed
by quotation marks and, like names in XML, are case sensitive.
The instance of an attribute in the document must include only one of
the permitted values as it appears in the attribute declaration. Like any other
attribute value, the enumerated value should be enclosed by quotation marks.
<!ATTLIST Employee
manager (yes|no) #REQUIRED>
<!ATTLIST ClassifiedDoc
security_level (unclassified | secret | Top_secret) #REQUIRED>
In the first case, only the values yes and no are allowed. YES, No, and
maybe will all be rejected as invalid. It is important to respect case
sensitivity and to use the values exactly as they are given in the enumeration
declaration. When composing an enumeration for values that may be manually
entered by a user, all the variations produced by modifying the case of the
values must be considered.
DECLARING ATTRIBUTES
Attributes are declared using the <!ATTLIST > declaration. The syntax is
shown below.
<!ATTLIST ElementName
AttributeName AttributeType State DefaultValue?>
CONDITIONAL SECTIONS
Conditional sections are parts of a DTD that are processed by the parser
only if certain conditions are met. But in DTDs, this feature is restricted;
there is no conditional expression to be evaluated at runtime. DTDs include
only two forms, marked by the keywords INCLUDE and IGNORE.
LIMITATIONS OF DTDS
DTDs have propelled XML through its early adoption phase. However, they
suffer from a few limitations. They use a syntax all of their own, distinct from
that of document instances. It would also be beneficial if XML parsers
could give an application easy access to the declarations in the DTDs they
process, but we cannot use parsers to build dynamic DTDs.
DTDs are closed constructs. The rules of an XML vocabulary are wholly contained
in the DTD. The DTD contains only the vocabulary and nothing else. There
is no simple and clear way to promote extensibility in DTDs.
DTDs also lack datatype information. The only tool that is provided is the
notation. This does little to allow us to define our own types based on existing
types.
or using a nested element. In this case, there are no rules and we are free to
choose the way we want either using an attribute or using a nested element.
The table gives the pros and cons of each approach.
Table 3.10 Pros and Cons of Using an Attribute or Using a Nested Element
XML Attributes
Advantages
●● DTD can constrain the values; useful when there is a small set of allowed values, such as “yes” or “no.”
●● DTD can define a default value
●● ID and IDREF validation
●● Lower source overhead (makes a difference when sending gigabytes of data over the network)
●● White space normalization available for certain data types, which saves the application some parsing effort
●● Easier to process through DOM and SAX interfaces
Disadvantages
●● Simple string values. No support for metadata (or attributes of attributes).
●● Unordered
Child elements
Advantages
●● Support arbitrarily complex values and repeating values
●● Ordered
●● Support “attributes of attributes”
●● Extensible when data model changes
Disadvantages
●● Slightly higher space usage
●● More complex programming
Syntax
<!--DTD is on the same system as the XML document-->
<!DOCTYPE beatles SYSTEM "dtds/beatles.dtd">
Syntax
<!--DTD is publicly available-->
<!DOCTYPE beatles PUBLIC "-//Webucator//DTD Beatles 1.0//EN"
"http://www.webucator.com/beatles/DTD/beatles.dtd">
ELEMENTS
Every XML document must have at least one element, called the document
element. The document element usually contains other elements, which con-
tain other elements, and so on. Elements are denoted with tags. Let’s look
again at the Paul.xml.
EMPTY ELEMENTS
Not all elements contain other elements or text. For example, in XHTML,
there is an img element that is used to display an image. It does not contain
any text or elements within it, so it is called an empty element. In XML, empty
elements must be closed, but they do not require a separate close tag. Instead,
they can be closed with a forward slash at the end of the open tag as shown
below.
<img src="images/paul.jpg"/>
ATTRIBUTES
XML elements can be further defined with attributes, which appear inside of
the element’s open tag as shown below.
Syntax
<name title="Sir">
<firstname>Paul</firstname>
<lastname>McCartney</lastname>
</name>
CDATA
Sometimes it is necessary to include sections in an XML document that should
not be parsed by the XML parser. These sections might contain content that
will confuse the XML parser, perhaps because it contains content that appears
to be XML, but is not meant to be interpreted as XML. Such content must be
nested in CDATA sections. The syntax for CDATA sections is shown below.
Syntax
<![CDATA[
This text is passed through as character data and is not parsed as markup.
]]>
WHITE SPACE
In XML data, there are only four white space characters.
●● Tab
●● Line feed
●● Carriage return
●● Single space
There are several important rules to remember with regards to white
space in XML.
White space within the content of an element is significant; that is, the
XML processor will pass these characters to the application or user agent.
White space in attributes is normalized; that is, neighboring white spaces
are condensed to a single space. White space in between elements is ignored.
xml:space Attribute
The xml:space attribute is a special attribute in XML. It can only take one
of two values: default and preserve. This attribute instructs the application
how to treat white space within the content of the element. Note that the
application is not required to respect this instruction.
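As a brief sketch of the attribute in use, the document below asks for white space to be kept in one element and hands the decision back to the application in a nested one. The element names poem and note are placeholders.

```xml
<?xml version="1.0"?>
<!-- Element names are placeholders for this sketch -->
<poem xml:space="preserve">
  Line one
      indented line two
  <note xml:space="default">layout here does not matter</note>
</poem>
```

The setting applies to the element and everything inside it until a nested element overrides it, as note does here. In a document validated against a DTD, xml:space must also be declared for each element that carries it.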
SPECIAL CHARACTERS
There are five special characters that cannot be used literally in the content
of XML documents. These characters are replaced with predefined entity
references as shown in the table below.
Table 3.11 List of Special Characters
Character Entity Reference
< &lt;
> &gt;
& &amp;
" &quot;
' &apos;
13. What is the relevance of the Element Form Default attribute in the
schema?
14. What is the XML parser?
15. Give some examples of XML DTDs or schemas that you have worked
with.
16. When constructing an XML DTD, how do you create an external entity
reference in an attribute value?
17. Can you use an attribute default in a DTD to declare an XML
namespace?
18. Do the default values of xmlns attributes declared in the DTD apply to
the DTD?
19. Does the scope of an XML namespace declaration ever include the
DTD?
20. Can you use XML namespaces in DTDs?
21. Do XML namespace declarations apply to DTDs?
22. Can you use qualified names in DTDs?
23. What are the limitations of DTD?
24. Give some examples of XML DTDs or schemas.
25. Using dynamic DOCTYPE generation, we want to generate an XML
document using JAXP parsers. We want to include a DOCTYPE tag
that references a DTD. How is this accomplished?
26. Can you use an arbitrarily defined DTD to generate all possible XML
templates?
27. Defining SQL statements in the DTD: How can we declare XML
embedded SQL statements in the DTD?
4
NAMESPACES
NAMESPACES
A namespace is a collection of names that is identified by a Uniform Resource
Identifier (URI). Namespaces are a mechanism for creating universally
unique names in an XML document by identifying element names with a
unique external resource.
Namespaces help XML vocabulary designers to break complex problems
into smaller pieces. Namespaces mix multiple vocabularies as needed to fully
describe a problem in a single XML document.
A URI is a unique name for a resource that resides on a network. A Uniform
Resource Locator (URL) is a kind of URI that locates the resource using an
access protocol and network location.
Namespaces are used to group elements and attributes that relate to
each other in some special way. Each namespace is identified by a unique URI.
Note that, although it is possible that an XML schema is kept at this URI, it is
not required. This can be a bit confusing. It is important to understand that a
namespace is a set of rules that can be enforced by an application in whatever
way the application wishes.
For example, it is unlikely that XHTML editors ever visit the URI that
identifies the XHTML namespace. Instead, these applications have built-in
functionality to support the namespace. The main reason a URI is used is to
provide a unique name for the namespace. Namespace authors should use URIs
that they own to prevent conflicts with each other.
PURPOSE OF NAMESPACES
As described above, one purpose of namespaces is to provide a unique identi-
fier for a group of elements and attribute declarations.
Another purpose is to allow instance documents to be made up of a com-
bination of such groups without having name conflicts. For example, we could
hold the book schema and song schema we have worked on in separate name-
spaces. Now suppose you wanted to use both schemas to create a book of
songs. Both songs and books can have Title elements. This could potentially
be a source of confusion as an application might not understand which Title
element to apply. By specifying which namespace the Title elements come
from, the confusion is removed.
DECLARING A NAMESPACE
Two XML documents might contain elements with the same names but dif-
ferent meanings. If both the documents need to be used in a single environ-
ment, there will be confusion about the overlapping elements. For example,
consider the following XML code.
<CUSTOMER>
  <NAME>Shashi</NAME>
</CUSTOMER>
<BOOK>
  <NAME>Yashasvi</NAME>
</BOOK>
Here, the CUSTOMER element and the BOOK element each have a NAME
element, but the NAME element has a different meaning in each case. If these
elements are combined into a single document as shown in the following code,
the NAME elements will lose their meaning.
<BILL>
  <CUSTOMER>
    <NAME>Shashi</NAME>
  </CUSTOMER>
  <BOOK>
    <NAME>Yashasvi</NAME>
  </BOOK>
</BILL>
This is a serious problem, and the solution is XML namespaces, which offer a
way to create names that remain unique no matter where the elements are used.
SCOPE
Namespace declarations have scope in the same way that variable declarations
do in programming. This is important because namespaces are not always
declared at the beginning of an XML document; they can be declared within a
later section of the document.
A name can refer to a namespace only if it is used within the scope of the
namespace declaration. However, we will also need to mix namespaces where
elements would otherwise inherit the scope of a namespace, so there are two
ways in which scope can be declared. It can be either default or qualified.
Without a default namespace, we would need to prefix every name in a
document; this could be tiresome when we have many namespaces in the document.
By introducing the concept of name scope to our tool set, we can dispense
with a lot of prefixes. If we define a default namespace, all unqualified
element names within the scope of the declaration are presumed to belong to
that default. So, if you declare a default namespace in the root element, it is
treated as the default namespace for the whole document, and can only be
overridden by a more specific namespace declared within the document.
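The scoping rules can be checked with a namespace-aware parser. The following is a minimal sketch using Python's xml.etree.ElementTree; the urn:example names and the two small documents are hypothetical, made up only to show a default namespace being inherited and overridden, and a prefix being used outside the element that declares it.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents and URNs, used only for illustration.
DEFAULT_SCOPE = (
    '<doc xmlns="urn:example:outer">'
    '<a/>'
    '<b xmlns="urn:example:inner"><c/></b>'
    '<d/>'
    '</doc>'
)

# The prefix p is declared on <item>, so it is out of scope on the sibling <p:other>.
PREFIX_OUT_OF_SCOPE = (
    '<doc>'
    '<item xmlns:p="urn:example:p"/>'
    '<p:other/>'
    '</doc>'
)


def expanded_names(xml_text):
    """Return every element name in document order as '{namespace}local'."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


def parses(xml_text):
    """Report whether the namespace-aware parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


for name in expanded_names(DEFAULT_SCOPE):
    print(name)
print(parses(PREFIX_OUT_OF_SCOPE))
```

In the first document, a and d inherit the outer default, while b and its child c take the inner one. The second document is rejected because the prefix is unbound where it is used.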
QUALIFIED
Though we clearly separate the various namespaces, sometimes we need to
sprinkle names from foreign namespaces through a document. For this, a
finer degree of granularity is needed. Hence, we can make use of qualified
names instead of declaring namespaces all over the place. The namespaces
are declared at the beginning of the document and then qualified at the
point of use.
<Measurements xmlns="urn:mydecs-science-measurements"
    xmlns:units="urn:mydecs-science-unitsofmeasure"
    xmlns:prop="urn:mydecs-science-thingsmeasured">
  <OutsideAir units:units="Fahrenheit">86</OutsideAir>
  <FuelTank>
    <prop:Volume units:units="liters">120</prop:Volume>
    <prop:Temperature units:units="Celsius">20</prop:Temperature>
  </FuelTank>
</Measurements>
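To see how a parser resolves these qualified names, the Measurements document can be loaded with Python's xml.etree.ElementTree. This is only a sketch: it assumes all three xmlns declarations sit inside the opening Measurements tag, and split_name is a hypothetical helper written for this illustration.

```python
import xml.etree.ElementTree as ET

MEASUREMENTS = """\
<Measurements xmlns="urn:mydecs-science-measurements"
    xmlns:units="urn:mydecs-science-unitsofmeasure"
    xmlns:prop="urn:mydecs-science-thingsmeasured">
  <OutsideAir units:units="Fahrenheit">86</OutsideAir>
  <FuelTank>
    <prop:Volume units:units="liters">120</prop:Volume>
    <prop:Temperature units:units="Celsius">20</prop:Temperature>
  </FuelTank>
</Measurements>"""


def split_name(expanded):
    """Split '{uri}local' into (uri, local); unqualified names give ('', local)."""
    if expanded.startswith("{"):
        uri, local = expanded[1:].split("}", 1)
        return uri, local
    return "", expanded


root = ET.fromstring(MEASUREMENTS)

# Element names in document order, each paired with its namespace URI.
element_names = [split_name(elem.tag) for elem in root.iter()]
for uri, local in element_names:
    print(f"{local:12} {uri}")

# Attribute names on the first child (OutsideAir), resolved the same way.
attribute_names = [split_name(key) for key in root[0].attrib]
print(attribute_names)
```

Unprefixed elements such as OutsideAir and FuelTank fall into the default namespace, the prop: elements into the second namespace, and the units:units attribute into the third.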
XML NAMESPACE
In XML, a namespace is used to prevent any conflicts with element names.
Because XML allows you to create your own element names, there’s
always the possibility of naming an element exactly the same as one in another
XML document.
This might be OK if you never use both documents together. But what if
you need to combine the content of both documents? You would have a name
conflict. You would have two different elements, with different purposes,
both with the same name.
Imagine we have an XML document containing a list of books.
<books>
<book>
<title>XML Programming</title>
<author>Shashi Banzal</author>
</book>
...
</books>
EXAMPLE NAMESPACE
Using the above example, we could change the XML document to look some-
thing like this:
<bk:books xmlns:bk="http://somebooksite.com/book_spec">
<bk:book>
<bk:title>XML Programming</bk:title>
<bk:author>Shashi Banzal</bk:author>
</bk:book>
...
</bk:books>
Here the namespace is declared on the root element, which means that the
namespace is in scope for the whole document, and we prefixed all child
elements with the same namespace prefix.
You can also define namespaces against a child node. This way, you could
use multiple namespaces within the same document, if required.
MULTIPLE NAMESPACES
You could also have multiple namespaces within your XML document. For
example, you could define one namespace against the root element, and
another against a child element.
Example
<bk:books xmlns:bk="http://somebooksite.com/book_spec">
  <bk:book xmlns:pub="http://somepublishingsite.com/spec">
    <bk:title>XML Programming</bk:title>
    <bk:author>Shashi Banzal</bk:author>
    <pub:name>Sid Harta Publishers</pub:name>
    <pub:email>author@shashi.com.au</pub:email>
  </bk:book>
  ...
</bk:books>
The pub prefix is declared on the bk:book element so that it is in scope for
both the pub:name and pub:email elements.
When you define the namespace without a prefix, all descendant ele-
ments are assumed to belong to that namespace, unless specified otherwise
(i.e., with a local namespace).
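A document that mixes namespaces can be queried by namespace URI rather than by prefix. The sketch below uses Python's xml.etree.ElementTree with a shortened version of the books document; it assumes the pub prefix is declared on the bk:book element, and the short prefixes b and p in the NS mapping are arbitrary names chosen for the queries.

```python
import xml.etree.ElementTree as ET

BOOKS = """\
<bk:books xmlns:bk="http://somebooksite.com/book_spec">
  <bk:book xmlns:pub="http://somepublishingsite.com/spec">
    <bk:title>XML Programming</bk:title>
    <bk:author>Shashi Banzal</bk:author>
    <pub:name>Sid Harta Publishers</pub:name>
  </bk:book>
</bk:books>"""

# Prefixes used in queries are local to this mapping; only the URIs must
# match the ones declared in the document.
NS = {
    "b": "http://somebooksite.com/book_spec",
    "p": "http://somepublishingsite.com/spec",
}

root = ET.fromstring(BOOKS)
title = root.findtext("b:book/b:title", namespaces=NS)
publisher = root.findtext("b:book/p:name", namespaces=NS)
unqualified = root.find("book")  # no namespace given, so nothing matches

print(title)
print(publisher)
print(unqualified)
```

The query prefixes differ from the bk and pub prefixes in the document, yet the lookups succeed, because a prefix is only shorthand for the namespace URI.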
UNDERSTANDING NAMESPACES
Name clashes arise when different markup languages have elements and
attributes that are named the same. In XML the problem is particularly severe
because XML applications aren't smart enough to judge the difference
between the context of elements from different markup languages that share
the same name. For example, a tag named <goal> would have a very different
meaning in a sports markup language than the same tag in a markup language
for a daily planner. If you ever used these two markup languages within the
same application, it would be very important for the application to know when
you’re talking about a goal in hockey and when you’re talking about a personal
goal. The responsibility falls on the XML developer to ensure that uniqueness
abounds when it comes to the elements and attributes used in documents.
Fortunately, namespaces make it possible to enforce such uniqueness
without too much of a hassle.
NAMING NAMESPACES
The whole point of namespaces is that they provide a means of establishing
unique identifiers for elements and attributes. It is therefore imperative that
each and every namespace have a unique name. Obviously, there would be no
way to enforce this rule if everyone was allowed to make up their own names,
so a clever naming scheme was established that tied namespaces to URIs.
URIs usually reference resources on the Internet, and a URI under a domain
that you own will not clash with one chosen by anyone else. So, the name of a
namespace is essentially a URI. For example, consider the Website
http://www.michaelmorrison.com. To help guarantee name uniqueness in any XML
documents that are created for it, we could associate the documents with a
namespace based on that address:
<mediacollection xmlns:mov="http://www.michaelmorrison.com/ns/movies">
When unqualified names such as the <title> and </title> tags are used, you
would never know a namespace was involved. In this case, you are either
assuming a default namespace is in use or that there is no namespace at all.
It's important to clarify why you would use qualified or unqualified names
because the decision to use one or the other determines the manner in which
you declare a namespace. There are two different approaches to declaring
namespaces: default and explicit.
DEFAULT NAMESPACES
Default namespaces represent the simpler of the two approaches to names-
pace declaration. A default namespace declaration is useful when you want
to apply a namespace to an entire document or section of a document. When
declaring a default namespace, you don’t use a prefix with the xmlns attrib-
ute. Instead, elements are specified with unqualified names and are there-
fore assumed to be part of the default namespace. In other words, a default
namespace declaration applies to all unqualified elements within the scope
in which the namespace is declared. The following is an example of a default
namespace declaration for a movie collection document:
<mediacollection xmlns="http://www.michaelmorrison.com/ns/movies">
<movie type="comedy" rating="PG-13" review="5" year="1987">
<title>Raising Arizona</title>
<comments>A classic one-of-a-kind screwball love story.</comments>
</movie>
<movie type="comedy" rating="R" review="5" year="1988">
<title>Midnight Run</title>
<comments>The quintessential road comedy.</comments>
</movie>
</mediacollection>
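A default namespace can also be declared on any other element, in which case
it applies only to that element and its children. For example, you could set a
namespace for one of the title elements, which would override the default
namespace that is set in the mediacollection element. The following is an
example of how this is done: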
<mediacollection xmlns="http://www.michaelmorrison.com/ns/movies">
<movie type="comedy" rating="PG-13" review="5" year="1987">
<title>Raising Arizona</title>
<comments>A classic one-of-a-kind screwball love story.</comments>
</movie>
<movie type="comedy" rating="R" review="5" year="1988">
<title xmlns="http://www.michaelmorrison.com/ns/title">Midnight Run</title>
<comments>The quintessential road comedy.</comments>
</movie>
</mediacollection>
Notice in the title element for the second movie element that a different
namespace is specified. This namespace applies only to the title element and
overrides the namespace declared in the mediacollection element. Although
this admittedly simple example doesn’t necessarily make a good argument for
why you would override a namespace, it can be a bigger issue in documents
where you mix different XML languages.
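The effect of the override can be confirmed with a parser. This sketch loads a trimmed copy of the movie document with Python's xml.etree.ElementTree; namespace_of is a hypothetical helper written for the illustration.

```python
import xml.etree.ElementTree as ET

MOVIES = """\
<mediacollection xmlns="http://www.michaelmorrison.com/ns/movies">
  <movie type="comedy" rating="PG-13" review="5" year="1987">
    <title>Raising Arizona</title>
  </movie>
  <movie type="comedy" rating="R" review="5" year="1988">
    <title xmlns="http://www.michaelmorrison.com/ns/title">Midnight Run</title>
  </movie>
</mediacollection>"""


def namespace_of(elem):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = elem.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


root = ET.fromstring(MOVIES)

# Map each title's text to the namespace its element ended up in.
titles = {
    elem.text: namespace_of(elem)
    for elem in root.iter()
    if elem.tag.endswith("}title")
}
for text, uri in titles.items():
    print(f"{text}: {uri}")

# Attribute names on the first movie element, as the parser reports them.
first_movie_attributes = sorted(root[0].attrib)
print(first_movie_attributes)
```

The attribute names come back without any namespace: a default namespace declaration applies to unprefixed elements, not to unprefixed attributes.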
EXPLICIT NAMESPACES
An explicit namespace is useful whenever you want exacting control over the
elements and attributes that are associated with a namespace. This is often
necessary in documents that rely on multiple schemas because there is a
chance of having a name clash between elements and attributes defined in
the two schemas. Explicit namespace declarations require a prefix that is used
to distinguish elements and attributes that belong to the namespace being
declared. The prefix in an explicit declaration is used as a shorthand notation
for the namespace throughout the scope in which the namespace is declared.
More specifically, the prefix is paired with the local element or attribute name
to form a qualified name of the form Prefix:Local. The following is the movie
example with qualified element and attribute names:
<mediacollection xmlns:mov="http://www.michaelmorrison.com/ns/movies">
<mov:movie mov:type="comedy" mov:rating="PG-13" mov:review="5" mov:year="1987">
<mov:title>Raising Arizona</mov:title>
<mov:comments>A classic one-of-a-kind screwball love story.</mov:comments>
</mov:movie>
<mov:movie mov:type="comedy" mov:rating="R" mov:review="5" mov:year="1988">
<mov:title>Midnight Run</mov:title>
<mov:comments>The quintessential road comedy.</mov:comments>
</mov:movie>
</mediacollection>
When a collection holds both movie and music entries, a second namespace
can be explicitly declared alongside mov, for example with a mus prefix, in
order to correctly identify the elements and attributes for each type of
media. Notice that without these explicit namespaces it would be difficult
for an XML processor to tell the difference between the title and comments
elements because they are used in both movie and music entries.
Just to help hammer home the distinction between default and explicit
namespace declarations, consider one more arrangement. This time, the
media collection declares the movie namespace as the default namespace and
then explicitly declares the music namespace using the mus prefix. The end
result is that the movie elements don't require a prefix when referenced,
whereas the music elements and attributes do.
The key to such a document is the default namespace declaration, which is
identified by the lone xmlns attribute; the xmlns:mus attribute explicitly
declares the music namespace. When the xmlns attribute is used by
itself with no associated prefix, it is declaring a default namespace, which in
this case is the movie namespace.
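A document that combines a default movie namespace with an explicit mus prefix might look like the one embedded in the sketch below, which groups title elements by namespace with Python's xml.etree.ElementTree. The music namespace URI and the contents of the music entry are assumptions made up for this illustration; they do not come from the text.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

MOVIE_NS = "http://www.michaelmorrison.com/ns/movies"
MUSIC_NS = "http://www.michaelmorrison.com/ns/music"  # assumed URI, not from the text

# Movie elements rely on the default namespace; music elements use the mus prefix.
# The music entry is placeholder data for illustration only.
MEDIA = f"""\
<mediacollection xmlns="{MOVIE_NS}" xmlns:mus="{MUSIC_NS}">
  <movie type="comedy" rating="R" review="5" year="1988">
    <title>Midnight Run</title>
    <comments>The quintessential road comedy.</comments>
  </movie>
  <mus:music mus:type="placeholder">
    <mus:title>Placeholder Album</mus:title>
    <mus:comments>Placeholder comment.</mus:comments>
  </mus:music>
</mediacollection>"""

titles_by_namespace = defaultdict(list)
for elem in ET.fromstring(MEDIA).iter():
    # Every element here is namespaced, so each tag has the form '{uri}local'.
    uri, _, local = elem.tag[1:].partition("}")
    if local == "title":
        titles_by_namespace[uri].append(elem.text)

for uri, titles in titles_by_namespace.items():
    print(uri, titles)
```

Both entries use an element called title, but the parser keeps them apart because each belongs to a different namespace.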
XML NAMESPACES
XML namespaces provide a method to avoid element name conflicts.
NAME CONFLICTS
In XML, element names are defined by the developer. This often results in
a conflict when trying to mix XML documents from different XML applica-
tions. This XML carries HTML table information:
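<table>
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>
This XML carries information about a table (a piece of furniture):
<table>
  <width>80</width>
  <length>120</length>
</table>
If these XML fragments were added together, there would be a name conflict:
both contain a <table> element, but the elements have different content and
meaning. When each <table> element is instead given a prefix bound to its own
namespace, there will be no conflict because the two <table> elements have
different names.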
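Combining the two kinds of table under separate namespaces can be sketched by building the document programmatically with Python's xml.etree.ElementTree. The namespace URIs and the h and f prefixes are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs and prefixes, chosen only for this sketch.
HTML_NS = "urn:example:html"
FURNITURE_NS = "urn:example:furniture"
ET.register_namespace("h", HTML_NS)
ET.register_namespace("f", FURNITURE_NS)

root = ET.Element("root")

# The HTML-style table, with every element in the HTML namespace.
html_table = ET.SubElement(root, f"{{{HTML_NS}}}table")
row = ET.SubElement(html_table, f"{{{HTML_NS}}}tr")
for fruit in ("Apples", "Bananas"):
    ET.SubElement(row, f"{{{HTML_NS}}}td").text = fruit

# The furniture table, with every element in the furniture namespace.
furniture_table = ET.SubElement(root, f"{{{FURNITURE_NS}}}table")
ET.SubElement(furniture_table, f"{{{FURNITURE_NS}}}width").text = "80"
ET.SubElement(furniture_table, f"{{{FURNITURE_NS}}}length").text = "120"

combined = ET.tostring(root, encoding="unicode")
print(combined)

# Reading the result back shows two distinct expanded names for 'table'.
table_names = [child.tag for child in ET.fromstring(combined)]
print(table_names)
```

The serialized document writes the two elements as h:table and f:table, so an application can tell them apart even though the local name is the same.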
Code Sample: Namespaces/Demos/Artist.xsd
<?xml version="1.0"?>
<xs:schema targetNamespace="http://www.webucator.com/Artist"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://www.webucator.com/Artist">
<xs:element name="Title" type="xs:string"/>
<xs:element name="FirstName" type="xs:string"/>
<xs:element name="LastName" type="xs:string"/>
<xs:element name="Name">
<xs:complexType>
<xs:sequence>
<xs:element ref="Title"/>
<xs:element ref="FirstName"/>
<xs:element ref="LastName"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Artist">
<xs:complexType>
<xs:sequence>
<xs:element ref="Name"/>
</xs:sequence>
<xs:attribute name="BirthYear" type="xs:gYear" use="required"/>
</xs:complexType>
</xs:element>
</xs:schema>
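An instance document for this namespace, validated against ArtistLocal.xsd, in
which only the root element carries a prefix:
<art:Artist BirthYear="1958"
    xmlns:art="http://www.webucator.com/Artist"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.webucator.com/Artist ArtistLocal.xsd">
  <Name>
    <Title>Mr.</Title>
    <FirstName>Michael</FirstName>
    <LastName>Jackson</LastName>
  </Name>
</art:Artist>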
The result of qualifying all locals is that instance authors do not have to
differentiate between local and global declarations. They simply prefix all ele-
ments and attributes with a qualifier. This has two major advantages over
using unqualified locals.
Clarity - it is easy to tell which namespace each element belongs to.
Flexibility - the schema author can mix global and local declarations
without worrying that the instance author will get confused. As both local and
global declarations require prefixes, the instance author doesn’t need to know
how an element or attribute is declared.
Code Sample: Namespaces/Demos/XMLSchema-instance.xsd
<?xml version='1.0'?>
<xs:schema targetNamespace="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:attribute name="nil"/>
<xs:attribute name="type"/>
<xs:attribute name="schemaLocation"/>
<xs:attribute name="noNamespaceSchemaLocation"/>
</xs:schema>
By importing the Artist namespace with xs:import and specifying that ele-
ments in that namespace can be referenced with the xmlns:art attribute of
xs:schema, elements and attributes in the Artist namespace are accessible to
this schema.
If you are likely to be working with data-centric content (e.g., more struc-
tured data that maps to a database), you should build a schema for the trans-
action log described below.
A networking Website has a feature that allows people to make connec-
tions through other connections they have made in the past. A member can
search the member list and on finding someone with whom (s)he would like
to connect, (s)he can ask a mutual connection to pass on a message to that
person.
DEFAULT NAMESPACES
Defining a default namespace for an element saves us from using prefixes in
all the child elements. It has the following syntax:
xmlns="namespaceURI"
44. If you start using XML namespaces, do you need to change the existing
DTDs?
45. How do you use XML namespaces with XML schemas?
46. What are qualified and unqualified local names in XML schemas?
47. Do you have to use XML namespaces with XML schemas?
48. What is a chameleon schema?
49. Is everything defined or declared in an XML schema in an XML
namespace?
50. Is there a one-to-one relationship between XML namespaces and XML
schemas?
51. How do you validate documents that use XML namespaces against
XML schemas?
52. How do you validate documents that use XML namespaces?
53. What is the Namespace-based Validation Dispatching Language
(NVDL)?
54. How do you create documents that use XML namespaces?
55. How can you check that a document conforms to the XML namespaces
recommendation?
56. Can you use the same document with both namespace-aware and
namespace-unaware applications?
57. What software is needed to process XML namespaces?
58. How can you use XML namespaces to combine documents that use
different element type and attribute names?
59. How do you use XML namespaces with Internet Explorer 5.0 and/or
the MSXML parser?
60. How do applications process documents that use XML namespaces?
61. Can an application process documents that use XML namespaces and
documents that don’t use XML namespaces?
62. Can an application be both namespace-aware and namespace-unaware?
5
INTRODUCTION TO XHTML
As with most revolutions, the birth of the Web was chaotic, and the mod-
ifications to HTML reflected that chaos. More recently, a significant effort
has been made to address the inconsistencies of HTML and to attempt to
restore some order to the language. The problem with disorder in HTML is
that Web browsers have to guess at how a page is to be displayed. Ideally, a
Web page designer should be able to define exactly how a page is to look and
have it look the same regardless of what kind of browser or operating system
someone is using. This utopia is still off in the future somewhere, but XML is
playing a significant role in leading us toward it, and significant progress has
been made.
Now, if you view these code snippets from the computer’s point of view,
you would find that the XML document would be easier to process. XML
captures the most useful information and has potential uses. This distinction
is the very essence of XML.
This XML (VPML) code includes three virtual pets: Maximillian the
pot-bellied pig, Augustus the goat, and Nigel the chipmunk. If you study the
code, you’ll notice that tags are used to describe the virtual pets much as tags
are used in HTML code to describe Web pages. However, in this example
the tags are unique to the VPML language. It’s not too hard to understand
the meaning of the code, thanks to the descriptive tags. In fact, an important
design parameter of XML was for XML content to always be human-readable.
Unlike HTML, which consists of a predefined set of tags such as <head>,
<body>, and <p>, XML allows you to create custom markup languages with
tags that are unique to a certain type of data, such as virtual pets.
The virtual pet example demonstrates how flexible XML is in solving data
structuring problems. Unlike a traditional database, XML data is pure text,
which means it can be processed and manipulated very easily, in addition to
being readable by people. For example, you can open up any XML document
in a text editor such as Windows Notepad (or TextEdit on Macintosh comput-
ers) and view or edit the code.
The fact that XML is pure text also makes it very easy for applications to
transfer data between one another, across networks, and also across different
computing platforms such as Windows, Macintosh, and Linux. XML essen-
tially establishes a platform-neutral means of structuring data, which is ideal
for networked applications, including Web-based applications.
So, in its inception, HTML was never intended to support fancy graphics,
formatting, or page-layout features. Instead, HTML was intended to focus on
the meaning of information or the content of information. It wasn’t until Web
browser vendors got excited that HTML was expanded to address the pre-
sentation of information. In fact, HTML was in many ways changed to focus
entirely on how information appears, which is what ultimately prompted the
creation of XML.
There are a variety of reasons why this is a good idea, and they all have
to do with improving the organization and structure of information. Although
presentation plays an important role in any Web site, modern Web applica-
tions have evolved to become driven by data of very specific types, such as
financial transactions. HTML is a very poor markup language for represent-
ing such data. With its support for custom markup languages, XML makes
it possible to carefully describe data and the relationships between pieces of
data. By focusing on content, XML allows you to describe the information
in Web documents. More importantly, XML makes it possible to precisely
describe information that is shuttled across the Net between applications. For
example, Amazon.com uses XML to describe products on its site and allow
developers to create applications that intelligently analyze and extract infor-
mation about those products.
XML is not a replacement for HTML or even a competitor of HTML.
XML’s impact on HTML has to do more with cleaning up HTML than it does
with dramatically altering HTML. The best way to compare XML and HTML
is to remember that XML establishes a set of strict rules that any markup
language must follow. HTML is a relatively unstructured markup language
that could benefit from the rules of XML. The natural merger of the two
technologies is to make HTML adhere to the rules and structure of XML. To
accomplish this merger, a new version of HTML has been formulated that
adheres to the stricter rules of XML. The new XML-compliant version of
HTML is known as XHTML.
XML’s relationship with HTML doesn’t end with XHTML, however.
Although XHTML is a great idea that is already making Web pages cleaner
and more consistent for Web browsers to display, we’re a ways off from seeing
a Web that consists of cleanly structured XHTML documents (pages). It’s
currently still too convenient to take advantage of the freewheeling flexibil-
ity of the HTML language. Where XML is making a significant immediate
impact on the Web is in Web-based applications that must shuttle data across
the Internet. XML is an excellent medium for representing data that is trans-
ferred back and forth across the Internet as part of a complete Web-based
application.
Example
<html>
<body>
<script type="text/javascript">
var xmlhttp;
if (window.XMLHttpRequest)
{// code for IE7+, Firefox, Chrome, Opera, Safari
xmlhttp=new XMLHttpRequest();
}
else
{// code for IE6, IE5
xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
}
// Synchronous request (third argument is false): send() blocks until the file has loaded
xmlhttp.open("GET","cd_catalog.xml",false);
xmlhttp.send();
var xmlDoc=xmlhttp.responseXML;
// Write one table row per CD element, with the artist and title in separate cells
document.write("<table border='1'>");
var x=xmlDoc.getElementsByTagName("CD");
for (var i=0;i<x.length;i++)
{
document.write("<tr><td>");
document.write(x[i].getElementsByTagName("ARTIST")[0].childNodes[0].nodeValue);
document.write("</td><td>");
document.write(x[i].getElementsByTagName("TITLE")[0].childNodes[0].nodeValue);
document.write("</td></tr>");
}
document.write("</table>");
</script>
</body>
</html>
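1. Write all tags in lowercase: XHTML element and attribute names are
case-sensitive and must be written in lowercase. HTML accepts either case.
Example
This is valid HTML, but invalid XHTML. Do NOT use these.
<UL>
<LI> item 1 </LI>
</UL>
2. Close every element: Every opening tag must have a matching closing tag
in XHTML. HTML allows some closing tags, such as </li>, to be omitted.
Example
This is valid HTML, but invalid XHTML. Do NOT use these.
<ul>
<li>
<li>
</ul>
3. Close empty elements too: Elements with no content, such as hr and br,
must also be closed in XHTML, for example <hr /> and <br />.
Example
This is valid HTML, but invalid XHTML. Do NOT use these.
<hr>
<br>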
4. Do not mix up the closing tags: All elements within a container element
must be closed before the container is closed. You must do this in
XHTML. You should do this in HTML, but can sometimes get by with
tags in the wrong order.
Example
Bad. Do NOT use this.
<b><i> This text may not work correctly. </b></i>
Good. The i element is inside the b container element
<b><i> This text will be bold italic. </i></b>
5. Every attribute must have a value: Every attribute you code must
have a value in XHTML. A few attribute values may be omitted in
HTML.
Example
This is valid HTML, but invalid XHTML. Do NOT use these.
<hr noshade />
Valid in both HTML and XHTML.
<hr noshade="noshade" />
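6. Quote all attribute values: Every attribute value must be enclosed in
quotation marks in XHTML. HTML allows the quotes to be omitted around
simple values.
Example
This is valid HTML, but invalid XHTML. Do NOT use these.
<hr width=4>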
7. Exactly one each of the html, head, title, and body elements is required
in XHTML: In HTML, one may sometimes be omitted, or an element may
appear twice.
Example
This example is valid in both HTML and XHTML.
<html>
<head>
<title> sample page </title>
</head>
<body>
Hello.
</body>
</html>
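Because XHTML must be well-formed XML, several of these rules can be checked mechanically with an ordinary XML parser. The following is a rough sketch in Python using xml.etree.ElementTree; is_well_formed and the SAMPLES dictionary are hypothetical names made up for the illustration, with fragments adapted from the examples above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Fragments adapted from the rules above.
SAMPLES = {
    "uppercase tags": "<UL><LI> item 1 </LI></UL>",
    "unclosed li": "<ul><li><li></ul>",
    "unclosed empty element": "<p>line one<br>line two</p>",
    "crossed closing tags": "<b><i> text </b></i>",
    "properly nested": "<b><i> text </i></b>",
    "attribute without value": "<hr noshade />",
    "attribute with value": '<hr noshade="noshade" />',
    "unquoted attribute value": "<hr width=4 />",
}

results = {label: is_well_formed(markup) for label, markup in SAMPLES.items()}
for label, ok in results.items():
    print(f"{label:26} {'well-formed' if ok else 'rejected'}")
```

The uppercase sample passes this check, since uppercase names are still well-formed XML. Catching that mistake requires validating against an XHTML DTD, which this sketch does not attempt.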
XHTML
All Web markup languages are based on SGML, a complicated language that
is not designed for humans to write. SGML is what is called a metalanguage;
that is, a language that is used to define other languages. To make its power
available to Web developers, SGML was used to create XML, a simplified
version, and also a metalanguage.
XML is a powerful format—you create your own tags and attributes to
suit the type of document you’re writing. By using a set group of tags and attri-
butes and following the rules of XML, you’ve created a new Markup language.
This is what has been done to create XHTML (eXtensible HyperText
Markup Language)—which is why you’ll see XHTML being called a subset
or application of XML. The pre-existing HTML 4.01 tags and attributes were
used as the vocabulary of this new Markup language, with XML providing the
rules of how they are put together.
So, using XHTML, you are really writing XML code, but restricting your-
self to a predetermined set of elements. This gives you all the benefits of XML
(see below), while avoiding the complications of true XML; bridging the gap
for developers who might not fancy taking on something as tricky as full-on
XML. As you’re coding under the guise of XHTML, all of the tags available to
you should be familiar. Writing XHTML requires that you follow the rules of
conformant XML, such as correct syntax and structure. As XHTML looks so
much like classic HTML, it faces no compatibility problems as long as some
simple coding guidelines are followed.
If all of this sounds a bit challenging, don’t worry. Transitioning to XHTML
is a simple process, with only a few rules to remember.
BENEFITS OF XHTML
The benefits of adopting XHTML now or migrating your existing site to the
new standards are many. First, they ensure excellent forward-compatibility
for your creations. XHTML is the new set of standards that the Web will be
built on in the years to come, so future-proofing your work early will save you
much trouble later on. Future browser versions might stop supporting dep-
recated elements from old HTML drafts, and so many old basic-HTML sites
may start displaying incorrectly and unpredictably.
Once you have used XHTML for a short time, it is no more difficult to use
than HTML, and in some ways, is easier since it is built on a more simplified
set of standards. Writing code is a more streamlined experience, as the days
of browser hacks and display tricks are gone. Editing your existing code is
also a nicer experience, as it is infinitely cleaner and more self-explanatory.
Browsers can also interpret and display a clean XHTML page quicker than
one with errors.
XHTML CODING
The first thing you need to know about changing over to XHTML as the new
standard is that there really isn't much new to learn. No new tags or attributes
have been added to your repertoire, as happened with HTML 4 (although a few
have been deprecated); this is just a move towards good, valid, and efficient
coding. XHTML documents stress logical structure and simplicity, and use CSS
for nearly all presentational concerns. It just means you have to change the
way you write code. Even if you always wrote great code before, there are a
few new practices you need to add in.
XML DECLARATION
An XML declaration at the very top of your document defines both the ver-
sion of XML you’re using as well as the character encoding.
<?xml version="1.0" encoding="UTF-8"?>
The XML declaration is optional. If you omit it, you can instead use a meta
tag in the head of your document to specify the character encoding. If you're
using Unicode, this is as follows:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
XHTML DTDs
Whether you use the XML declaration or not, every XHTML document must
be defined as such by a line of code at the start of the page, and some
attributes in the main <html> tag, which tell the browser what language the
text is in. The opening line is the DOCTYPE (document type declaration), which
names the DTD (Document Type Definition) that the page follows. This tells
your browser and validators the nature of your page.
A DTD is the file that defines the names and attributes of all of the possible
tags that you can use in your markup. Validators check your page against it,
while browsers generally rely on their own built-in knowledge of the tags
rather than reading the file. Declare it by putting this at the very top of
your code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
That DTD is the one you use if you're committed to writing entirely correct
XHTML code. Strict XHTML dispenses with many presentational tags
and attributes.
You won’t be permitted to use the font tag at all, nor will attributes like
width and height be allowed in your tables. You won’t be able to use the bor-
der attribute on images, and will have to use the alt attribute on all images if
you want to validate. You get the idea—almost all presentational attributes are
restricted in favour of wider CSS utilization, so unless you know your stuff in
this regard, it’d be best to use the XHTML Transitional below.
If you’re going to hover between HTML and XHTML, use the Transitional DTD,
which is a bit looser, and if you’re putting together a frameset page, use the
Frameset DTD.
Most people will opt for XHTML Transitional, as changing to Strict
can be a daunting prospect.
A correct DOCTYPE allows the browser to go into standards mode, which will
render your page correctly, and similarly across browsers. Without a full
DOCTYPE, your browser enters “compatibility,” or “quirks” mode, behaving like
a version 4 browser, including all of its associated quirks and
inconsistencies. Also, these declarations are all case-sensitive, so don’t
change them in any way.
Finally, you need to define the XML namespace your document uses. It
is a definition of which set of tags you’re going to be using, and it concerns
the modular properties of XHTML. It’s set by adding an attribute into the
<html> tag. While we’re at it, we specify the language of our pages too. Mod-
ify your tags to this:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> </html>
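A quick way to confirm that a page declares the XHTML namespace and a language is to inspect its root element. The sketch below uses Python's xml.etree.ElementTree; describe_root and the small PAGE document are hypothetical, written only for this illustration, and the DOCTYPE line is left out to keep the sample short.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# The namespace that the reserved xml prefix (as in xml:lang) is always bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"

PAGE = """\
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>Page Title</title></head>
  <body><p>Hello.</p></body>
</html>"""


def describe_root(xml_text):
    """Return the namespace, local name and xml:lang value of the root element."""
    root = ET.fromstring(xml_text)
    uri, _, local = root.tag.rpartition("}")
    return {
        "namespace": uri.lstrip("{"),
        "element": local,
        "lang": root.get(f"{{{XML_NS}}}lang"),
    }


info = describe_root(PAGE)
print(info)
print(info["namespace"] == XHTML_NS)
```

A page whose html tag lacks the xmlns attribute would report an empty namespace here, which is a simple signal that the declaration is missing.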
XHTML STRICT
XHTML documents that conform to the Strict DTD may not use any depre-
cated HTML tags. The DOCTYPE declaration looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
XHTML TRANSITIONAL
XHTML documents that conform to the Transitional DTD may use depre-
cated HTML tags, but may not use the <frameset> and <frame> tags. The
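DOCTYPE declaration looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">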
XHTML FRAMESET
XHTML documents that conform to the Frameset DTD may use deprecated
HTML tags including the <frameset> and <frame> tags. The DOCTYPE
declaration looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<ul>
<li>Experience with Word Processing</li>
<li>Experience with HTML (optional, but recommended)</li>
</ul>
<h2>Course Outline</h2>
<div id="outline">
<ul>
<li>
XML Basics
<ul>
<li>What is XML?</li>
<li>
XML Benefits
<ul>
<li>XML Holds Data, Nothing More</li>
<li>XML Separates Structure from Formatting</li>
<li>XML Promotes Data Sharing</li>
<li>XML is Human-Readable</li>
<li>XML is Free</li>
</ul>
</li>
</ul>
<ul>
<li>
XML Documents
<ul>
<li>The Prolog</li>
<li>Elements</li>
<li>Attributes</li>
<li>CDATA</li>
<li>XML Syntax Rules</li>
<li>Special Characters</li>
</ul>
</li>
</ul>
<ul>
<li>Creating a Simple XML File</li>
</ul>
</li>
</ul>
</div>
</body>
</html>
DOCUMENT FORMATION
After the Doctype line, the actual XHTML content can be placed. As with
HTML, XHTML has <html>, <head>, <title>, and <body> tags but, unlike
with HTML, they must all be included in a valid XHTML document. The
correct setup of your file is as follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Page Title</title>
OTHER HEAD DATA
</head>
<body>
CONTENT
</body>
</html>
It is important that your document follows this basic pattern. This example
uses the transitional Doctype, but you can use either of the others (although
frames pages are not structured in the same way).
XHTML TAGS
One of the major changes to HTML introduced in XHTML is that tags must
always be properly formed. With the old HTML specification, you could be
very sloppy in your coding, with missing tags and incorrect formation, without
many problems, but in XHTML correct formation is required.
Lower Case
Probably the biggest change in XHTML is that the way in which you write
tags must be correct. Luckily, this major change can be easily implemented
into a normal HTML document without a problem. In XHTML, tag names must
always be written in lowercase, so uppercase or mixed-case forms such as
<FONT> or <Font>
are incorrect tags and must not be used. The font tag must now be used as
follows:
<font>
If you are not writing your code by hand, but instead use a WYSIWYG editor, you
can still begin to migrate your documents to XHTML by setting the editor to
output all code in lowercase. For example, in Dreamweaver 4 you can do this
by going to
Edit -> Preferences -> Code Format
Nesting
The second change to the HTML tags in XHTML is that they must all be
properly nested. This means that if you have multiple tags applying to some-
thing on your page, you must make sure you open and close them in the cor-
rect order. For example, if you have some bold red text in a paragraph, the
correct nesting would be one of the following:
<p><b><font color="#FF0000">Your Text</font></b></p>
<b><p><font color="#FF0000">Your Text</font></p></b>
<p><font color="#FF0000"><b>Your Text</b></font></p>
These are only examples, though, and there are other possibilities for
these tags. What you must not do, though, is to close tags in the wrong order,
for example:
<p><b><font color="#FF0000">Your Text</p></font></b>
Although code in this form would be shown correctly using HTML, this
is incorrect in the XHTML specification and you must be very careful to nest
your tags correctly.
Closing Tags
The previous two changes to HTML should not be a particular problem if
your HTML code is already well-formed. The final change to HTML tags
probably will require quite a lot of changes to your HTML documents to
make them XHTML compliant.
All tags in XHTML must be closed. Most tags in HTML are already closed
(for example <p></p>, <font></font>, and <b></b>), but there are several
standalone tags which do not get closed. The main three are
<br>
<img>
<hr>
There are two ways you can deal with the change in the specification. The
first way is quite obvious if you know HTML. You can just add a closing tag
to each one, e.g.,
<br></br>
<img></img>
<hr></hr>
You must be careful, though, not to place anything accidentally between the
opening and closing tags, as this would be incorrect coding. The
second way is slightly different but will be familiar to anyone who has written
WML. You can include the closing in the actual tag:
<br />
<img />
<hr />
This is probably the best way to close your tags, as it is the recommended
way by the W3C who set the XHTML standard. You should notice that, in
these examples, there is a space before the />. This is not actually neces-
sary in the XHTML specification (you could have <br/>), but the reason it is
included here is that, as well as being correct XHTML, it also makes the tag
compatible with past browsers. As every other XHTML change is backwards
compatible, it would not be very good to have a space causing problems with
site compatibility.
In case you are wondering how the <img> tag works if it has all the nor-
mal attributes included, here is an example:
<img src="myimage.gif" alt="My Image" width="400" height="300" />
Attributes
HTML attributes are the extra parts you can add onto tags (such as src in the
img tag) to change the way in which they are shown. There are four changes
to the way in which attributes are written.
Lowercase
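As with XHTML tags, the attributes for them must be in lowercase. This
means that code like the following, although accepted in the past, is no
longer correct:
<table Width="100%">
It must now be written with a lowercase attribute name:
<table width="100%">
Although this is a minor issue, it is important to check your code for this
mistake.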
Correct Quotation
Another change in the HTML syntax is that all attributes in XHTML must be
quoted. In HTML you could have used the following:
<table width=100%>
In XHTML, the value must be enclosed in quotation marks:
<table width="100%">
Attribute Shortening
It has become common practice in HTML to shorten a few of the attributes to
save on typing and transfer times. As with other common practices in HTML,
this has been removed from the XHTML specification as it causes incompat-
ibilities between browsers and other devices.
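An example of a commonly shortened attribute is checked:
<input type="checkbox" value="yes" name="agree" checked>
In XHTML, the attribute must be written out in full, with a value:
<input type="checkbox" value="yes" name="agree" checked="checked" />
There are other attributes (such as noresize) that also must be given in full.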
The ID Attribute
Probably the biggest change from HTML to XHTML is a change to a single
attribute. All other differences just make tags more compatible. This is the only
full change.
In HTML, the <img> tag has an attribute “name.” This is usually used
to refer to the image in JavaScript for doing actions like image rollovers. This
attribute has now been changed to the “id” attribute. So, the HTML code
<img src="myimage.gif" name="my_image">
becomes the following in XHTML:
<img src="myimage.gif" id="my_image" />
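The four attribute rules can be seen working together in one small page. The following is only a sketch, assuming the Transitional Doctype; the form target submit.html is a hypothetical file name, and the image and checkbox are the ones used in the examples above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Attribute rules</title>
  </head>
  <body>
    <!-- Rules 1 and 2: lowercase attribute name, quoted value -->
    <table width="100%">
      <tr><td>Cell</td></tr>
    </table>
    <!-- submit.html is a hypothetical target, used for illustration only -->
    <form action="submit.html">
      <p>
        <!-- Rule 3: the attribute is given in full, not shortened -->
        <input type="checkbox" name="agree" value="yes" checked="checked" />
      </p>
    </form>
    <!-- Rule 4: the image is identified by id rather than name -->
    <p><img src="myimage.gif" id="my_image" alt="My Image" /></p>
  </body>
</html>
```

Every empty element in the sketch is also closed with a space and a slash, as recommended in the Closing Tags section.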
6
CSS STYLE SHEETS
CSS DOCUMENTS
CSS documents allow you to define a style for any HTML element. Thus, you
can define the style for an h1 element to be red with a font size of 6. This style
can then be applied to every h1 element on your Website. CSS documents
allow you to create a uniform style throughout your Web documents without
having to enter specific information for each h1 element in each page. If you
need to change the style for an h1 element, you need to change it only in the
CSS document. If you need to override the style defined in the CSS document
for one or more of your h1 elements in a specific page, you can do this, too.
One major problem with using CSS documents is that they are not sup-
ported in every browser. Microsoft Internet Explorer 5 supports nearly all the
features of CSS documents, and Internet Explorer 4 also supports most of the
CSS features. Netscape has released version 6 that supports CSS level 1 and
the DOM. If you were to create an XHTML document, you could use CSS
documents to define the presentation of the XHTML information. While CSS
documents can work for XHTML, they will not work for XML documents
that do not contain presentation information. For XML documents without
presentation information, you must use XSL.
●● Servability: CSS is, as the above quote points out, a requisite for brows-
ing XML documents on the Web. However, XML is a meta-language and
authors can construct their own elements (and/or DTDs). The freedom
in XML of authors creating their own tags comes with a price: XML tag
names have no predefined semantics. This results in all sorts of ambigui-
ties: an <img> could mean an image, or an imaginary number; even the
seemingly obvious <manual> could mean a technical book or a form of
human labor. In an informal language such as English, humans can
tell the difference from the context. Such semantic
distinctions are not available to machines processing a formal
language. In such a case, a user agent would not know how to “display”
elements of these “home-brewed” languages. This is where the use of a
stylesheet language such as CSS becomes necessary, which provides the
display semantics to an XML document.
●● Accessibility: Use of CSS in the document makes it accessible, particu-
larly to people with visual or aural disability. There are various accessibil-
ity features of CSS.
AUTHORING APPROACHES
The use of CSS in XML involves the following steps:
●● Authoring the XML document
●● Authoring the CSS style sheet
●● Associating the CSS style sheet with the XML document
●● Rendering the XML document associated with the CSS style sheet
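The first and third steps can be sketched in a single file. The example below is only an illustration: the memo vocabulary and the style sheet file name memo.css are hypothetical and do not come from the text. The xml-stylesheet processing instruction in the prolog is what associates the style sheet with the document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Step 3: associate a CSS style sheet (memo.css is a hypothetical file name) -->
<?xml-stylesheet type="text/css" href="memo.css"?>
<!-- Step 1: the XML document itself; the element names are illustrative only -->
<memo>
  <to>All staff</to>
  <from>Office manager</from>
  <subject>Fire drill</subject>
  <body>The drill starts at ten o'clock on Friday.</body>
</memo>
```

The second step would be to write memo.css with rules whose selectors name these elements, and the fourth step is to open the XML file in a browser that supports CSS for XML.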
are being used, XML entity expansion has problems, and printing can lead to
unpredictable results.
Mozilla: This is Netscape Communicator 5 in its beta releases (called
“milestones”). The CSS2 support in Mozilla is incomplete and the rendering
of XML documents with CSS2 is unstable. NGLayout, a native document
format for Mozilla’s graphics-rendering engine, is able to format XML docu-
ments using external CSS stylesheets.
Amaya: This is a W3C test-bed browser. It also natively supports XHTML,
an XML application that reformulates (or “XMLizes”) HTML and CSS2.
CSS SYNTAX
A CSS rule has two main parts: a selector, and one or more declarations:
●● Selector: This is the hook used to choose what part(s) of your HTML to
apply the CSS to. Following the selector is the…
●● Declaration Block: Everything within the curly brackets, “{” and “}”;
this is called the declaration block.
●● Declaration: Inside a declaration block, you can have as many declara-
tions as you want and each declaration is a combination of a CSS Property
and a value.
●● Property: This is one of the CSS Properties used to tell what part of the
selector will be changed (or styled).
●● Value: This assigns a value to the property.
CSS EXAMPLE
A CSS declaration always ends with a semicolon, and declaration groups are
surrounded by curly brackets:
p {color:red;text-align:center;}
To make the CSS more readable, you can put one declaration on each line:
<html>
<head>
<style type="text/css">
p
{
color:red;
text-align:center;
}
</style>
</head>
<body>
<p>Welcome!</p>
<p>This line is styled with CSS.</p>
</body>
</html>
Result:
Welcome!
This line is styled with CSS.
CSS COMMENTS
Comments are used to explain your code, and may help you when you edit the
source code at a later date. Comments are ignored by browsers.
A CSS comment begins with “/*” and ends with “*/”:
/*This is a comment*/
p
{
text-align:center;
/*This is another comment*/
color:black;
font-family:arial;
}
CSS SELECTORS
CSS Selectors allow us to target specific HTML elements with our style
sheets. While there are many different types of CSS Selectors, here we focus
on the four essential selectors: type, id, class, and descendant selectors.
1. Type selectors correspond with HTML elements
2. ID selectors are used by adding # in front of an element’s ID
3. Class selectors are used by adding a period in front of an element’s class
4. Descendant selectors are similar to family trees; you start with the parent
element you wish to select, add a space, and continue naming any interior
elements until you’ve arrived at the specific element you wish to select
1. Type Selector: Type selectors are very simple. They correspond with
any HTML element type. For example, add the following code to your blank
CSS file; these are three simple type selectors:
body {
font-family: Arial, sans-serif;
font-size: small;
}
h1 {
color: green;
}
em {
color: red;
}
This code selects and styles our <body> element, as well as all <h1> and
<em> elements on our page.
2. Id Selector: The id selector is used to specify a style for a single,
unique element. The id selector uses the id attribute of the HTML element,
and is defined with a “#”. The style rule below will be applied to the element
with id=“para1”:
<html>
<head>
<style type="text/css">
#para1
{
text-align:center;
color:red;
}
</style>
</head>
<body>
<p id="para1">Welcome!</p>
<p>This paragraph is not affected by the style.</p>
</body>
</html>
Result:
Welcome!
This paragraph is not affected by the style.
3. Class Selector: The class selector is used to specify a style for a group
of elements. Unlike the id selector, the class selector is most often used on
several elements. This allows you to set a particular style for any HTML ele-
ments with the same class.
The class selector uses the HTML class attribute, and is defined with a “.”
In the example below, all HTML elements with class=“center” will be
center-aligned:
<html>
<head>
<style type="text/css">
.center
{
text-align:center;
}
</style>
</head>
<body>
<h1 class="center">Center-aligned heading</h1>
<p class="center">Center-aligned paragraph.</p>
</body>
</html>
Result:
Center-aligned heading
Center-aligned paragraph.
You can also specify that only specific HTML elements should be affected
by a class. In the example below, all p elements with class=“center” will be
center-aligned:
<html>
<head>
<style type="text/css">
p.center
{
text-align:center;
}
</style>
</head>
<body>
<h1 class="center">This heading will not be affected</h1>
<p class="center">This paragraph will be center-aligned.</p>
</body>
</html>
Result:
This heading will not be affected
This paragraph will be center-aligned.
Class attributes: The class attribute that allows you to create subclasses
of elements in HTML is also not likely to be available in the majority of XML-
based document formats. Of course, CSS lets you select elements based on
any attribute, not just class, but the syntax is less convenient.
<?xml-stylesheet href="#s1" type="text/css"?>
<doc>
<s id="s1">
s { display: none }
p { display: block }
p.note { color: red }
</s>
<p>Some text... </p>
<p class="note">A note... </p>
</doc>
If the document format doesn’t specify that class creates a subclass, then
you’ll have to use the longer selectors with “[ ]”:
<?xml-stylesheet href="#s1" type="text/css"?>
<doc>
<s id="s1">
s { display: none }
p { display: block }
p[class~="note"] { color: red }
</s>
<p>Some text... </p>
<p class="note">A note... </p>
</doc>
If there is no class attribute, but there is something else we can use, the
attribute selectors “[ ]” still apply:
<?xml-stylesheet href="#s1" type="text/css"?>
<doc>
<s id="s1">
s { display: none }
p { display: block }
p[warning="yes"] { color: red }
</s>
<p>Some text... </p>
<p warning="yes">A note... </p>
</doc>
4. Descendant Selector: Consider the selector “#intro .important”. It begins
with “#intro”, which selects our Intro Div. This is followed by
a space, and then “.important”. So essentially our selector is telling the Web
browser to (1) find the element with the id of intro, (2) go inside that element
and find any elements with the class of important.
Within the orange paragraph, the word “important” is red. Let’s imagine
we want to change the color, since red text on an orange background is diffi-
cult to read. The word “important” is inside an <em> element, so we’ll use the
following code to select and style it:
#intro .important em {
color: white;
}
This code is telling the browser to (1) find the element with an id of intro,
(2) go inside that element and find any elements with a class of important, and
(3) go inside that element and select any <em> elements.
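To see what these two descendant selectors do and do not match, it helps to have a whole page in view. The page below is a sketch with assumptions: the wording of the paragraphs is invented, and the orange background rule is inferred from the description of the “orange paragraph” rather than taken from an original listing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Descendant selectors</title>
    <style type="text/css">
      /* any element with class important inside the element with id intro */
      #intro .important { background-color: orange; }
      /* any em element inside such an element */
      #intro .important em { color: white; }
    </style>
  </head>
  <body>
    <div id="intro">
      <p class="important">This is an <em>important</em> notice.</p>
      <p>This paragraph is inside intro but has no class, so it is not styled.</p>
    </div>
    <!-- Same class, but outside the intro element, so neither rule applies -->
    <p class="important">This <em>important</em> text is not affected.</p>
  </body>
</html>
```

Only the first paragraph is matched, because it is the only element that has the class important and also sits inside the element whose id is intro.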
An external style sheet can be written in any text editor. The file should
not contain any HTML tags. Your style sheet should be saved with a .css
extension. An example of a style sheet file is shown below:
hr {color:sienna;}
p {margin-left:20px;}
body {background-image:url("images/back40.gif");}
Do not leave spaces between the property value and the units!
“margin-left:20 px” (instead of “margin-left:20px”) will work in IE, but
not in Firefox or Opera.
●● Internal Style Sheet: An internal style sheet should be used when a
single document has a unique style. You define internal styles in the head
section of an HTML page, by using the <style> tag
<head>
<style type="text/css">
hr {color:sienna;}
p {margin-left:20px;}
body {background-image:url("images/back40.gif");}
</style>
</head>
●● Inline Styles: An inline style loses many of the advantages of style sheets
by mixing content with presentation. To use inline styles you use the style
attribute in the relevant tag. The style attribute can contain any CSS prop-
erty. The example shows how to change the color and the left margin of
a paragraph:
<p style="color:sienna;margin-left:20px">This is a paragraph.</p>
If some properties have been set for the same selector in different style
sheets, the values will be inherited from the more specific style sheet. For
example, an external style sheet has these properties for the h3 selector:
h3
{
color:red;
text-align:left;
font-size:8pt;
}
And an internal style sheet has these properties for the h3 selector:
h3
{
text-align:right;
font-size:20pt;
}
If the page with the internal style sheet also links to the external style
sheet, the properties for h3 are
color:red;
text-align:right;
font-size:20pt;
The color is inherited from the external style sheet, and the text alignment
and the font size are replaced by the internal style sheet.
CSS STYLES
There are different types of style such as backgrounds, text, fonts, links, lists,
and tables.
.
.
</BOOKSTORE>
BOOKSTORE
{
background-color: #ffffff;
width: 100%;
}
BOOK
{
display: block;
margin-bottom: 30pt;
margin-left: 0;
}
BOOKTITLE
{
color: #FF0000;
font-size: 20pt;
}
AUTHOR
{
color: #0000FF;
font-size: 20pt;
}
PUBLISHER, PRICE, EDITION
{
display: block;
color: #000000;
margin-left: 20pt;
}
Result:
Internet and its applications
K. Ram
RK Publications
400.00
2006
Java Programming
AK Sharma
Jam Publications
550.00
2008
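The XML source that this style sheet formats is shown only in part above. A document of roughly the following shape would match the selectors and the result; it is a reconstruction that assumes the element names used in the style sheet and the values listed in the result, and the file name bookstore.css is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- bookstore.css is a hypothetical name for the style sheet listed above -->
<?xml-stylesheet type="text/css" href="bookstore.css"?>
<BOOKSTORE>
  <BOOK>
    <BOOKTITLE>Internet and its applications</BOOKTITLE>
    <AUTHOR>K. Ram</AUTHOR>
    <PUBLISHER>RK Publications</PUBLISHER>
    <PRICE>400.00</PRICE>
    <EDITION>2006</EDITION>
  </BOOK>
  <BOOK>
    <BOOKTITLE>Java Programming</BOOKTITLE>
    <AUTHOR>AK Sharma</AUTHOR>
    <PUBLISHER>Jam Publications</PUBLISHER>
    <PRICE>550.00</PRICE>
    <EDITION>2008</EDITION>
  </BOOK>
</BOOKSTORE>
```

Because XML element names are case-sensitive, the selectors in the style sheet must use the same capitalization as the tags in the document.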
XSL TRANSFORMATION
Applying an XSL stylesheet to the XML feed has the advantage of being able
to fully customize the display, like being able to add links or change the order
of the nodes. The transformation needs to happen on the client so that the
XML remains intact.
First we add the reference to the XSL file inside the feed:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<?xml-stylesheet type="text/xsl" href="latest.xsl" ?>
<?xml-stylesheet type="text/css" href="latest.css" ?>
<rss version="2.0">
We can add the XSLT specification, as well as leave the CSS link there.
Having both added is perfectly fine, as only one of them is going to be used in
the end. If the browser understands XSL, then it will use that and ignore the
CSS. See the complete XSL file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/rss">
<html>
<head>
<link href="xsl.css" rel="stylesheet" type="text/css"/>
<style type="text/css">
body {
font-size:0.83em;
}
</style>
</head>
<body>
<div id="logo">
<xsl:element name="a">
<xsl:attribute name="href">
<xsl:value-of select="channel/link" />
</xsl:attribute>
<xsl:value-of select="channel/title" />
</xsl:element>
</div>
<div class="Snippet" style="border-width:0; background-color:#FFF; margin:1em">
<div class="titleWithLine">
<xsl:value-of select="channel/description" />
</div>
<dl style="padding-right:1em">
<xsl:for-each select="channel/item">
<dd>
<xsl:element name="a">
<xsl:attribute name="href">
<xsl:value-of select="link"/>
</xsl:attribute>
<xsl:value-of select="title"/>
</xsl:element>
</dd>
<dt>
<xsl:value-of select="description" /><br />
<span class="comments"><xsl:value-of select="pubDate" /></span>
</dt>
</xsl:for-each>
</dl>
</div>
<div id="footer">
<xsl:value-of select="channel/copyright" />
</div>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
The important thing to notice here is that you can output complete
HTML, together with links to external CSS files, for an improved customiza-
tion of the display. If you already have a stylesheet that you use on your site,
you can make a reference to it, and use it in the XSL file to create a similar
look and feel. This is what is shown above, with the link to the CSS file “xsl.
css.” There are also many ways to define an XSLT file, like using templates.
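As one illustration of the template approach, the item markup can be moved out of the xsl:for-each loop into its own template and invoked with xsl:apply-templates. The stylesheet below is a reduced sketch, not a replacement for the complete file: it assumes the same RSS feed structure and leaves out the logo, footer, and CSS links.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Page skeleton: built once for the rss root element -->
  <xsl:template match="/rss">
    <html>
      <head>
        <title><xsl:value-of select="channel/title"/></title>
      </head>
      <body>
        <dl>
          <!-- Hand each item to the template that matches it -->
          <xsl:apply-templates select="channel/item"/>
        </dl>
      </body>
    </html>
  </xsl:template>

  <!-- One list entry per item; {link} is an attribute value template -->
  <xsl:template match="item">
    <dt><a href="{link}"><xsl:value-of select="title"/></a></dt>
    <dd><xsl:value-of select="description"/></dd>
  </xsl:template>

</xsl:stylesheet>
```

Splitting the stylesheet this way keeps the page layout and the per-item markup separate, so either can be changed without touching the other.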
XSL PATTERNS
XML documents represent a tree of nodes. XSL patterns provide a query
language for locating nodes in XML documents. After the nodes in the XML
document are identified using a pattern, the nodes will be transformed using
a template. The XSL patterns we will be using have the same format as the
patterns we used with XPath, such as / (child), // (descendant), . (current node),
@ (attribute), and * (wildcard). In addition to the patterns we already
mentioned, XSL also has filter operators to manipulate, sort, and filter XML data.
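The pattern operators listed above can be seen together in a short stylesheet. This is a sketch under stated assumptions: the input is imagined to be a catalog containing book elements, each with an isbn attribute and a title child, and none of these names come from the text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- A single slash matches the document root -->
  <xsl:template match="/">
    <!-- A double slash selects book elements at any depth in the document -->
    <xsl:for-each select="//book">
      <!-- The at sign selects an attribute of the current book -->
      <xsl:value-of select="@isbn"/>
      <xsl:text>: </xsl:text>
      <!-- A bare name selects child elements with that name -->
      <xsl:for-each select="title">
        <!-- The dot is the current node, here the title element -->
        <xsl:value-of select="."/>
      </xsl:for-each>
      <xsl:text> (</xsl:text>
      <!-- The asterisk matches every child element, whatever its name -->
      <xsl:value-of select="count(*)"/>
      <xsl:text> child elements)&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Each pattern is evaluated relative to the current node, which is why the same name title means a different element on each pass through the loop.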
One of the best things about CSS is the fact that it can be used by a
multitude of languages, not just XML. You can create a CSS file that’s used
to define how certain elements are formatted in an XML document, and then
turn around and use it in a DHTML document.
CSS uses data as it encounters it in the document, and isn’t as adaptable
as some other languages when dealing with XML. It is fairly common and
has support in multiple browsers, however, and as long as you code with care,
many of the drawbacks of using CSS can be easily avoided.
7
XML SCHEMA BASICS
XML SCHEMA
Generically, we can refer to schema as metadata, or data about data. Some of
the schema efforts are not just concerned with defining a vocabulary; they go
beyond attempting to explain the relationships between certain types of data.
Schemas refine DTDs by permitting more precision in expressing some
concepts in the vocabulary. Schemas use a wholly different syntax than DTDs.
They permit us to borrow vocabulary from other schemas, thereby solving the
validation problem. Overall, schemas are better answers to the problem of
specifying vocabularies.
ROLE OF A SCHEMA
The concept of a schema has been present for many years in both the data-
base and the document world.
The formal role of the schema is to define the set of all possible valid
documents.
We need to be careful in using the words “validity schema.” In the XML
standard, being “valid” means something specific. Informally, it means that a
document conforms to the rules in its DTD. A document is said to be valid if
it satisfies all the constraints defined by the information model.
W3C supports an XML-based alternative to DTD, called the XML schema.
<xs:element name="note">
<xs:complexType>
<xs:sequence>
DTD AS A SCHEMA
As a constraint language, DTDs are very limited. They provide some control
over which elements can be nested within each other, but say nothing about
the text contained within the elements. They offer slightly more control over
attributes, but even this is very limited. For example, there is no way of say-
ing that an attribute must be numeric. It is the document itself that decides
whether it is going to reference a DTD or not, which DTD it is going to refer-
ence, and whether it is going to override any of the declarations in the DTD
in its private internal subset.
class of XML documents. The schema also specifies the structure that those
documents must adhere to and the type of content that each element can hold.
XML documents that attempt to adhere to an XML schema are said to
be instances of that schema. If they correctly adhere to the schema, then they
are valid instances. This is not the same as being well-formed. A well-formed
XML document follows all the syntax rules of XML, but it does not necessarily
adhere to any particular schema. So, an XML document can be well-formed
without being valid, but it cannot be valid unless it is well-formed.
</xs:complexType>
</xs:element>
</xs:schema>
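The difference between a well-formed document and a valid one can be sketched with a small instance. The document below is an illustration that assumes the note.dtd file given later in this chapter, which requires the to, from, heading, and body children in that order. It obeys every XML syntax rule, so it is well-formed, but a validating parser would be expected to reject it because two required elements are missing.

```xml
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "http://www.abc.com/dtd/note.dtd">
<!-- Well-formed: one root element, every tag closed and properly nested. -->
<!-- Not valid: note.dtd requires (to, from, heading, body); from and heading are absent. -->
<note>
  <to>Shashi</to>
  <body>Don't forget me this weekend!</body>
</note>
```

Adding the from and heading elements in the right places would make the same document valid as well as well-formed.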
A FIRST LOOK
An XML schema describes the structure of an XML instance document by
defining what each element must or may contain. An element is limited by
its type. For example, an element of complex type can contain child elements
and attributes, whereas a simple-type element can only contain text. The dia-
gram below gives a first look at the types of XML schema elements.
Schema authors can define their own types or use the built-in types.
The following is a high-level overview of schema types.
●● Elements can be of a simple type or complex type.
●● Simple type elements can only contain text. They cannot have child ele-
ments or attributes.
●● All the built-in types are simple types (e.g., xs:string).
●● Schema authors can derive simple types by restricting another simple
type. For example, an email type could be derived by limiting a string to
a specific pattern.
●● Simple types can be atomic (e.g., strings and integers) or non-atomic (e.g., lists).
●● Complex-type elements can contain child elements and attributes as well
as text.
●● By default, complex-type elements have complex content, meaning that
they have child elements.
●● Complex-type elements can be limited to having simple content, meaning
they only contain text. They are different from simple type elements in
that they have attributes.
●● Complex types can be limited to having no content, meaning they are
empty, but they may have attributes.
●● Complex types may have mixed content, a combination of text and child
elements.
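The overview above can be made concrete with one schema that contains an example of each kind of type. This is a sketch only: the element names email, price, pagebreak, and para, and the deliberately loose e-mail pattern, are assumptions chosen for illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- A simple type derived by restricting a built-in simple type -->
  <xs:simpleType name="emailType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[^@]+@[^@]+"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Simple-type element: text only, no children, no attributes -->
  <xs:element name="email" type="emailType"/>

  <!-- Complex type with simple content: text plus an attribute -->
  <xs:element name="price">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:decimal">
          <xs:attribute name="currency" type="xs:string"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

  <!-- Complex type with no content: always empty, but it may carry attributes -->
  <xs:element name="pagebreak">
    <xs:complexType>
      <xs:attribute name="after" type="xs:integer"/>
    </xs:complexType>
  </xs:element>

  <!-- Complex type with mixed content: text interleaved with child elements -->
  <xs:element name="para">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="em" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Complex content, the default case in which an element holds child elements, is what the note schema later in this chapter uses.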
A DTD File
The following example is a DTD file called “note.dtd” that defines the ele-
ments of the XML document above (“note.xml”):
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
The first line defines the note element to have four child elements: to,
from, heading, and body.
Lines 2–5 define the to, from, heading, and body elements to be of type
“#PCDATA.”
An XML Schema
The following example is an XML schema file called “note.xsd” that defines
the elements of the XML document above (“note.xml”):
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.abc.com"
xmlns="http://www.abc.com"
elementFormDefault="qualified">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
A Reference to a DTD
This XML document has a reference to a DTD:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM
"http://www.abc.com/dtd/note.dtd">
<note>
<to>Shashi</to>
<from>Yashasvi</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
This fragment
xmlns:xs="http://www.w3.org/2001/XMLSchema"
indicates that the elements and data types used in the schema come from
the “http://www.w3.org/2001/XMLSchema” namespace. It also specifies that
the elements and data types that come from the “http://www.w3.org/2001/
XMLSchema” namespace should be prefixed with xs:
This fragment
targetNamespace="http://www.abc.com"
indicates that the elements defined by this schema (note, to, from, heading,
and body) come from the “http://www.abc.com” namespace.
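This fragment
xmlns="http://www.abc.com"
indicates that the default namespace is “http://www.abc.com”.
This fragment
elementFormDefault="qualified"
indicates that any elements used by the XML instance document which were
declared in this schema must be namespace qualified.
Referencing a Schema in an XML Document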
This XML document has a reference to an XML schema:
<?xml version="1.0"?>
<note xmlns="http://www.abc.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.abc.com note.xsd">
<to>Shashi</to>
<from>Yashasvi</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The following fragment
xmlns="http://www.abc.com"
specifies the default namespace declaration. This declaration tells the schema
validator that all the elements used in this XML document are declared in the
“http://www.abc.com” namespace.
Once you have the XML schema instance namespace available:
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
you can use the schema location attribute. This attribute has two values. The
first value is the namespace to use. The second value is the location of the
XML schema to use for that namespace:
xsi:schemaLocation="http://www.abc.com note.xsd"
A simple element can contain only text; it cannot contain other elements or
attributes. However, the “only text” restriction is quite misleading. The text can
be of many different types. It can be one of the types included in the XML
schema definition (boolean, string, and date) or it can be a custom type that
you define yourself.
You can also add restrictions (facets) to a data type in order to limit its
content or you can require the data to match a specific pattern.
The syntax for defining a simple element is:
<xs:element name="xxx" type="yyy"/>
where xxx is the name of the element and yyy is the data type of the element.
The XML schema has a lot of built-in data types. The most common
types are
xs:string
xs:decimal
xs:integer
xs:boolean
xs:date
xs:time
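As an illustration of the syntax, each of the common built-in types can be attached to a simple element. The schema below is a sketch; the element names are hypothetical and chosen only to suggest the kind of value each type holds.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Each declaration pairs an element name with a built-in simple type. -->
  <!-- Example content: Banzal -->
  <xs:element name="lastname" type="xs:string"/>
  <!-- Example content: 400.00 -->
  <xs:element name="price" type="xs:decimal"/>
  <!-- Example content: 36 -->
  <xs:element name="age" type="xs:integer"/>
  <!-- Example content: true -->
  <xs:element name="member" type="xs:boolean"/>
  <!-- Example content: 1970-03-27 -->
  <xs:element name="dateborn" type="xs:date"/>
  <!-- Example content: 09:30:00 -->
  <xs:element name="start" type="xs:time"/>
</xs:schema>
```

An instance such as an age element containing the text “thirty-six” would not validate, because that text is not a legal xs:integer value.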
A fixed value is also automatically assigned to the element, and you cannot
specify another value.
In the following example, the fixed value is “red”:
<xs:element name="color" type="xs:string" fixed="red"/>
An XML schema is an XML document and must follow all the syntax
rules of any other XML document; that is, it must be well formed. XML
schemas also have to follow the rules defined in the “schema of schemas,” which
defines, among other things, the structure of an XML schema and the element
and attribute names that may appear in it.
Although it is not required, it is a common practice to use the xs qualifier
to identify schema elements and types.
The document element of XML schemas is xs:schema. It takes the attri-
bute xmlns:xs with the value of http://www.w3.org/2001/XMLSchema, indi-
cating that the document should follow the rules of XML schema. This will be
clearer after you learn about namespaces.
In this XML schema, we see an xs:element element within the xs:schema
element. xs:element is used to define an element. In this case, it defines the element
Author as a complex-type element, which contains a sequence of two elements:
FirstName and LastName, both of which are of the simple type string.
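The schema being described is not reproduced above. A schema that matches the description would look roughly like the following sketch, which is reconstructed from the prose and should not be taken as the original listing.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Author is a complex-type element because it has child elements -->
  <xs:element name="Author">
    <xs:complexType>
      <!-- The children must appear in this order -->
      <xs:sequence>
        <xs:element name="FirstName" type="xs:string"/>
        <xs:element name="LastName" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

An instance would then consist of an Author element containing one FirstName element followed by one LastName element.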
There is a need for constraints for two reasons: stylistic reasons and
processing reasons. The processing reasons define the information
requirements of the next stage in the process, i.e., handling the document.
There is a great temptation to impose rules thoughtlessly and so make
the system unnecessarily rigid. Information systems have a bad
reputation for inflexibility, and the aim should be to use constraints
sensibly, allowing the humans in the process the maximum scope for using their
intelligence.
SCHEMA AS AN EXPLANATION
The purpose of schema is to explain the document, the interpretation, and
usage of the constructs provided. This purpose facilitates a common under-
standing of the message between the sender and the recipient.
In both the document and database traditions, this role of a schema is treated
as secondary, though it is the more important role.
The schema is often not properly understood by the person who enters
the data on the screen. As a result, the user interprets the schema in differ-
ent ways, and hence attaches various meanings to the data fields, though the
structure remains unchanged. Consequently, the system suffers from what is
called semantic drift.
XSD Attributes
Simple elements cannot have attributes. If an element has attributes, it is
considered to be of a complex type. But the attribute itself is always declared
as a simple type.
The syntax for defining an attribute is
<xs:attribute name="xxx" type="yyy"/>
where xxx is the name of the attribute and yyy specifies the data type of the
attribute.
The XML schema has a lot of built-in data types. The most common
types are
xs:string
xs:decimal
xs:integer
xs:boolean
xs:date
xs:time
A fixed value is also automatically assigned to the attribute, and you
cannot specify another value. In the following example, the fixed value is “EN”:
<xs:attribute name="lang" type="xs:string" fixed="EN"/>
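An attribute declaration of this kind has to sit inside a complex type. The sketch below shows one way to do that for an element that holds only text; the element name lastname is an assumption made for illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- An element with an attribute is a complex type, even when it holds only text. -->
  <xs:element name="lastname">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <!-- The attribute itself is declared with a simple type. -->
          <xs:attribute name="lang" type="xs:string" fixed="EN"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this schema, a lastname element carrying lang="EN" validates, while the same element with any other lang value does not, because the value is fixed.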
Restrictions on Content
When an XML element or attribute has a data type defined, it puts restric-
tions on the element’s or attribute’s content. If an XML element is of type
“xs:date” and contains a string like “Hello World,” the element will not vali-
date. With XML schemas, you can also add your own restrictions to your XML
elements and attributes.
XSD Restrictions/Facets
Restrictions are used to define acceptable values for XML elements or attrib-
utes. Restrictions on XML elements are called facets.
Restrictions on Values
The following example defines an element called “age” with a restriction. The
value of age cannot be lower than 0 or greater than 120:
<xs:element name="age">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="120"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
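As an illustration of how this restriction behaves, the following fragments show a few sample values for the age element. Each line is a separate instance, and the values themselves are chosen only for demonstration.

```xml
<age>35</age>     <!-- valid: between 0 and 120 -->
<age>120</age>    <!-- valid: maxInclusive includes the boundary value -->
<age>121</age>    <!-- invalid: greater than 120 -->
<age>-1</age>     <!-- invalid: lower than 0 -->
<age>forty</age>  <!-- invalid: not an xs:integer -->
```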
A restriction can also be declared as a named simple type and referenced by an
element. The following example limits the “car” element to a set of values:
<xs:element name="car" type="carType"/>
<xs:simpleType name="carType">
<xs:restriction base="xs:string">
<xs:enumeration value="Audi"/>
<xs:enumeration value="Golf"/>
<xs:enumeration value="BMW"/>
</xs:restriction>
</xs:simpleType>
In this case, the type “carType” can be used by other elements because it
is not a part of the “car” element.
The example below defines an element called “letter” with a restriction. The
only acceptable value is ONE of the LOWERCASE letters from a to z:
<xs:element name="letter">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="[a-z]"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
The next example also defines an element called “initials” with a restric-
tion. The only acceptable value is THREE of the LOWERCASE OR UPPER-
CASE letters from a to z:
<xs:element name="initials">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="[a-zA-Z][a-zA-Z][a-zA-Z]"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
The next example also defines an element called “letter” with a restriction.
The acceptable value is one or more pairs of letters, each pair consisting
of a lowercase letter followed by an uppercase letter. For example, “sToP”
will be validated by this pattern, but not “Stop,” “STOP,” or “stop”:
<xs:element name="letter">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="([a-z][A-Z])+"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
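A pattern can also list alternative values. In the following example, the
element name “gender” is assumed; the only acceptable value is male OR female:
<xs:element name="gender">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="male|female"/>
</xs:restriction>
</xs:simpleType>
</xs:element>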
Restrictions on Length
To limit the length of a value in an element, we would use the length, max-
Length, and minLength constraints. This example defines an element called
“password” with a restriction. The value must be exactly eight characters:
<xs:element name="password">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:length value="8"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<!ELEMENT MI (#PCDATA)>
<!ELEMENT Last (#PCDATA)>
<!ELEMENT Suffix (#PCDATA)>
We must minimally have first and last names, but we may optionally have
a middle initial, an honorific (such as Mr., Ms., or Dr.), and a suffix (such
as Jr. or III). A DTD can mark an element as optional only through its
occurrence operators (such as ?), and the DTD itself needs to be changed each
time we want to add an element. For finer control over such cases, we can use
what can be called a schema-enabled DTD, wherein we can have a schema within
a DTD.
To start with, we can have a <Schema> element as the root of the schema.
Then we have an element called Name, the name of which is set in the name
attribute of the <element> tag.
<Schema …>
<element name="Name">
<type>
<element name="Honorific" type="string" minOccurs="0" maxOccurs="1"/>
<element name="First" type="string"/>
<element name="MI" type="string" minOccurs="0" maxOccurs="1"/>
<element name="Last" type="string"/>
<element name="Suffix" type="string" minOccurs="0" maxOccurs="1"/>
</type>
</element>
</Schema>
STRUCTURES
Everything we can define with a DTD is accounted for in the structures
portion of XML schemas. XML schemas are written in XML syntax, and structures
refer to the XML constructs that we can use to define our markup. This
means that XML schemas are really just another application of XML.
The structures section of the XML specification is the part where the
elements and attributes for defining schemas are set out. More importantly,
the content model for elements is described in this part. The content model
clearly specifies the allowable internal structure of an element.
PREAMBLE
A schema consists of a preamble and zero or more definitions and declarations.
The preamble is found within the root element, schema. It must include at
least three pieces of information, given as attributes. The following are some
of the most commonly used information attributes.
Table 7.2 Information Attributes
Attribute                    Description
targetNS                     Contains the namespace and URI of the schema that is being used
version                      Specifies the version of the schema
xmlns                        Provides the namespace for the XML schemas specification
finalDefault, exactDefault   Provide defaults for two types of extensions
SAMPLE PREAMBLE
<?xml version="1.0"?>
<schema targetNS="http://myserver/myschema.xsd"
version="1.0"
xmlns="http://www.w3.org/2003/XMLSchema">
</schema>
The code snippet here shows how the schema element is used with a few
attributes. Here, the schema resides on myserver and is called myschema.xsd,
.xsd being the file extension for XML schemas. The version attribute specifies
that this is version 1.0 of the schema. The default namespace declaration
refers to the XML schemas specification. This is a closed model schema, which
means that all documents conforming to this schema will be completely defined
by the schema and must not have any outside content.
CONTENT MODELS
XML schemas provide us with mechanisms for describing content models with
a lot more accuracy than DTDs. These use complex type definitions and a
new structure, the <group> element, to build the internal contents of an
element declaration.
The content attribute tells us what kind of content an element may hold,
although it says nothing about the permitted attributes.
Table 7.3 shows the content attribute value and meaning.
Table 7.3 Content Attribute Value and its Meanings
Content Attribute Value   Meaning
Unconstrained             Content of any kind
Empty                     Empty element
Mixed                     Elements and character data
Compositors in the schema draft show how the content may be composed.
These compositors are values of the order attribute of a <group> element.
This new element gives us a way to provide ordered bodies of elements in a
declaration. The compositors are shown in Table 7.4.
Table 7.4 Compositor Keyword with its Meaning and DTD Equivalent
Compositor Keyword   Meaning                                     DTD Equivalent
Seq                  Elements must follow in exact order         , (comma)
Choice               Exactly one of the model elements appears   | (pipe)
ELEMENT DECLARATION
Schemas are written using the syntax of XML, which makes them applicable to
XML documents and lets them be processed like any other XML document. An
element is declared with the element tag, whose name attribute gives the name
of the element:
<element name="Book" />
DERIVATION
A new type extends another when it adds additional content to its source type.
In this case, all the content declared in the source type appears in the derived
type.
The code here gives an example of how types are derived. Here, the type
FormalPersonName extends from PersonName and adds an additional prop-
erty of adding an honorific element to the derived type.
<type name="PersonName">
<element name="FirstName" type="string" />
<element name="MI" type="string" />
<element name="LastName" type="string" />
</type>
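The listing above shows only the source type in the notation of the early schema draft. As a hedged sketch of the same idea in the W3C XML Schema 1.0 syntax used later in this chapter, derivation by extension could be written as follows. The element name Honorific is an assumption based on the description above.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- source type: the three name parts described in the text -->
  <xs:complexType name="PersonName">
    <xs:sequence>
      <xs:element name="FirstName" type="xs:string"/>
      <xs:element name="MI" type="xs:string"/>
      <xs:element name="LastName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <!-- derived type: everything in PersonName plus an honorific -->
  <xs:complexType name="FormalPersonName">
    <xs:complexContent>
      <xs:extension base="PersonName">
        <xs:sequence>
          <!-- "Honorific" is an assumed element name -->
          <xs:element name="Honorific" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

Note that in XML Schema 1.0, content added by extension is appended after the content of the base type, so in this sketch Honorific follows LastName in an instance.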
DATA TYPES
The real world relies on the concepts of numbers, strings, and sets. Hence,
the programs written in modern programming languages support elaborate
systems of built-in types and procedures for defining new types. Therefore,
the addition of data types to XML schemas are a great asset to programmers
using XML for data in their applications.
The support for data types includes the ability to check the validity of a
value in a document. This also includes aiding an appropriate conversion from
text to the native type when processing an XML document.
Schema data types are said to have a set of distinct values called their
value space.
PRIMITIVE TYPES
Primitive data types are those that are not defined in terms of other types.
They are axiomatic. It is natural for the XML schema proposal to include the
classic XML 1.0 types, but it also adds some types of its own.
Table 7.5 gives a list of primitive types introduced by XML schema.
Table 7.5 List of Primitive Types
Schema Primitive Type   Definition
String                  Finite sequence of ISO 10646 or Unicode characters, such as “thisisastring”
Boolean                 The set {true, false}
HYPERLINKS
Hypertext differs from normal text in that it has hyperlinks. Hyperlinks are
typically identified by blue, underlined text that marks hotspots. These
hotspots, when clicked, take us to the Web pages that are specified as links.
Linking or cross-referencing can be done when
●● There is a need to provide context-sensitive help, for instance, when we
navigate tutorials, it will be easy to understand if elaborations for some
technical terms or external references are provided.
●● A file has to be referred or displayed on clicking the mouse at a particular
point of the document.
LINKS
Link is a functionality that is associated with a text or an object in a document
using markup language.
LINK ELEMENTS
HTML has two link elements namely, A and IMG, whereas in XML links, the
link elements are identified by the element attributes.
Any XML element can act as a link element provided it has the right kind
of attributes.
<!ELEMENT CORRELATION ANY>
<!ATTLIST CORRELATION
xlink:form CDATA #FIXED value>
The primary attribute that identifies the XML element as a link is the
xlink:form attribute, whose declaration in an XML DTD would be as shown.
Here, the value should be a locator and not the linking elements. The value
can be simple or extended.
LOCATORS
XML links work with link elements. The link elements contain locators.
Locators are in the form of attributes or other elements that point to spe-
cific locations.
In general, a locator is a URI, a fragment identifier, or a URI combined
with a fragment identifier. Locators for XML documents are extended pointers.
The syntax of locators allows us to use the two variations as shown here.
URI#fragment: This fetches the whole of the resource identified by the
URI and then extracts the part identified by the fragment identifier.
URI|fragment: The application can decide how it will process the URI in
order to extract the resource. This could be used to retrieve a particular part
of the document.
If the fragment identifier is a character string, the string is treated as the
value of the id attribute of an XML element. For instance, the locator sample.
html#sa2 points to the element with attribute value of sa2 in the file sample.html.
XLINKS
XML’s XLinks are used to establish hyperlinks in XML documents.
The W3C XLink working draft defines two categories of links. They are
simple links and extended links.
The xml:link attribute is used for specifying a link or location term as
shown here.
xml:link="simple" | "extended" | "locator" | "group" | "document"
SIMPLE LINKS
Simple links are similar to HTML links, which are formed using the element
A in HTML.
Simple links are used to jump from one source document to a specified
destination either within the same document or another document. Simple
links have only one locator and hence move in one direction from source to
target location. A simple link contains a piece of text that acts as a resource
and one end of the link.
An example of a simple link in an XML document is given here.
<sample.link xlink:form="simple" href="http://m.com/title.XML">see also
</sample.link>
EXTENDED LINKS
Extended links allow us to link together any number of resources, resulting in
multiple targets instead of a simple one-to-one link in HTML.
Extended links allow XML documents to link to and from resources that
cannot contain the links themselves. These include graphic files, sound files,
read-only documents, and so on, which do not allow us to modify their
contents or embed links.
They enable manipulations like filtering, addition, and modification of
links. For instance, imagine that we are able to modify the links at a certain
point, so experienced readers of a technical manual can traverse a different
path from that of novice readers.
Extended links also enable application software to process the links in
different ways depending upon the requirements. An extended link does not
directly point to anything or link anything together.
An extended link element identifies itself through its xlink:form attribute
value and contains a set of locator elements that together form the extended
link as shown. Here, the comment element declares itself to be an extended
link and an opinion element declares itself to be a locator element.
<comment xlink:form="extended">
<opinion xlink:form="locator" href="link1"/>
<reference href="#division1"/>
<reference href="http://one.com/first.html"/>
<reference href="references.htm"/>
</comment>
SIMPLE-TYPE ELEMENTS
Simple-type elements have no children or attributes. For example, the Name
element below is a simple-type element; whereas the Person and HomePage
elements are not.
Code Sample: SimpleTypes/Demos/SimpleType.xml
<?xml version="1.0"?>
<Person>
<Name>Mark Twain</Name>
<HomePage URL="http://www.marktwain.com"/>
</Person>
●● gYearMonth
●● gYear
●● gMonthDay
●● gDay
●● gMonth
●● hexBinary
●● base64Binary
●● anyURI
●● QName
●● NOTATION
●● short
●● byte
●● nonNegativeInteger
●● unsignedLong
●● unsignedInt
●● unsignedShort
●● unsignedByte
●● positiveInteger
Notice the FirstName and LastName elements in the code sample below.
They are not explicitly defined as simple-type elements. Instead, the type is
defined with the type attribute. Because the value (string in both cases) is a
simple type, the elements themselves are simple-type elements.
Code Sample: SimpleTypes/Exercises/Song.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Song">
<xs:complexType>
<xs:sequence>
<!--
CONTROLLING LENGTH
The length of a string can be controlled with the length, minLength, and
maxLength facets. We used the length facet in the example above to create a
Password simple type as an eight-character string. We could use minLength
and maxLength to allow passwords that were between six and twelve charac-
ters in length.
The schema below shows how this is done. The two XML instances shown
below it are both valid, because the length of the password is between six and
twelve characters.
Code Sample: SimpleTypes/Demos/Password2.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="Password">
<xs:restriction base="xs:string">
<xs:minLength value="6"/>
<xs:maxLength value="12"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="User">
<xs:complexType>
<xs:sequence>
<xs:element name="PW" type="Password"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
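Two instances that should validate against this schema are sketched below, one at each end of the permitted range. The password values are invented for illustration, and the schema location assumes the instance sits next to Password2.xsd.

```xml
<?xml version="1.0"?>
<User xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="Password2.xsd">
  <!-- six characters: the minimum allowed by minLength -->
  <PW>secret</PW>
</User>
```

```xml
<?xml version="1.0"?>
<User xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="Password2.xsd">
  <!-- twelve characters: the maximum allowed by maxLength -->
  <PW>MyPassword12</PW>
</User>
```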
SPECIFYING PATTERNS
Patterns are specified using the xs:pattern element and regular expressions.
For example, you could use the xs:pattern element to restrict the Password
simple type to consist of between six and twelve characters, which can only be
lowercase and uppercase letters and underscores.
Code Sample: SimpleTypes/Demos/Password3.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="Password">
<xs:restriction base="xs:string">
<xs:pattern value="[A-Za-z_]{6,12}"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="User">
<xs:complexType>
<xs:sequence>
<xs:element name="PW" type="Password"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
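To make the effect of the pattern concrete, the following fragments show sample PW values checked against [A-Za-z_]{6,12}. Each line is a separate instance fragment, and the values are illustrative only.

```xml
<PW>my_password</PW>  <!-- valid: 11 letters and underscores -->
<PW>Secret</PW>       <!-- valid: 6 letters -->
<PW>pass1234</PW>     <!-- invalid: digits are not in the character class -->
<PW>abc</PW>          <!-- invalid: fewer than six characters -->
```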
</xs:simpleType>
<xs:element name="Employee">
<xs:complexType>
<xs:sequence>
<xs:element name="Salary" type="Salary"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
NUMBER OF DIGITS
Using totalDigits and fractionDigits, we can further specify that the Salary
type should consist of seven digits, two of which come after the decimal point.
Both totalDigits and fractionDigits are maximums. That is, if totalDigits is
specified as 5 and fractionDigits is specified as 2, a valid number could have
no more than five digits total and no more than two digits after the decimal
point.
Code Sample: SimpleTypes/Demos/Employee2.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="Salary">
<xs:restriction base="xs:decimal">
<xs:minInclusive value="10000"/>
<xs:maxInclusive value="90000"/>
<xs:fractionDigits value="2"/>
<xs:totalDigits value="7"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="Employee">
<xs:complexType>
<xs:sequence>
ENUMERATIONS
A derived type can be restricted to an enumerated set of possible values. For
example, the JobTitle type could be limited to a set of pre-defined job titles.
Code Sample: SimpleTypes/Demos/Employee3.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="Salary">
<xs:restriction base="xs:decimal">
<xs:minInclusive value="10000"/>
<xs:maxInclusive value="90000"/>
<xs:fractionDigits value="2"/>
<xs:totalDigits value="7"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="JobTitle">
<xs:restriction base="xs:string">
<xs:enumeration value="Sales Manager"/>
<xs:enumeration value="Salesperson"/>
<xs:enumeration value="Receptionist"/>
<xs:enumeration value="Developer"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="Employee">
<xs:complexType>
<xs:sequence>
<xs:element name="Salary" type="Salary"/>
<xs:element name="Title" type="JobTitle"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
WHITESPACE HANDLING
By default, whitespace in elements of the datatype xs:string is preserved in
XML documents; however, this can be changed for datatypes derived from
xs:string. This is done with the xs:whiteSpace element, the value of which
must be one of the following:
●● preserve—Whitespace is not normalized. That is to say, it is kept as-is.
●● replace—All tabs, line feeds, and carriage returns are replaced by single
spaces.
●● collapse—All tabs, line feeds, and carriage returns are replaced by single
spaces and then all groups of single spaces are replaced with one single
space. All leading and trailing spaces are then removed (i.e., trimmed).
In SimpleTypes/Demos/Password.xsd, we looked at restricting the length
of a Password datatype to eight characters using the xs:length element. If
whitespace is preserved, then leading and trailing spaces are considered part
of the password. In the following example, we set xs:whiteSpace to collapse,
thereby discounting any leading or trailing whitespace. As you can see, this
allows the XML instance author to format the document without the consid-
eration of the whitespace.
Code Sample: SimpleTypes/Demos/Password4.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="Password">
<xs:restriction base="xs:string">
<xs:length value="8"/>
<xs:whiteSpace value="collapse"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="User">
<xs:complexType>
<xs:sequence>
<xs:element name="PW" type="Password"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
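An instance along the following lines should validate against this schema even though the password is surrounded by line breaks and indentation. The password value is invented for illustration.

```xml
<?xml version="1.0"?>
<User xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="Password4.xsd">
  <!-- leading and trailing whitespace is collapsed before the
       eight-character length check is applied -->
  <PW>
    pass1234
  </PW>
</User>
```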
<xs:element name="PW">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:length value="8"/>
<xs:whiteSpace value="collapse"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
NONATOMIC TYPES
Most of XML schema’s built-in types are atomic, meaning that they cannot be
broken down into meaningful parts; the exceptions are the built-in list types
NMTOKENS, IDREFS, and ENTITIES. The XML schema provides for two nonatomic
types: lists and unions.
LISTS
List types are sequences of atomic types separated by whitespace; you can
have a list of integers or a list of dates. Lists should not be confused with
enumerations. Enumerations provide optional values for an element. Lists
represent a single value within an element.
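A list type can be sketched as follows. The names DateList and Holidays are hypothetical and are not taken from the sample files.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- a whitespace-separated sequence of xs:date values -->
  <xs:simpleType name="DateList">
    <xs:list itemType="xs:date"/>
  </xs:simpleType>
  <!-- "Holidays" is a hypothetical element using the list type -->
  <xs:element name="Holidays" type="DateList"/>
</xs:schema>
```

With this sketch, an instance such as <Holidays>2024-01-01 2024-07-04 2024-12-25</Holidays> holds three dates in a single element.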
Code Sample: SimpleTypes/Demos/EmployeeList.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="Salary">
<xs:restriction base="xs:decimal">
<xs:minInclusive value="10000"/>
<xs:maxInclusive value="90000"/>
<xs:fractionDigits value="2"/>
<xs:totalDigits value="7"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="JobTitle">
<xs:restriction base="xs:string">
UNIONS
Union types are groupings of types, essentially allowing for the value of an ele-
ment to be of more than one type. In the example below, two atomic simple
types are derived: RunningRace and Gymnastics. A third simple type, Event,
is then derived as a union of the previous two. The Event element is of the
Event type, which means that it can either be of the RunningRace or the
Gymnastics type.
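The arrangement described here can be sketched as follows. The type names RunningRace, Gymnastics, and Event come from the description; the enumeration values are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="RunningRace">
    <xs:restriction base="xs:string">
      <!-- enumeration values are illustrative assumptions -->
      <xs:enumeration value="100 meters"/>
      <xs:enumeration value="Marathon"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="Gymnastics">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Vault"/>
      <xs:enumeration value="Floor"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- a value is valid if it matches either member type -->
  <xs:simpleType name="Event">
    <xs:union memberTypes="RunningRace Gymnastics"/>
  </xs:simpleType>
  <xs:element name="Event" type="Event"/>
</xs:schema>
```

Under this sketch, both <Event>Marathon</Event> and <Event>Vault</Event> should validate, while a value outside both enumerations should not.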
In this example, the FirstName and LastName elements are both declared
globally. The global elements are then referenced as children of the Author
sequence.
<xs:restriction base="xs:string">
<xs:enumeration value="Mr."/>
<xs:enumeration value="Ms."/>
<xs:enumeration value="Dr."/>
</xs:restriction>
</xs:simpleType>
<xs:element name="Book">
<xs:complexType>
<xs:sequence>
<xs:element name="Title" type="xs:string"/>
<xs:element name="Author">
<xs:complexType>
<xs:sequence>
<xs:element name="Title" type="PersonTitle"/>
<xs:element name="Name" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Notice that there are two elements named Title, which can appear in
different locations in the XML instance and are of different types. When the
Title element appears as a direct child of the root Book element, its value
can be any string; when it appears as a child of Author, its value is limited
to “Mr.,” “Ms.,” or “Dr.”
The example below defines a similar content model; however, because
the elements are declared globally, the name Title cannot be used twice.
Code Sample: SimpleTypes/Demos/BookGlobal.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="PersonTitle">
<xs:restriction base="xs:string">
<xs:enumeration value="Mr."/>
<xs:enumeration value="Ms."/>
<xs:enumeration value="Dr."/>
</xs:restriction>
</xs:simpleType>
<xs:element name="BookTitle" type="xs:string"/>
<xs:element name="Title" type="PersonTitle"/>
<xs:element name="Name" type="xs:string"/>
<xs:element name="Book">
<xs:complexType>
<xs:sequence>
<xs:element ref="BookTitle"/>
<xs:element name="Author">
<xs:complexType>
<xs:sequence>
<xs:element ref="Title"/>
<xs:element ref="Name"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
DEFAULT VALUES
Elements that do not have any children can have default values. To specify a
default value, use the default attribute of the xs:element element.
Code Sample: SimpleTypes/Demos/EmployeeDefault.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
---- Code Omitted ----
<xs:element name="Employee">
<xs:complexType>
<xs:sequence>
<xs:element name="Salary" type="Salary"/>
<xs:element name="Title" type="JobTitle" default="Salesperson"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
When defaults are set in the XML schema, the following rules apply for
the instance document.
●● If the element appears in the document with content, the default value
is ignored.
●● If the element appears without content, the default value is applied.
●● If the element does not appear, the element is left out. In other words,
providing a default value does not imply that the element should be
inserted if the XML instance author leaves it out.
Examine the following XML instance. The Title element cannot be
empty; it requires one of the values from the enumeration defined in the
JobTitle simple type. However, in accordance with the second item from the
list, the schema processor applies the default value of Salesperson to the Title
element, so the instance validates successfully.
Code Sample: SimpleTypes/Demos/MikeBanzal.xml
<?xml version="1.0"?>
<Employee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="EmployeeDefault.xsd">
<Salary>90000</Salary>
<Title/>
</Employee>
FIXED VALUES
Element values can be fixed, meaning that if they appear in the instance docu-
ment, they must contain a specified value. Fixed elements are often used for
boolean switches.
Code Sample: SimpleTypes/Demos/EmployeeFixed.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="Salary">
<xs:restriction base="xs:decimal">
<xs:minInclusive value="10000"/>
<xs:maxInclusive value="90000"/>
<xs:fractionDigits value="2"/>
<xs:totalDigits value="7"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="JobTitle">
<xs:restriction base="xs:string">
<xs:enumeration value="Sales Manager"/>
<xs:enumeration value="Salesperson"/>
<xs:enumeration value="Receptionist"/>
<xs:enumeration value="Developer"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="Employee">
<xs:complexType>
<xs:sequence>
<xs:element name="Salary" type="Salary"/>
<xs:element name="Title" type="JobTitle"/>
<xs:element name="Status" type="xs:string" fixed="current"
minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
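An instance that should validate against this schema is sketched below. The salary and title values are invented for illustration.

```xml
<?xml version="1.0"?>
<Employee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="EmployeeFixed.xsd">
  <Salary>55000</Salary>
  <Title>Developer</Title>
  <!-- optional (minOccurs="0"), but if present it must be "current" -->
  <Status>current</Status>
</Employee>
```

Leaving the Status element out entirely should also be valid, whereas <Status>former</Status> should be rejected because it differs from the fixed value.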
NIL VALUES
When an optional element is left out of an XML instance, it has no clear
meaning. For example, suppose a schema declares a Name element as having
required FirstName and LastName elements and an optional MiddleName
element. And suppose a particular instance of this schema does not include
the MiddleName element. Does this mean that the instance author did not
know the middle name of the person in question or does it mean the person
in question has no middle name?
Setting the nillable attribute of xs:element to true indicates that such ele-
ments can be set to nil by setting the xsi:nil attribute to true.
Code Sample: SimpleTypes/Demos/AuthorNillable.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Author">
<xs:complexType>
<xs:sequence>
<xs:element name="FirstName" type="xs:string"/>
<xs:element name="MiddleName" type="xs:string" nillable="true"/>
<xs:element name="LastName" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
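An instance that marks the middle name as explicitly nil could look like the following sketch. The names are taken from examples elsewhere in this chapter.

```xml
<?xml version="1.0"?>
<Author xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="AuthorNillable.xsd">
  <FirstName>Mark</FirstName>
  <!-- the element is present but explicitly has no value -->
  <MiddleName xsi:nil="true"/>
  <LastName>Twain</LastName>
</Author>
```

Because MiddleName is declared without minOccurs="0" in this schema, the element must still appear; xsi:nil only states that it carries no value.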
COMPLEX-TYPE ELEMENTS
Complex-type elements have attributes, child elements, or some combination
of the two. For example, the Name and HomePage elements below are both
complex-type elements.
Code Sample: ComplexTypes/Demos/ComplexType.xml
<?xml version="1.0"?>
<Person>
<Name>
<FirstName>Mark</FirstName>
<LastName>Twain</LastName>
</Name>
<HomePage URL="http://www.marktwain.com"/>
</Person>
Syntax
<xs:element name="ElementName">
<xs:complexType>
<!--Content Model Goes Here-->
</xs:complexType>
</xs:element>
CONTENT MODELS
Content models are used to indicate the structure and order in which child
elements can appear within their parent element. Content models are made
up of model groups. The three types of model groups are listed below.
●● xs:sequence—the elements must appear in the order specified
●● xs:all—the elements must appear, but order is not important
●● xs:choice—only one of the elements can appear.
xs:sequence
The following sample shows the syntax for declaring a complex-type ele-
ment as a sequence, meaning that the elements must show up in the order
they are declared.
Syntax
<xs:element name="ElementName">
<xs:complexType>
<xs:sequence>
<xs:element name="Child1" type="xs:string"/>
<xs:element name="Child2" type="xs:string"/>
<xs:element name="Child3" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
xs:all
The following sample shows the syntax for declaring a complex-type ele-
ment as a conjunction, meaning that the elements can show up in any order.
Syntax
<xs:element name="ElementName">
<xs:complexType>
<xs:all>
<xs:element name="Child1" type="xs:string"/>
<xs:element name="Child2" type="xs:string"/>
<xs:element name="Child3" type="xs:string"/>
</xs:all>
</xs:complexType>
</xs:element>
xs:choice
The following sample shows the syntax for declaring a complex-type ele-
ment as a choice, meaning that only one of the child elements may show up.
Syntax
<xs:element name="ElementName">
<xs:complexType>
<xs:choice>
<xs:element name="Child1" type="xs:string"/>
<xs:element name="Child2" type="xs:string"/>
<xs:element name="Child3" type="xs:string"/>
</xs:choice>
</xs:complexType>
</xs:element>
Syntax
<xs:element name="ElementName">
<xs:complexType>
<xs:choice>
<xs:element name="Child1" type="xs:string"/>
<xs:element name="Child2">
<xs:complexType>
<xs:sequence>
<xs:element name="GC1" type="xs:string"/>
<xs:element name="GC2" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Child3" type="xs:string"/>
</xs:choice>
</xs:complexType>
</xs:element>
Furthermore, model groups can be nested within each other. The following
example illustrates this. Notice that the choice model group, which allows for
either a Salary element or a Wage element, is nested within a sequence model
group. Both of the subsequent instances are valid according to this schema.
Code Sample: ComplexTypes/Demos/Employee.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="Salary">
<xs:restriction base="xs:decimal">
<xs:minInclusive value="10000"/>
<xs:maxInclusive value="90000"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="Employee">
<xs:complexType>
<xs:sequence>
<xs:element name="Name">
<xs:complexType>
<xs:sequence>
<xs:element name="FirstName"/>
<xs:element name="LastName"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:choice>
<xs:element name="Salary" type="Salary"/>
<xs:element name="Wage" type="xs:decimal"/>
</xs:choice>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
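Two instances that should be valid against this schema are sketched below, one for each branch of the choice. The names and amounts are invented for illustration.

```xml
<?xml version="1.0"?>
<Employee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="Employee.xsd">
  <Name>
    <FirstName>Mark</FirstName>
    <LastName>Twain</LastName>
  </Name>
  <!-- first branch of the choice: a Salary between 10000 and 90000 -->
  <Salary>50000</Salary>
</Employee>
```

```xml
<?xml version="1.0"?>
<Employee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="Employee.xsd">
  <Name>
    <FirstName>Mark</FirstName>
    <LastName>Twain</LastName>
  </Name>
  <!-- second branch of the choice: a Wage of type xs:decimal -->
  <Wage>15.50</Wage>
</Employee>
```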
OCCURRENCE CONSTRAINTS
By default, elements that are declared locally must show up once, and only
once, within their parent. This constraint can be changed using the
minOccurs and maxOccurs attributes. The default value of each of these
attributes is 1. The value of minOccurs can be any non-negative integer. The
value of maxOccurs can be any positive integer or unbounded, meaning that
there is no upper limit on the number of times the element can appear.
The example below shows how minOccurs can be used to make an ele-
ment optional and how maxOccurs can be used to allow an element to be
repeated indefinitely.
Code Sample: ComplexTypes/Demos/Employee2.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="Salary">
<xs:restriction base="xs:decimal">
<xs:minInclusive value="10000"/>
<xs:maxInclusive value="90000"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="Employee">
<xs:complexType>
<xs:sequence>
<xs:element name="Name">
<xs:complexType>
<xs:sequence>
<xs:element name="FirstName"/>
<xs:element name="MiddleName" minOccurs="0"/>
<xs:element name="LastName"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:choice>
<xs:element name="Salary" type="Salary"/>
<xs:element name="Wage" type="xs:decimal"/>
</xs:choice>
<xs:element name="Responsibilities">
<xs:complexType>
<xs:sequence>
<xs:element name="Responsibility" type="xs:string"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Note that minOccurs and maxOccurs can also be applied to model groups
(e.g., xs:sequence) to control the number of times a model group can be
repeated.
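As a minimal sketch of an occurrence constraint on a model group, the following schema lets a Name and Phone pair repeat as a unit. The element names Contacts, Name, and Phone are hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- "Contacts", "Name", and "Phone" are hypothetical names -->
  <xs:element name="Contacts">
    <xs:complexType>
      <!-- the whole Name/Phone pair may repeat one or more times -->
      <xs:sequence minOccurs="1" maxOccurs="unbounded">
        <xs:element name="Name" type="xs:string"/>
        <xs:element name="Phone" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Here each repetition must contain both children in order, so a Name can never appear without the Phone that follows it.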
<xs:element ref="LastName"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="FirstName"/>
<xs:element name="MiddleName"/>
<xs:element name="LastName"/>
<xs:element name="Wage" type="xs:decimal"/>
<xs:element name="Salary" type="Salary"/>
<xs:element name="Responsibilities">
<xs:complexType>
<xs:sequence>
<xs:element ref="Responsibility" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Responsibility" type="xs:string"/>
<xs:element name="Employee">
<xs:complexType>
<xs:sequence>
<xs:element ref="Name"/>
<xs:choice>
<xs:element ref="Salary"/>
<xs:element ref="Wage"/>
</xs:choice>
<xs:element ref="Responsibilities" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
MIXED CONTENT
Sometimes an element will contain both child elements and character text.
For example, a para element might contain mostly plain character text, but it
could also have other elements (e.g., emphasis) scattered throughout the
character text. As an example, let’s examine the following XML instance document.
Notice that the Bio element contains child elements Company and JobTitle as
well as character text. Such elements are said to contain mixed content.
The syntax for declaring elements with mixed content is shown below.
Syntax
<xs:element name="ElementName">
<xs:complexType mixed="true">
<xs:sequence>
<xs:element name="Child1" type="xs:string"/>
<xs:element name="Child2" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="Bio">
<xs:complexType mixed="true">
<xs:sequence maxOccurs="unbounded">
<xs:element name="Company" type="xs:string"/>
<xs:element name="JobTitle" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
---- Code Omitted ----
</xs:schema>
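For illustration, a Bio element along the following lines should satisfy the mixed-content Bio declaration above. The wording and the company name are invented for this sketch.

```xml
<!-- Illustrative instance fragment: character text interleaved with the declared children -->
<Bio>
  Worked for <Company>Example Press</Company> as a
  <JobTitle>typesetter</JobTitle> before turning to writing full time.
</Bio>
```

Because the sequence is declared with maxOccurs="unbounded", further Company/JobTitle pairs could follow, with more text between them. The mixed="true" setting permits the text but does not constrain what it says.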
As you can see, complex types are defined with the xs:complexType element.
The major advantage of defining a complex type globally is that it can
be reused. For example, a schema might allow for an Illustrator element as
well as an Author element. Both elements could be of type Person. This way,
if the Person type is changed later, the change will apply to both elements.
The instance document below will validate properly against the schema
above.
Code Sample: ComplexTypes/Demos/MarkTwain.xml
<?xml version="1.0"?>
<Author xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="Author.xsd">
<FirstName>Mark</FirstName>
<LastName>Twain</LastName>
</Author>
Attributes
While attributes themselves must be of a simple type, only complex-type
elements can contain attributes.
EMPTY ELEMENTS
An empty element is an element that contains no content, but it may have
attributes. The HomePage element in the instance document below is an
empty element. Below the instance is the snippet from the Author.xsd schema
that declares the HomePage element.
Code Sample: Attributes/Demos/MarkTwain.xml
<?xml version="1.0"?>
<Author xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="Author.xsd">
<Name>
<FirstName>Mark</FirstName>
<LastName>Twain</LastName>
</Name>
<HomePage URL="http://www.marktwain.com"/>
</Author>
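An empty element with an attribute is typically declared as a complex type that has attributes but no content model. The sketch below assumes the URL attribute is of type xs:anyURI; the actual declaration in Author.xsd may differ.

```xml
<!-- Sketch only: no xs:sequence or other content model, so the element must be empty -->
<xs:element name="HomePage">
  <xs:complexType>
    <!-- Assumed type; xs:string would also accept the value shown in the instance -->
    <xs:attribute name="URL" type="xs:anyURI"/>
  </xs:complexType>
</xs:element>
```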
the xs:simpleContent element and then extending the element with the
xs:extension element, which must specify the type of simple content
contained with the base attribute. The syntax is shown below.
Syntax
<xs:element name="ElementName">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="AttName" type="xs:string"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
For example, the FirstName element in the next XML instance contains
only simple content and has a single attribute. Below the instance is the
snippet from the Author3.xsd schema that declares the FirstName element.
Code Sample: Attributes/Demos/NatHawthorne.xml
<?xml version="1.0"?>
<Author xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="Author3.xsd">
<Name Pseudonym="true" HomePage="http://www.nathanielhawthorne.com">
<FirstName Full="false">Nat</FirstName>
<LastName>Hawthorne</LastName>
</Name>
</Author>
</xs:element>
---- Code Omitted ----
</xs:schema>
This third example shows how to declare an attribute with a derived type
globally. You may test Attributes/Demos/LifeOnTheMississippi.xml against
this schema.
Code Sample: Attributes/Demos/Book3.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:attribute name="Title">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Mr."/>
<xs:enumeration value="Ms."/>
<xs:enumeration value="Dr."/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:element name="Book">
<xs:complexType>
<xs:sequence>
<xs:element name="Title" type="xs:string"/>
<xs:element name="Author">
<xs:complexType>
<xs:sequence>
<xs:element name="Name" type="xs:string"/>
</xs:sequence>
<xs:attribute ref="Title"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
<xs:attribute name="Full" type="xs:boolean" default="true"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
---- Code Omitted ----
</xs:schema>
FIXED VALUES
Attribute values can be fixed, meaning that, if they appear in the instance
document, they must contain a specified value. As with simple-type elements,
this is done with the fixed attribute. You may test
Attributes/Demos/NatHawthorne3.xml against this schema.
Code Sample: Attributes/Demos/Author5.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
---- Code Omitted ----
<xs:element name="Name">
<xs:complexType>
<xs:sequence>
<xs:element name="FirstName">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="Full" type="xs:boolean" default="true"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
<xs:element name="LastName" type="xs:string"/>
</xs:sequence>
<xs:attribute name="Pseudonym" type="xs:boolean" fixed="true"/>
<xs:attribute name="HomePage" type="xs:anyURI"/>
</xs:complexType>
</xs:element>
</xs:sequence>
---- Code Omitted ----
</xs:schema>
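To make the effect of fixed concrete, the instance fragments below sketch what a validator should accept and reject under the Name declaration above. The element content is illustrative only.

```xml
<!-- Accepted: Pseudonym is present and carries the fixed value -->
<Name Pseudonym="true">
  <FirstName>Nat</FirstName>
  <LastName>Hawthorne</LastName>
</Name>

<!-- Accepted: Pseudonym is omitted, which is allowed because the attribute is optional -->
<Name>
  <FirstName>Nat</FirstName>
  <LastName>Hawthorne</LastName>
</Name>

<!-- Rejected: Pseudonym is present but does not match the fixed value -->
<Name Pseudonym="false">
  <FirstName>Nat</FirstName>
  <LastName>Hawthorne</LastName>
</Name>
```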
REQUIRING ATTRIBUTES
By default, attributes are optional, but they can be required by setting the use
attribute of xs:attribute to required as shown in the next code snippet.
Code Sample: Attributes/Demos/Author6.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
---- Code Omitted ----
<xs:element name="Name">
<xs:complexType>
<xs:sequence>
<xs:element name="FirstName">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="Full" type="xs:boolean" default="true"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
<xs:element name="LastName" type="xs:string"/>
</xs:sequence>
<xs:attribute name="Pseudonym" type="xs:boolean" fixed="true"/>
<xs:attribute name="HomePage" type="xs:anyURI" use="required"/>
</xs:complexType>
</xs:element>
---- Code Omitted ----
</xs:schema>
GROUPS
Element and attribute groups can be used to create a set structure for reuse.
To illustrate the benefit of groups, let’s first look at a simple XML instance and
its (rather long) schema that does not use groups.
Code Sample: ReusingComponents/Demos/WinnieThePooh.xml
<?xml version="1.0"?>
<Book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="Book.xsd">
<Title>Winnie the Pooh</Title>
<Author Title="Mr." BirthYear="1882">
<FirstName>A.</FirstName>
<MiddleName>A.</MiddleName>
<LastName>Milne</LastName>
<Specialty>Childrens</Specialty>
</Author>
<Illustrator Title="Mr." BirthYear="1879">
<FirstName>Ernest</FirstName>
<MiddleName>H.</MiddleName>
<LastName>Shepard</LastName>
</Illustrator>
</Book>
<xs:enumeration value="Humor"/>
<xs:enumeration value="Horror"/>
<xs:enumeration value="Childrens"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
<xs:attribute name="Title">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Mr."/>
<xs:enumeration value="Ms."/>
<xs:enumeration value="Dr."/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="BirthYear" type="xs:gYear"/>
</xs:complexType>
</xs:element>
<xs:element name="Illustrator" minOccurs="0">
<xs:complexType>
<xs:sequence>
<xs:element name="FirstName" type="xs:string"/>
<xs:element name="MiddleName" type="xs:string" minOccurs="0"/>
<xs:element name="LastName" type="xs:string"/>
</xs:sequence>
<xs:attribute name="Title">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Mr."/>
<xs:enumeration value="Ms."/>
<xs:enumeration value="Dr."/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="BirthYear" type="xs:gYear"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The Author element and the Illustrator element have some elements and
attributes in common. Let’s see how we can make this code more modular.
Element Groups
First, we’ll look at how we can group the FirstName, MiddleName, and
LastName elements with xs:group to avoid rewriting the elements.
Code Sample: ReusingComponents/Demos/Book2.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:group name="GroupName">
<xs:sequence>
<xs:element name="FirstName" type="xs:string"/>
<xs:element name="MiddleName" type="xs:string" minOccurs="0"/>
<xs:element name="LastName" type="xs:string"/>
</xs:sequence>
</xs:group>
<xs:element name="Book">
<xs:complexType>
<xs:sequence>
<xs:element name="Title" type="xs:string"/>
<xs:element name="Author">
<xs:complexType>
<xs:sequence>
<xs:group ref="GroupName"/>
---- Code Omitted ----
</xs:complexType>
</xs:element>
<xs:element name="Illustrator" minOccurs="0">
<xs:complexType>
<xs:sequence>
<xs:group ref="GroupName"/>
</xs:sequence>
---- Code Omitted ----
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Attribute Groups
Now let’s look at how we can use the xs:attributeGroup element to avoid
rewriting those attributes.
Code Sample: ReusingComponents/Demos/Book3.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:group name="GroupName">
<xs:sequence>
<xs:element name="FirstName" type="xs:string"/>
<xs:element name="MiddleName" type="xs:string" minOccurs="0"/>
<xs:element name="LastName" type="xs:string"/>
</xs:sequence>
</xs:group>
<xs:attributeGroup name="AttGroupPerson">
<xs:attribute name="Title">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Mr."/>
<xs:enumeration value="Ms."/>
<xs:enumeration value="Dr."/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="BirthYear" type="xs:gYear"/>
</xs:attributeGroup>
<xs:element name="Book">
<xs:complexType>
<xs:sequence>
<xs:element name="Title" type="xs:string"/>
<xs:element name="Author">
<xs:complexType>
<xs:sequence>
<xs:group ref="GroupName"/>
<xs:element name="Specialty">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Mystery"/>
<xs:enumeration value="Humor"/>
<xs:enumeration value="Horror"/>
<xs:enumeration value="Childrens"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
<xs:attributeGroup ref="AttGroupPerson"/>
</xs:complexType>
</xs:element>
<xs:element name="Illustrator" minOccurs="0">
<xs:complexType>
<xs:sequence>
<xs:group ref="GroupName"/>
</xs:sequence>
<xs:attributeGroup ref="AttGroupPerson"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
ABSTRACT TYPES
When a type is made abstract, it cannot be used directly in an XML instance.
One of its derived types must be used instead. The derived type is identified
in the instance document using the xsi:type attribute. The schema below
includes an abstract type with two derivations.
Code Sample: ReusingComponents/Demos/Animals.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="Measurement">
<xs:simpleContent>
<xs:extension base="xs:integer">
<xs:attribute name="units" type="xs:string"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
<xs:element name="Weight" type="Measurement"/>
<xs:element name="Name" type="xs:string"/>
<!--Abstract Type-->
<xs:complexType name="Animal" abstract="true">
<xs:sequence>
<xs:element ref="Name"/>
<xs:element ref="Weight"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="Dog">
<xs:complexContent>
<xs:extension base="Animal"/>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="Bird">
<xs:complexContent>
<xs:extension base="Animal">
<xs:sequence>
<xs:element name="WingSpan" type="Measurement"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:element name="Animals">
<xs:complexType>
<xs:sequence>
<xs:element name="Animal" type="Animal" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Code Explanation
The Animal type is declared as abstract by setting the abstract attribute to
true. It is extended by the Dog and Bird types. The Dog type doesn’t actually
modify the original type at all, but the Bird type adds a WingSpan element.
Note that the Animal element declared within the Animals element is of
the abstract type Animal.
Let’s now look at an instance document of this schema:
Code Sample: ReusingComponents/Demos/Animals.xml
<?xml version="1.0" encoding="UTF-8"?>
<Animals xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="Animals.xsd">
<Animal xsi:type="Dog">
<Name>Rover</Name>
<Weight units="pounds">80</Weight>
</Animal>
<Animal xsi:type="Bird">
<Name>Tweetie</Name>
<Weight units="grams">15</Weight>
<WingSpan units="cm">20</WingSpan>
</Animal>
</Animals>
<xs:unique name="ArtistKey">
<xs:selector xpath="Artists/Artist"/>
<xs:field xpath="@aID"/>
</xs:unique>
</xs:element>
</xs:schema>
The Artist element has an aID attribute, which we would like to be able
to use to uniquely identify the artist. The XML schema xs:unique element is
used to enforce this. It takes two children:
●● xs:selector—takes an xpath attribute, which holds an XPath 1.0 expression
referencing the elements affected by this constraint.
●● xs:field—takes an xpath attribute, which holds an XPath 1.0 expression
specifying the part of the selected elements that must be unique.
In the example above, the selector XPath identifies all Artist elements
that are children of an Artists element. The field XPath identifies the aID
attribute as the part of the Artist element that must be unique.
In the XML instance below, each Artist must have a unique aID attribute.
Try making them the same and validating.
Code Sample: SchemaKeys/Demos/Unique.xml
<?xml version="1.0"?>
<Song xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="Unique.xsd">
<Title Type="duet">The Girl Is Mine</Title>
<Year>1983</Year>
<Length>Medium</Length>
<Artists>
<Artist aID="MJ">Michael Jackson</Artist>
<Artist aID="PM">Paul McCartney</Artist>
</Artists>
---- Code Omitted ----
</Song>
KEYS
The XML schema also provides a mechanism for keys and key references—
that is, for creating a relationship between elements through the value of an
attribute or contained element. The xs:key and xs:keyref elements are used to
create such a relationship.
<xs:field xpath="@Artist"/>
</xs:keyref>
</xs:element>
</xs:schema>
Like the xs:unique element, the xs:key and xs:keyref elements each contain
xs:selector and xs:field child elements.
The xs:key element is used to identify the elements being referenced by
the elements specified by the xs:keyref element.
In the example above, the Artist attribute of the Stanza element must
point to an Artist element’s aID attribute, which must be unique.
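As a sketch only, a key and a key reference for this relationship could be declared roughly as follows, inside the Song element declaration and after its complex type. The constraint names ArtistKey and ArtistKeyRef and the selector paths are assumptions inferred from the instance document, not a copy of Keys.xsd.

```xml
<!-- Assumed names and paths; placed after the xs:complexType of the Song element -->
<xs:key name="ArtistKey">
  <!-- Every Artist under Artists must have an aID, and the aID values must be unique -->
  <xs:selector xpath="Artists/Artist"/>
  <xs:field xpath="@aID"/>
</xs:key>
<xs:keyref name="ArtistKeyRef" refer="ArtistKey">
  <!-- Each Stanza's Artist attribute must match one of the key values above -->
  <xs:selector xpath="Lyrics/Stanza"/>
  <xs:field xpath="@Artist"/>
</xs:keyref>
```

The refer attribute of xs:keyref names the key (or unique constraint) whose values are being referenced.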
In the following XML instance, each Artist must have a unique aID attribute
and each Stanza element must have an Artist attribute with the same
value as one of the Artist’s aID attributes. Try changing the value of a Stanza’s
Artist attribute to something arbitrary and validating.
Code Sample: SchemaKeys/Demos/Keys.xml
<?xml version="1.0"?>
<Song xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="Keys.xsd">
<Title Type="duet">The Girl Is Mine</Title>
<Year>1983</Year>
<Length>Medium</Length>
<Artists>
<Artist aID="MJ">Michael Jackson</Artist>
<Artist aID="PM">Paul McCartney</Artist>
</Artists>
<Lyrics>
<Stanza Artist="MJ">
<Line>Every night she walks right in my dreams</Line>
<Line>Every night she walks right in my dreams</Line>
---- Code Omitted ----
</Stanza>
<Stanza Artist="PM">
<Line>I don't understand the way you think</Line>
<Line>Saying that she's yours not mine</Line>
---- Code Omitted ----
</Stanza>
<Stanza Artist="MJ">
<Line>I know she'll tell you I'm the one for her</Line>
ANNOTATING A SCHEMA
The xs:annotation element is used to document a schema. It can take two
elements, xs:documentation and xs:appinfo, which are used to provide
human-readable and machine-readable notes, respectively.
The xs:annotation element can go at the beginning of most schema
constructions, including xs:schema, xs:element, xs:attribute, xs:simpleType,
xs:complexType, xs:group, and xs:attributeGroup.
Both the xs:documentation and xs:appinfo elements can contain any content,
including undeclared elements and attributes. This allows the schema
author to insert elements (e.g., HTML elements) to structure or format the
documentation.
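As a minimal sketch of the two child elements side by side, the fragment below annotates a hypothetical Summary element. The element name, the source URI, and the hint element inside xs:appinfo are invented for illustration.

```xml
<!-- Hypothetical declaration showing both kinds of annotation -->
<xs:element name="Summary" type="xs:string">
  <xs:annotation>
    <!-- Human-readable note; xml:lang is optional -->
    <xs:documentation xml:lang="en">
      A one-paragraph summary of the book.
    </xs:documentation>
    <!-- Machine-readable note; the content is not validated against this schema -->
    <xs:appinfo source="http://www.example.com/hints">
      <hint>render-as-paragraph</hint>
    </xs:appinfo>
  </xs:annotation>
</xs:element>
```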
Code Sample: AnnotatingXMLSchemas/Demos/Book.xsd
<xs:attributeGroup name="AttGroupPerson">
<xs:annotation>
<xs:documentation>
This attribute group can be used with any element that represents a person.
It provides for Title (?) and BirthYear (?).
</xs:documentation>
</xs:annotation>
<xs:attribute name="Title">
<xs:annotation>
<xs:documentation>
This optional attribute provides the title of the person in question. There
is no default value.
</xs:documentation>
</xs:annotation>
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="Mr."/>
<xs:enumeration value="Ms."/>
<xs:enumeration value="Dr."/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="BirthYear" type="xs:gYear"/>
</xs:attributeGroup>
<xs:element name="Book">
<xs:annotation>
<xs:documentation>
<xs:enumeration value="Mystery"/>
<xs:enumeration value="Humor"/>
<xs:enumeration value="Horror"/>
<xs:enumeration value="Childrens"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
<xs:attributeGroup ref="AttGroupPerson"/>
</xs:complexType>
</xs:element>
<xs:element name="Illustrator" minOccurs="0">
<xs:annotation>
<xs:documentation>
</xs:sequence>
</xs:complexType>
</xs:element>
If you use the method described above, only the “employee” element can
use the specified complex type. Note that the child elements, “firstname”
and “lastname,” are surrounded by the <sequence> indicator. This means
that the child elements must appear in the same order as they are declared.
You will learn more about indicators in the XSD Indicators chapter.
2. The “employee” element can have a type attribute that refers to the name
of the complex type to use:
<xs:element name="employee" type="personinfo"/>
<xs:complexType name="personinfo">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
If you use the method described above, several elements can refer to the
same complex type, like this:
<xs:element name="employee" type="personinfo"/>
<xs:element name="student" type="personinfo"/>
<xs:element name="member" type="personinfo"/>
<xs:complexType name="personinfo">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="fullpersoninfo">
<xs:complexContent>
<xs:extension base="personinfo">
<xs:sequence>
<xs:element name="address" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="country" type="xs:string"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
The “product” element above has no content at all. To define a type with
no content, we must define a type that allows elements in its content, but we
do not actually declare any elements, like this:
<xs:element name="product">
<xs:complexType>
<xs:complexContent>
<!-- The base of a complexContent restriction must be a complex type, so xs:anyType is used -->
<xs:restriction base="xs:anyType">
<xs:attribute name="prodid" type="xs:positiveInteger"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:element>
Or you can give the complexType element a name, and let the “product”
element have a type attribute that refers to the name of the complexType (if
you use this method, several elements can refer to the same complex type):
<xs:element name="product" type="prodtype"/>
<xs:complexType name="prodtype">
<xs:attribute name="prodid" type="xs:positiveInteger"/>
</xs:complexType>
Notice the <xs:sequence> tag. It means that the elements defined (“firstname”
and “lastname”) must appear in that order inside a “person” element.
Or you can give the complexType element a name, and let the “person”
element have a type attribute that refers to the name of the complexType (if
you use this method, several elements can refer to the same complex type):
<xs:element name="person" type="persontype"/>
<xs:complexType name="persontype">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
OR
<xs:element name="somename">
<xs:complexType>
<xs:simpleContent>
<xs:restriction base="basetype">
....
....
</xs:restriction>
</xs:simpleContent>
</xs:complexType>
</xs:element>
We could also give the complexType element a name, and let the “shoesize”
element have a type attribute that refers to the name of the complexType
(if you use this method, several elements can refer to the same complex type):
<xs:element name="shoesize" type="shoetype"/>
<xs:complexType name="shoetype">
<xs:simpleContent>
<xs:extension base="xs:integer">
<xs:attribute name="country" type="xs:string" />
</xs:extension>
</xs:simpleContent>
</xs:complexType>
that the elements defined (name, orderid, and shipdate) must appear in that
order inside a “letter” element.
We could also give the complexType element a name, and let the “letter”
element have a type attribute that refers to the name of the complexType (if
you use this method, several elements can refer to the same complex type):
<xs:element name="letter" type="lettertype"/>
<xs:complexType name="lettertype" mixed="true">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="orderid" type="xs:positiveInteger"/>
<xs:element name="shipdate" type="xs:date"/>
</xs:sequence>
</xs:complexType>
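An instance that fits the lettertype declaration might look like the sketch below. The customer name, order number, and date are placeholder values chosen to match the declared types.

```xml
<!-- Illustrative instance: text may surround the children, but their order is fixed by xs:sequence -->
<letter>
  Dear Mr. <name>Ram Mandal</name>.
  Your order <orderid>1032</orderid>
  will be shipped on <shipdate>2001-07-13</shipdate>.
</letter>
```

Swapping orderid and shipdate, or leaving one of them out, would be expected to fail validation, because mixed="true" relaxes only the rule about character text, not the content model.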
XSD INDICATORS
We can control HOW elements are to be used in documents with indicators.
There are seven indicators.
Order indicators:
●● All
●● Choice
●● Sequence
Occurrence indicators:
●● maxOccurs
●● minOccurs
Group indicators:
●● Group name
●● attributeGroup name
Order indicators
Order indicators are used to define the order of the elements.
All Indicator
The <all> indicator specifies that the child elements can appear in any
order, and that, by default, each child element must occur exactly once:
<xs:element name="person">
<xs:complexType>
<xs:all>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:all>
</xs:complexType>
</xs:element>
When using the <all> indicator you can set the <minOccurs> indicator to
0 or 1 and the <maxOccurs> indicator can only be set to 1 (the <minOccurs>
and <maxOccurs> are described later).
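As a sketch of that rule, the variation below makes one child of an <all> group optional. The middlename element is a hypothetical addition to the example above.

```xml
<xs:element name="person">
  <xs:complexType>
    <xs:all>
      <xs:element name="firstname" type="xs:string"/>
      <!-- Hypothetical optional child: may be absent, may not repeat -->
      <xs:element name="middlename" type="xs:string" minOccurs="0"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:all>
  </xs:complexType>
</xs:element>
```

Under this declaration the three children may appear in any order, and a person with only firstname and lastname is still valid.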
Choice Indicator
The <choice> indicator specifies that either one child element or another can
occur:
<xs:element name="person">
<xs:complexType>
<xs:choice>
<xs:element name="employee" type="employee"/>
<xs:element name="member" type="member"/>
</xs:choice>
</xs:complexType>
</xs:element>
Sequence Indicator
The <sequence> indicator specifies that the child elements must appear in a
specific order:
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Occurrence Indicators
Occurrence indicators are used to define how often an element can occur.
For all “Order” and “Group” indicators (any, all, choice, sequence, group
name, and group reference) the default value for maxOccurs and minOccurs is 1.
maxOccurs Indicator
The <maxOccurs> indicator specifies the maximum number of times an element
can occur:
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="full_name" type="xs:string"/>
<xs:element name="child_name" type="xs:string" maxOccurs="10"/>
</xs:sequence>
</xs:complexType>
</xs:element>
The example above indicates that the “child_name” element can occur a
minimum of one time (the default value for minOccurs is 1) and a maximum
of ten times in the “person” element.
minOccurs Indicator
The <minOccurs> indicator specifies the minimum number of times an element
can occur:
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="full_name" type="xs:string"/>
<xs:element name="child_name" type="xs:string"
maxOccurs="10" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:element>
The example above indicates that the “child_name” element can occur a
minimum of zero times and a maximum of ten times in the “person” element.
To allow an element to appear an unlimited number of times, set
maxOccurs="unbounded".
A working example:
An XML file called “Myfamily.xml” is as follows:
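A file consistent with the description that follows might look like this sketch. The names are placeholders, and the use of xsi:noNamespaceSchemaLocation is an assumption based on the schema having no target namespace.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Illustrative content only: three person elements, each with at most five child_name elements -->
<persons xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="family.xsd">
  <person>
    <full_name>Hege Refsnes</full_name>
    <child_name>Cecilie</child_name>
  </person>
  <person>
    <full_name>Tove Refsnes</full_name>
    <child_name>Hege</child_name>
    <child_name>Stale</child_name>
  </person>
  <person>
    <full_name>Stale Refsnes</full_name>
  </person>
</persons>
```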
The XML file above contains a root element named “persons.” Inside
this root element, we have defined three “person” elements. Each “person”
element must contain a “full_name” element and it can contain up to five
“child_name” elements.
Here is the schema file “family.xsd.”
<?xml version="1.0" encoding="ISO-8859-1"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xs:element name="persons">
<xs:complexType>
<xs:sequence>
<xs:element name="person" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="full_name" type="xs:string"/>
<xs:element name="child_name" type="xs:string"
minOccurs="0" maxOccurs="5"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Group Indicators
Group indicators are used to define related sets of elements.
Element Groups
Element groups are defined with the group declaration, like this:
<xs:group name="groupname">
...
</xs:group>
You must define an all, choice, or sequence element inside the group
declaration. The following example defines a group named “persongroup” that
contains elements that must occur in an exact sequence:
<xs:group name="persongroup">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
<xs:element name="birthday" type="xs:date"/>
</xs:sequence>
</xs:group>
After you have defined a group, you can reference it in another definition,
like this:
<xs:group name="persongroup">
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
<xs:element name="birthday" type="xs:date"/>
</xs:sequence>
</xs:group>
<xs:element name="person" type="personinfo"/>
<xs:complexType name="personinfo">
<xs:sequence>
<xs:group ref="persongroup"/>
<xs:element name="country" type="xs:string"/>
</xs:sequence>
</xs:complexType>
Attribute Groups
Attribute groups are defined with the attributeGroup declaration, like this:
<xs:attributeGroup name="groupname">
...
</xs:attributeGroup>
After you have defined an attribute group, you can reference it in another
definition, like this:
<xs:attributeGroup name="personattrgroup">
<xs:attribute name="firstname" type="xs:string"/>
<xs:attribute name="lastname" type="xs:string"/>
<xs:attribute name="birthday" type="xs:date"/>
</xs:attributeGroup>
<xs:element name="person">
<xs:complexType>
<xs:attributeGroup ref="personattrgroup"/>
</xs:complexType>
</xs:element>
XSD The <any> Element
The <any> element enables us to extend the XML document with elements
not specified by the schema.
The <any> Element
<xs:element name="person">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
<xs:any minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:element>
The XML file below (called “Myfamily.xml”) uses components from two
different schemas, “family.xsd” and “children.xsd”:
<?xml version="1.0" encoding="ISO-8859-1"?>
<persons xmlns="http://www.microsoft.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com family.xsd
http://www.abc.com children.xsd">
<person>
<firstname>Hege</firstname>
<lastname>Refsnes</lastname>
<children>
<childname>Cecilie</childname>
</children>
</person>
<person>
<firstname>Stale</firstname>
<lastname>Refsnes</lastname>
</person>
</persons>
The XML file above is valid because the schema “family.xsd” allows
us to extend the “person” element with an optional element after the
“lastname” element.
The <any> and <anyAttribute> elements are used to make EXTENSIBLE
documents. They allow documents to contain additional elements that
are not declared in the main XML schema.
XSD The <anyAttribute> Element
elementFormDefault="qualified">
<xs:attribute name="gender">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="male|female"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:schema>
The XML file below (called “Myfamily.xml”) uses components from two
different schemas, “family.xsd” and “attribute.xsd”:
<?xml version="1.0" encoding="ISO-8859-1"?>
<persons xmlns="http://www.microsoft.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com family.xsd
http://www.abc.com attribute.xsd">
<person gender="female">
<firstname>Hege</firstname>
<lastname>Refsnes</lastname>
</person>
<person gender="male">
<firstname>Stale</firstname>
<lastname>Refsnes</lastname>
</person>
</persons>
The XML file above is valid because the schema “family.xsd” allows us to
add an attribute to the “person” element.
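A declaration that permits this could look like the sketch below. It assumes the “person” element has the same firstname and lastname children used in the instance; the actual content of family.xsd may differ.

```xml
<!-- Sketch only: xs:anyAttribute must come after the content model and any declared attributes -->
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
    <!-- Allows attributes that this schema does not declare, such as gender -->
    <xs:anyAttribute/>
  </xs:complexType>
</xs:element>
```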
Element Substitution
Let’s say that we have users from two different countries: England and Norway.
We would like the ability to let the user choose whether he or she would like
to use the Norwegian element names or the English element names in the
XML document.
In the schema fragment below, the “name” element is the head element and the
“navn” element is substitutable for “name”:
<xs:element name="name" type="xs:string"/>
<xs:element name="navn" substitutionGroup="name"/>
<xs:complexType name="custinfo">
<xs:sequence>
<xs:element ref="name"/>
</xs:sequence>
</xs:complexType>
<xs:element name="customer" type="custinfo"/>
<xs:element name="kunde" substitutionGroup="customer"/>
A valid XML document (according to the schema above) could look like this:
<customer>
<name>Ram Mandal</name>
</customer>
or like this:
<kunde>
<navn>Ram Mandal</navn>
</kunde>
<xs:sequence>
<xs:element ref="name"/>
</xs:sequence>
</xs:complexType>
<xs:element name="customer" type="custinfo" block="substitution"/>
<xs:element name="kunde" substitutionGroup="customer"/>
A valid XML document (according to the schema above) looks like this:
<customer>
<name>Ram Mandal</name>
</customer>
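By contrast, the Norwegian form would be expected to fail once substitution is blocked. The sketch below assumes that the “name” head element is also declared with block="substitution"; that declaration is not visible in the fragment above.

```xml
<!-- Expected to be rejected under the stated assumption:
     navn may no longer stand in for name inside the custinfo type -->
<kunde>
  <navn>Ram Mandal</navn>
</kunde>
```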
Using SubstitutionGroup
The type of the substitutable elements must be the same as, or derived from,
the type of the head element. If the type of the substitutable element is the
same as the type of the head element, you will not have to specify the type of
the substitutable element.
Note that all elements in the substitutionGroup (the head element and
the substitutable elements) must be declared as global elements, otherwise it
will not work.
Global Elements
Global elements are elements that are immediate children of the “schema”
element. Local elements are elements nested within other elements.
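The difference can be sketched with a small schema. The note, memo, and body names are hypothetical and serve only to show where each kind of declaration sits.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Global: an immediate child of xs:schema, so it can be referenced
       with ref or named in a substitutionGroup -->
  <xs:element name="note" type="xs:string"/>

  <xs:element name="memo">
    <xs:complexType>
      <xs:sequence>
        <!-- Local: declared inside memo's content model and usable only there -->
        <xs:element name="body" type="xs:string"/>
        <!-- Reference to the global note declaration above -->
        <xs:element ref="note"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Here memo and note are global, while body is local and therefore could not serve as a head or member of a substitution group.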
An XML Document
Let’s have a look at this XML document called “shiporder.xml”:
<?xml version="1.0" encoding="ISO-8859-1"?>
<shiporder orderid="889923"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="shiporder.xsd">
<orderperson>Ram Mandal</orderperson>
<shipto>
<name>Kshitij Banzal</name>
<address>20, I.G. Nagar</address>
<city>Indore</city>
<country>India</country>
</shipto>
<item>
<title>World wide web</title>
<note>Special Edition</note>
<quantity>1</quantity>
<price>120.00</price>
</item>
<item>
<title>Summer special</title>
<quantity>1</quantity>
<price>239.00</price>
</item>
</shiporder>
In the schema above, we use the standard namespace (xs), and the URI
associated with this namespace is the schema language definition, which has
the standard value of http://www.w3.org/2001/XMLSchema.
Next, we have to define the “shiporder” element. This element has an
attribute and contains other elements, so we treat it as a complex type.
The child elements of the “shiporder” element are surrounded by an
xs:sequence element that defines an ordered sequence of subelements:
<xs:element name="shiporder">
<xs:complexType>
<xs:sequence>
...
</xs:sequence>
</xs:complexType>
</xs:element>
Next, we have to define two elements that are of the complex type:
“shipto” and “item.” We start by defining the “shipto” element:
<xs:element name="shipto">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="country" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
We can now declare the attribute of the “shiporder” element. Since this
is a required attribute, we specify use=“required.” The attribute declarations
must always come last:
<xs:attribute name="orderid" type="xs:string" use="required"/>
elements, and then point to them through the type attribute of the element.
Here is the third design of the schema file (“shiporder.xsd”):
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="stringtype">
<xs:restriction base="xs:string"/>
</xs:simpleType>
<xs:simpleType name="inttype">
<xs:restriction base="xs:positiveInteger"/>
</xs:simpleType>
<xs:simpleType name="dectype">
<xs:restriction base="xs:decimal"/>
</xs:simpleType>
<xs:simpleType name="orderidtype">
<xs:restriction base="xs:string">
<xs:pattern value="[0-9]{6}"/>
</xs:restriction>
</xs:simpleType>
<xs:complexType name="shiptotype">
<xs:sequence>
<xs:element name="name" type="stringtype"/>
<xs:element name="address" type="stringtype"/>
<xs:element name="city" type="stringtype"/>
<xs:element name="country" type="stringtype"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="itemtype">
<xs:sequence>
<xs:element name="title" type="stringtype"/>
<xs:element name="note" type="stringtype" minOccurs="0"/>
<xs:element name="quantity" type="inttype"/>
<xs:element name="price" type="dectype"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="shipordertype">
<xs:sequence>
<xs:element name="orderperson" type="stringtype"/>
<xs:element name="shipto" type="shiptotype"/>
The restriction element indicates that the datatype is derived from a W3C
XML schema namespace datatype. So, the following fragment means that the
value of the element or attribute must be a string value:
<xs:restriction base="xs:string">
The pattern facet in the “orderidtype” definition indicates that the value of
the element or attribute must be a string of exactly six characters in a row,
and each of those characters must be a digit from 0 to 9.
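The effect of the pattern can be tried outside a schema processor. The following Python sketch is only an approximation: it assumes the pattern [0-9]{6} from the “orderidtype” definition and relies on Python's re module, whose syntax agrees with XML Schema regular expressions for this simple pattern. The function name is_valid_orderid is hypothetical.

```python
import re

# Pattern copied from the orderidtype restriction in the schema.
ORDERID_PATTERN = re.compile(r"[0-9]{6}")


def is_valid_orderid(value: str) -> bool:
    """Return True when the whole value is exactly six digits from 0 to 9."""
    # An XML Schema pattern must match the entire value, so fullmatch is used
    # rather than search or match.
    return ORDERID_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["889923", "88992", "8899234", "88A923"]:
        print(candidate, is_valid_orderid(candidate))
```

Only the first candidate, the order id used in “shiporder.xml”, satisfies the pattern; the others are too short, too long, or contain a letter.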
The XML processor will not modify the value if you use the string
data type.
If you use the normalizedString data type, the XML processor will replace
the tabs with spaces.
If you use the token data type, the XML processor will remove the tabs.
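The three whitespace behaviours of string, normalizedString, and token can be imitated with a few lines of Python. This is a minimal sketch, not a schema processor: the function name apply_whitespace and the mode names (taken from the whiteSpace facet values preserve, replace, and collapse) are chosen here for illustration.

```python
import re


def apply_whitespace(value: str, mode: str) -> str:
    """Imitate the whiteSpace handling of string, normalizedString, and token."""
    if mode == "preserve":      # xs:string: the value is left untouched
        return value
    # replace: every tab, line feed, and carriage return becomes one space
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":       # xs:normalizedString
        return replaced
    if mode == "collapse":      # xs:token: also squeeze runs of spaces and trim
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    sample = " 20,\tI.G.\t\tNagar "
    for mode in ("preserve", "replace", "collapse"):
        print(mode, repr(apply_whitespace(sample, mode)))
```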
Name Description
ENTITIES
ENTITY
ID A string that represents the ID attribute in XML (only used with schema attributes)
IDREF A string that represents the IDREF attribute in XML (only used with schema attributes)
IDREFS
language A string that contains a valid language id
Name A string that contains a valid XML name
NCName
NMTOKEN A string that represents the NMTOKEN attribute in XML (only used with schema attributes)
NMTOKENS
normalizedString A string that does not contain line feeds, carriage returns, or tabs
QName
string A string
token A string that does not contain line feeds, carriage returns, tabs, leading or trailing spaces, or multiple spaces
Time Zones
To specify a time zone, you can either enter a date in UTC time by adding a
“Z” after the date, like this:
<start>2002-09-24Z</start>
or you can specify an offset from UTC time by adding a positive or negative
time after the date, like this:
<start>2002-09-24-06:00</start>
or
<start>2002-09-24+06:00</start>
Time Zones
To specify a time zone, you can either enter a time in UTC time by adding a
“Z” after the time, like this:
<start>09:30:10Z</start>
or you can specify an offset from UTC time by adding a positive or negative
time after the time, like this:
<start>09:30:10-06:00</start>
or
<start>09:30:10+06:00</start>
Time Zones
To specify a time zone, you can either enter a dateTime in UTC time by adding
a “Z” after the time, like this:
<startdate>2002-05-30T09:30:10Z</startdate>
or you can specify an offset from UTC time by adding a positive or negative
time after the time, like this:
<startdate>2002-05-30T09:30:10-06:00</startdate>
or
<startdate>2002-05-30T09:30:10+06:00</startdate>
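To see what the “Z” and the offsets mean in practice, the dateTime values above can be read with Python's standard datetime module. This is a rough sketch that handles only the simple forms shown here, not every legal dateTime value; the helper name parse_xs_datetime is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_xs_datetime(text: str) -> datetime:
    """Parse a dateTime such as 2002-05-30T09:30:10Z or ...-06:00."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so it is rewritten as the equivalent zero offset.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


if __name__ == "__main__":
    for value in ("2002-05-30T09:30:10Z",
                  "2002-05-30T09:30:10-06:00",
                  "2002-05-30T09:30:10+06:00"):
        parsed = parse_xs_datetime(value)
        # Show the same instant expressed in UTC.
        print(value, "->", parsed.astimezone(timezone.utc).isoformat())
```

A negative offset means the local time is behind UTC, so 09:30:10-06:00 is the same instant as 15:30:10Z, while 09:30:10+06:00 corresponds to 03:30:10Z.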
The example above indicates a period of five years, two months, and
10 days.
Or it might look like this:
<period>P5Y2M10DT15H</period>
The example above indicates a period of five years, two months, 10 days,
and 15 hours.
Or it might look like this:
<period>PT15H</period>
Negative Duration
To specify a negative duration, enter a minus sign before the P:
<period>-P10D</period>
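The duration notation can be taken apart with a regular expression. The following Python sketch is an illustration under simplifying assumptions (whole numbers except for seconds, no further validation); the names DURATION_RE and parse_duration are hypothetical.

```python
import re

# -P nY nM nD T nH nM nS: every component is optional, "T" introduces the
# time part, and a leading minus sign marks a negative duration.
DURATION_RE = re.compile(
    r"(?P<sign>-)?P"
    r"(?:(?P<years>\d+)Y)?"
    r"(?:(?P<months>\d+)M)?"
    r"(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_duration(text: str) -> dict:
    """Split a duration such as P5Y2M10DT15H into its components."""
    match = DURATION_RE.fullmatch(text)
    # A bare "P" or a dangling "T" carries no component and is rejected.
    if match is None or text.endswith(("P", "T")):
        raise ValueError(f"not a valid duration: {text!r}")
    result = {"negative": match.group("sign") == "-"}
    for name in ("years", "months", "days", "hours", "minutes"):
        raw = match.group(name)
        result[name] = int(raw) if raw else 0
    raw_seconds = match.group("seconds")
    result["seconds"] = float(raw_seconds) if raw_seconds else 0.0
    return result


if __name__ == "__main__":
    for value in ("P5Y2M10D", "P5Y2M10DT15H", "PT15H", "-P10D"):
        print(value, parse_duration(value))
```

Note that “M” means months before the “T” and minutes after it, which is why the time designator is required whenever hours, minutes, or seconds are given.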
Name Description
byte A signed 8-bit integer
decimal A decimal value
int A signed 32-bit integer
integer An integer value
long A signed 64-bit integer
negativeInteger An integer containing only negative values (..,−2,−1)
nonNegativeInteger An integer containing only non-negative values (0,1,2,..)
nonPositiveInteger An integer containing only non-positive values (..,−2,−1,0)
positiveInteger An integer containing only positive values (1,2,..)
short A signed 16-bit integer
unsignedLong An unsigned 64-bit integer
unsignedInt An unsigned 32-bit integer
unsignedShort An unsigned 16-bit integer
unsignedByte An unsigned 8-bit integer
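The bit widths in the table translate directly into value ranges. The following Python sketch lists those ranges and checks a value against them; the names RANGES and in_range are hypothetical, and None stands for “no limit”.

```python
# (lowest, highest) legal value for each integer type; None means unbounded.
RANGES = {
    "byte": (-2**7, 2**7 - 1),
    "short": (-2**15, 2**15 - 1),
    "int": (-2**31, 2**31 - 1),
    "long": (-2**63, 2**63 - 1),
    "unsignedByte": (0, 2**8 - 1),
    "unsignedShort": (0, 2**16 - 1),
    "unsignedInt": (0, 2**32 - 1),
    "unsignedLong": (0, 2**64 - 1),
    "integer": (None, None),
    "negativeInteger": (None, -1),
    "nonNegativeInteger": (0, None),
    "nonPositiveInteger": (None, 0),
    "positiveInteger": (1, None),
}


def in_range(type_name: str, value: int) -> bool:
    """Return True when value lies within the range of the named type."""
    low, high = RANGES[type_name]
    return (low is None or value >= low) and (high is None or value <= high)


if __name__ == "__main__":
    print(in_range("byte", 127), in_range("byte", 128))
    print(in_range("positiveInteger", 0), in_range("nonNegativeInteger", 0))
```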
Legal values for boolean are true, false, 1 (which indicates true), and 0
(which indicates false).
Name Description
float
hexBinary
NOTATION
QName
XML EDITORS
If you are serious about XML, you will benefit from using a professional XML
editor.
XML is Text-based
XML is a text-based markup language. One great thing about XML is that
XML files can be created and edited using a simple text-editor like Notepad.
However, when you start working with XML, you will soon find that it is better
to edit XML documents using a professional XML editor.
Many Web developers use Notepad to edit both HTML and XML doc-
uments because Notepad is included with the most common OS and it is
simple to use.
But, if you use Notepad for XML editing, you will soon run into problems.
Notepad does not know that you are writing XML, so it will not be able to
assist you.
XML is an important technology, and development projects use XML-
based technologies like
8
XSL BASICS
INTRODUCTION TO XSL
XSL stands for EXtensible Stylesheet Language. It started with XSL and
ended up with XSLT, XPath, and XSL-FO. The World Wide Web Consortium
(W3C) started to develop XSL because there was a need for an XML-based
Stylesheet Language. XML does not use predefined tags (we can use any tag
names we like), and therefore the meaning of each tag is not well understood.
A <table> tag could mean an HTML table, a piece of furniture, or some-
thing else—and a browser does not know how to display it. XSL describes how
the XML document should be displayed. XSL consists of three parts:
●● XSLT—a language for transforming XML documents
●● XPath—a language for navigating in XML documents
●● XSL-FO—a language for formatting XML documents
With XSL, you can freely modify any of the source text. Stylesheet 1 and
Stylesheet 2 produce different output from the same source file.
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="/">
<H1><xsl:value-of select="//title"/></H1>
<H2><xsl:value-of select="//author"/></H2>
</xsl:template>
</xsl:stylesheet>
XSL Stylesheet 2
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="/">
<H2><xsl:value-of select="//author"/></H2>
<H1><xsl:value-of select="//title"/></H1>
</xsl:template>
</xsl:stylesheet>
XSL SYNTAX
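Every XSL stylesheet should start with the xsl:stylesheet element. The
xmlns:xsl attribute binds the xsl prefix to the XSLT namespace, which marks
the elements as XSLT instructions. (A strictly conforming XSLT 1.0 stylesheet
also carries a version="1.0" attribute on this element.) This example shows
the simplest possible stylesheet. Because it contains no templates, the
default processing is used.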
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
</xsl:stylesheet>
AN XSL PROCESSOR
The XSL processor parses the XML source and tries to find a matching
template rule. If it finds one, the instructions inside the matching template
are evaluated.
The contents of the original elements can be recovered from the source in
two basic ways. Stylesheet 1 uses the xsl:value-of construct. In this case,
the contents of the element are used without any further processing. The
xsl:apply-templates construct in Stylesheet 2 is different: the processor
further processes the selected elements by applying the templates that match
them.
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="employee">
<B><xsl:value-of select="."/></B>
</xsl:template>
<xsl:template match="surname">
<i><xsl:value-of select="."/></i>
</xsl:template>
</xsl:stylesheet>
XSL Stylesheet 2
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="employee">
<B><xsl:apply-templates select="firstName"/></B>
<B><xsl:apply-templates select="surname"/></B>
</xsl:template>
<xsl:template match="surname">
<i> <xsl:value-of select="."/></i>
</xsl:template>
</xsl:stylesheet>
XML Source
<?xml version="1.0"?>
<xslTutorial >
<bold>Hello, world.</bold>
<red>I am </red>
<italic>fine.</italic>
</xslTutorial>
HTML Output 1
<P>
<B>Hello, world.</B></P>
<P style="color:red">I am </P>
<P>
<i>fine.</i></P>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="bold">
<P><B><xsl:value-of select="."/></B></P>
</xsl:template>
<xsl:template match="red">
<P style="color:red"><xsl:value-of select="."/></P>
</xsl:template>
<xsl:template match="italic">
<P><i><xsl:value-of select="."/></i></P>
</xsl:template>
</xsl:stylesheet>
LOCATION PATHS
The parts of the XML document to which a template should be applied are
determined by location paths. The required syntax is defined in the XPath
specification. Simple cases look similar to file system addressing.
XML Source
<?xml version="1.0"?>
<xslTutorial >
<AAA id='a1' pos='start'>
<BBB id='b1'/>
<BBB id='b2'/>
</AAA>
<AAA id='a2'>
<BBB id='b3'/>
<BBB id='b4'/>
<CCC id='c1'>
<DDD id='d1'/>
</CCC>
<BBB id='b5'>
<CCC id='c2'/>
</BBB>
</AAA>
</xslTutorial>
HTML Output 1
<DIV style="color:purple">BBB id=b1</DIV>
<DIV style="color:purple">BBB id=b2</DIV>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="BBB">
<DIV style="color:purple">
<xsl:value-of select="name()"/>
<xsl:text> id=</xsl:text>
<xsl:value-of select="@id"/>
</DIV>
</xsl:template>
<xsl:template match="/xslTutorial/AAA/CCC/DDD">
<DIV style="color:red">
<xsl:value-of select="name()"/>
<xsl:text> id=</xsl:text>
<xsl:value-of select="@id"/>
</DIV>
</xsl:template>
</xsl:stylesheet>
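The two kinds of path used in this stylesheet, a bare element name and a full path from the document element, can also be explored with Python's xml.etree.ElementTree module. This is only an approximation: ElementTree supports a limited subset of XPath and evaluates paths relative to an element, so “.//BBB” and “AAA/CCC/DDD” (relative to xslTutorial) stand in for the patterns “BBB” and “/xslTutorial/AAA/CCC/DDD”.

```python
import xml.etree.ElementTree as ET

# The XML source of this example, repeated so the sketch is self-contained.
SOURCE = """<xslTutorial>
  <AAA id='a1' pos='start'>
    <BBB id='b1'/>
    <BBB id='b2'/>
  </AAA>
  <AAA id='a2'>
    <BBB id='b3'/>
    <BBB id='b4'/>
    <CCC id='c1'>
      <DDD id='d1'/>
    </CCC>
    <BBB id='b5'>
      <CCC id='c2'/>
    </BBB>
  </AAA>
</xslTutorial>"""

root = ET.fromstring(SOURCE)  # root is the xslTutorial element

# Every BBB element at any depth, in document order.
all_bbb = [element.get("id") for element in root.findall(".//BBB")]

# Only DDD elements reached through the exact chain AAA/CCC/DDD.
ddd_by_path = [element.get("id") for element in root.findall("AAA/CCC/DDD")]

if __name__ == "__main__":
    print(all_bbb)
    print(ddd_by_path)
```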
Processing always starts with the template that matches “/”. This pattern
matches the root node, whose only element child is the document element, in
our case xslTutorial. Many stylesheets do not contain this template
explicitly. When no explicit template exists, the implicit (built-in)
template is used. It processes the children of the current node, including
the text nodes.
Wildcard
A template can match any one of several location paths; the individual paths
are separated with “|” (see Stylesheet 1). The wildcard * matches any
element. Compare Stylesheet 1 with Stylesheet 2.
XML Source
<?xml version="1.0"?>
<xslTutorial >
<employee>
<firstName>Joe</firstName>
<surname>Smith</surname>
</employee>
</xslTutorial>
HTML Output 1
<DIV>[template: firstName outputs Joe ]</DIV>
<DIV>[template: surname outputs Smith ]</DIV>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="firstName|surname">
<DIV><xsl:text> [template: </xsl:text>
<xsl:value-of select="name()"/>
<xsl:text> outputs </xsl:text>
<xsl:apply-templates/>
<xsl:text> ]</xsl:text> </DIV>
</xsl:template>
</xsl:stylesheet>
XSL Stylesheet 2
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="*">
<DIV><xsl:text> [template: </xsl:text>
<xsl:value-of select="name()"/>
<xsl:text> outputs </xsl:text>
<xsl:apply-templates/>
<xsl:text> ]</xsl:text> </DIV>
</xsl:template>
</xsl:stylesheet>
XML Source
<?xml version="1.0"?>
<xslTutorial >
<AAA id='a1' pos='start'>
<BBB id='b1'/>
<BBB id='b2'/>
</AAA>
<AAA id='a2'>
<BBB id='b3'/>
<BBB id='b4'/>
<CCC id='c1'>
<CCC id='c2'/>
</CCC>
<BBB id='b5'>
<CCC id='c3'/>
</BBB>
</AAA>
</xslTutorial>
HTML Output 1
<DIV style="color:red">CCC id=c1</DIV>
<DIV style="color:red">CCC id=c2</DIV>
<DIV style="color:red">CCC id=c3</DIV>
<DIV style="color:blue">CCC id=c1</DIV>
<DIV style="color:blue">CCC id=c2</DIV>
<DIV style="color:blue">CCC id=c3</DIV>
<DIV style="color:purple">CCC id=c1</DIV>
<DIV style="color:purple">CCC id=c2</DIV>
<DIV style="color:purple">CCC id=c3</DIV>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<xsl:apply-templates select="//CCC" mode="red"/>
<xsl:apply-templates select="//CCC" mode="blue"/>
<xsl:apply-templates select="//CCC"/>
</xsl:template>
<xsl:template match="CCC" mode="red">
<DIV style="color:red">
<xsl:value-of select="name()"/>
<xsl:text> id=</xsl:text>
<xsl:value-of select="@id"/>
</DIV>
</xsl:template>
<xsl:template match="CCC" mode="blue">
<DIV style="color:blue">
<xsl:value-of select="name()"/>
<xsl:text> id=</xsl:text>
<xsl:value-of select="@id"/>
</DIV>
</xsl:template>
<xsl:template match="CCC">
<DIV style="color:purple">
<xsl:value-of select="name()"/>
<xsl:text> id=</xsl:text>
<xsl:value-of select="@id"/>
</DIV>
</xsl:template>
</xsl:stylesheet>
XSL Stylesheet 2
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<xsl:apply-templates select="//CCC" mode="red"/>
<xsl:apply-templates select="//CCC" mode="yellow"/>
</xsl:template>
<xsl:template match="CCC" mode="red">
<DIV style="color:red">
<xsl:value-of select="name()"/>
<xsl:text> id=</xsl:text>
<xsl:value-of select="@id"/>
</DIV>
</xsl:template>
<xsl:template match="CCC">
<DIV style="color:purple">
<xsl:value-of select="name()"/>
<xsl:text> id=</xsl:text>
<xsl:value-of select="@id"/>
</DIV>
</xsl:template>
</xsl:stylesheet>
TEMPLATE ORDERING
Very often, several templates match the same element in the XML source. The
processor must therefore decide which one to use. Templates are ordered
according to their priority, which can be specified with the priority
attribute. If a template does not contain this attribute, its priority is
calculated according to several rules.
XSL Attributes
Attributes can be accessed in much the same way as elements. Notice the @ in
front of the attribute name.
XML Source
<?xml version="1.0"?>
<xslTutorial >
<dog name='Joe'>
<data weight='18 kg' color="black"/>
</dog>
</xslTutorial>
HTML Output 1
<HTML>
<HEAD> </HEAD>
<BODY>
<P>
<B>Dog: </B>Joe</P>
<P>
<B>Color: </B>black</P>
</BODY>
</HTML>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="dog">
<P><B><xsl:text> Dog: </xsl:text> </B>
<xsl:value-of select="@name"/></P>
<P><B><xsl:text> Color: </xsl:text> </B>
<xsl:value-of select="data/@color"/></P>
</xsl:template>
</xsl:stylesheet>
You can process an attribute in the same way as an element. You can also
select the elements that contain or do not contain a given attribute.
HTML Output 2
<HTML>
<HEAD> </HEAD>
<BODY>
<P>Car: a005</P>
</BODY>
</HTML>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="car[@checked]">
<P><xsl:text> Car: </xsl:text>
<xsl:value-of select="@id"/></P>
</xsl:template>
</xsl:stylesheet>
XSL Stylesheet 2
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="car[not(@checked)]">
<P><xsl:text> Car: </xsl:text>
<xsl:value-of select="@id"/></P>
</xsl:template>
</xsl:stylesheet>
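The predicates [@checked] and [not(@checked)] can be imitated in Python. In this sketch the XML source is an assumption made for illustration only, since the source file for this example is not reproduced here. ElementTree understands the [@checked] predicate but not the not() function, so the second selection is written as an ordinary test on the attribute dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document, invented for this illustration.
SOURCE = """<xslTutorial>
  <car id="a234" checked="yes"/>
  <car id="a111" checked="yes"/>
  <car id="a005"/>
</xslTutorial>"""

root = ET.fromstring(SOURCE)

# Equivalent of match="car[@checked]": cars that carry the attribute.
checked = [car.get("id") for car in root.findall("car[@checked]")]

# Equivalent of match="car[not(@checked)]": cars without the attribute.
unchecked = [car.get("id") for car in root.findall("car")
             if "checked" not in car.attrib]

if __name__ == "__main__":
    print("checked:", checked)
    print("not checked:", unchecked)
```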
AXES
Axes play a very important role in XSL. Most of the axes are used in the
example given below.
XML Source
<?xml version="1.0"?>
<xslTutorial >
<doc>
<ancprec>
<p>Preceeding Ancestor. <br/></p>
</ancprec>
<gf>
<p>Ancestor. <br/></p>
<pprec choice="a">
<p>Preceeding Parent.<br/> </p>
</pprec>
<par>
<p>Parent. <br/></p>
<sibprec>
<p>Preceeding sibling.<br/> </p>
</sibprec>
<me id="id001">
<p>Me.<br/> </p>
<!-- Comment after Me -->
<chprec >
<p>Preceeding child.<br/> </p >
</chprec>
<child idref="id001">
<p>Child. <br/></p>
<?pi Processing Instruction ?>
<dprec>
<p>preceeding Descendant.<br/> </p>
</dprec>
<desc>
<p>Descendant.<br/> </p>
</desc>
<dfoll>
<p>Following Descendant.<br/> </p>
</dfoll>
</child>
<chfoll>
<p>following child.<br/> </p>
</chfoll>
</me>
<sibfoll>
<p>Following Sibling.<br/> </p>
</sibfoll>
</par>
<pfoll>
<p>Following Parent.<br/> </p>
</pfoll>
</gf>
<ancfoll>
<p>following Ancestor.<br/></p>
</ancfoll>
</doc>
</xslTutorial>
HTML Output 1
<HTML>
<HEAD></HEAD>
<BODY>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Document</title> </head>
<body>
<H2>Following Axis</H2>
<b>Following Sibling.
<br> Following Parent.
<br> following Ancestor.
<br> </b>
<H2>Descendant or Self Axis</H2>
<b>Me.
<br> Preceeding child.
<br> Child.
<br>preceeding Descendant.
<br> Descendant.
<br> Following Descendant.
<br> following child.
<br> </b>
<H2>Descendant Axis</H2>
<b>Preceeding child.
<br> Child.
<br>preceeding Descendant.
<br> Descendant.
<br> Following Descendant.
<br> following child.
<br> </b>
<H2>Self Axis</H2>
<b>Me.
<br> </b>
<H2>Child Axis</H2>
<b>Preceeding child.
<br> Child.
<br>following child.
<br> </b>
<H2>Following Axis</H2>
<p>
<b>Following Sibling.
<br> Following Parent.
<br> following Ancestor.
<br> </b>
<br>
<i>Note the lack of ancestors here?
<br>Learned anything about document order yet?</i> </p>
<H2>Following Sibling Axis</H2>
<b> Following Sibling.
<br> </b>
<H2>Attribute Axis</H2>
<b>id001</b>
<H2>Parent Axis</H2>
<b>Parent.
<br> </b>
<H2>Ancestor or Self Axis</H2>
<b>Ancestor.
<br>Parent.
<br>Me.
<br> </b>
<H2>Ancestor Axis</H2>
<b>Ancestor.
<br>Parent.
<br> </b>
<H2>Preceding Sibling Axis</H2>
<b>Preceeding sibling.
<br> </b>
<H2>Preceeding Axis</H2>
<b>
<i>Not Implemented in XT 22 09 99</i></b>
<H2>Namespace Axis</H2>
<b>
<i>Not Implemented in XT 22 09 99</i></b>
</body>
</html>
</BODY>
</HTML>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="/">
<!-- Note how the initial context node is reduced by the apply-templates; this
stops the “leaking” of content when all we want is a subset of the whole in the
result tree. -->
<xsl:apply-templates select="//me"/>
</xsl:template>
<xsl:template match="br">
<br />
</xsl:template>
<xsl:template match="me" priority="10">
<html>
<head>
<title> <xsl:text> Document</xsl:text> </title>
</head>
<body>
<H2>Following Axis</H2>
<b><xsl:apply-templates select="following::*/p"/></b>
<H2>Descendant or Self Axis</H2>
<b><xsl:apply-templates select="descendant-or-self::*/p"/></b>
<H2>Descendant Axis</H2>
<b><xsl:apply-templates select="descendant::*/p"/></b>
<H2>Self Axis</H2>
<b><xsl:apply-templates select="self::*/p"/></b>
<H2>Child Axis</H2>
<b><xsl:apply-templates select="child::*/p"/></b>
<H2>Following Axis</H2>
<p><b><xsl:apply-templates select="following::*/p"/></b>
<br /><i>Note the lack of ancestors here? <br />Learned anything
about document order yet?</i></p>
<H2>Following Sibling Axis</H2>
<b><xsl:apply-templates select="following-sibling::*"/></b>
<H2>Attribute Axis</H2>
<b><xsl:value-of select="attribute::id"/></b>
<H2>Parent Axis</H2>
<b><xsl:apply-templates select="parent::*/p"/></b>
<H2>Ancestor or Self Axis</H2>
<b><xsl:apply-templates select="ancestor-or-self::*/p"/></b>
<H2>Ancestor Axis</H2>
<b><xsl:apply-templates select="ancestor::*/p"/></b>
<H2>Preceding Sibling Axis</H2>
<b><xsl:apply-templates select="preceding-sibling::*/p"/></b>
<H2>Preceeding Axis</H2>
<b><i>Not Implemented in XT 22 09 99</i></b>
<H2>Namespace Axis</H2>
<b><i>Not Implemented in XT 22 09 99</i></b>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
The child:: axis can be omitted from a location step because it is the
default axis. The attribute:: axis can be abbreviated to @. In addition, //
is short for /descendant-or-self::node()/, . is short for self::node(), and
.. is short for parent::node().
XML Source
<?xml version="1.0"?>
<xslTutorial >
<AAA id='a1' pos='start'>
<BBB id='b1'/>
<BBB id='b2'/>
</AAA>
<AAA id='a2'>
<BBB id='b3'/>
<BBB id='b4'/>
<CCC id='c1'>
<DDD id='d1'/>
</CCC>
<BBB id='b5'>
<CCC id='c2'/>
</BBB>
</AAA>
</xslTutorial>
HTML Output 1
<DIV style="color:red">BBB id=b1</DIV>
<DIV style="color:red">BBB id=b2</DIV>
<DIV style="color:red">BBB id=b3</DIV>
<DIV style="color:red">BBB id=b4</DIV>
<DIV style="color:red">BBB id=b5</DIV>
<DIV style="color:navy">CCC id=c1</DIV>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<xsl:for-each select="//BBB">
<DIV style="color:red">
<xsl:value-of select="name()"/>
<xsl:text> id=</xsl:text>
<xsl:value-of select="@id"/>
</DIV>
</xsl:for-each>
<xsl:for-each select="xslTutorial/AAA/CCC">
<DIV style="color:navy">
<xsl:value-of select="name()"/>
<xsl:text> id=</xsl:text>
<xsl:value-of select="@id"/>
</DIV>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
XSL SORTING
The nodes selected with xsl:for-each (see Stylesheet 1 and Stylesheet 2) or
xsl:apply-templates (see Stylesheet 3) can be sorted. The order attribute
determines the direction of the sort. Stylesheet 1 sorts in ascending order
and Stylesheet 2 sorts in descending order.
XML Source
<?xml version="1.0"?>
<xslTutorial >
<name>John</name>
<name>Josua</name>
<name>Charles</name>
<name>Alice</name>
<name>Martha</name>
<name>George</name>
</xslTutorial>
HTML Output 1
<HTML>
<HEAD> </HEAD>
<BODY>
<TABLE>
<TR>
<TH>Alice</TH></TR>
<TR>
<TH>Charles</TH></TR>
<TR>
<TH>George</TH></TR>
<TR>
<TH>John</TH></TR>
<TR>
<TH>Josua</TH></TR>
<TR>
<TH>Martha</TH></TR>
</TABLE>
</BODY>
</HTML>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="/">
<TABLE>
<xsl:for-each select="//name">
<xsl:sort order="ascending" select="."/>
<TR><TH><xsl:value-of select="."/></TH></TR>
</xsl:for-each>
</TABLE>
</xsl:template>
</xsl:stylesheet>
Stylesheet 1 sorts in text mode and Stylesheet 2 sorts in numeric mode.
Notice the important difference: “2” comes after “1” alphabetically, so 2
goes after 10 in text mode.
XML Source
<?xml version="1.0"?>
<xslTutorial >
<car id="11"/>
<car id="6"/>
<car id="105"/>
<car id="28"/>
<car id="9"/>
</xslTutorial>
HTML Output 1
<HTML>
<HEAD> </HEAD>
<BODY>
<TABLE>
<TR>
<TH>Car-105</TH></TR>
<TR>
<TH>Car-11</TH></TR>
<TR>
<TH>Car-28</TH></TR>
<TR>
<TH>Car-6</TH></TR>
<TR>
<TH>Car-9</TH></TR>
</TABLE>
</BODY>
</HTML>
HTML Output 2
<HTML>
<HEAD> </HEAD>
<BODY>
<TABLE>
<TR>
<TH>Car-6</TH></TR>
<TR>
<TH>Car-9</TH></TR>
<TR>
<TH>Car-11</TH></TR>
<TR>
<TH>Car-28</TH></TR>
<TR>
<TH>Car-105</TH></TR>
</TABLE>
</BODY>
</HTML>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="/">
<TABLE>
<xsl:for-each select="//car">
<xsl:sort data-type="text" select="@id"/>
<TR><TH><xsl:text> Car-</xsl:text> <xsl:value-of
select="@id"/></TH></TR>
</xsl:for-each>
</TABLE>
</xsl:template>
</xsl:stylesheet>
XSL Stylesheet 2
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="/">
<TABLE>
<xsl:for-each select="//car">
<xsl:sort data-type="number" select="@id"/>
<TR><TH><xsl:text> Car-</xsl:text> <xsl:value-of
select="@id"/></TH></TR>
</xsl:for-each>
</TABLE>
</xsl:template>
</xsl:stylesheet>
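The difference between data-type="text" and data-type="number" can be reproduced with Python's sorted function. This is a small sketch using the car ids from the XML source; Python compares strings by code point, which for these digit strings gives the same order as the text-mode output above.

```python
# The id attributes from the XML source, in document order.
car_ids = ["11", "6", "105", "28", "9"]

# Text mode: values are compared character by character.
text_order = sorted(car_ids)

# Numeric mode: values are converted to numbers before comparison.
numeric_order = sorted(car_ids, key=int)

if __name__ == "__main__":
    print(text_order)     # ['105', '11', '28', '6', '9']
    print(numeric_order)  # ['6', '9', '11', '28', '105']
```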
XML Source
<?xml version="1.0"?>
<xslTutorial >
<word id="czech"/>
<word id="Czech"/>
<word id="cook"/>
<word id="TooK"/>
<word id="took"/>
<word id="Took"/>
</xslTutorial>
HTML Output 1
<HTML>
<HEAD> </HEAD>
<BODY>
<TABLE>
<TR>
<TH>cook</TH></TR>
<TR>
<TH>Czech</TH></TR>
<TR>
<TH>czech</TH></TR>
<TR>
<TH>TooK</TH></TR>
<TR>
<TH>Took</TH></TR>
<TR>
<TH>took</TH></TR>
</TABLE>
</BODY>
</HTML>
HTML Output 2
<HTML>
<HEAD> </HEAD>
<BODY>
<TABLE>
<TR>
<TH>cook</TH></TR>
<TR>
<TH>czech</TH></TR>
<TR>
<TH>Czech</TH></TR>
<TR>
<TH>took</TH></TR>
<TR>
<TH>Took</TH></TR>
<TR>
<TH>TooK</TH></TR>
</TABLE>
</BODY>
</HTML>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="/">
<TABLE>
<xsl:for-each select="//word">
<xsl:sort case-order="upper-first" select="@id"/>
<TR><TH><xsl:value-of
select="@id"/></TH></TR>
</xsl:for-each>
</TABLE>
</xsl:template>
</xsl:stylesheet>
XSL Stylesheet 2
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="/">
<TABLE>
<xsl:for-each select="//word">
<xsl:sort case-order="lower-first" select="@id"/>
<TR><TH><xsl:value-of
select="@id"/></TH></TR>
</xsl:for-each>
</TABLE>
</xsl:template>
</xsl:stylesheet>
XSL Element
XML Source
<?xml version="1.0"?>
<xslTutorial >
<text size="H1">Header1</text>
<text size="H3">Header3</text>
<text size="b">Bold text</text>
<text size="sub">Subscript</text>
<text size="sup">Superscript</text>
</xslTutorial>
HTML Output 1
<HTML>
<HEAD> </HEAD>
<BODY>
<H1>Header1</H1>
<H3>Header3</H3>
<b>Bold text</b>
<sub>Subscript</sub>
<sup>Superscript</sup>
</BODY>
</HTML>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="/">
<xsl:for-each select="//text">
<xsl:element name="{@size}"><xsl:value-of select="."/></xsl:element>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
XML Source
<?xml version="1.0"?>
<xslTutorial >
<color>blue</color>
<color>navy</color>
<color>green</color>
<color>lime</color>
<color>red</color>
</xslTutorial>
HTML Output 1
<HTML>
<HEAD> </HEAD>
<BODY>
<TABLE>
<TR>
<TD style=" color:blue">blue</TD></TR></TABLE>
<TABLE>
<TR>
<TD style=" color:navy">navy</TD></TR></TABLE>
<TABLE>
<TR>
<TD style=" color:green">green</TD></TR></TABLE>
<TABLE>
<TR>
<TD style=" color:lime">lime</TD></TR></TABLE>
<TABLE>
<TR>
<TD style=" color:red">red</TD></TR>
</TABLE>
</BODY>
</HTML>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="color">
<TABLE>
<TR><TD>
<xsl:attribute name="style">
color:<xsl:value-of select="."/>
</xsl:attribute>
<xsl:value-of select="."/>
</TD></TR>
</TABLE>
</xsl:template>
</xsl:stylesheet>
XML Source
<?xml version="1.0"?>
<xslTutorial >
<p id="a12">
Compare <B>these constructs</B>
</p>
</xslTutorial>
HTML Output 1
<HTML>
<HEAD> </HEAD>
<BODY>
<DIV>
<B>copy-of : </B>
<p id="a12"> Compare
<B>these constructs</B>. </p></DIV>
<DIV>
<B>copy : </B>
<p/></DIV>
<DIV>
<B>value-of : </B> Compare these constructs.
</DIV>
</BODY>
</HTML>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="p">
XML Source
<?xml version="1.0"?>
<xslTutorial >
<list>
<entry name="A"/>
<entry name="B"/>
<entry name="C"/>
<entry name="D"/>
</list>
</xslTutorial>
HTML Output 1
<HTML>
<HEAD> </HEAD>
<BODY> A, B, C, D,
</BODY>
</HTML>
HTML Output 2
<HTML>
<HEAD> </HEAD>
<BODY> A, B, C, D
</BODY>
</HTML>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="list">
<xsl:for-each select="entry">
<xsl:value-of select="@name"/>
<xsl:text> , </xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
XSL Stylesheet 2
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="list">
<xsl:for-each select="entry">
<xsl:value-of select="@name"/>
<xsl:if test="not(position()=last())">
<xsl:text> , </xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
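The test not(position()=last()) in Stylesheet 2 writes the separator after every entry except the last one. The same logic, sketched in Python with hypothetical function names, looks like this:

```python
names = ["A", "B", "C", "D"]


def join_with_trailing(entries):
    """Like Stylesheet 1: a separator follows every entry, even the last."""
    return "".join(entry + ", " for entry in entries)


def join_between(entries):
    """Like Stylesheet 2: the separator is skipped for the last entry."""
    pieces = []
    for position, entry in enumerate(entries, start=1):
        pieces.append(entry)
        if position != len(entries):  # not(position()=last())
            pieces.append(", ")
    return "".join(pieces)


if __name__ == "__main__":
    print(repr(join_with_trailing(names)))
    print(repr(join_between(names)))
```

As in XSLT, positions are counted from 1, so the last entry is the one whose position equals the number of selected entries.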
XML Source
<?xml version="1.0"?>
<xslTutorial >
<SECTION>
HTML Output 1
<HTML>
<HEAD> </HEAD>
<BODY>
<P>SUMMARY: I need a pen and some paper.</P>
<P>DATA: I need bread.</P>
<P>DATA: I need butter.</P>
</BODY>
</HTML>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="//SECTION">
<xsl:choose>
<xsl:when test='SUMMARY'>
<P><xsl:text> SUMMARY: </xsl:text>
<xsl:value-of select="SUMMARY"/></P>
</xsl:when>
<xsl:otherwise>
<xsl:for-each select="DATA">
<P><xsl:text> DATA: </xsl:text>
<xsl:value-of select="."/></P>
</xsl:for-each>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
XML Source
<?xml version="1.0"?>
<xslTutorial >
<chapter>First Chapter</chapter>
<chapter>Second Chapter
<chapter>Subchapter 1</chapter>
<chapter>Subchapter 2</chapter>
</chapter>
<chapter>Third Chapter
<chapter>Subchapter A</chapter>
<chapter>Subchapter B
<chapter>sub a</chapter>
<chapter>sub b</chapter>
</chapter>
<chapter>Subchapter C</chapter>
</chapter>
</xslTutorial>
HTML Output 1
<HTML>
<HEAD> </HEAD>
<BODY>
<TABLE BORDER="1">
<TR>
<TH>Number</TH>
<TH>text</TH></TR>
<TR>
<TD>1</TD>
<TD>First Chapter</TD></TR>
<TR>
<TD>2</TD>
HTML Output 2
<HTML>
<HEAD> </HEAD>
<BODY>
<TABLE BORDER="1">
<TR>
<TH>Number</TH>
<TH>text</TH></TR>
<TR>
<TD>1</TD>
<TD>First Chapter</TD></TR>
<TR>
<TD>2</TD>
<TD>Second Chapter </TD></TR>
<TR>
<TD>2.1</TD>
<TD>Subchapter 1</TD></TR>
<TR>
<TD>2.2</TD>
<TD>Subchapter 2</TD></TR>
<TR>
<TD>3</TD>
<TD>Third Chapter </TD></TR>
<TR>
<TD>3.1</TD>
<TD>Subchapter A</TD></TR>
<TR>
<TD>3.2</TD>
<TD>Subchapter B </TD></TR>
<TR>
<TD>3.2.1</TD>
<TD>sub a</TD></TR>
<TR>
<TD>3.2.2</TD>
<TD>sub b</TD></TR>
<TR>
<TD>3.3</TD>
<TD>Subchapter C</TD></TR>
</TABLE>
</BODY>
</HTML>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="/">
<TABLE BORDER="1">
<TR><TH>Number</TH><TH>text</TH></TR>
<xsl:for-each select="//chapter">
<TR><TD>
<xsl:number/>
</TD><TD>
<xsl:value-of select="./text()"/>
</TD></TR>
</xsl:for-each>
</TABLE>
</xsl:template>
</xsl:stylesheet>
XML Source
<?xml version="1.0"?>
<xslTutorial >
<n>one</n>
<n>two</n>
<n>three</n>
<n>four</n>
</xslTutorial>
HTML Output 1
<HTML>
<HEAD> </HEAD>
<BODY>
<TABLE>
<TR>
<TD>1. one</TD></TR>
<TR>
<TD>2. two</TD></TR>
<TR>
<TD>3. three</TD></TR>
<TR>
<TD>4. four</TD></TR>
</TABLE>
</BODY>
</HTML>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="/">
<TABLE>
<xsl:for-each select="//n">
<TR><TD>
<xsl:number value="position()" format="1. "/>
<xsl:value-of select="."/>
</TD></TR>
</xsl:for-each>
</TABLE>
</xsl:template>
</xsl:stylesheet>
FORMATTING MULTILEVEL NUMBERS
Stylesheet 1 and Stylesheet 2 are examples of formatting multilevel numbers.
XML Source
<?xml version="1.0"?>
<xslTutorial >
<chapter>First Chapter</chapter>
<chapter>Second Chapter
<chapter>Subchapter 1</chapter>
<chapter>Subchapter 2</chapter>
</chapter>
<chapter>Third Chapter
<chapter>Subchapter A</chapter>
<chapter>Subchapter B
<chapter>sub a</chapter>
<chapter>sub b</chapter>
</chapter>
<chapter>Subchapter C</chapter>
</chapter>
</xslTutorial>
HTML Output 1
<HTML>
<HEAD> </HEAD>
<BODY>
<TABLE BORDER="1">
<TR>
<TH>Number</TH>
<TH>text</TH></TR>
<TR>
<TD>1 </TD>
<TD>First Chapter</TD></TR>
<TR>
<TD>2 </TD>
<TD>Second Chapter </TD></TR>
<TR>
<TD>2.A </TD>
<TD>Subchapter 1</TD></TR>
<TR>
<TD>2.B </TD>
<TD>Subchapter 2</TD></TR>
<TR>
<TD>3 </TD>
<TD>Third Chapter </TD></TR>
<TR>
<TD>3.A </TD>
<TD>Subchapter A</TD></TR>
<TR>
<TD>3.B </TD>
<TD>Subchapter B </TD></TR>
<TR>
<TD>3.B.a </TD>
<TD>sub a</TD></TR>
<TR>
<TD>3.B.b </TD>
<TD>sub b</TD></TR>
<TR>
<TD>3.C </TD>
<TD>Subchapter C</TD></TR>
</TABLE>
</BODY>
</HTML>
HTML Output 2
<HTML>
<HEAD> </HEAD>
<BODY>
<TABLE BORDER="1">
<TR>
<TH>Number</TH>
<TH>text</TH></TR>
<TR>
<TD>I:</TD>
<TD>First Chapter</TD></TR>
<TR>
<TD>II:</TD>
<TD>Second Chapter </TD></TR>
<TR>
<TD>II-1:</TD>
<TD>Subchapter 1</TD></TR>
<TR>
<TD>II-2:</TD>
<TD>Subchapter 2</TD></TR>
<TR>
<TD>III:</TD>
<TD>Third Chapter </TD></TR>
<TR>
<TD>III-1:</TD>
<TD>Subchapter A</TD></TR>
<TR>
<TD>III-2:</TD>
<TD>Subchapter B </TD></TR>
<TR>
<TD>III-2-a:</TD>
<TD>sub a</TD></TR>
<TR>
<TD>III-2-b:</TD>
<TD>sub b</TD></TR>
<TR>
<TD>III-3:</TD>
<TD>Subchapter C</TD></TR>
</TABLE>
</BODY>
</HTML>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="/">
<TABLE BORDER="1">
<TR><TH>Number</TH><TH>text</TH></TR>
<xsl:for-each select="//chapter">
<TR><TD>
<xsl:number level="multiple" format="1.A.a "/>
</TD><TD>
<xsl:value-of select="./text()"/>
</TD></TR>
</xsl:for-each>
</TABLE>
</xsl:template>
</xsl:stylesheet>
XSL Stylesheet 2
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="/">
<TABLE BORDER="1">
<TR><TH>Number</TH><TH>text</TH></TR>
<xsl:for-each select="//chapter">
<TR><TD>
<xsl:number level="multiple" format="I-1-a:"/>
</TD><TD>
<xsl:value-of select="./text()"/>
</TD></TR>
</xsl:for-each>
</TABLE>
</xsl:template>
</xsl:stylesheet>
XSL Stylesheet 1
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<TABLE border="1">
<TR>
<TH>text</TH>
<TH>number</TH>
</TR>
<xsl:for-each select="//text">
<TR>
<TD>
<xsl:value-of select="."/>
</TD>
<TD>
<xsl:value-of select="number()"/>
</TD>
</TR>
</xsl:for-each>
</TABLE>
</xsl:template>
</xsl:stylesheet>
XSL Stylesheet 2
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<TABLE border="1">
<TR>
<TH>text</TH>
<TH>number</TH>
</TR>
<TR>
<TD>
<xsl:value-of select="number(5&lt;7)"/>
</TD>
</TR>
</TABLE>
</xsl:template>
</xsl:stylesheet>
Addition, subtraction, and multiplication use the common syntax (see XSL
Stylesheet 1). The division syntax is less familiar: because the slash ( / ) is
already used in patterns, the keyword div is used instead (see XSL Stylesheet 2).
The operator mod returns the remainder of a truncating division.
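The behaviour of div and mod can also be modelled outside XSLT. The following is a minimal Python sketch, not one of the book's stylesheets; the helper names xpath_div and xpath_mod are hypothetical, and the sketch assumes XPath 1.0 number semantics (IEEE 754 doubles).

```python
import math


def xpath_div(a: float, b: float) -> float:
    """Model of XPath 'a div b': floating-point division that never raises.

    Simplification (assumption): the sign of a zero divisor is ignored.
    """
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if b == 0:
        if a == 0:
            return math.nan                 # 0 div 0 is NaN
        return math.copysign(math.inf, a)   # 9 div 0 is Infinity, -9 div 0 is -Infinity
    return a / b


def xpath_mod(a: float, b: float) -> float:
    """Model of XPath 'a mod b': remainder of a truncating division.

    The result takes the sign of the first operand, unlike Python's % operator.
    """
    if b == 0 or math.isinf(a):
        return math.nan
    return math.fmod(a, b)


if __name__ == "__main__":
    print(xpath_div(8, 11), xpath_mod(8, 11))
    print(xpath_mod(-5, 2), -5 % 2)   # truncating remainder versus Python's floored remainder
```

The point of the sketch is the sign rule: a truncating remainder of -5 by 2 is -1, whereas Python's own % operator gives 1.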
XML Source
<source>
<number>1</number>
<number>3</number>
<number>4</number>
<number>17</number>
<number>8</number>
<number>11</number>
</source>
Output
<P>1 + 3 = 4</P>
<P>4 - 17 = -13</P>
<P>8 * 11 = 88</P>
XSL Stylesheet 1
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<P>
<xsl:value-of select="//number[1]"/>
<xsl:text> + </xsl:text>
<xsl:value-of select="//number[2]"/>
<xsl:text> = </xsl:text>
<xsl:value-of select="//number[1] + //number[2]"/>
</P>
<P>
<xsl:value-of select="//number[3]"/>
<xsl:text> - </xsl:text>
<xsl:value-of select="//number[4]"/>
<xsl:text> = </xsl:text>
<xsl:value-of select="//number[3] - //number[4]"/>
</P>
<P>
<xsl:value-of select="//number[5]"/>
<xsl:text> * </xsl:text>
<xsl:value-of select="//number[6]"/>
<xsl:text> = </xsl:text>
<xsl:value-of select="//number[5] * //number[6]"/>
</P>
</xsl:template>
</xsl:stylesheet>
XML Source
<source>
<number>1</number>
<number>3</number>
<number>4</number>
<number>17</number>
<number>8</number>
<number>11</number>
</source>
Output
<P>8 / 11 = 0.7272727272727273</P>
<P>8 mod 11 = 8</P>
XSL Stylesheet 2
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<P>
<xsl:value-of select="//number[5]"/>
<xsl:text> / </xsl:text>
<xsl:value-of select="//number[6]"/>
<xsl:text> = </xsl:text>
<xsl:value-of select="//number[5] div //number[6]"/>
</P>
<P>
<xsl:value-of select="//number[5]"/>
<xsl:text> mod </xsl:text>
<xsl:value-of select="//number[6]"/>
<xsl:text> = </xsl:text>
<xsl:value-of select="//number[5] mod //number[6]"/>
</P>
</xsl:template>
</xsl:stylesheet>
XML Source
<source>
<number>6</number>
<number>3.8</number>
<number>1.234</number>
<number>-6</number>
<number>-3.8</number>
<number>-1.234</number>
</source>
Output
<TABLE border="1">
<TR>
<TH>number</TH>
<TH>floor</TH>
<TH>ceiling</TH>
<TH>round</TH>
</TR>
<TR>
<TD>6</TD>
<TD>6</TD>
<TD>6</TD>
<TD>6</TD>
</TR>
<TR>
<TD>3.8</TD>
<TD>3</TD>
<TD>4</TD>
<TD>4</TD>
</TR>
<TR>
<TD>1.234</TD>
<TD>1</TD>
<TD>2</TD>
<TD>1</TD>
</TR>
<TR>
<TD>-6</TD>
<TD>-6</TD>
<TD>-6</TD>
<TD>-6</TD>
</TR>
<TR>
<TD>-3.8</TD>
<TD>-4</TD>
<TD>-3</TD>
<TD>-4</TD>
</TR>
<TR>
<TD>-1.234</TD>
<TD>-2</TD>
<TD>-1</TD>
<TD>-1</TD>
</TR>
</TABLE>
XSL Stylesheet
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<TABLE border="1">
<TR>
<TH>number</TH>
<TH>floor</TH>
<TH>ceiling</TH>
<TH>round</TH>
</TR>
<xsl:for-each select="//number">
<TR>
<TD>
<xsl:value-of select="."/>
</TD>
<TD>
<xsl:value-of select="floor(.)"/>
</TD>
<TD>
<xsl:value-of select="ceiling(.)"/>
</TD>
<TD>
<xsl:value-of select="round(.)"/>
</TD>
</TR>
</xsl:for-each>
</TABLE>
</xsl:template>
</xsl:stylesheet>
STRING FUNCTION
The function string() converts its argument to a string. It is rarely called
directly in stylesheets, since in most cases the conversion is applied by
default. The stylesheet below shows examples of number-to-string conversion.
Notice the results of division by zero.
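The number-to-string rules behind these results can be summarised in a few lines. This is a minimal Python sketch under the assumption of XPath 1.0 rules; the function name xpath_string is hypothetical, and the last branch is a simplification, because Python may print very large or very small values in exponent notation, which XPath 1.0 does not use.

```python
import math


def xpath_string(x: float) -> str:
    """Model of XPath 1.0 string() applied to a number."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "Infinity" if x > 0 else "-Infinity"
    if x == int(x):
        # Integers are written without a decimal point; -0.0 becomes "0".
        return str(int(x))
    # Simplification (assumption): rely on Python's shortest float representation.
    return repr(x)


if __name__ == "__main__":
    for value in (9.0, math.nan, math.inf, -math.inf, 0.5):
        print(xpath_string(value))
```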
XML Source
<source>
<number>9</number>
<number>0</number>
<number>-9</number>
<number/>
</source>
Output
<P>9</P>
<P>NaN</P>
<P>9/0 = Infinity</P>
<P>-9/0 = -Infinity</P>
<P>0/0 = NaN</P>
XSL Stylesheet
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:variable name="A" select="number(//number[1])"/>
<xsl:variable name="B" select="number(//number[2])"/>
<xsl:variable name="C" select="number(//number[3])"/>
<xsl:variable name="D" select="number(//number[4])"/>
<xsl:template match="/">
<P>
<xsl:value-of select="string(number($A))"/>
</P>
<P>
<xsl:value-of select="string(number($D))"/>
</P>
<P>
<xsl:value-of select="$A"/>
<xsl:text>/</xsl:text>
<xsl:value-of select="$B"/>
<xsl:text> = </xsl:text>
<xsl:value-of select="string($A div $B)"/>
</P>
<P>
<xsl:value-of select="$C"/>
<xsl:text>/</xsl:text>
<xsl:value-of select="$B"/>
<xsl:text> = </xsl:text>
<xsl:value-of select="string($C div $B)"/>
</P>
<P>
<xsl:value-of select="$B"/>
<xsl:text>/</xsl:text>
<xsl:value-of select="$B"/>
<xsl:text> = </xsl:text>
<xsl:value-of select="$B div $B"/>
</P>
</xsl:template>
</xsl:stylesheet>
XML Source
<source>
<text>124</text>
<text>AB234</text>
<text>-16</text>
<text>0</text>
<text/>
<text>false</text>
</source>
Output
<TABLE border="1">
<TR>
<TH>text</TH>
<TH>boolean</TH>
</TR>
<TR>
<TD>124</TD>
<TD>true</TD>
</TR>
<TR>
<TD>AB234</TD>
<TD>true</TD>
</TR>
<TR>
<TD>-16</TD>
<TD>true</TD>
</TR>
<TR>
<TD>0</TD>
<TD>true</TD>
</TR>
<TR>
<TD/>
<TD>false</TD>
</TR>
<TR>
<TD>false</TD>
<TD>true</TD>
</TR>
</TABLE>
XSL Stylesheet
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<TABLE border="1">
<TR>
<TH>text</TH>
<TH>boolean</TH>
</TR>
<xsl:for-each select="//text">
<TR>
<TD>
<xsl:value-of select="."/>
<xsl:text/>
</TD>
<TD>
<xsl:value-of select="boolean(text())"/>
</TD>
</TR>
</xsl:for-each>
</TABLE>
</xsl:template>
</xsl:stylesheet>
Not Function
The not function returns true if the argument passed to it is false, and returns
false otherwise.
XML Source
<source>
<car id="a234" checked="yes"/>
<car id="a111" checked="yes"/>
<car id="a005"/>
</source>
Output
<P>
<B style="color:blue">a234</B>
</P>
<P>
<B style="color:blue">a111</B>
</P>
<P>
<B style="color:red">a005</B>
</P>
XSL Stylesheet
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="car[not(@checked)]">
<P>
<B style="color:red">
<xsl:value-of select="@id"/>
</B>
</P>
</xsl:template>
<xsl:template match="car[@checked]">
<P>
<B style="color:blue">
<xsl:value-of select="@id"/>
</B>
</P>
</xsl:template>
</xsl:stylesheet>
The functions true() and false() are useful for testing conditions.
XML Source
<source>
<number>0</number>
<number>1</number>
</source>
Output
<P>true not false</P>
<P>true not false</P>
XSL Stylesheet
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="number">
<P>
<xsl:if test="true()">
<xsl:text>true </xsl:text>
</xsl:if>
<xsl:if test="not(false())">
<xsl:text>not false</xsl:text>
</xsl:if>
</P>
</xsl:template>
</xsl:stylesheet>
The lang function returns true or false depending on whether the language of
the context node, as specified by xml:lang attributes, is the same as or is a
sublanguage of the language specified by the argument string. The language of
the context node is determined by the value of the xml:lang attribute on the
context node or, if the context node has no xml:lang attribute, by the value of
the xml:lang attribute on the nearest ancestor of the context node that has
one. If there is no such attribute, lang returns false. If there is such an
attribute, lang returns true if the attribute value is equal to the argument,
ignoring case, or if the attribute value has a suffix starting with a hyphen (-)
such that the value is equal to the argument when that suffix and case are
ignored. For example, lang("en") is true for an element whose xml:lang is "en-GB".
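The matching rule can be expressed compactly in code. The following Python sketch models it with the standard library's ElementTree; it is an illustration, not an XSLT processor. The function name lang and the parents map are hypothetical helpers, and the sample data is the book's source with one value changed to "en-GB" (an assumption made here to exercise the sublanguage case).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def lang(node, arg, parents):
    """Model of XPath lang(arg) with `node` as the context node."""
    current = node
    # Climb to the nearest element (the node itself or an ancestor) carrying xml:lang.
    while current is not None and XML_LANG not in current.attrib:
        current = parents.get(current)
    if current is None:
        return False                      # no xml:lang anywhere: lang() is false
    value = current.attrib[XML_LANG].lower()
    arg = arg.lower()
    # Equal ignoring case, or equal once a suffix starting with "-" is dropped.
    return value == arg or value.startswith(arg + "-")


SOURCE = """<source>
  <P xml:lang="de">
    <text xml:lang="cs">a</text>
    <text xml:lang="en-GB">and</text>
    <text>und</text>
  </P>
</source>"""

root = ET.fromstring(SOURCE)
# ElementTree has no parent pointers, so build a child -> parent map once.
parents = {child: parent for parent in root.iter() for child in parent}
texts = root.findall(".//text")

if __name__ == "__main__":
    for element in texts:
        print(element.text, [code for code in ("cs", "en", "de") if lang(element, code, parents)])
```

The third text element has no xml:lang of its own, so it inherits "de" from the enclosing P element.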
XML Source
<source>
<P xml:lang="de">
<text xml:lang="cs">a</text>
<text xml:lang="en">and</text>
<text>und</text>
</P>
</source>
Output
<P>Czech: a</P>
<P>English: and</P>
<P>German: und</P>
XSL Stylesheet
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="text">
<P>
<xsl:choose>
<xsl:when test='lang("cs")'>
<xsl:text>Czech: </xsl:text>
</xsl:when>
<xsl:when test='lang("en")'>
<xsl:text>English: </xsl:text>
</xsl:when>
<xsl:when test='lang("de")'>
<xsl:text>German: </xsl:text>
</xsl:when>
</xsl:choose>
<xsl:value-of select="."/>
</P>
</xsl:template>
</xsl:stylesheet>
CONCATENATION
The concat function returns the concatenation of the arguments passed to it.
XML Source
<source>
<text>Start</text>
<text>Body</text>
<text>Finish</text>
</source>
Output
<P>Start - Body - Finish</P>
XSL Stylesheet
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:variable name="T" select="concat(//text[1],' - ',//text[2],' - ',//text[3])"/>
<xsl:template match="/">
<P>
<xsl:value-of select="$T"/>
</P>
</xsl:template>
</xsl:stylesheet>
The starts-with function returns true if the first argument string starts with
the second argument string; otherwise it returns false. The contains function
returns true if the first argument string contains the second argument string;
otherwise it returns false.
XML Source
<source>
<text>Welcome to XSL world.</text>
<string>Welcome</string>
<string>XSL</string>
<string>XML</string>
</source>
Output
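<TABLE border="1">
<TR>
<TH colspan="3">Welcome to XSL world.</TH>
</TR>
<TR>
<TH>string</TH>
<TH>starts-with</TH>
<TH>contains</TH>
</TR>
<TR>
<TD>Welcome</TD>
<TD>true</TD>
<TD>true</TD>
</TR>
<TR>
<TD>XSL</TD>
<TD>false</TD>
<TD>true</TD>
</TR>
<TR>
<TD>XML</TD>
<TD>false</TD>
<TD>false</TD>
</TR>
</TABLE>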
XSL Stylesheet
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<TABLE border="1">
<TR>
<TH colspan="3">
<xsl:value-of select="//text"/>
</TH>
</TR>
<TR>
<TH>string</TH>
<TH>starts-with</TH>
<TH>contains</TH>
</TR>
<xsl:for-each select="//string">
<TR>
<TD>
<xsl:value-of select="."/>
</TD>
<TD>
<xsl:value-of select="starts-with(//text,.)"/>
</TD>
<TD>
<xsl:value-of select="contains(//text,.)"/>
</TD>
</TR>
</xsl:for-each>
</TABLE>
</xsl:template>
</xsl:stylesheet>
Output
<DIV>
<B>Text: </B>Welcome to XSL world.</DIV>
<B>Text before XSL: </B>Welcome to
<DIV>
<B>Text after XSL: </B> world.</DIV>
<DIV>
<B>Text from position 4: </B>come to XSL world.</DIV>
<DIV>
<B>Text from position 4 of length 10: </B>come to XS</DIV>
XSL Stylesheet
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<DIV>
<B>
<xsl:text>Text: </xsl:text>
</B>
<xsl:value-of select="//text"/>
</DIV>
<B>
<xsl:text>Text before </xsl:text>
<xsl:value-of select="//string"/>
<xsl:text>: </xsl:text>
</B>
<xsl:value-of select="substring-before(//text,//string)"/>
<DIV>
<B>
<xsl:text>Text after </xsl:text>
<xsl:value-of select="//string"/>
<xsl:text>: </xsl:text>
</B>
<xsl:value-of select="substring-after(//text,//string)"/>
</DIV>
<DIV>
<B>
<xsl:text>Text from position </xsl:text>
<xsl:value-of select="//start"/>
<xsl:text>: </xsl:text>
</B>
<xsl:value-of select="substring(//text,//start)"/>
</DIV>
<DIV>
<B>
<xsl:text>Text from position </xsl:text>
<xsl:value-of select="//start"/>
<xsl:text> of length </xsl:text>
<xsl:value-of select="//end"/>
<xsl:text>: </xsl:text>
</B>
<xsl:value-of select="substring(//text,//start,//end)"/>
</DIV>
</xsl:template>
</xsl:stylesheet>
Output
<TABLE>
<TR>
<TH colspan="4">Normalized text</TH>
</TR>
<TR>
<TD>Starting length:</TD>
<TD>15</TD>
<TD>Normalized length:</TD>
<TD>15</TD>
</TR>
<TR>
<TH colspan="4">Sequences of whitespace characters</TH>
</TR>
<TR>
<TD>Starting length:</TD>
<TD>41</TD>
<TD>Normalized length:</TD>
<TD>34</TD>
</TR>
<TR>
<TH colspan="4"> Leading and trailing whitespace. </TH>
</TR>
<TR>
<TD>Starting length:</TD>
<TD>40</TD>
<TD>Normalized length:</TD>
<TD>32</TD>
</TR>
</TABLE>
XSL Stylesheet
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<TABLE>
<xsl:for-each select="//text">
<TR>
<TH colspan="4">
<xsl:value-of select="."/>
</TH>
</TR>
<TR>
<TD>Starting length:</TD>
<TD>
<xsl:value-of select="string-length(.)"/>
</TD>
<TD>Normalized length:</TD>
<TD>
<xsl:value-of select="string-length(normalize-space(.))"/> </TD>
</TR>
</xsl:for-each>
</TABLE>
</xsl:template>
</xsl:stylesheet>
XML Source
<source>
<hr/>
<hr/>
<hr/>
</source>
Output
<source>
<hr>
<hr>
<hr>
</source>
Output
<h1> XML output </h1>
<hr/>
XSL Stylesheet
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<xsl:copy-of select="/source/*"/>
</xsl:template>
</xsl:stylesheet>
XML Source
<source>
<h1> HTML output </h1>
<AAA/>
<HR/>
<script>if (a &lt; b) foo(); if (cc &lt; dd) foo() </script>
<hr/>
<hr/>
<Hr/>
<hR/>
</source>
Output
<h1> HTML output </h1>
<AAA></AAA>
<HR><script>if (a < b)foo();
if (cc < dd) foo()
</script><hr>
<hr>
<Hr>
<hR>
XSL Stylesheet
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:output method="html"/>
<xsl:template match="/">
<xsl:copy-of select="/source/*"/>
</xsl:template>
</xsl:stylesheet>
XML Source
<source>
<html>
<head>
<title>HTML</title>
</head>
<body>
<h1> HTML output </h1> ?í?ala ?nek ko?ka pa?ez be?ka me?ec vyr
</body>
</html>
</source>
Output
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>HTML</title>
</head>
<body>
<h1> HTML output </h1>
?í?ala ?nek
ko?ka pa?ez
be?ka m??ec vyr
</body>
</html>
XSL Stylesheet
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:output method="html" encoding="UTF-8"/>
<xsl:template match="/">
<xsl:copy-of select="/source/*"/>
</xsl:template>
</xsl:stylesheet>
XML Source
<source>
<AAA id="12"/>
</source>
Output
<!ELEMENT AAA ANY><!ATTLIST AAAid ID #REQUIRED>Look at my source in your browser
XSL Stylesheet
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:output method="text"/>
<xsl:template match="AAA">
<xsl:text>&lt;!ELEMENT </xsl:text>
<xsl:value-of select="name()"/>
<xsl:text> ANY></xsl:text>
<xsl:text>&lt;!ATTLIST </xsl:text>
<xsl:value-of select="name()"/>
<xsl:text/>
<xsl:value-of select="name(@*)"/>
<xsl:text> ID #REQUIRED></xsl:text>
<xsl:text>Look at my source in your browser</xsl:text>
</xsl:template>
</xsl:stylesheet>
XML Source
<source>
<p id="a12"> Compare
<B>these constructs</B>.
</p>
</source>
Output
<DIV>
<B>copy-of : </B>
<p id="a12">
Compare <B>these constructs</B>.
</p>
</DIV>
<DIV>
<B>copy : </B>
<p/>
</DIV>
<DIV>
<B>value-of : </B>
Compare these constructs.
</DIV>
XSL Stylesheet
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="p">
<DIV>
<B>
<xsl:text>copy-of : </xsl:text>
</B>
<xsl:copy-of select="."/>
</DIV>
<DIV>
<B>
<xsl:text>copy : </xsl:text>
</B>
<xsl:copy/>
</DIV>
<DIV>
<B>
<xsl:text>value-of : </xsl:text>
</B>
<xsl:value-of select="."/>
</DIV>
</xsl:template>
</xsl:stylesheet>
USE-ATTRIBUTE-SETS ATTRIBUTE
An xsl:copy element may have a use-attribute-sets attribute. In this way, the
attributes for the copied element can be specified. XSL Stylesheet 2 does not
work as expected because expressions in attributes that refer to named XSL
objects are not evaluated.
XML Source
<source>
<h1>GREETING</h1>
<p>Hello, world!</p>
</source>
Output
<h1 align="center" style="color:red">GREETING</h1>
<p align="left" style="color:blue">Hello, world!</p>
XSL Stylesheet
<xsl:stylesheet version = '1.0'
xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<xsl:apply-templates select="/source/*"/>
</xsl:template>
<xsl:template match="h1">
<xsl:copy use-attribute-sets="H1">
<xsl:value-of select="."/>
</xsl:copy>
</xsl:template>
<xsl:template match="p">
<xsl:copy use-attribute-sets="P">
<xsl:value-of select="."/>
</xsl:copy>
</xsl:template>
<xsl:attribute-set name="H1">
<xsl:attribute name="align">center</xsl:attribute>
<xsl:attribute name="style">color:red</xsl:attribute>
</xsl:attribute-set>
<xsl:attribute-set name="P">
<xsl:attribute name="align">left</xsl:attribute>
<xsl:attribute name="style">color:blue</xsl:attribute>
</xsl:attribute-set>
</xsl:stylesheet>
XML Source
<?xml version="1.0"?>
<xslTutorial >
<AAA name="first">
<BBB name="first">11111</BBB>
<BBB name="second">22222</BBB>
</AAA>
<AAA name="second">
<BBB name="first">33333</BBB>
<BBB name="second">44444</BBB>
</AAA>
</xslTutorial>
HTML Output 1
<TABLE border="1">
<TR>
<TH> . </TH>
<TH>current()</TH></TR>
<TR>
<TD>first</TD>
<TD>first</TD></TR>
<TR>
<TD>11111</TD>
<TD>1111122222</TD></TR>
<TR>
<TD>second</TD>
<TD>second</TD></TR>
<TR>
<TD>33333</TD>
<TD/></TR></TABLE>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<TABLE border="1">
<TR><TH> . </TH><TH>current()</TH></TR>
<xsl:apply-templates select="//AAA"/>
</TABLE>
</xsl:template>
<xsl:template match="AAA">
<TR>
<TD>
<xsl:value-of select="./@name"/>
</TD><TD>
<xsl:value-of select="current()/@name"/>
</TD></TR>
<TR><TD>
<xsl:apply-templates select="BBB[./@name='first']"/>
</TD><TD>
<xsl:apply-templates select="BBB[current()/@name='first']"/>
</TD></TR>
</xsl:template>
</xsl:stylesheet>
Generate Id
The function generate-id() returns a string that uniquely identifies a node and
is a valid XML name. Stylesheet 2 uses generate-id() to add an id attribute to
every element in the source XML.
XML Source
<?xml version="1.0"?>
<xslTutorial >
<AAA name='top'>
<BBB pos='1' val='bbb'>11111</BBB>
<BBB>22222</BBB>
</AAA>
<AAA name='bottom'>
<BBB>33333</BBB>
<BBB>44444</BBB>
</AAA>
</xslTutorial>
HTML Output 1
<DIV>
<B>generate-id(//AAA) : </B>N3</DIV>
<DIV>
<B>generate-id(//BBB) : </B>N6</DIV>
<DIV>
<B>generate-id(//AAA[1]) : </B>N3</DIV>
<DIV>
<B>generate-id(//*[1]) : </B>N1</DIV>
<DIV>
<B>generate-id(//xslTutorial/*[1]) : </B>N3</DIV>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' >
<xsl:template match="/">
<DIV><B><xsl:text> generate-id(//AAA) : </xsl:text> </B>
<xsl:value-of select="generate-id(//AAA) "/></DIV>
<DIV><B><xsl:text> generate-id(//BBB) : </xsl:text> </B>
<xsl:value-of select="generate-id(//BBB) "/></DIV>
<DIV><B><xsl:text> generate-id(//AAA[1]) : </xsl:text> </B>
<xsl:value-of select="generate-id(//AAA[1]) "/></DIV>
<DIV><B><xsl:text> generate-id(//*[1]) : </xsl:text> </B>
<xsl:value-of select="generate-id(//*[1]) "/></DIV>
<DIV><B><xsl:text> generate-id(//xslTutorial/*[1]) : </xsl:text> </B>
<xsl:value-of select="generate-id(//xslTutorial/*[1]) "/></DIV>
</xsl:template>
</xsl:stylesheet>
XML Source
<?xml version="1.0"?>
<xslTutorial >
<AAA name='top'>
<BBB pos='1' val='bbb'>11111</BBB>
<BBB>22222</BBB>
</AAA>
<AAA name='bottom'>
<BBB>33333</BBB>
<BBB>44444</BBB>
</AAA>
</xslTutorial>
HTML Output 2
<xslTutorial id="N1">
<AAA id="N3" name="top">
<BBB id="N6" pos="1" val="bbb">11111</BBB>
<BBB id="N11">22222</BBB> </AAA>
<AAA id="N15" name="bottom">
<BBB id="N18">33333</BBB>
<BBB id="N21">44444</BBB> </AAA> </xslTutorial>
XSL Stylesheet 2
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="*">
<xsl:copy>
<xsl:attribute name="id">
<xsl:value-of select="generate-id()"/>
</xsl:attribute>
<xsl:for-each select="@*">
<xsl:attribute name="{name()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:for-each>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
COMBINING XSL
Other stylesheets can be combined with the current one using xsl:import or
xsl:include. Importing a stylesheet is the same as including it, except that
the definitions and template rules in the importing stylesheet take precedence
over the definitions and template rules in the imported stylesheet. In the
examples that follow, Stylesheet 1 (saved as id2.xsl) is brought into Stylesheet 2.
XML Source
<?xml version="1.0"?>
<xslTutorial >
<H1>IMPORTING STYLESHEETS</H1>
</xslTutorial>
HTML Output 1
IMPORTING STYLESHEETS
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:variable name="id2">Stylesheet 1(id2.xsl)</xsl:variable>
<xsl:variable name="t">Variable t from id2.xsl</xsl:variable>
</xsl:stylesheet>
The xsl:import element children must precede all other element children of the
xsl:stylesheet element, including any xsl:include element children. When
xsl:include is used to include a stylesheet, any xsl:import elements in the
included document are moved up in the including document, to after any
xsl:import elements already present in the including document.
HTML Output 3
<P>Stylesheet 1(id2.xsl)</P>
XSL Stylesheet 2
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:include href="id2.xsl"/>
<xsl:template match="/">
<P><xsl:value-of select="$id2"/></P>
<P><xsl:value-of select="$id3"/></P>
</xsl:template>
</xsl:stylesheet>
XML Source
<?xml version="1.0"?>
<xslTutorial >
<AAA/>
<BBB/>
<CCC/>
</xslTutorial>
HTML Output 4
<DIV style="color:red">AAA (according to Stylesheet 1 (id2.xsl)</DIV>
<DIV style="color:red">BBB (according to Stylesheet 1 (id2.xsl)</DIV>
<DIV style="color:red">CCC (according to Stylesheet 1 (id2.xsl)</DIV>
XSL Stylesheet 4
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="/*/*">
<DIV style="color:blue">
<xsl:value-of select="name()"/>
<xsl:text> (according to this stylesheet)</xsl:text>
</DIV>
</xsl:template>
<xsl:include href="id2.xsl"/>
</xsl:stylesheet>
APPLY-IMPORTS ELEMENT
You can use the xsl:apply-imports element inside an overriding template rule to
invoke the imported template rule that it overrides. Stylesheet 2 imports
Stylesheet 1 and overrides the template.
XML Source
<?xml version="1.0"?>
<xslTutorial >
<AAA/>
<BBB/>
<CCC/>
</xslTutorial>
HTML Output 1
<DIV style="color:red">AAA</DIV>
<DIV style="color:red">BBB</DIV>
<DIV style="color:red">CCC</DIV>
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/*/*">
<DIV style="color:red">
<xsl:value-of select="name()"/>
</DIV>
</xsl:template>
</xsl:stylesheet>
Overrides
Stylesheet 2 imports Stylesheet 1 and overrides the template.
XML Source
<?xml version="1.0"?>
<xslTutorial >
<AAA/>
<BBB/>
<CCC/>
</xslTutorial>
HTML Output 2
<EM>AAA</EM>
<EM>BBB</EM>
<EM>CCC</EM>
XSL Stylesheet 2
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:import href="id2.xsl"/>
<xsl:template match="/*/*">
<EM>
<xsl:value-of select="name()"/>
</EM>
</xsl:template>
</xsl:stylesheet>
Import Precedence
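Import precedence outranks template priority: when two matching template rules
have different import precedence, the priority attribute is not consulted. Here
the CCC template in Stylesheet 2 is chosen over the imported one in Stylesheet 1
even though its priority (-100) is far lower than the imported template's (10).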
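The order in which the two criteria are applied can be modelled as a two-part sort key. This is a minimal Python sketch of the rule, not processor code; the Rule class, its field names, and the numeric precedence values are hypothetical, and ties on both criteria (which XSLT 1.0 treats as an error or resolves in favour of the last rule) are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A template rule that matches the current node (hypothetical model)."""
    name: str
    import_precedence: int   # higher value = closer to the importing stylesheet
    priority: float          # value of the priority attribute


def select_rule(matching):
    """Pick the rule to apply: import precedence first, priority second."""
    return max(matching, key=lambda rule: (rule.import_precedence, rule.priority))


# Stylesheet 2 imports Stylesheet 1, so its rules have higher import precedence.
imported = Rule("Stylesheet 1 (blue)", import_precedence=1, priority=10.0)
importing = Rule("Stylesheet 2 (red)", import_precedence=2, priority=-100.0)

if __name__ == "__main__":
    print(select_rule([imported, importing]).name)
```

Priority only decides between rules that share the same import precedence, for example two rules declared in the same stylesheet.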
XML Source
<?xml version="1.0"?>
<xslTutorial >
<AAA id='a1' pos='start'>
<BBB id='b1'/>
<BBB id='b2'/>
</AAA>
<AAA id='a2'>
<BBB id='b3'/>
<BBB id='b4'/>
<CCC id='c1'>
<CCC id='c2'/>
</CCC>
<BBB id='b5'>
<CCC id='c3'/>
</BBB>
</AAA>
</xslTutorial>
HTML Output 1
<H3 style="color:blue">CCC (id=c1)</H3>
<H3 style="color:blue">CCC (id=c2)</H3>
<H3 style="color:blue">CCC (id=c3)</H3>
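HTML Output 2
<H3 style="color:red">CCC (id=c1)</H3>
<H3 style="color:red">CCC (id=c2)</H3>
<H3 style="color:red">CCC (id=c3)</H3>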
XSL Stylesheet 1
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:template match="/">
<xsl:apply-templates select="//CCC"/>
</xsl:template>
<xsl:template match="CCC" priority="10">
<H3 style="color:blue">
<xsl:value-of select="name()"/>
<xsl:text> (id=</xsl:text>
<xsl:value-of select="@id"/>
<xsl:text>)</xsl:text>
</H3>
</xsl:template>
</xsl:stylesheet>
XSL Stylesheet 2
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
<xsl:import href="id2.xsl"/>
<xsl:template match="/">
<xsl:apply-templates select="//CCC"/>
</xsl:template>
<xsl:template match="CCC" priority="-100">
<H3 style="color:red">
<xsl:value-of select="name()"/>
<xsl:text> (id=</xsl:text>
<xsl:value-of select="@id"/>
<xsl:text>)</xsl:text>
</H3>
</xsl:template>
</xsl:stylesheet>
9
XSLT BASICS
[Figure: Source Tree, Transformation, Result Tree; format labels include XML, XHTML, HTML, and SQL]
CHESSBOARD
FIGURE 9.2 XSL Transformations transform an XML source document into another document that
can be of any format (XML, HTML, text, and so on) by applying a style sheet.
An XSLT processor reads both a source XML document and an XSL style sheet.
The XSL style sheet is itself a well-formed XML document. Depending on the
implementation, an XSLT engine may be able to read its input as SAX events or
DOM trees and to generate SAX events or DOM trees as output.
This program uses an XSLT engine and a style sheet to transform an XML
document describing a set of chessboard configurations into its corresponding
text format. A TransformerFactory is used to create a new Transformer from
the style sheet, and the Transformer is then used to process the generated
source document.
In this context, attributes are not considered children of the elements that
contain them, so attributes get ignored by the XSLT processor unless they are
explicitly referenced by the XSLT document.
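The same separation between children and attributes shows up in most XML tree APIs. As a rough illustration only, the Python sketch below uses the standard library's ElementTree, whose model is analogous to (but not the same as) the XPath tree an XSLT processor works on; the owner child element is a hypothetical addition to the car element used earlier.

```python
import xml.etree.ElementTree as ET

# One element with two attributes and one child element.
car = ET.fromstring('<car id="a234" checked="yes"><owner>Ann</owner></car>')

# Iterating over an element visits its child elements only.
child_names = [child.tag for child in car]

# Attributes live in a separate mapping and must be asked for explicitly,
# much as an XSLT stylesheet must reference @id or @checked explicitly.
attribute_names = sorted(car.attrib)

if __name__ == "__main__":
    print(child_names)
    print(attribute_names, car.get("id"))
```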
PROCESSING A TRANSFORMATION
A transformation can take place in one of three locations:
●● On the server
●● On the client (for example, your Web browser)
●● With a standalone program
The examples in this chapter use the client for transforming the XML
documents.
You might have noticed that the “After” shot contains more than the
raw XML file. It contains a heading (“Good Books”) and some text (“Yes, go
through these books!”). This is one of the benefits of XSLT.
Step 2 (XSL file): Create a file with the following content and save it as
Books.xsl into the same directory as the XML file.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<head>
<title>XML XSL Example</title>
<style type="text/css">
body
{
margin:10px;
background-color:#ccff00;
font-family:verdana,helvetica,sans-serif;
}
.Book-authorname
{
display:block;
font-weight:bold;
}
.Book-booktitle
{
display:block;
color:#636363;
font-size:small;
font-style:italic;
}
</style>
</head>
<body>
<h2>Good Books</h2>
<p> Yes, go through these books!</p>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="Book">
<span class="Book-authorname"><xsl:value-of select="authorname"/></span>
<span class="Book-booktitle"><xsl:value-of select="booktitle"/></span>
</xsl:template>
</xsl:stylesheet>
This XSL file contains XSL markup, HTML markup, and CSS.
XSLT SYNTAX
All XSLT documents must be well-formed XML documents, so you need to follow
the same syntax rules that apply to any other XML document. As well as
ensuring that your XSLT documents are well-formed XML, you need to ensure
they are valid XSLT documents.
XML VERSION
XSL documents are also XML documents and so we should include the XML
version in the document’s prolog. We should also set the standalone attribute
to “no” as we now rely on an external resource (i.e., the external XSL file).
<?xml version="1.0" standalone="no"?>
Syntax
<xsl:element_name>
Example
<xsl:template match="/">
....
</xsl:template>
Example
In this case, we select the root node (i.e., Books). By selecting this node, the
template element tells the XSLT processor how to transform the output. We
tell the processor to replace the root node (i.e., the whole XML document)
with what is written between the <xsl:template> tags.
In this case, the contents of an HTML document are written inside the
<xsl:template> tags. When a user views any XML document that uses this
XSL document, they will simply see the line “New content...” and the brows-
er’s title bar will read “My XSLT Example.”
<xsl:template match="Books">
<html>
<head>
<title>My XSLT Example</title>
</head>
<body>
<p>New content...</p>
</body>
</html>
</xsl:template>
Example
<xsl:template match="/">
<html>
<head>
<title>My XSLT Example</title>
</head>
<body>
<p>New content...</p>
</body>
</html>
</xsl:template>
USAGE EXAMPLE
Here, we are using two <xsl:template> elements; one for the root node and
one for its children. We have placed the <xsl:apply-templates/> element
within the <xsl:template> element for the root node. Doing this applies the
results of our other <xsl:template> element.
<xsl:template match="/">
(other content/HTML markup goes here)
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="child">
(other content/XSLT/HTML markup goes here)
</xsl:template>
So, by doing this, we can use other XSLT elements to retrieve data from
the child elements, and pass it to the main template for display. In particular,
the XSLT <xsl:value-of/> element is useful for retrieving data from an XML
element. We’ll look at that element next.
USAGE EXAMPLE
This example is a continuation of the example from the previous lesson. Here,
we have added the <xsl:value-of/> element to extract data from the child
nodes called “authorname” and “booktitle.”
<xsl:template match="/">
(other content/HTML markup goes here)
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Book">
<xsl:value-of select="authorname"/>
<xsl:value-of select="booktitle"/>
</xsl:template>
So, let’s have another look at our XML document, and see which values
will be selected:
<?xml version="1.0" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="Books.xsl"?>
<Books>
<Book>
<authorname>Shashi Banzal</authorname>
<booktitle>XML Book</booktitle>
</Book>
<Book>
<authorname>S Sharma</authorname>
<booktitle>HTML Book</booktitle>
</Book>
</Books>
And just to refresh your memory, these values will be displayed where we
choose to place the XSLT <xsl:apply-templates> element.
<xsl:for-each> EXAMPLE
Here, we use <xsl:for-each> to loop through each “authorname” element and
<xsl:value-of> to extract data from each node.
Note the value of the select attribute ("."). This expression specifies the
current node. The <xsl:element name="br"/> element is there simply for
readability; it produces a line break after each iteration.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Book">
<xsl:for-each select="authorname">
<xsl:value-of select="."/><xsl:element name="br"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
<xsl:sort> Example
Here, we use <xsl:for-each> to loop through each “Book” element and
<xsl:sort> to sort by the “authorname” node. We then use the <xsl:value-of>
to extract data from the “authorname” node.
RESULT
So, let’s see what would happen if we applied the above XSLT document to
the following XML document:
<Books>
<Book>
<authorname>Shashi Banzal</authorname>
<booktitle>XML Book</booktitle>
</Book>
<Book>
<authorname>S Sharma</authorname>
<booktitle>HTML Book</booktitle>
</Book>
</Books>
BEFORE
This is how the contents would be displayed before applying the <xsl:sort>
element:
XML Book
HTML Book
AFTER
This is how the contents would be displayed after applying the <xsl:sort>
element:
HTML Book
XML Book
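The effect of the sort can be checked with a small sketch. This is Python, not XSLT: it sorts the Book elements by their authorname text and then lists the book titles, which mirrors the BEFORE and AFTER listings. Python compares strings by character code, whereas an XSLT processor may apply language-specific ordering, so treat this only as an approximation. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

BOOKS_XML = """<Books>
  <Book>
    <authorname>Shashi Banzal</authorname>
    <booktitle>XML Book</booktitle>
  </Book>
  <Book>
    <authorname>S Sharma</authorname>
    <booktitle>HTML Book</booktitle>
  </Book>
</Books>"""


def titles_sorted_by_author(xml_text):
    """List book titles after ordering the books by author name."""
    root = ET.fromstring(xml_text)
    # sorted() here stands in for <xsl:sort select="authorname"/>
    books = sorted(root.findall("Book"), key=lambda b: b.findtext("authorname"))
    return [b.findtext("booktitle") for b in books]


print(titles_sorted_by_author(BOOKS_XML))
```

"S Sharma" sorts before "Shashi Banzal" because the space after "S" compares lower than the letter "h", so the HTML book moves to the top.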
<xsl:if> Example
THE SOURCE FILE
Imagine you have an XML file containing a list of foods and their nutritional
values.
<?xml version="1.0"?>
<food_list>
<food_item type="vegetable">
<name>Tomato</name>
<carbs_per_serving>81</carbs_per_serving>
<fiber_per_serving>8</fiber_per_serving>
<fat_per_serving>0.5</fat_per_serving>
<kj_per_serving>1280</kj_per_serving>
</food_item>
<food_item type="vegetable">
<name>Spinach</name>
<carbs_per_serving>1</carbs_per_serving>
<fiber_per_serving>1</fiber_per_serving>
<fat_per_serving>0</fat_per_serving>
<kj_per_serving>40</kj_per_serving>
</food_item>
<food_item type="vegetable">
<name>French beans</name>
<carbs_per_serving>0</carbs_per_serving>
<fiber_per_serving>1</fiber_per_serving>
<fat_per_serving>0</fat_per_serving>
<kj_per_serving>14</kj_per_serving>
</food_item>
<food_item type="vegetable">
<name>Lady finger</name>
<carbs_per_serving>21.5</carbs_per_serving>
<fiber_per_serving>2</fiber_per_serving>
<fat_per_serving>1</fat_per_serving>
<kj_per_serving>460</kj_per_serving>
</food_item>
<food_item type="vegetable">
<name>Broccoli</name>
<carbs_per_serving>6</carbs_per_serving>
<fiber_per_serving>1</fiber_per_serving>
<fat_per_serving>0.5</fat_per_serving>
<kj_per_serving>150</kj_per_serving>
</food_item>
<food_item type="vegetable">
<name>Carrots</name>
<carbs_per_serving>30.5</carbs_per_serving>
<fiber_per_serving>2</fiber_per_serving>
<fat_per_serving>0.5</fat_per_serving>
<kj_per_serving>550</kj_per_serving>
</food_item>
<food_item type="vegetable">
<name>Sweet Potatoes</name>
<carbs_per_serving>1.5</carbs_per_serving>
<fiber_per_serving>1.5</fiber_per_serving>
<fat_per_serving>0.5</fat_per_serving>
<kj_per_serving>55</kj_per_serving>
</food_item>
<food_item type="seafood">
<name>Crab</name>
<carbs_per_serving>0</carbs_per_serving>
<fiber_per_serving>0</fiber_per_serving>
<fat_per_serving>1</fat_per_serving>
<kj_per_serving>400</kj_per_serving>
</food_item>
<food_item type="seafood">
<name>Crawfish</name>
<carbs_per_serving>0</carbs_per_serving>
<fiber_per_serving>0</fiber_per_serving>
<fat_per_serving>2</fat_per_serving>
<kj_per_serving>390</kj_per_serving>
</food_item>
<food_item type="fruit">
<name>Orange</name>
<carbs_per_serving>15</carbs_per_serving>
<fiber_per_serving>2.5</fiber_per_serving>
<fat_per_serving>0</fat_per_serving>
<kj_per_serving>250</kj_per_serving>
</food_item>
<food_item type="fruit">
<name>Banana</name>
<carbs_per_serving>7.5</carbs_per_serving>
<fiber_per_serving>2.5</fiber_per_serving>
<fat_per_serving>0</fat_per_serving>
<kj_per_serving>150</kj_per_serving>
</food_item>
<food_item type="grain">
<name>Rice</name>
<carbs_per_serving>62</carbs_per_serving>
<fiber_per_serving>14</fiber_per_serving>
<fat_per_serving>7</fat_per_serving>
<kj_per_serving>1400</kj_per_serving>
</food_item>
<food_item type="grain">
<name>Corn</name>
<carbs_per_serving>1.5</carbs_per_serving>
<fiber_per_serving>1</fiber_per_serving>
<fat_per_serving>0.5</fat_per_serving>
<kj_per_serving>70</kj_per_serving>
</food_item>
</food_list>
THE SOLUTION
Suppose we want to display only the vegetables. To do this, we use
<xsl:for-each> to loop through each “food_item” element and <xsl:if> to check
the value of the “type” attribute (we do this by using the @ symbol, which is
how you specify an attribute). If the attribute value equals “vegetable,” we
output the details.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="food_list">
<table>
<tr style="background-color:#ccff00">
<th>Food Item</th>
<th>Carbs (g)</th>
<th>Fiber (g)</th>
<th>Fat (g)</th>
<th>Energy (kj)</th>
</tr>
<xsl:for-each select="food_item">
<xsl:if test="@type = 'vegetable'">
<tr style="background-color:#00cc00">
<td><xsl:value-of select="name"/></td>
<td><xsl:value-of select="carbs_per_serving"/></td>
<td><xsl:value-of select="fiber_per_serving"/></td>
<td><xsl:value-of select="fat_per_serving"/></td>
<td><xsl:value-of select="kj_per_serving"/></td>
</tr>
</xsl:if>
</xsl:for-each>
</table>
</xsl:template>
</xsl:stylesheet>
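The same filtering idea can be sketched outside XSLT. The Python below works on a shortened, three-item excerpt of the food list (an assumption made to keep the example small) and keeps only the items whose type attribute is "vegetable". The function name is illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the food list; the full file has more items and fields.
FOOD_XML = """<food_list>
  <food_item type="vegetable"><name>Spinach</name></food_item>
  <food_item type="seafood"><name>Crab</name></food_item>
  <food_item type="grain"><name>Corn</name></food_item>
</food_list>"""


def vegetable_names(xml_text):
    """Return the names of all food items marked as vegetables."""
    root = ET.fromstring(xml_text)
    names = []
    for item in root.findall("food_item"):     # like xsl:for-each
        if item.get("type") == "vegetable":    # like test="@type = 'vegetable'"
            names.append(item.findtext("name"))
    return names


print(vegetable_names(FOOD_XML))
```

As with <xsl:if>, there is no "else" branch here: items that fail the test are simply skipped.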
<xsl:choose> Example
THE SOURCE FILE
Imagine we have an XML file containing different food items and their nutri-
tional value—like this:
<?xml version="1.0"?>
<food_list>
<food_item type="vegetable">
<name>Tomato</name>
<carbs_per_serving>81</carbs_per_serving>
<fiber_per_serving>8</fiber_per_serving>
<fat_per_serving>0.5</fat_per_serving>
<kj_per_serving>1280</kj_per_serving>
</food_item>
<food_item type="vegetable">
<name>Spinach</name>
<carbs_per_serving>1</carbs_per_serving>
<fiber_per_serving>1</fiber_per_serving>
<fat_per_serving>0</fat_per_serving>
<kj_per_serving>40</kj_per_serving>
</food_item>
<food_item type="vegetable">
<name>French beans</name>
<carbs_per_serving>0</carbs_per_serving>
<fiber_per_serving>1</fiber_per_serving>
<fat_per_serving>0</fat_per_serving>
<kj_per_serving>14</kj_per_serving>
</food_item>
<food_item type="vegetable">
<name>Lady finger</name>
<carbs_per_serving>21.5</carbs_per_serving>
<fiber_per_serving>2</fiber_per_serving>
<fat_per_serving>1</fat_per_serving>
<kj_per_serving>460</kj_per_serving>
</food_item>
<food_item type="vegetable">
<name>Broccoli</name>
<carbs_per_serving>6</carbs_per_serving>
<fiber_per_serving>1</fiber_per_serving>
<fat_per_serving>0.5</fat_per_serving>
<kj_per_serving>150</kj_per_serving>
</food_item>
<food_item type="vegetable">
<name>Carrots</name>
<carbs_per_serving>30.5</carbs_per_serving>
<fiber_per_serving>2</fiber_per_serving>
<fat_per_serving>0.5</fat_per_serving>
<kj_per_serving>550</kj_per_serving>
</food_item>
<food_item type="vegetable">
<name>Sweet Potatoes</name>
<carbs_per_serving>1.5</carbs_per_serving>
<fiber_per_serving>1.5</fiber_per_serving>
<fat_per_serving>0.5</fat_per_serving>
<kj_per_serving>55</kj_per_serving>
</food_item>
<food_item type="seafood">
<name>Crab</name>
<carbs_per_serving>0</carbs_per_serving>
<fiber_per_serving>0</fiber_per_serving>
<fat_per_serving>1</fat_per_serving>
<kj_per_serving>400</kj_per_serving>
</food_item>
<food_item type="seafood">
<name>Crawfish</name>
<carbs_per_serving>0</carbs_per_serving>
<fiber_per_serving>0</fiber_per_serving>
<fat_per_serving>2</fat_per_serving>
<kj_per_serving>390</kj_per_serving>
</food_item>
<food_item type="fruit">
<name>Orange</name>
<carbs_per_serving>15</carbs_per_serving>
<fiber_per_serving>2.5</fiber_per_serving>
<fat_per_serving>0</fat_per_serving>
<kj_per_serving>250</kj_per_serving>
</food_item>
<food_item type="fruit">
<name>Banana</name>
<carbs_per_serving>7.5</carbs_per_serving>
<fiber_per_serving>2.5</fiber_per_serving>
<fat_per_serving>0</fat_per_serving>
<kj_per_serving>150</kj_per_serving>
</food_item>
<food_item type="grain">
<name>Rice</name>
<carbs_per_serving>62</carbs_per_serving>
<fiber_per_serving>14</fiber_per_serving>
<fat_per_serving>7</fat_per_serving>
<kj_per_serving>1400</kj_per_serving>
</food_item>
<food_item type="grain">
<name>Corn</name>
<carbs_per_serving>1.5</carbs_per_serving>
<fiber_per_serving>1</fiber_per_serving>
<fat_per_serving>0.5</fat_per_serving>
<kj_per_serving>70</kj_per_serving>
</food_item>
</food_list>
Now, imagine if we want to present the contents of our XML file in a table
and highlight the rows a different color depending on the type of food it is.
THE SOLUTION
We could do this using the following XSL file. In this file, we check the type
attribute of the <food_item> element. We refer to an attribute by prefixing its
name with @. If the value is “grain,” we specify one color. If it’s
“vegetable,” we specify another. If it’s neither of these, we specify a default
color using <xsl:otherwise>.
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="food_list">
<table>
<tr style="background-color:#ccff00">
<th>Food Item</th>
<th>Carbs (g)</th>
<th>Fiber (g)</th>
<th>Fat (g)</th>
<th>Energy (kj)</th>
</tr>
<xsl:for-each select="food_item">
<xsl:choose>
<xsl:when test="@type = 'grain'">
<tr style="background-color:#cccc00">
<td><xsl:value-of select="name"/></td>
<td><xsl:value-of select="carbs_per_serving"/></td>
<td><xsl:value-of select="fiber_per_serving"/></td>
<td><xsl:value-of select="fat_per_serving"/></td>
<td><xsl:value-of select="kj_per_serving"/></td>
</tr>
</xsl:when>
<xsl:when test="@type = 'vegetable'">
<tr style="background-color:#00cc00">
<td><xsl:value-of select="name"/></td>
<td><xsl:value-of select="carbs_per_serving"/></td>
<td><xsl:value-of select="fiber_per_serving"/></td>
<td><xsl:value-of select="fat_per_serving"/></td>
<td><xsl:value-of select="kj_per_serving"/></td>
</tr>
</xsl:when>
<xsl:otherwise>
<tr style="background-color:#cccccc">
<td><xsl:value-of select="name"/></td>
<td><xsl:value-of select="carbs_per_serving"/></td>
<td><xsl:value-of select="fiber_per_serving"/></td>
<td><xsl:value-of select="fat_per_serving"/></td>
<td><xsl:value-of select="kj_per_serving"/></td>
</tr>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</table>
</xsl:template>
</xsl:stylesheet>
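The branching logic of <xsl:choose> maps naturally onto an if/elif/else chain. The sketch below is Python, uses the color values from the stylesheet, and runs on a shortened excerpt of the food list. The function names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

FOOD_XML = """<food_list>
  <food_item type="vegetable"><name>Spinach</name></food_item>
  <food_item type="seafood"><name>Crab</name></food_item>
  <food_item type="grain"><name>Corn</name></food_item>
</food_list>"""


def row_color(food_type):
    """Pick a row color; the first matching test wins, as in xsl:choose."""
    if food_type == "grain":           # first xsl:when
        return "#cccc00"
    elif food_type == "vegetable":     # second xsl:when
        return "#00cc00"
    else:                              # xsl:otherwise
        return "#cccccc"


def colored_rows(xml_text):
    """Return (name, color) for every food item in document order."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("name"), row_color(item.get("type")))
            for item in root.findall("food_item")]


print(colored_rows(FOOD_XML))
```

Every item produces a row, and only the color differs, which is the difference from the <xsl:if> example.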
10
SOAP
SOAP
SOAP is a protocol for accessing a Web service. SOAP is a simple XML-based
protocol that lets applications exchange information over HTTP.
SOAP provides a basic messaging framework on which more abstract layers can be
built. It carries XML-based messages across the network and can do so over
different transport protocols.
The Simple Object Access Protocol (SOAP) uses XML to define a protocol for the
exchange of information in distributed computing environments. SOAP consists
of three components: an envelope, a set of encoding rules, and a convention for
representing remote procedure calls. Unless experience with SOAP is a direct
requirement for the open position, knowing the specifics of the protocol, or
how it can be used in conjunction with HTTP, is not as important as identifying
it as a natural application of XML.
Simple Object Access Protocol (SOAP) version 1.1 is an industry standard
designed to improve cross-platform interoperability using the Web and XML.
The Web has evolved from simply pushing out static pages to creating
customized content that performs services for users. A user can be a customer
retrieving specialized Web pages for placing orders or a business partner using
a customized form for reviewing stock and sales figures. A wide range of
components located on various computers are involved in performing these
Web-based services. Because these systems consist of many computers, including
the client computer, middle-tier servers, and usually a database server, these
systems are called distributed systems. To understand how SOAP works, let’s
take a look at the distributed system first.
Advantages
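SOAP has several advantages:
1. SOAP messages usually travel over HTTP, so they pass through most firewalls
that already allow Web traffic.
2. It can be carried over several transport protocols, such as HTTP and SMTP.
3. It is platform- and language-independent.
4. It is simple and extensible.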
Disadvantages
SOAP also has some disadvantages:
1. SOAP is slower than middleware technologies such as CORBA, RMI, or IIOP
because of its verbose XML format and the envelope parsing that is required.
2. SOAP depends on WSDL and does not have any standardized mechanism for the
dynamic discovery of services.
3. When HTTP is used as the transport, without an Enterprise Service Bus (ESB)
or WS-Addressing, the roles of the interacting parties are fixed.
4. SOAP does not state how HTTP is being used for different purposes, which
causes problems at the application-protocol level.
SOAP SYNTAX
SOAP Building Blocks
A SOAP message is an ordinary XML document containing the following
elements.
Envelope Element
It identifies the XML document as a SOAP message. A SOAP message always
appears within an envelope.
Header Element
The header element is optional and it contains header information.
Body Element
The body element is required and it contains call and response information.
Fault Element
The fault element is optional. It provides information about the errors that
occurred while processing the message.
All the elements above are declared in the default namespace for the
SOAP envelope:
http://www.w3.org/2001/12/soap-envelope
and the default namespace for SOAP encoding and data types is
http://www.w3.org/2001/12/soap-encoding
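<?xml version="1.0"?>
<soap:Envelope
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
<soap:Header>
</soap:Header>
<soap:Body>
<GetName>
<FirstName>ABC</FirstName>
<LastName>XYZ</LastName>
</GetName>
</soap:Body>
</soap:Envelope>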
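To see how the building blocks nest, the following sketch assembles an envelope with an empty Header and a Body programmatically. It is written in Python with the standard ElementTree module, which is an assumption of this example and not something SOAP requires. The GetName payload and the function name build_envelope are illustrative.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2001/12/soap-envelope"


def build_envelope(first_name, last_name):
    """Build Envelope > (Header, Body > GetName) as an element tree."""
    ET.register_namespace("soap", SOAP_NS)   # serialize with the soap: prefix
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")      # optional part
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")  # required part
    # Application-specific payload: not in the SOAP namespace.
    get_name = ET.SubElement(body, "GetName")
    ET.SubElement(get_name, "FirstName").text = first_name
    ET.SubElement(get_name, "LastName").text = last_name
    return envelope


envelope = build_envelope("ABC", "XYZ")
print(ET.tostring(envelope, encoding="unicode"))
```

The Header comes before the Body, and both are children of the Envelope, which is the only root element of the message.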
Example
<?xml version="1.0"?>
<soap:Envelope
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
...
Message information goes here
...
</soap:Envelope>
Syntax
soap:encodingStyle="URI"
Example
<?xml version="1.0"?>
<soap:Envelope
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
...
Message information goes here
...
</soap:Envelope>
Example
<?xml version="1.0"?>
<soap:Envelope
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
<soap:Header>
<m:Trans xmlns:m="http://www.abc.com/transaction/"
soap:mustUnderstand="1">234
</m:Trans>
</soap:Header>
...
...
</soap:Envelope>
Syntax
soap:mustUnderstand="0|1"
Example
<?xml version="1.0"?>
<soap:Envelope
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
<soap:Header>
<m:Trans xmlns:m="http://www.abc.com/transaction/"
soap:mustUnderstand="1">234
</m:Trans>
</soap:Header>
...
...
</soap:Envelope>
Not every part of a SOAP message is necessarily intended for the ultimate
endpoint; instead, a part may be intended for one or more of the endpoints on
the message path.
The SOAP actor attribute is used to address the header element to a specific
endpoint.
Syntax
soap:actor="URI"
Example
<?xml version="1.0"?>
<soap:Envelope
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
<soap:Header>
<m:Trans xmlns:m="http://www.abc.com/transaction/"
soap:actor="http://www.abc.com/appml/">234
</m:Trans>
</soap:Header>
...
...
</soap:Envelope>
Syntax
soap:encodingStyle="URI"
Example
<?xml version="1.0"?>
<soap:Envelope
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
<soap:Body>
<m:GetPrice xmlns:m="http://www.abc.com/prices">
<m:Item>Apples</m:Item>
</m:GetPrice>
</soap:Body>
</soap:Envelope>
The example above requests the price of apples. Note that the m:GetPrice
and the Item elements above are application-specific elements. They
are not a part of the SOAP namespace. A SOAP response could look something
like this:
<?xml version="1.0"?>
<soap:Envelope
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
<soap:Body>
<m:GetPriceResponse xmlns:m="http://www.abc.com/prices">
<m:Price>1.90</m:Price>
</m:GetPriceResponse>
</soap:Body>
</soap:Envelope>
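On the receiving side, a client has to dig the application data out of the Body. Here is a minimal sketch of that step in Python, using the response shown above. The prefix-to-namespace mapping and the function name extract_price are assumptions of this example.

```python
import xml.etree.ElementTree as ET

RESPONSE = """<?xml version="1.0"?>
<soap:Envelope
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
<soap:Body>
<m:GetPriceResponse xmlns:m="http://www.abc.com/prices">
<m:Price>1.90</m:Price>
</m:GetPriceResponse>
</soap:Body>
</soap:Envelope>"""

# Prefixes used in the search path below; they map to the document's namespaces.
NS = {
    "soap": "http://www.w3.org/2001/12/soap-envelope",
    "m": "http://www.abc.com/prices",
}


def extract_price(soap_text):
    """Return the text of m:Price inside the SOAP Body."""
    root = ET.fromstring(soap_text)
    price = root.find("soap:Body/m:GetPriceResponse/m:Price", NS)
    return price.text


print(extract_price(RESPONSE))
```

Note that the lookup depends on the namespace URIs, not on the prefixes: a response using a different prefix for the same URI would be read the same way.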
Faultstring
<faultstring xmlns=""> string </faultstring>
Faultactor
<faultactor xmlns=""> uriReference </faultactor>
Detail
<detail xmlns=""> any number of elements in any namespace </detail>
CONTENT-TYPE
The Content-Type header for a SOAP request and response defines the
MIME type for the message and the character encoding (optional) used for
the XML body of the request or response.
Syntax
Content-Type: MIMEType; charset=character-encoding
Example
POST /item HTTP/1.1
Content-Type: application/soap+xml; charset=utf-8
CONTENT-LENGTH
The Content-Length header for a SOAP request and response specifies the
number of bytes in the body of the request or response.
Syntax
Content-Length: bytes
Example
POST /item HTTP/1.1
Content-Type: application/soap+xml; charset=utf-8
Content-Length: 250
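Because Content-Length counts bytes and not characters, it should be computed from the encoded body. The sketch below shows this in Python; the function name soap_headers and the tiny sample body are assumptions for illustration.

```python
def soap_headers(body_text, path="/item"):
    """Return (header block, encoded body) for a SOAP request over HTTP."""
    body_bytes = body_text.encode("utf-8")   # matches charset=utf-8
    lines = [
        f"POST {path} HTTP/1.1",
        "Content-Type: application/soap+xml; charset=utf-8",
        f"Content-Length: {len(body_bytes)}",  # bytes, not characters
    ]
    return "\r\n".join(lines), body_bytes


# "\u00e9" is one character but two bytes in UTF-8.
headers, body = soap_headers("<a>\u00e9</a>")
print(headers)
```

The sample body has 8 characters but 9 bytes, so using the character count would give the wrong header value.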
A SOAP EXAMPLE
In the example below, a GetStockPrice request is sent to a server. The
request has a StockName parameter, and a Price parameter that will be
returned in the response. The namespace for the function is defined in
http://www.example.org/stock.
A SOAP Request
POST /InStock HTTP/1.1
Host: www.example.org
Content-Type: application/soap+xml; charset=utf-8
Content-Length: nnn
<?xml version="1.0"?>
<soap:Envelope
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
<soap:Body xmlns:m="http://www.example.org/stock">
<m:GetStockPrice>
<m:StockName>IBM</m:StockName>
</m:GetStockPrice>
</soap:Body>
</soap:Envelope>
end to another end using SOAP. Both SMTP and HTTP are successful transport
protocols used in transmitting information.
REQUEST HEADERS
A typical HTTP message in a SOAP request being passed to a Web server
looks like this:
POST /Order HTTP/1.1
Host: www.northwindtraders.com
Content-Type: text/xml
Content-Length: nnnn
SOAPAction: "urn:northwindtraders.com:PO#UpdatePO"
Information being sent is located here.
The first line of the message contains three separate components: the
request method, the request URI, and the protocol version. In this case, the
request method is POST; the request URI is /Order; and the version number
is HTTP/1.1. The Internet Engineering Task Force (IETF) has standardized
the request methods. The GET method is commonly used to retrieve infor-
mation on the Web. The POST method is used to pass information from the
client to the server. The information passed by the POST method is then used
by applications on the server. Only certain types of information can be sent
using GET; any type of data can be sent using POST. SOAP also supports
sending messages using M-POST. When working with the POST method in
a SOAP package, the request URI contains the name of the method to be
invoked.
The second line is the URL of the server that the request is being sent
to. The request URL is implementation-specific—that is, each server defines
how it will interpret the request URL. In the case of a SOAP package, the
request URL usually represents the name of the object that contains the
method being called.
The third line contains the content type, text/xml, which indicates that the
payload is XML in the plain text format. The payload refers to the essential
data being carried to the destination. The payload information could be used
by a server or a firewall to validate the incoming message. A SOAP request
must use the text/xml as its content type. The fourth line specifies the size of
the payload in bytes. The content type and content length are required with
a payload.
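The line-by-line reading above can be sketched as a small parser. This Python example splits the request line into its three components and collects the header fields into a dictionary. The Content-Length value 42 is a made-up stand-in for "nnnn", and the function name parse_head is illustrative.

```python
# Header block of the request; 42 is a placeholder length, not a real value.
REQUEST_HEAD = (
    "POST /Order HTTP/1.1\r\n"
    "Host: www.northwindtraders.com\r\n"
    "Content-Type: text/xml\r\n"
    "Content-Length: 42\r\n"
)


def parse_head(head):
    """Return (method, uri, version, headers) from an HTTP header block."""
    lines = head.strip().split("\r\n")
    # First line: request method, request URI, protocol version.
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, uri, version, headers


method, uri, version, headers = parse_head(REQUEST_HEAD)
print(method, uri, version)
print(headers["Content-Type"], headers["Content-Length"])
```

A server or firewall could inspect exactly these fields before deciding whether to accept the payload.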
The SOAPAction header field must be used in a SOAP request to specify
the intent of the SOAP HTTP request. The fifth line of the message,
SOAPAction: "urn:northwindtraders.com:PO#UpdatePO", is a namespace followed
by the method name. By combining this namespace with the request URL,
our example calls the UpdatePO method of the Order object.
RESPONSE HEADERS
A typical response message that contains the response headers is shown here:
200 OK
Content-Type: text/plain
Content-Length: nnnn
The first line of this message contains a status code and a message associ-
ated with that status code. In this case, the status code is 200 and the message
is OK, meaning that the request was successfully decoded and that an appro-
priate response was returned. If an error had occurred, the following headers
might have been returned:
400 Bad Request
Content-Type: text/plain
Content-Length: 0
In this case, the status code is 400 and the message is Bad Request, mean-
ing that the request cannot be decoded by the server because of incorrect
syntax.
The M-POST method is defined using the HTTP Extension Framework. This method is
used when you are including mandatory information in the HTTP header, just as
you used the mustUnderstand attribute in the SOAP header element.
SOAP supports both POST and M-POST requests. A client first makes a
SOAP request using M-POST. If the request fails and either a 501 status code
or a 510 status code returns, the client should retry the request using the POST
method. If the client fails the request again and a 405 status code returns, the
client should fail the request. If the returning status code is 200, the message
has been received successfully. Firewalls can force a client to use the M-POST
method to submit SOAP requests by blocking regular POSTs of the text/xml-SOAP
content type.
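The fallback rule just described can be written down as a few lines of control flow. In this Python sketch, send is a hypothetical callable that takes the HTTP method name and returns the status code of the response; it stands in for whatever HTTP client an application uses.

```python
def send_soap(send):
    """Try M-POST first, then fall back to POST.

    `send` is a hypothetical callable: send(method_name) -> int status code.
    Returns True only when the final status code is 200.
    """
    status = send("M-POST")
    if status in (501, 510):
        # The M-POST attempt failed with 501 or 510: retry with plain POST.
        status = send("POST")
    # Any other outcome, such as 405 on the retry, means the request fails.
    return status == 200


# A fake server that rejects M-POST but accepts POST.
calls = []


def fake_send(method):
    calls.append(method)
    return 510 if method == "M-POST" else 200


print(send_soap(fake_send), calls)
```

Keeping the transport behind a callable makes the rule easy to test without a network connection.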
If you use M-POST, you must use a mandatory extension declaration that
refers to a namespace in the envelope element declaration. The namespace
prefix must precede the mandatory headers. The following example illustrates
how to use M-POST and the mandatory headers:
M-POST /Order HTTP/1.1
Host: www.northwindtraders.com
Content-Type: text/xml
Content-Length: nnnn
Man: "http://schemas.xmlsoap.org/soap/envelope; ns=49"
49-SOAPAction: "urn:northwindtraders.com:PO#UpdatePO"
SOAP ENCODING
The SOAP encoding style provides a means to define data types similar to
what is found in most programming languages, including types and arrays.
SOAP defines simple and complex data types just as the schema standard
does. The simple type elements are the same as those defined in the second
schema standard. The complex type elements include those defined in the
first SOAP standard and are a special way of defining arrays. Structures follow
the definitions of the complex type. For example, we could have the following
structure:
<e:Customer>
<CName>Janson Maru</CName>
<Address>18,I.G. NAGAR</Address>
<ID>4</ID>
</e:Customer>
11
DOM PROGRAMMING INTERFACE
modern hierarchical and relational databases. This makes it very easy to move
information between a database and an XML file using DOM.
We learned the ways to handle the structure of an XML document and
the ways to describe the hierarchical information. We now discuss the ways to
access the XML document from the programs. One of these ways is through
the Document Object Model.
The W3C specifies the DOM as a language- and platform-neutral definition; that
is, interfaces are defined for the different objects comprising the DOM, but no
specifics of implementation are provided, and it can be implemented in any
programming language.
The DOM lays out standard functionality for document navigation and
the manipulation of the content and structure of HTML and XML documents.
[Figure: a sample document shown as a DOM node tree, with the nodes doc, title, p (one with ID=“p13”), and a]
rendering marked up text. Rendering can even take the form of configuring
an application or executing remote procedures.
The DOM model is easy to understand. Figure 11.2 shows the architec-
ture of an XML app using DOM.
A parser reads the XML file and builds a DOM document to match the
XML file. From that point until a save is performed, all interaction between
the app and XML hits the DOM document rather than the corresponding
XML file. It’s interesting to note that almost all XML parsers use SAX. The
reason is simple enough. Before you build a DOM document, you must
detect events such as the start of an element (start tag encountered), end of
element (end tag encountered), new attribute (name followed by equal sign
followed by quoted string encountered), and the like. DOM can be thought
of as an extra abstraction to lessen the programmer’s workload at the expense
of memory usage.
Modifications are made directly to the DOM document. Elements can be
added, deleted, renamed, and rearranged. Text nodes can be added, deleted,
or changed. Elements can be moved either within the same level or promoted
or demoted to different levels.
Obviously, the DOM is modified in apps that rewrite the XML file. But
DOM modification is also often done in an app that only renders the XML.
The classic example is in a “DOMWalker” app, which simply walks the DOM
tree and prints what it finds in a hierarchical outline. In fact, the new lines
and spaces intended to make the XML file more readable are actually legiti-
mate text nodes in XML, but in an XML app concerned only with a hierarchy,
they’re extraneous. Therefore, the first thing a DOMWalker program does is
delete text nodes made up only of whitespace.
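A DOMWalker of this kind is short enough to sketch in full. The version below uses Python's standard xml.dom.minidom module, which is an assumption of this example and not the tool the text has in mind. It first removes whitespace-only text nodes and then produces an indented outline. The function names and the sample document are illustrative.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

SAMPLE = "<doc>\n  <title>Hi</title>\n  <p>One</p>\n</doc>"


def strip_whitespace_nodes(node):
    """Remove text nodes that contain only whitespace, recursively."""
    for child in list(node.childNodes):   # copy: we modify while iterating
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


def outline(node, depth=0):
    """Return the tree below `node` as a list of indented lines."""
    lines = []
    if node.nodeType == Node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
    elif node.nodeType == Node.TEXT_NODE:
        lines.append("  " * depth + repr(node.data))
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines


root = parseString(SAMPLE).documentElement
print(len(root.childNodes))      # counts the whitespace text nodes too
strip_whitespace_nodes(root)
print("\n".join(outline(root)))
```

Before stripping, the root has five children because the line breaks and indentation count as text nodes; afterwards only the two elements remain.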
Rendering is the challenging part of most XML apps. It’s often graphics
intensive. Consider the Dia vector drawing program, which keeps all drawing
information in XML but renders it as geometric shapes. Often, there are sev-
eral rendering processes, one for each kind of output. Thus a book authored
in XML could be rendered as a paper book, as a PDF, as a PostScript file, or
as an HTML page or series of HTML pages. Indeed, this is one of the primary
benefits of stylesheet-based documents. Often the rendering itself is
decoupled from the app by use of XSL (Extensible Stylesheet Language), much
the same as program logic is decoupled from the app using XML.
Rewriting the XML file is easy—about what you’d expect for your last
class project in a college Programming 101 course. In the case of DOM,
you’ve already assembled the output in a DOM document, so you just walk its
tree and write the markup.
In the case of SAX-based XML apps it’s a little harder because you often
don’t read the information in the same order you want to write it. In other
words, if your app’s specification calls for something occurring later in the
input to modify something earlier in the output, you can’t just use a
read-write loop. So you do the typical things: keep some data in memory, or
maybe write an intermediate file and then sort it, or run two passes through
the XML. This is why DOM is the better choice for apps that work with small
XML files.
FIGURE 11.3 When using the DOM API, the application has to access or edit an in-memory
representation of the source document.
The program presented below uses the DOM API to parse and load in
memory an XML document describing a set of chessboard configurations. It
then walks through the resulting DOM tree and outputs the same chessboard
configurations in text format.
Two different implementations are used to highlight the potential differ-
ences in performance when using different methods of the DOM API: either
accessing the elements by their names or relative to their parents.
DOM IMPLEMENTATION
DOM has been implemented in many languages:
● DOM has been implemented in Java in J2SDK 1.4.1.
● DOM has been implemented in PHP 4.
● DOM has been implemented in Perl. XML::DOM is a DOM Perl implementation
developed by Enno Derksen.
● DOM has been implemented in JavaScript and Web browsers supporting
JavaScript.
We now take a look at the objects, methods, and properties that make up
the DOM Level 1 specification. The behavior that is specified applies only
to XML documents; the DOM may behave differently when used to access
HTML documents.
DOM Example
Look at the following XML file (books.xml):
The root node in the XML above is named <bookstore>. All other nodes
in the document are contained within <bookstore>. The root node
<bookstore> holds four <book> nodes.
The first <book> node holds four nodes: <title>, <author>, <year>, and
<price>, which contain one text node each.
Text Is Always Stored in Text Nodes
A common error in DOM processing is to expect an element node to
contain text. However, the text of an element node is stored in a text node.
FIGURE 11.6 A part of the node tree and the relationship between the nodes
In the XML above, the <title> element is the first child of the <book>
element, and the <price> element is the last child of the <book> element.
Furthermore, the <book> element is the parent node of the <title>,
<author>, <year>, and <price> elements.
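These relationships can be verified with a few lines of DOM code. The sketch below uses Python's xml.dom.minidom and a made-up one-book document with placeholder values, since it only needs the same element names. It is written without whitespace between the elements so that firstChild and lastChild are the elements themselves and not whitespace text nodes.

```python
from xml.dom.minidom import parseString

# Hypothetical one-book document; the values are placeholders.
BOOK_XML = (
    "<bookstore><book>"
    "<title>Sample Title</title><author>Sample Author</author>"
    "<year>2005</year><price>30.00</price>"
    "</book></bookstore>"
)

doc = parseString(BOOK_XML)
book = doc.getElementsByTagName("book")[0]
title = book.firstChild    # <title> is the first child of <book>
price = book.lastChild     # <price> is the last child of <book>

# An element node has no value of its own...
print(title.nodeValue)
# ...its text is stored in a child text node.
print(title.firstChild.nodeValue)
# <book> is the parent of <title>.
print(title.parentNode.tagName)
```

Asking the element for nodeValue gives nothing back, which is exactly the common error described above; the text has to be read from the text node below it.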
often resemble “tag soup”: an unordered list of data elements with meaningful
tag names, but containing the same level of information as a flat file.
The ability of XML that many developers overlook is its ability to show
relationships between elements, specifically the ability to imply a
parent-child relationship between two elements. The invoice example shows the
preparation of an XML document called INVOICE that could be better expressed in
XML as shown in the following code.
<INVOICE>
<CUSTOMER NAME = "Sam"
ADDRESS = "57, M.G. Road"
CITY = "Bangalore"
STATE = "Karnataka">
<LINEITEM PRODUCT = "Cheese"
UNITS = "2"/>
<LINEITEM PRODUCT = "Champagne"
UNITS = "3"/>
<LINEITEM PRODUCT = "Gel"
UNITS = "5"/>
<LINEITEM PRODUCT = "Bread"
UNITS = "4"/>
</CUSTOMER>
</INVOICE>
This document structure can be represented as a node tree that shows all
the elements and their relationships to one another.
With the DOM, we would be able to operate on the document in the
node form with its tree structure. We would be able to add any information
easily and attach it as a child to the node rather than to read through the infor-
mation and go past the last item to insert new information.
When the DOM is used to manipulate an XML text file, the first thing it
does is parse the file, breaking the file out into individual elements, attributes,
and comments.
The next thing it does is create, in the memory, a representation of the
XML file as a node tree. The developer may access the contents of the docu-
ment through the node tree and make the necessary modifications.
The DOM goes a step further and treats every item as a node—elements,
attributes, comments, processing instructions, and text. The DOM provides a
robust set of interfaces to facilitate the manipulation of the DOM node tree.
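Adding information as a child node looks like this in practice. The sketch uses Python's xml.dom.minidom on a shortened invoice in which the CUSTOMER element is assumed to be closed after its line items. The helper name add_line_item is illustrative.

```python
from xml.dom.minidom import parseString

# Shortened invoice; CUSTOMER is assumed to enclose its LINEITEM children.
INVOICE_XML = """<INVOICE>
<CUSTOMER NAME="Sam" ADDRESS="57, M.G. Road" CITY="Bangalore" STATE="Karnataka">
<LINEITEM PRODUCT="Cheese" UNITS="2"/>
<LINEITEM PRODUCT="Champagne" UNITS="3"/>
</CUSTOMER>
</INVOICE>"""


def add_line_item(doc, product, units):
    """Attach a new LINEITEM as the last child of the CUSTOMER node."""
    customer = doc.getElementsByTagName("CUSTOMER")[0]
    item = doc.createElement("LINEITEM")
    item.setAttribute("PRODUCT", product)
    item.setAttribute("UNITS", str(units))
    customer.appendChild(item)
    return item


doc = parseString(INVOICE_XML)
add_line_item(doc, "Bread", 4)
products = [e.getAttribute("PRODUCT")
            for e in doc.getElementsByTagName("LINEITEM")]
print(products)
```

The new item is attached directly to its parent node; there is no need to scan the text of the file to find the position after the last existing item.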
XML PARSER
The XML DOM contains methods (functions) to traverse XML trees, and
access, insert, and delete nodes. However, before an XML document can be
accessed and manipulated, it must be loaded into an XML DOM object. An
XML parser reads XML and converts it into an XML DOM object that can be
accessed with JavaScript.
Most browsers have a built-in XML parser.
[Figure: XML Document, XML Processor, XML Tree, Application]
12
SAX (SIMPLE API FOR XML)
INTRODUCTION TO SAX
XML 1.0 allows you to encode your information in textual form and create
tags that let you structure the information stored in XML documents.
This information must be read by some program to do something useful, like
viewing, modifying, or printing it.
In order for your programs to access this information, you can use the
SAX (Simple API for XML) or the DOM (Document Object Model) APIs.
Both of these APIs must be implemented by the XML parser of your choice
(which also must be written in the programming language of your choice).
For Java, these parsers include the Sun TR2 XML Parser, Data channel
XJ2, IBM XML Parser for Java, and OpenXML (among many others). All of
these parsers implement the SAX API (and also the DOM API). There are
fewer differences in the implementation of SAX compared to the implemen-
tation of DOM 1.0 (simply because SAX is so much smaller and simpler than
DOM).
So, Java programs must use an XML Parser written in Java that imple-
ments the SAX API in order to use SAX.
overhead involved in building these trees in memory. It’s not unusual for large
files to completely overrun a system’s capacity. In addition, creating a DOM
tree can be a very slow process.
FIGURE 12.1 When using the SAX API, the bare minimum a developer has to do is to implement
a DocumentHandler (ContentHandler in SAX 2.0) or subclass HandlerBase (DefaultHandler in SAX 2.0).
created the custom object model and the SAX document handler you can use
the SAX parser to create instances of your custom object model based on the
data stored in your XML documents.
This process is illustrated using an example in the following paragraphs.
The example walks through these 3 steps for an address book. The problem
is that you have an XML document which contains your address book, and
you would like to view this address book using a Swing program and a
Servlet. Also, you would like to use a SAX parser to do this instead of
using a DOM parser. The first thing to do is create an object model and
deal with the SAX parser issues before even thinking about the
presentation layers (Swing and Servlet) for the object model (AddressBook).
Here is what the address book XML document looks like:
<?xml version="1.0"?>
<addressbook>
  <person>
    <lastname>Idris</lastname>
    <firstname>Nazmul</firstname>
    <company>The Bean Factory, LLC.</company>
    <email>xml@beanfactory.com</email>
  </person>
</addressbook>
Please note the toXML() method. This method returns a string that contains
the XML representation of a Person object. This kind of method is not only
very useful for debugging, but it can also be used to save Person objects
to an XML file (or other kind of XML persistence/storage engine). The
AddressBook class also has a toXML() method, and that method uses the
Person class’s toXML() method, too.
Here is a listing of the AddressBook class:
import java.util.ArrayList;
import java.util.List;

public class AddressBook {

    // Data members
    List<Person> persons = new ArrayList<>();

    // Mutator method
    public void addPerson(Person p) { persons.add(p); }

    // Accessor methods
    public int getSize() { return persons.size(); }
    public Person getPerson(int i) { return persons.get(i); }

    // Returns the XML representation of the whole address book,
    // delegating each entry to Person.toXML()
    public String toXML() {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n");
        sb.append("<ADDRESSBOOK>\n\n");
        for (int i = 0; i < persons.size(); i++) {
            sb.append(getPerson(i).toXML());
            sb.append("\n");
        }
        sb.append("</ADDRESSBOOK>");
        return sb.toString();
    }
}
As you can see, these are very simple classes. The interesting part (in this
case) is Step 3.
The code example above uses the Sun TR2 parser. The classes used
from TR2 include the com.sun.xml.parser.Parser, which is used to create a
non-validating SAX parser.
Step 3: Creating a DocumentHandler
The SAX parser that was created in Step 2 reads an XML document and fires
events as it encounters open tags, close tags, CDATA sections, and #PCDATA
(character data). These events are fired as the SAX parser reads the XML
document from top to bottom, a tag at a time. So that the SAX parser can
notify some object that these events are occurring, an interface called
DocumentHandler is used (it’s in the org.xml.sax package). Three other
interfaces exist: EntityResolver, DTDHandler, and ErrorHandler. Together,
these 4 interfaces include the methods that correspond to all the events
that the SAX parser can fire as it reads an XML document. The most
frequently used interface is the DocumentHandler interface. You have to
provide the SAX parser with an implementation of at least the
DocumentHandler interface; the parser then invokes the right methods in
the right sequence on your DocumentHandler implementation class. In other
words, each event that is fired while the XML document is read is
translated into a method call on the registered document event listener
(which is your DocumentHandler implementation class). This class must do
something useful with these method calls and the sequence in which they
arrive.
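The sequence of method calls described above can be made visible with a small handler that only records which callbacks the parser invokes. The following is a minimal sketch, not the Sun TR2 code the chapter refers to: it assumes a SAX 2.0 parser obtained through JAXP's SAXParserFactory and a handler that extends DefaultHandler (the SAX 2.0 counterpart of DocumentHandler). The class name SaxEventTrace and the method trace() are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: records the order in which a SAX parser fires events.
class SaxEventTrace {

    // Parses the given XML text and returns the document and element events in order.
    static List<String> trace(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        // DefaultHandler supplies empty callbacks; only the interesting ones are overridden.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
                + "<addressbook><person>"
                + "<lastname>Idris</lastname><firstname>Nazmul</firstname>"
                + "</person></addressbook>";
        // Each element produces a start event, then the events of its children, then an end event.
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

A handler that builds Person objects would follow the same shape, replacing the event list with whatever state it needs to assemble the object model.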
</response>
<response username="sue">
<question subject="appearance">C</question>
<question subject="communication">A</question>
<question subject="ship">A</question>
<question subject="inside">D</question>
<question subject="implant">A</question>
</response>
<response username="carol">
<question subject="appearance">A</question>
<question subject="communication">C</question>
<question subject="ship">A</question>
<question subject="inside">D</question>
<question subject="implant">C</question>
</response>
</surveys>
p.parse(x,h);
} catch (ParserConfigurationException e) {
System.out.println(e.toString());
} catch (SAXException e) {
System.out.println(e.toString());
} catch (IOException e) {
System.out.println(e.toString());
}
}
private static class MyContentHandler extends DefaultHandler {
static String p = "_";
public void startDocument() throws SAXException {
System.out.println("Starting document...");
}
public void endDocument() throws SAXException {
System.out.println("Ending document...");
}
public void startElement(String ns, String sName, String qName,
Attributes attrs) throws SAXException {
String eName = sName;
if (sName.equals("")) eName = qName;
System.out.println("e"+p+eName);
if (attrs!=null) {
for (int i=0; i<attrs.getLength(); i++) {
String aName = attrs.getLocalName(i);
if (aName.equals("")) aName = attrs.getQName(i);
System.out.println("a"+p+" "+aName+"="
+attrs.getValue(i));
}
}
p = p + "_";
}
public void endElement(String ns, String sName, String qName)
throws SAXException {
p = p.replaceFirst("__", "_");
}
public void characters(char buf[], int offset, int len)
throws SAXException {
String s = new String(buf, offset, len);
System.out.println("c"+p+s);
}
public void ignorableWhitespace(char buf[], int offset, int len)
throws SAXException {
String s = new String(buf, offset, len);
System.out.println("i"+p+s);
}
}
}
c__
c__
c__
c__
e__first_name
c__John
c__
c__
c__
e__last_name
c__Smith
c__
c__
Ending document...
The program still works. But why did the parser fire so many “characters()”
events? It looks like the parser didn’t group the space character, line
feed, and carriage return into a single char[] and fire one “characters()”
event. It fired multiple events, one per character.
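Because a parser is free to split one run of text across several characters() calls, a handler should not treat a single call as the complete text. The following is a minimal sketch of one common way to cope with this, assuming a JAXP SAX parser: the chunks are appended to a buffer, and the buffer is read only when the end tag arrives. The class name BufferedTextSketch and the "name=value" output format are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: joins the text chunks of each element before using them.
class BufferedTextSketch {

    static class Handler extends DefaultHandler {
        final List<String> fields = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs) {
            text.setLength(0); // drop whitespace seen before this start tag
        }

        @Override
        public void characters(char[] buf, int offset, int len) {
            text.append(buf, offset, len); // only collect; a chunk may be incomplete
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {
                fields.add(qName + "=" + value); // the complete text is known only here
            }
            text.setLength(0);
        }
    }

    // Returns one "name=value" entry for every element that contains non-blank text.
    static List<String> parse(String xml) throws Exception {
        Handler handler = new Handler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<user>\n  <first_name>John</first_name>\n"
                + "  <last_name>Smith</last_name>\n</user>";
        System.out.println(parse(xml)); // [first_name=John, last_name=Smith]
    }
}
```

The result is the same no matter how many characters() events the parser chooses to fire, which is the property the handler needs.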
CHAPTER 13: XPATH
XPATH INTRODUCTION
XML was created to be a self-describing markup format. As XML matured,
new XML-related creations were invented. XPath is a syntax for defining parts
of an XML document.
Although you could create a nicely structured document with XML, there
didn’t seem to be an easy way to find information inside the document.
XML documents can be thought of as a tree structure, made up of parent,
child, and sibling relationships. Because of this very logical layout of an XML
document, it seems like there should be a standard way to find information.
XPath is a language that enables you to navigate and find data within your
XML documents. Using XPath, you can select one or more nodes to retrieve
the data they contain. XPath is used quite extensively with XSLT and is a
major element in XSLT.
XPath uses path expressions to select nodes or node-sets in an XML
document. These path expressions look very much like the expressions you
see when you work with a traditional computer file system.
XPath is a technology that enables you to address parts of an XML docu-
ment, such as a specific element or set of elements. XPath is implemented as
a non-XML expression language, which makes it suitable for use in situations
where XML markup isn’t really applicable, such as within attribute values.
Attribute values are simple text and therefore can’t contain additional XML
markup. So, although XPath expressions are used within XML markup, they
don’t directly use tags and attributes themselves. This makes XPath consider-
ably different from its XSL counterparts (XSLT and XSL-FO) in that it isn’t
XPATH SYNTAX
XPath uses path expressions to select nodes or node sets in an XML docu-
ment. The node is selected by following a path or steps.
Similar to other XML technologies, XPath operates under the notion that
a document consists of a tree of nodes. XPath defines different types of nodes
that are used to describe nodes that appear within a tree of XML content.
There is always a single root node that serves as the root of an XPath tree, and
that appears as the first node in the tree. Every element in a document has
a corresponding element node that appears in the tree under the root node.
Within an element node, there are other types of nodes that correspond to
the element’s content. Element nodes may have a unique identifier associated
with them that is used to reference the node with XPath. Figure 13.1 shows
the relationship between different kinds of nodes in an XPath tree.
FIGURE 13.1 XPath is based upon the notion of an XML document consisting of a
hierarchical tree of nodes
Nodes within an XML document can generally be broken down into element
nodes, attribute nodes, and text nodes. Some nodes have names. XPath
models such a name as an expanded name: a local name paired with an
optional namespace URI. In markup, the name is written as a qualified
name, in which an optional namespace prefix stands in for the URI. The
following is an example of an element with a prefixed (qualified) name:
<xsl:value-of select="."/>
In this example, the local name is value-of and the namespace prefix is
xsl. If you were to declare the XSL namespace as the default namespace for a
document, you could get away with dropping the namespace prefix, in which
case the name becomes this:
<value-of select="."/>
If you declare more than one namespace in a document, you will have to
use prefixed names for at least some of the elements and attributes. It’s
generally a good idea to use them for all elements and attributes in this
situation just to make the code clearer and eliminate the risk of name
clashes.
Getting back to node types in XPath, following are the different types of
nodes that can appear in an XPath tree:
●● Root node
●● Element nodes
●● Text nodes
●● Attribute nodes
●● Namespace nodes
●● Processing instruction nodes
●● Comment nodes
You should have a pretty good feel for these node types, considering that
you’ve learned enough about XML and have dealt with each type of node.
The root node in XPath serves the same role as it does in the structure of a
document: it serves as the root of an XPath tree and appears as the first node
in the tree. Every element in a document has a corresponding element node
that appears in the tree under the root node. Within an element node appear
all of the other types of nodes that correspond to the element’s content.
Element nodes may have a unique identifier associated with them, which is
useful when referencing the node with XPath.
The point of all this naming and referencing of nodes is to provide a means
of traversing an XML document to arrive at a given node. You use XPath to
build expressions, which are typically used in the context of some other oper-
ation, such as a document transformation. Upon being processed and evalu-
ated, XPath expressions result in a data object of one of the following types:
●● Node set: A collection of nodes
●● String: A text string
●● Boolean: A true/false value
●● Number: A floating-point number
Similar to a database query, the data object resulting from an XPath
expression can then be used as the basis for some other process, such as an
XSLT transformation. For example, you might create an XPath expression
that results in a node set that is transformed by an XSLT template. On the
other hand, you can also use XPath with XLink, where a node result of an
expression could form the basis of a linked document.
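The four result types can be seen directly when XPath is called from a host language. The following is a minimal sketch using the javax.xml.xpath API from the Java standard library, where the caller names the expected result type through XPathConstants. The class name XPathResultTypes, the helper eval(), and the one-book document are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: one expression per XPath result type.
class XPathResultTypes {

    // Assumed sample data, modeled on the bookstore document used in this chapter.
    static final String XML = "<bookstore><book>"
            + "<title lang=\"en\">Children</title><author>Param Sen</author>"
            + "<year>2019</year><price>9.00</price>"
            + "</book></bookstore>";

    // Evaluates expr against the sample document and converts the result to the given type.
    static Object eval(String expr, QName type) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, new InputSource(new StringReader(XML)), type);
    }

    public static void main(String[] args) throws Exception {
        // Node set: the four child elements of book
        NodeList children = (NodeList) eval("/bookstore/book/*", XPathConstants.NODESET);
        System.out.println(children.getLength()); // 4

        // String: the string value of the title element
        System.out.println(eval("/bookstore/book/title", XPathConstants.STRING)); // Children

        // Boolean: the result of a comparison
        System.out.println(eval("/bookstore/book/price < 10", XPathConstants.BOOLEAN)); // true

        // Number: the price converted to a floating-point number
        System.out.println(eval("/bookstore/book/price", XPathConstants.NUMBER)); // 9.0
    }
}
```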
You may want to keep a bookmark around for this page, as several of
the XPath examples throughout the next section rely on the training log
sample code.
REFERENCING NODES
The most basic of all XPath patterns is the pattern that references the current
node, which consists of a simple period:
.
If you’re traversing a document tree, a period obtains the current node.
The current node pattern is therefore a relative pattern because it makes
sense only in the context of a tree of data. As a contrast to the current pattern,
which is relative, consider the pattern that is used to select the root node of a
document. This pattern is known as the root pattern and consists of a single
forward slash:
/
If you were to use a single forward slash in an expression for the training
log sample document, it would refer to the root node of the document,
whose only element child is the trainlog element (line 4), the root element
of the document. Because the root pattern directly references a specific
location in a document (the root node), it is considered an absolute
pattern. The root pattern is extremely important to XPath because it
represents the starting point of any document’s node tree.
XPath relies on the hierarchical nature of XML documents to reference
nodes. The relationship between nodes in this type of hierarchy is best
described as a familial relationship, which means that nodes can be
described as parent, child, or sibling nodes, depending upon the context of
the tree. For example, the root node is the ancestor of all other nodes.
Nodes might be parents of some nodes and siblings of others. To reference
child nodes using XPath, you use the name of the child node as the pattern.
So, in the training log example, you can reference a session element (line
6, for example) as a child of the root element by simply specifying the
name of the element: session. Of course, this assumes that the root element
(line 4) is the current context for the pattern, in which case a relative
child path is okay. If the root element isn’t the current context, you
should fully specify the child path as /trainlog/session. Notice in this
case that the root pattern is combined with child patterns to create an
absolute path.
If there are child nodes, there must also be parent nodes. To access a
parent node, you must use two periods:
..
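The current-node, root, child, and parent patterns can be tried out against a DOM tree. The following is a minimal sketch using the JDK's javax.xml.xpath API. The three-element document only imitates the shape of the training log sample (a trainlog root containing a session); the duration element, the class name ReferencingNodes, and the helper nameOf() are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch: relative and absolute patterns evaluated from a context node.
class ReferencingNodes {

    static final XPath XPATH = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates expr relative to the context node and returns the selected node's name.
    static String nameOf(String expr, Node context) throws Exception {
        Node selected = (Node) XPATH.evaluate(expr, context, XPathConstants.NODE);
        return selected == null ? null : selected.getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // Assumed stand-in for the training log document.
        Document doc = parse(
                "<trainlog><session><duration>60</duration></session></trainlog>");
        Node session = (Node) XPATH.evaluate("/trainlog/session", doc, XPathConstants.NODE);

        System.out.println(nameOf(".", session));        // session  (current node)
        System.out.println(nameOf("..", session));       // trainlog (parent node)
        System.out.println(nameOf("duration", session)); // duration (child by name)
        System.out.println(nameOf("/", session));        // #document (root node)
        System.out.println(nameOf("/trainlog/session", doc)); // session (absolute path)
    }
}
```

Note that "/" yields the document's root node, which DOM reports as #document, not the trainlog element.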
Boolean: A data type with two possible values: true and false. Operations
that produce boolean type of data objects are as follows:
= Equal to
!= Not equal to
< Less than
> Greater than
<= Less than or equal to
>= Greater than or equal to
or Logical or
and Logical and
XPATH OPERATORS
An XPath expression returns either a node-set, a string, a Boolean, or a
number.
Table 13.1 shows a list of the operators that can be used in XPath expressions:
Table 13.1 List of the Operators used in XPath Expressions
Operator   Description              Example         Return value
|          Computes two node sets   //book | //cd   Returns a node set with all
                                                    book and cd elements
+          Addition                 6 + 4           10
-          Subtraction              6 - 4           2
EVALUATION CONTEXT
XPath expressions are always evaluated with respect to a context, which con-
sists of the following:
Context node: A node referring to the current node in the source XML
structure
Context position: An integer indicating position of the context node in
the current node set
Context size: An integer indicating the number of nodes in the current
node set
Variable bindings: A collection of pairs of variable names and values
The context node is the current node in the source XML structure, which is
represented as a tree of different types of nodes according to the
Document Object Model (DOM):
●● Root node: The top, first node of the XML structure
●● Element node: A node representing an element, which may have child nodes
●● Text node: A node representing a unit of text in the content of a parent
node
●● Attribute node: A node representing an attribute
●● Namespace node: A node representing a namespace declaration
●● Processing instruction node: A node representing a processing
instruction
●● Comment node: A node representing a comment
BUILT-IN FUNCTIONS
XPath also supports built-in functions. Commonly used built-in functions are
●● boolean(number): Returns true if the number is neither zero nor NaN
●● boolean(string): Returns true if the length of the string is greater
than zero
●● boolean(node_set): Returns true, if the set is not empty
●● concat(string, string, ...): Returns the concatenation of all given string
objects
●● contains(string_1, string_2): Returns true if the first string object
contains the second string object
●● count(node_set): Returns the number of nodes in the given node set
object
●● last(): Returns the context size of the evaluation context
●● name(): Returns the qualified name of the context node
●● name(node_set): Returns the qualified name of the first node in the
given node set object
●● not(boolean): Returns true, if the given boolean object is false
●● position(): Returns the context position of the evaluation context
●● string(): Returns the string value of the context node
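Several of the functions listed above can be exercised in a few lines. The following is a minimal sketch using the JDK's javax.xml.xpath API; the one-book document, the class name BuiltInFunctions, and the helper eval() are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch: a few XPath built-in functions applied to a small document.
class BuiltInFunctions {

    // Assumed sample data with a single book.
    static final String XML = "<bookstore><book>"
            + "<title lang=\"en\">Children</title><author>Param Sen</author>"
            + "<year>2019</year><price>9.00</price>"
            + "</book></bookstore>";

    // Evaluates expr against the sample document and converts the result to the given type.
    static Object eval(String expr, QName type) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, new InputSource(new StringReader(XML)), type);
    }

    public static void main(String[] args) throws Exception {
        // count(node_set): number of child elements of book
        System.out.println(eval("count(/bookstore/book/*)", XPathConstants.NUMBER)); // 4.0

        // concat(string, string, ...): node sets are converted to strings first
        System.out.println(
                eval("concat(//title, ' (', //year, ')')", XPathConstants.STRING)); // Children (2019)

        // contains(string_1, string_2)
        System.out.println(eval("contains(//title, 'Child')", XPathConstants.BOOLEAN)); // true

        // not(boolean): no book costs more than 10, so the inner node set is empty
        System.out.println(eval("not(//book[price > 10])", XPathConstants.BOOLEAN)); // true

        // name(node_set): qualified name of the first node in the set
        System.out.println(eval("name(/bookstore/*)", XPathConstants.STRING)); // book

        // boolean(node_set): false because the document has no cd element
        System.out.println(eval("boolean(//cd)", XPathConstants.BOOLEAN)); // false
    }
}
```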
XPath Nodes
In XPath, there are seven kinds of nodes: element, attribute, text, namespace,
processing-instruction, comment, and document nodes. XML documents are
treated as trees of nodes. The topmost element of the tree is called the root
element. Look at the following XML document:
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book>
<title lang="en">Children</title>
<author> Param Sen </author>
<year>2019</year>
<price>9.00</price>
</book>
</bookstore>
NODE FUNCTIONS
Node functions are XPath functions that relate to the node tree. Although all of
XPath technically relates to the node tree, node functions are very direct in that
they allow you to ascertain the position of nodes in a node set, as well as how many
nodes are in a set. The following are the most common XPath node functions:
●● position() returns the numeric position of the context node within the
current node set.
●● last() returns the position of the last node in the current node set
(that is, the size of the set).
●● count() determines the number of nodes in a node set.
Although these node functions might seem somewhat abstract, keep in
mind that they can be used to carry out some interesting tasks when used in
the context of a broader expression. For example, the following code shows
how to use the count() function to count the distance elements in the
training log document whose distances are recorded in miles:
count(*/distance[@units='miles'])
The position() function is normally used inside a predicate to pick a node
by its position, as in this expression:
child::item[position()=3]
Assuming there are several child elements of the type item, this code
references the third child item element of the current context. To reference
the last child item, you use the last() function instead of an actual number,
like this:
child::item[position()=last()]
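The position-based expressions above can be checked against a small list. The following is a minimal sketch using the JDK's javax.xml.xpath API; the four-item document, the class name NodeFunctions, and the helper eval() are hypothetical, and the paths are written from the root so that no separate context node is needed.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch: position(), last(), and count() on a list of item elements.
class NodeFunctions {

    // Assumed sample data: four item children under one parent.
    static final String XML =
            "<list><item>a</item><item>b</item><item>c</item><item>d</item></list>";

    // Evaluates expr against the sample document and returns the result as a string.
    static String eval(String expr) throws Exception {
        return XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        // Third item child of list
        System.out.println(eval("/list/child::item[position()=3]")); // c

        // Last item child of list
        System.out.println(eval("/list/child::item[position()=last()]")); // d

        // Number of item children
        System.out.println(eval("count(/list/item)")); // 4
    }
}
```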
STRING FUNCTIONS
XPath string functions are used to manipulate strings of text. With the
string functions, you can concatenate strings, slice them up into
substrings, and determine their length. The following are the most popular
string functions in XPath:
●● concat() concatenates two or more strings together.
●● starts-with() determines if a string begins with another string.
●● contains() determines if a string contains another string.
●● substring-before() retrieves a substring that appears before another string.
●● substring-after() retrieves a substring that appears after another string.
Another use of the string functions is finding nodes that contain a
particular substring. For example, if you wanted to analyze your training
data and look for training sessions where you felt strong, you could use
the contains() function to select session elements where the comments child
element contains the word “strong”:
*/session[contains(comments, 'strong')]
In this example, the second and third session elements would be selected
because they both contain the word “strong” in their comments child ele-
ments (lines 17 and 24).
BOOLEAN FUNCTIONS
Boolean functions are pretty simple in that they operate solely on Boolean
(true/false) values. The following are the two primary Boolean functions that
you may find useful in XPath expressions:
●● not() negates a Boolean value.
●● lang() determines if a certain language is being used.
NUMBER FUNCTIONS
The XPath number functions should be somewhat familiar to you from when
you created XSLT stylesheets that relied on the number functions. The fol-
lowing are the most commonly used number functions in XPath:
●● ceiling() rounds up a decimal value to the nearest integer.
●● floor() rounds down a decimal value to the nearest integer.
●● round() rounds a decimal value to the nearest integer.
●● sum() adds a set of numeric values.
The following is an example of how to use the sum() function to add up
attribute values:
sum(cart/item/@price)
Of course, you can make nested calls to the XPath number functions. For
example, you can round the result of the sum() function by using the round()
function, like this:
round(sum(cart/item/@price))
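The number functions can be tried on a small shopping cart. The following is a minimal sketch using the JDK's javax.xml.xpath API. The cart document and its prices are invented for illustration (chosen so that the arithmetic is exact in floating point), and the class name NumberFunctions and helper number() are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical sketch: sum(), round(), ceiling(), and floor() over attribute values.
class NumberFunctions {

    // Assumed sample data: three items whose prices add up to 28.25.
    static final String XML = "<cart>"
            + "<item price=\"19.5\"/><item price=\"5.25\"/><item price=\"3.5\"/>"
            + "</cart>";

    // Evaluates expr against the sample document and returns the result as a number.
    static double number(String expr) throws Exception {
        return (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(number("sum(cart/item/@price)"));          // 28.25
        System.out.println(number("round(sum(cart/item/@price))"));   // 28.0
        System.out.println(number("ceiling(sum(cart/item/@price))")); // 29.0
        System.out.println(number("floor(sum(cart/item/@price))"));   // 28.0
    }
}
```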
XPointer, or XLink. The examples of XPath that you’ve seen in this lesson
must therefore be used in conjunction with additional code. For example, the
following code shows how one of the training log expressions from earlier in
the chapter might be used in an XSLT stylesheet:
<xsl:value-of select="*/session[@type='running']" />
In this code, the XPath expression appears within the select attribute of
the xsl:value-of element, which is responsible for inserting content from a
source XML document into an output document during the transformation of
the source document. The point is that the XSLT xsl:value-of element is what
makes the XPath expression useful.
Similar to its role in XSLT, XPath serves as the addressing mechanism
in XPointer. XPointer is used to address parts of XML documents and is
used heavily in XLink. XPointer uses XPath to provide a means of navigating
the tree of nodes that comprise an XML document. Sounds familiar, right?
XPointer takes XPath a step further by defining a syntax for fragment
identifiers, which are in turn used to specify parts of documents. In doing
so, XPointer provides a high degree of control over the addressing of XML
documents. When coupled with XLink, the control afforded by XPointer makes
it possible to create interesting links between documents that simply
aren’t possible in HTML, at least in theory.
Note that in this case, the resulting value of data types, including node set,
will be converted to a string.
Let’s review a sample XML file, dictionary_xsl.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="dictionary.xsl"?>
<dictionary>
<!-- dictionary_xsl.xml
-->
<word acronym="true">
<name>XML</name>
<definition reference="Hero's Notes">eXtensible Markup
Language.</definition>
<update date="2002-12-23"/>
</word>
<word symbol="true">
<name>&lt;</name>
<definition>Mathematical symbol representing the "less than" logical
operation, like: 1&lt;2.</definition>
<definition>Reserved symbol in XML representing the beginning of
tags, like: <![CDATA[<p>Hello world!</p>]]>
</definition>
</word>
<word symbol="false" acronym="false">
<name>extensible</name>
<definition>Capable of being extended.</definition>
</word>
</dictionary>
<xsl:template match="child::word">
<xsl:for-each select="attribute::*">
a___<xsl:value-of select="name(.)"/>=<xsl:value-of select="."/>
</xsl:for-each>
<xsl:for-each select="child::*">
e___<xsl:value-of select="name(self::node())"/>
<xsl:apply-templates select="self::node()"/>
</xsl:for-each>
</xsl:template>
<xsl:template match="child::name | child::definition | child::update">
<xsl:for-each select="attribute::*">
a____<xsl:value-of select="name(.)"/>=<xsl:value-of select="."/>
</xsl:for-each>
<xsl:for-each select="child::text()">
t____<xsl:value-of select="self::node()"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
t____<p>Hello world!</p>
w__word
a___symbol=false
a___acronym=false
e___name
t____extensible
e___definition
t____Capable of being extended.
<country>
<title> Atlas </title>
<artist>Romi</artist>
</country>
</albums>
If we wanted to select the artist instead, we would use this location path:
albums/rock/artist
If we wanted to select the “title” node of all albums, we could use the
following (absolute) location paths, each of which starts at the root with
a forward slash:
/albums/rock/title
/albums/blues/title
/albums/country/title
Here are the nodes that are selected using the above location paths.
<albums>
<rock>
<title>Tool Box</title>
<artist>Green Velly</artist>
</rock>
<blues>
<title>Summer Occasion</title>
<artist>Marris Mano</artist>
</blues>
<country>
<title>Atlas</title>
<artist>Romi</artist>
</country>
</albums>
SELECTING NODES
XPath uses path expressions to select nodes in an XML document. The node
is selected by following a path or steps. The most useful path expressions are
listed in Table 13.2:
Table 13.2 Most Useful Path Expressions
Expression   Description
nodename     Selects all nodes with the name "nodename"
/            Selects from the root node
//           Selects nodes in the document from the current node that
             match the selection no matter where they are
.            Selects the current node
..           Selects the parent of the current node
@            Selects attributes
In Table 13.3, we have listed some path expressions and the result of the
expressions.
Table 13.3 Path Expression Examples
Path Expression   Result
bookstore         Selects all nodes with the name "bookstore"
/bookstore        Selects the root element bookstore
                  Note: If the path starts with a slash ( / ), it always
                  represents an absolute path to an element
PREDICATES
Predicates are used to find a specific node or a node that contains a specific
value.
Predicates are always embedded in square brackets. In Table 13.4, we have
listed some path expressions with predicates and the result of the expressions.
Table 13.4 Path Expression Examples with Predicates
Path Expression                 Result
/bookstore/book[1]              Selects the first book element that is the child
                                of the bookstore element.
                                Note: IE5 and later implemented that [0] should
                                be the first node, but according to the W3C
                                standard, it should have been [1]
/bookstore/book[last()]         Selects the last book element that is the child
                                of the bookstore element
/bookstore/book[last()-1]       Selects the last but one book element that is
                                the child of the bookstore element
/bookstore/book[position()<3]   Selects the first two book elements that are
                                children of the bookstore element
//title[@lang]                  Selects all the title elements that have an
                                attribute named lang
//title[@lang='eng']            Selects all the title elements that have an
                                attribute named lang with a value of eng
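The predicates in Table 13.4 can be verified against a small document. The following is a minimal sketch using the JDK's javax.xml.xpath API. The three-book bookstore is invented for illustration (single-letter titles make the results easy to read), and the class name PredicateSketch and helper select() are hypothetical; "/title" is appended to the book paths so that each result is a short string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: path expressions with predicates.
class PredicateSketch {

    // Assumed sample data, not the book's own bookstore file.
    static final String XML = "<bookstore>"
            + "<book><title lang=\"eng\">A</title><price>29.99</price></book>"
            + "<book><title lang=\"en\">B</title><price>39.95</price></book>"
            + "<book><title>C</title><price>49.99</price></book>"
            + "</bookstore>";

    // Returns the text content of every node the expression selects.
    static List<String> select(String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(select("/bookstore/book[1]/title"));            // [A]
        System.out.println(select("/bookstore/book[last()]/title"));       // [C]
        System.out.println(select("/bookstore/book[last()-1]/title"));     // [B]
        System.out.println(select("/bookstore/book[position()<3]/title")); // [A, B]
        System.out.println(select("//title[@lang]"));                      // [A, B]
        System.out.println(select("//title[@lang='eng']"));                // [A]
    }
}
```

Positions start at 1 here, as the W3C standard requires.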
In the following table, we have listed some path expressions and the result
of the expressions.
<blues>
<title>Summer Occasion</title>
<artist>Marris Mano</artist>
</blues>
<country>
<title> Atlas </title>
<artist>Romi</artist>
</country>
</albums>
If we wanted to select the “title” node of all albums, we could use the
following (relative) location path:
title
Result
This single line of code has exactly the same result as the example in the pre-
vious lesson. The only difference is that, in the previous lesson, we needed 3
lines of code to provide the same result.
This line of code is selecting all title nodes within our XML document.
We don’t need to provide the full path—just the name of the node we need
to work with. This makes our life easier and keeps our code nice and clean.
<albums>
<rock>
<title>Tool Box</title>
<artist>Green Velly</artist>
</rock>
<blues>
<title>Summer Occasion</title>
<artist>Marris Mano</artist>
</blues>
<country>
<title> Atlas </title>
<artist>Romi</artist>
</country>
</albums>
CHILDREN
We can also select a node’s children using relative location paths.
Example 1: Select the two children of the “rock” node (“title” and
“artist”). The context node is “rock,” because that’s where our relative
path starts:
rock/title
rock/artist
THE WILDCARD
The “wildcard” is represented by the asterisk (*). The wildcard represents any
node that would be located where the wildcard is positioned. Therefore, using
our example, it is representing any node that comes under the “rock” node.
Wildcards don’t have to appear at the end of a location path—they can
also appear in the middle of a location path. We aren’t limited to just one
either—we could use as many as we like within a location path.
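Both placements of the wildcard can be tried on the albums document. The following is a minimal sketch using the JDK's javax.xml.xpath API; the class name WildcardSketch, the helper select(), and the "name=text" output format are hypothetical, and the paths are written from the root so that no separate context node is needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: the * wildcard at the end and in the middle of a location path.
class WildcardSketch {

    // The albums document from this chapter, written on fewer lines.
    static final String XML = "<albums>"
            + "<rock><title>Tool Box</title><artist>Green Velly</artist></rock>"
            + "<blues><title>Summer Occasion</title><artist>Marris Mano</artist></blues>"
            + "<country><title>Atlas</title><artist>Romi</artist></country>"
            + "</albums>";

    // Returns "name=text" for every node the expression selects.
    static List<String> select(String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            out.add(n.getNodeName() + "=" + n.getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Wildcard at the end: every child element of rock
        System.out.println(select("/albums/rock/*")); // [title=Tool Box, artist=Green Velly]

        // Wildcard in the middle: the title under any child of albums
        System.out.println(select("/albums/*/title"));
        // [title=Tool Box, title=Summer Occasion, title=Atlas]
    }
}
```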
XPATH ATTRIBUTES
To select an attribute using XPath, you prefix the attribute’s name with an
@ symbol.
Example 1
Consider the following XML document. Note that the “artist” node now has
an attribute called “status”:
<albums>
<rock>
<title>Tool Box</title>
<artist status="active">Green Velly</artist>
</rock>
<blues>
<title>Summer Occasion</title>
<artist status="active">Marris Mano</artist>
</blues>
<country>
<title>Atlas </title>
<artist status="disbanded">Romi</artist>
</country>
</albums>
If we wanted to select the “status” attribute of the “artist” node under the
“rock” node, we could use the following expression:
albums/rock/artist/@status
Example 2
Attributes, just like any other node, can be the subject of a conditional state-
ment. For example, imagine we’re using XSLT to transform our XML docu-
ment, and we want to select all “artist” nodes where the “status” attribute is
set to “active.” We could use the XSL “if ” element to test the value.
Here’s what we would write:
<xsl:if test="@status = 'active'">
(content goes here)
</xsl:if>
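Attribute selection and attribute tests can also be run outside XSLT. The following is a minimal sketch using the JDK's javax.xml.xpath API on the albums document with the status attribute. Instead of xsl:if, it places the same test in a predicate; the class name AttributeSketch and the helper select() are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: selecting attributes and filtering on their values.
class AttributeSketch {

    // The albums document from Example 1, written on fewer lines.
    static final String XML = "<albums>"
            + "<rock><title>Tool Box</title>"
            + "<artist status=\"active\">Green Velly</artist></rock>"
            + "<blues><title>Summer Occasion</title>"
            + "<artist status=\"active\">Marris Mano</artist></blues>"
            + "<country><title>Atlas</title>"
            + "<artist status=\"disbanded\">Romi</artist></country>"
            + "</albums>";

    // Returns the text of every selected node (for an attribute node, its value).
    static List<String> select(String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // The status attribute of the artist under rock
        System.out.println(select("/albums/rock/artist/@status")); // [active]

        // Every status attribute in the document
        System.out.println(select("//artist/@status")); // [active, active, disbanded]

        // Artists whose status is "active": the xsl:if test moved into a predicate
        System.out.println(select("//artist[@status = 'active']")); // [Green Velly, Marris Mano]
    }
}
```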
XPATH – EXPRESSIONS
XPath can locate any type of information in an XML document with one line
of code. These one liners are referred to as “expressions,” and every piece of
XPath that you write will be an expression.
An XPath expression is exactly that: it’s a line of code that we use to get
information from our XML document.
CHAPTER 14: XLINK, XQUERY, AND XPOINTER
INTRODUCTION TO XQUERY
XQuery is to XML what SQL is to database tables: a language for finding
and extracting elements and attributes from XML documents. XQuery is
designed to query XML data: not just XML files, but anything that can
appear as XML, including databases. XQuery is supported by all the major
database engines (Oracle, IBM, and Microsoft) and is built on XPath
expressions.
XQUERY EXAMPLE
We will use the following XML document in the example below.
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">The Internet</title>
<author>S. Banzal</author>
<year>2019</year>
<price>49.00</price>
</book>
<book category="WEB">
<title lang="en">MYSQL Queries</title>
<author>S. Jain</author>
<author>P. Agrawal</author>
<author>K. Rai</author>
<author>R. Ram</author>
<author>Vivek Banzal</author>
<year>2018</year>
<price>65.00</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2019</year>
<price>79.00</price>
</book>
</bookstore>
Functions
XQuery uses functions to extract the data from XML documents.
The doc() function is basically used to open the “bookdetails.xml” file:
doc("bookdetails.xml")
Path Expressions
XQuery uses path expressions to navigate through elements in the XML docu-
ment. The following path expression is used to select all the title elements in
the “bookdetails.xml” file:
doc("bookdetails.xml")/bookstore/book/title
(/bookstore selects the bookstore element, /book selects all the book elements
under the bookstore element, and /title selects all the title elements under
each book element).
Predicates
XQuery uses predicates to limit the extracted data from the XML documents.
The following predicate is used to select all the book elements under
the bookstore element that have a price element with a value that is less
than 30:
doc("bookdetails.xml")/bookstore/book[price<30]
With FLWOR
Look at the path expression given below:
doc("bookdetails.xml")/bookstore/book[price>30]/title
The expression above will select all the title elements under the book ele-
ments that are under the bookstore element that have a price element with a
value that is higher than 30. The following FLWOR expression selects exactly
the same as the path expression above:
for $x in doc("bookdetails.xml")/bookstore/book
where $x/price>30
return $x/title
The result is
<title lang="en">The Internet</title>
<title lang="en">MYSQL Queries</title>
<title lang="en">Learning XML</title>
The for clause is used to select all book elements under the bookstore
element into a variable called $x.
The where clause selects only the book elements with a price element
with a value greater than 30.
The order by clause defines the sort-order.
The return clause specifies what should be returned. Here, it returns the
title elements.
The result of the XQuery expression above is as follows:
<title lang="en">Learning XML</title>
<title lang="en">MYSQL Queries</title>
<title lang="en">The Internet</title>
The expression above selects all the title elements under the book
elements that are under the bookstore element and returns the title
elements in alphabetical order. Now we want to list all the book titles in
our bookstore element in an HTML list. So we add <ul> and <li> tags to the
FLWOR expression:
<ul>
{
for $x in doc("bookdetails.xml")/bookstore/book/title
order by $x
return
<li>{$x}</li>
}
</ul>
Now we want to eliminate the title element, and show only the data inside
the title elements:
<ul>
{
for $x in doc("bookdetails.xml")/bookstore/book/title
order by $x
return <li>{data($x)}</li>
}
</ul>
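For readers who are more familiar with a general-purpose language, the for, where, order by, and return clauses map loosely onto a loop, a filter, a sort, and an output expression. The following is a rough Python analogy, not an XQuery implementation. It assumes xml.etree.ElementTree and a hypothetical three-book sample whose prices are chosen only to exercise the filter; the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the prices are assumptions made for this sketch.
SAMPLE = """
<bookstore>
  <book><title lang="en">The Internet</title><price>25.00</price></book>
  <book><title lang="en">MYSQL Queries</title><price>49.99</price></book>
  <book><title lang="en">Learning XML</title><price>39.95</price></book>
</bookstore>
"""

def titles_over(xml_text, limit):
    root = ET.fromstring(xml_text)
    books = root.findall("book")                      # for $x in .../bookstore/book
    kept = [b for b in books
            if float(b.findtext("price")) > limit]    # where $x/price > limit
    kept.sort(key=lambda b: b.findtext("title"))      # order by $x/title
    return [b.findtext("title") for b in kept]        # return data($x/title)

def as_html_list(items):
    """Wrap each item in <li> inside a <ul>, like the last query above."""
    ul = ET.Element("ul")
    for item in items:
        ET.SubElement(ul, "li").text = item
    return ET.tostring(ul, encoding="unicode")

result = titles_over(SAMPLE, 30)
html = as_html_list(result)
print(html)
```

The sketch keeps the filter and the sort as separate steps so that each line can be matched to one FLWOR clause.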
XQuery Terms
In XQuery, there are seven kinds of nodes: element, attribute, text,
namespace, processing-instruction, comment, and document nodes.
Nodes
XML documents are treated as trees of nodes. The root of the tree is called the
document node or root node.
Look at the XML document given below:
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
<book>
<title lang="en">The Internet</title>
<author>S. Banzal</author>
<year>2011</year>
<price>200.00</price>
</book>
</bookstore>
Atomic Values
Atomic values are single values that have no parent or children. Examples of
atomic values are as follows:
S. Banzal
"en"
Items
An item is either an atomic value or a node.
Children
Element nodes may have zero, one, or more children. In the following
example, the title, author, year, and price elements are all children of the book
element:
<book>
<title>The Internet</title>
<author>S. Banzal</author>
<year>2011</year>
<price>200.00</price>
</book>
Siblings
Nodes that have the same parent are called siblings. In the example given
below, the title, author, year, and price elements are all siblings:
<book>
<title>The Internet</title>
<author>S. Banzal</author>
<year>2011</year>
<price>200.00</price>
</book>
Ancestors
A node’s parent, its parent’s parent, and so on are called ancestors. In the
example that follows, the ancestors of the title element are the book element
and the bookstore element:
<bookstore>
<book>
<title>The Internet</title>
<author>S. Banzal</author>
<year>2011</year>
<price>200.00</price>
</book>
</bookstore>
Descendants
A node’s children, its children’s children, and so on are called descendants.
In the example given below, the descendants of the bookstore element are the
book, title, author, year, and price elements:
<bookstore>
<book>
<title>The Internet</title>
<author>S. Banzal</author>
<year>2011</year>
<price>200.00</price>
</book>
</bookstore>
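The four relationships can also be checked programmatically. The following is a small sketch in Python, assuming the standard library's xml.etree.ElementTree module and the bookstore document used in this section. ElementTree does not store parent links, so the sketch builds its own child-to-parent map; the names parent_of and ancestors are illustrative only.

```python
import xml.etree.ElementTree as ET

DOC = """
<bookstore>
  <book>
    <title lang="en">The Internet</title>
    <author>S. Banzal</author>
    <year>2011</year>
    <price>200.00</price>
  </book>
</bookstore>
"""

root = ET.fromstring(DOC)          # the <bookstore> element
book = root.find("book")
title = book.find("title")

# Children: the elements directly inside <book>.
children = [child.tag for child in book]

# Descendants: every element below <bookstore>, at any depth.
descendants = [el.tag for el in root.iter() if el is not root]

# ElementTree keeps no parent pointers, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(element):
    """Walk up the parent map and collect the tags of all ancestors."""
    found = []
    while element in parent_of:
        element = parent_of[element]
        found.append(element.tag)
    return found

# Siblings: the other children of the same parent.
siblings = [el.tag for el in parent_of[title] if el is not title]

print(children, descendants, ancestors(title), siblings)
```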
XQUERY SYNTAX
XQuery is case-sensitive and XQuery elements, attributes, and variables must
have valid XML names.
XQuery Comparisons
There are two ways of comparing values in XQuery.
1. General comparisons: =, !=, >, >=, <, <=
2. Value comparisons: eq, ne, gt, ge, lt, le
The differences between the two comparison methods are given below.
$bookstore//book/@q>10
The expression above returns true if any q attributes have values greater
than 10.
$bookstore//book/@q gt 10
The expression above returns true if there is only one q attribute returned
by the expression, and its value is greater than 10. If more than one q is
returned, an error occurs.
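The difference between the two kinds of comparison can be modeled in a few lines. The following Python sketch is only an analogy under stated assumptions: a sequence of q values is represented as a list of numbers, the function names general_gt and value_gt are hypothetical, and the empty-sequence case returns None to stand in for XQuery's empty sequence.

```python
def general_gt(values, limit):
    """Model of `values > limit`: true if ANY value in the sequence qualifies."""
    return any(v > limit for v in values)

def value_gt(values, limit):
    """Model of `values gt limit`: compares a single value only."""
    if len(values) > 1:
        # A value comparison on more than one item is an error in XQuery.
        raise TypeError("gt expects a single value, got %d" % len(values))
    if not values:
        return None  # stands in for the empty sequence
    return values[0] > limit

q_attributes = [5, 12, 8]  # hypothetical q values from three book elements

print(general_gt(q_attributes, 10))  # one value exceeds 10, so this is True
print(value_gt([12], 10))            # a single value can be compared with gt
```

Calling value_gt(q_attributes, 10) raises an error, which mirrors the behavior described above when more than one q attribute is returned.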
The XQuery expression above will include both the title elements and the
lang attribute in the result:
<title lang="en">Pizza & Pasta</title>
<title lang="en">The Internet</title>
<title lang="en">Learning XML</title>
<title lang="en">MYSQL Queries</title>
The XQuery expression above returns the title elements exactly as
they appear in the input document. We now want to add our
own element and attribute to the result.
}
</ul>
</body>
</html>
Result
<test>1</test>
<test>2</test>
<test>3</test>
<test>4</test>
<test>5</test>
Result
<book>1. Pizza & Pasta</book>
<book>2. The Internet</book>
<book>3. MYSQL Queries</book>
<book>4. Learning XML</book>
You can use more than one in expression in the for clause. Use a comma
to separate the parts of the expression:
for $x in (10,20), $y in (100,200)
return <test>x={$x} and y={$y}</test>
Result
<test>x=10 and y=100</test>
<test>x=10 and y=200</test>
<test>x=20 and y=100</test>
<test>x=20 and y=200</test>
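A for clause with two in expressions behaves like two nested loops, with the second variable changing fastest. The following Python sketch shows the same pairing with itertools.product from the standard library; the function name cross is an assumption made for this illustration.

```python
from itertools import product

def cross(xs, ys):
    """Pair every x with every y, in the order nested for loops would."""
    # product() varies the last sequence fastest, like the inner loop.
    return ["<test>x={} and y={}</test>".format(x, y)
            for x, y in product(xs, ys)]

lines = cross((10, 20), (100, 200))
for line in lines:
    print(line)
```",
    "<test>x=10 and y=200</test>",
    "<test>x=20 and y=100</test>",
    "<test>x=20 and y=200</test>",
]
</test>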
Result
<test>1 2 3 4 5</test>
Result
<title lang="en">The Internet</title>
<title lang="en">Pizza & Pasta</title>
<title lang="en">Learning XML</title>
<title lang="en">MYSQL Queries</title>
Result
<title lang="en">Pizza & Pasta</title>
<title lang="en">The Internet</title>
<title lang="en">MYSQL Queries</title>
<title lang="en">Learning XML</title>
XQUERY FUNCTIONS
XQuery includes over 100 built-in functions. There are functions for string
values, numeric values, date and time comparison, node and QName manipu-
lation, sequence manipulation, Boolean values, and many more. You can also
define your own function in XQuery.
Syntax
declare function prefix:function_name($parameter as datatype)
as returnDatatype
{
(: ...function code here... :)
};
XSLT
To get access to the XLink attributes and features, we must declare the
XLink namespace at the top of the document. The XLink namespace is
“http://www.w3.org/1999/xlink”.
The xlink: prefix on the type and href attributes of the <homepage> elements
indicates that these attributes come from the XLink namespace.
XML links can also be embedded: the content of a target resource can be
inserted in place of the link in
a source document. Granted, images are handled much like this in HTML
already, but XML links offer the possibility of embedding virtually any kind
of data in a document, not just an external image. Traversing embedded links
in this manner ultimately results in compound documents that are built out
of other resources, which has some interesting implications for the Web. For
example, you could build a news Web page out of paragraphs of text that are
dynamically pulled from other documents around the Web via links.
Speaking of link traversal, HTML links are limited in that the user must
trigger their traversal. For example, the only way to invoke a link on a Web
page is to click the linked text or image, as shown in Figure 14.2.
In order to traverse an HTML link, the user must click on linked text or a
linked image, which points to another document or resource.
XML links are flexible enough to allow you to construct compound documents
by pulling content together from other documents.
[Figure labels: Web Browser, Document A, Contents of Document B, Document B]
XML links, which are made possible by the XLink technology, are much
more abstract than HTML links, and therefore can be used to serve more
purposes than just providing users a way of moving from one Web page to
the next.
Yet another facet of XLink is its support for creating links that reside out-
side of the documents they link. In other words, you can create a link in one
document that connects two resources contained in other documents (see
Figure 14.4). This can be particularly useful when you don’t have the capabil-
ity of editing the source and target documents. These kinds of links are known
as out-of-line links and will probably foster the creation of link repositories. A
link repository is a database of links that describe useful connections between
resources on the Web.
XML links allow you to do interesting things, such as referencing multiple
documents from a link within another document.
One example of a link repository that could be built using XLink is an
intricately cross-referenced legal database, where court cases are linked in
such a way that a researcher in a law office could quickly find and verify prec-
edents and track similar cases. Though it’s certainly possible to create such a
database and incorporate it into HTML Web pages, it is cumbersome. XLink
provides the exact feature set to make link repositories a practical reality.
inline links. Out-of-line links are useful for linking information in documents
that you can’t modify for one reason or another. For example, if you wanted
to create a link between two resources that reside on other Web sites, you’d
use an out-of-line link. Such a link is possible because out-of-line links are
geared toward opening up interesting new opportunities for how links are
used to connect documents. More specifically, it would be possible to create
link databases that describe relationships between information spread across
the Web.
Out-of-line links partially form the concept of extended links in XML.
Extended links are basically any links that extend the linking functionality
of HTML. Out-of-line links obviously are considered extended links because
HTML doesn’t support any type of out-of-line linking mechanism. Extended
links also support the association of more than one target resource with a
given link. With extended links, you could build a table of contents for a Web
site that consists solely of extended links that point to the various pages in the
site. If the links were gathered in a single document separate from the table
of contents page itself, they would also be considered out-of-line links.
XLINK EXAMPLE
Let’s try to learn some basic XLink syntax by looking at an example.
In the example above, the XLink namespace is declared at the top of the
document (xmlns:xlink="http://www.w3.org/1999/xlink"). This means that
the document has access to the XLink attributes and features.
The xlink:type="simple" attribute creates a simple “HTML-like” link. You can also
specify more complex links (multidirectional links), but for now, we will only
use simple links.
The xlink:href attribute specifies the URL to link to, and the xlink:show
attribute specifies where to open the link. xlink:show="new" means that the
link (in this case, an image) should open in a new window.
In the example above, we only demonstrated simple links. XLink is more
interesting when we want to access remote locations as resources, instead of
standalone pages. The <description> element in the example above sets the
value of the xlink:show attribute to “new.” This means that the link should open
in a new window. We could have set the value of the xlink:show attribute to
“embed.” This means that the resource should be processed inline within the
page. When you consider that this could be another XML document and not
just an image, you could, for example, build a hierarchy of XML documents.
With XLink, you can also specify when the resource should appear. This
is handled by the xlink:actuate attribute. xlink:actuate="onLoad" specifies that
the resource should be loaded and shown when the document loads.
However, xlink:actuate="onRequest" means that the resource is not read or shown
before the link is clicked. This is very handy for low-bandwidth settings.
This example is the simplest possible link you can create using XLink, and
it actually carries out the same functionality as an HTML anchor link, which is
known as a simple link in XML. Notice in the code that the XLink namespace
is declared and assigned to the xlink prefix, which is then used to reference
the href attribute; this is the standard approach used to access all of the XLink
attributes. What you may not realize is that this link takes advantage of some
default attribute values. The following is another way to express the exact
same link by spelling out all of the pertinent XLink attribute values:
<employees xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="simple"
xlink:href="employees.xml"
xlink:show="replace"
xlink:actuate="onRequest"
xlink:role="employees"
xlink:title="Employee List">
Current Employees
</employees>
In this code, you can more clearly see how the XLink attributes are
specified in order to fully describe the link. The type attribute is set to simple,
which indicates that this is a simple link. The show attribute has the value
replace, which indicates that the target resource is to replace the current
document when the link is traversed. The actuate attribute has the value onRequest,
which indicates that the link must be activated by the user for traversal to take
place. And finally, the role and title attributes are set to indicate the meaning
of the link and its name.
The previous example demonstrated how to create a link that imitates the
familiar HTML anchor link. You can dramatically change a simple link just by
altering the manner in which it is shown and activated. For example, take a
look at the following link:
<resume xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="simple"
xlink:href="resume_e1.xml"
xlink:show="embed"
xlink:actuate="onLoad"
xlink:role="employee1 resume"
xlink:title="Employee 1 Resume"/>
This code shows how to effectively embed another XML document into
the current document at the position where the link is located. This is
accomplished by simply setting the show attribute to embed and the actuate attribute
to onLoad. When a Web browser or XML application encounters this link, it will
automatically load the resume_e1.xml document and insert it into the current
document in place of the link. When you think about it, the img element in
HTML works very much like this link except that it is geared solely toward
images; the link in this example can be used with any kind of XML content.
XPointer impacts links through the href attribute, which is where you
specify the location of a source or target resource for a link. All of the flexibil-
ity afforded by XPointer in specifying document parts can be realized in the
href attribute of any link.
Although simple links such as the previous example are certainly import-
ant, they barely scratch the surface in terms of what XLink is really capable
of doing. Links get much more interesting when you venture into extended
links. A powerful use of extended links is the linkset, which allows you to link
to a set of target resources via a single source resource. For example, you
could use an extended link to establish a link to each individual employee in
a company. To create an extended link, you must create child elements of
the linking element that are set to type locator; these elements are where you
set each individual target resource via the href attribute. The following is an
example of an extended link, which should help clarify how they work:
<employees xmlns:xlink="http://www.w3.org/1999/xlink"
xlink:type="extended"
xlink:role="employees"
xlink:title="Employee List"
xlink:show="replace"
xlink:actuate="onRequest">
<employee xlink:type="locator" xlink:href="employee1.xml">
Frank Rizzo
</employee>
<employee xlink:type="locator" xlink:href="employee2.xml">
Sol Rosenberg
</employee>
<employee xlink:type="locator" xlink:href="employee3.xml">
Jack Tors
</employee>
</employees>
This example creates an extended link out of the employees element, but
the most interesting thing about the link is that it has multiple target resources
that are identified in the child employee elements. This is evident from the fact
that each of the employee elements has an href attribute that is set to its
respective target resource.
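To see how an application might read such a linkset, consider the following sketch. It is written in Python with the standard library's xml.etree.ElementTree module, which is an assumption of this illustration and not part of XLink itself. The function name locators is hypothetical, and the document is an abbreviated copy of the extended link shown above.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

DOC = """
<employees xmlns:xlink="http://www.w3.org/1999/xlink"
           xlink:type="extended" xlink:title="Employee List">
  <employee xlink:type="locator" xlink:href="employee1.xml">Frank Rizzo</employee>
  <employee xlink:type="locator" xlink:href="employee2.xml">Sol Rosenberg</employee>
  <employee xlink:type="locator" xlink:href="employee3.xml">Jack Tors</employee>
</employees>
"""

def locators(xml_text):
    """Return (label, href) pairs for every locator child of an extended link."""
    root = ET.fromstring(xml_text)
    # ElementTree spells namespaced attribute names as {namespace-uri}local-name.
    if root.get("{%s}type" % XLINK) != "extended":
        return []
    return [(child.text.strip(), child.get("{%s}href" % XLINK))
            for child in root
            if child.get("{%s}type" % XLINK) == "locator"]

targets = locators(DOC)
for label, href in targets:
    print(label, "->", href)
```

Because the attributes are looked up by namespace URI rather than by the xlink prefix, the sketch would work equally well if the document used a different prefix for the XLink namespace.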
XPOINTER SYNTAX
In HTML, we can create a hyperlink that either points to an HTML page or
to a bookmark inside an HTML page (using #).
Sometimes it is more useful to point to more specific content. For exam-
ple, let’s say that we want to link to the third item in a particular list, or to the
second sentence of the fifth paragraph. This is easy with XPointer.
CREATING XPOINTERS
Seeing a few examples of XPointer expressions can make all the difference in
understanding how XPointer is used to define document fragment identifiers.
The following is an example of a simple XPointer expression:
child::factoid
This example uses the child relative location path to locate all of the chil-
dren of the context node that are of element type factoid. Let me rephrase it
in a different way: The sample expression locates element nodes of type fac-
toid that are child nodes of the context node. Keep in mind that the context
node is the node from which you are issuing the expression, which is like the
current path of a file system when you’re browsing for files. Also, it’s worth
clarifying that the XPointer expression child::factoid simply describes the
fragment identifier for a resource and is not a complete resource reference.
When used in a complete expression, you would pair this fragment identifier
with a URI that is assigned to an href attribute, like this:
href="http://www.stalefishlabs.com/factoids.xml#child::factoid"
This example first locates all child elements that are of type factoid and
then finds the second siblings following each of those element nodes that
are of type legend. To understand how this code works, let’s break it down.
You begin with the familiar child::factoid expression, which locates element
nodes of type factoid that are child nodes of the context node. Adding on
the following-sibling::legend location path causes the expression to locate
sibling elements of type legend. Granted, this may seem like a strange use of
XPointer, but keep in mind that it is designed as an all-purpose language for
addressing the internal structure of XML documents. It’s impossible to say
how different applications might want to address document parts, which is
why XPointer is so flexible.
In addition to location paths, XPointer defines several functions that
perform different tasks within XPointer expressions. One class of functions
is node test functions, which are used to determine the type of a node. Of
course, you can use the name of an element to check if a node is of a certain
element type, but the node test functions allow you to check whether a node
is a comment, text, or processing instruction. The following is an
example of how to use one of these functions:
/child::processing-instruction()
XPOINTER EXAMPLE
In this example, we will show you how to use XPointer in conjunction with
XLink to point to a specific part of another document.
page, add a number sign (#) and an XPointer expression after the URL in the
xlink:href attributes.
The expression #xpointer(id("Poodle")) refers to the element in the target
document with the id value of “Poodle.” So the xlink:href attribute would
look like this:
xlink:href="http://dog.com/dogbreeds.xml#xpointer(id('Poodle'))"
The following XML document refers to information of the dog breed for
each of my dogs :-), all through XLink and XPointer references:
<?xml version="1.0" encoding="ISO-8859-1"?>
<mydogs xmlns:xlink="http://www.w3.org/1999/xlink">
<mydog xlink:type="simple"
xlink:href="http://dog.com/dogbreeds.xml#Poodle">
<description xlink:type="simple"
xlink:href="http://myweb.com/mydogs/anton.gif">
Anton is my favorite dog. He has won a lot of.....
</description>
</mydog>
<mydog xlink:type="simple"
xlink:href="http://dog.com/dogbreeds.xml#Boxer">
<description xlink:type="simple"
xlink:href="http://myweb.com/mydogs/pluto.gif">
Pluto is the sweetest dog on earth......
</description>
</mydog>
</mydogs>
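The following sketch suggests how an application might resolve the shorthand form used in the document above (#Poodle). It is written in Python with the standard library and makes several assumptions: the target document is an abbreviated, hypothetical copy of dogbreeds.xml, the attribute named id is treated as the identifier without consulting a DTD, and the function name resolve is illustrative. It does not handle full xpointer() expressions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Hypothetical, abbreviated copy of the target document at dog.com.
DOGBREEDS = """
<dogbreeds>
  <dog breed="Poodle" id="Poodle"><temperament>Confident, bold, alert</temperament></dog>
  <dog breed="Boxer" id="Boxer"><temperament>Sweet, exuberant, lively</temperament></dog>
</dogbreeds>
"""

def resolve(href, target_xml):
    """Split href into document URL and fragment, then find the element
    whose id attribute equals the fragment."""
    url, fragment = urldefrag(href)
    root = ET.fromstring(target_xml)
    for element in root.iter():
        if element.get("id") == fragment:
            return url, element
    return url, None

url, dog = resolve("http://dog.com/dogbreeds.xml#Poodle", DOGBREEDS)
print(url, dog.get("breed"))
```

The part before the number sign identifies the document to fetch, and the part after it identifies the element inside that document.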
XPOINTER EXAMPLE
In this example, we will show you how to use XPointer in conjunction
with XLink to point to a specific part of another document.
Confident, bold, alert and imposing, the poodle is a popular choice for its
ability to protect.
</temperament>
</dog>
<dog breed="Boxer" id="Boxer">
<picture url="http://dog.com/Boxer.gif" />
<history>
One of the earliest uses of retrieving dogs was to help fishermen retrieve
fish from the water.
</history>
<temperament>
The flat-coated retriever is a sweet, exuberant, lively dog that loves to play
and retrieve...
</temperament>
</dog>
</dogbreeds>
15
XFORMS
INTRODUCTION TO XFORMS
XForms is the next generation of HTML forms. It uses XML to create
input forms on the Web, and it is richer and more flexible than HTML forms.
XForms will be the forms standard in XHTML 2.0. XForms is platform- and
device-independent, and it separates data and logic from presentation. XForms
uses XML to define form data, and it stores and transports data in XML
documents. It contains features like calculations and validations of forms, so
XForms reduces or eliminates the need for scripting.
Like XHTML, SVG, and RSS, XForms is an XML-based language
written with tags that are identified by their surrounding angle brackets
(XML purists prefer to call these elements). Learning XForms is
largely a matter of understanding what the individual elements do and
how they interrelate. XForms provides more elements for forms than
authors might be accustomed to. As a result, several tasks that would have
otherwise required complicated scripting can be accomplished declaratively,
just by putting the right elements in place.
FEATURES OF XFORMS
Today, ten years after HTML forms became a part of the HTML stan-
dard, Web users do complex transactions that are starting to exceed the lim-
itations of standard HTML forms.
XForms provides a richer, more secure, and device-independent way of
handling Web input. We should expect future Web solutions to demand the
use of XForms-enabled browsers (all future browsers should support XForms).
●● XForms separates data from presentation
XForms uses XML for data definition and HTML or XHTML for data
display. XForms separates the data logic of a form from its presentation. This
way, the XForms data can be defined independently of how the end-user will
interact with the application.
●● XForms uses XML to define form data
With XForms, the rules for describing and validating data are expressed
in XML.
●● XForms uses XML to store and transport data
With XForms, the data displayed in a form is stored in an XML
document, and the data submitted from the form is transported over the Internet
using XML.
The data content is coded in, and transported as, Unicode bytes.
●● XForms is device independent
Separating the data from the presentation makes XForms device-
independent because the data model can be used for all devices. The pre-
sentation can be customized for different user interfaces, like mobile phones,
handheld devices, and Braille readers for the blind.
Since XForms is device independent and based on XML, it is also possible
to add XForms elements directly into other XML applications like VoiceXML
(speaking Web data), WML (Wireless Markup Language), and SVG (Scalable
Vector Graphics).
PARTS OF XFORMS
Structurally, a form can be thought of as having two parts: a specification
of what it must do and a specification of how it must look. In XForms,
these two parts are called the XForms Model and the XForms User Interface,
respectively.
After collecting the data, the XML document might look like this:
<person>
<fname>Kshitij</fname>
<lname>Banzal</lname>
</person>
In the example above, the two <input> elements define two input fields.
The ref=“fname” and ref=“lname” attributes point to the <fname> and
<lname> elements in the XForms model.
The <submit> element has a submission=“form1” attribute, which refers
to the <submission> element in the XForms model. A submit element is usu-
ally displayed as a button.
Notice the <label> elements in the example. With XForms, every input
control element has a required <label> element.
XForms is not designed to work alone. There is no such thing as an
XForms document; XForms needs a container, which means that it has to run
inside another XML document. It could run inside XHTML 1.0, and it will run
inside XHTML 2.0.
If we put it all together, the document will look like this:
<xforms>
<model>
<instance>
<person>
<fname/>
<lname/>
</person>
</instance>
<submission id="form1" action="submit.asp" method="get"/>
</model>
<input ref="fname"><label>First Name</label></input>
<input ref="lname"><label>Last Name</label></input>
<submit submission="form1"><label>Submit</label></submit>
</xforms>
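To make the role of the ref attributes more concrete, the following sketch imitates what an XForms processor does when the user fills in the form: each entered value is copied into the instance node that the control's ref points to. It is a simplified model written in Python with the standard library's xml.etree.ElementTree module; the function name collect and the dictionary of entered values are assumptions made for this illustration, and no real XForms processor is involved.

```python
import xml.etree.ElementTree as ET

MODEL = """
<model>
  <instance>
    <person>
      <fname/>
      <lname/>
    </person>
  </instance>
  <submission id="form1" action="submit.asp" method="get"/>
</model>
"""

def collect(model_xml, entered):
    """Copy each entered value into the instance node named by its ref."""
    model = ET.fromstring(model_xml)
    person = model.find("instance/person")
    for ref, value in entered.items():
        node = person.find(ref)  # ref="fname" is resolved against <person>
        if node is None:
            raise KeyError("no instance node for ref %r" % ref)
        node.text = value
    return person

person = collect(MODEL, {"fname": "Kshitij", "lname": "Banzal"})
print(ET.tostring(person, encoding="unicode"))
```

After the values are collected, the person element holds the same data as the filled-in XML document shown earlier in this section.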
Most often, the input control will display as an input field, like this:
First Name: ----------
Most often the secret control will display as an input field like this:
Password: ∗ ∗ ∗ ∗ ∗ ∗
Example
<model>
<instance>
<person>
<fname>Hege</fname>
<lname>Refsnes</lname>
</person>
</instance>
</model>
<output ref="fname"/>
<output ref="lname"/>
Every form control has a required label child (except output, where it’s
optional). This enforces the good design habit of always associating a label
with a form control. Other common child elements are help for a message at
the user’s request, hint for a message at the user agent’s request, and alert,
which is available for error messages.
If you want to use XForms in HTML (or XHTML 1.0), you should declare
all XForms elements with an XForms namespace. XForms is expected to be
a standard part of XHTML 2.0, eliminating the need for the XForms name-
space. This example uses the XForms namespace:
<html xmlns:xf="http://www.w3.org/2002/xforms">
<head>
<xf:model>
<xf:instance>
<person>
<fname/>
<lname/>
</person>
</xf:instance>
<xf:submission id="form1" method="get" action="submit.asp"/>
</xf:model>
</head>
<body>
<xf:input ref="fname"><xf:label>First Name</xf:label></xf:input><br />
<xf:input ref="lname"><xf:label>Last Name</xf:label></xf:input><br /><br />
<xf:submit submission="form1"><xf:label>Submit</xf:label></xf:submit>
</body>
</html>
In the example above, we have used the xf: prefix for the XForms name-
space, but you are free to call the prefix anything you want.
XForms Example
You can test XForms with Internet Explorer (XForms will not work in IE
prior to version 5).
<xforms>
<model>
<instance>
<person>
<fname/>
<lname/>
</person>
</instance>
<submission id="form1" method="get"
action="submit.asp"/>
</model>
<input ref="fname">
<label>First Name</label>
</input>
<input ref="lname">
<label>Last Name</label>
</input>
<submit submission="form1">
<label>Submit</label>
</submit>
</xforms>
The XForms user interface can bind <input> elements to the instance data using the ref attribute:
<input ref="name/fname"><label>First Name</label></input>
<input ref="name/lname"><label>Last Name</label></input>
In these ref attributes, the slash (/) separates the steps of the XPath
expression. A slash at the beginning of an XPath expression indicates the root
of the XML document.
Binding Using Bind
With an XForms model instance like this:
<model>
<instance>
<person>
<name>
<fname/>
<lname/>
</name>
</person>
</instance>
<bind nodeset="/person/name/fname" id="firstname"/>
<bind nodeset="/person/name/lname" id="lastname"/>
</model>
the XForms user interface can bind <input> elements using the bind attribute:
<input bind="firstname"><label>First Name</label></input>
<input bind="lastname"><label>Last Name</label></input>
When you start using XForms in complex applications, you will find bind-
ing using bind to be a more flexible way to deal with multiple forms and mul-
tiple instance models.
XFORMS PROPERTIES
XForms uses properties to define data restrictions, types, and behaviors.
Examples
A required="true()" property means that the input field is required (it
cannot be empty on submit). A type="decimal" property will only allow a decimal
value to be submitted. A calculate property can calculate a value.
Bind Properties to Data
XForms uses the bind element to bind XForms properties to XForms
data:
<model>
<instance>
<person>
<fname/>
<lname/>
</person>
</instance>
<bind nodeset="/person/lname" required="true()"/>
</model>
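The effect of a required property can be imitated with a short check that runs over the bind elements of a model. The following Python sketch uses the standard library's xml.etree.ElementTree module and is only an approximation of what an XForms processor does: the function name missing_required is hypothetical, only the required="true()" case is handled, and the nodeset path is evaluated below the instance element after any leading slash is dropped.

```python
import xml.etree.ElementTree as ET

MODEL = """
<model>
  <instance>
    <person>
      <fname>Kshitij</fname>
      <lname/>
    </person>
  </instance>
  <bind nodeset="/person/lname" required="true()"/>
</model>
"""

def missing_required(model_xml):
    """List the nodeset paths of required nodes that are still empty."""
    model = ET.fromstring(model_xml)
    instance = model.find("instance")
    missing = []
    for bind in model.findall("bind"):
        if bind.get("required") != "true()":
            continue
        # Drop any leading slash so the path can be evaluated below <instance>.
        path = bind.get("nodeset").lstrip("/")
        for node in instance.findall(path):
            if not (node.text or "").strip():
                missing.append(path)
    return missing

problems = missing_required(MODEL)
print(problems)
```

In this sample the lname element is empty, so it is reported; a real processor would block submission and could show the alert message of the bound control instead.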
Name       Description
p3ptype    Defines a P3P data type for the item
readonly   Defines an edit restriction for the item (cannot be changed)
relevant   Defines how relevant the data is (for display or submission)
required   Defines that a data item is required (cannot be blank)
type       Defines the data type for the item
XFORMS ACTIONS
XForms actions handle responses to events.
The message Element
The XForms message element defines a message to be displayed in the
XForms user interface. Look at this simplified example:
<input ref="fname">
<label>First Name</label>
<message level="ephemeral" event="DOMFocusIn">
Input Your First Name
</message>
</input>
In the example above, the message “Input Your First Name” should be
displayed as a tool tip when the input field gets focus.
The event="DOMFocusIn" defines the event to trigger the message.
The level="ephemeral" defines the message to be displayed as a tool tip.
Other values for the level attribute are modal and modeless, defining dif-
ferent types of message boxes.
The setvalue Element
The XForms setvalue element defines a value to be set in response to an
event. Look at this simplified example:
<input ref="size">
<label>Size</label>
<setvalue value="50" event="xforms-ready"/>
</input>
In the example above, the value 50 will be stored in the “size” input field
when the form opens.
16
XSL-FO
INTRODUCTION TO XSL-FO
XSL-FO is a language for formatting XML data for output. XSL-FO stands
for Extensible Stylesheet Language Formatting Objects, and it is based on
XML. XSL-FO describes the formatting of XML data for output to screen,
paper, or other media. XSL-FO is now formally named XSL.
XSL-FO is an XML-based markup language that describes the format-
ting of XML data for output to the screen. After several years of develop-
ment, Extensible Stylesheet Language (XSL) Version 1.0 became a W3C
Recommendation on October 15, 2001. It enhances the flexibility of the XML
(Extensible Markup Language) standard. XSL draws on earlier specifications,
including CSS and DSSSL.
XSL-FO is an XML language designed for describing all visual aspects of
paginated documents. HTML is another language for specifying formatting
semantics, but is more for documents that are presented on screen and less for
materials created for printing because it does not support pagination elements
like headers and footers, page size specifications, and footnotes. XSL-FO is
part of the XSL language family:
●● XSLT: (XSL Transformations) a language for transforming XML
●● XSL-FO: (XSL Formatting Objects) a language that can be used in XSLT
for the purpose of “presenting” the XML
●● XPath: a syntax for addressing parts of a document, which is
also significant in XPointer and in the emerging XQuery, an XML query
language
[Figure labels: XML Data, XSL Template, XSL Transformation, XSL-FO Document, XF Rendering Server]
As you can see, the XML data is transformed together with the XSL
stylesheet to produce an XSL-FO document, and the document is then con-
verted to PDF.
XSL-FO DOCUMENTS
XSL-FO documents are XML files with output information. XSL-FO docu-
ments are stored in files with a .fo or a .fob file extension. You can also store
XSL-FO documents with an .xml extension (to make them more accessible to
XML editors).
XSL-FO documents contain two required sections. The first section
details a list of named page layouts. The second section is a list of document
data, with markup, that uses the various page layouts to determine how the
content fills the various document pages.
The properties of the page are defined by the page layout. Page layouts can define
the directions for the flow of text, so as to match the conventions for the
language in question. They define the size of a page as well as the margins of that
page. Most importantly, they can define sequences of pages that allow for
effects where the odd and even pages look different. For example, one can define
a page layout sequence that gives extra space to the inner margins for printing
purposes; this allows more space to be given to the margin where the book
will be bound.
The document data portion is broken up into a sequence of flows, where
each flow is attached to a page layout. Each flow contains a list of blocks, and
each block contains text data, inline markup elements, or a combination of
the two. Content may also be added to the margins of the document, for page
numbers, chapter headings, and the like.
Blocks and inline elements function in much the same way as in CSS, though
some of the rules for padding and margins differ between CSS and FO. The
direction, relative to the page orientation, for the progression of inlines and
blocks can be fully specified, thus allowing FO documents to function under
languages that are read differently from English. The language of the FO
specification, unlike that of CSS 2.1, uses direction-neutral terms like start and end
rather than left and right when describing these directions.
Comparisons are often made between XSL-FO and CSS, and for the most
part they are valid. One critical distinction between the two technologies is
that CSS styles are always attached to an existing document tree, whereas
XSL-FO establishes its own document structure. In other words, you apply
CSS styles to XML data, whereas XSL-FO represents a complete merger of
data and styles. In practice, XML data is typically still maintained separately
from its XSLT stylesheet, which is then used to combine the data and XSL-FO
styles into a complete XSL-FO document.
XSL-FO’s basic content markup is derived from CSS and its cascading
rules. Many attributes in XSL-FO propagate into the child elements unless
explicitly overridden.
Explanation
XSL-FO documents are XML documents and must always start with an XML
declaration:
<?xml version="1.0" encoding="ISO-8859-1"?>
Example: 1
<?xml version="1.0" encoding="ISO-8859-1"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:layout-master-set>
<fo:simple-page-master master-name="A4">
<!-- Page template goes here -->
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="A4">
<!-- Page content goes here -->
</fo:page-sequence>
</fo:root>
Example: 2
<?xml version="1.0" encoding="iso-8859-1"?>➊
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">➋
<fo:layout-master-set>➌
<fo:simple-page-master master-name="my-page">
<fo:region-body margin="1in"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="my-page">➍
<fo:flow flow-name="xsl-region-body">➎
<fo:block>Hello, world!</fo:block>➏
</fo:flow>
</fo:page-sequence>
</fo:root>
Explanation
Font attributes:
<fo:inline color="red">colored</fo:inline>,
<fo:inline font-weight="bold">bold</fo:inline>,
<fo:inline font-style="italic">italic</fo:inline>,
<fo:inline font-size="75%">small</fo:inline>,
<fo:inline font-size="133%">large</fo:inline>.
</fo:block>
<fo:block>
Text attributes: ➌
<fo:inline text-decoration="underline">underlined</fo:inline>,
<fo:inline letter-spacing="3pt"> expanded </fo:inline>,
<fo:inline word-spacing="6pt">
text with extra spacing between words
</fo:inline>,
<fo:inline text-transform="uppercase">all capitals</fo:inline>,
<fo:inline text-transform="capitalize">capitalized</fo:inline>,
text with <fo:inline baseline-shift="sub"
font-size="smaller">subscripts</fo:inline>
and <fo:inline baseline-shift="super"
font-size="smaller">superscripts</fo:inline>.
</fo:block>
</fo:flow>
</fo:page-sequence>
</fo:root>
XSL-FO AREAS
The XSL formatting model defines a number of rectangular areas (boxes) to
display output. All output (text, pictures, etc.) will be formatted into these
boxes and then displayed or printed to a target media. We now take a closer
look at the following areas:
●● Pages
●● Regions
●● Block areas
●● Line areas
●● Inline areas
XSL-FO Pages
XSL-FO output is formatted into pages. Printed output normally goes into
many separate pages. Browser output often goes into one long page.
XSL-FO Pages contain Regions.
XSL-FO Regions
Each XSL-FO Page contains a number of regions:
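●● region-body (the body of the page)
●● region-before (the header of the page)
●● region-after (the footer of the page)
●● region-start (the sidebar at the start edge, usually the left)
●● region-end (the sidebar at the end edge, usually the right)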
XSL-FO OUTPUT
XSL-FO defines output inside <fo:page-sequence> elements.
XSL-FO Flow
XSL-FO pages are filled with content from <fo:flow> elements. The <fo:flow>
element contains all the elements to be printed to the page. When the page is
full, the same page master is used again (and again) until all the text
is printed.
The <fo:flow> element has a “flow-name” attribute. The value of the
flow-name attribute defines where the content of the <fo:flow> element will
go. The legal values are
●● xsl-region-body (into the region-body)
●● xsl-region-before (into the region-before)
●● xsl-region-after (into the region-after)
●● xsl-region-start (into the region-start)
●● xsl-region-end (into the region-end)
PAGE LAYOUT
XSL-FO Page Sequences
XSL-FO uses <fo:page-sequence> elements to define output pages. Each page
sequence refers to a page master, which defines the layout, and contains
a <fo:flow> element, which defines the output. Output pages are produced in
sequence.
First Block
</fo:block>
<fo:block break-before="page" space-before="2in" space-after="2in">➐
Second Block
</fo:block>
<fo:block break-before="page" space-before="2in" space-after="2in">
Third Block
</fo:block>
</fo:flow>
</fo:page-sequence>
</fo:root>
Explanation
1 In XSL-FO, you can specify borders, padding, and background on regions
in exactly the same way as you do it on blocks. The first page in this
example has a border around it, while the others remain borderless.
2 The page sequence master defines the chain of page masters to use for a
page sequence.
3 <fo:single-page-master-reference> inserts a single page master in the
chain.
4 <fo:repeatable-page-master-reference> makes the specified page masters
repeat up to the end of the chain.
5 Note that master-reference attribute of a <fo:page-sequence> can refer
to either <fo:page-sequence-master> or <fo:simple-page-master>. In the
latter, all pages generated by this <fo:page-sequence> use the same page
master.
6, 7 Spaces are not inheritable: you cannot specify them on a surrounding
block. There’s no alternative to specifying them explicitly on every block
involved.
Where To Flow?
The <fo:flow> element has a “flow-name” attribute. The value of the flow-name
attribute determines where the content of the <fo:flow> element will go. These
values are legal:
●● xsl-region-body
●● xsl-region-before
●● xsl-region-after
●● xsl-region-start
●● xsl-region-end
XSL-FO Pages
To define the layout of pages, XSL-FO uses page templates called “Page
Masters.”
Page Templates
Each page master (page template) must have a unique name, given by its
master-name attribute.
Page Size
To define the size of a page, XSL-FO uses the following
attributes:
●● page-width defines the width of a page
●● page-height defines the height of a page
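[Figure: a page bounded by Margin Top, Margin Bottom, Margin Left, and
Margin Right. Inside the margins, REGION BEFORE runs along the top, REGION
AFTER along the bottom, REGION START and REGION END along the two sides, and
REGION BODY fills the center.]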
FIGURE 16.2 XSL-FO Page
XSL-FO BLOCKS
XSL-FO output goes into blocks. This output is normally nested inside
<fo:block> elements: “blocks” of content “flow” into “pages” of the output.
Syntax
<fo:page-sequence>
<fo:flow flow-name="xsl-region-body">
<fo:block>
<!-- Output goes here -->
</fo:block>
</fo:flow>
</fo:page-sequence>
●● text-align-last
●● text-indent
●● start-indent
●● end-indent
●● wrap-option (defines word wrap)
●● break-before (defines page breaks)
●● break-after (defines page breaks)
●● reference-orientation (defines text rotation in 90° increments)
●● XSL-FO Lists
In this example, the text content Great Sporting Events is styled using a
10-point, serif font. Furthermore, the alignment of the text is set to end via
the text-align attribute, which is equivalent to right-alignment in CSS. There
is no concept of left or right in XSL-FO. Instead, you use start and end when
referring to the alignment of content that you might otherwise think of as
being left-aligned or right-aligned. Of course, center is used in XSL-FO when
it comes to alignment.
The background-color and color attributes in this code are direct
carry-overs from CSS. You can use them just as you would the similarly named
CSS styles.
Notice in this code that the padding-top attribute is set, which controls
the padding along the top of the block. All of the standard CSS margin and
padding styles are available for you in XSL-FO as attributes of the <fo:block>
tag. These attributes include margin, margin-left, margin-right, margin-top,
margin-bottom, padding, padding-left, padding-right, padding-top, and
padding-bottom. There are also several familiar border attributes that you
can use with blocks: border, border-left, border-right, border-top, and bor-
der-bottom.
<fo:list-block provisional-distance-between-starts="18pt"➊
provisional-label-separation="3pt"➋>
<fo:list-item>
<fo:list-item-label end-indent="label-end()"➌>
<fo:block>•➍</fo:block>
</fo:list-item-label>
<fo:list-item-body start-indent="body-start()"➎>
<fo:block>First item</fo:block>
</fo:list-item-body>
</fo:list-item>
<fo:list-item>
<fo:list-item-label end-indent="label-end()">
<fo:block>•</fo:block>
</fo:list-item-label>
<fo:list-item-body start-indent="body-start()">
<fo:block>Second item</fo:block>
</fo:list-item-body>
</fo:list-item>
</fo:list-block>
1 This property specifies how far the left side of the label is from the left side
of the body.
2 This property specifies the separation between the right side of the label
and the left edge of the body.
3 The end-indent attribute specifies the offset of the right edge of
<fo:list-item-label> from the right edge of the reference area (i.e., page).
A special label-end() function sets it to the value calculated from the
provisional-distance-between-starts and provisional-label-separation values.
However, this is not a default value: you have to specify
end-indent="label-end()" on each <fo:list-item-label> in the list.
Alternatively, you can use an explicit value of end-indent.
4 This is the Unicode character for a round bullet.
5 The start-indent attribute specifies the offset of the left edge of
<fo:list-item-body> from the left edge of the reference area. A special
body-start() function sets it to the value calculated from
provisional-distance-between-starts. As with <fo:list-item-label>, this is
not a default value; don’t forget to specify it on each <fo:list-item-body>.
●● provisional-distance-between-starts
●● provisional-label-separation
●● start-indent for list-item-label
●● start-indent for list-item-body
●● end-indent for list-item-label
●● end-indent for list-item-body
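The arithmetic behind label-end() and body-start() can be worked through with the 18pt and 3pt values used in the list example. The sketch below follows our reading of the definitions in the XSL Recommendation and deliberately ignores the start-intrusion adjustment; the 468pt reference width and the function names are assumptions chosen for illustration.

```python
def body_start(list_start_indent: float, distance_between_starts: float) -> float:
    """Start-indent of the item body, measured from the start edge."""
    return list_start_indent + distance_between_starts


def label_end(reference_width: float, list_start_indent: float,
              distance_between_starts: float, label_separation: float) -> float:
    """End-indent of the item label, measured from the end edge."""
    return reference_width - (
        list_start_indent + distance_between_starts - label_separation
    )


# 18pt between starts and 3pt separation, as in the list example.
# The 468pt reference width (a 6.5in text column) is an assumed figure.
width, indent, starts, separation = 468.0, 0.0, 18.0, 3.0

body = body_start(indent, starts)
label = label_end(width, indent, starts, separation)
label_width = width - indent - label  # room left for the bullet itself

print(body, label, label_width)
```

Under these assumptions the label gets the distance between starts minus the separation, so the bullet and the body text never overlap.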
<xsl:template match="ol">
<fo:list-block
space-before="0.25em" space-after="0.25em">
<xsl:apply-templates/>
</fo:list-block>
</xsl:template>
<xsl:template match="ol/li">
<fo:list-item space-after="0.5ex">
<fo:list-item-label start-indent="1em">
<fo:block>
<xsl:number/>.
</fo:block>
</fo:list-item-label>
<fo:list-item-body>
<fo:block>
<xsl:apply-templates/>
</fo:block>
</fo:list-item-body>
</fo:list-item>
</xsl:template>
XSL-FO Lists
XSL-FO uses the <fo:list-block> element to define lists.
</fo:list-item-label>
<fo:list-item-body>
<fo:block>Saab</fo:block>
</fo:list-item-body>
</fo:list-item>
</fo:list-block>
The output from the code above would be something like this:
●● Volvo
●● Saab
TABLES
Tables in XSL-FO resemble HTML ones: they are made of cells grouped into
rows; rows are further grouped into row groups—table header, table footer,
and table bodies (one or more). There are also column descriptors.
Tables are described in XSL-FO using the fo:table element. A table can
have a header (fo:table-header), a body (fo:table-body), and a footer
(fo:table-footer). Each of these groups contains rows (fo:table-row), which in
turn contain cells (fo:table-cell). The columns are described using the
fo:table-column elements.
A basic 2x2 table is as follows:
<fo:table border="0.5pt solid black" text-align="center">
<fo:table-body>
<fo:table-row>
<fo:table-cell padding="6pt" border="0.5pt solid black">➊
<fo:block> upper left </fo:block>
</fo:table-cell>
<fo:table-cell padding="6pt" border="0.5pt solid black">
<fo:block> upper right </fo:block>
</fo:table-cell>
</fo:table-row>
<fo:table-row>
<fo:table-cell padding="6pt" border="0.5pt solid black">
<fo:block> lower left </fo:block>
</fo:table-cell>
<fo:table-cell padding="6pt" border="0.5pt solid black">
<fo:block> lower right </fo:block>
</fo:table-cell>
</fo:table-row>
</fo:table-body>
</fo:table>
Table Columns
A column can have a proportional width or a fixed width. A fixed width
includes a length unit (in, pt, cm); for example,
<fo:table-column column-width="3in"/>.
A proportional width is expressed via the proportional-column-width
function (for example,
<fo:table-column column-width="proportional-column-width(20)"/>) or by
using a percentage (<fo:table-column column-width="20%"/>). There is a third
way to specify a column width: if the column-width attribute is omitted, the
column sizes itself automatically, depending on its content.
A table can mix fixed, proportional, and automatic columns. When a table
contains only proportional columns, XF will resize them even if the sum of
percentages is not 100.
For example, each of the following three tables has two columns of equal
width:
<fo:table>
<fo:table-column column-width="50%"/>
<fo:table-column column-width="50%"/>
..
</fo:table>
And
<fo:table>
<fo:table-column column-width="proportional-column-width(1)"/>
<fo:table-column column-width="proportional-column-width(1)"/>
..
</fo:table>
And
<fo:table>
<fo:table-column column-width="proportional-column-width(60)"/>
<fo:table-column column-width="proportional-column-width(60)"/>
..
</fo:table>
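The resizing of proportional columns amounts to a simple normalization: each column receives its factor divided by the sum of all factors. The following Python sketch assumes that rule and a 400pt table width; both the function name and the width are hypothetical, and a real formatter may also account for borders and padding.

```python
def resolve_widths(factors: list[float], available: float) -> list[float]:
    """Share the available width among columns in proportion to their factors."""
    total = sum(factors)
    if total <= 0:
        raise ValueError("at least one factor must be positive")
    return [available * factor / total for factor in factors]


table_width = 400.0  # assumed width in points

halves = resolve_widths([50, 50], table_width)    # 50% and 50%
ones = resolve_widths([1, 1], table_width)        # proportional-column-width(1)
sixties = resolve_widths([60, 60], table_width)   # factors sum to 120, not 100

print(halves, ones, sixties)
```

All three calls produce the same pair of widths, which is why the three tables above lay out identically.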
XSL-FO OBJECTS
There are nine XSL-FO objects used to create tables:
●● fo:table-and-caption
●● fo:table
●● fo:table-caption
●● fo:table-column
●● fo:table-header
●● fo:table-footer
●● fo:table-body
●● fo:table-row
●● fo:table-cell
Example
The <fo:table-and-caption> element is used to define a table. It contains a
<fo:table> and an optional <fo:table-caption> element.
The <fo:table> element contains optional <fo:table-column>,
<fo:table-header>, and <fo:table-footer> elements and one or more
<fo:table-body> elements. Each header, body, and footer has one or more
<fo:table-row> elements, each with one or more <fo:table-cell> elements:
<fo:table-and-caption>
  <fo:table>
    <fo:table-header>
      <fo:table-row>
        <fo:table-cell><fo:block>Car</fo:block></fo:table-cell>
        <fo:table-cell><fo:block>Price</fo:block></fo:table-cell>
      </fo:table-row>
    </fo:table-header>
    <fo:table-body>
      <fo:table-row>
        <fo:table-cell><fo:block>Volvo</fo:block></fo:table-cell>
        <fo:table-cell><fo:block>$50000</fo:block></fo:table-cell>
      </fo:table-row>
      <fo:table-row>
        <fo:table-cell><fo:block>Saab</fo:block></fo:table-cell>
        <fo:table-cell><fo:block>$48000</fo:block></fo:table-cell>
      </fo:table-row>
    </fo:table-body>
  </fo:table>
</fo:table-and-caption>
Output
Car      Price
Volvo    $50000
Saab     $48000
GRAPHICS
There is a special inline element for including graphics into XSL-FO—
<fo:external-graphic>. The source image is specified by the src attribute
whose value is a URI. XEP handles HTTP, FTP, data and filesystem resource
locators in URIs. An unqualified URI is treated as a path to a file in the local
file system; if the path is relative, it is calculated from the location of the
source XSL-FO document. Here’s an example:
<fo:block>
1. Note the url(‘…’) function-like wrapper around the file name: this is
required by the XSL 1.0 Recommendation. (XEP recognizes unwrapped
URLs, too).
2. In this example, the height and the width of the image are expressed in
units relative to the nominal font size.
3. This is a convenient technique to scale small inlined images proportion-
ally to the text height.
XSL-FO PROCESSORS
XSL-FO processors are software programs that format XSL-FO documents for
output. Most XSL-FO processors can produce PDF documents as well as
HTML and other formats. Some well-known XSL-FO processors are FOP,
PassiveTeX, and xmlroff.
XSL-FO SOFTWARE
Scriptura
Scriptura is a cross-platform document generation solution based on
XSLT and XSL-FO. Scriptura has a WYSIWYG design tool and engine. The
XSL-FO formatter used in the engine is no longer based on Apache FOP,
but was written from scratch by Inventive Designers. The new features in
this release are bulleted and numbered lists, break-after and break-before
properties, extended bar code options, and improved number and currency
formatting.
W3Schools
</fo:block>
<fo:block text-indent="5mm" font-family="verdana" font-size="12pt">
Welcome to the computer center !
</fo:block>
Result:
Welcome to the computer center!
17
XML WITH DATABASES
INTRODUCTION
This chapter gives a high-level overview of how to use XML with databases.
It describes how the differences between data-centric and document-centric
documents affect their usage with databases, how XML is commonly used
with relational databases, and what native XML databases are and when to
use them.
Although the information discussed in this chapter is (mostly) up-to-date,
the idea that the world of XML and databases can be seen through the
data-centric/document-centric divide is somewhat dated. It used to be a con-
venient metaphor for introducing native XML databases, which were then not
widely understood, even in the database community. However, it was always
somewhat unrealistic, as many XML documents are not strictly data-centric
or document-centric, but somewhere in between. So while the data-centric/
document-centric divide is a convenient starting point, it is better to under-
stand the differences between XML-enabled databases and native XML
databases and to choose the appropriate database based on your processing
needs.
DATA-CENTRIC DOCUMENTS
Data-centric documents are documents that use XML as a data transport.
They are designed for machine consumption and the fact that XML is used at
all is usually superfluous. That is, it is not important to the application or the
database that the data is, for some length of time, stored in an XML document.
Examples of data-centric documents are sales orders, flight schedules, scien-
tific data, and stock quotes.
Data-centric documents are characterized by fairly regular structure,
fine-grained data (that is, the smallest independent unit of data is at the level
of a PCDATA-only element or an attribute), and little or no mixed content.
The order in which sibling elements and PCDATA occur is generally not
significant, except when validating the document.
Data of the kind that is found in data-centric documents can originate
both in the database (in which case you want to expose it as XML) and outside
the database (in which case you want to store it in a database). An example
of the former is the vast amount of legacy data stored in relational databases;
an example of the latter is scientific data gathered by a measurement system
and converted to XML. For example, the following sales order document is
data-centric:
<SalesOrder SONumber="12345">
<Customer CustNumber="543">
<CustName>ABC Industries</CustName>
<Street>123 Main St.</Street>
<City>Chicago</City>
<State>IL</State>
<PostCode>60609</PostCode>
</Customer>
<OrderDate>981215</OrderDate>
<Item ItemNumber="1">
<Part PartNumber="123">
<Description>
<p><b>Turkey wrench:</b><br />
Stainless steel, one-piece construction,
lifetime guarantee.</p>
</Description>
<Price>9.95</Price>
</Part>
<Quantity>10</Quantity>
</Item>
<Item ItemNumber="2">
<Part PartNumber="456">
<Description>
<p><b>Stuffing separator:</b><br />
Aluminum, one-year guarantee.</p>
</Description>
<Price>13.27</Price>
</Part>
<Quantity>5</Quantity>
</Item>
</SalesOrder>
This could be built from the following XML document and a simple
stylesheet:
<Flights>
<Airline>ABC Airways</Airline>
<Origin>Dallas</Origin>
<Destination>Fort Worth</Destination>
<Flight>
<Departure>09:15</Departure>
<Arrival>09:16</Arrival>
</Flight>
<Flight>
<Departure>11:15</Departure>
<Arrival>11:16</Arrival>
</Flight>
<Flight>
<Departure>13:15</Departure>
<Arrival>13:16</Arrival>
</Flight>
</Flights>
DOCUMENT-CENTRIC DOCUMENTS
Document-centric documents are (usually) documents that are designed for
human consumption. Examples are books, email, advertisements, and almost
any hand-written XHTML document. They are characterized by a less regular
or irregular structure, larger grained data (that is, the smallest independent
unit of data might be at the level of an element with mixed content or the entire
document itself), and considerable amounts of mixed content. The order in
which sibling elements and PCDATA occur is almost always significant.
Document-centric documents are usually written by hand in XML or some
other format, such as RTF, PDF, or SGML, which is then converted to XML.
Unlike data-centric documents, they usually do not originate in the database.
For example, the following product description is document-centric:
<Product>
<Intro>
The <ProductName>Turkey Wrench</ProductName> from <Developer>Full
Fabrication Labs, Inc.</Developer> is <Summary>like a monkey wrench,
but not as big.</Summary>
</Intro>
<Description>
<Para>The turkey wrench, which comes in <i>both right- and left-
handed versions (skyhook optional)</i>, is made of the <b>finest
stainless steel</b>. The Readi-grip rubberized handle quickly adapts
to your hands, even in the greasiest situations. Adjustment is
possible through a variety of custom dials.</Para>
<Para>You can:</Para>
<List>
<Item><Link URL="Order.html">Order your own turkey wrench</Link></Item>
<Item><Link URL="Wrenches.htm">Read more about wrenches</Link></Item>
<Item><Link URL="Catalog.zip">Download the catalog</Link></Item>
</List>
<Para>The turkey wrench costs <b>just $19.99</b> and, if you
order now, comes with a <b>hand-crafted shrimp hammer</b> as a
bonus gift.</Para>
</Description>
</Product>
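One rough way to tell the two kinds of documents apart is to measure how much mixed content they contain. The Python sketch below is only a heuristic under our own assumptions: the function names are hypothetical, and the two sample strings are shortened stand-ins for the sales order and product description shown earlier.

```python
import xml.etree.ElementTree as ET


def has_mixed_content(element: ET.Element) -> bool:
    """True if the element has child elements and non-whitespace text beside them."""
    if len(element) == 0:
        return False
    if element.text and element.text.strip():
        return True
    return any(child.tail and child.tail.strip() for child in element)


def mixed_ratio(xml_text: str) -> float:
    """Fraction of parent elements (those with children) that hold mixed content."""
    parents = [e for e in ET.fromstring(xml_text).iter() if len(e)]
    if not parents:
        return 0.0
    return sum(has_mixed_content(e) for e in parents) / len(parents)


data_centric = "<Order><Item><Price>9.95</Price><Quantity>10</Quantity></Item></Order>"
document_centric = "<Para>The wrench costs <b>just $19.99</b> today.</Para>"

print(mixed_ratio(data_centric), mixed_ratio(document_centric))
```

A ratio near zero suggests a data-centric document; a high ratio suggests a document-centric one. Real documents often fall somewhere in between, as the introduction notes.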
have a reference to only one record in the first table. Here’s an example:
Let’s say you create a table called grades, which contains a column for student
IDs as well as columns for class names and the grades themselves. Because a
student can take multiple classes, but each grade applies to only one student,
the relationship between students and grades is a one-to-many relationship.
In this case, id_students in the grades table is a foreign key relating to the
students table.
An example of a many-to-many relationship is the relationship between
students and classes. Each student is usually enrolled in several classes, and
each class usually contains multiple students. In a relational database, such a
relationship is expressed using what is sometimes referred to as a joining
table: a table that exists solely to express the relationship between two
pieces of data. The schema contains two tables, students and classes. You
already know about the students table; the classes table contains information
about the classes offered: the name of the professor, the room where the class
is held, and the time at which the class is scheduled.
Before you can deal with integrating databases and XML, you need to
understand both databases and XML. You’ve been learning about XML for a
while now, so consider this a crash course in database theory.
To relate students to classes, you need a third table, called classes_students
(or a similarly descriptive name). At a bare minimum, this table must include
two columns, id_students and id_classes, both of which are foreign keys point-
ing to the students and classes tables, respectively. These two columns are used
to express the many-to-many relationship. In other words, both of the other
two tables have one-to-many relationships with this table. Using this table, each
student can be associated with several classes, and each class can be associated
with any number of students. It may also contain properties that are specific to
the relationship, rather than to either a student or a class specifically.
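The three-table arrangement can be tried out with SQLite, which ships with Python. This is a minimal sketch: the table and key names follow the text, while the class_name column, the sample rows, and the helper functions are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id_students INTEGER PRIMARY KEY, student_name TEXT);
    CREATE TABLE classes  (id_classes  INTEGER PRIMARY KEY, class_name TEXT);
    -- The joining table: one row per (student, class) enrollment.
    CREATE TABLE classes_students (
        id_students INTEGER REFERENCES students(id_students),
        id_classes  INTEGER REFERENCES classes(id_classes),
        PRIMARY KEY (id_students, id_classes)
    );
    INSERT INTO students VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO classes  VALUES (10, 'Algebra'), (20, 'Biology');
    INSERT INTO classes_students VALUES (1, 10), (1, 20), (2, 10);
""")


def classes_for(student_name):
    """Class names a student is enrolled in, found through the joining table."""
    rows = conn.execute("""
        SELECT c.class_name
        FROM students s
        JOIN classes_students cs ON cs.id_students = s.id_students
        JOIN classes c ON c.id_classes = cs.id_classes
        WHERE s.student_name = ?
        ORDER BY c.class_name
    """, (student_name,))
    return [name for (name,) in rows]


def students_in(class_name):
    """Student names enrolled in a class, found through the same joining table."""
    rows = conn.execute("""
        SELECT s.student_name
        FROM classes c
        JOIN classes_students cs ON cs.id_classes = c.id_classes
        JOIN students s ON s.id_students = cs.id_students
        WHERE c.class_name = ?
        ORDER BY s.student_name
    """, (class_name,))
    return [name for (name,) in rows]


print(classes_for("Ann"), students_in("Algebra"))
```

The same joining table answers the question in both directions, which is what makes the relationship many-to-many.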
In this case, the ∗ is the select list. The select list indicates which database
columns should be included in the query results. When a ∗ is supplied, it indi-
cates that all of the columns in the table or tables listed in the FROM clause
should be included in the query results.
The FROM clause contains the list of tables from which the data will be
retrieved. In this case, the data is retrieved from just one table, students.
Retrieving data from multiple tables is explained in a bit.
Let’s go back to the select list. If you use a select list that isn’t simply ∗,
you include a list of column names separated by commas. You can also rename
columns in the query results (useful in certain situations), using the AS key-
word, as follows:
SELECT id_students AS id, student_name, state
FROM students
As the results in Table 17.2 show, only the student name and state col-
umns are returned for the records.
Without DISTINCT, this query would return the city of every student in
the students table. In this case, it returns only the distinct values in the table,
regardless of how many of each of them there are. In this case, there are only
three records in the table and each of them has a unique city, so the result set
is the same as it would be if DISTINCT were left off.
When you use the WHERE clause, you must include an expression that
filters the query results. In this case, the expression is very simple. Given
that id_students is the primary key for this table, this query is sure to return
only one row. You can use other comparison operators as well, like the > or
!= operators. It’s also possible to use Boolean operators to create compound
expressions. For example, you can retrieve all of the students who pay more
than $10,000 per year in tuition and who are classified as freshmen using the
following query:
SELECT student_name
FROM students
WHERE tuition > 10000
AND classification = 'freshman'
Table 17.5 shows the results of this query.
Table 17.5 Database Records as the Results of the Above Query
Student_name
James Polk
There are also several other operators you can use in the WHERE clause
that enable you to write more powerful queries. The LIKE operator allows
you to search for fields containing a particular string using wildcard
patterns (% matches any run of characters, _ matches a single character), a
syntax loosely similar to regular expressions. The BETWEEN operator allows
you to search for values between the two you specify (inclusive), and IN
allows you to test whether a value is a member of a set you specify.
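The three operators can be compared side by side with a small in-memory SQLite table. In this sketch the sample rows are made up for illustration (only the column names come from the text), and the names helper is a hypothetical convenience whose condition text is fixed in the code while values are bound as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        id_students INTEGER PRIMARY KEY, student_name TEXT,
        state TEXT, tuition INTEGER, classification TEXT)
""")
conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?, ?)", [
    (1, "James Polk", "TN", 12000, "freshman"),   # sample rows, assumed
    (2, "Jane Adams", "MA", 9000, "senior"),
    (3, "John Tyler", "VA", 15000, "junior"),
])


def names(where, params=()):
    """Run a SELECT with the given WHERE condition and return the names found."""
    sql = f"SELECT student_name FROM students WHERE {where} ORDER BY id_students"
    return [row[0] for row in conn.execute(sql, params)]


like_ja = names("student_name LIKE ?", ("Ja%",))           # names starting with Ja
between = names("tuition BETWEEN ? AND ?", (9000, 12000))  # both ends included
in_set = names("state IN (?, ?)", ("VA", "MA"))            # membership in a set

print(like_ja, between, in_set)
```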
INSERTING RECORDS
The INSERT statement is used to insert records into a table. The syntax is
simple, especially if you plan on populating every column in a table. To insert
a record into majors, use the following statement:
INSERT INTO majors
VALUES (115, 50, 'Math', 'English')
The values in the list correspond to the id_majors, id_students, major, and
minor columns, respectively. If you only want to specify values for a subset of
the columns in the table, you must specify the names of the columns as well:
INSERT INTO students
(id_students, student_name)
VALUES (50, 'Milton James')
When you create tables, you can specify whether values are required in
certain fields, and you can also specify default values for fields. For exam-
ple, the classification column might default to freshman because most new
student records being inserted will be for newly enrolled students, who are
classified as freshmen.
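Required fields and default values can be seen in action with SQLite. The sketch below assumes a simplified students table with a NOT NULL constraint on the name and a default of freshman for the classification; the exact column definitions are our assumption, not the book's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        id_students    INTEGER PRIMARY KEY,
        student_name   TEXT NOT NULL,                    -- a value is required
        classification TEXT NOT NULL DEFAULT 'freshman'  -- filled in when omitted
    )
""")

# Only a subset of the columns is named, so classification takes its default.
conn.execute(
    "INSERT INTO students (id_students, student_name) VALUES (?, ?)",
    (50, "Milton James"),
)
row = conn.execute(
    "SELECT id_students, student_name, classification FROM students"
).fetchone()

# Leaving out a required column with no default is rejected by the database.
try:
    conn.execute("INSERT INTO students (id_students) VALUES (51)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(row, rejected)
```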
UPDATING RECORDS
When you want to modify one or more records in a table, the UPDATE state-
ment is used. Here’s an example:
UPDATE students
SET classification = 'senior'
The previous SQL statement will work, but you can figure out what’s
wrong with it. Nowhere is it specified which records to update. If you don’t
tell it which records to update, it just assumes that you want to update all of
the records in the table, thus the previous query would turn all of the stu-
dents into seniors. That’s probably not what you have in mind. Fortunately,
the UPDATE statement supports the WHERE clause, just like the SELECT
statement.
UPDATE students
SET classification = 'senior'
WHERE id_students = 1
That’s more like it. This statement updates the classification of only one
student. You can also update multiple columns with one query:
UPDATE students
SET classification = 'freshman', tuition = 7500
WHERE id_students = 5
As you can see from the example, you can supply a list of fields to update
with your UPDATE statement, and they will all be updated by the same query.
DELETING RECORDS
The last SQL statement, the DELETE statement, is similar to the UPDATE
statement. It accepts a FROM clause and optionally a WHERE clause. If you
leave out the WHERE clause, it deletes all the records in the table. Here’s an
example:
DELETE FROM students
WHERE id_students = 1
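The effect of including or omitting the WHERE clause is easy to check by counting the affected rows. This sketch uses SQLite from Python with three made-up student records; the rowcount attribute of the cursor reports how many rows each statement touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id_students INTEGER PRIMARY KEY, classification TEXT)"
)
conn.executemany("INSERT INTO students VALUES (?, ?)", [
    (1, "junior"), (2, "freshman"), (3, "sophomore"),  # sample rows, assumed
])

# With a WHERE clause, only the matching record changes.
targeted = conn.execute(
    "UPDATE students SET classification = 'senior' WHERE id_students = ?", (1,)
).rowcount

# Without one, every record in the table is updated.
blanket = conn.execute("UPDATE students SET classification = 'senior'").rowcount

# DELETE follows the same rule; here the WHERE clause limits it to one record.
deleted = conn.execute(
    "DELETE FROM students WHERE id_students = ?", (1,)
).rowcount
remaining = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]

print(targeted, blanket, deleted, remaining)
```

Checking the row count before committing is a cheap safeguard against an UPDATE or DELETE that reaches further than intended.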
You now know just enough about SQL to get into trouble! Actually, your
newfound SQL knowledge will come in handy a bit later in the lesson when
you develop an application that carefully extracts data from a database and
encodes it in XML. But first, you find out how to export an entire database
table as XML.
The news articles could be distributed via XML files so that they could
easily be transformed for presentation on the Web, or they could be imported
into a relational database and published from there.
Now, let’s look at how you might design a database to store this informa-
tion. As mentioned earlier, the path of least resistance is just to stick the whole
XML document in a field. However, that probably isn’t a good idea for this
file because it contains more than one automobile “record.”
As you can see, the XML document has been turned into two tables,
automobiles and options. The automobiles table contains all the information
stored in the attributes of the automobile tag in the XML document. Because
automobiles have a one-to-many relationship to options, we created a sepa-
rate table for them. In the options table, id_automobiles is a foreign key that
relates back to a specific automobile in the automobiles table.
To make sure you understand why the automobile options were broken
out into a separate database table, consider that the number of options for
a single automobile can vary from one automobile to the next. This is a sce-
nario where a single database field in the automobiles table can’t account for
a varying amount of data (hence the one-to-many relationship). Therefore,
the solution is to break out the options into a separate table where each row
is tied back to a specific automobile. Then you can add as many options as
you want for one automobile as long as each option includes the appropriate
automobile ID.
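The split into two tables can be sketched in Python with the standard library. The XML below is a hypothetical stand-in for the automobile document (its attribute names, the dealership root, and the option_name column are assumptions); what matters is that each option row carries the id_automobiles value of the automobile it belongs to.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input document; element and attribute names are assumed.
XML = """
<dealership>
  <automobile vin="A1" make="Volvo" model="S60">
    <options><option>Sunroof</option><option>Heated seats</option></options>
  </automobile>
  <automobile vin="B2" make="Saab" model="9-3">
    <options><option>Sunroof</option></options>
  </automobile>
</dealership>
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE automobiles (
        id_automobiles INTEGER PRIMARY KEY, vin TEXT, make TEXT, model TEXT);
    CREATE TABLE options (
        id_options INTEGER PRIMARY KEY,
        id_automobiles INTEGER REFERENCES automobiles(id_automobiles),
        option_name TEXT);
""")

for auto in ET.fromstring(XML).iter("automobile"):
    # Attributes of the automobile element become one row in automobiles.
    cursor = conn.execute(
        "INSERT INTO automobiles (vin, make, model) VALUES (?, ?, ?)",
        (auto.get("vin"), auto.get("make"), auto.get("model")),
    )
    auto_id = cursor.lastrowid
    # Each option becomes its own row, tied back by the foreign key.
    conn.executemany(
        "INSERT INTO options (id_automobiles, option_name) VALUES (?, ?)",
        [(auto_id, option.text) for option in auto.iter("option")],
    )

option_counts = dict(conn.execute("""
    SELECT a.vin, COUNT(o.id_options)
    FROM automobiles a
    LEFT JOIN options o ON o.id_automobiles = a.id_automobiles
    GROUP BY a.vin
"""))
print(option_counts)
```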
MySQL is a very popular open source database that does a great job for
small- to medium-scale applications. A nice front-end is available for MySQL
called phpMyAdmin, which provides a Web-based user interface for interact-
ing with a MySQL database. phpMyAdmin provides a very easy-to-use export
feature that will export any MySQL database as an XML document.
To get started exporting an XML document from a MySQL database,
open the database in phpMyAdmin, and select the table you want to export.
Then click the Export tab. Within the Export options, click XML to indicate
that XML is the output data format. If you want to generate an XML file that
is stored on the Web server, click the Save Now. You can choose to save the
XML file locally or otherwise use the XML code for further processing and
manipulation. The key point to realize is that with one button click, you’ve
converted an entire tabular database into a well-formed XML document.
The first few lines of the page establish a database connection and open
the Music City Mafia hockey database. A SQL query is then constructed
based upon a parameter ($season) that is passed into the page via the URL.
The point of this parameter is to allow you to limit the XML file to a par-
ticular season of data (http://www.musiccitymafia.com/mcm_schedule.php?
season=Summer%202005).
The %20 near the end of the URL is the URL-encoded form of a space
character, here separating the word Summer from the number 2005. The result
of this URL is that the mcm_schedule.php Web page assigns the value Summer
2005 to the variable $season, which can then be used throughout the PHP code.
And, in fact, it is used when the SQL query is issued in lines 7 through 9 of
the listing.
More specifically, the date, time, opponent, location, type, outcome, goals for,
goals against, and overtime database fields are selected from the games table
but only for the Summer 2005 season. The result of this query is stored in the
$mcm_result variable (line 10).
In PHP programming, all variable names are preceded by a dollar sign
($). The next big chunk of code goes through the results of the SQL query
one record at a time, formatting the data into XML code. Notice that the
XML declaration is generated first (line 15), followed by a root tag,
<games> (line 16). Each piece of pertinent game data is then formatted
into XML code in lines 17 through 28. The document is wrapped up with
a closing </games> tag in line 29.
The last important step in the PHP code is writing the XML data to a file.
The file is named mcm_results.xml, and the XML data is written to it with just a
few lines of code (lines 32 to 34). A simple line of HTML code is then written to
the browser so that you can access the XML file. More specifically, a link is gen-
erated that allows you to click and view the XML document (lines 36 and 37).
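The same select-then-wrap pattern can be sketched in a few lines of Python. This is not the PHP listing discussed above, only a minimal illustration of the idea; it assumes an SQLite database, and the column names in FIELDS and the sample rows are hypothetical stand-ins for the fields named in the text.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical column names, modelled on the fields listed in the text.
FIELDS = ["game_date", "game_time", "opponent", "location", "game_type",
          "outcome", "goals_for", "goals_against", "overtime"]

def games_to_xml(conn, season):
    """Return one season of the games table as an XML string rooted at <games>."""
    # The season is passed as a bound parameter, never pasted into the SQL text.
    query = "SELECT {} FROM games WHERE season = ?".format(", ".join(FIELDS))
    root = ET.Element("games")
    for row in conn.execute(query, (season,)):
        game = ET.SubElement(root, "game")
        for name, value in zip(FIELDS, row):
            ET.SubElement(game, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Demo with made-up rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (season TEXT, {})".format(
    ", ".join(name + " TEXT" for name in FIELDS)))
rows = [
    ("Summer 2005", "2005-06-01", "19:00", "Opponent A", "Rink 1",
     "League", "W", "4", "2", "N"),
    ("Fall 2005", "2005-09-10", "20:00", "Opponent B", "Rink 2",
     "League", "L", "1", "3", "N"),
]
conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
print(games_to_xml(conn, "Summer 2005"))
```

Because the season value arrives from the URL, binding it as a query parameter (the ? placeholder) is the safer choice; building the SQL string by concatenation would let a crafted URL alter the query.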
18
WEB SERVICES
Web Services, in the general meaning of the term, are services offered via the
Web. In a typical Web services scenario, a business application sends a request
to a service at a given URL using the SOAP protocol over HTTP. The service
receives the request, processes it, and returns a response. As an example,
consider a stock quote service, in which the request asks for the current price
of a specified stock, and the response gives the stock price. This is one of the
simplest forms of a Web service in that the request is filled almost immedi-
ately, with the request and response being parts of the same method call.
Another example could be a service that maps out an efficient route for
the delivery of goods. In this case, a business sends a request containing the
delivery destinations, which the service processes to determine the most
cost-effective delivery route. The time it takes to return the response depends
on the complexity of the routing, so the response will probably be sent as an
operation that is separate from the request.
Technically, Web services are application components that communicate
using open protocols. They are self-contained, self-describing, modular
applications that can be published, located, and invoked across the Web.
Web Services define a platform-independent standard based on XML to
communicate within distributed systems. XML is used to tag the data.
Web services can turn an existing application into a Web application
that publishes its functions or messages to the rest of the world.
[Figure: service provider, service broker, and service requester, connected by Publish, Find, and Bind operations; the figure is labeled with UDDI, WSDL, and SOAP.]
FIGURE 18.2 Web Services Architecture 2
<%@ WebService Language="VBScript" Class="TempConvert" %>

Imports System
Imports System.Web.Services

Public Class TempConvert : Inherits WebService

    ' Converts a Fahrenheit reading (passed as text) to Celsius.
    <WebMethod()> Public Function FahrenheitToCelsius(ByVal Fahrenheit As String) As String
        Dim fahr
        ' Accept a decimal comma by replacing it with a decimal point.
        fahr = Trim(Replace(Fahrenheit, ",", "."))
        If fahr = "" Or IsNumeric(fahr) = False Then Return "Error"
        Return ((((fahr) - 32) / 9) * 5)
    End Function

    ' Converts a Celsius reading (passed as text) to Fahrenheit.
    <WebMethod()> Public Function CelsiusToFahrenheit(ByVal Celsius As String) As String
        Dim cel
        cel = Trim(Replace(Celsius, ",", "."))
        If cel = "" Or IsNumeric(cel) = False Then Return "Error"
        Return ((((cel) * 9) / 5) + 32)
    End Function

End Class
This document is saved as an .asmx file. This is the ASP.NET file exten-
sion for XML Web Services.
The first line in the example states that this is a Web Service, written in
VBScript, and has the class name “TempConvert”:
<%@ WebService Language="VBScript" Class="TempConvert" %>
The next steps are basic VB programming. This application has two func-
tions: one to convert from Fahrenheit to Celsius and one to convert from
Celsius to Fahrenheit.
The only difference from a normal application is that each function is
marked with “<WebMethod()>”. Use “WebMethod()” to convert the functions
in your application into Web services:
<WebMethod()> Public Function FahrenheitToCelsius(ByVal Fahrenheit As String) As String
    Dim fahr
    fahr = Trim(Replace(Fahrenheit, ",", "."))
    If fahr = "" Or IsNumeric(fahr) = False Then Return "Error"
    Return ((((fahr) - 32) / 9) * 5)
End Function

<WebMethod()> Public Function CelsiusToFahrenheit(ByVal Celsius As String) As String
    Dim cel
    cel = Trim(Replace(Celsius, ",", "."))
    If cel = "" Or IsNumeric(cel) = False Then Return "Error"
    Return ((((cel) * 9) / 5) + 32)
End Function
Then, end the class:
end class
Publish the .asmx file on a server with .NET support, and you will have
your first working Web Service.
How To Do It
Here is the code to add the Web Service to a Web page:
<form action='tempconvert.asmx/FahrenheitToCelsius'
method="post" target="_blank">
<table>
<tr>
<td>Fahrenheit to Celsius:</td>
<td>
<input class="frmInput" type="text" size="30" name="Fahrenheit">
</td>
</tr>
<tr>
<td></td>
<td align="right">
<input type="submit" value="Submit" class="button">
</td>
</tr>
</table>
</form>
<form action='tempconvert.asmx/CelsiusToFahrenheit'
method="post" target="_blank">
<table>
<tr>
<td>Celsius to Fahrenheit:</td>
<td>
<input class="frmInput" type="text" size="30" name="Celsius">
</td>
</tr>
<tr>
<td></td>
<td align="right">
<input type="submit" value="Submit" class="button">
</td>
</tr>
</table>
</form>
Substitute the “tempconvert.asmx” with the address of your Web service like:
http://www.example.com/webservices/tempconvert.asmx
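The HTML forms above simply POST one named field to the method's URL, so a program can prepare the same request without a browser. The following Python sketch builds (but does not send) that request; the address is the placeholder from the text, and the helper name build_request is an assumption made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder address taken from the text; replace it with your own service.
SERVICE_URL = "http://www.example.com/webservices/tempconvert.asmx"

def build_request(method_name, field, value):
    """Build the POST request that the HTML form would submit (nothing is sent)."""
    body = urlencode({field: value}).encode("ascii")
    return Request(
        SERVICE_URL + "/" + method_name,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

request = build_request("FahrenheitToCelsius", "Fahrenheit", "212")
print(request.get_method(), request.full_url)
print(request.data)
```

Passing the prepared request to urllib.request.urlopen would submit it to a live service; that step is left out here because the example address is not a real endpoint.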
SOAP
SOAP is the protocol specification that defines a uniform way of passing
XML-encoded data. It also defines a way to perform remote procedure
calls (RPCs) using HTTP as the underlying communication protocol.
SOAP arises from the realization that no matter how nifty the current
middleware offerings are, they need a WAN wrapper. Architecturally, sending
messages as plain XML has advantages in terms of ensuring interoperability
(and debugging). Middleware players seem willing to put up with the costs
of parsing and serializing XML in order to scale their approach to wider
networks.
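To make "XML-encoded data" concrete, the sketch below assembles a small SOAP envelope in Python. The envelope namespace is the one defined by SOAP 1.1; the service namespace, the operation name, and the parameter are hypothetical and stand in for whatever a real service would define.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://www.example.com/tempconvert"       # hypothetical service namespace

def build_envelope(operation, **params):
    """Wrap one remote procedure call in a SOAP Envelope/Body pair."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, "{%s}%s" % (SERVICE_NS, operation))
    for name, value in params.items():
        ET.SubElement(call, "{%s}%s" % (SERVICE_NS, name)).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

print(build_envelope("FahrenheitToCelsius", Fahrenheit=212))
```

The resulting text is what would travel in the body of an HTTP POST; the receiving side parses it, runs the named operation, and answers with a similar envelope that carries the result.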
WSDL Documents
A WSDL document is just a simple XML document. It contains a set of
definitions to describe a Web service.
<definitions>
  <portType>
    definition of a port.......
  </portType>
  <binding>
    definition of a binding....
  </binding>
</definitions>
A WSDL document can also contain other elements, like extension ele-
ments, and a service element that makes it possible to group together the
definitions of several Web services in one single WSDL document.
WSDL Example
This is a simplified fragment of a WSDL document:
<message name="getTermRequest">
  <part name="term" type="xs:string"/>
</message>
<message name="getTermResponse">
  <part name="value" type="xs:string"/>
</message>
<portType name="glossaryTerms">
  <operation name="getTerm">
    <input message="getTermRequest"/>
    <output message="getTermResponse"/>
  </operation>
</portType>
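Because WSDL is plain XML, a client can read the description with any XML parser. The Python sketch below loads the simplified fragment (wrapped in a definitions element so that it has a single root) and lists each operation with its input and output parts. A real WSDL file places these elements in the WSDL namespace, which this simplified fragment omits; the function name describe_operations is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

WSDL_FRAGMENT = """
<definitions>
  <message name="getTermRequest">
    <part name="term" type="xs:string"/>
  </message>
  <message name="getTermResponse">
    <part name="value" type="xs:string"/>
  </message>
  <portType name="glossaryTerms">
    <operation name="getTerm">
      <input message="getTermRequest"/>
      <output message="getTermResponse"/>
    </operation>
  </portType>
</definitions>
"""

def describe_operations(wsdl_text):
    """Return one dictionary per operation, with its input and output parts."""
    root = ET.fromstring(wsdl_text)
    # Map each message name to its list of (part name, part type) pairs.
    parts = {
        message.get("name"): [(p.get("name"), p.get("type"))
                              for p in message.findall("part")]
        for message in root.findall("message")
    }
    operations = []
    for port_type in root.findall("portType"):
        for operation in port_type.findall("operation"):
            operations.append({
                "portType": port_type.get("name"),
                "operation": operation.get("name"),
                "input": parts[operation.find("input").get("message")],
                "output": parts[operation.find("output").get("message")],
            })
    return operations

for entry in describe_operations(WSDL_FRAGMENT):
    print(entry)
```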
UDDI BENEFITS
Businesses of any size, in any industry, can benefit from UDDI.
Before UDDI, there was no Internet standard for businesses to reach
their customers and partners with information about their products and
services. Nor was there a standard method for integrating with each other’s
systems and processes. Problems the UDDI specification can help to solve:
●● Making it possible to discover the right business from the millions
currently online.
●● Defining how to enable commerce once the preferred business is
discovered.
●● Reaching new customers and increasing access to current customers.
●● Expanding offerings and extending market reach.
●● Solving customer-driven need to remove barriers to allow for rapid par-
ticipation in the global Internet economy.
●● Describing services and business processes programmatically in a single,
open, and secure environment.
A
XML BASICS
1. In the XML sample, we see that information is spread over several lines.
However, this layout was generated by the parser (Microsoft Internet
Explorer). In general, an XML document can be one continuous stream of
text, without carriage return or line feed characters in it.
2. Any XML document that meets the basic rules as defined by the XML
specification is called a well-formed XML document. An XML document
can be checked to determine whether it is well-formed—that is, whether
the document has the correct structure (syntax).
3. When an XML document meets the rules defined in the DTD, it is called
a valid XML document. (DTD: Document Type Definition)
4. Schemas are similar to DTDs, but they use a different format. DTDs and
schemas are useful when the content of a group of documents shares a
common set of rules.
5. XML gives you the opportunity to create messages in standard forms.
6. XML separates data from presentation.
7. XML gives you the possibility to call methods behind firewalls and
between different platforms.
8. XML describes the contents but not the layout.
9. XML is tag-based: every element begins with a start tag <descr> and ends
with an end tag </descr>, where descr is the name of the tag.
10. You can declare the character encoding in the XML declaration, for
example encoding="UTF-8".
13. The most basic components of an XML document are elements, attributes,
and comments.
14. Every start tag should have an end tag, and tags must be closed in the
reverse of the order in which they were opened.
Correct is: <tag> <element> … </element></tag>
Incorrect is: <tag> <element> … </tag> </element>
15. All attribute values should be written between single or double quotes,
for example <element id="myvalue">
16. The designer of the XML document defines the structure of the document
and the mark-up elements.
17. Unlike HTML, XML must be properly written to be interpreted. This
means that every tag must have an end tag.
Elements can be nested, i.e.,
<Patient>
<PatientName>John Smith</PatientName>
<PatientAge>108</PatientAge>
<PatientWeight>155</PatientWeight>
</Patient>
18. - Names consist of one or more characters with no spaces. If a name has
only one character, that character must be a letter, either uppercase (A-Z)
or lowercase (a-z).
- A name can only begin with a letter or an underscore.
- Beyond the first character, letters, digits, hyphens, underscores, and
periods can be used, including letters defined in the Unicode standard.
- Element names are case sensitive.
or
<PatientWeight unit="KG"/> (=empty element)
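The well-formedness rules in this list can be checked mechanically: a conforming parser rejects any document that breaks them. The Python sketch below, using the standard library parser, reports whether a piece of text parses; the helper name is_well_formed is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

samples = [
    "<tag><element>text</element></tag>",   # correctly nested
    "<tag><element>text</tag></element>",   # end tags in the wrong order
    '<element id="myvalue"/>',              # quoted attribute value
    "<element id=myvalue/>",                # unquoted attribute value
    '<PatientWeight unit="KG"/>',           # empty element
]
for sample in samples:
    print(is_well_formed(sample), sample)
```

Note that this checks syntax only. Whether the document also follows the rules of a particular DTD or schema (validity) is a separate question that this parser does not answer.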
tag     The tag name comes just after a left angle bracket. Some tags may
        consist of just the name, as in the <stream> tag. Other tags may
        have attributes. Except for the XML version tag and the comment
        tag, all tags in an XML file have a corresponding end tag. For
        example, the <audience> tag has the end tag </audience>.
type    The type attributes define the type of data that the element
        provides. For more information, see “Value Types.”
value   The value is a character string, integer, or time value that defines
        the feature.
Value Types
Type               Value              Notes
type="bag"         group              Indicates a group of properties.
type="bool"        true|false         True or false values use this type, which
                                      stands for “Boolean.” You can also use 1 for
                                      true and 0 for false.
type="double"      decimal values     A double type is used for very large values,
                                      values that include a decimal point, and
                                      values that may be negative.
type="duration"    time value         A duration type indicates a time value in the
                                      format [d:][h:][m:]s[.xyz].
type="string"      text string        A string can include letters and numbers.
                                      Do not use double quotation marks within
                                      the value string. Maximum lengths may vary,
                                      but are typically at least 256 characters.
type="uint"        unsigned integer   Values that are positive integers, including
                                      0, use this type.
xsi:type="value"   customized value   An xsi: prefix allows for customized value
                                      types.
Duration Syntax
The format for the value of a parameter that specifies a duration is the
following:
[d:][h:][m:]s[.xyz]
For example:
30          30 seconds
45.5        45 1/2 seconds
5:35        5 minutes, 35 seconds
1:0:0       1 hour
1:22:30:0   1 day, 22 hours, 30 minutes
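The duration format reads naturally from right to left: the last field is seconds (optionally with a fraction), and each field to its left is minutes, hours, and days in turn. The Python sketch below converts such a value to seconds under that reading; the function name parse_duration is an assumption for illustration.

```python
def parse_duration(value):
    """Convert a [d:][h:][m:]s[.xyz] string to a number of seconds."""
    fields = value.split(":")
    if not 1 <= len(fields) <= 4:
        raise ValueError("expected 1 to 4 colon-separated fields: %r" % value)
    # The rightmost field is seconds and may carry a fraction.
    total = float(fields[-1])
    # Moving left from the seconds field: minutes, hours, days.
    multipliers = [60, 3600, 86400]
    for text, factor in zip(reversed(fields[:-1]), multipliers):
        total += int(text) * factor
    return total

for sample in ["30", "45.5", "5:35", "1:0:0", "1:22:30:0"]:
    print(sample, "->", parse_duration(sample), "seconds")
```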
XML Recommendations
Although not strict rules, the following recommendations will help you keep
your XML markup organized and understandable.
XML Comments
As in HTML, XML has a comment tag that starts with these characters:
<!--
and ends with these characters:
-->
A comment can be any number of lines long. It can start and end anywhere
in an XML file. Comments cannot be nested, though. Use comments to
describe what various sections of your XML file are meant to do.
This helps other people understand your file more easily.
once for each level of indentation. In a stream section, for example, the element
tags are indented one level from the <stream> tag. The two tags that make
up the stream context are indented one level from the <streamContext> tag:
<stream xsi:type="audioStream">
    <codecFlavor type="uint">25</codecFlavor>
    <codecName type="string">cook</codecName>
    <encodingComplexity type="string">high</encodingComplexity>
    <pluginName type="string">rn-audiocodec-realaudio</pluginName>
    <streamContext type="bag">
        <audioMode type="string">voice</audioMode>
        <presentationType type="string">audio-only</presentationType>
    </streamContext>
</stream>
B
WELL FORMED XML
DOCUMENTS
XML Declaration
<?xml version="version_number" encoding="encoding_declaration"
standalone="standalone status"?>
The version attribute is the version of the XML standard that this document
complies with. The encoding attribute names the character encoding used by
this document (for example, UTF-8). Using a Unicode encoding, you can
create documents in any language or character set. The standalone attribute
specifies whether the document is dependent on other files
(standalone="no") or complete by itself (standalone="yes").
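A parser reports the three declaration attributes to the application before it reads anything else. The Python sketch below uses the standard library's expat binding to capture them; the helper name read_declaration and the sample document are assumptions for illustration. In this binding the standalone flag arrives as 1 for "yes", 0 for "no", and -1 when the attribute is omitted.

```python
from xml.parsers import expat

def read_declaration(document):
    """Return the version, encoding, and standalone flag of an XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        found.update(version=version, encoding=encoding, standalone=standalone)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(document, True)  # True marks this as the final chunk of input
    return found

sample = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><note/>'
print(read_declaration(sample))
```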
The rule component defines the rule for the content contained in the
element. These rules define the logical structure of the XML document and
can be used to check the document’s validity. The rule can consist of a generic
declaration and one or more elements, either grouped or unordered.
●● The Predefined Content Declarations
Three generic content declarations are predefined for XML DTDs:
PCDATA, ANY, and EMPTY.
1. PCDATA: The PCDATA declaration can be used when the content
within an element is only text, that is, when the content contains no child
elements. Our sample document snippet contains several such elements,
including title, a, h1, and b. These elements can be declared as follows
(the pound sign identifies a special predefined name).
<!ELEMENT title (#PCDATA)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT h1 (#PCDATA)>
<!ELEMENT b (#PCDATA)>
2. ANY: An element declared with ANY can include both text content and
child elements. The html element, for example, could use the ANY
declaration as follows:
<!ELEMENT html ANY>
Marker   Meaning
?        The element either does not appear or can appear only once (0 or 1).
+        The element must appear at least once (1 or more).
*        The element can appear any number of times, or it might not
         appear at all (0 or more).
Putting no marker after the child element indicates that the element
must be included and that it can appear only one time. The head element
contains an optional base child element. Here are some sample
markers.
(2) You can separate the elements by a comma (,) or with a pipe (|).
If you use the pipe, it indicates that one or the other child ele-
ment will be included, but not both. The latest sample defines
an unsorted set of child elements.
Entities
Entities are like macros in the C programming language in that they allow
you to associate a string of characters with a name. This name can then be
used in either the DTD or the XML document; the XML parser will replace
the name with the string of characters. All entities consist of three parts: the
word ENTITY, the name of the entity (called the literal entity value), and
the replacement text—that is, the string of characters that the literal entity
value will be replaced with. All entities are declared in either an internal or
an external DTD.
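The macro-like replacement can be observed with a small internal DTD. In the Python sketch below, the entity name publisher and its replacement text are made up for illustration; the standard library parser expands the reference while it reads the document.

```python
import xml.etree.ElementTree as ET

# An internal DTD that declares one general entity, then a document that uses it.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY publisher "Example Press">
]>
<note>Published by &publisher;.</note>"""

note = ET.fromstring(DOCUMENT)
print(note.text)
```

The application never sees the reference &publisher; itself, only the replacement text. Since expansion happens automatically, entity declarations in documents from untrusted sources deserve caution.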
C
XML OVERVIEW
●● The document must have only one top-level element. This element is
called the root element.
●● Every element must have both a start and an end tag.
●● All attributes must have values, and those values must be quoted.
●● Elements must not overlap. You cannot use <a><b></a></b>, because
the ending </a> tag comes before </b>.
●● You must convert &, <, and > to their entity equivalents (&amp;, &lt;,
and &gt;). In PHP, you can use htmlentities() to do this.
●● When a document meets these rules, it’s well-formed XML.
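The escaping rule in the list above is easy to get wrong by hand, so most languages ship a helper for it. As a Python counterpart to the PHP function mentioned above, the sketch below uses the standard library's xml.sax.saxutils.escape; the sample strings are made up for illustration.

```python
from xml.sax.saxutils import escape

# Element content: &, <, and > are replaced by their entity equivalents.
content = escape("Fish & Chips <large>")
print(content)

# Attribute values also need the quote character escaped; escape() accepts
# a dictionary of extra replacements for that purpose.
attribute = escape('say "hi" & wave', {'"': "&quot;"})
print('<note text="%s">%s</note>' % (attribute, content))
```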
C.3 Schemas
When you validate HTML, your file is checked not only to see if it’s well-
formed, but also that your markup corresponds to the specification. While
your application parses XML instead of HTML, it also expects data in a cer-
tain format. When it gets anything else, it can’t work correctly.
Therefore, it’s beneficial to create a data specification, or schema, that
outlines the layout of the XML document your program requires. This allows
you to check the input XML file against a specification to see if the XML is not
only well-formed, but also valid. There are three different schema formats:
DTDs, XML Schema, and RELAX NG.
DTD: DTDs, short for Document Type Definitions, are the old way to
write a schema. They come from SGML and have a more limited syntax than
other formats. They’re not written in XML, so they can be difficult to read.
Try to avoid DTDs when you can.
XML Schema: The XML schema is the W3C-approved document specification
format. XML schemas are written in XML, so your XML parser can
also validate the schema.
C.4 Transformations
One of XML’s great advantages is that you can easily manipulate an XML
document into another format. It could be HTML, PDF, or even another
XML document. For instance, you could create an RSS feed for the articles
in your XML-based CMS.
XSLT, short for Extensible Stylesheet Language Transformations, is a
W3C-defined language for modifying XML documents. With XSLT, you can
create templates (written, of course, in XML) that act as a series of instruc-
tions for how an XML document provided as input should end up as output.
C.5.2 Syntax
In XML, a namespace name is a string that looks like a URL, for example,
http://www.example.org/namespace/. This URL doesn’t have to resolve to an
actual Web page that contains information about the namespace, but it can. A
namespace is not a URL, but a string that is formatted the same way as a URL.
This URL-based naming scheme is just a way for people to easily create
unique namespaces. Therefore, it’s best only to create namespaces that point
to a URL that you control. If everyone does this, there won’t be any name-
space conflicts. Technically, you can create a namespace that points at a
location you don’t own or use in any way, such as http://www.yahoo.com. This
is not invalid, but it is confusing.
Unlike domain names, there’s no official registration process required
before you can use a new XML namespace. All you need to do is define the
namespace inside an XML document. That “creates” the namespace. To do
this, add an xmlns attribute to an XML element. For instance:
<tag xmlns:example="http://www.example.com/namespace/">
When an attribute name begins with the string xmlns, you’re defining a
namespace. The namespace’s name is the value of that attribute. In this case,
it’s http://www.example.com/namespace/.
C.5.4 Examples
Example: This code snippet updates the address book from Example C-1
and places all the elements inside the http://www.example.com/address-book/
namespace.
Example: Simple address book in a namespace
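<ab:address-book xmlns:ab="http://www.example.com/address-book/">
  <ab:person id="1">
    <ab:firstname>Rasmus</ab:firstname>
    <ab:lastname>Lerdorf</ab:lastname>
    <ab:city>Sunnyvale</ab:city>
    <ab:state>CA</ab:state>
    <ab:email>rasmus@php.net</ab:email>
  </ab:person>
  <!-- more entries here -->
</ab:address-book>
The same address book with a different prefix:
<bigbird:address-book xmlns:bigbird="http://www.example.com/address-book/">
  <bigbird:person id="1">
    <bigbird:firstname>Rasmus</bigbird:firstname>
    <bigbird:lastname>Lerdorf</bigbird:lastname>
    <bigbird:city>Sunnyvale</bigbird:city>
    <bigbird:state>CA</bigbird:state>
    <bigbird:email>rasmus@php.net</bigbird:email>
  </bigbird:person>
  <!-- more entries here -->
</bigbird:address-book>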
The ab prefix has been changed to bigbird, but the namespace is still
http://www.example.com/address-book/. Therefore, an XML parser would
treat these documents as if they were the same.
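This equivalence is easy to confirm with a namespace-aware parser. In the Python sketch below, two shortened address books differ only in their prefix; the standard library parser replaces each prefix with the namespace name, so the resulting element names are identical.

```python
import xml.etree.ElementTree as ET

NAMESPACE = "http://www.example.com/address-book/"

WITH_AB = (
    '<ab:address-book xmlns:ab="%s">'
    '<ab:person id="1"><ab:city>Sunnyvale</ab:city></ab:person>'
    '</ab:address-book>' % NAMESPACE
)
WITH_BIGBIRD = (
    '<bigbird:address-book xmlns:bigbird="%s">'
    '<bigbird:person id="1"><bigbird:city>Sunnyvale</bigbird:city></bigbird:person>'
    '</bigbird:address-book>' % NAMESPACE
)

first = ET.fromstring(WITH_AB)
second = ET.fromstring(WITH_BIGBIRD)

# The parser stores names as {namespace}local-name, so the prefix disappears.
print(first.tag)
print(first.tag == second.tag)
print([child.tag for child in first.iter()] == [child.tag for child in second.iter()])
```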
C.6 XPath
XPath is a W3C standard (http://www.w3.org/TR/xpath) for locating portions
of an XML document that match a set of criteria. Use XPath to find the names
of all the people in your XML address book who live in New York, all the
URLs for articles written on PHP in a Meerkat RSS feed, or the most recent
entry into your XML-based content management system.
Think of XPath as SQL for XML documents. You can do all kinds of
advanced queries using XPath, such as finding items with a certain parent,
attribute, or location in the tree. XSLT uses XPath expressions to select
nodes, so you might be familiar with parts of it, even if you’re not an XPath
expert.
There are two parts to an XPath query: the portion of the XML document
you wish to retrieve and the restrictions you want to place upon your query.
This is analogous to SQL SELECT and WHERE clauses.
For example, you can search the XML address book in Example C-1 for
all the email addresses:
/address-book/person/email
The text inside square brackets refines the XPath query.
[city = "New York" and state = "NY"] restricts the search to entries where
the city element under person is New York and the state is NY. To check
attributes instead of elements, prepend an @:
/address-book/person[@id = "1"]
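These queries can be tried out with Python's standard library, which supports a subset of XPath. The address book below is a made-up stand-in, and its second entry is hypothetical. Two differences from full XPath are worth noting in this sketch: paths are written relative to the root element that the parser returns, and the subset has no "and" operator, so the two conditions are written as two consecutive predicates.

```python
import xml.etree.ElementTree as ET

ADDRESS_BOOK = """
<address-book>
  <person id="1">
    <firstname>Rasmus</firstname>
    <city>Sunnyvale</city>
    <state>CA</state>
    <email>rasmus@php.net</email>
  </person>
  <person id="2">
    <firstname>Ada</firstname>
    <city>New York</city>
    <state>NY</state>
    <email>ada@example.com</email>
  </person>
</address-book>
"""

book = ET.fromstring(ADDRESS_BOOK)  # book is the <address-book> root element

# All email addresses (the equivalent of /address-book/person/email).
emails = [email.text for email in book.findall("./person/email")]

# People whose city is New York and whose state is NY.
new_yorkers = book.findall("./person[city='New York'][state='NY']")

# The person whose id attribute is 1.
first_person = book.findall("./person[@id='1']")

print(emails)
print([person.findtext("firstname") for person in new_yorkers])
print([person.findtext("firstname") for person in first_person])
```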
Child Element
A child element sits inside of the parent and further itemizes the tags within
the file. In an inventory listing of lamps, the element tag “desktop,” might be
a child element of both “lamps” and “inventory.”
<inventory> - root/parent element
<lamps> - parent element
<desktop> - child element
Comments
Comments are data strings not meant to be seen by those visiting a Web
page. Comments are intended to define or explain a coding section to anyone
who must update or review the XML file.
<!--this is a comment -->
CDATA
Character data. CDATA is text in a document that should not be parsed by the
XML parser. Any entities included in the CDATA block will not be replaced
by their value and markup (such as HTML tags) will not be treated as markup.
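A short Python sketch shows the effect: the markup and the entity reference inside the CDATA section reach the application as literal text. The element name example and its content are made up for illustration.

```python
import xml.etree.ElementTree as ET

snippet = ET.fromstring(
    "<example><![CDATA[<b>bold</b> &amp; more]]></example>"
)

# The <b> tag is not parsed into a child element, and &amp; is not replaced.
print(snippet.text)
print(len(snippet))  # number of child elements
```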
Character
A character is an atomic unit of text as specified by ISO/IEC 10646. A
character is a single letter, digit, or punctuation mark.
Character Set
A mapping of a set of characters to their numeric values. For example, Unicode
is a character set that assigns a numeric code point to the characters of the
world’s writing systems; it is used as a worldwide character-encoding standard.
Component
An object that encapsulates both data and code, and provides a well-specified
set of publicly available services.
Content
Content is all data between the start tag and end tag of an element. Content
may be made up of markup characters and character data.
Content Model
The content model in XML is the expression specifying what elements and
data are allowed within an element.
Declaration Statement
The declaration statement gives the processor the information it needs to
recognize the language and syntax of the file. When present, it must be the
first line of the XML document. It identifies the language, states the version,
specifies the encoding, and declares the standalone status of the file. Only
the language definition and version are required for a declaration statement.
Encoding and standalone are optional attributes.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
Data Strings
A data string is the information you want the viewer to see. For example,
a description of an inventory item would be a data string. Data strings sit
between the opening and closing tags of an element.
<description> - element tag
Data Island
Data islands are a proposed format for putting XML-based data inside HTML
pages (<XML> or <SCRIPT language=“XML”>). HTML is used as the pri-
mary document or display format, and XML is used to embed data within the
document.
Data Type
The type of content that an element contains such as a number or a date. In
XML, an author can specify an element’s data type.
Delimiter
A delimiter is a special character that marks the beginning and end of a string
or text field.
Document Element
Document element is the top-level element of an XML document. Only one
top-level element is allowed. The document element is a child of the docu-
ment root.
Document Root
Document root is the top-level node of an XML document. Its descendants
branch out from it to form the XML tree for that document. The document
root contains the document element and can also contain a set of processing
instructions and comments.
ECMA Script
ECMA Script is ECMA’s scripting-language specification (based on JavaScript).
ECMA is an international, Europe-based industry association founded in
1961 and dedicated to the standardization of information and communication
systems.
Empty Declaration
Empty declaration in XML is the DTD declaration for an empty tag. For exam-
ple, if <xyz/> is an empty tag, the empty declaration looks like: <!ELEMENT
xyz EMPTY>.
Empty Element
Not all elements have content. Those elements that do not have content are
empty elements and in XML may be noted with a special empty element
tag that ends with a slash directly preceding the closing angle bracket of the
tag, so an XML parser can immediately recognize it as an empty tag and not
bother looking for a matching end tag. If “xyz” is an empty tag, it looks like
<xyz/>.
Entity
Entity in XML is a virtual storage unit. It is often a separate file, but may be
a string or even a database record. In XML, an entity declaration provides
the ability to have constants or replacement strings, which are expanded by a
pre-processor. An entity declaration maps some token to a replacement string.
Later the token can be prefixed with the & character and the replacement
string is put in its place. An entity is an XML structural construct. It is a
character sequence or well-formed XML hierarchy associated with a name. The
entity can be referred to by an entity reference to insert the entity’s contents
into the tree at that point. The function of an XML entity is similar to that of
a macro definition. Entity declarations occur in the DTD.
Entity Reference
XML structural construct. Refers to the content of a named entity. The name
is delimited by the ampersand and semicolon characters; for example,
&bookname; and &lt;. It is used in much the same way as a macro.
Event Handler
The code that is executed when an event occurs.
Element Tags
Element tags are created by the author and establish a hierarchical syntax
to the code. When designing elements for an XML information file, supply
names to tags that are recognizable and easily managed. For example, when
creating an inventory file, you might use key names, such as “table” to supply
structure to the code. Within the element “table” you might list more tags
that further identify the inventory, such as “desktop” or “floor.” The simplicity
of XML lies in this process of naming the element tags. XML does not have
static tags that you must memorize in order to write valid code. All element
tags must have closing tags.
<table> - element tag
</table> - closing tag
Generic Identifier
The generic identifier, often called the “GI,” is the XML tag name. So <head>
has a generic identifier equal to “head.” A generic identifier is unique in its
namespace.
Grammar
The syntax of a language. It is expressed formally by a set of production rules,
such as the EBNF rules.
Granular Updating
Changing only an element of a page, rather than rebuilding the entire page.
The new element is sent from the server to the client, which replaces the old
element while leaving the rest of the page intact.
Graphing
A very generalized way to represent certain data relationships.
deliver simple documents over the Web, its simplicity imposes limitations
that significantly raise the cost of deploying complex Websites. Currently, ver-
sion HTML 4.0 is the official W3C Recommendation, but many authors and
browsers are still using HTML 3.2.
ID
A special attribute type within the XML language. The ID attribute on the
XML element provides a unique name, enabling links to that element using
the IDREF attribute type. The value associated with the ID attribute must be
unique within that XML document. IDs are currently declared with a DTD
or schema.
Java Script
Java Script is Netscape’s scripting language; current version is 1.2.
JScript
JScript is Microsoft’s scripting language derived from JavaScript.
Markup
Markup is a text character that identifies the storage and logical structures of
the data. Tags and entities are markup characters of an XML document. It is
the text in an XML document that does not represent character data: start
tags, end tags, empty-element tags, entity references, character references,
comments, CDATA section delimiters, DTDs, and processing instructions.
Metadata
Metadata is generally machine-understandable information about data,
specifically data describing Web resources.
Mixed Content
An element type has mixed content when elements of that type can contain
character data, optionally interspersed with child elements. In this case, the
types of the child elements can be constrained, but not their order or their
number of occurrences.
Namespace
A namespace is a set of unique identifiers. It is a mechanism to resolve naming
conflicts between elements in an XML document when each comes from a
different vocabulary. It allows the commingling of like tag names from differ-
ent namespaces. A namespace identifies an XML vocabulary defined within a
URN. An attribute on an element, attribute, or entity reference associates a
short name with the URN that defines the namespace; that short name is then
used as a prefix to the element, attribute, or entity reference name to uniquely
identify the namespace. Namespace references have scope. All child nodes
beneath the node that specifies the namespace inherit that namespace. This
allows nonqualified names to use the default namespace.
NDATA
The literal string “NDATA” is used as part of a notation declaration.
Normalize
To collapse two or more adjacent text nodes in the document tree into one
text node. This ensures that the tree structure will match the tree structure
generated when the document is stored and reloaded. The element object
offers a normalize method.
Notation
Usually refers to a data format, such as BMP. A notation identifies by name
the format of unparsed entities, the format of elements that bear a notation
attribute, or the application to which a processing instruction is addressed.
Notation Declaration
A notation declaration provides a name and an external identifier for a nota-
tion. The name is used in the entity and attribute-list declarations and in
attribute specifications.
The external identifier is used for the notation, which can allow an XML
processor or its client application to locate a helper application capable of
processing data in the given notation.
Parent Element
A parent element holds other related element tags. For example, a file that
lists inventory might have a parent tag called lamps and contain tags that list
the individual lamps available in the product line. The root element is the
parent tag for all other elements in the XML file.
<inventory> - parent element
<lamps> - element tag
Prolog
An XML prolog consists of a declaration of the version of XML being used as
well as the DTD that the document will validate against.
PI (Processing Instruction)
PIs are instructions that are passed through to the application. The target is
specified as part of the PI. The syntax for a PI is <?pi-name content?>.
RDF Namespace
RDF namespace is a specialized XML syntax designed to provide a limited
form of RDF on the Web.
Reference Node
The reference node for a search context is the node that is the immediate par-
ent of all nodes in the search context. Every search context has an associated
reference node.
Root Element
The root element is the first named tag of every XML file and is a container
for all other elements.
<inventory> - root element
<lamps> - element tag
SOAP
SOAP is an acronym that stands for Simple Object Access Protocol.
SOAP is an XML-based protocol that allows you to activate an applica-
tion or object within an application across the Internet. SOAP is used for
distributed computing and Internet applications. It was developed by a group
of vendors, including Microsoft, to revolutionize how Web applications are
developed.
Outside of Web development, SOAP stands for Symbolic Optimal Assem-
bly Programming.
SAX
SAX stands for Simple API for XML. It is an event-driven method of processing
an XML file.
Instead of holding the entire hierarchy in memory at one time, a SAX parser
presents elements as events that your code can handle as they occur. SAX has
the advantage of lower memory consumption for large files, but has the
disadvantage that the programmer must write code to save anything that needs
to be kept and must write changes to the XML file in sequential order. DOM
allows random changes to elements. Because it does not need to keep an entire
file in memory at once, SAX remains practical for very large XML files, whereas
DOM often does not.
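The event-driven style can be sketched with Python's standard xml.sax module. The handler class LampCounter and the sample document are hypothetical; the point is that the code only sees one event at a time and must store for itself whatever it wants to keep.

```python
import xml.sax


class LampCounter(xml.sax.ContentHandler):
    """Hypothetical handler: records events and counts <lamp> elements."""

    def __init__(self):
        super().__init__()
        self.count = 0
        self.events = []

    def startElement(self, name, attrs):
        # Called when the parser reaches an opening tag.
        self.events.append(("start", name))
        if name == "lamp":
            self.count += 1

    def endElement(self, name):
        # Called when the parser reaches a closing tag.
        self.events.append(("end", name))


DOC = b"<inventory><lamps><lamp/><lamp/></lamps></inventory>"

handler = LampCounter()
xml.sax.parseString(DOC, handler)
```

Nothing from the document is retained except the counter and the event list that the handler chose to build, which is why memory use stays small even when the input is large.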
Tags
Tags are text structures that mark the beginning and end of elements within
the XML document. Tags are markup characters.
Target
The application to which a processing instruction is directed. Target names
beginning with “XML” or “xml” are reserved. The target appears as the first
token in the PI. For example, in the XML declaration <?xml version="1.0"?>,
the target is “xml.”
Text Markup
Inserting tags into the middle of an element’s text flow to mark certain parts
of the element with additional meta-information.
Unicode
Unicode is a standard for representing characters from languages around the
world. Unicode standards are synchronized with the UCS-2 subset of ISO 10646.
Updategram
XML generated by agents to notify the client of changes to data on the server,
or vice versa; the agents could run on the middle tier to access multiple exist-
ing database management systems (DBMSs) and output XML.
Valid
An XML document is valid if it conforms to the vocabulary specified in a
DTD or schema. In other words, an XML document with an associated docu-
ment type declaration that follows all the rules of that declaration is valid.
Well Formed
A well-formed XML document follows all the rules of the XML specifica-
tion but is not necessarily valid according to an associated document type
declaration. A well-formed XML document contains one or more elements;
it has a single document element, with any other elements properly nested
under it; each of the parsed entities referenced directly or indirectly within
the document is well-formed. A well-formed XML document does not neces-
sarily include a DTD.
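A well-formedness check can be sketched with Python's standard ElementTree module, which rejects any input that breaks the rules above. The helper name is_well_formed is an assumption for this example, and the check does not test validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for unclosed tags, bad nesting, multiple roots, and so on.
        return False
    return True


properly_nested = is_well_formed("<inventory><lamps></lamps></inventory>")
badly_nested = is_well_formed("<inventory><lamps></inventory></lamps>")
two_roots = is_well_formed("<inventory/><lamps/>")
```

The second input fails because the tags overlap instead of nesting, and the third fails because a document may have only one document element.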
Linking could be multidirectional, and links could exist at the object level
rather than just at a page level.
XML Aware
Any software application that recognizes the XML data format and under-
stands XML concepts. Often XML aware software contains an embedded
XML parser.
XML Data
XML data is a proposal, submitted by Microsoft and others to the W3C, to
define a number of common scalar data types that can be applied to elements.
The XML-Data proposal includes the concept of XML schemas.
XML Declaration
An XML declaration is an optional declaration at the top of an XML docu-
ment that specifies the version of XML and an encoding declaration. The first
line of an XML file can optionally contain the “xml” processing instruction,
which is known as the XML declaration. The XML declaration can contain
pseudo-attributes to indicate the XML language version, the character set,
and whether the document can be used as a standalone entity.
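The three pseudo-attributes can be read back with Python's standard expat binding, as in the sketch below. The function name read_xml_declaration and the sample document are assumptions for illustration.

```python
import xml.parsers.expat


def read_xml_declaration(xml_bytes):
    """Return the pseudo-attributes of the XML declaration, if one exists."""
    info = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_declaration(version, encoding, standalone):
        info["version"] = version
        info["encoding"] = encoding
        # expat reports 1 for "yes", 0 for "no", and -1 when omitted.
        info["standalone"] = {1: "yes", 0: "no"}.get(standalone)

    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_bytes, True)
    return info


decl = read_xml_declaration(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><inventory/>'
)
no_decl = read_xml_declaration(b"<inventory/>")
```

Because the declaration is optional, a document without one simply produces an empty result here.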
XML Document
A data object that is well-formed, according to the XML recommendation,
and that might (or might not) be valid. The XML document has a logical
structure (composed of declarations, elements, comments, character refer-
ences, and processing instructions) and a physical structure (composed of
entities, starting with the root, or document entity).
XML Engine
Software that supports XML functionality on the client; Internet Explorer 4.0
and Internet Explorer 5 include XML engines.
XML Vocabulary
An XML vocabulary is an XML tag set with a specific functionality, that is,
the actual elements used in a particular data format. SMIL, WIDL, MathML, and
ICE are all examples of XML vocabularies. Channel Definition Format, for
example, is a vocabulary for describing collections of pages and when these
pages should be downloaded. Vocabularies, along with the structural
relationships between the elements, can be defined in a DTD or a schema.
XSL Pattern
Part of XSL that provides simple querying capability against an XML docu-
ment. Internet Explorer 5 supports XSL Patterns with some of the extensions
described in XML Query Language.
XML DOM
XML DOM provides a standard for structure and navigational properties of
all XML files.
XPath
XPath is a language that provides navigation through XML using path
expressions.
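Path expressions can be tried out with Python's standard ElementTree module, which supports a limited subset of XPath rather than the full language. The inventory data and the type attribute below are made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample data for illustration only.
DOC = """<inventory>
  <lamps>
    <lamp type="desk"><name>Arc</name></lamp>
    <lamp type="floor"><name>Tripod</name></lamp>
    <lamp type="desk"><name>Banker</name></lamp>
  </lamps>
</inventory>"""

root = ET.fromstring(DOC)

# A path of child steps, starting from the root element.
all_names = [n.text for n in root.findall("./lamps/lamp/name")]

# A descendant search with an attribute predicate.
desk_names = [n.text for n in root.findall(".//lamp[@type='desk']/name")]
```

Each step in the path narrows the selection, much like directories in a file path.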
XQuery
XQuery is a language that provides a way to search and extract elements and
attributes within an XML document.
XML Schema
Schemas are XML documents that define the permitted structure and data types
of a linked XML file so that the file can be validated.
XForms
XForms provides a way to display forms within XML to create interactive
pages.
XML Editor
An XML editor is a software application that facilitates coding in the XML
markup language. There are many levels of XML editors. Some programmers
prefer a basic text editor, such as Notepad, to write XML documents. When
creating a platform that utilizes XML for Web design, a savvy author will look
to more advanced XML editors, such as Oxygen XML Editor. This package
provides not only text editing for the core data file, but also XSL formatting
and HTML output. With the right editor, a designer can organize the data and
create the page all in one place.
XML Validator
XML is easy to create, but its rules are strict. The syntax must be followed
before a document can be processed and published on the Internet. A validator
examines an XML document and confirms that all tags are closed and properly
nested; when a DTD or schema is associated with the document, it also checks
the document against those rules. XML makes demands on the designer. It
requires structure and proper format. Unlike HTML, elements that lack closing
tags or are misplaced in the hierarchy generate an error. A validator looks
closely at the file and helps the author produce well-formed XML. It is a
valuable tool for novices and veterans alike.
XML DOM
The Document Object Model (DOM) is the interface that defines how the data in
a document is accessed. It allows programmers to create dynamic content that
displays in the same basic way in any browser. The DOM is a standard interface
that enables all languages to work with documents in the same way. It is not a
language on its own, but a mechanism through which programming languages read
and change a document. Structured documents are represented as a tree within
the DOM. Once a parser has built that tree, programs use the DOM to identify
and process any part of a file, and to locate and move information. The DOM
supplies a common method for the ever-growing list of browsers to read and
process code.
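As a small sketch of DOM-style access, the example below uses Python's standard minidom module to load a document into a tree, locate an element, and insert a new one ahead of an existing one. The inventory data is hypothetical.

```python
from xml.dom import minidom

# Sample data for illustration only.
DOC = "<inventory><lamps><lamp>Desk lamp</lamp></lamps></inventory>"

# The parser builds the whole document tree in memory.
doc = minidom.parseString(DOC)

# Locate a node anywhere in the tree.
lamps = doc.getElementsByTagName("lamps")[0]

# Create a new element and place it before the existing first child.
new_lamp = doc.createElement("lamp")
new_lamp.appendChild(doc.createTextNode("Floor lamp"))
lamps.insertBefore(new_lamp, lamps.firstChild)

names = [lamp.firstChild.data for lamp in doc.getElementsByTagName("lamp")]
serialized = doc.documentElement.toxml()
```

Because the entire tree is held in memory, changes can be made at any position and in any order before the document is written back out.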
XML Parser
An XML parser is a module that reads the code and converts it into the XML
DOM. From this point, the file can be manipulated into presentable form.
Without a parser, computers would not understand the meaning of the files.
A parser reads the code within the XML file, determines whether it is
well-formed, and then makes its structure available to the application. You
cannot display XML information on a Web page without a parser to read it.