Chapter 10: XML
Introduction
XML: Extensible Markup Language
Defined by the WWW Consortium (W3C)
Originally intended as a document markup language, not a
database language
Much of the use of XML has been in data exchange applications, not as a
replacement for HTML
XML: Motivation
Data interchange is critical in today's networked world
Examples:
Banking: funds transfer
Order processing (especially inter-company orders)
Scientific data
Chemistry: ChemML,
Genetics:
representing information
XML has become the basis for all new generation data
interchange formats
XML documents/data
Proper nesting
Improper nesting
Formally: every start tag must have a unique matching end tag, that
is in the context of the same parent element.
Every document must have a single top-level element
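The nesting and single-root rules above can be checked with any XML parser. A minimal sketch using Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the original material:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Properly nested: balance opens and closes inside account.
proper = "<account><balance>500</balance></account>"
# Improperly nested: account is closed before its child balance.
improper = "<account><balance>500</account></balance>"
# Two top-level elements: violates the single-root rule.
two_roots = "<account/><customer/>"

for sample in (proper, improper, two_roots):
    print(sample, "->", is_well_formed(sample))
```

Only the first string is accepted; the parser rejects the other two before any application code sees the data.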
Attributes
Elements can have attributes
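Attributes are specified by name=value pairs inside the start tag of an element
An element with no subelements or text content can be abbreviated
by ending the start tag with a /> and deleting the end tag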
<![CDATA[<account> </account>]]>
Here, <account> and </account> are treated as just strings
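To see how a parser treats attributes, abbreviated empty elements, and CDATA sections, here is a small sketch with Python's xml.etree.ElementTree. The sample document, including the note element and the account number A-102, is made up for illustration:

```python
import xml.etree.ElementTree as ET

doc = """<bank>
  <account acct-type="checking">
    <account-number>A-102</account-number>
    <note><![CDATA[<account> </account>]]></note>
  </account>
  <account acct-type="savings"/>
</bank>"""

root = ET.fromstring(doc)
accounts = root.findall("account")

# Attributes are name=value pairs on the start tag.
acct_types = [a.get("acct-type") for a in accounts]

# The CDATA content arrives as plain text, not as a nested element.
note = root.find("account/note")
note_text = note.text
note_children = len(note)

# The abbreviated <account .../> element has no children and no text.
empty_children = len(accounts[1])

print(acct_types, repr(note_text), note_children, empty_children)
```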
Namespaces
XML data has to be exchanged between organizations
The same tag name may have different meanings in different
organizations; namespaces qualify tag names to avoid such clashes
<bank xmlns:FB="http://www.FirstBank.com">
  <FB:branch>
    <FB:branchname>Downtown</FB:branchname>
    <FB:branchcity>Brooklyn</FB:branchcity>
  </FB:branch>
</bank>
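A parser resolves each prefixed name to a (namespace URI, local name) pair. The sketch below uses Python's xml.etree.ElementTree on the FirstBank example. The prefix fb used in the lookup is an arbitrary choice for this sketch; only the URI has to match the document:

```python
import xml.etree.ElementTree as ET

doc = """<bank xmlns:FB="http://www.FirstBank.com">
  <FB:branch>
    <FB:branchname>Downtown</FB:branchname>
    <FB:branchcity>Brooklyn</FB:branchcity>
  </FB:branch>
</bank>"""

root = ET.fromstring(doc)

# Map a local prefix to the namespace URI for use in path lookups.
ns = {"fb": "http://www.FirstBank.com"}

branch = root.find("fb:branch", ns)
branch_name = branch.find("fb:branchname", ns).text

# Internally the tag is stored as {URI}localname.
print(branch.tag)
print(branch_name)
```

An unqualified lookup such as root.find("branch") returns None here, which is the point: a branch element from another organization's namespace would not be confused with this one.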
XML Schema
Newer, not yet widely used
| - alternatives
+ - 1 or more occurrences
* - 0 or more occurrences
Bank DTD
<!DOCTYPE bank [
  <!ELEMENT bank ((account | customer | depositor)+)>
  <!ELEMENT account (account-number, branch-name, balance)>
  <!ELEMENT customer (customer-name, customer-street, customer-city)>
  <!ELEMENT depositor (customer-name, account-number)>
  <!ELEMENT account-number (#PCDATA)>
  <!ELEMENT branch-name (#PCDATA)>
  <!ELEMENT balance (#PCDATA)>
  <!ELEMENT customer-name (#PCDATA)>
  <!ELEMENT customer-street (#PCDATA)>
  <!ELEMENT customer-city (#PCDATA)>
]>
Type of attribute
CDATA
ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)
more on this later
Whether
mandatory (#REQUIRED)
has a default value (value),
or neither (#IMPLIED)
Examples
<!ATTLIST account acct-type CDATA "checking">
<!ATTLIST customer
  customer-id ID     #REQUIRED
  accounts    IDREFS #REQUIRED>
The ID attribute values of all elements in a document must be distinct
<!DOCTYPE bank-2 [
  <!ELEMENT account (branch, balance)>
  <!ATTLIST account
    account-number ID     #REQUIRED
    owners         IDREFS #REQUIRED>
  <!ELEMENT customer (customer-name, customer-street, customer-city)>
  <!ATTLIST customer
    customer-id ID     #REQUIRED
    accounts    IDREFS #REQUIRED>
  <!-- declarations for branch, balance, customer-name,
       customer-street and customer-city -->
]>
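ID and IDREFS attributes act like keys and references between elements. The following sketch shows one way an application might resolve them by hand in Python. It assumes a hypothetical bank-2 instance (the values A-401, C100, C102, Joe and Mary are invented), and it hard-codes which attributes are IDs because xml.etree.ElementTree does not read the DTD:

```python
import xml.etree.ElementTree as ET

doc = """<bank-2>
  <account account-number="A-401" owners="C100 C102">
    <branch>Downtown</branch><balance>500</balance>
  </account>
  <customer customer-id="C100" accounts="A-401">
    <customer-name>Joe</customer-name>
  </customer>
  <customer customer-id="C102" accounts="A-401">
    <customer-name>Mary</customer-name>
  </customer>
</bank-2>"""

root = ET.fromstring(doc)

# Assumption: the ID attribute of each element type, copied from the DTD.
ID_ATTRS = {"account": "account-number", "customer": "customer-id"}

# Index every element by its ID value, enforcing distinctness.
by_id = {}
for elem in root:
    id_value = elem.get(ID_ATTRS[elem.tag])
    if id_value in by_id:
        raise ValueError(f"duplicate ID value: {id_value}")
    by_id[id_value] = elem

def deref(idrefs: str):
    """Resolve a whitespace-separated IDREFS value to elements."""
    return [by_id[ref] for ref in idrefs.split()]

account = by_id["A-401"]
owner_names = [c.findtext("customer-name")
               for c in deref(account.get("owners"))]
print(owner_names)
```

Nothing in this lookup prevents owners from pointing at another account, since every ID shares one dictionary; the check would have to be added by the application.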
Limitations of DTDs
No typing of text elements and attributes
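IDs and IDREFs are untyped: an IDREFS attribute such as owners cannot be
restricted to refer only to customer elements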
XML Schema
XML Schema is a more sophisticated schema language which
addresses the drawbacks of DTDs, such as the lack of typing, but is not yet as widely used.
XSLT
Simple language designed for translation from XML to XML and
XML to HTML
XQuery
An XML query language with a rich set of features
Wide variety of other languages have been proposed, and some
XML data
XPath
XPath is used to address (select) parts of documents using
path expressions
E.g. /bank-2/customer/name
returns the name elements of all customers, including the enclosing tags
E.g. /bank-2/customer/name/text()
returns the same names, but without the enclosing tags
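The difference between selecting elements and selecting their text can be tried out in Python. xml.etree.ElementTree supports only a subset of XPath (relative paths, no text() step), so this sketch starts from the bank-2 root element and reads the .text property instead. The sample data is invented:

```python
import xml.etree.ElementTree as ET

doc = """<bank-2>
  <customer><name>Joe</name></customer>
  <customer><name>Mary</name></customer>
</bank-2>"""

root = ET.fromstring(doc)  # root is the bank-2 element

# Counterpart of /bank-2/customer/name: the name elements themselves.
name_elems = root.findall("customer/name")
with_tags = [ET.tostring(e, encoding="unicode").strip() for e in name_elems]

# Counterpart of /bank-2/customer/name/text(): only the text content.
without_tags = [e.text for e in name_elems]

print(with_tags)
print(without_tags)
```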
XPath (Cont.)
The initial / denotes root of the document (above the top-level tag)
Path expressions are evaluated left to right
Each step operates on the set of instances produced by the previous step
Selection predicates may follow any step in a path, in [ ]
E.g. /bank-2/account[balance]
returns account elements containing a balance subelement
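Predicates filter the set produced by a step. A small Python sketch, under the assumption of an invented bank-2 sample: xml.etree.ElementTree understands the existence predicate [balance] directly, while a numeric comparison such as balance > 400 is outside its XPath subset and is done in ordinary Python here:

```python
import xml.etree.ElementTree as ET

doc = """<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>300</balance></account>
  <account account-number="A-403"/>
</bank-2>"""

root = ET.fromstring(doc)

# account[balance]: accounts that contain a balance subelement.
has_balance = [a.get("account-number")
               for a in root.findall("account[balance]")]

# account[balance > 400]: comparison applied after the path step.
over_400 = [a.get("account-number")
            for a in root.findall("account[balance]")
            if float(a.findtext("balance")) > 400]

print(has_balance)
print(over_400)
```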
Functions in XPath
XPath provides several functions
The function count() at the end of a path counts the number of
elements in the set generated by the path
E.g. /bank-2/account[customer/count() > 2]
Boolean connectives and and or, and the function not(), can be used
in predicates
of account elements.
E.g. /bank-2//name
finds any name element anywhere under the /bank-2 element,
regardless of the element in which it is contained
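Counting and descendant search can be sketched in Python as follows. The sample document is hypothetical and shaped to fit the path expressions above (customer elements nested inside account). Counting is done with len() because xml.etree.ElementTree has no count() function, while the descendant step // is written as .// relative to the root element:

```python
import xml.etree.ElementTree as ET

doc = """<bank-2>
  <account account-number="A-401">
    <customer><name>Joe</name></customer>
    <customer><name>Mary</name></customer>
    <customer><name>Ann</name></customer>
  </account>
  <account account-number="A-402">
    <customer><name>Lee</name></customer>
  </account>
</bank-2>"""

root = ET.fromstring(doc)

# Accounts with more than 2 customer subelements.
busy = [a.get("account-number")
        for a in root.findall("account")
        if len(a.findall("customer")) > 2]

# Counterpart of /bank-2//name: name elements at any depth.
all_names = [n.text for n in root.findall(".//name")]

print(busy)
print(all_names)
```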
XSLT
A stylesheet stores formatting options for a document, usually
separately from the document itself
E.g. HTML style sheet may specify font colors and sizes for
headings, etc.
The XML Stylesheet Language (XSL) was originally designed
for generating HTML from XML
XSLT transformations are expressed using rules called templates
XSLT Templates
Example of XSLT template with match and select part
<xsl:template match="/bank-2/customer">
  <xsl:value-of select="customer-name"/>
</xsl:template>
<xsl:template match="*"/>
The match attribute of xsl:template specifies a pattern in XPath
Elements in the XML document matching the pattern are processed
Any text or tag in the stylesheet that is not in the xsl
namespace is output as is
E.g. to wrap results in new XML elements.
<xsl:template match="/bank-2/customer">
  <customer>
    <xsl:value-of select="customer-name"/>
  </customer>
</xsl:template>
<xsl:template match="*"/>
Example output:
<customer> John </customer>
<customer> Mary </customer>
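Note: an xsl:value-of tag cannot be placed directly inside
another tag, e.g. to set an attribute value; xsl:attribute is used instead
xsl:element can be used to create output elements with
computed names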
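Python's standard library has no XSLT processor, so the following is only an emulation of what the wrapping template does, not an XSLT run. It assumes an invented bank-2 sample; the function name apply_templates is a hypothetical stand-in for the processor's rule dispatch:

```python
import xml.etree.ElementTree as ET

doc = """<bank-2>
  <account><balance>500</balance></account>
  <customer><customer-name>John</customer-name></customer>
  <customer><customer-name>Mary</customer-name></customer>
</bank-2>"""

def apply_templates(root):
    """Emulate the two rules: wrap customer names, drop everything else."""
    out = []
    for child in root:
        if child.tag == "customer":
            # Rule for /bank-2/customer: literal <customer> tag is output
            # as is, xsl:value-of contributes the customer-name text.
            wrapped = ET.Element("customer")
            wrapped.text = child.findtext("customer-name")
            out.append(ET.tostring(wrapped, encoding="unicode"))
        # Any other element falls under the empty "*" rule: no output.
    return out

output = apply_templates(ET.fromstring(doc))
print("\n".join(output))
```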
Structural Recursion
Action of a template can be to recursively apply templates to the
contents of a matched element
<xsl:template match="/bank">
  <customers>
    <xsl:apply-templates/>
  </customers>
</xsl:template>
<xsl:template match="customer">
  <customer>
    <xsl:value-of select="customer-name"/>
  </customer>
</xsl:template>
<xsl:template match="*"/>
Example output:
<customers>
<customer> John </customer>
<customer> Mary </customer>
</customers>
Joins in XSLT
XSLT keys allow elements to be looked up (indexed) by values of
subelements or attributes
Keys must be declared (with a name), and the key() function can then
be used for lookup. E.g.
<xsl:key name="acctno" match="account" use="account-number"/>
<xsl:value-of select="key('acctno', 'A-101')"/>
Keys permit (some) joins to be expressed in XSLT
<xsl:key name="acctno" match="account" use="account-number"/>
<xsl:key name="custno" match="customer" use="customer-name"/>
<xsl:template match="depositor">
  <cust-acct>
    <xsl:value-of select="key('custno', customer-name)"/>
    <xsl:value-of select="key('acctno', account-number)"/>
  </cust-acct>
</xsl:template>
<xsl:template match="*"/>
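The idea behind keys is an index from a value to the elements carrying it. A Python sketch of the same join, with dictionaries standing in for the two keys; it is not XSLT, the sample data is invented, and it assumes each key value identifies a single element (an XSLT key may map one value to several nodes):

```python
import xml.etree.ElementTree as ET

doc = """<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <customer><customer-name>John</customer-name><customer-city>Brooklyn</customer-city></customer>
  <depositor><customer-name>John</customer-name><account-number>A-101</account-number></depositor>
</bank>"""

root = ET.fromstring(doc)

def build_key(match: str, use: str) -> dict:
    """Index elements named `match` by the text of their `use` subelement."""
    return {e.findtext(use): e for e in root.iter(match)}

acctno = build_key("account", "account-number")
custno = build_key("customer", "customer-name")

# For each depositor, look up the matching customer and account.
pairs = []
for dep in root.iter("depositor"):
    cust = custno[dep.findtext("customer-name")]
    acct = acctno[dep.findtext("account-number")]
    pairs.append((cust.findtext("customer-city"), acct.findtext("balance")))

print(pairs)
```

Each lookup is a constant-time dictionary access, which is why key-based joins avoid rescanning the document for every depositor.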
Sorting in XSLT
Using an xsl:sort directive inside a template causes all elements
selected by the enclosing xsl:apply-templates to be processed in sorted order
<xsl:template match="/bank">
  <xsl:apply-templates select="customer">
    <xsl:sort select="customer-name"/>
  </xsl:apply-templates>
</xsl:template>
<xsl:template match="customer">
  <customer>
    <xsl:value-of select="customer-name"/>
    <xsl:value-of select="customer-street"/>
    <xsl:value-of select="customer-city"/>
  </customer>
</xsl:template>
<xsl:template match="*"/>
XQuery
XQuery is a general purpose query language for XML data
Currently being standardized by the World Wide Web Consortium (W3C)
XQuery uses a for ... let ... where ... return ... syntax, e.g. to
find all accounts with balance > 400, with each result enclosed in
an <account-number> .. </account-number> tag
for $x in /bank-2/account
let $acctno := $x/@account-number
where $x/balance > 400
return <account-number> $acctno </account-number>
The let clause is not really needed in this query, and the selection can be
done in XPath:
for $x in /bank-2/account[balance>400]
return <account-number> $x/@account-number </account-number>
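The for / let / where / return clauses map closely onto an ordinary loop. The sketch below mirrors the first query in Python on an invented bank-2 sample; it illustrates the evaluation order and is not an XQuery implementation:

```python
import xml.etree.ElementTree as ET

doc = """<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>300</balance></account>
  <account account-number="A-403"><balance>900</balance></account>
</bank-2>"""

root = ET.fromstring(doc)

results = []
for x in root.findall("account"):            # for $x in /bank-2/account
    acctno = x.get("account-number")         # let $acctno := $x/@account-number
    if float(x.findtext("balance")) > 400:   # where $x/balance > 400
        # return <account-number> $acctno </account-number>
        results.append(f"<account-number>{acctno}</account-number>")

print(results)
```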
E.g. document("bank-2.xml")/bank-2/account
Aggregate functions such as sum( ) and count( ) can be applied
to the results of path expressions
XQuery does not support groupby, but the same effect can be
obtained with nested queries
Joins
Joins are specified in a manner very similar to SQL
for $a in /bank/account,
    $c in /bank/customer,
    $d in /bank/depositor
where $a/account-number = $d/account-number
  and $c/customer-name = $d/customer-name
return <cust-acct> $c $a </cust-acct>
The same query can be expressed with the selections specified
as XPath selections:
for $a in /bank/account,
    $c in /bank/customer,
    $d in /bank/depositor[
        account-number = $a/account-number and
        customer-name = $c/customer-name]
return <cust-acct> $c $a </cust-acct>
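Conceptually the join iterates over all combinations of the three variables and keeps those that satisfy the conditions. A Python sketch of that reading, on invented sample data; a real query processor would normally pick a more efficient join strategy:

```python
import xml.etree.ElementTree as ET

doc = """<bank>
  <account><account-number>A-101</account-number></account>
  <account><account-number>A-102</account-number></account>
  <customer><customer-name>John</customer-name></customer>
  <customer><customer-name>Mary</customer-name></customer>
  <depositor><customer-name>John</customer-name><account-number>A-101</account-number></depositor>
  <depositor><customer-name>Mary</customer-name><account-number>A-102</account-number></depositor>
  <depositor><customer-name>John</customer-name><account-number>A-102</account-number></depositor>
</bank>"""

root = ET.fromstring(doc)

# One (customer, account) pair per matching depositor element.
cust_acct = [
    (c.findtext("customer-name"), a.findtext("account-number"))
    for a in root.findall("account")
    for c in root.findall("customer")
    for d in root.findall("depositor")
    if a.findtext("account-number") == d.findtext("account-number")
    and c.findtext("customer-name") == d.findtext("customer-name")
]

print(cust_acct)
```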
$c/* denotes all the children of the node to which $c is bound, without the
enclosing top-level tag
$c/text() gives the text content of an element, without any
subelements/tags
dereferencing IDREFs
Sorting in XQuery
A sortby clause can be used at the end of any expression. E.g. to
sort customers by name, and the accounts of each customer by account-number:
<bank-1>
  for $c in /bank/customer
  return
    <customer>
      $c/*
      for $d in /bank/depositor[customer-name=$c/customer-name],
          $a in /bank/account[account-number=$d/account-number]
      return <account> $a/* </account> sortby(account-number)
    </customer> sortby(customer-name)
</bank-1>
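The nested query both restructures the flat bank data and sorts at two levels. A Python sketch of the same restructuring on invented sample data; it shows the intended result shape and is not an XQuery implementation:

```python
import copy
import xml.etree.ElementTree as ET

doc = """<bank>
  <account><account-number>A-102</account-number></account>
  <account><account-number>A-101</account-number></account>
  <customer><customer-name>Mary</customer-name></customer>
  <customer><customer-name>John</customer-name></customer>
  <depositor><customer-name>John</customer-name><account-number>A-102</account-number></depositor>
  <depositor><customer-name>John</customer-name><account-number>A-101</account-number></depositor>
  <depositor><customer-name>Mary</customer-name><account-number>A-102</account-number></depositor>
</bank>"""

root = ET.fromstring(doc)
bank1 = ET.Element("bank-1")

# Outer level: customers sorted by customer-name.
for c in sorted(root.findall("customer"),
                key=lambda e: e.findtext("customer-name")):
    name = c.findtext("customer-name")
    cust = ET.SubElement(bank1, "customer")
    cust.extend(copy.deepcopy(list(c)))           # $c/*
    owned = {d.findtext("account-number")
             for d in root.findall("depositor")
             if d.findtext("customer-name") == name}
    accts = [a for a in root.findall("account")
             if a.findtext("account-number") in owned]
    # Inner level: this customer's accounts sorted by account-number.
    for a in sorted(accts, key=lambda e: e.findtext("account-number")):
        acct = ET.SubElement(cust, "account")
        acct.extend(copy.deepcopy(list(a)))       # $a/*

names = [c.findtext("customer-name") for c in bank1]
accounts_per_customer = [
    [a.findtext("account-number") for a in c.findall("account")]
    for c in bank1
]
print(names, accounts_per_customer)
```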
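Application program interfaces to XML data:
SAX (Simple API for XML)
Based on a parser model: the user provides event handlers for
parsing events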
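In the SAX style the parser drives the program and calls back into user code at each start tag, end tag, and run of text. A minimal sketch with Python's standard xml.sax module; the handler class BalanceHandler and the sample document are assumptions made for illustration:

```python
import xml.sax

class BalanceHandler(xml.sax.ContentHandler):
    """Collect the numeric content of every balance element."""

    def __init__(self):
        super().__init__()
        self.in_balance = False
        self.buffer = []
        self.balances = []

    def startElement(self, name, attrs):
        if name == "balance":
            self.in_balance = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_balance:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "balance":
            self.balances.append(float("".join(self.buffer)))
            self.in_balance = False

doc = (b"<bank><account><balance>500</balance></account>"
       b"<account><balance>300</balance></account></bank>")

handler = BalanceHandler()
xml.sax.parseString(doc, handler)
print(handler.balances)
```

The handler never holds the whole document in memory, which is the main reason to prefer this style for large inputs.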
Relational databases
Data must be translated into relational form
Advantage: mature database systems
Disadvantages: overhead of translating data and queries
Benefits:
Can store any XML data even without DTD
As long as there are many top-level elements in a document, strings are
small compared to full document, allowing faster access to individual
elements.
Drawback: Need to parse strings to access values inside the elements;
parsing is slow.
table
Benefits:
Efficient storage
Can translate XML queries into SQL, execute efficiently, and then
translate SQL results back to XML
Drawbacks: need to know DTD, translation overheads still present
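The round trip of storing elements as tuples, answering a query in SQL, and converting the result back to XML can be sketched with Python's sqlite3 module. The table layout, column names, and sample values below are assumptions for illustration; in practice the mapping would be derived from the DTD:

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = """<bank>
  <account><account-number>A-101</account-number><branch-name>Downtown</branch-name><balance>500</balance></account>
  <account><account-number>A-102</account-number><branch-name>Perryridge</branch-name><balance>300</balance></account>
</bank>"""

root = ET.fromstring(doc)
conn = sqlite3.connect(":memory:")

# One relation per element type, one column per subelement.
conn.execute(
    "CREATE TABLE account ("
    "account_number TEXT PRIMARY KEY, branch_name TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(a.findtext("account-number"),
      a.findtext("branch-name"),
      float(a.findtext("balance")))
     for a in root.findall("account")],
)

# SQL counterpart of selecting accounts with balance > 400.
rows = conn.execute(
    "SELECT account_number FROM account WHERE balance > 400"
).fetchall()

# Translate the SQL result back to XML.
result_xml = [f"<account-number>{r[0]}</account-number>" for r in rows]
print(result_xml)
conn.close()
```

The two translation steps, XML to tuples on the way in and tuples to XML on the way out, are the overhead the slide refers to.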