Unit - 4 XML
K Mahesh
Well-Formed XML: Always Parsable
Any well-formed XML document can be parsed by any XML
parser, without knowledge of what the tags mean
– The XML declaration at the start of the document (the prolog) tells the parser the XML version and the character encoding
<?xml version="1.0" encoding="utf-8"?>
– There is a single root element
– All start tags have matching end tags (unlike many HTML
documents!), or use the special
<tag/> shortcut for empty tags (equivalent to <tag></tag>)
– An attribute can appear only once in a given element
– XML is case-sensitive
– In XML all elements must be properly nested within each
other
<b><i>This text is bold and italic</i></b>
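The rules above can be checked mechanically, because a parser simply refuses input that breaks them. A minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` parser (the slides do not prescribe a language); each sample string is made up to break or obey exactly one rule:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Each sample obeys or breaks one of the well-formedness rules listed above.
samples = {
    "properly nested": "<b><i>This text is bold and italic</i></b>",
    "overlapping tags": "<b><i>This text is bold and italic</b></i>",
    "case mismatch": "<Note>Hi</note>",
    "two root elements": "<a/><b/>",
    "repeated attribute": '<img src="a.gif" src="b.gif"/>',
    "empty-tag shortcut": "<br/>",
}

for name, xml_text in samples.items():
    print(f"{name:20} well-formed: {is_well_formed(xml_text)}")
```

Only "properly nested" and "empty-tag shortcut" are accepted; the parser never needs to know what `<b>` or `<i>` mean.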
Structure of an XML document
Comparisons: HTML vs XML
• HTML was designed to display data with focus on how data looks while XML was
designed to be a software and hardware independent tool used to transport and
store data, with focus on what data is.
• HTML is a markup language itself while XML provides a framework for defining
markup languages.
• HTML is a presentation language while XML is neither a programming language nor
a presentation language.
• HTML is case insensitive while XML is case sensitive.
• HTML is used for designing a web page to be rendered on the client side, while XML
is mainly used to store data and transport it between applications (for example, between an application and a database).
• HTML has its own predefined tags, while XML is flexible: its tags are not predefined
but invented by the author of the XML document.
• HTML is not strict if the user omits closing tags, but XML makes it
mandatory for the user to close every tag that has been opened.
• HTML does not preserve white space while XML does.
• HTML is about displaying data, hence static, whereas XML is about carrying information,
hence dynamic. Thus, HTML and XML are not competitors but
rather complement each other, serving altogether different purposes.
Types of XML Documents / XML Validation
Validation is the process of checking an XML document
against a set of rules. An XML document is said to be valid if its
elements and attributes match its associated document
type declaration (DTD), and if the document complies
with the constraints expressed in it.
XML parsers distinguish two levels of correctness:
1. Well-formed XML document
2. Valid XML document
• Well-formed
– A "Well Formed" XML document is a document that conforms to
the XML syntax rules.
– They contain text and XML tags.
– All syntax rules are followed (for example, every tag is closed and properly nested).
– They do not need to refer to a DTD.
<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
• Valid
– Valid documents not only conform to XML syntax but are also
validated against a Document Type Definition (DTD) or an XML
Schema
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "InternalNote.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Building blocks of XML documents
• XML documents (and HTML documents) are made up of
the following building blocks:
– Elements
– Tags
– Attributes
– Entities
– PCDATA
– CDATA
– Processing instructions (instructions passed to an application)
• Elements
– Elements are the main building blocks of both XML and HTML
documents.
– Examples of HTML elements are "body" and "table".
– Examples of XML elements could be "note" and "message". Elements
can contain text, other elements, or be empty.
– Examples of empty HTML elements are "hr", "br" and "img".
– In a DTD, elements are declared with an ELEMENT declaration.
• Tags
– Tags are used to markup elements.
– A start tag like <element_name> marks the beginning of an
element, and an end tag like </element_name> marks the end of
an element.
– Examples: A body element: <body>body text in between</body>.
– A message element: <message>some message in between</message>
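Once a document is parsed, each element is available by its tag name together with the text between its start and end tags. A minimal sketch using Python's standard-library `xml.etree.ElementTree` (an assumed choice of tool, not something the slides use), applied to the "note" document from the validation section:

```python
import xml.etree.ElementTree as ET

NOTE = """<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE)  # the root element, <note>
print("root element:", root.tag)

# Each child element has a name (its tag) and the text between its start and end tags.
for child in root:
    print(f"<{child.tag}> ... </{child.tag}> contains {child.text!r}")

# An empty element, written with the <hr/> shortcut, has no text at all.
hr = ET.fromstring("<hr/>")
print(hr.tag, hr.text)  # hr None
```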
• Attributes
– Attributes provide extra information about elements.
– Attributes are placed inside the start tag of an element and come in
name/value pairs. The following "img" element has additional information about
a source file:
<img src="computer.gif" />
– The name of the element is "img". The name of the attribute is "src". The value of
the attribute is "computer.gif". Since the element itself is empty it is closed by a " /".
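A parser exposes attributes as name/value pairs attached to the element. A short sketch with Python's standard-library `xml.etree.ElementTree` (assumed tooling) reading the "img" example above:

```python
import xml.etree.ElementTree as ET

img = ET.fromstring('<img src="computer.gif" />')

print(img.tag)         # img           (element name)
print(img.attrib)      # {'src': 'computer.gif'}   (all name/value pairs)
print(img.get("src"))  # computer.gif  (value of one attribute)
print(img.get("alt"))  # None          (attribute not present)
print(len(img))        # 0             (empty element: no child elements)
```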
• PCDATA
– PCDATA stands for Parsed Character data.
– PCDATA is the text that will be parsed by a parser.
– Tags inside the PCDATA will be treated as markup and entities will be expanded.
• CDATA
– CDATA: (Unparsed Character data): CDATA contains the text which is not parsed
further in an XML document.
– Tags inside the CDATA text are not treated as markup and entities will not be
expanded.
<?xml version = "1.0" encoding = "UTF-8"?>
<!DOCTYPE garden [
<!ELEMENT garden (plants)*>
<!ELEMENT plants (#PCDATA)>
<!ATTLIST plants category CDATA #REQUIRED>
]>
<garden>
<plants category = "flowers" />
<plants category = "shrubs">
</plants>
</garden>
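The PCDATA/CDATA difference shows up directly in what a parser hands back. A minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`; the `<p>` strings are made-up inputs:

```python
import xml.etree.ElementTree as ET

# PCDATA: markup inside the text is parsed and entity references are expanded.
pcdata = ET.fromstring("<p>5 &lt; 6 &amp; <b>bold</b></p>")
print(repr(pcdata.text))      # '5 < 6 & '  (text before the <b> child element)
print(pcdata.find("b").text)  # bold        (<b> became a real child element)

# CDATA section: the content is passed through without being interpreted.
cdata = ET.fromstring("<p><![CDATA[5 < 6 & <b>bold</b>]]></p>")
print(repr(cdata.text))       # '5 < 6 & <b>bold</b>'
print(cdata.find("b"))        # None        (no child element was created)
```

Note that the raw `<` and `&` inside the CDATA section would make the document malformed if written as ordinary PCDATA.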
• Entities
– Entities are like variables used to define common text. Entity references are
references to entities.
– Most of you will know the HTML entity reference "&nbsp;", which is used to
insert a non-breaking space in an HTML document.
– Entities are expanded when a document is parsed by an XML parser.
– References always begin with the symbol "&", which is a reserved character,
and end with the symbol ";".
– XML has two types of references −
1. Entity References − An entity reference contains a name between the
start and the end delimiters. For example, in &amp; the name is amp.
The name refers to a predefined string of text and/or markup.
2. Character References − These are references such as &#65;, which
contain a hash mark ("#") followed by a number. The number always
refers to the Unicode code of a character. In this case, 65 refers to
the letter "A".
Predefined Character Entities
Entity reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
Numeric Character Entities
Entity name   Character   Decimal reference   Hexadecimal reference
quot          "           &#34;               &#x22;
amp           &           &#38;               &#x26;
apos          '           &#39;               &#x27;
lt            <           &#60;               &#x3C;
gt            >           &#62;               &#x3E;
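The expansion of entity and character references described above can be observed with any XML parser. A sketch assuming Python's standard-library `xml.etree.ElementTree`; the `<msg>` and `<p>` inputs are made up:

```python
import xml.etree.ElementTree as ET

# Predefined entity references (&amp; &lt; &gt;) and character references
# (decimal &#65; and hexadecimal &#x42;) are replaced by the parser.
msg = ET.fromstring("<msg>Tom &amp; Jerry: &#65;&#x42;C &lt;hi&gt;</msg>")
print(msg.text)  # Tom & Jerry: ABC <hi>

# &nbsp; is predefined in HTML but not in XML; without a DTD declaring it,
# the parser rejects the document as not well-formed.
try:
    ET.fromstring("<p>a&nbsp;b</p>")
    nbsp_ok = True
except ET.ParseError as err:
    nbsp_ok = False
    print("parse error:", err)
```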
XML - Processing Instructions
Syntax :
<?target instructions?>
Where:
target - identifies the application to which the instruction is directed.
instructions - characters that describe the information for the
application to process.
Example:
<?xml-stylesheet href="tutorialspointstyle.css" type="text/css"?>
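A parser keeps a processing instruction as a separate node with a target and data, leaving its meaning to the receiving application. A minimal sketch, assuming Python's standard-library `xml.dom.minidom`; the `<note>Hello</note>` root is a made-up placeholder:

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="tutorialspointstyle.css" type="text/css"?>'
    '<note>Hello</note>'
)

# The XML declaration is not kept as a node; the stylesheet PI is the first child.
pi = doc.firstChild
print(pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE)  # True
print(pi.target)  # xml-stylesheet
print(pi.data)    # href="tutorialspointstyle.css" type="text/css"
```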
XML - Comments
XML comment has the following syntax −
<!--Your comment-->
A comment starts with <!-- and ends with -->.
You can add textual notes as comments between the characters.
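Whether a comment survives parsing depends on the API. A sketch assuming Python's standard library: `xml.dom.minidom` keeps comments as nodes, while `xml.etree.ElementTree` discards them with its default parser settings. The input string is made up:

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

SRC = "<note><!--Your comment--><to>Tove</to></note>"

# minidom keeps the comment as a Comment node in the tree.
note = minidom.parseString(SRC).documentElement
comment = note.firstChild
print(comment.nodeType == comment.COMMENT_NODE)  # True
print(comment.data)                              # Your comment

# ElementTree's default parser drops comments entirely.
root = ET.fromstring(SRC)
print([child.tag for child in root])             # ['to']
```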
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>Mahesh</firstname>
<lastname>Babu</lastname>
<email>kmaheshcse@gmail.com</email>
</employee>
employee.dtd
<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!DOCTYPE employee: defines that the root element of
the document is employee.
<!ELEMENT employee: defines that the employee
element contains three child elements, "firstname", "lastname" and
"email", in that order.
<!ELEMENT firstname: defines that the firstname
element contains #PCDATA (parsed character data).
<!ELEMENT lastname: defines that the lastname element
contains #PCDATA (parsed character data).
<!ELEMENT email: defines that the email element
contains #PCDATA (parsed character data).
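Python's standard-library parsers are non-validating, so they do not check a document against employee.dtd. The sketch below hand-codes the same rules to show what a validating parser verifies; it is a simplified illustration, not a general DTD validator (a real validating parser, for example the third-party lxml library, reads the DTD itself). The function name `check_employee` is hypothetical:

```python
import xml.etree.ElementTree as ET

# Hand-written rules mirroring employee.dtd (NOT a general DTD validator):
#   <!ELEMENT employee (firstname,lastname,email)>
#   <!ELEMENT firstname (#PCDATA)>   and the same for lastname and email
EXPECTED_CHILDREN = ["firstname", "lastname", "email"]


def check_employee(xml_text: str) -> list:
    """Return a list of rule violations; an empty list means the document passes."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    errors = []
    if root.tag != "employee":
        errors.append(f"root element is <{root.tag}>, expected <employee>")
    children = [child.tag for child in root]
    if children != EXPECTED_CHILDREN:
        errors.append(f"children {children} do not match the sequence {EXPECTED_CHILDREN}")
    for child in root:
        if len(child) > 0:  # #PCDATA content: text only, no child elements
            errors.append(f"<{child.tag}> must contain only text")
    return errors


good = """<employee>
<firstname>Mahesh</firstname>
<lastname>Babu</lastname>
<email>kmaheshcse@gmail.com</email>
</employee>"""

bad = "<employee><lastname>Babu</lastname><firstname><b>Mahesh</b></firstname></employee>"

print(check_employee(good))  # []
print(check_employee(bad))   # two violations: wrong child sequence, and <firstname> has a child element
```

Both documents are well-formed; only the first is valid against the rules, which is exactly the well-formed/valid distinction.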
Types of DTD
note.dtd
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
When to Use a DTD?
• With a DTD, independent groups of people can agree to use a
standard DTD for interchanging data.
• With a DTD, you can verify that the data you receive from the
outside world is valid.
• You can also use a DTD to verify your own data.
XML Schema
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
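The Schema above is interpreted like this:
• <xs:element name="note"> defines the element called "note".
• <xs:complexType> states that the "note" element is a complex type (it contains other elements).
• <xs:sequence> states that the complex type is a sequence of elements, in the given order.
• <xs:element name="to" type="xs:string"> states that the element "to" is of type string (text); the same holds for "from", "heading" and "body".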
Types of XML Parsers
• These are the two main types of XML Parsers:
– DOM
– SAX
XML DOM
• The W3C Document Object Model (DOM) is a platform
and language-neutral interface that allows programs and
scripts to dynamically access and update the content,
structure, and style of a document.
• A DOM Parser creates an internal structure in memory
which is a DOM document object and the client
applications get information of the original XML
document by invoking methods on this document object.
• The XML DOM defines a standard way for accessing and
manipulating XML documents. It presents an XML
document as a tree-structure.
• The XML DOM is a standard for how to get, change, add,
and delete XML elements.
Advantages
• It supports both read and write operations and the
API is very simple to use.
• It is preferred when random access to widely
separated parts of a document is required.
Disadvantages
• It is memory inefficient: it consumes more memory
because the whole XML document needs to be loaded
into memory.
• It is comparatively slower than other parsers.
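The read/write and random-access points can be seen in a few lines of DOM code. A sketch assuming Python's standard-library `xml.dom.minidom`, which follows the W3C DOM method names used by the JavaScript examples; the new `<price>` element and the changed year are made up for illustration:

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<bookstore><book>"
    "<title>Everyday Italian</title>"
    "<author>Giada De Laurentiis</author>"
    "<year>2005</year>"
    "</book></bookstore>"
)

# Read: the same call chain as in the JavaScript examples.
title = doc.getElementsByTagName("title")[0].childNodes[0].nodeValue
print(title)  # Everyday Italian

# Write: the whole tree is in memory, so any node can be changed or added.
doc.getElementsByTagName("year")[0].firstChild.data = "2006"
price = doc.createElement("price")
price.appendChild(doc.createTextNode("30.00"))
doc.getElementsByTagName("book")[0].appendChild(price)

print(doc.documentElement.toxml())
```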
Ex:1
<!DOCTYPE html>
<html>
<body>
<p id="demo"></p>
<script>
var parser, xmlDoc;
var text = "<bookstore><book>" +
"<title>Everyday Italian</title>" +
"<author>Giada De Laurentiis</author>" +
"<year>2005</year>" +
"</book></bookstore>";
// Parse the string into an XML DOM document
parser = new DOMParser();
xmlDoc = parser.parseFromString(text, "text/xml");
// Show the text of the first <title> element
document.getElementById("demo").innerHTML =
xmlDoc.getElementsByTagName("title")[0].childNodes[0].nodeValue;
</script>
</body>
</html>
Ex:2 (note.xml)
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>kmahesh@gmail.com</to>
<from>recw@gmail.com</from>
<body>Hello XML DOM</body>
</note>
Ex:2 (xmldom.html)
<!DOCTYPE html>
<html>
<body>
<h1>Important Note</h1>
<div>
<b>To:</b> <span id="to"></span><br>
<b>From:</b> <span id="from"></span><br>
<b>Message:</b> <span id="message"></span>
</div>
<script>
// Load note.xml asynchronously and read its elements through the XML DOM
var xmlhttp = new XMLHttpRequest();
xmlhttp.onload = function () {
var xmlDoc = xmlhttp.responseXML;
document.getElementById("to").innerHTML =
xmlDoc.getElementsByTagName("to")[0].childNodes[0].nodeValue;
document.getElementById("from").innerHTML =
xmlDoc.getElementsByTagName("from")[0].childNodes[0].nodeValue;
document.getElementById("message").innerHTML =
xmlDoc.getElementsByTagName("body")[0].childNodes[0].nodeValue;
};
xmlhttp.open("GET", "note.xml");
xmlhttp.send();
</script>
</body>
</html>
SAX (Simple API for XML)
• A SAX parser implements the SAX API. This API is
event-based and less intuitive than DOM.
• It does not create any internal structure (no tree in memory).
• Clients do not call methods to pull data out of a tree; instead, they
override the handler (callback) methods of the API and place
their own code inside them. The parser calls these methods as it reads the document.
• It is an event-based parser; it works like an event
handler in Java.
Advantages
• 1) It is simple and memory efficient.
• 2) It is very fast and works for huge documents.
Disadvantages
• 1) It is event-based so its API is less intuitive.
• 2) Clients never see the whole document at once, because
the data arrives in pieces (one event at a time); anything needed later must be stored by the client.
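The event-based style is easiest to see in a handler class. A minimal sketch, assuming Python's standard-library `xml.sax` (the Java SAX API has the same shape: `startElement`, `characters`, `endElement`). The class name `TitleHandler` is hypothetical, and the second `<cd>` is made up for illustration:

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Collects the text of every <title> element as the parser streams events."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.buffer = []
        self.titles = []

    def startElement(self, name, attrs):  # called by the parser for each start tag
        if name == "title":
            self.in_title = True
            self.buffer = []

    def characters(self, content):  # text may arrive in several chunks
        if self.in_title:
            self.buffer.append(content)

    def endElement(self, name):  # called by the parser for each end tag
        if name == "title":
            self.titles.append("".join(self.buffer))
            self.in_title = False


CATALOG = b"""<catalog>
<cd><title>Empire Burlesque</title><artist>Bob Dylan</artist></cd>
<cd><title>Hypothetical Album</title><artist>Example Artist</artist></cd>
</catalog>"""

handler = TitleHandler()
xml.sax.parseString(CATALOG, handler)  # no tree is built; the handler keeps only what it needs
print(handler.titles)  # ['Empire Burlesque', 'Hypothetical Album']
```

The client never calls the parser for data; it only reacts to events, and it must buffer text itself because `characters` may deliver it in pieces.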
XSL and XSLT Transformation
• XSL (eXtensible Stylesheet Language) is a styling
language for XML.
• XSLT stands for XSL Transformations.
• XSLT is a language for transforming XML
documents.
• XPath is a language for navigating in XML
documents.
• XQuery is a language for querying XML documents.
• XSL = Style Sheets for XML
• XSL describes how the XML elements should be
displayed.
XSLT (XSL Transformation)
• XSLT is the most important part of XSL.
• XSLT is used to transform an XML document into another
XML document, or another type of document that is
recognized by a browser, like HTML and XHTML.
• Normally XSLT does this by transforming each XML element
into an (X)HTML element.
• With XSLT you can add/remove elements and attributes to
or from the output file. You can also rearrange and sort
elements, perform tests and make decisions about which
elements to hide and display, and a lot more.
• A common way to describe the transformation process is
to say that XSLT transforms an XML source-tree into an
XML result-tree.
XSLT Uses XPath
• XSLT uses XPath to find information in an XML document.
• XPath is used to navigate through elements and
attributes in XML documents.
• In the transformation process, XSLT uses XPath to define
parts of the source document that should match one or
more predefined templates.
• When a match is found, XSLT will transform the matching
part of the source document into the result document.
• All major browsers support XSLT and XPath.
• XSLT became a W3C Recommendation on 16 November 1999.
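To get a feel for XPath-style navigation outside a browser, Python's standard-library `xml.etree.ElementTree` supports a limited subset of XPath in `find`/`findall`. This is an assumption-labeled sketch: that subset is much smaller than full XPath 1.0, and the second `<cd>` is made up:

```python
import xml.etree.ElementTree as ET

# The <cd> from the slides plus one made-up <cd> for illustration.
CATALOG = """<catalog>
<cd><title>Empire Burlesque</title><artist>Bob Dylan</artist><country>USA</country><price>10.90</price></cd>
<cd><title>Hypothetical Album</title><artist>Example Artist</artist><country>UK</country><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)  # root is the <catalog> element

# Path relative to <catalog>: every <title> child of every <cd>
titles = [t.text for t in root.findall("cd/title")]
# Predicate: <cd> elements whose <country> child has the text "UK"
uk_titles = [cd.findtext("title") for cd in root.findall("cd[country='UK']")]
# Descendant search, like //artist in XPath
artist_count = len(root.findall(".//artist"))

print(titles)        # ['Empire Burlesque', 'Hypothetical Album']
print(uk_titles)     # ['Hypothetical Album']
print(artist_count)  # 2
```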
XSLT - Transformation
• How to transform XML into XHTML using XSLT?
Correct Style Sheet Declaration
• The root element that declares the document to be an
XSL style sheet is <xsl:stylesheet> or <xsl:transform>.
• <xsl:stylesheet> and <xsl:transform> are completely
synonymous and either can be used!
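• Create an XSL style sheet named "cdcatalog.xsl" whose root element is either
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
or
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">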
• Link the XSL Style Sheet to the XML Document
• If you have an XSLT compliant browser it will
nicely transform your XML into XHTML.
CDCatalog.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="CDCatalogXSLT.xsl"?>
<catalog>
<cd>
<title>Empire Burlesque</title>
<artist>Bob Dylan</artist>
<country>USA</country>
<company>Columbia</company>
<price>10.90</price>
<year>1985</year>
</cd>
<!-- more <cd> elements -->
</catalog>
CDCatalogXSLT.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="catalog/cd">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
XSLT <xsl:template> Element
• An XSL style sheet consists of one or more sets of rules
that are called templates.
• A template contains rules to apply when a specified
node is matched.
• The <xsl:template> element is used to build templates.
<xsl:template match="/">
• The match attribute is used to associate a template
with an XML element. The match attribute can also be
used to define a template for the entire XML
document. The value of the match attribute is an
XPath expression (i.e. match="/" defines the whole
document).
<xsl:stylesheet>
• <xsl:stylesheet>, defines that this document is an
XSLT style sheet document (along with the version
number and XSLT namespace attributes).
• An XML namespace is a collection of names
that can be used as element or attribute names in
an XML document. The namespace qualifies element
names uniquely on the Web in order to avoid conflicts
between elements with the same name.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
XSLT <xsl:value-of> Element
• The <xsl:value-of> element is used to extract the
value of a selected node.
• The <xsl:value-of> element can be used to extract
the value of an XML element and add it to the
output stream of the transformation:
<td><xsl:value-of select="catalog/cd/title"/></td>
XSLT <xsl:for-each> Element
• The <xsl:for-each> element allows you to do
looping in XSLT.
• The XSL <xsl:for-each> element can be used to
select every XML element of a specified node-set:
<xsl:for-each select="catalog/cd">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
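Python's standard library has no XSLT processor, so the sketch below is only a hand-written analogue of the `<xsl:for-each>`/`<xsl:value-of>` loop above, not XSLT itself: it visits each `catalog/cd` and emits one table row. The function name `cd_rows` and the second `<cd>` are hypothetical:

```python
import xml.etree.ElementTree as ET
from html import escape

# The <cd> from the slides plus one made-up <cd> for illustration.
CATALOG = """<catalog>
<cd><title>Empire Burlesque</title><artist>Bob Dylan</artist></cd>
<cd><title>Hypothetical Album</title><artist>Example Artist</artist></cd>
</catalog>"""


def cd_rows(xml_text: str) -> list:
    """Mimic the template: one <tr> per catalog/cd with its title and artist."""
    root = ET.fromstring(xml_text)
    rows = []
    for cd in root.findall("cd"):                   # like <xsl:for-each select="catalog/cd">
        title = escape(cd.findtext("title", ""))    # like <xsl:value-of select="title"/>
        artist = escape(cd.findtext("artist", ""))  # like <xsl:value-of select="artist"/>
        rows.append(f"<tr><td>{title}</td><td>{artist}</td></tr>")
    return rows


for row in cd_rows(CATALOG):
    print(row)
```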
XSLT <xsl:sort> Element
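• The <xsl:sort> element is used to sort the output. It is placed as the
first child of <xsl:for-each> (or <xsl:apply-templates>).
<xsl:for-each select="catalog/cd">
<xsl:sort select="artist"/>
...some output for each cd, now in artist order...
</xsl:for-each>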
<xsl:if> Element
• To put a conditional if test against the content of the
XML file, add an <xsl:if> element to the XSL
document.
<xsl:if test="expression">
...some output if the expression is true...
</xsl:if>
<xsl:for-each select="catalog/cd">
<xsl:if test="price > 10">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
<td><xsl:value-of select="price"/></td>
</tr>
</xsl:if>
</xsl:for-each>
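The effect of combining `<xsl:sort select="artist"/>` with `<xsl:if test="price > 10">` can be mimicked in plain Python. This is a hand-written analogue under stated assumptions, not an XSLT engine: the function name `expensive_sorted` is hypothetical, the first two `<cd>` elements are made up, and Python's plain string ordering may differ from an XSLT processor's language-aware text sort:

```python
import xml.etree.ElementTree as ET

# The first two <cd> elements are made up; the last one comes from the slides.
CATALOG = """<catalog>
<cd><title>Album A</title><artist>Zed Example</artist><price>12.50</price></cd>
<cd><title>Album B</title><artist>Amy Example</artist><price>9.90</price></cd>
<cd><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
</catalog>"""


def expensive_sorted(xml_text: str) -> list:
    """Keep CDs with price > 10 (like xsl:if), ordered by artist (like xsl:sort)."""
    root = ET.fromstring(xml_text)
    cds = [cd for cd in root.findall("cd")
           if float(cd.findtext("price", "0")) > 10]    # numeric comparison, as in test="price > 10"
    cds.sort(key=lambda cd: cd.findtext("artist", ""))  # ascending plain string order
    return [(cd.findtext("title"), cd.findtext("artist"), cd.findtext("price")) for cd in cds]


for title, artist, price in expensive_sorted(CATALOG):
    print(title, artist, price)
```

"Album B" is filtered out (9.90 is not greater than 10), and the remaining two are reordered by artist.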
<xsl:choose> Element
• The <xsl:choose> element is used in conjunction
with <xsl:when> and <xsl:otherwise> to express
multiple conditional tests.
<xsl:choose>
<xsl:when test="expression">
... some output ...
</xsl:when>
<xsl:otherwise>
... some output ....
</xsl:otherwise>
</xsl:choose>
<xsl:apply-templates> Element
• The <xsl:apply-templates> element applies a
template rule to the current element or to the
current element's child nodes.
• If we add a "select" attribute to the <xsl:apply-templates>
element, it will process only the child
elements that match the value of the attribute.
• We can use the "select" attribute to specify in
which order the child nodes are to be processed.
<xsl:apply-templates/>
PHP Code: Transform XML to XHTML on the Server
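<?php
$xml = new DOMDocument;             // Load XML file
$xml->load('cdcatalog.xml');
$xsl = new DOMDocument;             // Load XSL file
$xsl->load('cdcatalog.xsl');
$proc = new XSLTProcessor;          // Configure the transformer
$proc->importStylesheet($xsl);      // Attach the XSL rules
echo $proc->transformToXml($xml);   // Output the resulting XHTML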
Atom <author> element, optional child elements:
• uri
• email
Note: Not all required and optional elements or attributes are shown!
Atom Sample
<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<link rel="self" href="http://example.org/blog/index.atom"/>
<id>http://example.org/blog/index.atom</id>
<icon>../favicon.ico</icon>
<title>An Atom Sampler</title>
<subtitle>No Splitting</subtitle>
<author>
<name>Ernie Rutherford </name>
<email>ernie@example.org</email>
<uri>.</uri>
</author>
<updated>2006-10-25T03:38:08-04:00</updated>
<link href="."/>
<entry>
<id>tag:example.org,2004:2417</id>
<link href="2006/10/23/moonshine"/>
<title>Moonshine</title>
<content type="text">
Anyone who expects a source of power from the transformation of the atom is talking moonshine.
</content>
<published>2006-10-23T15:33:00-04:00</published>
<updated>2006-10-23T15:47:31-04:00</updated>
</entry>
<entry>
<id>tag:example.org,2004:2416</id>
<link href="2006/10/21/think"/>
<title type="html">&lt;strong&gt;Think!&lt;/strong&gt;</title>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<p>We haven't got the money, so we've got to think!</p>
</div>
</content>
<updated>2006-10-21T06:02:39-04:00</updated>
</entry>
</feed>
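Every element in the feed above lives in the Atom namespace declared on `<feed>`, so a parser must be told that namespace URI to find them. A sketch assuming Python's standard-library `xml.etree.ElementTree`, parsing a trimmed copy of the sample; the prefix `atom` is an arbitrary local choice, since only the URI matters:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Atom sample above.
ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
<title>An Atom Sampler</title>
<entry><id>tag:example.org,2004:2417</id><title>Moonshine</title></entry>
<entry><id>tag:example.org,2004:2416</id>
<title type="html">&lt;strong&gt;Think!&lt;/strong&gt;</title></entry>
</feed>"""

NS = {"atom": "http://www.w3.org/2005/Atom"}  # local prefix -> namespace URI

feed = ET.fromstring(ATOM)
print(feed.tag)  # {http://www.w3.org/2005/Atom}feed  (namespace URI + local name)
print(feed.findtext("atom:title", namespaces=NS))  # An Atom Sampler

entries = [(e.findtext("atom:id", namespaces=NS), e.findtext("atom:title", namespaces=NS))
           for e in feed.findall("atom:entry", NS)]
for entry_id, title in entries:
    print(entry_id, title)  # the type="html" title comes back as the text <strong>Think!</strong>
```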