Module 5 XML Notes
JSON(Basics Only): Overview, Syntax, Data types, Objects, Schema, Comparison with XML.
1. INTRODUCTION
SGML is a meta-markup language, that is, a language for defining markup languages. It was developed in the
early 1980s and approved as an ISO standard in 1986. HTML was developed using SGML in the early
1990s. The first XML standard, 1.0, was published in February 1998. The second, 1.1, was published in
2004. We use XML 1.0 because the newer version is not yet widely supported.
The development of XML was motivated by the problems associated with HTML.
Problems with HTML:
1. HTML is defined to describe the general form and layout of information in web documents without
considering the meaning of that information.
To describe a particular kind of information, it would be necessary to have tags indicating the
meaning of the element’s content. That would allow the processing of specific categories of information in
a document. For example, if the price of a used car is stored as the content of an element named price, an
application could find all cars in the document that cost less than $20,000.
2. HTML defines a fixed set of tags and attributes. The given tags must fit every kind of document.
3. There are no restrictions on the arrangement or order of tags in an HTML document. For example, an
opening tag can appear in the content of an element, but its corresponding closing tag can appear after the
end of the element in which it is nested.
Eg : <strong> Now <em> is </strong> the time </em>
One solution to the deficiencies of HTML is to allow groups of users with common needs to
define their own tags and attributes and then use the SGML standard to define a new markup language to
meet those needs. Each application area would have its own markup language.
Problem with using SGML:
1. It is too large and complex to use, and it is very difficult to build a parser for it. SGML includes a large
number of capabilities that are only rarely used.
2. A program capable of parsing SGML documents would be very large and costly to develop.
3. SGML requires that a formal definition be provided with each new markup language. So although having
area-specific markup languages is a good idea, basing them on SGML is not.
A better solution: Define a simplified version of SGML and allow users to define their own markup
languages based on it. XML was designed to be that simplified version of SGML.
XML is not a replacement for HTML. In fact, the two have different goals. HTML is a markup language used
to describe the layout of any kind of information and provide some guidance as to how it should be
displayed. XML is a meta-markup language that provides a framework for defining specialized markup
languages.
HTML Syntax
<html>
<head><title>name</title></head>……
XML Syntax
<name>
<first> nandini </first>
<last> sidnal </last>
</name>
XML is far more than a solution to the deficiencies of HTML:
Advantages of XML, or why is XML called a universal data interchange language?
It provides a simple and universal way of storing any textual data. Data stored in XML documents
can be electronically distributed and processed by any number of different applications. These applications
are relatively easy to write because of the standard way in which the data is stored. Therefore, XML is a
universal data interchange language.
XML BASICS
- XML is not a markup language. It is a meta-markup language that specifies rules for creating markup
languages. When designing a markup language using XML, the designer must define a collection of tags that
are useful in the intended area.
- An XML tag and its content, together with the closing tag, is called an element.
- XML has no hidden specifications. Therefore XML documents are plain text, which is easily readable by
both people and application programs.
- An XML-based markup language is called a tag set.
- A document that uses an XML-based markup language is called an XML document.
- There are many XML-oriented text editors that assist with the creation and maintenance of XML
documents. Eg: Altova XMLSpy, XMLFox etc.
- An XML application is a program that processes information stored in an XML document.
- An XML processor is a program that parses XML documents and provides the parts to an application.
- Application programs that process the data in XML documents must analyze the documents before they
gain access to the data. This analysis is performed by an XML processor, which has several tasks, one of
which is to parse XML documents, a process that isolates the constituent parts (such as tags, attributes, and
data strings) and provides them to an application.
- All contemporary browsers support XML. Both IE7 and FX2 support basic XML .
As XML is a proper subset of SGML, all XML documents are valid SGML documents. But not all
SGML documents are valid XML documents.
XML can be considered as SGML-Lite: 20% of SGML's complexity, 80% of its capacity.
XML is a lightweight, cut-down version of SGML, i.e. XML uses only the most commonly used
SGML features.
SGML Features
The term SGML stands for Standard Generalized Markup Language.
It is a system for defining markup languages.
SGML is a meta-language. It facilitates the creation of other languages.
SGML is extensible.
It is widely used to manage large documents that are subject to frequent revisions and need to be printed
in different formats.
All XML documents should begin with an XML declaration. It identifies the document as being
XML and provides the version number of the XML standard being used. It can also include the
encoding standard.
Comments in XML are the same as in HTML:
<!-- This is a comment -->
XML names must begin with a letter or underscore and can include digits, hyphens, and
periods.
XML names are case sensitive. For example, the tag <Letter> is different from the tag <letter>.
There is no length limitation for names.
White space is preserved in XML.
HTML truncates multiple white-space characters to one single white space. With XML, the white
space in a document is not truncated.
In XML closing tags are necessary.
In XML, all elements must be properly nested within each other.
XML documents must have a root element. Every document contains one element that is the parent of all
other elements. This element is called the root element.
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
XML tags can have attributes, which are specified with name/value assignments. XML attribute
values must be enclosed in single or double quotation marks. An XML document that strictly
adheres to these syntax rules is considered well formed.
Example :
<?xml version = "1.0" encoding = "utf-8"?>
<ad>
<year>1960</year>
<make>Cessna</make>
<model>Centurian</model>
<color>Yellow with white trim</color>
<location>
<city>Gulfport</city>
<state>Mississippi</state>
</location>
</ad>
None of the tags in this document is defined in HTML; all are designed for the specific content of the
document.
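The syntax rules listed above can be sketched in one small document. This is only an illustration: the element and attribute names (catalog, plane, id, status, Notes, notes) are made up for the example and do not belong to any standard tag set.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Comments use the same form as in HTML -->
<catalog>                          <!-- exactly one root element -->
  <plane id="p1" status='used'>    <!-- attribute values in double or single quotes -->
    <make>Cessna</make>            <!-- every start tag has a matching closing tag -->
    <Notes>first owner</Notes>     <!-- Notes and notes are two different names -->
    <notes>hangared</notes>
  </plane>                         <!-- children close before their parent closes -->
</catalog>
```

Removing any closing tag, or swapping the order of two closing tags, would make the document no longer well formed.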
When designing an XML document, the designer is often faced with the choice between adding a
new attribute to an element or defining a nested element.
- In some cases there is no choice.
- In other cases, it may not matter whether an attribute or a nested element is used.
- Nested tags are used
<!-- A tag with one attribute -->
<patient name = "Maggie Dee Magpie">
...
</patient>
-------------------------------------------------------------
<!-- A tag with one nested tag -->
<patient>
<name> Maggie Dee Magpie </name>
...
</patient>
-------------------------------------------------------------
<!-- A tag with one nested tag, which contains three nested tags -->
<patient>
<name>
<first> Maggie </first>
<middle> Dee </middle>
<last> Magpie </last>
</name>
...
</patient>
--------------------------------------------------------------
Here the third one is the better choice because it provides easy access to all of the parts of the data.
WELL FORMED AND VALID XML DOCUMENTS
A well-formed document follows the basic XML syntax rules.
A valid document is well formed and also conforms to the rules specified by a DTD or an XML schema.
DTDs or XML schemas specify the set of tags and attributes that can appear in a particular
document or set of documents, and also the order in which they appear.
XML with correct syntax is "well formed" XML.
XML validated against a DTD/schema is "valid" XML.
A "valid" XML document is a "well formed" XML document which also conforms to
the rules of a document type definition (DTD) or schema.
XML DTD
Defines the legal building blocks of an XML document.
Can be inline in the XML document or an external reference.
XML schema
An XML-based alternative to DTD; it is more powerful and supports namespaces and data types.
Tag names are case sensitive. XML supports two types of elements, closed and empty (open)
elements. Closed elements consist of both opening (start) and closing (end) tags. The following example
presents a closed element. In the closing tag, a forward slash precedes the element name.
<Month>January</Month>
Elements can be nested, and all elements must be nested within a single root element. Elements must be
nested correctly, with child elements enclosed within their parent opening and closing element tags, as
follows:
<Year>
<Month>January</Month>
<Month>February</Month>
</Year>
Empty (open) elements contain no content. An empty or open element can be used to mark sections of the
document for the processor. An empty element can have attributes: <hello happy="TRUE"/> is valid. An
empty element has the following syntax: the element name is followed by a slash. Eg: <Year/>. An empty
element can also be written with matching start and end tags, as in <hello></hello>.
3. ATTRIBUTES
Attributes provide extra information about elements. Attributes are placed inside the start tag of an
element. Attributes come in name/value pairs. In <fruit type="apple">, the type attribute has the value "apple". The
attribute type provides additional information about fruit. The name of the element is "fruit". The name of the
attribute is "type".
<animal leg="4" blood="cold">
The leg attribute has the value 4 and the blood attribute has the value cold. White space within the start tag
is ignored by the parser.
4.PCDATA
PCDATA means parsed character data. Think of character data as the text found between the start tag and
the end tag of an XML element. PCDATA is text that will be parsed by a parser. Tags inside the text will be
treated as markup and entities will be expanded.
5. CDATA
CDATA also means character data. A CDATA section is used to shield a body of text from the attentions of the
XML processor. CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be
treated as markup and entities will not be expanded (an example of an entity is &nbsp;, which is used to insert
an extra space in an HTML document). Outside a CDATA section, entities are expanded when a document is
parsed by an XML parser.
Syntax:
<![CDATA[content]]>
Example
<Document>
<![CDATA[ if a<b and b<c then a<c]]>
</Document>
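The difference between PCDATA and CDATA can be sketched side by side. The element names notes, parsed and unparsed are hypothetical and chosen only for this illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<notes>
  <!-- PCDATA: the b tags are treated as markup and the entity reference is expanded -->
  <parsed>if a &lt; b then <b>swap</b></parsed>
  <!-- CDATA: nothing inside the section is treated as markup or expanded -->
  <unparsed><![CDATA[if a < b then <b>swap</b> &lt;]]></unparsed>
</notes>
```

After parsing, the parsed element contains the text "if a < b then " followed by a child element b, while the unparsed element contains only the literal characters written inside the section, including the unexpanded &lt;.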
6. ENTITIES
An XML document consists of one or more entities, which are logically related collections of information.
Entities range from a single special character to a book chapter. An XML document has one document
entity. The document entity can be the entire document, but in many cases it includes references to the
names of entities that are stored elsewhere. For example, the document entity for a technical article might contain
beginning and ending material but have references to article body sections, which are entities stored in
separate files. All entities except the document entity must have a name.
Reasons to break a document into multiple entities:
1. It is good to define a large document as a number of smaller parts to make it easier to manage.
2. If the same data appears in more than one place in the document, defining it as an entity allows any number of
references to a single copy of the data.
3. Many documents include information that cannot be represented as text, such as images. Such information
units are usually stored as binary data. Binary entities can only be referenced in the document entity
because XML documents cannot include binary data.
Rules for entity names:
• No length limitation
• Must begin with a letter, an underscore, or a colon
• Can include letters, digits, periods, dashes, underscores, or colons.
• A reference to an entity (entity reference) has the form of the name with a prepended ampersand and an
appended semicolon: &entity_name; Eg. &apple_image;
When the processor parses the document, it replaces the entity reference with the actual characters.
One of the common uses of entities is to allow characters that are normally used as markup
delimiters to appear as themselves in a document. For this purpose XML includes the following entities,
which are also predefined for HTML:
&lt; <
&gt; >
&amp; &
&apos; '
&quot; "
Regardless of the entity type, all entities are referenced in the same way: &name; This code will
include the simple entity within a sentence:
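A minimal sketch of a simple entity used within a sentence is given here. The entity name dept, its replacement text and the memo element are assumptions made for the illustration; the entity is declared in an internal DTD so that the document is self-contained.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!-- a simple entity: its name followed by its replacement text -->
  <!ENTITY dept "Department of Computer Science">
]>
<memo>Please return the form to the &dept; office.</memo>
```

When the document is parsed, the reference &dept; is replaced by the entity's text, so the same phrase can be reused in many places while being stored only once.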
If several predefined entities must appear near each other in a document, it is better to avoid using
entity references. A character data section can be used instead. The content of a character data section is not
parsed by the XML parser, so it can include any tags, which are then treated as plain text.
The form of a character data section is as follows: <![CDATA[content]]>
For example, instead of Start &gt;&gt;&gt;&gt; HERE &lt;&lt;&lt;&lt;
use <![CDATA[Start >>>> HERE <<<<]]>
The opening keyword of a character data section is not just CDATA; it is in effect [CDATA[. There
cannot be any spaces between [ and C or between A and [.
Because the content of a character data section is not parsed by the XML parser, any entity references that are
included are not expanded. For example, the content of the line
<![CDATA[The form of a tag is &lt;tag name&gt;]]>
is as follows:
The form of a tag is &lt;tag name&gt;
Processing instructions (PIs) are markup that provides information to be used by a software application. A PI
begins with "<?" and ends with "?>". XML itself makes use of this form in what is known as the XML
declaration. The simplest form, which should head up the entire XML document, is
<?xml version="1.0" encoding="utf-8"?>
1. A Document Type Declaration is a statement embedded in an XML document whose purpose is to
acknowledge the existence and location of a Document Type Definition (DTD).
2. A Document Type Declaration is a statement that points to the Document Type Definition (DTD).
3. A Document Type Definition is a set of rules that defines the structure of an XML document, whereas a
Document Type Declaration is a statement that tells the parser which DTD to use for checking and
validation.
4. Every Document Type Declaration starts with the string "<!DOCTYPE".
5. The Document Type Declaration can be external or internal.
6. If external, the DTD must be specified either as SYSTEM or PUBLIC.
7. If PUBLIC, the DTD can be used by anyone by referring to the URL.
8. If SYSTEM, the DTD resides on the local hard disk and may not be available for use by other
applications. For example, suppose there is an XML document called myfile.xml that we want to parse
and validate against a DTD called my-rules.dtd.
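A possible shape for myfile.xml is sketched here. The root element name myroot is an assumption, since the notes do not say what my-rules.dtd declares; the name after DOCTYPE must match the root element of the document.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- myfile.xml: SYSTEM means the DTD is found at the given local location -->
<!DOCTYPE myroot SYSTEM "my-rules.dtd">
<myroot>
  <!-- content that must follow the rules given in my-rules.dtd -->
</myroot>
```

A validating parser reads the declaration, loads my-rules.dtd, and checks the document against it.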
• The DTD for a document can be
- internal (embedded in XML document) or
- external (separate file)- can be used with more than one document.
• DTD declarations have the form: <!keyword … >
• There are four possible declaration keywords:
• ELEMENT, ATTLIST, ENTITY, and NOTATION
• ELEMENT- used to define tags
• ATTLIST – used to define tag attributes
• ENTITY- used to define entities
• NOTATION – used to define datatype notations
Declaring attributes
An attribute declaration must include the name of the element to which the attribute belongs, the attribute
name and its type.
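As a sketch, an attribute declaration has the general form <!ATTLIST element_name attribute_name attribute_type default_value>. The example below places two such declarations in an internal DTD; the seller element and its phone and email attributes follow the planes.dtd listing later in these notes, while the phone number and seller name are made-up values.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE seller [
  <!ELEMENT seller (#PCDATA)>
  <!-- element name, attribute name, attribute type, default value -->
  <!ATTLIST seller phone CDATA #REQUIRED>
  <!ATTLIST seller email CDATA #IMPLIED>
]>
<seller phone="555-0100">Example Flying Service</seller>
```

Here #REQUIRED means every seller element must supply phone, and #IMPLIED means email may be omitted.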
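Syntax (internal DTD)
<!DOCTYPE mail [
<!ELEMENT mail (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<mail>
<to>Rani</to>
<from>Ravi</from>
<heading>Reminder</heading>
<body>...</body>
</mail>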
EXTERNAL DTD
• The DTD can be placed in an external file.
• If the DTD is external, it must be specified either as SYSTEM or
PUBLIC in the Document Type Declaration.
• If the DTD is external then the syntax is
<!DOCTYPE root-element SYSTEM "DTD_file_URL">
or
<!DOCTYPE root-element PUBLIC "public_id" "DTD_file_URL">
• If SYSTEM, the DTD resides on the local hard disk and may not be available for use by other
applications. If PUBLIC, the DTD can be used by anyone by referring to the URL. The public_id is
unique for standard DTD files on the web.
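• //mail.xml
<?xml version="1.0"?>
<!DOCTYPE mail SYSTEM "mail.dtd">
<mail>
<to>Rani</to>
<from>Ravi</from>
<heading>Reminder</heading>
<body>...</body>
</mail>
//mail.dtd
<!ELEMENT mail (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>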
4. XML NAMESPACES
XML namespaces provide a method to avoid element name conflicts.
In XML, element names are defined by the developer. This often results in a conflict when trying to
mix XML documents from different XML applications.
An example of this situation is having a <table> tag for a category of furniture and a <table> tag from
HTML for information tables.
This XML carries HTML table information:
<table>
<tr>
<td>apples</td>
<td>bananas</td>
</tr>
</table>
This XML carries information about a table (a piece of furniture):
<table>
<name>african coffee table</name>
<width>80</width>
<length>120</length>
</table>
If these XML fragments were added together, there would be a name conflict. Both contain a <table> element,
but the elements have different content and meaning. An XML parser will not know how to handle these
differences.
An XML namespace is a collection of element and attribute names used in XML documents. The
name of the namespace usually has the form of a URL, but XML processors never reference the site
whose URL is used as the name of a namespace. The purpose of the URL is to give the namespace a unique
name. Namespaces can be declared as the value of xmlns in the elements where they are used or in the
XML root element.
SOLVING THE NAME CONFLICT USING A PREFIX
Name conflicts in XML can easily be avoided using a name prefix.
This XML carries information about an HTML table and a piece of furniture:
<h:table>
<h:tr>
<h:td>apples</h:td>
<h:td>bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>african coffee table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
In the example above, there will be no conflict because the two <table> elements have different
names.
XML NAMESPACES - the xmlns ATTRIBUTE
The namespace declaration has the following syntax:
<element_name xmlns[:prefix] = "URL">
• The square brackets indicate that what is within them is optional. The optional prefix specifies the name to be
attached to names in the declared namespace.
Two reasons for using a prefix:
1. It is a shorthand: the URL is too long to be typed on every occurrence of every name from the namespace.
2. The URL may include characters that are illegal in XML names.
Namespaces can be declared in the elements where they are used or in the XML root element:
<root xmlns:h="http://www.w3.org/tr/html4/"
xmlns:f="http://www.w3schools.com/furniture">
<h:table>
<h:tr>
<h:td>apples</h:td>
<h:td>bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>african coffee table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
OR
<root>
<h:table xmlns:h="http://www.w3.org/tr/html4/">
<h:tr>
<h:td>apples</h:td>
<h:td>bananas</h:td>
</h:tr>
</h:table>
<f:table xmlns:f="http://www.abc.com/furniture">
<f:name>african coffee table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
DEFAULT NAMESPACES
Defining a default namespace for an element saves us from using prefixes in all the child elements. It has the
following syntax:
xmlns="namespaceURL"
The next example declares two namespaces. The first is declared to be the default namespace; the second
defines the prefix cap. So we can avoid the use of a prefix in the first section.
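A sketch of such a document is shown here. Only the prefix cap comes from the text; the element names and the two example.org namespace names are assumptions chosen for the illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<states xmlns="http://www.example.org/states"
        xmlns:cap="http://www.example.org/state-capitals">
  <state>
    <name>South Dakota</name>        <!-- default namespace: no prefix needed -->
    <capital>
      <cap:name>Pierre</cap:name>    <!-- second namespace: prefix cap required -->
    </capital>
  </state>
</states>
```

The two name elements do not conflict, because the unprefixed one belongs to the default namespace and the prefixed one belongs to the namespace bound to cap.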
5. XML SCHEMAS
An XML schema is an XML document, so it can be parsed with an XML parser.
5.1 Schema Fundamentals:
• Schemas are related to the idea of a class and an object in an OOP language. A schema is similar to a class
definition; an XML document that conforms to a specific schema is considered an instance of that schema, or
an object of the schema's class.
• Schemas have three primary purposes
- Specify the elements and attributes of an XML language
- Specify the structure of its instance XML documents, including where and how often the elements
may appear.
- Specify the data type of every element in its instance XML documents.
Disadvantages of DTD compared to XML schema
• A DTD is written in a syntax different from XML, so it cannot be processed as an XML document
• All declarations in a DTD are global (it is impossible to define two different elements with the same name)
• A DTD cannot control what kind of information a given element or attribute can contain.
5.2 Defining a schema:
Schemas are written using names from a namespace (the schema of schemas). The name of this namespace is
http://www.w3.org/2001/XMLSchema
element, schema, sequence and string are some names from this namespace. Every XML
schema has a single root, schema.
• The schema element must specify the namespace for the schema of schemas from which the schema's
elements and attributes will be drawn. It often specifies a prefix that will be used for the names in the
schema. This namespace specification appears as
xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
Every XML schema itself defines a tag set, like a DTD, which must be named with the targetNamespace
attribute of the schema element. The target namespace is specified by assigning a namespace to the
targetNamespace attribute, as in the following:
targetNamespace = "http://cs.uccs.edu/planeSchema"
Every top-level element places its name in the target namespace. If we want the names of nested elements (i.e.
the elements and attributes that are not defined directly in the schema element) to be included in the target
namespace as well, we must set the elementFormDefault attribute to qualified.
elementFormDefault = "qualified"
The default namespace, which is the source of the unprefixed names in the schema, is given
with another xmlns specification, this time without a prefix.
xmlns = "http://cs.uccs.edu/planeSchema"
A complete example of the opening tag for a schema element:
<!-- xmlns:xsd : namespace for the schema itself
     targetNamespace : namespace where elements defined here will be placed
     xmlns : default namespace for this document
     elementFormDefault : places non-top-level elements in the target namespace -->
<xsd:schema
xmlns:xsd = "http://www.w3.org/2001/XMLSchema"
targetNamespace = "http://cs.uccs.edu/planeSchema"
xmlns = "http://cs.uccs.edu/planeSchema"
elementFormDefault = "qualified">
One alternative to the preceding opening tag would be to make the XMLSchema names the default so that
they do not need to be prefixed in the schema. Then the names in the target namespace would need to be
prefixed. The following schema tag illustrates this approach:
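A minimal sketch of this alternative is given here. The prefix plane and the colorName type are assumptions introduced for the illustration; the two namespace names are the ones used above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://cs.uccs.edu/planeSchema"
        xmlns:plane="http://cs.uccs.edu/planeSchema"
        elementFormDefault="qualified">
  <!-- XMLSchema names (schema, simpleType, string, ...) need no prefix here -->
  <simpleType name="colorName">
    <restriction base="string">
      <maxLength value="30"/>
    </restriction>
  </simpleType>
  <!-- a name from the target namespace must now carry the plane prefix -->
  <element name="color" type="plane:colorName"/>
</schema>
```

Which style to use is a matter of convenience: the more names a schema takes from one namespace, the more typing is saved by making that namespace the default.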
• XML Schema defines 44 data types, 19 of which are primitive and 25 of which are derived.
• Primitive data types: string, boolean, float, time …
• Predefined derived data types: examples are byte, integer, positiveInteger.
• User-defined derived data types: these data types are defined by specifying restrictions on an existing
type, which is then called a base type. Such user-defined types are derived types. Constraints on derived types
are given in terms of facets of the base type. Ex: the integer data type has 8 possible facets: totalDigits,
maxInclusive, maxExclusive etc.
- Both simple and complex types can be either named or anonymous.
DTDs define global elements (the context of reference is irrelevant), but the context of reference is essential in
an XML schema.
Data declarations in an XML schema can be
1. Local, which appear inside an element that is a child of schema
2. Global, which appear as a child of schema
5.5. Defining a simple type:
• Elements are defined in an XML schema with the element tag, which is from the XMLSchema
namespace. Recall that the prefix xsd is normally used for names from this namespace. Use the element
tag and set the name and type attributes:
<xsd:element name = "bird" type = "xsd:string" />
The instance could be:
<bird> Yellow-bellied sap sucker </bird>
• An element can be given a default value using the default attribute. The default value is automatically
assigned to the element when no other value is specified.
<xsd:element name = "bird" type = "xsd:string" default = "Eagle" />
• An element can have a constant value, using the fixed attribute. A fixed value is automatically assigned
to the element, and we cannot specify another value.
<xsd:element name = "bird" type = "xsd:string" fixed = "Eagle" />
Declaring User-Defined Types:
• A user-defined type is described in a simpleType element, using facets. Facets must be specified in the
content of a restriction element. Restrictions on XML elements are called facets. Restrictions are
used to define acceptable values for XML elements or attributes. Facet values are specified with the
value attribute.
For example, the following declares a user-defined type, firstName:
<xsd:simpleType name = "firstName" >
<xsd:restriction base = "xsd:string" >
<xsd:maxLength value = "20" />
</xsd:restriction>
</xsd:simpleType>
The length facet is used to restrict the string to an exact number of characters. The minLength facet is
used to specify a minimum length.
<xsd:simpleType name = "phoneNumber" >
<xsd:restriction base = "xsd:decimal" >
<xsd:totalDigits value = "10" />
</xsd:restriction>
</xsd:simpleType>
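The length and minLength facets mentioned above can be sketched in the same way. The type names stateCode and password and the chosen limits are hypothetical and serve only as an illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- length: the value must have exactly 2 characters -->
  <xsd:simpleType name="stateCode">
    <xsd:restriction base="xsd:string">
      <xsd:length value="2"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- minLength with maxLength: between 8 and 20 characters -->
  <xsd:simpleType name="password">
    <xsd:restriction base="xsd:string">
      <xsd:minLength value="8"/>
      <xsd:maxLength value="20"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

Several facets can be combined inside one restriction element, and a value must satisfy all of them.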
Declaring Complex Types:
• There are several categories of complex types, but we discuss just one, element-only elements
which can have elements in their content, but no text. Element-only elements are defined with the
complexType tag.
• The sequence tag is used to contain an ordered group of elements.
• Use the all tag if the order is not important .
• Nested elements can include attributes that give the allowed number of occurrences
(minOccurs, maxOccurs; the value unbounded for maxOccurs means there is no upper limit)
• For example:
<xsd:complexType name = "sports_car" >
<xsd:sequence>
<xsd:element name = "make" type = "xsd:string" />
<xsd:element name = "model" type = "xsd:string" />
<xsd:element name = "engine" type = "xsd:string" />
<xsd:element name = "year" type = "xsd:decimal" />
</xsd:sequence>
</xsd:complexType>
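To show how the pieces fit together, the sports_car type can be placed in a small complete schema. This is a sketch under stated assumptions: the target namespace http://www.example.org/cars and the garage and car elements are made up for the illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.example.org/cars"
            xmlns="http://www.example.org/cars"
            elementFormDefault="qualified">
  <!-- global declaration: a child of schema, with an anonymous complex type -->
  <xsd:element name="garage">
    <xsd:complexType>
      <xsd:sequence>
        <!-- local declaration: one or more car elements of the named type -->
        <xsd:element name="car" type="sports_car"
                     minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <!-- named complex type: element-only content in a fixed order -->
  <xsd:complexType name="sports_car">
    <xsd:sequence>
      <xsd:element name="make" type="xsd:string"/>
      <xsd:element name="model" type="xsd:string"/>
      <xsd:element name="engine" type="xsd:string"/>
      <xsd:element name="year" type="xsd:decimal"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

An instance document would have garage as its root, declare the same namespace as its default, and contain one or more car elements whose children appear in the order make, model, engine, year.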
Complete example of schema
Example 1:
Schema definition document
Example 2
<?xml version="1.0"?>
<mail>
<to>Rani</to>
<from>Ravi</from>
<heading>Reminder</heading>
<body>
<title>Dear Sister</title>
<wish>Good Morning to you</wish>
<msg>Please don't forget about our parents' wedding anniversary today. Be there in
time.</msg>
</body>
</mail>
Raw xml document mail.xml
Some of the elements in the display are preceded by dashes. These elements can be elided (temporarily
suppressed) by placing the mouse cursor over the dash and clicking the left mouse button. For example, if
the mouse cursor is placed over the dash to the left of the first<ad> tag and the left mouse button is clicked,
the result is as shown below.
A CSS style sheet for an XML document is just a list of its tags and associated styles. The
connection of an XML document and its style sheet is made through an xml-stylesheet processing
instruction. The display property is used to specify whether an element is to be displayed inline or in a separate block.
<?xml-stylesheet type = "text/css" href = "planes.css"?>
For example: planes.css
planes.dtd document
<!ELEMENT planes_for_sale (ad+)>
<!ELEMENT ad (year, make, model, color, description, price?, seller, location)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT make (#PCDATA)>
<!ELEMENT model (#PCDATA)>
<!ELEMENT color (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT seller (#PCDATA)>
<!ELEMENT location (city, state)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ATTLIST seller phone CDATA #REQUIRED>
<!ATTLIST seller email CDATA #IMPLIED>
<!ENTITY c "Cessna">
<!ENTITY p "Piper">
<!ENTITY b "Beechcraft">
Planes.xml
<?xml version = "1.0" encoding = "utf-8"?>
//catalogs.xml
<?xml version="1.0"?>
<PCS>
<PC>
<NAME>Zenith</NAME>
<CAPACITY>100</CAPACITY>
<PRICE>20000</PRICE>
</PC>
<PC>
<NAME>DELL</NAME>
<CAPACITY>200</CAPACITY>
<PRICE>40000</PRICE>
</PC>
<PC>
<NAME>Acer</NAME>
<CAPACITY>300</CAPACITY>
<PRICE>35000</PRICE>
</PC>
</PCS>
//catalog1.html
<html>
<body>
<xml id="PCS" src="catalogs.xml">
</xml>
<table datasrc="#PCS" border="1" align="center">
<thead><th>Name</th><th>Capacity</th><th>Price</th></thead>
<tr><td><div datafld="NAME"></div></td>
<td><div datafld="CAPACITY"></div></td>
<td><div datafld="PRICE"></div></td>
</tr>
</table>
</body>
</html>
Example 2
Product.xml
<?xml version="1.0" ?>
<Products>
<Item>
<Name>Coffee Cup Warmer</Name>
<Number>0001</Number>
<Price>15.00</Price>
</Item>
<Item>
<Name>42 Cup Coffee Brewing System</Name>
<Number>0015</Number>
<Price>473.00</Price>
</Item></Products>
<TD><SPAN DATAFLD="Price"/></TD>
</TR>
</TABLE>
</BODY>
</HTML>
Example
Write an application to create an Address Book in XML and display the Address Book in the browser using
HTML.
1) Create a valid XML document for the Address Book with the following elements. The address book stores 'N'
persons' Name (fname, lname), Address (hname, city, state, country) and all the phone numbers (mob, off, res).
2) The sub-elements are given in brackets. Store the XML data in an HTML page and display the data in
the browser as HTML tables. As we are storing the XML data in the HTML page, there is only one
HTML file, called addressbook.html.
3) Use the XML tag for storing the content in the HTML page.
XPath: identifies parts of XML documents, such as specific elements that are in specific positions in
the document or elements that have particular attribute values.
XSL-FO: is used to generate high-quality printable documents in formats such as PDF.
XSLT: describes how to transform XML documents into different forms or formats such as HTML or
plain text.
Overview of XSLT
XSLT processors take both an XML document and an XSLT document as input. The XSLT
document is the program to be executed; the XML document is the input data to the program. Parts of the
XML document are selected, possibly modified, and merged with parts of the XSLT document to form a
new document, which is sometimes called an XSL document. Note that the XSL document is also an XML
document, which could be again the input to an XSLT processor. The output document can be stored for
future use by applications, or it may be immediately displayed by an application, often a browser.
Neither the XSLT document nor the input XML document is changed by the XSLT processor.
Figure:XSLT processing
XSL-FO includes more than 50 formatting object (element) types and more than 230 attributes, so it is a
large and complex tag set.
In this section we assume that the XSLT processor processes an XML document with its associated XSLT
style-sheet document and produces as its output an XSL document that is an HTML document to be
displayed.
An XML document that is to be used as data to an XSLT style sheet must include a processing
instruction to inform the XSLT processor that the style sheet is to be used. The form of this instruction is
as follows:
The following is a simple example of an XML document that illustrates XSLT formatting:
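The processing instruction has the same shape as the one used for CSS, with the type text/xsl: <?xml-stylesheet type = "text/xsl" href = "stylesheet_name"?>. The document below is a sketch of what xslplane.xml could look like. The root element plane and the style sheet name xslplane.xsl come from the text; the child elements and their values are borrowed from the ad example earlier in these notes and are an assumption about the actual file.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- xslplane.xml (assumed content) -->
<?xml-stylesheet type="text/xsl" href="xslplane.xsl"?>
<plane>
  <year>1960</year>
  <make>Cessna</make>
  <model>Centurian</model>
  <color>Yellow with white trim</color>
</plane>
```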
Notice that this document specifies xslplane.xsl as its XSLT style sheet. An XSLT style sheet is an
XML document whose root element is the special-purpose element stylesheet. The
stylesheet tag defines namespaces as its attributes and encloses the collection of elements that
defines its transformations. It also identifies the document as an XSLT document. The namespace for all
XSLT elements is specified with a W3C URI. If the style sheet includes XHTML elements, the style
sheet tag also specifies the XHTML namespace. In the stylesheet tag
Notice that this tag specifies that the prefix for XSLT elements is xsl and the default namespace is that
for XHTML/HTML.
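A stylesheet tag of the kind described can be sketched as follows. The two namespace names are the standard W3C names for XSLT and XHTML; the version attribute value 1.0 is an assumption about which XSLT version the notes use.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml">
  <!-- template elements go here; XSLT names use the xsl prefix,
       unprefixed names such as html or span are XHTML -->
</xsl:stylesheet>
```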
A style-sheet document must include at least one template element. The template opening tag
includes a match attribute to specify an XPath expression that selects a node in the XML document.
The content of a template element specifies what is to be placed in the output document. In many
XSLT documents, a template is included to match the root node of the XML document. This can be done
in two ways. One way is to use the XPath expression “/”, as in
The alternative to using "/" is to use the actual root element of the document. In the example xslplane.xml the
document root is plane.
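Both ways of reaching the root can be sketched in one small style sheet. This is an illustration only; the make child element is an assumption about the content of xslplane.xml.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Way 1: match the root node with the XPath expression "/" -->
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Way 2: match the root element of the document by its name -->
  <xsl:template match="plane">
    <xsl:value-of select="make"/>
  </xsl:template>
</xsl:stylesheet>
```

Here the first template simply hands processing on to the second; in practice a style sheet usually puts its top-level output in one of the two.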
If the output of the XSLT processor is an XHTML/HTML document, the template that matches the root
node is used to create the HTML header of the output document. The header code appears as the content
of the template element. An example of a complete template element is
Style sheets nearly always have templates for specific nodes of the XML document, which are descendants
of the root node, as in the following example:
Template elements are of two distinct kinds: those that literally contain content and those that specify
content to be copied from the associated XML document.
XSLT elements that represent HTML elements often are used to specify content. XSLT elements have the
appearance of their associated HTML elements, like the following HTML element:
In many cases, the content of an element of the XML document is to be copied to the output document.
This is done with the value-of element, which uses a select attribute to specify the element of the
XML document whose contents are to be copied. For example, the element
specifies that the content of the AUTHOR element of the XML document is to be copied to the output
document. Because the value-of element cannot have content, it is terminated with a slash and a right
angle bracket. The select attribute can specify any node of the XML document. This is an advantage of
XSLT formatting over CSS, in which the order of data as stored is the only possible order of display.
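The ideas above (a root template that produces the HTML header, templates for specific nodes, literal content, and value-of) can be combined in one style sheet. This is a sketch in the spirit of xslplane1.xsl, not the original file; the child element names year, make, model and color and the title and heading text are assumptions.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml">
  <!-- root template: literal HTML that forms the header of the output -->
  <xsl:template match="/">
    <html>
      <head><title>Airplane description</title></head>
      <body>
        <h2>Airplane Description</h2>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <!-- plane produces no output of its own; it only processes its children -->
  <xsl:template match="plane">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- one template per child: a literal label, then the copied content -->
  <xsl:template match="year">
    <span>Year: </span><xsl:value-of select="."/><br/>
  </xsl:template>
  <xsl:template match="make">
    <span>Make: </span><xsl:value-of select="."/><br/>
  </xsl:template>
  <xsl:template match="model">
    <span>Model: </span><xsl:value-of select="."/><br/>
  </xsl:template>
  <xsl:template match="color">
    <span>Color: </span><xsl:value-of select="."/><br/>
  </xsl:template>
</xsl:stylesheet>
```

In each child template, select="." copies the content of the node that the template matched.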
Example1:
The XSLT document, xslplane1.xsl, is more general and complex than necessary for the simple use for
which it was written. There is actually no need to include templates for all of the child nodes of plane,
because the select clause of the value-of element finds them. The following XSLT document, xslplane2.xsl,
produces the same output as xslplane1.xsl.
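A simplified style sheet of this kind might look like the following. It is a sketch in the spirit of xslplane2.xsl under the same assumptions about element names as before; a single template selects each child of plane directly.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml">
  <xsl:template match="/">
    <html>
      <head><title>Airplane description</title></head>
      <body>
        <h2>Airplane Description</h2>
        <!-- each select gives a path from the root node to one child of plane -->
        <span>Year: </span><xsl:value-of select="plane/year"/><br/>
        <span>Make: </span><xsl:value-of select="plane/make"/><br/>
        <span>Model: </span><xsl:value-of select="plane/model"/><br/>
        <span>Color: </span><xsl:value-of select="plane/color"/><br/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Because each value-of names the node it wants, the four lines could be written in any order, which is the freedom of ordering that CSS formatting does not offer.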
Example 2:
XML Applications
XML enables users and organizations to create their own elements, attributes and entities to describe the structure of
the specific type of data they use.
Common Data Format (CDF) – for describing and storing scalar and multidimensional data
Mathematical Markup Language (MathML) – to integrate mathematical notation into a Web document
OML – allows web page designers to annotate their web pages so they can be processed with intelligent agent software.
CML – was created to provide a way to describe molecular information & chemical equations within
a single language.
XML APPLICATIONS
• PUSH Technology
• Online Banking
• Software Distribution
• Web Automation
• Database Integration
• Scientific Publishing
JSON XML