XML Notes
XML Basics
XML stands for Extensible Markup Language and is a text-based markup language
derived from Standard Generalized Markup Language (SGML).
XML is a software- and hardware-independent tool for storing and transporting data.
XML tags identify the data and are used to store and organize it, whereas HTML tags
specify how to display the data. XML is not going to replace HTML in the near future,
but it introduces new possibilities by adopting many successful features of HTML.
There are three important characteristics of XML that make it useful in a variety of
systems and solutions:
XML is extensible: XML allows you to create your own self-descriptive tags, or
language, that suits your application.
XML carries the data, does not present it: XML allows you to store the data
irrespective of how it will be presented.
XML is a public standard: XML was developed by the World Wide Web Consortium (W3C)
and is available as an open standard.
XML Usage
XML can work behind the scenes to simplify the creation of HTML documents for
large web sites.
XML can be used to exchange information between organizations and
systems.
XML can be used for offloading and reloading of databases.
XML can be used to store and arrange data in a way that suits your data
handling needs.
XML can easily be merged with style sheets to create almost any desired
output.
Virtually any type of data can be expressed as an XML document.
Lakireddy Bali Reddy College of Engineering (Autonomous)
XML Syntax:
<?xml version="1.0"?>
<contact_info>
<name>Rajesh</name>
<company>TCS</company>
<phone>9333332354</phone>
</contact_info>
You can notice there are two kinds of information in the above example:
Markup, such as <contact_info>
Text, or character data, such as Rajesh and TCS
[Diagram: syntax rules for writing the different types of markup and text in an XML document]
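The split between markup and text can be seen by running the contact_info example through a parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name split_markup_and_text is hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# The contact_info document from the example above.
CONTACT_XML = """<?xml version="1.0"?>
<contact_info>
<name>Rajesh</name>
<company>TCS</company>
<phone>9333332354</phone>
</contact_info>"""


def split_markup_and_text(xml_text):
    """Return the root tag, the child tag names (markup) and the child text values."""
    root = ET.fromstring(xml_text)
    tags = [child.tag for child in root]
    texts = [child.text for child in root]
    return root.tag, tags, texts


root_tag, tags, texts = split_markup_and_text(CONTACT_XML)
print(root_tag)  # contact_info
print(tags)      # ['name', 'company', 'phone']
print(texts)     # ['Rajesh', 'TCS', '9333332354']
```

The tag names come from the markup, while the values in texts are the character data stored between the tags.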
XML Declaration
An XML document can optionally begin with an XML declaration. It is written as below:
<?xml version="1.0" encoding="UTF-8"?>
Where version is the XML version and encoding specifies the character encoding used
in the document.
If present, the XML declaration must be the first statement in the XML document.
The HTTP protocol can override the value of encoding that you put in the XML
declaration.
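As an illustration of where the declaration sits, the sketch below serializes a small element with Python's standard xml.etree.ElementTree and asks for a declaration to be written. The xml_declaration argument of ET.tostring is assumed to be available (Python 3.8 or later); the element content is sample data.

```python
import xml.etree.ElementTree as ET

# Build a tiny document in memory (sample data).
root = ET.Element("contact_info")
ET.SubElement(root, "name").text = "Rajesh"

# xml_declaration=True asks the serializer to emit the declaration first.
document = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# The declaration is the very first line of the serialized bytes.
first_line = document.split(b"\n", 1)[0]
print(first_line.decode("utf-8"))
```

The serializer places the declaration before the root element, which matches the rule that nothing may precede it.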
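Tags and elements:
An element name is enclosed in angle brackets:
<element>
Every element must be closed, either with a matching end tag:
<element>....</element>
or, for an empty element, with the self-closing form:
<element/>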
Nesting of elements:
An XML element can contain multiple XML elements as its children, but the child
elements must not overlap; that is, the end tag of an element must have the same name
as that of the most recent unmatched start tag.
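The following example shows incorrectly nested tags:
<?xml version="1.0"?>
<contact_info>
<company>TCS
</contact_info>
</company>
The following example shows correctly nested tags:
<?xml version="1.0"?>
<contact_info>
<company>TCS</company>
</contact_info>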
Root element:
An XML document can have only one root element. For example, the following is not a
correct XML document, because both the x and y elements occur at the top level
without a root element:
<x>...</x>
<y>...</y>
The following example shows a correctly formed XML document:
<root>
<x>...</x>
<y>...</y>
</root>
Case sensitivity:
The names of XML elements are case-sensitive. That means the names in the start
and end tags must be in exactly the same case. For example, <contact_info> is
different from <Contact_Info>.
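The nesting, root element and case-sensitivity rules above can be checked mechanically, because a conforming parser rejects any document that breaks them. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample strings are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "overlapping tags": "<contact_info><company>TCS</contact_info></company>",
    "two root elements": "<x>...</x><y>...</y>",
    "case mismatch": "<title>Java</Title>",
    "single root": "<root><x>...</x><y>...</y></root>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the last sample is accepted; each of the first three breaks exactly one of the rules.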
Attributes
An attribute specifies a single property for the element, using a name/value pair. An
XML-element can have one or more attributes. For example:
<a href="http://www.tutorialspoint.com/">Tutorialspoint!</a>
Attribute names in XML (unlike HTML) are case sensitive. That is, HREF and href are
considered two different XML attributes.
The same attribute cannot have two values in one element. The following example shows
incorrect syntax because the attribute b is specified twice:
<a b="x" c="y" b="z">....</a>
Attribute names are written without quotation marks, whereas attribute values must
always appear in quotation marks. The following example demonstrates incorrect XML
syntax:
<a b=x>....</a>
In the above syntax, the attribute value is not enclosed in quotation marks.
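The attribute rules above can be tried out with a parser. This is a small sketch, assuming Python's standard xml.etree.ElementTree; the helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET

# Read the attribute of the <a> element from the example above.
link = ET.fromstring('<a href="http://www.tutorialspoint.com/">Tutorialspoint!</a>')
href_lower = link.get("href")   # the attribute value
href_upper = link.get("HREF")   # None, because attribute names are case-sensitive
print(href_lower, href_upper)


def parses(xml_text):
    """Return True if xml_text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses('<a b="x" c="y" b="z">....</a>'))  # duplicate attribute b
print(parses('<a b=x>....</a>'))                # unquoted attribute value
print(parses('<a b="x" c="y">....</a>'))        # correct syntax
```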
XML References
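References allow you to add or include additional text or markup in an XML
document. A reference always begins with the symbol "&", which is a reserved
character, and ends with the symbol ";". XML has two types of references:
Entity References: An entity reference contains a name between the start and the
end delimiters. For example, in &amp; the name is amp. The name refers to a
predefined string of text and/or markup.
Character References: A character reference contains a hash mark ("#") followed by a
number between the delimiters, for example &#65;. The number always refers to the
Unicode code point of a character; here, 65 refers to the letter "A".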
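To see how references are resolved, the sketch below parses a string that contains the predefined entity references &amp; and &lt; together with a numeric character reference, &#65; (the letter A). It assumes Python's standard xml.etree.ElementTree and the escape helper from xml.sax.saxutils; the sample sentence is made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The parser replaces each reference with the character it stands for.
element = ET.fromstring("<t>Tom &amp; Jerry &lt;3 &#65;</t>")
print(element.text)  # Tom & Jerry <3 A

# Going the other way: escape reserved characters before placing text in XML.
escaped = escape("a < b & c")
print(escaped)       # a &lt; b &amp; c
```

Escaping is needed because "&" and "<" are reserved, so they cannot appear literally inside element text.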
XML Text
XML Tree
An XML tree starts at a root element and branches from the root to child elements.
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
The terms parent, child, and sibling are used to describe the relationships between
elements.
Parents have children. Children have parents. Siblings are children on the same level
(brothers and sisters).
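The parent, child and sibling relationships can be explored in code. The sketch below assumes Python's standard xml.etree.ElementTree, which does not store parent pointers, so a child-to-parent map is built first. The sample document adds a second subchild and a second child so that siblings exist.

```python
import xml.etree.ElementTree as ET

TREE_XML = (
    "<root>"
    "<child><subchild>a</subchild><subchild>b</subchild></child>"
    "<child/>"
    "</root>"
)
root = ET.fromstring(TREE_XML)

# ElementTree keeps no parent links, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

first_sub = root.find("child/subchild")
parent = parent_of[first_sub]
# Siblings are the other children of the same parent.
siblings = [e for e in parent if e is not first_sub]

print(parent.tag)                  # child
print([s.text for s in siblings])  # ['b']
print([c.tag for c in root])       # ['child', 'child']
```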
XML Namespaces
Name Conflicts
In XML, element names are defined by the developer. This often results in a conflict when
trying to mix XML documents from different XML applications.
This XML carries HTML table information:
<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
This XML carries information about a table (a piece of furniture):
<table>
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
If these XML fragments were added together, there would be a name conflict. Both
contain a <table> element, but the elements have different content and meaning.
A user or an XML application will not know how to handle these differences.
Name conflicts in XML can be avoided by using a name prefix.
This XML carries information about an HTML table and a piece of furniture:
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
In the example above, there will be no conflict because the two <table> elements have
different names.
When using prefixes in XML, a namespace for the prefix must be defined.
The namespace can be defined by an xmlns attribute in the start tag of an element.
-----------------------------------------------------
<root>
<h:table xmlns:h="http://www.w3.org/TR/html4/">
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table xmlns:f="https://www.w3schools.com/furniture">
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
--------------------------------------------------
The xmlns attribute in the first <table> element gives the h: prefix a qualified
namespace.
The xmlns attribute in the second <table> element gives the f: prefix a qualified
namespace.
When a namespace is defined for an element, all child elements with the same prefix are
associated with the same namespace.
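A namespace-aware parser keeps the two table elements apart by their namespace URIs rather than by their prefixes. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree; the prefix map NS is defined here only for the lookups and uses the same URIs as the example above.

```python
import xml.etree.ElementTree as ET

DOC = """<root>
<h:table xmlns:h="http://www.w3.org/TR/html4/">
<h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
</h:table>
<f:table xmlns:f="https://www.w3schools.com/furniture">
<f:name>African Coffee Table</f:name>
</f:table>
</root>"""

# Prefix-to-URI map used only for the searches below.
NS = {
    "h": "http://www.w3.org/TR/html4/",
    "f": "https://www.w3schools.com/furniture",
}

root = ET.fromstring(DOC)
cells = [td.text for td in root.findall("h:table/h:tr/h:td", NS)]
furniture = root.find("f:table/f:name", NS).text
# Internally each tag is stored as {namespace-URI}local-name.
tags = [child.tag for child in root]

print(cells)
print(furniture)
print(tags)

# A prefix that is never declared makes the document ill-formed.
try:
    ET.fromstring("<h:table/>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```

Both elements have the local name table, yet their stored tags differ because the namespace URI is part of the name.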
XML Validator
An XML document with correct syntax is called "well formed".
Example 1:
<?xml version="1.0"?>
<book>
<title>Java</Title>
<author>James</book>
<price>570
</author>
The above XML document is not a well-formed document, for the following reasons:
The tags do not match: <title> ... </Title>
The nesting is improper: <author> ... </book>
A tag is never closed: <price>
Example 2:
<?xml version="1.0"?>
<book>
<title>Java</title>
<author>James</author>
<price>500</price>
</book>
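The above XML document is well formed: every tag is closed, the tags match in case,
and the elements are properly nested.
A "well formed" XML document is not the same as a "valid" XML document.
A "valid" XML document must be well formed. In addition, it must conform to a document
type definition.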
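A document type definition defines the rules and the legal elements and attributes for an
XML document.
There are two different document type definitions that can be used with XML:
DTD: the original Document Type Definition
XML Schema: an XML-based alternative to DTD
For example, an internal DTD for the book document above is written as follows: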
--------------------------------------------------------------------------
<!DOCTYPE book
[
<!ELEMENT book (title,author,price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT price (#PCDATA)>
]>
-------------------------------------------------------------
!DOCTYPE book defines that the root element of the document is book
!ELEMENT book defines that the book element must contain the elements
"title, author, price", in that order
!ELEMENT title defines the title element to be of type "#PCDATA"
!ELEMENT author defines the author element to be of type "#PCDATA"
!ELEMENT price defines the price element to be of type "#PCDATA"
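The difference between "well formed" and "valid" can be illustrated in code. Python's standard xml.etree.ElementTree checks well-formedness only and does not validate against a DTD, so the sketch below uses a hand-written check that imitates the single rule (title,author,price). It is a stand-in under that assumption, not a real DTD validator, and the names EXPECTED_CHILDREN and follows_book_model are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the DTD rule <!ELEMENT book (title,author,price)>.
EXPECTED_CHILDREN = ["title", "author", "price"]


def follows_book_model(xml_text):
    """Check the root name, the child order, and that each child holds text only."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well formed
    if root.tag != "book":
        return False
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    # (#PCDATA) means character data only, so no nested elements are allowed.
    return all(len(child) == 0 for child in root)


valid_doc = "<book><title>Java</title><author>James</author><price>500</price></book>"
wrong_order = "<book><author>James</author><title>Java</title><price>500</price></book>"
missing_price = "<book><title>Java</title><author>James</author></book>"

print(follows_book_model(valid_doc))
print(follows_book_model(wrong_order))
print(follows_book_model(missing_price))
```

The second and third documents are well formed, since the parser accepts them, but they would not be valid against the DTD because they break its content model.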
2) External DTD.
<student>
<id>543</id>
<name>Ravi</name>
<age>21</age>
<addr>Guntur</addr>
<email>nsr@gmail.com</email>
<ph>9855555</ph>
<gender>male</gender>
</student>
2) External DTD.
If the above XML code follows exactly the rules defined in the DTD, we can conclude
that our XML document is a valid document. Otherwise, it is an invalid document.
With a DTD, independent groups of people can agree to use a standard DTD for
interchanging data.
With a DTD, you can verify that the data you receive from the outside world is valid.
You can also use a DTD to verify your own data.
XML Schema
An XML Schema describes the structure of an XML document, just like a DTD.
An XML document validated against an XML Schema is both "Well Formed" and
"Valid".
Syntax
You need to declare a schema in your XML document as follows:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="book">
<xs:complexType>
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
<xs:element name="price" type="xs:integer"/>
<xs:element name="edition" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Example:
---------------------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.w3schools.com" elementFormDefault="qualified">
<xs:element name="student">
<xs:complexType>
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="addr">
<xs:complexType>
<xs:sequence>
<xs:element name="city" type="xs:string"/>
<xs:element name="pincode" type="xs:long"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="ph" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
---------------------------------------------------------------------------------------------
If an XML document follows exactly the rules defined in the schema above ("student.xsd"),
we can conclude that it is a valid document. Otherwise, it is an invalid
document.
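Because an XML Schema is itself an XML document, its declarations can be read with an ordinary XML parser. The sketch below lists the element names and types declared in a condensed copy of the student schema; it assumes Python's standard xml.etree.ElementTree, which can read the schema but does not perform schema validation. The helper name declared_elements and the label "complex" for elements without a type attribute are illustrative choices.

```python
import xml.etree.ElementTree as ET

# Namespace of XML Schema, in ElementTree's {URI}name notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="student">
<xs:complexType><xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="addr">
<xs:complexType><xs:sequence>
<xs:element name="city" type="xs:string"/>
<xs:element name="pincode" type="xs:long"/>
</xs:sequence></xs:complexType>
</xs:element>
<xs:element name="ph" type="xs:integer"/>
</xs:sequence></xs:complexType>
</xs:element>
</xs:schema>"""


def declared_elements(schema_text):
    """Return (name, type) pairs for every xs:element, in document order."""
    root = ET.fromstring(schema_text)
    return [
        (element.get("name"), element.get("type", "complex"))
        for element in root.iter(XS + "element")
    ]


declarations = declared_elements(SCHEMA)
for name, type_name in declarations:
    print(name, type_name)
```

Elements such as student and addr carry no type attribute because their type is given by the nested xs:complexType.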
XSLT Introduction
HTML uses predefined tags. The meaning of, and how to display each tag is well
understood.
XML does not use predefined tags, and therefore the meaning of each tag is not well
understood.
A <table> element could indicate an HTML table, a piece of furniture, or something else -
and browsers do not know how to display it!
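What is XSLT?
XSLT (Extensible Stylesheet Language Transformations) is a language for transforming XML
documents into other formats, such as HTML.
Example: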
<book_store>
<book>
<title>JAVA</title>
<author>James</author>
</book>
<book>
<title>DBMS</title>
<author>Raghu</author>
</book>
</book_store>
Save the above code as "book.xml" and prepare the style sheet for this XML file.
Now open the "book.xml" file in any browser and observe the output given below:
Title   Author
JAVA    James
DBMS    Raghu
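The transformation from book data to a two-column table can be imitated in code. Python's standard library has no XSLT processor, so the sketch below is not XSLT: it is a hand-written equivalent, assuming xml.etree.ElementTree, that builds the same kind of HTML table a style sheet would produce. The function name to_html_table is hypothetical.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book_store>
<book><title>JAVA</title><author>James</author></book>
<book><title>DBMS</title><author>Raghu</author></book>
</book_store>"""


def to_html_table(xml_text):
    """Build an HTML table with one row per <book> element."""
    store = ET.fromstring(xml_text)
    table = ET.Element("table", border="1")
    header = ET.SubElement(table, "tr")
    for heading in ("Title", "Author"):
        ET.SubElement(header, "th").text = heading
    for book in store.findall("book"):
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = book.findtext("title")
        ET.SubElement(row, "td").text = book.findtext("author")
    return ET.tostring(table, encoding="unicode")


html = to_html_table(BOOK_XML)
print(html)
```

An XSLT style sheet expresses the same loop declaratively with templates, which is why the XML data itself does not need to change.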
XML DOM
The HTML DOM defines a standard way for accessing and manipulating HTML documents.
It presents an HTML document as a tree-structure.
The XML DOM defines a standard way for accessing and manipulating XML documents. It
presents an XML document as a tree-structure.
Example:
<!DOCTYPE html>
<html>
<body>
<p id="demo"></p>
<button type="button"
onclick="document.getElementById('demo').innerHTML = 'Hello World!'">Click Me!
</button>
</body>
</html>
In other words: The XML DOM is a standard for how to get, change, add, or delete
XML elements.
Programming Interface
The DOM models XML as a set of node objects. The nodes can be accessed with
JavaScript or other programming languages.
The programming interface to the DOM is defined by a set of standard properties and
methods.
Properties are often referred to as something that is (e.g. the node name is "book").
Methods are often referred to as something that is done (e.g. delete "book").
DOM Example:
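<html>
<body>
<p id="demo"></p>
<script>
function myfun()
{
var text, parser, xmlDoc;
// Two sibling <book> elements inside one <bookstore> root.
text = "<bookstore><book>" +
"<title>Java</title>" +
"<author>James</author>" +
"<year>1991</year>" +
"</book><book>" +
"<title>DBMS</title>" +
"<author>Raghu</author>" +
"<year>1970</year>" +
"</book></bookstore>";
// Parse the string into an XML DOM object.
parser = new DOMParser();
xmlDoc = parser.parseFromString(text, "text/xml");
// Show the text of the first <year> element.
document.getElementById("demo").innerHTML =
xmlDoc.getElementsByTagName("year")[0].childNodes[0].nodeValue;
}
myfun();
</script>
</body>
</html>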
Output:
1991
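The same DOM calls exist outside the browser. The sketch below repeats the lookup with Python's standard xml.dom.minidom module, which exposes getElementsByTagName, childNodes and nodeValue under the same names; the variable names are illustrative.

```python
from xml.dom.minidom import parseString

TEXT = (
    "<bookstore>"
    "<book><title>Java</title><author>James</author><year>1991</year></book>"
    "<book><title>DBMS</title><author>Raghu</author><year>1970</year></book>"
    "</bookstore>"
)

# Parse the string into a DOM document object.
xml_doc = parseString(TEXT)

# First <year> element -> its first child (a text node) -> the text itself.
first_year = xml_doc.getElementsByTagName("year")[0].childNodes[0].nodeValue
book_count = len(xml_doc.getElementsByTagName("book"))

print(first_year)  # 1991
print(book_count)  # 2
```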
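Example Explained
xmlDoc: the XML DOM object created by the parser.
getElementsByTagName("year")[0]: the first <year> element.
childNodes[0]: the first child of the <year> element (the text node).
nodeValue: the value of the node (the text itself).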