Module 3 (Chapter 1)
Introduction to XML
3.1 Introduction
a) History
A meta-markup language is a language for defining markup languages.
The Standard Generalized Markup Language (SGML) is a meta-markup language for
defining markup languages that can describe a wide variety of document types.
In 1986, SGML was approved as an International Organization for Standardization (ISO)
standard.
In 1990, SGML was used as the basis for the development of HTML as the standard
markup language for Web documents.
In 1996, the World Wide Web Consortium (W3C) began work on XML, another
meta-markup language.
The first XML standard, 1.0, was published in February 1998. The second, 1.1, was
published in 2004.
c) Solution:
One solution is to allow each group of users with common document needs to develop
its own set of tags and attributes and then use the SGML standard to define a new
markup language to meet those needs.
However, SGML includes a large number of capabilities that are only rarely used.
A program capable of parsing SGML documents would be very large and costly to
develop.
In addition, SGML requires that a formal definition be provided with each new
markup language.
An alternative solution to the problems of HTML is to define a simplified version of
SGML and allow users to define their own markup languages based on it.
XML was designed to be that simplified version of SGML.
d) Features of XML
XML is far more than a solution to the deficiencies of HTML.
It provides a simple and universal way of storing any textual data.
Data stored in XML documents can be electronically distributed and processed by any
number of different applications.
XML is a universal data interchange language.
It is a meta-markup language that specifies rules for creating markup languages.
XML documents can be written by hand with a simple text editor.
Example:
<?xml version = "1.0" encoding = "utf-8"?>
<ad>
<year>1960</year>
<make> cessna </make>
<model> alto </model>
<color> yellow with white trim </color>
<location>
<city> Bangalore </city>
<state> Karnataka </state>
</location>
</ad>
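Because XML is plain text with a regular structure, a general-purpose parser can read the document above without knowing anything about its tag set. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The ad document from the example, kept as bytes so that the
# encoding declaration on the first line is honoured by the parser.
AD_XML = b"""<?xml version = "1.0" encoding = "utf-8"?>
<ad>
<year>1960</year>
<make> cessna </make>
<model> alto </model>
<color> yellow with white trim </color>
<location>
<city> Bangalore </city>
<state> Karnataka </state>
</location>
</ad>"""

root = ET.fromstring(AD_XML)

# Element content is always text; conversion and trimming are up to the application.
year = int(root.findtext("year"))
make = root.findtext("make").strip()
city = root.findtext("location/city").strip()

print(root.tag, year, make, city)
```

The nested path "location/city" mirrors the nesting of the tags in the document.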
<!-- A tag with one nested tag, which contains three nested tags -->
<patient>
<name>
<first> Magiee </first>
<middle> Dee </middle>
<last> Mapie </last>
</name>
</patient>
i) Declaring Elements:
The element declarations of a DTD have a form that is related to that of the rules of
context-free grammars, also known as Backus–Naur form (BNF).
BNF is used to define the syntactic structure of programming languages.
A DTD describes the syntactic structure of a particular set of documents, so it is
natural for its rules to be similar to those of BNF.
Each element declaration in a DTD specifies the structure of one category of
elements.
The declaration provides the name of the element whose structure is being defined,
along with the specification of the structure of that element.
An XML document can be viewed as a tree. An element is a node in that tree, either a
leaf node or an internal node.
If the element is a leaf node, its syntactic description is its character pattern.
If the element is an internal node, its syntactic description is a list of its child
elements, each of which can be a leaf node or an internal node.
The form of an element declaration for elements that contain elements is as follows:
<!ELEMENT element_name (list of names of child elements)>
In many cases, it is necessary to specify the number of times that a child element may
appear. This can be done in a DTD declaration by adding a modifier to the child
element specification.
+ One or more occurrences
* Zero or more occurrences
? Zero or one occurrence
Consider the following DTD declaration:
<!ELEMENT person (parent+, age, spouse?, sibling*)>
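In this declaration, a person element must contain one or more parent elements, then exactly one age element, then an optional spouse element, then zero or more sibling elements, in that order. The modifiers behave like regular-expression quantifiers, which the following sketch uses to check child order. It assumes Python; matches_person_model and PERSON_MODEL are hypothetical names, and this is an illustration of the content model rather than a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical translation of the content model
#   (parent+, age, spouse?, sibling*)
# into a regular expression over a list of child names,
# where every name is followed by one space.
PERSON_MODEL = re.compile(r"(parent )+(age )(spouse )?(sibling )*")

def matches_person_model(person_xml: str) -> bool:
    """Return True if the children of <person> follow the content model."""
    person = ET.fromstring(person_xml)
    child_names = "".join(child.tag + " " for child in person)
    return PERSON_MODEL.fullmatch(child_names) is not None

valid = "<person><parent/><parent/><age/><sibling/></person>"
invalid = "<person><age/><spouse/></person>"  # no parent element

print(matches_person_model(valid))    # True
print(matches_person_model(invalid))  # False
```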
In most cases, the content of a leaf element is of type PCDATA, for parsed character data.
Parsed character data is a string of characters that does not contain a literal
“less than” (<) or ampersand (&); those characters must be written as the entity
references &lt; and &amp;. Two other content types can be specified:
EMPTY and ANY.
The form of a leaf element declaration is as follows:
<!ELEMENT element_name (#PCDATA)>
ii) Declaring Attributes
The attributes of an element are declared separately from the element declaration in a
DTD.
An attribute declaration must include the name of the element to which the attribute
belongs, the attribute’s name, its type, and a default option.
The general form of an attribute declaration is as follows:
<!ATTLIST element_name attribute_name attribute_type default_option>
If more than one attribute is declared for a given element, the declarations can be
combined, as in the following declaration:
<!ATTLIST element_name
attribute_name_1 attribute_type default_option_1
attribute_name_2 attribute_type default_option_2
>
Example:
<!ATTLIST airplane places CDATA "4">
<!ATTLIST airplane engine_type CDATA #REQUIRED>
<!ATTLIST airplane price CDATA #IMPLIED>
<!ATTLIST airplane manufacturer CDATA #FIXED "Cessna">
iii) Declaring Entities
Entities can be defined so that they can be referenced anywhere in the content of an
XML document, in which case they are called general entities. The predefined entities
are all general entities.
Entities can also be defined so that they can be referenced only in DTDs, in which
case they are called parameter entities.
The form of an entity declaration is
<!ENTITY [%] entity_name "entity_value">
When the optional percent sign (%) is present in an entity declaration, it specifies that
the entity is a parameter entity rather than a general entity.
Consider the following example of an entity:
<!ENTITY jfk "John Fitzgerald Kennedy">
Any XML document that uses a DTD that includes this declaration can specify the
complete name with just the reference &jfk;.
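The effect of a general entity reference can be observed with any parser that reads the DTD. The sketch below assumes Python's standard xml.etree.ElementTree module, which expands entities declared in a document's internal DTD subset; the speech element is a hypothetical name chosen for the illustration.

```python
import xml.etree.ElementTree as ET

# A small document whose internal DTD subset declares the jfk entity.
DOC = """<!DOCTYPE speech [
<!ENTITY jfk "John Fitzgerald Kennedy">
]>
<speech>Quoted from &jfk;.</speech>"""

speech = ET.fromstring(DOC)

# The parser has already replaced &jfk; with the entity's value.
print(speech.text)
```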
When an entity is longer than a few words, such as a section of a technical article, its
text is defined outside the DTD. In such cases, the entity is called an external text
entity.
The form of the declaration of an external text entity is
<!ENTITY entity_name SYSTEM "file_location">
Some XML parsers check documents that have DTDs in order to ensure that the
documents conform to the structure specified in the DTDs. These parsers are called
validating parsers.
3.5 Namespaces
It is often convenient to construct XML documents that use tag sets that are defined
for and used by other documents.
When a tag set is available and appropriate for a particular XML document or class of
documents, it is better to use it than to invent a new collection of element types.
For example, suppose you must define an XML markup language for a furniture
catalog with <chair>, <sofa>, and <table> tags. Suppose also that the catalog
document must include several different tables of specific furniture pieces,
wood types, finishes, and prices. It would be convenient to mark up those tables
with XHTML table elements.
The problem with using different markup vocabularies in the same document is that
collisions between names that are defined in two or more of those tag sets could
result; here, the XHTML <table> tag collides with the furniture <table> tag.
To deal with this problem, the W3C has developed a standard for XML namespaces
(at http://www.w3.org/TR/REC-xml-names).
An XML namespace is a collection of element and attribute names used in XML
documents. The name of a namespace usually has the form of a uniform resource
identifier (URI).
A namespace for the elements and attributes of the hierarchy rooted at a particular
element is declared as the value of the attribute xmlns.
The form of a namespace declaration for an element is
<element_name xmlns[:prefix] = URI>
The square brackets indicate that what is within them is optional. The prefix, if
included, is the name that must be attached to the names in the declared namespace.
If the prefix is not included, the namespace is the default for the document.
A prefix is used for two reasons. First, most URIs are too long to be typed on every
occurrence of every name from the namespace. Second, a URI includes characters,
such as the slash, that are not allowed in XML names. Note that the element for
which a namespace is declared is usually the root of a document.
For example, all XHTML documents in this book declare the xmlns namespace on the
root element, html:
<html xmlns = "http://www.w3.org/1999/xhtml">
This declaration defines the default namespace for XHTML documents, which is
http://www.w3.org/1999/xhtml.
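The furniture catalog collision can be resolved by putting the two table vocabularies in different namespaces. The sketch below assumes Python's standard xml.etree.ElementTree module, which reports each element name together with its namespace URI. The furniture namespace URI and the h prefix are assumptions made up for this illustration. Note that in a real document the URI in a namespace declaration is written as a quoted attribute value.

```python
import xml.etree.ElementTree as ET

# Default namespace: a hypothetical furniture vocabulary.
# Prefix h: the XHTML namespace.
CATALOG = """<catalog xmlns="http://example.com/furniture"
xmlns:h="http://www.w3.org/1999/xhtml">
<table>oak dining table</table>
<h:table><h:tr><h:td>oak</h:td></h:tr></h:table>
</catalog>"""

root = ET.fromstring(CATALOG)

# ElementTree writes qualified names as "{namespace-URI}local-name",
# so the two table elements are clearly distinct.
tags = [child.tag for child in root]
for tag in tags:
    print(tag)
```

Both children have the local name table, yet a namespace-aware program can tell them apart by URI.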
XML Program
<?xml version = "1.0" encoding = "utf-8"?>
<!DOCTYPE planes_for_sale [
<!ELEMENT planes_for_sale (ad+)>
<!ELEMENT ad (year, make, model, color, description, price?, seller, location)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT make (#PCDATA)>
<!ELEMENT model (#PCDATA)>
<!ELEMENT color (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT seller (#PCDATA)>
<!ELEMENT location (city, state)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ATTLIST seller phone CDATA #REQUIRED>
<!ATTLIST seller email CDATA #IMPLIED>
<!ENTITY c "cessna">
<!ENTITY p "piper">
]>
<planes_for_sale>
<ad>
<year>1960</year>
<make> &c; </make>
<model> alto </model>
<color> yellow with white trim </color>
<description> New paint, nearly new interior </description>
<price> 23,495 </price>
<seller phone = "555-222-333"> Sky way </seller>
<location>
<city> Bangalore </city>
<state> Karnataka </state>
</location>
</ad>
<ad>
<year>1980</year>
<make> &p; </make>
<model> cherokee </model>
<color> gold </color>
<description> Old paint, nearly old interior </description>
<seller phone = "555-222-333"
email = "jseller@axl.com"> John Seller </seller>
<location>
<city> Bangalore </city>
<state> Karnataka </state>
</location>
</ad>
</planes_for_sale>
a) Schema Fundamentals
Schemas can conveniently be related to the idea of a class and an object in an object-
oriented programming language.
A schema is similar to a class definition; an XML document that conforms to the
structure defined in the schema is similar to an object of the schema’s class.
Schemas have two primary purposes.
First, a schema specifies the structure of its instance XML documents, including
which elements and attributes may appear in the instance document, as well as where
and how often they may appear.
Second, a schema specifies the data type of every element and attribute in its instance
XML documents.
It has been said that XML schemas are “namespace centric.”
b) Defining a Schema
Schemas themselves are written with the use of a collection of tags, or a vocabulary,
from a namespace that is, in effect, a schema of schemas. The name of this namespace
is http://www.w3.org/2001/XMLSchema.
Some of the elements in the namespace are element, schema, sequence, and string.
A schema defines a namespace in the same sense as a DTD defines a tag set.
The name of the namespace defined by a schema must be specified with the
targetNamespace attribute of the schema element.
The name of every top-level (not nested) element that appears in a schema is placed in
the target namespace, which is specified by assigning a namespace to the
targetNamespace attribute:
targetNamespace = "http://cs.uccs.edu/planeSchema"
If the elements and attributes that are not defined directly in the schema element
(because they are nested inside top-level elements) are to be included in the target
namespace, schema’s elementFormDefault must be set to qualified, as follows:
elementFormDefault = "qualified"
The default namespace, which is the source of the unprefixed names in the schema, is
given with another xmlns specification, but this time without the prefix:
xmlns = "http://cs.uccs.edu/planeSchema"
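An instance document names its default namespace and the location of its schema on
the root element. Namespace names are compared character by character, so the case
must match the targetNamespace exactly:
<planes
xmlns = "http://cs.uccs.edu/planeSchema"
xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation = "http://cs.uccs.edu/planeSchema planes.xsd">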
e) Simple Types:
Elements are defined in an XML schema with the element tag, which is from the
XMLSchema namespace.
A named element is given its name with the name attribute. The other
attribute that is necessary in a simple element declaration is type, which is used to
specify the type of content allowed in the element.
Here is an example:
<xsd:element name = "engine" type = "xsd:string" />
An element can be given a default value with the default attribute:
<xsd:element name = "engine" type = "xsd:string" default = "fuel injected" />
An element can also be given a constant value with the fixed attribute:
<xsd:element name = "plane" type = "xsd:string" fixed = "single wing" />
A simple user-defined data type is described in a simpleType element with the use of
facets.
Facets must be specified in the content of a restriction element, which gives the base
type name. The facets themselves are given in elements named for the facets.
For example, the following element declares a user-defined type, firstName, for
strings of fewer than 11 characters:
<xsd:simpleType name = "firstName">
<xsd:restriction base = "xsd:string">
<xsd:maxLength value = "10" />
</xsd:restriction>
</xsd:simpleType>
The number of digits of a decimal number is restricted with the totalDigits facet
(called precision in early drafts of the standard):
<xsd:simpleType name = "phoneNumber">
f) Complex Types:
Most XML documents include nested elements, so few XML schemas do not have
complex types.
Although there are several categories of complex element types, the discussion here is
restricted to those called element-only elements, which can have elements in their
content, but no text.
Complex types are defined with the complexType tag. The elements that are the
content of an element-only element must be contained in an ordered group, an
unordered group, a choice, or a named group.
The sequence element is used to contain an ordered group of elements:
<xsd:complexType name = "sports_car">
<xsd:sequence>
<xsd:element name = "make" type = "xsd:string" />
<xsd:element name = "model" type = "xsd:string" />
<xsd:element name = "engine" type = "xsd:string" />
<xsd:element name = "year" type = "xsd:decimal" />
</xsd:sequence>
</xsd:complexType>
A complex type whose elements are an unordered group is defined in an all element:
<xsd:element name = "planes">
<xsd:complexType>
<xsd:all>
<!-- In XML Schema 1.0, maxOccurs inside xsd:all may only be 0 or 1 -->
<xsd:element name = "make"
type = "xsd:string"
minOccurs = "1"
maxOccurs = "1" />
</xsd:all>
</xsd:complexType>
</xsd:element>
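The difference between an ordered group (sequence) and an unordered group (all) can be sketched as two checks on the list of child element names. This is a simplified illustration in Python, not a schema validator: it assumes each child appears exactly once, and the helper names and SPORTS_CAR_CHILDREN are hypothetical.

```python
# Children of the sports_car type, in the order given by its xsd:sequence.
SPORTS_CAR_CHILDREN = ["make", "model", "engine", "year"]

def conforms_to_sequence(child_names):
    """Ordered group: every child present, in exactly the declared order."""
    return list(child_names) == SPORTS_CAR_CHILDREN

def conforms_to_all(child_names):
    """Unordered group: every child present once, in any order."""
    return sorted(child_names) == sorted(SPORTS_CAR_CHILDREN)

in_order = ["make", "model", "engine", "year"]
shuffled = ["year", "make", "engine", "model"]

print(conforms_to_sequence(in_order), conforms_to_all(in_order))  # True True
print(conforms_to_sequence(shuffled), conforms_to_all(shuffled))  # False True
```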
<xsd:element name="planes">
<xsd:complexType>
<xsd:sequence>
<xsd:element name = "make" type = "xsd:string" />
//newplanes.xml//
<?xml version="1.0" encoding="utf-8"?>
<planes
xmlns ="http://cs.uccs.edu/planeSchema"
xmlns:xsi ="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation ="http://cs.uccs.edu/planeSchema planes.xsd">
<make>Maruthi </make>
<model>alto </model>
<engine>Diesel engine </engine>
<year>2016 </year>
</planes>
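The xsi:schemaLocation attribute on the root of an instance document holds a pair: the namespace name, then the location of the schema for that namespace. The sketch below reads that pair with Python's standard xml.etree.ElementTree module. It only extracts the values and does not validate against the schema, since the standard library has no XML Schema validator. The shortened instance document is an assumption made for the illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the schema-instance attributes such as schemaLocation.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

INSTANCE = """<planes
xmlns="http://cs.uccs.edu/planeSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://cs.uccs.edu/planeSchema planes.xsd">
<make>Maruthi</make>
</planes>"""

root = ET.fromstring(INSTANCE)

# Namespace-qualified attributes are keyed as "{namespace-URI}local-name".
location = root.get("{%s}schemaLocation" % XSI)
namespace_uri, schema_file = location.split()

print(root.tag)
print(namespace_uri, schema_file)
```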
1. DOM
2. SAX