Session 1
Module 1
Introduction to XML
Module Overview
In this module, you will learn about:
Introduction to XML
Exploring XML
Working with XML
XML Syntax
In this first lesson, Introduction to XML, you will learn to:
Outline the features of markup languages and list their drawbacks.
Define and describe XML.
State the benefits and scope of XML.
Generalized Markup Language (GML) allows documents to be edited, formatted, and searched by different programs using its content-based tags.
Standard Generalized Markup Language (SGML) is a coding scheme for developing specialized markup languages.
Features
GML describes the document in terms of its format, structure and other properties.
SGML ensures that each system can represent the data in its own way.
HTML uses ASCII text, which allows the user to work with any text editor.
Drawbacks
XML is a W3C recommendation.
It is a set of rules for defining semantic tags that break a document into parts.
XML was developed to overcome the limitations of HTML.
HTML                                               XML
HTML was designed to display data.                 XML was designed to carry data.
HTML displays data and focuses on how data looks.  XML describes data and focuses on what data is.
HTML displays information.                         XML describes information.
XML stands for Extensible Markup Language
XML is a markup language much like HTML
XML was designed to describe data
XML tags are not predefined
XML markup defines the physical and logical layout of the document.
XML markup divides a document into separate information containers called elements.
A document consists of one outermost element, the root element, which contains all other elements.
Code Snippet
<?xml version="1.0" encoding="iso-8859-1" ?>
<FlowerPlanet>
  <Name>Rose</Name>
  <Price>$1</Price>
  <Description>Red in color</Description>
  <Number>700</Number>
</FlowerPlanet>
Data independence - separates the content from its presentation
Easier to parse - the absence of formatting instructions makes it easy to parse
Reduced server load - semantic and structural information enables the data to be manipulated by any application, so processing can now be performed by client-side processing tools
Easier to create - can easily be created with the most primitive text editor
In this second lesson, Exploring XML, you will learn to:
Describe the structure of an XML document.
State the functions of editors for XML and list the
The two sections of an XML document are:
Document Prolog
Root Element
An XML document consists of a set of unambiguously named "entities".
Every document starts with a "root" or document entity. All other entities are optional.
of the alias.
Document Prolog
The document prolog contains metadata and consists of two parts: the XML Declaration and the Document Type Declaration.
The XML Declaration specifies the version of XML being used.
The Document Type Declaration defines entities' or attributes' values and checks the grammar and vocabulary of the markup.
Root Element
Also called a document element.
document.
Code Snippet
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE Music_Library [
  <!ELEMENT Music_Library (Title,Artist,Country,Price,Year)>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Artist (#PCDATA)>
  <!ELEMENT Country (#PCDATA)>
  <!ELEMENT Price (#PCDATA)>
  <!ELEMENT Year (#PCDATA)>
  <!ENTITY MS "Thatz life">
]>
<Music_Library>
  <Title>YLO</Title>
  <Artist>&MS; - Jenny Dan</Artist>
  <Country>Germany</Country>
  <Price>$12</Price>
  <Year>2002</Year>
</Music_Library>
where,
The first block contains the XML declaration and the document type declaration. Music_Library is the root element.
The logical structure gives information about the elements and the order in which they are to be included in the document.
It shows how a document is constructed rather than what it contains.
The Document Prolog forms the basis of the logical structure of the XML document.
The XML Declaration and the Document Type Declaration are its components.
Code Snippet
<?xml version="1.0" encoding="iso-8859-1" ?>
Code Snippet
<!DOCTYPE Music_Library SYSTEM "Mlibrary.dtd">
1. XML document creation
2. Scanning
3. Parsing
4. Access
5. Conversion into Application program
6. Modification
7. Serialization
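The stages listed above can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration under stated assumptions: the mapping of each call to a stage is our reading of the list, and the FlowerPlanet data is reused from the earlier snippet rather than taken from any real application.

```python
import xml.etree.ElementTree as ET

# 1. XML document creation: the document starts life as plain text.
source = (
    "<FlowerPlanet>"
    "<Name>Rose</Name>"
    "<Price>$1</Price>"
    "<Number>700</Number>"
    "</FlowerPlanet>"
)

# 2-3. Scanning and parsing: the parser reads the text and builds a tree.
root = ET.fromstring(source)

# 4-5. Access and conversion: the application reads a value from the tree
# and converts it into one of its own data types.
stock = int(root.findtext("Number"))

# 6. Modification: the application changes the in-memory tree.
root.find("Number").text = str(stock - 1)

# 7. Serialization: the tree is written back out as XML text.
result = ET.tostring(root, encoding="unicode")
print(result)
```

Scanning and parsing happen inside a single call here; a streaming API would expose them as separate steps.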
Main Functions:
Add opening and closing tags to the code
Verify XML against a DTD/Schema
Perform a series of transforms over a document
Display the line numbers
Commonly used editors:
XML Spy
XML Pro
XMLmind
XMetal
@ Aptech Limited Modern Markup for Data Interchange/ Module 1/ 21 of 55
Parsers
An XML parser:
After verification, converts the document into a tree of elements
Commonly used parsers:
Crimson, Xerces, Oracle XML Parser, JAXP, MSXML
Types of parsers:
Non-validating parser
Validating parser
Non-validating parser
It checks the well-formedness of the document.
Reads the document and checks for its conformity with XML standards.
Validating parser
It checks the validity of the document using DTD.
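A well-formedness check of the kind a non-validating parser performs can be sketched in Python. The helper name is_well_formed and the sample strings are hypothetical. The standard xml.etree.ElementTree module does not validate against a DTD, so this covers only the first parser type.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as well-formed XML (no DTD validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<Book><Name>Good XML</Name><Cost>$20</Cost></Book>"
bad = "<Book><Name>Good XML</Cost></Book>"  # closing tag does not match

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document with two top-level elements fails the same check, because a well-formed document has exactly one root element.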
Web browsers can format XML data and display it to the user.
Other programs such as a database, a Musical Instrument Digital Interface (MIDI) program or a spreadsheet program may present XML data accordingly.
Commonly used web browsers:
In this third lesson, Working with XML, you will learn to:
Explain the steps towards building an XML document.
An XML document has three main components:
Tags (markup) and text (content)
DTD or Schema
Formatting or display specifications
The steps to build an XML document are as follows:
Various building blocks of an XML document:
XML Version Declaration
Document Type Definition
Document instance in which the content is defined by the markup
Characters are encoded using various encoding formats. The character encoding is declared in the encoding declaration.
XML document.
The standalone declaration indicates the presence of external markup declarations. "yes" indicates that there are no external markup declarations and "no" indicates that external markup declarations might exist.
Structure
Semantic
Style
XML tags should be valid
The length of the tags depends on the XML processor
XML attributes should be valid
XML documents should be verified
A minimum of one element is required
Code Snippet
<Book>
  <Name>Good XML</Name>
  <Cost>$20</Cost>
</Book>
In this last lesson, XML Syntax, you will learn to:
State and describe the use of comments and processing instructions in XML.
Classify character data that is written between tags.
Describe entities, DOCTYPE declarations and attributes.
Comments give people information about the code
Comments are usually written to describe the content
Rules:
Comments should not include the string "--"
Comments should not be placed within a tag or entity declaration
Comments should not be placed before the XML declaration
Comments can be used to comment out tag sets
Code Snippet
<Name NickName='John'>
  <First>John</First>
  <!--John is yet to pay the term fees-->
  <Last>Brown</Last>
  <Semester>Final</Semester>
</Name>
The main objective of a processing instruction is to present special instructions to the application.
All processing instructions should begin with an identifier called the target.
Syntax
<?PITarget <instruction>?>
where,
PITarget is the name of the application that should receive the processing instructions.
Code Snippet
<Name NickName='John'>
  <First>John</First>
  <!--John is yet to pay the term fees-->
  <Last>Brown</Last>
  <?feesprocessor SELECT fees FROM STUDENTFEES?>
  <Semester>Final</Semester>
</Name>
where,
<?feesprocessor SELECT fees FROM STUDENTFEES?> is the processing instruction.
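How an application might pick comments and processing instructions out of a parsed document can be sketched with Python's standard xml.dom.minidom module. The variable names are illustrative, and feesprocessor is simply the target used in the snippet above, not a real program.

```python
from xml.dom import minidom, Node

source = (
    "<Name NickName='John'>"
    "<First>John</First>"
    "<!--John is yet to pay the term fees-->"
    "<Last>Brown</Last>"
    "<?feesprocessor SELECT fees FROM STUDENTFEES?>"
    "<Semester>Final</Semester>"
    "</Name>"
)

doc = minidom.parseString(source)

comments = []
instructions = []
# Walk the direct children of the root element and sort them by node type.
for node in doc.documentElement.childNodes:
    if node.nodeType == Node.COMMENT_NODE:
        comments.append(node.data)
    elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        # target is the PITarget; data is the instruction text after it.
        instructions.append((node.target, node.data))

print(comments)
print(instructions)
```

The parser hands both node types to the application without interpreting them. Deciding what to do with a given target is left to the application.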
Character data is the document's actual content, including white space.
The character data can be classified into:
Character Data (CDATA)
Parsed Character Data (PCDATA)
The data that is parsed by the parser is called PCDATA.
Code Snippet
<Name nickname='John'>
  <First>John</First>
  <!--John is yet to pay the term fees-->
  <Last>Brown</Last>
  <Semester>Final&gt; 10 &amp; &lt;20</Semester>
</Name>
A CDATA section begins with "<![CDATA[" and ends with "]]>".
CDATA sections cannot be nested.
Code Snippet
<?xml version="1.0" standalone="yes"?>
<Svg>
  <Desc>Three shapes</Desc>
  <Para>But the formula was <![CDATA[if (&x1 + &x2)]]> which resulted in 7.</Para>
</Svg>
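The effect of a CDATA section can be checked with a short Python sketch using the standard xml.etree.ElementTree module. The XML is the Svg snippet above with the XML declaration omitted, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

source = (
    "<Svg>"
    "<Desc>Three shapes</Desc>"
    "<Para>But the formula was <![CDATA[if (&x1 + &x2)]]>"
    " which resulted in 7.</Para>"
    "</Svg>"
)

# Inside CDATA the ampersands are passed through as ordinary characters.
root = ET.fromstring(source)
para_text = root.find("Para").text
print(para_text)

# The same characters outside a CDATA section are parsed as markup,
# so the bare ampersands make the document not well-formed.
try:
    ET.fromstring("<Para>if (&x1 + &x2)</Para>")
    parsed_without_cdata = True
except ET.ParseError:
    parsed_without_cdata = False
print(parsed_without_cdata)
```

ElementTree merges the CDATA content into the surrounding text, so the application sees one string and cannot tell which part came from the CDATA section.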
An XML document is made up of units of information called entities.
Every entity consists of a name and a value.
An entity reference consists of an ampersand (&), the entity name, and a semicolon (;).
Predefined Entity   Description                          Output
&lt;                produces the left angle bracket      <
&gt;                produces the right angle bracket     >
&amp;               produces the ampersand               &
&apos;              produces a single quote character    '
&quot;              produces a double quote character    "
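Applications rarely write the predefined entities by hand. As a minimal sketch, Python's standard xml.sax.saxutils module can escape and unescape text; the sample strings are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "Final > 10 & < 20"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw)
print(escaped)

# Quote characters are escaped only when an extra mapping is supplied.
quoted = escape('say "hi"', {'"': "&quot;"})
print(quoted)

# A parser turns the entity references back into the original characters.
element = ET.fromstring("<Semester>" + escaped + "</Semester>")
round_trip = element.text

# unescape() performs the reverse substitution without parsing.
restored = unescape(escaped)
```

Escaping is needed only where the parser would otherwise treat the character as markup, which is why quotes matter mainly inside attribute values.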
Code Snippet
<?xml version="1.0"?>
<!DOCTYPE Letter [
  <!ENTITY address "15 Downing St Floor 1">
  <!ENTITY city "New York">
]>
<Letter>
  <To>"Tom Smith"</To>
  <Address>&address;</Address>
  <City>&city;</City>
  <Body>
  </Body>
  <From>ARNOLD</From>
</Letter>
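To see the entity references being replaced by their values, a trimmed version of the Letter document can be parsed with Python's standard xml.etree.ElementTree module. This is a sketch; it relies on the parser expanding entities declared in the internal DTD subset, which ElementTree does for simple cases like this one.

```python
import xml.etree.ElementTree as ET

source = """<!DOCTYPE Letter [
  <!ENTITY address "15 Downing St Floor 1">
  <!ENTITY city "New York">
]>
<Letter>
  <Address>&address;</Address>
  <City>&city;</City>
</Letter>"""

root = ET.fromstring(source)

# By the time the application sees the tree, each reference has been
# replaced by the value given in its <!ENTITY> declaration.
address = root.findtext("Address")
city = root.findtext("City")
print(address)
print(city)
```

Because expansion happens during parsing, documents from untrusted sources that declare their own entities should be handled with care.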
General Entity
These entities are used within the document content.
Code Snippet
<!DOCTYPE MusicCollection [
  <!ENTITY R "Rock">
  <!ENTITY S "Soft">
  <!ENTITY RA "Rap">
  <!ENTITY HH "Hiphop">
  <!ENTITY F "Folk">
]>
Parameter Entity
These entities are used only in the DTD.
Code Snippet
<!ENTITY % ADDRESS "text that is to be represented by an entity">
A well-formed parameter entity will look like a general entity, except that it will include the "%" specifier.
DTD File:
Syntax
<!DOCTYPE name_of_root_element SYSTEM "URL of the external DTD subset" [
  Internal DTD subset
]>
Code Snippet
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Program SYSTEM "HelloJimmy.dtd">
<Program>
  <Comments>
    This is a simple Java Program. It will display the message "Hello Jimmy,How are you?" on execution.
  </Comments>
  <Code>
  }
  </Code>
</Program>
Document Type Definition defines the elements in the document
<!ELEMENT Program (Comments, Code)>
<!ELEMENT Comments (#PCDATA)>
<!ELEMENT Code (#PCDATA)>
Attribute names are case sensitive and must start with a letter or underscore.
Syntax
<elementName attName1="attValue1" attName2="attValue2"...>
Code Snippet
<Player Sex="male">
  <FirstName>Tom</FirstName>
  <LastName>Federer</LastName>
</Player>
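Reading the attribute from the Player snippet above can be sketched in Python with the standard xml.etree.ElementTree module. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

source = (
    '<Player Sex="male">'
    "<FirstName>Tom</FirstName>"
    "<LastName>Federer</LastName>"
    "</Player>"
)

player = ET.fromstring(source)

# Attributes are looked up by their exact name.
sex = player.get("Sex")

# Attribute names are case sensitive, so "sex" is a different name
# and get() returns None for it.
wrong_case = player.get("sex")

# All attributes of an element are available as a dictionary.
all_attributes = dict(player.attrib)
print(sex, wrong_case, all_attributes)
```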
Introduction to XML
XML was developed to overcome the drawbacks of earlier markup languages.
XML consists of a set of rules that describe the content to be displayed in the document.
XML markup contains the content in information containers called elements.
Exploring XML
Working with XML
An XML document is divided into the XML Version Declaration, the DTD and the document instance in which the markup defines the content.
XML markup is further categorized into structural, semantic and stylistic.
The output of the XML document is displayed in the browser if it is well formed.
XML Syntax