02 - XML - Fundamentals
<course startdate="February 06, 2006"> <title> eXtensible Markup Language </title> <lecturer>Phan Vo Minh Thang</lecturer> </course>
<person> and </person> are markup
The text Alan Turing and its surrounding whitespace are character data
eXtensible Markup Language
Lecturer: Phan Vo Minh Thang MSc.
Tag Syntax
XML tags look like HTML tags
Start-tags begin with < and end-tags begin with </
In both start-tags and end-tags, the element name follows immediately and the tag is closed by >
Case sensitivity
XML is case sensitive: <Person>, <PERSON>, and <person> are three different element names
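As a rough illustration of case sensitivity, the sketch below feeds a matching and a mismatched tag pair to Python's standard `xml.etree.ElementTree` parser. The helper name `parses` is made up for this example; the slides do not prescribe any particular parser.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Start-tag and end-tag use exactly the same name.
matching = parses("<person>Alan Turing</person>")

# <Person> and </person> differ in case, so the tags do not match.
mismatched = parses("<Person>Alan Turing</person>")

print(matching, mismatched)
```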
Empty Element
Empty element: an element that has no content
Empty elements are typically used for the value of their attributes
Example
<email href="mailto"></email>
Shorthand notation
<email href="mailto" />
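The two notations for an empty element should be interchangeable to a parser. A minimal sketch using Python's standard `xml.etree.ElementTree` (an assumption of this example, not a tool named in the slides) compares them, with the attribute value quoted so that the snippet is well-formed:

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring('<email href="mailto"></email>')
short_form = ET.fromstring('<email href="mailto" />')

# Both forms yield an element with the same attributes,
# no text content, and no child elements.
for element in (long_form, short_form):
    print(element.tag, element.attrib, element.text, len(element))
```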
XML Trees
Elements can contain other elements, which in turn can contain text or further elements, and so on
Start-tags and end-tags must always be balanced, and children are always completely enclosed in their parents
Tags close in last-in-first-out order: the most recently opened element is closed first
Illegal (the end-tags overlap):
<name><fname>Jack</fname><lname>Smith</name></lname>
Parent, child, sibling
Each element (except the root element) has exactly one parent element
An XML document is a tree of elements
Root (document) element: the first element in the document and the element that contains all other elements
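The parent, child, and root relationships can be explored with a small sketch. It assumes Python's standard `xml.etree.ElementTree`, which keeps no parent pointers, so a child-to-parent map (the name `parent_of` is made up here) is built by hand. It also checks that overlapping end-tags are rejected.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<name><fname>Jack</fname><lname>Smith</lname></name>")

# Map every child element to its parent; the root never appears as a key.
parent_of = {child: parent for parent in root.iter() for child in parent}

fname = root.find("fname")
lname = root.find("lname")
print(root.tag, [child.tag for child in root])
print(parent_of[fname].tag, parent_of[lname].tag)

# Overlapping end-tags break last-in-first-out order and are not well-formed.
try:
    ET.fromstring("<name><fname>Jack</fname><lname>Smith</name></lname>")
    overlap_accepted = True
except ET.ParseError:
    overlap_accepted = False
print(overlap_accepted)
```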
[Figure: tree diagram of an example document; node labels: name, address, tel, street, region, postalcode, locality, country, fname, lname]
Mixed Content
The dichotomy between elements that contain only character data and elements that contain only child elements is common in data-oriented XML documents
Mixed content: some elements may contain both sub-elements and raw character data
Mixed content is common in document-oriented applications: XML documents containing articles, essays, stories, books, novels, reports, and web pages
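To make mixed content concrete, here is a small sketch with a made-up sentence and made-up element names (`p`, `role`). It assumes Python's standard `xml.etree.ElementTree`, which stores the text before the first child in `text` and the text following each child in that child's `tail`.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring(
    "<p>Alan Turing was a <role>mathematician</role>"
    " and <role>cryptographer</role>.</p>"
)

# Character data before the first child element.
print(repr(para.text))

# Each child carries its own text plus the character data that follows it.
for child in para:
    print(child.tag, repr(child.text), repr(child.tail))

# Joining all text nodes in document order recovers the plain sentence.
full_text = "".join(para.itertext())
print(full_text)
```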
Attributes
Attributes attach additional information to elements
An attribute is a name-value pair attached to an element's start-tag
One element can have more than one attribute
Name and value are separated by = and optional whitespace
The attribute value is enclosed in double or single quotation marks
<tel preferred="true">03-5712121</tel>
Attribute order is not significant
Example 2-4
<person born="1912-06-23" died="1954-06-07"> Alan Turing </person>
Each element may have no more than one attribute with a given name
The value of an attribute is simply a text string, so its structure is limited
An element-based structure is a lot more flexible and extensible
If you are designing your own XML vocabulary, it is up to you to decide when to use which
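The attribute rules can be tried out with a short sketch, again assuming Python's standard `xml.etree.ElementTree`. It reads the attributes of the person element, mixing double and single quotes on purpose, and then checks that a repeated attribute name is rejected. The second `born` value is invented for the test.

```python
import xml.etree.ElementTree as ET

person = ET.fromstring(
    "<person born=\"1912-06-23\" died='1954-06-07'> Alan Turing </person>"
)

# Attributes are exposed as name-value pairs; order carries no meaning.
print(person.attrib)
print(person.get("born"), person.get("died"))

# Two attributes with the same name on one element are not well-formed.
try:
    ET.fromstring('<person born="1912-06-23" born="1912-06-24"/>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False
print(duplicate_accepted)
```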
XML Names
Rules for naming elements and attributes
Names may contain essentially any alphanumeric character, including non-English letters, numbers, and ideograms
Names may also contain the underscore (_), period (.), and hyphen (-)
Names may not contain whitespace of any kind
All names beginning with the string xml (in any combination of case) are reserved for standardization in W3C XML-related specifications
Names must start with a letter, an ideogram, or an underscore (_)
There is no limit on name length
Or: the first letter of each word in an element name is frequently capitalized, with no separator character
AddressBook
<SUV-rating> <engine:part>
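One way to experiment with the naming rules is to let a parser decide. The sketch below wraps each candidate in an empty-element tag and asks Python's standard `xml.etree.ElementTree` whether it parses. The helper name and every candidate other than AddressBook and SUV-rating are made up for illustration. Names containing a colon are left out because this parser treats the colon as a namespace prefix separator.

```python
import xml.etree.ElementTree as ET


def is_accepted_element_name(name):
    """Return True if <name/> parses as a well-formed document."""
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False


candidates = [
    "AddressBook",   # letters only
    "SUV-rating",    # hyphen after the first character
    "_private",      # leading underscore
    "first.class",   # period after the first character
    "1st.class",     # leading digit
    "-rating",       # leading hyphen
]

results = {name: is_accepted_element_name(name) for name in candidates}
for name, accepted in results.items():
    print(name, accepted)
```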
Entity References
What if the character data inside an element contains < ?
Use an entity reference: when an application parses an XML document, it replaces the entity reference with the actual characters to which the entity reference refers
Entity references are markup
XML predefines 5 entity references; you can define more
&lt; the less-than sign (<)
&amp; the ampersand (&)
&gt; the greater-than sign (>)
&quot; the straight, double quotation marks (")
&apos; the straight single quote (')
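The replacement of entity references during parsing can be observed directly. This sketch assumes Python's standard library: `xml.etree.ElementTree` for parsing and `xml.sax.saxutils.escape` for the reverse direction. The element name `expr` and the sample expression are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The parser replaces each entity reference with the character it stands for.
element = ET.fromstring("<expr>a &lt; b &amp;&amp; c &gt; d</expr>")
print(element.text)  # a < b && c > d

# Going the other way: escape raw text before placing it in a document.
raw = "a < b && c > d"
escaped = escape(raw)
print(escaped)

# The escaped text survives a round trip through the parser.
round_trip = ET.fromstring(f"<expr>{escaped}</expr>").text
print(round_trip == raw)
```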
CDATA Sections
What if your character data contains a lot of <, &, ', and " characters?
Enclose the character data in a CDATA section
<![CDATA[ .. ]]>
Everything inside a CDATA section is treated as raw character data, not markup
The only thing that cannot appear in a CDATA section is the CDATA section end delimiter ]]>
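A quick sketch of the effect, assuming Python's standard `xml.etree.ElementTree` and a made-up `code` element: the same text is rejected when written as plain character data but accepted unchanged inside a CDATA section.

```python
import xml.etree.ElementTree as ET

snippet = 'if (a < b && b < c) { print("ok"); }'

# Inside a CDATA section the characters < and & need no escaping.
element = ET.fromstring(f"<code><![CDATA[{snippet}]]></code>")
print(element.text)

# The same text as ordinary character data is not well-formed.
try:
    ET.fromstring(f"<code>{snippet}</code>")
    plain_accepted = True
except ET.ParseError:
    plain_accepted = False
print(plain_accepted)
```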
Comments
XML documents can be commented so that coauthors can leave notes for each other and themselves
Comments begin with <!-- and end with the first occurrence of -->
The double hyphen -- must not appear anywhere inside the comment until the closing -->
Comments may appear anywhere in the character data of a document
Comments may appear before or after the root element
Comments may not appear inside a tag or inside another comment
Comments are strictly for making the raw source code of an XML document more legible to human readers
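The placement rules for comments can be checked with a parser. The sketch below assumes Python's standard `xml.etree.ElementTree` and a made-up helper named `parses`. It tries comments in legal positions, a comment inside a tag, and a comment containing a double hyphen.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Comments before the root, inside character data, and after the root.
legal_positions = parses(
    "<!-- before root -->"
    "<person><!-- inside content -->Alan Turing</person>"
    "<!-- after root -->"
)

# A comment placed inside a start-tag.
inside_tag = parses("<person <!-- not allowed --> >Alan Turing</person>")

# A double hyphen in the middle of a comment.
double_hyphen = parses("<person><!-- a -- b -->Alan Turing</person>")

print(legal_positions, inside_tag, double_hyphen)
```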
standalone: if the value is "no", an application may need to read an external DTD to determine the proper values for parts of the document
Illegal XML
<person><name>Claven</name> <keypoint><hd>XML provides a data bus</hd> </person><more></more></keypoint>
Illegal
<font size=6>
Rule 5: An element may not have two attributes with the same name
Rule 6: Comments and processing instructions may not appear inside tags
Rule 7: No unescaped < or & signs may occur in the character data of an element or in attribute values
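Several of these violations can be reproduced with a small checker. This is only a sketch, assuming Python's standard `xml.etree.ElementTree`. The function name, the sample labels, and the `title` samples are made up, while the first two samples reuse the illegal snippets shown above (with an end-tag added to the font example so that only the unquoted value is at fault).

```python
import xml.etree.ElementTree as ET


def well_formedness_error(text):
    """Return None if text is well-formed, else the parser's error message."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)


samples = {
    "overlapping elements": (
        "<person><name>Claven</name>"
        "<keypoint><hd>XML provides a data bus</hd>"
        "</person><more></more></keypoint>"
    ),
    "unquoted attribute value": "<font size=6></font>",
    "unescaped ampersand": "<title>Q & A</title>",
    "escaped ampersand": "<title>Q &amp; A</title>",
}

errors = {label: well_formedness_error(text) for label, text in samples.items()}
for label, error in errors.items():
    print(label, "->", error)
```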
Illegal
<BR> <HR>
Exercise
Is it a well-formed XML document?
<?xml version="1.0" standalone="yes"?> <Book> <Title>The XML Handbook</title> <Publisher>Prentice Hall PTR </Publisher> <Author>Charles F. Goldfarb</Author> </Book>
<web-class> <Title>XML Basics</Title> <xml-professor>Carolyn Strong</xml-professor> <1st.class>April 17</1st.class > </web-class>
Info
Course name: