Lesson 9.1 - Data Representation - XML and JSON - STUDENT

Uploaded by

Ynn Delro
Copyright
© All Rights Reserved

Module 3:

Data Representation and Web Services

3.1 Data Representation: XML and JSON


Objectives

After completing this lesson, you should be able to do the following:

1. Determine the current trends in XML and JSON;
2. Differentiate the functions of XML and JSON; and
3. Represent data in different formats.
Topics

This chapter will cover the following topics:


● Introduction
● About XML
● About JSON
● Comparison of XML and JSON
● XML and JSON Schema
● Description of the JSON data format
● How to create JSON Schemas
Introduction

Trends in XML and JSON usage


Introduction

Overview: Gaining Access to Diverse Data

Real-world data is often not in relational form,
e.g., Excel spreadsheets, Web tables, Java objects, RDF, …
● One approach: convert using custom wrappers
● But suppose tools adopted a standard export (and import) mechanism?
This is the role of XML, the eXtensible Markup Language.
XML
What Is XML?
Hierarchical, human-readable format
● A “sibling” to HTML, always parsable
● “Lingua franca” of data: encodes documents and
structured data
● Blends data and schema (structure)
Core of a broader ecosystem
● Data – XML (also RDF, etc)
● Schema – DTD and XML Schema
● Programmatic access – DOM and SAX
● Query – XPath, XSLT, XQuery
● Distributed programs – Web services
Definition: XML
● XML stands for eXtensible Markup Language
● XML is a markup language much like HTML
● XML was designed to carry data, not to display data
● XML tags are not predefined. You must define your own tags
● XML is designed to be self-descriptive
XML Data Components
XML includes two kinds of data items:
Elements
● Hierarchical structure with open tag-close tag pairs
● May include nested elements
● May include attributes within the element’s open-tag
● Multiple elements may have same name
● Order matters

Attributes
● Named values – not hierarchical
● Only one attribute with a given name per element
● Order does NOT matter
Well-Formed XML: Always Parsable
Any legal XML document is always parsable by an XML parser, without
knowledge of tag meaning
● The start (the preamble, or XML declaration) tells the XML parser about the character encoding:
<?xml version="1.0" encoding="utf-8"?>
● There is a single root element
● All open-tags have matching close-tags (unlike many HTML documents!), or use the special
<tag/> shortcut for empty tags (equivalent to <tag></tag>)
● An attribute name appears only once in an element
● XML is case-sensitive
XML: Well-formed
● XML documents must have a root element
● XML elements must have a closing tag
● XML tags are case sensitive
● XML elements must be properly nested
● XML attribute values must always be quoted
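The well-formedness rules above can be checked mechanically by any XML parser. A minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the lesson:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = '<Book id="MCD"><Title>Modern Compiler Design</Title></Book>'
no_close = '<Book><Title>Modern Compiler Design</Book>'          # <Title> is never closed
bad_case = '<Book><Title>Modern Compiler Design</title></Book>'  # tag names differ in case
unquoted = '<Book id=MCD></Book>'                                 # attribute value not quoted
two_roots = '<Book/><Book/>'                                      # more than one root element

for sample in (good, no_close, bad_case, unquoted, two_roots):
    print(is_well_formed(sample), sample)
```

Only the first sample is accepted; each of the others breaks exactly one rule from the list.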
Example of XML-formatted data
The below XML document contains data about a book: its title, authors, date
of publication, and publisher.
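A sketch of what such a book document might look like, parsed with Python's standard library. The element names (Book, Title, Authors, Author, Date, Publisher) and the values are assumptions that mirror the JSON and XML Schema versions of the same data elsewhere in this lesson:

```python
import xml.etree.ElementTree as ET

# Assumed XML layout for the book data; not taken verbatim from the slide.
book_xml = """\
<Book>
  <Title>Parsing Techniques</Title>
  <Authors>
    <Author>Dick Grune</Author>
    <Author>Ceriel J.H. Jacobs</Author>
  </Authors>
  <Date>2007</Date>
  <Publisher>Springer</Publisher>
</Book>"""

book = ET.fromstring(book_xml)
title = book.findtext("Title")                   # text of the <Title> child
authors = [a.text for a in book.iter("Author")]  # every <Author>, in document order
print(title, authors, book.findtext("Date"), book.findtext("Publisher"))
```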
JSON
What is JSON
● JavaScript Object Notation

● Used to format data


● Commonly used in Web as a vehicle to describe data being sent
between systems
JSON example
● "JSON" stands for "JavaScript Object Notation"
● Despite the name, JSON is a (mostly) language-independent way of specifying objects as name-value pairs
● Example (http://secretgeek.net/json_3mins.asp):
{"skillz": {
    "web": [
        { "name": "html",
          "years": 5
        },
        { "name": "css",
          "years": 3
        }],
    "database": [
        { "name": "sql",
          "years": 7
        }]
}}
JSON syntax
—An object is an unordered set of name/value pairs
● The pairs are enclosed within braces, { }
● There is a colon between the name and the value
● Pairs are separated by commas
● Example: { "name": "html", "years": 5 }
—An array is an ordered collection of values
● The values are enclosed within brackets, [ ]
● Values are separated by commas
● Example: [ "html", "xml", "css" ]
JSON syntax
—A value can be: A string, a number, true, false, null, an object, or an array
● Values can be nested
—Strings are enclosed in double quotes, and can contain the usual assortment of
escaped characters
—Numbers have the usual C/C++/Java syntax, including exponential (E) notation
● All numbers are decimal--no octal or hexadecimal
—Whitespace can be used between any pair of tokens
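To see how these value kinds look once parsed, here is a small sketch with Python's standard json module. The sample text is made up for illustration; the mapping in the comments (object to dict, array to list, and so on) is how Python's json module behaves, and other languages map JSON values to their own types:

```python
import json

text = ('{"name": "html", "years": 5, "tags": ["web", "markup"], '
        '"score": 1.5e2, "retired": false, "notes": null}')
value = json.loads(text)

# object -> dict, array -> list, string -> str,
# number -> int or float, true/false -> bool, null -> None
for name, item in value.items():
    print(name, type(item).__name__, item)
```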
Comparison of JSON and XML
Similarities:
● Both are human readable
● Both have very simple syntax
● Both are hierarchical
● Both are language independent
● Both can be used by Ajax
● Both are supported in the APIs of many programming languages
Differences:
● Syntax is different
● JSON is less verbose
● JSON can be parsed by JavaScript's eval method (unsafe on untrusted input; JSON.parse is the safer standard call)
● JSON includes arrays
● Names in JSON can be any string; they do not have to avoid JavaScript reserved words
● XML can be validated (DTD, XML Schema); JSON validation relies on the separate JSON Schema specifications
Specifications
The JSON specification is RFC 7159 (since obsoleted by RFC 8259):
–https://tools.ietf.org/html/rfc7159
There are two specifications for JSON-Schema
–http://json-schema.org/latest/json-schema-core.html
–http://json-schema.org/latest/json-schema-validation.html
Same data, JSON-formatted
{
"Book":
{
"Title": "Parsing Techniques",
"Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ],
"Date": "2007",
"Publisher": "Springer"
}
}
Comparison of XML and JSON
XML and JSON, side-by-side
Creating lists in XML and JSON
XML is a meta-language
● XML is a language that you use to create other languages.
● For example, on the previous slides we saw how to use XML to
create a Book language, consisting of <Book>, <Title>, <Author>, and
so forth.
JSON is a meta-language
● JSON is also a language that you use to create other languages.
● For example, on the previous slides we saw how to use JSON to
create a Book language, consisting of "Book", "Title", "Author", and so
forth.
An XML document is a tree
A JSON Object is a tree
Trees are well-studied
● The tree data structure has been well-studied by computer scientists
and mathematicians.
● There are many well-known algorithms for processing and traversing
trees.
● Both XML and JSON are able to leverage this.
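As one illustration of a tree algorithm applied to both formats, the sketch below computes the depth of a parsed JSON value and of an XML element tree by recursion. The function names json_depth and xml_depth and the sample documents are assumptions made for this example:

```python
import json
import xml.etree.ElementTree as ET

def json_depth(value) -> int:
    """Depth of a parsed JSON value: scalars are 0, each container adds 1."""
    if isinstance(value, dict):
        return 1 + max((json_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((json_depth(v) for v in value), default=0)
    return 0

def xml_depth(element: ET.Element) -> int:
    """Depth of an XML element tree: an element without children is 1."""
    return 1 + max((xml_depth(child) for child in element), default=0)

book_json = json.loads(
    '{"Book": {"Title": "Parsing Techniques",'
    ' "Authors": ["Dick Grune", "Ceriel J.H. Jacobs"]}}'
)
book_xml = ET.fromstring(
    "<Book><Title>Parsing Techniques</Title>"
    "<Authors><Author>Dick Grune</Author></Authors></Book>"
)
print(json_depth(book_json), xml_depth(book_xml))
```

The same recursive pattern (visit a node, then recurse into its children) works for both formats because both are trees.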
XML Schema and JSON Schema
Study the Schema of XML and JSON

Perform the discussion in the Canvas


Title with string type

XML Schema:
<xs:element name="Title" type="xs:string" />

JSON Schema:
{
  "$schema": "http://json-schema.org/draft-04/schema",
  "type": "object",
  "properties": {
    "Book": {
      "type": "object",
      "properties": {
        "Title": {"type": "string"},
        "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }},
        "Date": {"type": "string", "pattern": "^[0-9]{4}$"},
        "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
      },
      "required": ["Title", "Authors", "Date"],
      "additionalProperties": false
    }
  },
  "required": ["Book"],
  "additionalProperties": false
}
Authors list

XML Schema:
<xs:element name="Authors">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Author" type="xs:string" maxOccurs="5"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

JSON Schema:
{
  "$schema": "http://json-schema.org/draft-04/schema",
  "type": "object",
  "properties": {
    "Book": {
      "type": "object",
      "properties": {
        "Title": {"type": "string"},
        "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }},
        "Date": {"type": "string", "pattern": "^[0-9]{4}$"},
        "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
      },
      "required": ["Title", "Authors", "Date"],
      "additionalProperties": false
    }
  },
  "required": ["Book"],
  "additionalProperties": false
}
Date with year type

XML Schema:
<xs:element name="Date" type="xs:gYear" />

JSON Schema:
{
  "$schema": "http://json-schema.org/draft-04/schema",
  "type": "object",
  "properties": {
    "Book": {
      "type": "object",
      "properties": {
        "Title": {"type": "string"},
        "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }},
        "Date": {"type": "string", "pattern": "^[0-9]{4}$"},
        "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
      },
      "required": ["Title", "Authors", "Date"],
      "additionalProperties": false
    }
  },
  "required": ["Book"],
  "additionalProperties": false
}
Publisher with enumeration

XML Schema:
<xs:element name="Publisher" minOccurs="0">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="Springer" />
      <xs:enumeration value="MIT Press" />
      <xs:enumeration value="Harvard Press" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>

JSON Schema:
{
  "$schema": "http://json-schema.org/draft-04/schema",
  "type": "object",
  "properties": {
    "Book": {
      "type": "object",
      "properties": {
        "Title": {"type": "string"},
        "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": { "type": "string" }},
        "Date": {"type": "string", "pattern": "^[0-9]{4}$"},
        "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
      },
      "required": ["Title", "Authors", "Date"],
      "additionalProperties": false
    }
  },
  "required": ["Book"],
  "additionalProperties": false
}
Bootstrap the schema language
▪ An XML Schema is written in XML.
▪ A JSON Schema is written in JSON.
Validate XML docs against XML Schema

[Diagram: an XML Schema and an XML instance are fed to an XML Validator, which reports whether the XML instance is valid or invalid.]
Validate JSON docs against JSON Schema

[Diagram: a JSON Schema and a JSON instance are fed to a JSON Validator, which reports whether the JSON instance is valid or invalid.]
JSON Schema validators
http://json-schema.org/implementations.html
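Real validators, such as those listed at the link above, interpret any JSON Schema. To make the idea concrete without one, here is a hand-written check that mirrors only the constraints of the Book schema from the earlier slides. It is a sketch under that assumption, not an implementation of JSON Schema, and the name book_errors is hypothetical:

```python
import json
import re

ALLOWED_PUBLISHERS = ("Springer", "MIT Press", "Harvard Press")
KNOWN_PROPERTIES = ("Title", "Authors", "Date", "Publisher")

def book_errors(instance) -> list:
    """Return a list of problems; an empty list means the instance passed."""
    if not isinstance(instance, dict) or set(instance) != {"Book"}:
        return ["top level must be an object with exactly one property, 'Book'"]
    book = instance["Book"]
    if not isinstance(book, dict):
        return ["'Book' must be an object"]
    errors = []
    for name in ("Title", "Authors", "Date"):          # "required"
        if name not in book:
            errors.append(f"missing required property {name!r}")
    for name in book:                                  # "additionalProperties": false
        if name not in KNOWN_PROPERTIES:
            errors.append(f"unexpected property {name!r}")
    if "Title" in book and not isinstance(book["Title"], str):
        errors.append("'Title' must be a string")
    if "Authors" in book:
        authors = book["Authors"]
        if not (isinstance(authors, list) and 1 <= len(authors) <= 5
                and all(isinstance(a, str) for a in authors)):
            errors.append("'Authors' must be an array of 1 to 5 strings")
    if "Date" in book:
        date = book["Date"]
        if not (isinstance(date, str) and re.fullmatch(r"[0-9]{4}", date)):
            errors.append("'Date' must be a four-digit string")
    if "Publisher" in book and book["Publisher"] not in ALLOWED_PUBLISHERS:
        errors.append("'Publisher' is not one of the allowed values")
    return errors

valid = json.loads("""
{ "Book": { "Title": "Parsing Techniques",
            "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ],
            "Date": "2007",
            "Publisher": "Springer" } }
""")
invalid = {"Book": {"Title": "X", "Authors": [], "Date": "07", "ISBN": "1"}}
print(book_errors(valid))
print(book_errors(invalid))
```

A schema-driven validator replaces all of this hand-written logic with the schema document itself, which is the point of having a schema language.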
JSON used to store location of tweets (cont.)

▪ Tracking geotagged tweets from Twitter’s public API for the last three and a
half years.
▪ There are about 10 million public geotagged tweets every day, which is about
120 per second.
▪ The accumulated history adds up to nearly three terabytes of compressed
JSON and is growing by four gigabytes a day.
▪ The map on the previous slide shows what 6,341,973,478 tweets look like on
a map.
▪ Using this program to parse the JSON and pull out just each tweet’s
username, date, time, location, client, and text:
Example of XML-to-JSON
The online freeformatter.com tool converts this XML:
<Book id="MCD">
<Title>Modern Compiler Design</Title>
<Author>Dick Grune</Author>
<Publisher>Springer</Publisher>
</Book>
to this JSON:
{
"@id": "MCD",
"Title": "Modern Compiler Design",
"Author": "Dick Grune",
"Publisher": "Springer"
}
It encodes XML attributes by prefixing the attribute name with the @
symbol.
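The "@" convention can be sketched in a few lines of Python. This is not the freeformatter.com implementation; element_to_dict is a hypothetical helper that handles only one flat element with attributes and text-only children:

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(element: ET.Element) -> dict:
    """Flatten one element: attributes become "@name" keys, children become keys."""
    result = {f"@{name}": value for name, value in element.attrib.items()}
    for child in element:
        # A repeated child name would overwrite the earlier one here.
        result[child.tag] = child.text
    return result

xml_text = """<Book id="MCD">
<Title>Modern Compiler Design</Title>
<Author>Dick Grune</Author>
<Publisher>Springer</Publisher>
</Book>"""

converted = element_to_dict(ET.fromstring(xml_text))
print(json.dumps(converted, indent=2))
```

Even this small sketch has to make choices (what to do with repeated children, nested elements, or mixed text) that the XML itself does not settle.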
Auto-converting XML to JSON: a bad idea?
Should you devise a way to auto-convert XML to JSON? In the message below, the author says:

Based on our real-world experience, it is best to create the JSON design from scratch.
Do not auto-generate it from XML.

Hi all,

To throw in a view from a long-time XML user:


IPTC - www.iptc.org - builds XML-based news exchange formats for 17 years
now and was also challenged to do the same in JSON. After a long discussion
we refrained from automatically converting an existing XML data model to
JSON:
- currently no shared/common way to deal with namespaces in JSON
- designs like inline elements don't exist in JSON
- the element/attribute model has no corresponding design in JSON
- and a basic requirement of JSON users is: no complex data model, please!

Therefore we created
- a simplified data model for the news exchange in JSON - www.newsinjson.org
- compared to the richer but also more complex XML format www.newsml-g2.org
- a highly corresponding JSON model to an initial XML model for the rights
expression language ODRL as this is a set of data which cannot be
simplified: http://www.w3.org/community/odrl/work/json/ vs
http://www.w3.org/community/odrl/work/2-0-xml-encoding-constraint-draft-changes/

Both approaches were welcome and are used - and we learned: an XML-to-JSON
tool only is of limited help.

Source: http://lists.xml.org/archives/xml-dev/201412/msg00022.html
The upcoming slides
The following slides are organized as follows:
– Description of the JSON data format
– How to create JSON Schemas
The JSON data format
JSON value

A JSON instance contains a single JSON value. A JSON value may
be either an object, array, number, string, true, false, or null:

Acknowledgement: This “railroad” graphic comes from the JSON specification, as do the railroad graphics on the next several
slides.
JSON object

A JSON object is zero or more string-colon-value pairs, separated
by commas and wrapped within curly braces:

Example of a JSON object:


{ "name": "John Doe",
"age": 30,
"married": true }
Empty object

A JSON object may be empty.

This is a JSON object: { }


No duplicate keys
▪ You should consider JSON objects as containing key/value
pairs.
▪ Just as primary keys must be unique in a database, the keys in
a JSON object should be unique.
▪ This JSON object has duplicate keys:

{ "Title": "A story by Mark Twain",
  "Title": "The Adventures of Huckleberry Finn" }
JSON parsers have unpredictable behavior on JSON
objects with duplicate keys
A JSON object whose names are all unique is
interoperable in the sense that all software
implementations receiving that object will agree on
the name-value mappings. When the names within an
object are not unique, the behavior of software that
receives such an object is unpredictable. Many
implementations report the last name/value pair only.
Other implementations report an error or fail to parse
the object and some implementations report all of the
name/value pairs, including duplicates.
RFC 7159
This web site (https://web.archive.org/web/20151004063037/http://fadefade.com/json-comments.html)
describes a JSON parser that ignores the first key/value pair; as a result, a
form of covert channel is created.
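Python's standard json module is one of the implementations that keeps only the last name/value pair. The sketch below shows that behavior and one way to reject duplicates instead, using the object_pairs_hook parameter of json.loads; the hook name reject_duplicates is an assumption made for this example:

```python
import json

text = ('{ "Title": "A story by Mark Twain",'
        ' "Title": "The Adventures of Huckleberry Finn" }')

# Default behavior: the last value for a repeated name silently wins.
last_wins = json.loads(text)

def reject_duplicates(pairs):
    """object_pairs_hook that raises if a name repeats within one object."""
    seen = {}
    for name, value in pairs:
        if name in seen:
            raise ValueError(f"duplicate key: {name!r}")
        seen[name] = value
    return seen

try:
    json.loads(text, object_pairs_hook=reject_duplicates)
    duplicate_found = False
except ValueError:
    duplicate_found = True

print(last_wins, duplicate_found)
```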
JSON array

A JSON array is used to express a list of values. A JSON array
contains zero or more values, separated by commas and wrapped
within square brackets:

Example of a JSON array


{ "name": "John Doe",
"age": 30,
"married": true,
"siblings": ["John", "Mary", "Pat"] }
Empty array vs. array with a null value

[]          Array with no items in it.
[ null ]    Array with one item in it.
Array of objects
Each item in an array may be any of the seven JSON values.

{
  "name": "John Doe",
  "age": 30,
  "married": true,
  "siblings": [
    {"name": "John", "age": 25},
    true,
    "Hello World"
  ]
}

The array contains 3 items. The first item is an object, the second item is a boolean, and the third item is a string.

Do Lab1
JSON number

A number is an integer or a decimal and it may have an exponent:

{
  "name": "John Doe",
  "age": 30,
  "married": true,
  "siblings": ["John", "Mary", "Pat"]
}
Example of a JSON number: the value 30 of "age".
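A short sketch of which number spellings a JSON parser accepts, using Python's standard json module. The helper is_json_number is a hypothetical name; the int/float split in the comments is specific to Python:

```python
import json

integer = json.loads("30")        # no fraction or exponent: Python gives an int
decimal = json.loads("-12.5")     # fraction part: Python gives a float
exponent = json.loads("6.02e23")  # exponent part: Python gives a float

def is_json_number(text: str) -> bool:
    """True if the whole text parses as a single JSON number."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(value, (int, float)) and not isinstance(value, bool)

for candidate in ("1E3", "0x1F", "012", "+5", ".5"):
    print(candidate, is_json_number(candidate))
```

Hexadecimal, leading zeros, a leading plus sign, and a bare leading decimal point are all rejected.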
JSON string

A string is a sequence of Unicode characters wrapped within
quotes (").

{
  "name": "John Doe",
  "age": 30,
  "married": true,
  "siblings": ["John", "Mary", "Pat"]
}
Example of a JSON string: the value "John Doe" of "name".
JSON chars are a superset of XML chars

[Diagram: the set of XML characters drawn inside the larger set of JSON characters.]

Lesson Learned: be careful converting JSON to XML,
as the result may be a non-well-formed XML document.
These characters must be escaped

If any of the following characters occur within a string, they must
be escaped:
– quotation mark ("), written as \"
– backslash (\), written as \\
– the control characters U+0000 to U+001F, written as \uXXXX or a short escape such as \n or \t
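A JSON serializer applies these escapes for you. A minimal sketch with Python's json.dumps; the sample string is made up for illustration:

```python
import json

# Contains a quotation mark, a backslash, and a tab (a control character).
raw = 'She said "hi" \\ then pressed\ttab'
encoded = json.dumps(raw)
print(encoded)

# A control character without a short escape is written in \uXXXX form.
bell = json.dumps("\x07")
print(bell)
```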
Each character corresponds to a number

ASCII table
Expressing characters in hex format

A character may be represented as a hexadecimal number using
this notation: \uXXXX
For example, instead of using 'j' in your JSON document, you can
use either \u006A or \u006a.
Unicode details
▪ Use the \uXXXX notation for Unicode code points in the Basic
Multilingual Plane.
▪ Use a 12-character sequence \uYYYY\uZZZZ to encode code
points above the Basic Multilingual Plane.
– (0xD800 ≤ YYYY < 0xDC00 and 0xDC00 ≤ ZZZZ ≤ 0xDFFF)
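Both notations can be tried out with Python's json module. The sketch below decodes the 'j' example and computes the surrogate pair for one code point above the Basic Multilingual Plane; the function name surrogate_pair and the choice of U+1F600 are assumptions made for this example:

```python
import json

j_upper = json.loads('"\\u006A"')  # escape with uppercase hex digits
j_lower = json.loads('"\\u006a"')  # lowercase hex digits mean the same thing

def surrogate_pair(code_point: int) -> tuple:
    """Return (YYYY, ZZZZ) for a code point above U+FFFF, as in UTF-16."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

emoji = "\U0001F600"         # one character above the Basic Multilingual Plane
encoded = json.dumps(emoji)  # json.dumps escapes non-ASCII characters by default
high, low = surrogate_pair(0x1F600)
print(encoded, hex(high), hex(low))
```

The 12-character escaped form decodes back to the single original character.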
No multiline strings

JSON does not allow multiline strings.


Legal:
{
"comment": "This is a very, very long comment"
}

Not legal:
{
"comment": "This is a very,
very long comment"
}
No multiline strings (cont.)

Legal:
{
"comment": "This is a very, \n very long comment"
}
In the JSON text, \n is just 2 symbols (a backslash and the letter n):
{
"comment": "This is a very, \n very long comment"
}
The JSON parser converts \n to the newline control character (invisible) in the parsed string.
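The difference between the two forms can be checked with Python's json module, which in its default (strict) mode rejects raw control characters inside strings. A minimal sketch; the variable names are illustrative:

```python
import json

# Backslash followed by n: two ordinary characters in the JSON text.
with_escape = '{ "comment": "This is a very, \\n very long comment" }'
# A real newline character inside the string: not legal JSON.
with_raw_newline = '{ "comment": "This is a very,\n very long comment" }'

parsed = json.loads(with_escape)

try:
    json.loads(with_raw_newline)
    raw_newline_accepted = True
except json.JSONDecodeError:
    raw_newline_accepted = False

print(repr(parsed["comment"]), raw_newline_accepted)
```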
Achieving interoperability in a world where different OS's
represent newline differently
Each operating system has its own convention for signifying the end of a line of text:

Unix: the newline is a character, with the value hex A (LF).

MS Windows: the newline is a combination of two characters, with the values hex D (CR) and hex A (LF), in that order.

Classic Mac OS (before Mac OS X): the newline is a character, with the value hex D (CR).

This operating-system dependency of newlines can cause interoperability problems, e.g., the newlines in a string
created on a Unix box may not be handled correctly by applications running on a Windows box.

Here is how the newline problem is resolved in XML and in JSON:

XML: all newlines are normalized by an XML parser to hex A (LF). So it doesn't matter whether you create your XML
document on a Unix box, a Windows box, or a Macintosh box, all newlines will be represented as hex A (LF).

JSON: multi-line strings are not permitted! So the newline problem is avoided completely. You can, however, embed
within your JSON strings the \n (LF) or \r\n (CRLF) symbols, to instruct processing applications: "Hey, I would like a
newline here."

That is quite an interesting difference in approach between XML and JSON for dealing with the newline problem!
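The two approaches can be compared side by side. This sketch assumes the expat-based parser behind Python's xml.etree.ElementTree, which applies the newline normalization required by the XML specification; the sample strings are made up:

```python
import json
import xml.etree.ElementTree as ET

# XML: the document bytes contain CR LF, but the parser reports a single LF.
crlf_xml = b"<note>line one\r\nline two</note>"
xml_text = ET.fromstring(crlf_xml).text

# JSON: the escapes are data, so the parser returns exactly what was written.
json_text = json.loads('"line one\\r\\nline two"')

print(repr(xml_text))
print(repr(json_text))
```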
JSON strings

http://json.org/

http://stackoverflow.com/questions/2392766/multiline-strings-in-json
Other JSON values

The values true, false, and null are literal values; they are not
wrapped in quotes.
{
  "name": "John Doe",
  "age": 30,
  "married": true,
  "siblings": ["John", "Mary", "Pat"]
}
Example of a JSON boolean: the value true of "married".
Good use-case for null

Some people do not have a middle name; we can use null to
indicate "no value":
{
  "first-name": "John",
  "middle-name": null,
  "last-name": "Doe"
}
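In Python's json module, null becomes None, and a quoted "true" is a string rather than a boolean. A short sketch using the same person record:

```python
import json

person = json.loads(
    '{ "first-name": "John", "middle-name": null, "last-name": "Doe" }'
)

# The key is present, but it carries no value.
has_key = "middle-name" in person
middle = person["middle-name"]

# Literals are not quoted: true is a boolean, "true" is a string.
literal = json.loads("true")
quoted = json.loads('"true"')

print(has_key, middle, literal, repr(quoted))
```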
This is a legal JSON instance

42
so is this

"Hello World"
and so is this

true
and this

[ true, null, 12, "ABC" ]
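A parser that follows RFC 7159 accepts each of these as a complete JSON instance. A quick check with Python's json module; the list name instances is illustrative:

```python
import json

instances = ['42', '"Hello World"', 'true', '[ true, null, 12, "ABC" ]']
parsed = [json.loads(text) for text in instances]
print(parsed)
```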


Whitespace is irrelevant

{
"name": "John Doe",
"age": 30,
"married": true
}

equivalent

{"name":"John Doe","age":30,"married":true}
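The equivalence can be confirmed by parsing both forms and comparing the results. A minimal sketch with Python's json module; the separators argument to json.dumps removes the optional whitespace on output:

```python
import json

pretty = """{
  "name": "John Doe",
  "age": 30,
  "married": true
}"""
compact = '{"name":"John Doe","age":30,"married":true}'

same = json.loads(pretty) == json.loads(compact)
recompact = json.dumps(json.loads(pretty), separators=(",", ":"))
print(same, recompact)
```

Whitespace inside a string value is still significant; only whitespace between tokens is ignored.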
String delimiters: JSON vs. XML
▪ JSON strings are always delimited by double quotes.
▪ XML strings (such as attribute values) may be delimited by
either double quotes or single quotes.
JSON is recursively defined

{
"foo": json-value
}
The above JSON instance is an object. The
object has a single property, "foo". Its value is
any JSON value – recursive definition!
Using JSON you can define arbitrarily complex structures
{
"Book":
{
"Title": "Parsing Techniques",
"Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ]
}
}
{
"Book":
{
"Title": "Parsing Techniques",
"Authors": [
{"name":"Dick Grune", "university": "Vrije Universiteit"},
{"name":"Ceriel J.H. Jacobs", "university": "Vrije Universiteit"}
]
}
}
Extend, ad infinitum

{
"Book":
{
"Title": "Parsing Techniques",
"Authors": [
{"name": {"first":"Dick", "last":"Grune"},
"university": "Vrije Universiteit"},
{"name": {"first":"Ceriel", "last":"Jacobs"},
"university": "Vrije Universiteit"}
]
}
}
7 simple JSON components, assemble to
generate unlimited complexity

false, null, true, object, array, string, number

JSON provides the structures and assembly
points; you customize them for your needs

object:
{
  "___": json-value,
  "___": json-value,
  "___": json-value,
}

array:
[ json-value, json-value, json-value, … ]

string:
"___"

[The slide repeats this diagram with three callouts in turn: structures, assembly points, customize.]
Comments not allowed!
▪ You cannot comment a JSON instance document.
▪ There is no syntax for commenting JSON instances.
▪ Bummer.
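A strict parser treats a comment as a syntax error. The sketch below shows this with Python's json module, followed by a common workaround that carries the note as an ordinary property; the "_comment" name is a convention assumed here, not part of JSON:

```python
import json

with_comment = """{
  // the person's age in years
  "age": 30
}"""

try:
    json.loads(with_comment)
    comment_accepted = True
except json.JSONDecodeError:
    comment_accepted = False

# Workaround (a convention only): store the note as data.
workaround = json.loads(
    '{ "_comment": "the person\'s age in years", "age": 30 }'
)
print(comment_accepted, workaround["_comment"])
```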
Summary
In this lesson, you should have learned the following:

● XML
○ A “sibling” to HTML, always parsable
○ “Lingua franca” of data: encodes documents and structured data
○ Blends data and schema (structure)
● JSON
○ JavaScript Object Notation
○ Used to format data
○ Commonly used on the Web as a vehicle to describe data being sent between systems
● Similarities
○ Both are human readable
○ Both have very simple syntax
○ Both are hierarchical
● Differences
○ JSON includes arrays
○ Names in JSON can be any string; they do not have to avoid JavaScript reserved words
○ XML can be validated (DTD, XML Schema); JSON validation relies on the separate JSON Schema specifications
● Like XML, JSON is a language that you use to create other languages.
● XML documents and JSON objects are trees. There are many well-known algorithms for processing and
traversing trees, and both XML and JSON are able to leverage this.
● The accumulated history of geotagged tweets adds up to nearly three terabytes of compressed JSON and is
growing by four gigabytes a day.
● A JSON array is used to express a list of values. A JSON array contains zero or more
values, separated by commas and wrapped within square brackets.
