Lesson 9.1 Data Representation: XML and JSON (STUDENT)
Attributes
● Named values – not hierarchical
● Only one attribute with a given name per element
● Order does NOT matter
Well-Formed XML: Always Parsable
Any well-formed XML document can be parsed by any XML parser, without
knowledge of what the tags mean
● The start of the document (the preamble, or XML declaration) tells the parser the XML version and the character encoding
<?xml version="1.0" encoding="utf-8"?>
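As a minimal sketch of "parsable without knowledge of tag meaning", the snippet below uses Python's standard xml.etree.ElementTree module. The tag names (zork, blip) are made up for illustration; the parser has no idea what they mean, yet it can still build the tree or report that the text is not well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names carry no meaning for the parser.
doc = '<?xml version="1.0" encoding="utf-8"?><zork id="7"><blip>42</blip></zork>'

# Parse from bytes so the encoding named in the preamble is honored.
root = ET.fromstring(doc.encode("utf-8"))


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


print(root.tag, root.attrib, root[0].text)
print(is_well_formed("<a><b></a>"))  # mismatched tags are not well-formed
```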
[Diagram: an XML Schema and an XML instance are fed to an XML Validator, which reports whether the instance is valid or invalid]
Validate JSON docs against JSON Schema
[Diagram: a JSON Schema and a JSON instance are fed to a JSON Validator, which reports whether the instance is valid or invalid]
JSON Schema validators
http://json-schema.org/implementations.html
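To make the "schema + instance in, valid/invalid out" idea concrete, here is a toy sketch in Python. It is not a conforming JSON Schema validator: it only understands the keywords type, required, properties, and items, and only six type names. The function name is_valid and the sample schema are assumptions made for this illustration; for real work, use one of the implementations listed at the URL above.

```python
import json

# Toy mapping from JSON Schema type names to Python types (subset only).
_TYPES = {
    "object": dict, "array": list, "string": str,
    "number": (int, float), "boolean": bool, "null": type(None),
}


def is_valid(instance, schema) -> bool:
    """Toy check: supports only 'type', 'required', 'properties', 'items'."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for "number".
        if expected == "number" and isinstance(instance, bool):
            return False
        if not isinstance(instance, _TYPES[expected]):
            return False
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                return False
        for name, subschema in schema.get("properties", {}).items():
            if name in instance and not is_valid(instance[name], subschema):
                return False
    if isinstance(instance, list) and "items" in schema:
        return all(is_valid(item, schema["items"]) for item in instance)
    return True


# Hypothetical schema: an object with a string "name" and a numeric "age".
schema = json.loads('''{
  "type": "object",
  "required": ["name", "age"],
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "number"}
  }
}''')

good = json.loads('{"name": "John Doe", "age": 30}')
bad = json.loads('{"name": "John Doe", "age": "thirty"}')

print(is_valid(good, schema), is_valid(bad, schema))
```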
JSON used to store location of tweets (cont.)
▪ Tracking geotagged tweets from Twitter’s public API for the last three and a
half years.
▪ There are about 10 million public geotagged tweets every day, which is about
120 per second.
▪ The accumulated history adds up to nearly three terabytes of compressed
JSON and is growing by four gigabytes a day.
▪ The map on the previous slide shows what 6,341,973,478 tweets look like on
a map.
▪ Using this program to parse the JSON and pull out just each tweet’s
username, date, time, location, client, and text:
Example of XML-to-JSON
The online freeformatter.com tool converts this XML:
<Book id="MCD">
<Title>Modern Compiler Design</Title>
<Author>Dick Grune</Author>
<Publisher>Springer</Publisher>
</Book>
to this JSON:
{
"@id": "MCD",
"Title": "Modern Compiler Design",
"Author": "Dick Grune",
"Publisher": "Springer"
}
It encodes XML attributes by prefixing the attribute name with the @
symbol.
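The @-prefix convention can be sketched in a few lines of Python. This is only an illustration of the mapping shown above, not how freeformatter.com implements it; the helper name element_to_dict is hypothetical, and it assumes every child element holds plain text and has a unique name.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Map attributes to "@name" properties and child elements to properties."""
    result = {"@" + name: value for name, value in elem.attrib.items()}
    for child in elem:
        # Assumption: children contain only text and their names do not repeat.
        result[child.tag] = child.text
    return result


xml_text = """<Book id="MCD">
<Title>Modern Compiler Design</Title>
<Author>Dick Grune</Author>
<Publisher>Springer</Publisher>
</Book>"""

book = element_to_dict(ET.fromstring(xml_text))
print(json.dumps(book, indent=2))
```

Even this small sketch shows why auto-conversion is fragile: repeated child elements, mixed content, and namespaces all need extra rules.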
Auto-converting XML to JSON: a bad idea?
Should you devise a way to auto-convert XML to JSON? In the message below, the author says:
Based on our real-world experience, it is best to create the JSON design from scratch.
Do not auto-generate it from XML.
Hi all,
Therefore we created
- a simplified data model for the news exchange in JSON - www.newsinjson.org
- compared to the richer but also more complex XML format www.newsml-g2.org
- a highly corresponding JSON model to an initial XML model for the rights
expression language ODRL as this is a set of data which cannot be
simplified: http://www.w3.org/community/odrl/work/json/ vs
http://www.w3.org/community/odrl/work/2-0-xml-encoding-constraint-draft-changes/
Both approaches were welcome and are used - and we learned: an XML-to-JSON
tool only is of limited help.
Source: http://lists.xml.org/archives/xml-dev/201412/msg00022.html
The upcoming slides
The following slides are organized as follows:
– Description of the JSON data format
– How to create JSON Schemas
The JSON data format
JSON value
Acknowledgement: This “railroad” graphic comes from the JSON specification, as do the railroad graphics on the next several
slides.
JSON object
[]          Array with no items in it.
[ null ]    Array with one item in it.
Array of objects
Each item in an array may be any of the seven JSON values.
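A short sketch with Python's standard json module shows these arrays being parsed. The mixed array is a made-up example that holds one of each of the seven kinds of JSON value.

```python
import json

empty = json.loads("[]")        # array with no items
one = json.loads("[ null ]")    # array with one item (null)

# Hypothetical array holding an object, an array, a string, a number,
# true, false, and null.
mixed = json.loads('[{"a": 1}, [1, 2], "text", 3.5, true, false, null]')

print(empty, one, len(mixed))
```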
Example of a JSON number: the value of "age"
{
"name": "John Doe",
"age": 30,
"married": true,
"siblings": ["John", "Mary", "Pat"]
}
JSON string
Example of a JSON string: the value of "name"
{
"name": "John Doe",
"age": 30,
"married": true,
"siblings": ["John", "Mary", "Pat"]
}
JSON chars are a superset of XML chars
[Diagram: the XML character set shown as a subset of the JSON character set]
ASCII table
Expressing characters in hex format
Not legal:
{
"comment": "This is a very,
very long comment"
}
No multiline strings (cont.)
Legal:
{
"comment": "This is a very, \n very long comment"
}
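Just 2 symbols: a backslash followed by the letter n
JSON parser converts \n to the newline
character
Input to the JSON parser:
{
"comment": "This is a very, \n very long comment"
}
Output of the JSON parser: the value of "comment" is the text "This is a very, " followed by a newline control character (invisible) and then " very long comment".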
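The behavior described above can be checked with Python's standard json module. This is a small sketch: the two-symbol \n escape is accepted and becomes a real newline character, while a raw line break inside a string is rejected.

```python
import json

# Legal: the JSON text contains the two symbols backslash and n.
legal = '{ "comment": "This is a very, \\n very long comment" }'
parsed = json.loads(legal)

# Not legal: the JSON text contains a raw line break inside the string.
illegal = '{ "comment": "This is a very,\nvery long comment" }'
try:
    json.loads(illegal)
    rejected = False
except json.JSONDecodeError:
    rejected = True

print(repr(parsed["comment"]), rejected)
```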
Achieving interoperability in a world where different OS's
represent newline differently
Each operating system has its own convention for signifying the end of a line of text:
Unix: the newline is a character, with the value hex A (LF).
MS Windows: the newline is a combination of two characters, with the values hex D (CR) and hex A (LF), in that order.
Classic Mac OS (before OS X): the newline is a character, with the value hex D (CR).
This operating-system-dependency of newlines can cause interoperability problems, e.g., the newlines in a string
created on a Unix box will not be understood by applications running on a Windows box.
XML: all newlines are normalized by an XML parser to hex A (LF). So it doesn't matter whether you create your XML
document on a Unix box, a Windows box, or a Macintosh box, all newlines will be represented as hex A (LF).
JSON: multi-line strings are not permitted! So the newline problem is avoided completely. You can, however, embed
within your JSON strings the \n (LF) or \r\n (CRLF) symbols, to instruct processing applications: "Hey, I would like a
newline here."
That is quite an interesting difference in approach between XML and JSON for dealing with the newline problem!
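On the writing side, a JSON serializer does the escaping for you. The sketch below (standard json module, made-up sample text) shows a Windows-style CRLF inside a Python string turning into the symbols \r\n in the JSON text, so the serialized document contains no raw newline characters and round-trips unchanged.

```python
import json

# Hypothetical text containing a Windows-style line ending (CR LF).
text = "line one\r\nline two"

encoded = json.dumps({"comment": text})
decoded = json.loads(encoded)["comment"]

print(encoded)
```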
JSON strings
http://json.org/
http://stackoverflow.com/questions/2392766/multiline-strings-in-json
Other JSON values
The values true, false, and null are literal values; they are not
wrapped in quotes.
Example of a JSON boolean: the value of "married"
{
"name": "John Doe",
"age": 30,
"married": true,
"siblings": ["John", "Mary", "Pat"]
}
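A brief sketch of how the three literals surface in a program, using Python's standard json module. The "spouse" and "quoted" properties are hypothetical additions for illustration; note that a quoted "true" is a string, not a boolean.

```python
import json

person = json.loads(
    '{"name": "John Doe", "age": 30, "married": true, '
    '"spouse": null, "quoted": "true"}'
)

# true/false/null become Python's True/False/None; "true" stays a string.
print(person["married"], person["spouse"], repr(person["quoted"]))

# Going the other way, the literals are written without quotes.
text = json.dumps({"married": False, "spouse": None})
print(text)
```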
Good use-case for null
This is a legal JSON document:
42
so is this:
"Hello World"
and so is this:
true
and this:
and this
{
"name": "John Doe",
"age": 30,
"married": true
}
which is equivalent to (whitespace between tokens is not significant):
{"name":"John Doe","age":30,"married":true}
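Both points can be checked with a small sketch using Python's standard json module: a bare number, string, or boolean parses as a complete JSON document, and the spaced-out and compact forms of the object parse to the same value.

```python
import json

# Scalars on their own are complete JSON documents.
scalars = [json.loads(t) for t in ('42', '"Hello World"', 'true')]

pretty = '''{
"name": "John Doe",
"age": 30,
"married": true
}'''
compact = '{"name":"John Doe","age":30,"married":true}'

# Whitespace between tokens does not change the parsed value.
same = json.loads(pretty) == json.loads(compact)

# separators=(",", ":") asks the serializer to omit optional whitespace.
recompacted = json.dumps(json.loads(pretty), separators=(",", ":"))

print(scalars, same, recompacted)
```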
String delimiters: JSON vs. XML
▪ JSON strings are always delimited by double quotes.
▪ XML strings (such as attribute values) may be delimited by
either double quotes or single quotes.
JSON is recursively defined
{
"foo": json-value
}
The above JSON instance is an object. The
object has a single property, "foo". Its value is
any JSON value – recursive definition!
Using JSON you can define arbitrarily complex structures
{
  "Book":
  {
    "Title": "Parsing Techniques",
    "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ]
  }
}
{
  "Book":
  {
    "Title": "Parsing Techniques",
    "Authors": [
      {"name":"Dick Grune", "university": "Vrije Universiteit"},
      {"name":"Ceriel J.H. Jacobs", "university": "Vrije Universiteit"}
    ]
  }
}
Extend, ad infinitum
{
  "Book":
  {
    "Title": "Parsing Techniques",
    "Authors": [
      {"name": {"first":"Dick", "last":"Grune"},
       "university": "Vrije Universiteit"},
      {"name": {"first":"Ceriel", "last":"Jacobs"},
       "university": "Vrije Universiteit"}
    ]
  }
}
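Because JSON is defined recursively, code that processes it is naturally recursive too. The sketch below walks the last Book example with two hypothetical helpers, depth and leaves (names chosen for this illustration): one measures how deeply objects and arrays are nested, the other collects every non-container value.

```python
import json


def depth(value):
    """Nesting depth: 0 for scalars, 1 + deepest child for objects/arrays."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0


def leaves(value):
    """Yield every scalar value, visiting children in document order."""
    if isinstance(value, dict):
        for v in value.values():
            yield from leaves(v)
    elif isinstance(value, list):
        for v in value:
            yield from leaves(v)
    else:
        yield value


doc = json.loads('''{
  "Book": {
    "Title": "Parsing Techniques",
    "Authors": [
      {"name": {"first": "Dick", "last": "Grune"},
       "university": "Vrije Universiteit"},
      {"name": {"first": "Ceriel", "last": "Jacobs"},
       "university": "Vrije Universiteit"}
    ]
  }
}''')

print(depth(doc))
print(list(leaves(doc)))
```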
7 simple JSON components, assemble to
generate unlimited complexity
object
{
  "___": json-value,
  "___": json-value,
  "___": json-value,
  …
}
array
[ json-value, json-value, json-value, … ]
string
"___"
number
true
false
null
JSON provides the structures and assembly
points, you customize them for your needs
object
{
  "___": json-value,
  "___": json-value,
  "___": json-value,
  …
}
array
[ json-value, json-value, json-value, … ]
string
"___"
[Diagram callouts, shown over three successive slides: "structures", "assembly points", "customize"]
Comments not allowed!
▪ You cannot comment a JSON instance document.
▪ There is no syntax for commenting JSON instances.
▪ Bummer.
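A quick sketch with Python's standard json module confirms this: a JavaScript-style comment inside a JSON instance makes the parser reject the whole document.

```python
import json

# JSON text with a JavaScript-style comment, which JSON does not allow.
commented = '''{
  // the person's name
  "name": "John Doe"
}'''

try:
    json.loads(commented)
    accepted = True
except json.JSONDecodeError:
    accepted = False

print(accepted)
```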
Summary
In this lesson, you should have learned about:
● XML
○ A “sibling” to HTML, always parsable
○ “Lingua franca” of data: encodes documents and structured data
○ Blends data and schema (structure)
● JSON
○ JavaScript Object Notation
○ Used to format data
○ Commonly used in Web as a vehicle to describe data being sent between systems
● Similarities
○ Both are human readable
○ Both have very simple syntax
○ Both are hierarchical
● Differences
○ JSON includes arrays
○ Names in JSON are double-quoted strings, so any string is a legal name, including JavaScript reserved words; XML names must follow XML naming rules
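○ XML validation is defined in the XML specification itself (DTDs) and extended by XML Schema; JSON is validated with the separately specified JSON Schema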
● JSON is also a language that you use to create other languages.
● There are many well-known algorithms for processing and traversing trees.
● Both XML and JSON are able to leverage this.
● The accumulated history adds up to nearly three terabytes of compressed JSON and is
growing by four gigabytes a day.
● A JSON array is used to express a list of values. A JSON array contains zero or more
values, separated by commas and wrapped in square brackets.