
Introduction to Database Systems

CSE 414

Lecture 16: XPath, XQuery, JSON

CSE 414 - Spring 2013 1


Announcements
• Next webquiz out now, due Friday night
• Homework 5 (XML/XQuery) out now, due Wednesday night
• Midterm
– Returned today. Please hold off on questions until tomorrow, after you’ve had a chance to review your work, compare it to the sample solution, etc.
• (Although we can correct arithmetic bugs right away)
– If we goofed, we’ll fix it! (But let’s be sure it’s a goof first.)


Querying XML Data
• XPath = simple navigation
• XQuery = the SQL of XML
• XSLT = recursive traversal
– will not discuss in class


Sample Data for Queries
<bib>
<book> <publisher> Addison-Wesley </publisher>
<author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<title> Foundations of Databases </title>
<year> 1995 </year>
</book>
<book price="55">
<publisher> Freeman </publisher>
<author> Jeffrey D. Ullman </author>
<title> Principles of Database and Knowledge Base Systems </title>
<year> 1998 </year>
</book>
</bib>
Data Model for XPath
XPath returns a sequence of items. An item is either:
• A value of primitive type, or
• A node (e.g., document, element, attribute, or text node)

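
[Figure: tree view of the sample data. The root (document node) sits above bib, the root element. bib has two book children. Under the first book are publisher (Addison-Wesley), author (Serge Abiteboul), and so on.]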
XPath: Simple Expressions
/bib/book/year
Result: <year> 1995 </year>
<year> 1998 </year>

/bib/paper/year
Result: empty (there were no papers)

/bib vs. /   What’s the difference?
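To try these paths without an XQuery engine, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. That module implements only a small XPath subset. Its paths are evaluated relative to an element, so `/bib/book/year` is written `book/year` starting from the `bib` element. The `ElementTree` wrapper stands in for the document node `/`.

```python
import xml.etree.ElementTree as ET

# The lecture's sample data, with straight quotes so that it parses.
SAMPLE = """<bib>
<book> <publisher> Addison-Wesley </publisher>
  <author> Serge Abiteboul </author>
  <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
  <author> Victor Vianu </author>
  <title> Foundations of Databases </title>
  <year> 1995 </year>
</book>
<book price="55"> <publisher> Freeman </publisher>
  <author> Jeffrey D. Ullman </author>
  <title> Principles of Database and Knowledge Base Systems </title>
  <year> 1998 </year>
</book>
</bib>"""

root = ET.fromstring(SAMPLE)   # the <bib> element, i.e. what /bib selects
doc = ET.ElementTree(root)     # plays the role of the document node "/"
print(doc.getroot().tag)       # bib

# /bib/book/year  ->  book/year, relative to <bib>
years = [y.text.strip() for y in root.findall("book/year")]
print(years)   # ['1995', '1998']

# /bib/paper/year  ->  there are no <paper> children, so the result is empty
papers = root.findall("paper/year")
print(papers)  # []
```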
XPath: Restricted Kleene Closure
//author
Result: <author> Serge Abiteboul </author>
<author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>
<author> Victor Vianu </author>
<author> Jeffrey D. Ullman </author>

/bib//first-name
Result: <first-name> Rick </first-name>
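In ElementTree the descendant step is written `.//`, relative to the current element. This sketch (standard-library ElementTree, the lecture's sample data) reproduces both queries.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<bib>
<book> <publisher> Addison-Wesley </publisher>
  <author> Serge Abiteboul </author>
  <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
  <author> Victor Vianu </author>
  <title> Foundations of Databases </title>
  <year> 1995 </year>
</book>
<book price="55"> <publisher> Freeman </publisher>
  <author> Jeffrey D. Ullman </author>
  <title> Principles of Database and Knowledge Base Systems </title>
  <year> 1998 </year>
</book>
</bib>"""
root = ET.fromstring(SAMPLE)

# //author  ->  .//author : author elements at any depth, in document order
authors = root.findall(".//author")
print(len(authors))  # 4

# /bib//first-name  ->  .//first-name
first_names = [e.text.strip() for e in root.findall(".//first-name")]
print(first_names)   # ['Rick']
```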
XPath: Attribute Nodes
/bib/book/@price
Result: "55"

@price means that price has to be an attribute
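ElementTree cannot return attribute nodes. A minimal workaround, assuming the standard-library module, is to select the elements that carry the attribute with `[@price]` and then read the value with `.get()`.

```python
import xml.etree.ElementTree as ET

# Only the part of the sample data relevant to @price.
SAMPLE = """<bib>
<book> <title> Foundations of Databases </title> </book>
<book price="55"> <title> Principles of Database and Knowledge Base Systems </title> </book>
</bib>"""
root = ET.fromstring(SAMPLE)

# /bib/book/@price: keep books that have a price attribute, then read its value
prices = [book.get("price") for book in root.findall("book[@price]")]
print(prices)  # ['55']
```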


XPath: Wildcard

//author/*

Result: <first-name> Rick </first-name>
<last-name> Hull </last-name>

* matches any element
@* matches any attribute
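ElementTree supports the `*` wildcard in paths but has no `@*`. In this sketch an element's `.attrib` dictionary stands in for "all attributes" (standard-library ElementTree, trimmed sample data).

```python
import xml.etree.ElementTree as ET

SAMPLE = """<bib>
<book> <author> Serge Abiteboul </author>
  <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
</book>
<book price="55"> <author> Jeffrey D. Ullman </author> </book>
</bib>"""
root = ET.fromstring(SAMPLE)

# //author/*  ->  .//author/* : every child element of every author
kids = [(e.tag, e.text.strip()) for e in root.findall(".//author/*")]
print(kids)   # [('first-name', 'Rick'), ('last-name', 'Hull')]

# No path syntax for @* in ElementTree: read the attribute dictionary instead
attrs = [dict(b.attrib) for b in root.findall("book")]
print(attrs)  # [{}, {'price': '55'}]
```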


XPath: Text Nodes
/bib/book/author/text()
Result: Serge Abiteboul
Victor Vianu
Jeffrey D. Ullman

Rick Hull doesn’t appear because his name is stored in the first-name and last-name child elements, not as text directly inside author

Functions in XPath:
– text() = matches text nodes
– node() = matches any node (= * or @* or text())
– name() = returns the name of the current tag
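ElementTree has no `text()` node test. Its closest analogue is an element's `.text`, which holds only the text before the element's first child. This is a simplification: XPath `text()` also returns text that follows child elements. The `direct_text` helper below is hypothetical, written for this sketch. `.tag` plays the role of `name()`.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<bib>
<book> <publisher> Addison-Wesley </publisher>
  <author> Serge Abiteboul </author>
  <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
  <author> Victor Vianu </author>
  <title> Foundations of Databases </title>
  <year> 1995 </year>
</book>
<book price="55"> <publisher> Freeman </publisher>
  <author> Jeffrey D. Ullman </author>
  <title> Principles of Database and Knowledge Base Systems </title>
  <year> 1998 </year>
</book>
</bib>"""
root = ET.fromstring(SAMPLE)

def direct_text(elem):
    # Approximates text(): the text before the first child, whitespace trimmed
    return (elem.text or "").strip()

authors = root.findall("book/author")
names = [direct_text(a) for a in authors if direct_text(a)]
print(names)  # Rick Hull is missing, as on the slide

# itertext() walks ALL descendant text, which does recover Rick Hull
full = [" ".join(t.strip() for t in a.itertext() if t.strip()) for a in authors]

# name() analogue: the tag of each child of the first book
tags = [child.tag for child in root.find("book")]
```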


XPath: Predicates
/bib/book/author[first-name]
Result: <author> <first-name> Rick </first-name>
<last-name> Hull </last-name>
</author>

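ElementTree supports the existence predicate `[tag]` directly. This sketch (standard-library ElementTree, sample authors only) keeps the authors that have a `first-name` child.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<bib>
<book> <author> Serge Abiteboul </author>
  <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
  <author> Victor Vianu </author>
</book>
<book price="55"> <author> Jeffrey D. Ullman </author> </book>
</bib>"""
root = ET.fromstring(SAMPLE)

# /bib/book/author[first-name]  ->  book/author[first-name]
with_first = root.findall("book/author[first-name]")
print(len(with_first))                                 # 1
print(with_first[0].find("last-name").text.strip())    # Hull
```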


XPath: More Predicates
/bib/book/author[first-name][address[.//zip][city]]/last-name
Result: <last-name> … </last-name>
<last-name> … </last-name>

How do we read this?

First remove all qualifiers (predicates):
/bib/book/author/last-name
Then add them one by one:
/bib/book/author[first-name][address]/last-name
Finally, the nested predicates restrict address: the author needs an address child that has a zip descendant and a city child.
XPath: More Predicates

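/bib/book[@price < 60]   Books whose price attribute is below 60

/bib/book[author/@age < 25]   Books with at least one author younger than 25

/bib/book[author/text()]   Books with at least one author that has a text child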
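ElementTree predicates cannot use `<`, so this sketch filters in plain Python instead. XPath comparisons over several nodes are existential: `book[author/@age < 25]` holds if *some* author qualifies, which `any()` mirrors. A book with no `price` attribute has nothing to compare and is not selected. The `age` and `price` values below are hypothetical; the lecture's sample data has no ages.

```python
import xml.etree.ElementTree as ET

# Hypothetical data for illustration only.
DOC = """<bib>
<book price="40"><author age="22"/><author age="51"/></book>
<book price="70"><author age="30"/></book>
<book><author age="24"/></book>
</bib>"""
books = ET.fromstring(DOC).findall("book")

# /bib/book[@price < 60]: books without a price attribute are skipped
cheap = [b for b in books
         if b.get("price") is not None and float(b.get("price")) < 60]

# /bib/book[author/@age < 25]: true if ANY author of the book is under 25
young = [b for b in books
         if any(float(a.get("age")) < 25 for a in b.findall("author[@age]"))]

print(len(cheap), len(young))  # 1 2
```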


XPath: Position Predicates

/bib/book[2]   The 2nd book

/bib/book[last()]   The last book

/bib/book[@year = 1998][2]   The 2nd of all books in 1998

/bib/book[2][@year = 1998]   The 2nd book, IF it is in 1998
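The order of predicates matters, and Python lists make that visible. This sketch uses the lecture's sample data, where the year is a `<year>` element rather than an attribute. ElementTree accepts `book[2]` and `book[last()]` directly. As far as I know, CPython's ElementTree computes `[n]` among same-tag siblings rather than within an already-filtered result, so the filter-then-index case is done with lists here.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<bib>
<book> <title> Foundations of Databases </title> <year> 1995 </year> </book>
<book price="55">
  <title> Principles of Database and Knowledge Base Systems </title>
  <year> 1998 </year>
</book>
</bib>"""
root = ET.fromstring(SAMPLE)
books = root.findall("book")

def year(book):
    return book.findtext("year").strip()

# /bib/book[2] and /bib/book[last()] (XPath counts from 1, Python from 0)
second = root.findall("book[2]")
last = root.findall("book[last()]")

# book[year = 1998][2]: filter first, then take the 2nd survivor
in_1998 = [b for b in books if year(b) == "1998"]
filter_then_pick = in_1998[1:2]          # empty: only one book is from 1998

# book[2][year = 1998]: take the 2nd book, then keep it only if from 1998
pick_then_filter = [b for b in books[1:2] if year(b) == "1998"]

print(len(filter_then_pick), len(pick_then_filter))  # 0 1
```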
XPath: More Axes

. means current node: /bib/book[.//review]

/bib/book[./review]   Same as /bib/book[review]

/bib/author/./first-name   Same as /bib/author/first-name


XPath: More Axes

.. means parent node

/bib/author/../author/zip   Same as /bib/author/zip

/bib/book[.//review/../comments]
Same as
/bib/book[descendant-or-self::*[comments][review]]   Hint: don’t use ..


A Few Extra Examples
Run these examples on the sample XML posted on the course website, following the hw5 instructions.

Each line is a separate example:

doc("sample-xml.xml")//book/price
doc("sample-xml.xml")//book[editor]/price
doc("sample-xml.xml")//book[price/text() > 100]/title


XPath: Summary
bib matches a bib element
* matches any element
/ matches the root (document) node
/bib matches a bib element under root
bib/paper matches a paper in bib
bib//paper matches a paper in bib, at any depth
//paper matches a paper at any depth
paper|book matches a paper or a book
@price matches a price attribute
bib/book/@price matches price attribute in book, in bib
bib/book[@price < 55]/author/last-name matches…
bib/book[@price < 55 or @price > 99]/author/last-name matches…
XQuery
• Standard for high-level querying of databases
containing data in XML form
• Based on Quilt, which is based on XML-QL
• Uses XPath to express more complex queries



FLWR (“Flower”) Expressions

FOR ...      Zero or more
LET ...      Zero or more
WHERE ...    Zero or one
RETURN ...   Exactly one


FOR-WHERE-RETURN
Find all book titles published after 1995:

FOR $x IN doc("bib.xml")/bib/book
WHERE $x/year/text() > 1995
RETURN $x/title
Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>
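For readers who know Python, a FOR-WHERE-RETURN expression behaves much like a list comprehension: iterate, filter, produce. This sketch runs the query on the lecture's sample data. Python needs the explicit `int(...)` that XQuery performs implicitly (see the coercion slide); `int()` tolerates the surrounding whitespace.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<bib>
<book> <title> Foundations of Databases </title> <year> 1995 </year> </book>
<book price="55">
  <title> Principles of Database and Knowledge Base Systems </title>
  <year> 1998 </year>
</book>
</bib>"""
root = ET.fromstring(SAMPLE)

# FOR $x IN doc("bib.xml")/bib/book
# WHERE $x/year/text() > 1995
# RETURN $x/title
titles = [x.find("title") for x in root.findall("book")
          if int(x.findtext("year")) > 1995]

print([t.text.strip() for t in titles])
```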
FOR-WHERE-RETURN
Equivalently (perhaps more geekish):

FOR $x IN doc("bib.xml")/bib/book[year/text() > 1995]/title
RETURN $x

And even shorter:

doc("bib.xml")/bib/book[year/text() > 1995]/title


COERCION
The query:

FOR $x IN doc("bib.xml")/bib/book[year > 1995]/title
RETURN $x

is rewritten by the system into the following (the year element is converted to its text value before the comparison):

FOR $x IN doc("bib.xml")/bib/book[year/text() > 1995]/title
RETURN $x
FOR-WHERE-RETURN
• Find all book titles and the year when they were published:
FOR $x IN doc("bib.xml")/bib/book
RETURN <answer>
<title>{ $x/title/text() } </title>
<year>{ $x/year/text() } </year>
</answer>

Result:
<answer> <title> abc </title> <year> 1995 </year> </answer>
<answer> <title> def </title> <year> 2002 </year> </answer>
<answer> <title> ghk </title> <year> 1980 </year> </answer>
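Element constructors with `{ ... }` build new XML from computed values. This is a rough Python analogue using the standard `ET.Element` and `ET.SubElement` on the lecture's sample data. The evaluated text is copied into freshly built elements, much as the enclosed expressions do.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<bib>
<book> <title> Foundations of Databases </title> <year> 1995 </year> </book>
<book price="55">
  <title> Principles of Database and Knowledge Base Systems </title>
  <year> 1998 </year>
</book>
</bib>"""
root = ET.fromstring(SAMPLE)

answers = []
for x in root.findall("book"):
    answer = ET.Element("answer")
    # Counterparts of { $x/title/text() } and { $x/year/text() }
    ET.SubElement(answer, "title").text = x.findtext("title")
    ET.SubElement(answer, "year").text = x.findtext("year")
    answers.append(answer)

print(ET.tostring(answers[0], encoding="unicode"))
# <answer><title> Foundations of Databases </title><year> 1995 </year></answer>
```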
FOR-WHERE-RETURN
• Notice the use of "{" and "}"
• What is the result without them?
FOR $x IN doc("bib.xml")/bib/book
RETURN <answer>
<title> $x/title/text() </title>
<year> $x/year/text() </year>
</answer>

Result:
<answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>
<answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>
<answer> <title> $x/title/text() </title> <year> $x/year/text() </year> </answer>
Nesting
• For each author of a book by Morgan Kaufmann, list all books he/she published:
FOR $b IN doc("bib.xml")/bib,
$a IN $b/book[publisher/text()="Morgan Kaufmann"]/author
RETURN <result>
{ $a,
FOR $t IN $b/book[author/text()=$a/text()]/title
RETURN $t
}
</result>

In the RETURN clause, the comma concatenates XML fragments
Result

<result>
<author>Jones</author>
<title> abc </title>
<title> def </title>
</result>
<result>
<author> Smith </author>
<title> ghi </title>
</result>
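The nested FOR is an inner loop that runs once per outer binding. In this sketch the outer loop uses ElementTree's `[tag='text']` predicate, and the inner loop is a Python filter. The book data is hypothetical and chosen to reproduce the result shown above.

```python
import xml.etree.ElementTree as ET

# Hypothetical data for illustration only.
DOC = """<bib>
<book><publisher>Morgan Kaufmann</publisher><author>Jones</author><title>abc</title></book>
<book><publisher>Freeman</publisher><author>Jones</author><title>def</title></book>
<book><publisher>Morgan Kaufmann</publisher><author>Smith</author><title>ghi</title></book>
</bib>"""
b = ET.fromstring(DOC)

results = []
# Outer FOR: authors of Morgan Kaufmann books
for a in b.findall("book[publisher='Morgan Kaufmann']/author"):
    # Inner FOR: titles of every book (any publisher) listing this author
    titles = [book.findtext("title") for book in b.findall("book")
              if a.text in [x.text for x in book.findall("author")]]
    results.append((a.text, titles))

print(results)  # [('Jones', ['abc', 'def']), ('Smith', ['ghi'])]
```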
Aggregates
Find all books with more than 3 authors:

FOR $x IN doc("bib.xml")/bib/book
WHERE count($x/author)>3
RETURN $x

count = a function that counts
avg = computes the average
sum = computes the sum
distinct-values = eliminates duplicates


Aggregates
Same thing:

FOR $x IN doc("bib.xml")/bib/book[count(author)>3]
RETURN $x

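On the lecture's sample data, `count($x/author)` is 3 for the first book and 1 for the second, so the "more than 3 authors" query returns nothing. This sketch checks that with `len()` as the counterpart of `count()` and `sum()` as the counterpart of `sum`.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<bib>
<book> <author> Serge Abiteboul </author>
  <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author>
  <author> Victor Vianu </author>
</book>
<book price="55"> <author> Jeffrey D. Ullman </author> </book>
</bib>"""
root = ET.fromstring(SAMPLE)

counts = [len(book.findall("author")) for book in root.findall("book")]
print(counts)       # [3, 1]

# WHERE count($x/author) > 3
many = [book for book in root.findall("book") if len(book.findall("author")) > 3]
print(many)         # []

print(sum(counts))  # 4 authors in total
```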


Eliminating Duplicates
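Print all authors, without duplicates:

FOR $a IN distinct-values(doc("bib.xml")/bib/book/author/text())
RETURN <author> { $a } </author>

Note: distinct-values works on values, NOT elements: it returns plain values, so the RETURN clause must build a new <author> element around each one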
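Duplicate elimination happens on values. Two `<author>Ullman</author>` elements are still two different nodes; only their text values are equal. This sketch uses hypothetical data. `dict.fromkeys` is a Python idiom that de-duplicates while keeping first-seen order; XQuery leaves the order of `distinct-values` implementation-dependent.

```python
import xml.etree.ElementTree as ET

# Hypothetical data for illustration only.
DOC = ("<bib><book><author>Widom</author><author>Ullman</author></book>"
       "<book><author>Ullman</author></book></bib>")
bib = ET.fromstring(DOC)

elements = bib.findall("book/author")              # 3 distinct nodes
values = list(dict.fromkeys(a.text for a in elements))
print(values)  # ['Widom', 'Ullman']

# RETURN <author> { $a } </author>: wrap each value in a NEW element
rebuilt = []
for v in values:
    e = ET.Element("author")
    e.text = v
    rebuilt.append(e)
```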


The LET Clause
Find books whose price is larger than average:

FOR $b in doc("bib.xml")/bib
LET $a:=avg($b/book/price/text())
FOR $x in $b/book
WHERE $x/price/text() > $a
RETURN $x
LET binds a variable to the whole value of an expression (here, the average price) instead of iterating over it
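The LET value is computed once and reused for every `$x`. In this sketch that value is an ordinary variable computed before the filtering loop. The three books and their prices are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data for illustration only.
DOC = """<bib>
<book><title>a</title><price>10</price></book>
<book><title>b</title><price>20</price></book>
<book><title>c</title><price>60</price></book>
</bib>"""
b = ET.fromstring(DOC)

# LET $a := avg($b/book/price/text())
prices = [float(p.text) for p in b.findall("book/price")]
a = sum(prices) / len(prices)

# FOR $x in $b/book WHERE $x/price/text() > $a RETURN $x
above = [x.findtext("title") for x in b.findall("book")
         if float(x.findtext("price")) > a]

print(a, above)  # 30.0 ['c']
```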


Flattening
Compute a list of (author, title) pairs

Input:
<book>
<title> Databases </title>
<author> Widom </author>
<author> Ullman </author>
</book>

Query:
FOR $b IN doc("bib.xml")/bib/book,
$x IN $b/title/text(),
$y IN $b/author/text()
RETURN <answer>
<title> { $x } </title>
<author> { $y } </author>
</answer>

Output:
<answer>
<title> Databases </title>
<author> Widom </author>
</answer>
<answer>
<title> Databases </title>
<author> Ullman </author>
</answer>
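Several variables in one FOR clause produce every combination of their bindings, one output per combination. In Python that corresponds to a comprehension with several `for` clauses. This sketch uses the input shown above.

```python
import xml.etree.ElementTree as ET

DOC = ("<bib><book><title> Databases </title>"
       "<author> Widom </author><author> Ullman </author></book></bib>")
bib = ET.fromstring(DOC)

# FOR $b IN .../book, $x IN $b/title/text(), $y IN $b/author/text()
pairs = [(x.text.strip(), y.text.strip())
         for b in bib.findall("book")
         for x in b.findall("title")
         for y in b.findall("author")]

print(pairs)  # [('Databases', 'Widom'), ('Databases', 'Ullman')]
```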
Re-grouping
For each author, return all titles of her/his books

FOR $b IN doc("bib.xml")/bib,
$x IN $b/book/author/text()
RETURN
<answer>
<author> { $x } </author>
{ FOR $y IN $b/book[author/text()=$x]/title
RETURN $y }
</answer>

Result:
<answer>
<author> efg </author>
<title> abc </title>
<title> klm </title>
....
</answer>

What about duplicate authors?
Re-grouping
Same, but eliminate duplicate authors:
FOR $b IN doc("bib.xml")/bib
LET $a := distinct-values($b/book/author/text())
FOR $x IN $a
RETURN
<answer>
<author> { $x } </author>
{ FOR $y IN $b/book[author/text()=$x]/title
RETURN $y }
</answer>
Re-grouping
Same thing:

FOR $b IN doc("bib.xml")/bib,
$x IN distinct-values($b/book/author/text())
RETURN
<answer>
<author> { $x } </author>
{ FOR $y IN $b/book[author/text()=$x]/title
RETURN $y }
</answer>
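Re-grouping inverts the book-to-authors nesting into author-to-titles. This sketch first takes the distinct author values and then collects each author's titles. The data is hypothetical, picked so that author `efg` appears in two books.

```python
import xml.etree.ElementTree as ET

# Hypothetical data for illustration only.
DOC = """<bib>
<book><title>abc</title><author>efg</author><author>hij</author></book>
<book><title>klm</title><author>efg</author></book>
</bib>"""
bib = ET.fromstring(DOC)

# distinct-values($b/book/author/text()), keeping first-seen order
authors = list(dict.fromkeys(a.text for a in bib.findall("book/author")))

# For each author: titles of the books that list that author
grouped = {
    x: [b.findtext("title") for b in bib.findall("book")
        if x in [a.text for a in b.findall("author")]]
    for x in authors
}

print(grouped)  # {'efg': ['abc', 'klm'], 'hij': ['abc']}
```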


SQL and XQuery Side-by-side
Product(pid, name, maker, price)
Find all product names and prices, sorted by price

SQL:
SELECT x.name, x.price
FROM Product x
ORDER BY x.price

XQuery:
FOR $x in doc("db.xml")/db/Product/row
ORDER BY number($x/price/text())
RETURN <answer>
{ $x/name, $x/price }
</answer>
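Prices read from schema-less XML are text. In XQuery, ORDER BY compares such untyped values as strings unless they are converted. This sketch shows the effect in Python: the text key puts "23" before "7", and a numeric key gives the intended order. The two rows are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data for illustration only.
DB = ("<db><Product>"
      "<row><name>def</name><price>23</price></row>"
      "<row><name>abc</name><price>7</price></row>"
      "</Product></db>")
rows = ET.fromstring(DB).findall("Product/row")

# Sorting on the raw text compares character by character
as_text = [r.findtext("name") for r in sorted(rows, key=lambda r: r.findtext("price"))]

# Sorting on a number gives the intended price order
as_number = [r.findtext("name")
             for r in sorted(rows, key=lambda r: float(r.findtext("price")))]

print(as_text)    # ['def', 'abc']
print(as_number)  # ['abc', 'def']
```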


XQuery’s Answer

<answer>
<name> abc </name>
<price> 7 </price>
</answer>
<answer>
<name> def </name>
<price> 23 </price>
</answer>
....

Notice: this is NOT a well-formed document! (WHY???)


Producing a Well-Formed Answer

<aQuery>
{ FOR $x in doc("db.xml")/db/Product/row
ORDER BY number($x/price/text())
RETURN <answer>
{ $x/name, $x/price }
</answer>
}
</aQuery>


XQuery’s Answer

<aQuery>
<answer>
<name> abc </name>
<price> 7 </price>
</answer>
<answer>
<name> def </name>
<price> 23 </price>
</answer>
....
</aQuery>

Now it is well-formed!
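A well-formed XML document has exactly one root element. An XML parser makes the difference concrete: Python's standard parser rejects the bare sequence of `<answer>` elements and accepts the same sequence wrapped in `<aQuery>`. `is_well_formed` is a hypothetical helper written for this sketch.

```python
import xml.etree.ElementTree as ET

fragments = ("<answer><name>abc</name><price>7</price></answer>"
             "<answer><name>def</name><price>23</price></answer>")

def is_well_formed(text):
    # True if the text parses as a single XML document
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(fragments))                          # False: two root elements
print(is_well_formed("<aQuery>" + fragments + "</aQuery>"))  # True
```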
SQL and XQuery Side-by-side
Product(pid, name, maker, price)
Company(cid, name, city, revenues)
Find all products made in Seattle

SQL:
SELECT x.name
FROM Product x, Company y
WHERE x.maker=y.cid
and y.city="Seattle"

XQuery:
FOR $r in doc("db.xml")/db,
$x in $r/Product/row,
$y in $r/Company/row
WHERE $x/maker/text()=$y/cid/text()
and $y/city/text() = "Seattle"
RETURN $x/name

Cool XQuery:
FOR $y in /db/Company/row[city/text()="Seattle"],
$x in /db/Product/row[maker/text()=$y/cid/text()]
RETURN $x/name
<product>
<row> <pid> 123 </pid>
<name> abc </name>
<maker> efg </maker>
</row>
<row> …. </row>

</product>
<product>
...
</product>
....

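The Seattle join can be sketched over rows laid out as `/db/Product/row` and `/db/Company/row`. The product and company rows below are made up. This version first collects the matching company ids into a set (a hash-join style), rather than the nested loops of the XQuery versions; the result is the same.

```python
import xml.etree.ElementTree as ET

# Hypothetical data for illustration only.
DB = """<db>
<Product>
  <row><pid>1</pid><name>abc</name><maker>c1</maker><price>7</price></row>
  <row><pid>2</pid><name>def</name><maker>c2</maker><price>150</price></row>
  <row><pid>3</pid><name>ghi</name><maker>c1</maker><price>120</price></row>
</Product>
<Company>
  <row><cid>c1</cid><name>Acme</name><city>Seattle</city><revenue>500000</revenue></row>
  <row><cid>c2</cid><name>Bolt</name><city>Portland</city><revenue>2000000</revenue></row>
</Company>
</db>"""
db = ET.fromstring(DB)

# Company/row[city="Seattle"]: ids of Seattle companies
seattle = {c.findtext("cid") for c in db.findall("Company/row[city='Seattle']")}

# Products whose maker is one of those companies
names = [p.findtext("name") for p in db.findall("Product/row")
         if p.findtext("maker") in seattle]

print(names)  # ['abc', 'ghi']
```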


SQL and XQuery Side-by-side
For each company with revenues < 1M count the products over $100
SELECT y.name, count(*)
FROM Product x, Company y
WHERE x.price > 100 and x.maker=y.cid and y.revenue < 1000000
GROUP BY y.cid, y.name

FOR $r in doc(“db.xml”)/db,
$y in $r/Company/row[revenue/text()<1000000]
RETURN
<proudCompany>
<companyName> { $y/name/text() } </companyName>
<numberOfExpensiveProducts>
{ count($r/Product/row[maker/text()=$y/cid/text()][price/text()>100])}
</numberOfExpensiveProducts>
</proudCompany>
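This sketch expresses the per-company count as a loop over companies with an inner count, on hypothetical rows that use the `revenue` element name from the query above. Like the XQuery version, and unlike the SQL join with GROUP BY, it would also report a qualifying company with zero expensive products, with a count of 0.

```python
import xml.etree.ElementTree as ET

# Hypothetical data for illustration only.
DB = """<db>
<Product>
  <row><pid>1</pid><name>abc</name><maker>c1</maker><price>7</price></row>
  <row><pid>2</pid><name>def</name><maker>c2</maker><price>150</price></row>
  <row><pid>3</pid><name>ghi</name><maker>c1</maker><price>120</price></row>
</Product>
<Company>
  <row><cid>c1</cid><name>Acme</name><city>Seattle</city><revenue>500000</revenue></row>
  <row><cid>c2</cid><name>Bolt</name><city>Portland</city><revenue>2000000</revenue></row>
</Company>
</db>"""
db = ET.fromstring(DB)

out = []
for y in db.findall("Company/row"):
    if float(y.findtext("revenue")) < 1_000_000:      # revenue/text() < 1000000
        expensive = [x for x in db.findall("Product/row")
                     if x.findtext("maker") == y.findtext("cid")
                     and float(x.findtext("price")) > 100]
        out.append((y.findtext("name"), len(expensive)))  # count(...)

print(out)  # [('Acme', 1)]
```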
SQL and XQuery Side-by-side
Find companies with more than 30 products, and their average price

SQL:
SELECT y.name, avg(x.price)
FROM Product x, Company y
WHERE x.maker=y.cid
GROUP BY y.cid, y.name
HAVING count(*) > 30

XQuery:
FOR $r in doc("db.xml")/db,
$y in $r/Company/row
LET $p := $r/Product/row[maker/text()=$y/cid/text()]
WHERE count($p) > 30
RETURN
<theCompany>
<companyName> { $y/name/text() }
</companyName>
<avgPrice> { avg($p/price/text()) } </avgPrice>
</theCompany>

FOR binds $y to one element at a time; LET binds $p to a whole collection.
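The LET-bound collection plays the role of a SQL group, and `WHERE count($p) > 30` plays the role of HAVING. In this sketch the threshold is a parameter, so the made-up two-company data can exercise it. `companies_with_many_products` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical data for illustration only.
DB = """<db>
<Product>
  <row><pid>1</pid><name>abc</name><maker>c1</maker><price>7</price></row>
  <row><pid>2</pid><name>def</name><maker>c2</maker><price>150</price></row>
  <row><pid>3</pid><name>ghi</name><maker>c1</maker><price>120</price></row>
</Product>
<Company>
  <row><cid>c1</cid><name>Acme</name><city>Seattle</city><revenue>500000</revenue></row>
  <row><cid>c2</cid><name>Bolt</name><city>Portland</city><revenue>2000000</revenue></row>
</Company>
</db>"""
db = ET.fromstring(DB)

def companies_with_many_products(db, min_count):
    result = []
    for y in db.findall("Company/row"):
        # LET $p := ... : the whole collection of this company's products
        p = [x for x in db.findall("Product/row")
             if x.findtext("maker") == y.findtext("cid")]
        if len(p) > min_count:                           # WHERE count($p) > N
            avg = sum(float(x.findtext("price")) for x in p) / len(p)
            result.append((y.findtext("name"), avg))
    return result

print(companies_with_many_products(db, 1))   # [('Acme', 63.5)]
print(companies_with_many_products(db, 30))  # []
```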
XML Summary
• Stands for eXtensible Markup Language
1. Advanced, self-describing file format
2. Based on a flexible, semi-structured data model

• Query languages for XML


– XPath
– XQuery



Beyond XML: JSON
• JSON stands for "JavaScript Object Notation"
– Lightweight text-data interchange format
– Language independent
– "Self-describing" and easy to understand
• JSON is quickly replacing XML for
– Data interchange
– Representing and storing semi-structured data


JSON
Example from: http://www.jsonexample.com/
myObject = {
"first": "John",
"last": "Doe",
"salary": 70000,
"registered": true,
"interests": [ "Reading", "Biking", "Hacking" ]
}
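The `myObject =` prefix is a JavaScript assignment. The JSON text itself is the `{ ... }` object. This sketch parses that object with Python's standard `json` module. JSON numbers, booleans and arrays map to Python `int`, `bool` and `list`. Typographic (curly) quotes are not valid JSON string delimiters, so the parser rejects them.

```python
import json

text = """{
  "first": "John",
  "last": "Doe",
  "salary": 70000,
  "registered": true,
  "interests": [ "Reading", "Biking", "Hacking" ]
}"""
obj = json.loads(text)
print(obj["interests"][1])  # Biking
print(type(obj["salary"]).__name__, obj["registered"])  # int True

# Curly quotes instead of straight quotes make the text invalid JSON
try:
    json.loads('[ "Reading", “Biking” ]')
    curly_ok = True
except json.JSONDecodeError:
    curly_ok = False
print(curly_ok)  # False
```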
