Advanced Database Topics: Querying XML with XPath and XQuery
Topics
XPath 1.0
Predicates
XPath Nodes & Axes
XPath 2.0
XQuery
Element Construction with XQuery
XPath 1.0
Hierarchy Navigation
XPath uses / to navigate down the tree
Selects a nodeset: a set of nodes, not just a single node
Absolute Navigation
Start the XPath expression with /
Starts at the root of the document
Relative Navigation
Doesn't start with /
Starts at the context node (i.e. the current node)
CS 779 Spring 2005 Ellis Cohen, 2002-2005 6
Simple Navigation
/CourseBooks
The coursebooks element(s) of the root
/CourseBooks/Book
All the book element children of the coursebooks child of the root
/CourseBooks/Book/Author
All the authors of all the books
Author
All the author children of the context node
./Author
Same, since . means the context node
Navigation to Descendants
//Author
All author descendants of the root
.//Author
All author descendants of the context node
//Author/Name
The names (i.e. child name elements) of all the authors
/CourseBooks//Author
All author descendants of coursebooks
[Figure: example document tree with a CourseBooks root, a Course node ("CS779"), Book nodes, and Author nodes ("Williams")]
//Author/*
All child elements of all the authors
//Book//*
All descendant elements of all the books
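The navigation expressions above can be tried out with Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. This is only a sketch: the sample document is hypothetical, and because ElementTree paths are always relative to an element, the leading / of each slide expression becomes a path relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names follow the slides.
DOC = """
<CourseBooks>
  <Course>CS779</Course>
  <Book>
    <Title>XML Stuff</Title>
    <Author><Name>Williams</Name></Author>
    <Author><Name>Cohen</Name></Author>
  </Book>
  <Book>
    <Title>More XML</Title>
    <Author><Name>Williams</Name></Author>
  </Book>
</CourseBooks>
"""
root = ET.fromstring(DOC)  # root is the CourseBooks element

# /CourseBooks/Book: the Book children of the root element
books = root.findall("Book")

# //Author/Name: the Name children of every Author descendant
names = [n.text for n in root.findall(".//Author/Name")]

# //Book//*: every descendant element of every Book
book_descendants = [e.tag for b in books for e in b.iter() if e is not b]

print(len(books), names, book_descendants)
```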
Predicates
Positional Predicates
//Book/Author[1]
1st author of each book
//Book/Author[3]/Address[1]
First address of the 3rd author of each book (ignores books that don't have 3 authors)
(//Book/Author)[1]
The 1st of all the book authors
(//Author)[1]
The 1st of all the authors
(//Author)[3]/Address[1]
The 1st address of the 3rd author
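The difference between a positional predicate applied per parent and one applied to a parenthesized expression can be sketched in Python. The data is hypothetical; ElementTree has no parenthesized expressions, so (//Book/Author)[1] is emulated by indexing the result list.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: two books with two authors each.
root = ET.fromstring("""
<CourseBooks>
  <Book><Author>Kelly</Author><Author>Cohen</Author></Book>
  <Book><Author>Jones</Author><Author>Smith</Author></Book>
</CourseBooks>
""")

# //Book/Author[1]: the first Author child of each Book
first_per_book = [a.text for a in root.findall(".//Book/Author[1]")]

# (//Book/Author)[1]: collect all book authors, then take the first overall.
# Python lists are 0-based, while XPath positions are 1-based.
all_authors = root.findall(".//Book/Author")
first_overall = all_authors[0].text

print(first_per_book)  # ['Kelly', 'Jones']
print(first_overall)   # Kelly
```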
Comparative Predicates
//Author[Lastname="Cohen"]
//Author[./Lastname="Cohen"]
Authors whose lastname is Cohen
Note: Lastname is evaluated in the context of each author
Note: Lastname is a node; comparing it to a string automatically uses its string-value (for simple elements, its contents)
//Book[Author/Lastname="Cohen"]
Books with an author whose lastname is Cohen
//Book[.//Lastname="Cohen"]
Same, if lastnames are only children of authors
//Book/Author[starts-with(Lastname,"C")]
Book authors whose lastname starts with C
Comparison Problem
What's the difference between
//Book/Author[Lastname = "Cohen"]
and
//Book[Author/Lastname = "Cohen"]/Author
Comparison Answer
//Book/Author[Lastname = "Cohen"]
-- Book authors whose last name is Cohen
//Book[Author/Lastname = "Cohen"]
-- Books that have an author whose last name is Cohen
//Book[Author/Lastname = "Cohen"]/Author
-- Authors of books that have an author whose last name is Cohen
If a book has two authors, Kelly and Cohen, then the first expression includes just the Author node for Cohen, while the last expression includes the Author nodes for both Kelly and Cohen
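The Kelly and Cohen case can be checked with a small Python sketch. The document is hypothetical, and since ElementTree predicates do not accept nested paths such as Author/Lastname, the book-level filter is written as an explicit loop.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: one book by Kelly and Cohen, one book by Jones.
root = ET.fromstring("""
<CourseBooks>
  <Book>
    <Author><Lastname>Kelly</Lastname></Author>
    <Author><Lastname>Cohen</Lastname></Author>
  </Book>
  <Book>
    <Author><Lastname>Jones</Lastname></Author>
  </Book>
</CourseBooks>
""")

# //Book/Author[Lastname = "Cohen"]: filter the authors themselves
cohen_authors = root.findall(".//Book/Author[Lastname='Cohen']")

# //Book[Author/Lastname = "Cohen"]/Author: filter the books first, then
# take every author of each book that passed the filter
coauthors = [
    a
    for b in root.findall(".//Book")
    if b.find("Author[Lastname='Cohen']") is not None
    for a in b.findall("Author")
]

print([a.findtext("Lastname") for a in cohen_authors])  # ['Cohen']
print([a.findtext("Lastname") for a in coauthors])      # ['Kelly', 'Cohen']
```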
//Author[Lastname="Cohen"]/..
The parents of the authors whose lastname is Cohen (.. means the parent of the context node)
Function-Based Predicates
//Book/Author[2]
//Book/Author[position()=2]
The second author of each book
//Book/Author[last()]
//Book/Author[position()=last()]
The last author of each book
//Book/Author[position()<3]
The first 2 authors of each book
(//Book/Author[1..2] and //Book/Author[(1,2)] are not legal!)
//Book[count(Author)>2]
Books with 3 or more authors
//Book[count(Author)>2]/Author
The authors of those books
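A rough Python analogue of these function-based predicates, on hypothetical data. ElementTree understands last() directly; position() comparisons and count() are not supported there, so they are emulated with slicing and len().

```python
import xml.etree.ElementTree as ET

# Hypothetical data: one book with three authors, one with a single author.
root = ET.fromstring("""
<CourseBooks>
  <Book><Author>A1</Author><Author>A2</Author><Author>A3</Author></Book>
  <Book><Author>B1</Author></Book>
</CourseBooks>
""")
books = root.findall(".//Book")

# //Book/Author[last()]: the last author of each book
last_authors = [a.text for a in root.findall(".//Book/Author[last()]")]

# //Book/Author[position()<3]: the first two authors of each book
first_two = [a.text for b in books for a in b.findall("Author")[:2]]

# //Book[count(Author)>2]/Author: authors of books with 3 or more authors
big_book_authors = [
    a.text
    for b in books
    if len(b.findall("Author")) > 2
    for a in b.findall("Author")
]

print(last_authors)      # ['A3', 'B1']
print(first_two)         # ['A1', 'A2', 'B1']
print(big_book_authors)  # ['A1', 'A2', 'A3']
```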
Aggregate Functions
count(//Book[.//Lastname="Cohen"])
# of books authored by Cohen
//Book[Price = min(//Book/Price)]
Books whose price is equal to the minimum book price
Note: min and avg are XPath 2.0 functions; XPath 1.0 itself provides only count and sum
What is //Book[Price > avg(//Book/Price)]/Price
What query would find all books that have more than one author named Cohen?
Hint: The answer is of the form: //Book[ count( something ) > 1 ]
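The aggregate queries can be emulated in plain Python, since ElementTree has no count, min, or avg. This is a sketch over hypothetical data with invented prices. The last computation corresponds to one possible answer to the question above, //Book[count(Author[Lastname="Cohen"]) > 1], which is offered as an assumption rather than taken from the slides.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical data; titles and prices are invented for illustration.
root = ET.fromstring("""
<CourseBooks>
  <Book><Title>T1</Title><Price>30</Price>
    <Author><Lastname>Cohen</Lastname></Author>
    <Author><Lastname>Cohen</Lastname></Author></Book>
  <Book><Title>T2</Title><Price>20</Price>
    <Author><Lastname>Cohen</Lastname></Author></Book>
  <Book><Title>T3</Title><Price>70</Price>
    <Author><Lastname>Jones</Lastname></Author></Book>
</CourseBooks>
""")
books = root.findall(".//Book")

def price(book):
    return float(book.findtext("Price"))

def cohen_count(book):
    # count(Author[Lastname="Cohen"]) evaluated for one book
    return len(book.findall("Author[Lastname='Cohen']"))

# count(//Book[.//Lastname="Cohen"])
books_by_cohen = sum(
    1 for b in books if any(l.text == "Cohen" for l in b.iter("Lastname"))
)

# //Book[Price = min(//Book/Price)]
lowest = min(price(b) for b in books)
cheapest = [b.findtext("Title") for b in books if price(b) == lowest]

# //Book[Price > avg(//Book/Price)]/Price
average = mean(price(b) for b in books)
above_average = [price(b) for b in books if price(b) > average]

# //Book[count(Author[Lastname="Cohen"]) > 1]
multi_cohen = [b.findtext("Title") for b in books if cohen_count(b) > 1]

print(books_by_cohen, cheapest, above_average, multi_cohen)
```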
//Book[Price>50]/Author | //Book/Author[count(Address)>1]
Authors of books whose price is more than $50, together with book authors who have more than one address
Multiple Predicates
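//Book[(count(Author)>2) and (Price>50)]
//Book[count(Author)>2][Price>50]
Books that have more than 2 authors and that cost more than $50
(//Book[Price > 50])[position() < 11]
The first 10 books that cost more than $50
//Book[Price > 50][position() < 11]
Not quite the same: position() is counted separately among the Book children of each parent, so this selects up to 10 such books per parent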
//Book[Author/Lastname="Cohen"]
Books with some author whose lastname is Cohen
//Book[Author/Lastname!="Cohen"]
Books with some author whose lastname is not Cohen
//Book[not(Author/Lastname="Cohen")]
Books that do not have any author whose lastname is Cohen
What is //Book[not(Author/Lastname!="Cohen")]
Every Author
//Book[not(Author/Lastname!="Cohen")]
Books that do not have some author whose lastname is not Cohen
That is: Books whose authors' lastnames are all Cohen
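The four comparisons differ only in where the negation sits relative to the implicit "some". A Python sketch with any() makes that explicit; the data and the id attribute on Book are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the id attribute is only there to label the results.
root = ET.fromstring("""
<CourseBooks>
  <Book id="b1"><Author><Lastname>Cohen</Lastname></Author>
                <Author><Lastname>Cohen</Lastname></Author></Book>
  <Book id="b2"><Author><Lastname>Cohen</Lastname></Author>
                <Author><Lastname>Kelly</Lastname></Author></Book>
  <Book id="b3"><Author><Lastname>Jones</Lastname></Author></Book>
</CourseBooks>
""")

def lastnames(book):
    return [n.text for n in book.findall("Author/Lastname")]

def select(pred):
    # Ids of the books whose list of author lastnames satisfies pred
    return [b.get("id") for b in root.findall(".//Book") if pred(lastnames(b))]

# //Book[Author/Lastname="Cohen"]: some lastname equals Cohen
some_cohen = select(lambda ns: any(n == "Cohen" for n in ns))
# //Book[Author/Lastname!="Cohen"]: some lastname differs from Cohen
some_not_cohen = select(lambda ns: any(n != "Cohen" for n in ns))
# //Book[not(Author/Lastname="Cohen")]: no lastname equals Cohen
no_cohen = select(lambda ns: not any(n == "Cohen" for n in ns))
# //Book[not(Author/Lastname!="Cohen")]: every lastname equals Cohen
all_cohen = select(lambda ns: not any(n != "Cohen" for n in ns))

print(some_cohen, some_not_cohen, no_cohen, all_cohen)
```

Note that the last form is also true for a book with no authors at all, just as all() over an empty list is true in Python.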
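//Book[Author/Lastname!=("Cohen","Jones")]
Intended meaning: books that have some author whose lastname is neither Cohen nor Jones
Caution: the ("Cohen","Jones") sequence syntax needs XPath 2.0, where != is true if any pair of values differs, so this comparison is true for every lastname. //Book[Author[not(Lastname=("Cohen","Jones"))]] expresses the intended meaning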
What do you think the meaning of this is:
//Book[Publisher!="WroxPress"][Author/Lastname = //Book[Publisher="WroxPress"]/Author/Lastname]
Attributes
//Book[@isbn]
All books with an isbn attribute
//Book[Author/@status="deceased"]
All books with a deceased author
//Book[@*]
All books that have some attribute
//Book[not(@*)]
All books that have no attributes
//Book/@id
The id attributes of all books. WHOA! Those aren't element nodes!
Attribute Nodes
//Book/@isbn
The isbn attribute nodes of all the books
//Book/@*
All attributes of all books
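Attribute tests carry over to ElementTree only partly, so the following is a sketch on hypothetical data. ElementTree supports [@attr] and [@attr='value'] predicates, but it has no attribute nodes: an expression such as //Book/@id has to be emulated by collecting attribute values.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; attribute values are invented for illustration.
root = ET.fromstring("""
<CourseBooks>
  <Book isbn="111" id="b1"><Author status="deceased">Kelly</Author></Book>
  <Book id="b2"><Author>Cohen</Author></Book>
  <Book><Author>Jones</Author></Book>
</CourseBooks>
""")
books = root.findall(".//Book")

# //Book[@isbn]: books with an isbn attribute
with_isbn = root.findall(".//Book[@isbn]")

# //Book[Author/@status="deceased"]: books with a deceased author
deceased = [b for b in books if b.find("Author[@status='deceased']") is not None]

# //Book[not(@*)]: books with no attributes at all
no_attrs = [b for b in books if not b.attrib]

# //Book/@id: the values of the id attributes, since there are no
# attribute node objects to return
ids = [b.get("id") for b in books if "id" in b.attrib]

print(len(with_isbn), len(deceased), len(no_attrs), ids)
```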
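id("here")
The element whose ID-type attribute has the value "here"
Note: id() requires an argument and returns element nodes, not attribute nodes, so //id() is not legal and XPath 1.0 has no direct way to select all ID-type attribute nodes
id("curbook")[self::Book]
The book with an ID of "curbook"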
What is the result of //Book[starts-with(.//@authid, "fr_")]
Attribute Solution
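What is the result of //Book[starts-with(.//@authid, "fr_")]
Assuming that authors have an authid attribute, and that authid attributes start with a prefix which indicates the author's nationality ("fr_" meaning French), this is meant to identify all books with a French author.
Caution: in XPath 1.0, starts-with converts a nodeset to the string-value of its first node in document order, so only the first authid in each book is tested. //Book[.//@authid[starts-with(., "fr_")]] tests every author.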
Text Nodes
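[Figure: node tree for a Book element. Its Title child contains the text node "XML Stuff"; its Description child contains the text node "It's ", an em element containing the text node "way", and the text node " cool"]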
(//Book)[1]/Title/text()
Would correspond to the text node containing: "XML Stuff"
(//Book)[1]/Description
Might correspond to the element node <Description> It's <em>way</em> cool </Description>
(//Book)[1]/Description/node()
This returns a set of the 3 child nodes: a text node, an element node, and another text node
(//Book)[1]/Description/*
Would correspond to the one child element node: <em>way</em>
(//Book)[1]/Description/text()
Would correspond to the two child text nodes for "It's " and " cool"
(//Book)[1]/Description//text()
Would correspond to the three descendant text nodes for "It's ", "way" and " cool"
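The same distinctions can be observed in Python, with one caveat: ElementTree does not model text nodes as objects. As a sketch, the leading text of an element is in its .text, and text following a child element is in that child's .tail.

```python
import xml.etree.ElementTree as ET

# The Book from the slides, written out as a string.
book = ET.fromstring(
    "<Book><Title>XML Stuff</Title>"
    "<Description>It's <em>way</em> cool</Description></Book>"
)
desc = book.find("Description")

# Description/text(): text directly inside Description
child_text = ([desc.text] if desc.text else []) + [
    child.tail for child in desc if child.tail
]

# Description//text(): all descendant text, in document order
all_text = list(desc.itertext())

# Description/*: the child elements only
child_elements = [child.tag for child in desc]

print(child_text)      # ["It's ", ' cool']
print(all_text)        # ["It's ", 'way', ' cool']
print(child_elements)  # ['em']
```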
[Figure: XPath axes, including self, child, descendant, following, following-sibling, preceding, attribute, and namespace]
Uses of XPath
To specify queries
To specify which set of elements need to have unique values, be keys, or contain keyrefs for XML Schema
To identify sets of nodes to be formatted or transformed by XSLT
To identify the parts of documents to be hyperlinked using XPointer
[Figure: BookDB structure. An Authlist of Author elements (name, address, dob, authid) and a Booklist of Book elements (authref, title, publisher)]
Cross Referencing
In BookDB, find the titles of the books whose author is Williams
//Book[@authref = //Author[@name="Williams"]/@authid]/@title
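The cross-reference is effectively a join between authors and books. A Python sketch over a hypothetical BookDB document follows; the element layout is assumed from the attribute names used in the query.

```python
import xml.etree.ElementTree as ET

# Hypothetical BookDB instance; titles and ids are invented.
db = ET.fromstring("""
<BookDB>
  <Authlist>
    <Author authid="a1" name="Williams"/>
    <Author authid="a2" name="Cohen"/>
  </Authlist>
  <Booklist>
    <Book authref="a1" title="XML Stuff"/>
    <Book authref="a2" title="Databases"/>
    <Book authref="a1" title="More XML"/>
  </Booklist>
</BookDB>
""")

# //Author[@name="Williams"]/@authid
williams_ids = {a.get("authid") for a in db.findall(".//Author[@name='Williams']")}

# //Book[@authref = ...]/@title: keep books whose authref matches one of
# those ids, then read the title attribute
titles = [
    b.get("title") for b in db.findall(".//Book") if b.get("authref") in williams_ids
]

print(titles)  # ['XML Stuff', 'More XML']
```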
Early OPath
Filters
Employee[empno = 3417]
The employee whose empno is 3417
Employee[job = 'ANALYST']
The employees who are analysts
"Collecting" Navigation
Employee[dept.dname = 'RESEARCH']
Dept[dname = 'RESEARCH'].empls
Employees in the research department
Employee[job = 'CLERK'].dept
Dept[empls.job = 'CLERK']
Departments that have clerks
Initial versions of OPath (Microsoft's language for querying object models) used the same syntax as XPath. The next iteration replaced the "/" (standard in the web-based world) with "." (standard in the OO world). OPath has since evolved further from XPath.
XPath 2.0
XPath 2.0
All primitive XML Schema datatypes
Sequences of nodes and/or primitive values
Ordered (not always document order) and allow duplicates (use the distinct-values/distinct-nodes functions); XPath 1.0 expressions are always in document order with duplicates removed
Flattened (no sequences of sequences, though a node can represent an arbitrary hierarchy)
No difference between a single node/value and a singleton sequence
Iteration
XPath 1.0 Expressions
( <Price>32.95</Price>, <Price>18.25</Price>, )
Obtaining Values
for $x in //Book return $x/Price/text()
Returns a sequence of the prices as text
Equivalent to //Book/Price/text()
( 32.95, 18.25, )
Conditional Expressions
for $x in //Book
return if (count($x/Author) > 2) then $x/Price * .5 else $x/Price
Note: the resulting sequence has both price nodes and numbers
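The for/if expression reads much like a Python list comprehension with a conditional expression. A sketch on hypothetical data: unlike the XPath 2.0 result, which mixes Price nodes and numbers, this Python version converts every price to a float.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the first book has three authors, the second has one.
root = ET.fromstring("""
<CourseBooks>
  <Book><Price>32.95</Price><Author/><Author/><Author/></Book>
  <Book><Price>18.25</Price><Author/></Book>
</CourseBooks>
""")

def discounted(book):
    # if (count($x/Author) > 2) then $x/Price * .5 else $x/Price
    price = float(book.findtext("Price"))
    return price * 0.5 if len(book.findall("Author")) > 2 else price

# for $x in //Book return ...
prices = [discounted(b) for b in root.findall(".//Book")]

print(prices)
```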
Note: () denotes the empty sequence
The final result is the concatenation of the sequences returned for each iteration of the for loop.
Concatenating the empty sequence has no effect
Quantified Expressions
//Book[some $a in Author satisfies starts-with($a/Lastname,"C")]
Books with some author whose lastname starts with C
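The some ... satisfies quantifier corresponds closely to Python's any(). A minimal sketch on hypothetical data; the id attribute on Book and the helper name some_author_starts_with are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; the id attribute only labels the results.
root = ET.fromstring("""
<CourseBooks>
  <Book id="b1"><Author><Lastname>Cohen</Lastname></Author>
                <Author><Lastname>Kelly</Lastname></Author></Book>
  <Book id="b2"><Author><Lastname>Jones</Lastname></Author></Book>
</CourseBooks>
""")

def some_author_starts_with(book, prefix):
    # some $a in Author satisfies starts-with($a/Lastname, prefix)
    return any(
        (a.findtext("Lastname") or "").startswith(prefix)
        for a in book.findall("Author")
    )

matches = [
    b.get("id") for b in root.findall(".//Book") if some_author_starts_with(b, "C")
]

print(matches)  # ['b1']
```

An every ... satisfies quantifier would map to all() in the same way.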
Note: This will include authors who are not authors of any books. How could this be fixed?
Books with an author whose lastname is Cohen (assuming that authors only appear as children of books). A book that has two authors whose lastnames are both Cohen will appear twice. Duplicate nodes are only eliminated automatically in XPath 1.0 expressions.
XQuery
FLWOR Expressions
for, let: any number of these (at least one), in any order
where: optional
order by: optional
return: required
Variable Binding
for $x in //Book
let $p := $x/Price
return number($p)
( 32.95, 18.25, )
Let is a binding operator. There is no assignment operator.
Where Clause
for $x in //Book
let $p := $x/Price
where $p > 5.00
return $p
Equivalent to //Book[Price > 5.00]/Price
XQuery Problem
Suppose that Author is a subelement of Book, and that Author has a Name element, as well as an authid attribute which uniquely identifies the author. What's the clearest XQuery expression which returns the string values of the names of authors who have authored more than one book?
XQuery Solution
for $a in //Author
let $abooks := //Book[Author/@authid=$a/@authid]
where count($abooks) > 1
return string($a/Name)
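A Python sketch of the same query, on hypothetical data. One difference is worth noting: the XQuery loop visits every Author element, so an author of two books is returned once per occurrence, whereas this version groups by authid and reports each name once.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical data: Williams (a1) wrote two books, Cohen (a2) one.
root = ET.fromstring("""
<CourseBooks>
  <Book><Author authid="a1"><Name>Williams</Name></Author></Book>
  <Book><Author authid="a1"><Name>Williams</Name></Author>
        <Author authid="a2"><Name>Cohen</Name></Author></Book>
</CourseBooks>
""")

# Number of books each authid appears in. The set per book guards against
# an author being listed twice within the same book.
book_counts = Counter(
    authid
    for b in root.findall(".//Book")
    for authid in {a.get("authid") for a in b.findall("Author")}
)

# Map each authid to the author's name (first occurrence wins)
names = {}
for a in root.findall(".//Author"):
    names.setdefault(a.get("authid"), a.findtext("Name"))

# where count($abooks) > 1 return string($a/Name)
prolific = [names[authid] for authid, n in book_counts.items() if n > 1]

print(prolific)  # ['Williams']
```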
Ordering
for $b in //Book[Price > 100]
order by $b/Author[1]/Name, $b/Price descending
return string($b/Title)
Returns a sequence of the string values of the titles of all the books whose price is greater than $100
Order them
First, by the name of the first author
Secondly, by price, highest price first
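The two-level ordering can be sketched in Python with a tuple sort key, negating the price to get descending order within each author name. The data is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; T4 costs less than $100 and is filtered out.
root = ET.fromstring("""
<CourseBooks>
  <Book><Title>T1</Title><Price>120</Price><Author><Name>Cohen</Name></Author></Book>
  <Book><Title>T2</Title><Price>150</Price><Author><Name>Cohen</Name></Author></Book>
  <Book><Title>T3</Title><Price>110</Price><Author><Name>Adams</Name></Author></Book>
  <Book><Title>T4</Title><Price>90</Price><Author><Name>Adams</Name></Author></Book>
</CourseBooks>
""")

def price(book):
    return float(book.findtext("Price"))

# for $b in //Book[Price > 100]
expensive = [b for b in root.findall(".//Book") if price(b) > 100]

# order by $b/Author[1]/Name, $b/Price descending
ordered = sorted(expensive, key=lambda b: (b.findtext("Author[1]/Name"), -price(b)))

# return string($b/Title)
titles = [b.findtext("Title") for b in ordered]

print(titles)  # ['T3', 'T2', 'T1']
```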
User-Defined Functions
declare function depth($e as node) as xs:integer {
XQueryX
The XQuery syntax is not XML-based
XQueryX has the same semantics as XQuery, but it is XML-based.
Too large & ugly to include
See http://w3.org/TR/xqueryx
Element Construction
<somenum>20</somenum>
Result: <somenum>20</somenum>
This doesn't generate text
It generates a somenum element
Expression Substitution
let $s := 20 return <Somenum>$s</Somenum>
Result: <Somenum>$s</Somenum>
let $s := 20 return <Somenum>{ $s }</Somenum>
Result: <Somenum>20</Somenum>
Nested Substitution
<Result> { for $j in (3, 1, 2) return <Val>{ $j }</Val> } </Result>
Result:
<Result> <Val>3</Val> <Val>1</Val> <Val>2</Val> </Result>
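For comparison, the same result element can be built programmatically in Python. This sketch uses ElementTree's Element and SubElement constructors in place of XQuery's element constructors and enclosed expressions.

```python
import xml.etree.ElementTree as ET

# <Result> { ... } </Result>
result = ET.Element("Result")

# for $j in (3, 1, 2) return <Val>{ $j }</Val>
for j in (3, 1, 2):
    val = ET.SubElement(result, "Val")
    val.text = str(j)  # plays the role of the enclosed expression { $j }

xml_text = ET.tostring(result, encoding="unicode")
print(xml_text)  # <Result><Val>3</Val><Val>1</Val><Val>2</Val></Result>
```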
Calculated Selection
<Names> {
  let $nms := distinct-values( //Author/Name[starts-with(.,"John")] )
  for $j in (3, 1, 3)
  return <Name>{ $nms[$j] }</Name>
} </Names>
Result:
<Names> <Name>John YaYa</Name> <Name>John Doe</Name> <Name>John YaYa</Name> </Names>