XML Path Language (XPath)
Version 1.0
W3C Recommendation 16 November 1999 (Status
updated October 2016)
This version:
http://www.w3.org/TR/1999/REC-xpath-19991116
(available in XML or HTML)
1 Introduction
XPath is the result of an effort to provide a common syntax and semantics for
functionality shared between XSL Transformations [XSLT] and
XPointer [XPointer]. The primary purpose of XPath is to address parts of an
XML [XML] document. In support of this primary purpose, it also provides
basic facilities for manipulation of strings, numbers and booleans. XPath uses
a compact, non-XML syntax to facilitate use of XPath within URIs and XML
attribute values. XPath operates on the abstract, logical structure of an XML
document, rather than its surface syntax. XPath gets its name from its use of
a path notation as in URLs for navigating through the hierarchical structure of
an XML document.
In addition to its use for addressing, XPath is also designed so that it has a
natural subset that can be used for matching (testing whether or not a node
matches a pattern); this use of XPath is described in XSLT.
XPath models an XML document as a tree of nodes. There are different types
of nodes, including element nodes, attribute nodes and text nodes. XPath
defines a way to compute a string-value for each type of node. Some types of
nodes also have names. XPath fully supports XML Namespaces [XML
Names]. Thus, the name of a node is modeled as a pair consisting of a local
part and a possibly null namespace URI; this is called an expanded-name.
The data model is described in detail in [5 Data Model].
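As a rough illustration of expanded-names, the sketch below uses Python's
standard xml.etree.ElementTree, which stores an element name as one string of
the form {namespace-URI}local-part. The helper expanded_name and the namespace
URI urn:example:doc are assumptions made for this example; they are not defined
by XPath.

```python
import xml.etree.ElementTree as ET

def expanded_name(tag):
    """Split an ElementTree tag into (local part, namespace URI or None)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return (local, uri)
    return (tag, None)

# Two different prefixes bound to the same (hypothetical) namespace URI,
# plus one element that is in no namespace.
doc = ET.fromstring(
    '<root xmlns:a="urn:example:doc" xmlns:b="urn:example:doc">'
    "<a:para/><b:para/><para/></root>"
)
names = [expanded_name(child.tag) for child in doc]
print(names)
```

The first two names compare equal because only the local part and the
namespace URI take part in the comparison; the prefix does not.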
The context position is always less than or equal to the context size.
In the following grammar, the non-terminals QName and NCName are defined
in [XML Names], and S is defined in [XML]. The grammar uses the same
EBNF notation as [XML] (except that grammar symbols always have initial
capital letters).
Expressions are parsed by first dividing the character string to be parsed into
tokens and then parsing the resulting sequence of tokens. Whitespace can be
freely used between tokens. The tokenization process is described in [3.7
Lexical Structure].
2 Location Paths
Although location paths are not the most general grammatical construct in the
language (a LocationPath is a special case of an Expr), they are the most
important construct and will therefore be described first.
Every location path can be expressed using a straightforward but rather
verbose syntax. There are also a number of syntactic abbreviations that allow
common cases to be expressed concisely. This section will explain the
semantics of location paths using the unabbreviated syntax. The abbreviated
syntax will then be explained by showing how it expands into the
unabbreviated syntax (see [2.5 Abbreviated Syntax]).
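Here are some examples of location paths using the unabbreviated syntax:
child::para selects the para element children of the context node
child::* selects all element children of the context node
child::text() selects all text node children of the context node
attribute::href selects the href attribute of the context node
descendant::para selects the para element descendants of the context node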
There are two kinds of location path: relative location paths and absolute
location paths.
Location Paths
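[1] LocationPath ::= RelativeLocationPath
| AbsoluteLocationPath
[2] AbsoluteLocationPath ::= '/' RelativeLocationPath?
| AbbreviatedAbsoluteLocationPath
[3] RelativeLocationPath ::= Step
| RelativeLocationPath '/' Step
| AbbreviatedRelativeLocationPath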
The syntax for a location step is the axis name and node test separated by a
double colon, followed by zero or more expressions each in square brackets.
For example, in child::para[position()=1], child is the name of the
axis, para is the node test and [position()=1] is a predicate.
The node-set selected by the location step is the node-set that results from
generating an initial node-set from the axis and node-test, and then filtering
that node-set by each of the predicates in turn.
The initial node-set consists of the nodes having the relationship to the
context node specified by the axis, and having the node type and expanded-
name specified by the node test. For example, a location
step descendant::para selects the para element descendants of the context
node: descendant specifies that each node in the initial node-set must be a
descendant of the context; para specifies that each node in the initial node-set
must be an element named para. The available axes are described in [2.2
Axes]. The available node tests are described in [2.3 Node Tests]. The
meaning of some node tests is dependent on the axis.
The initial node-set is filtered by the first predicate to generate a new node-
set; this new node-set is then filtered using the second predicate, and so on.
The final node-set is the node-set selected by the location step. The axis
affects how the expression in each predicate is evaluated and so the
semantics of a predicate is defined with respect to an axis. See [2.4
Predicates].
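The two-phase evaluation of a location step described above can be sketched
roughly as follows. This is a minimal illustration, not an XPath
implementation: it covers only the child and descendant axes over element
nodes, the names axis_nodes and evaluate_step are invented for the example, and
each predicate is modeled as a Python callable that receives the node, its
context position and the context size.

```python
import xml.etree.ElementTree as ET

def axis_nodes(context, axis):
    """Return the element nodes on the given axis, in document order."""
    if axis == "child":
        return list(context)
    if axis == "descendant":
        return [n for n in context.iter() if n is not context]
    raise ValueError("axis not covered by this sketch: " + axis)

def evaluate_step(context, axis, name, predicates=()):
    # Initial node-set: axis relationship plus node test (element name).
    nodes = [n for n in axis_nodes(context, axis) if n.tag == name]
    # Filter by each predicate in turn; positions are renumbered each time.
    for pred in predicates:
        size = len(nodes)
        nodes = [n for pos, n in enumerate(nodes, 1) if pred(n, pos, size)]
    return nodes

doc = ET.fromstring(
    '<doc><para id="p1"/><note/><para id="p2" type="warning"/>'
    '<para id="p3" type="warning"/></doc>'
)
# child::para[position()=1]
first = evaluate_step(doc, "child", "para", [lambda n, pos, size: pos == 1])
# child::para[attribute::type="warning"][position()=last()]
last_warning = evaluate_step(
    doc, "child", "para",
    [lambda n, pos, size: n.get("type") == "warning",
     lambda n, pos, size: pos == size],
)
print([n.get("id") for n in first], [n.get("id") for n in last_warning])
```

Because the second predicate sees only the nodes that survived the first one,
last() in the second example refers to the last warning paragraph, not to the
last para child.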
Location Steps
[4] Step ::= AxisSpecifier NodeTest Predicate*
| AbbreviatedStep
[5] AxisSpecifier ::= AxisName '::'
| AbbreviatedAxisSpecifier
2.2 Axes
NOTE: The ancestor, descendant, following, preceding and self axes partition a
document (ignoring attribute and namespace nodes): they do not overlap and
together they contain all the nodes in the document.
Axes
[6] AxisName ::= 'ancestor'
| 'ancestor-or-self'
| 'attribute'
| 'child'
| 'descendant'
| 'descendant-or-self'
| 'following'
| 'following-sibling'
| 'namespace'
| 'parent'
| 'preceding'
| 'preceding-sibling'
| 'self'
Every axis has a principal node type. If an axis can contain elements, then
the principal node type is element; otherwise, it is the type of the nodes that
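the axis can contain. Thus, the principal node type is attribute for the
attribute axis, namespace for the namespace axis, and element for all other
axes.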
A node test that is a QName is true if and only if the type of the node (see [5
Data Model]) is the principal node type and has an expanded-name equal to
the expanded-name specified by the QName. For
example, child::para selects the para element children of the context node; if
the context node has no para children, it will select an empty set of
nodes. attribute::href selects the href attribute of the context node; if the
context node has no href attribute, it will select an empty set of nodes.
A node test * is true for any node of the principal node type. For
example, child::* will select all element children of the context node,
and attribute::* will select all attributes of the context node.
A node test can have the form NCName:*. In this case, the prefix is expanded
in the same way as with a QName, using the context namespace
declarations. It is an error if there is no namespace declaration for the prefix in
the expression context. The node test will be true for any node of the principal
type whose expanded-name has the namespace URI to which the prefix
expands, regardless of the local part of the name.
The node test text() is true for any text node. For example, child::text() will
select the text node children of the context node. Similarly, the node
test comment() is true for any comment node, and the node test processing-
instruction() is true for any processing instruction. The processing-
instruction() test may have an argument that is Literal; in this case, it is true
for any processing instruction that has a name equal to the value of
the Literal.
A node test node() is true for any node of any type whatsoever.
2.4 Predicates
An axis is either a forward axis or a reverse axis. An axis that only ever
contains the context node or nodes that are after the context node
in document order is a forward axis. An axis that only ever contains the
context node or nodes that are before the context node in document order is a
reverse axis. Thus, the ancestor, ancestor-or-self, preceding, and preceding-
sibling axes are reverse axes; all other axes are forward axes. Since the self
axis always contains at most one node, it makes no difference whether it is a
forward or reverse axis. The proximity position of a member of a node-set
with respect to an axis is defined to be the position of the node in the node-set
ordered in document order if the axis is a forward axis and ordered in reverse
document order if the axis is a reverse axis. The first position is 1.
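The effect of a reverse axis on proximity position can be illustrated with a
small sketch. It assumes Python's xml.etree.ElementTree as the tree
representation; the function names are invented for the example, and the root
node (which XPath also counts as an ancestor) is left out because ElementTree
has no separate object for it.

```python
import xml.etree.ElementTree as ET

def ancestors_in_document_order(root, node):
    """Return the element ancestors of node, outermost first."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node)
    chain.reverse()  # nearest-first becomes document order
    return chain

def proximity_positions(nodes_in_document_order, reverse_axis):
    """Map each proximity position (starting at 1) to its node."""
    ordered = list(nodes_in_document_order)
    if reverse_axis:
        ordered.reverse()
    return {pos: n for pos, n in enumerate(ordered, 1)}

doc = ET.fromstring("<book><chapter><section><para/></section></chapter></book>")
para = doc.find("chapter/section/para")
# ancestor is a reverse axis, so position 1 is the nearest ancestor.
positions = proximity_positions(ancestors_in_document_order(doc, para), True)
print(positions[1].tag)
```

In this sketch ancestor::*[1] corresponds to positions[1], the section element,
even though book comes first in document order.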
Predicates
[8] Predicate ::= '[' PredicateExpr ']'
The most important abbreviation is that child:: can be omitted from a location
step. In effect, child is the default axis. For example, a location
path div/para is short for child::div/child::para.
NOTE: The location path //para[1] does not mean the same as the location
path /descendant::para[1]. The latter selects the first descendant para element;
the former selects all descendant para elements that are the first para children
of their parents.
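The difference described in this note can be observed with Python's
xml.etree.ElementTree, which implements only a limited subset of XPath. Since
that subset has no descendant:: axis, the sketch uses Element.iter() to stand
in for /descendant::para[1]; the sample document is an assumption made for the
example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<doc><sec><para id="a"/><para id="b"/></sec>'
    '<sec><para id="c"/></sec></doc>'
)
# //para[1]: every para that is the first para child of its parent.
first_children = [p.get("id") for p in doc.findall(".//para[1]")]
# /descendant::para[1]: the single first para in document order.
first_descendant = next(doc.iter("para")).get("id")
print(first_children, first_descendant)
```

The first expression yields one para per parent, while the second yields at
most one node for the whole document.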
For example, .//para is short for
self::node()/descendant-or-self::node()/child::para
and so will select all para descendant elements of the context node.
Abbreviations
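[10] AbbreviatedAbsoluteLocationPath ::= '//' RelativeLocationPath
[11] AbbreviatedRelativeLocationPath ::= RelativeLocationPath '//' Step
[12] AbbreviatedStep ::= '.'
| '..'
[13] AbbreviatedAxisSpecifier ::= '@'?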
3 Expressions
3.1 Basics
| Literal
| Number
| FunctionCall
3.3 Node-sets
A location path can be used as an expression. The expression returns the set
of nodes selected by the path.
The | operator computes the union of its operands, which must be node-sets.
Predicates are used to filter expressions in the same way that they are used in
location paths. It is an error if the expression to be filtered does not evaluate
to a node-set. The Predicate filters the node-set with respect to the child axis.
| FilterExpr
| FilterExpr Predicate
3.4 Booleans
An object of type boolean can have one of two values, true and false.
If both objects to be compared are node-sets, then the comparison will be true
if and only if there is a node in the first node-set and a node in the second
node-set such that the result of performing the comparison on the string-
values of the two nodes is true. If one object to be compared is a node-set
and the other is a number, then the comparison will be true if and only if there
is a node in the node-set such that the result of performing the comparison on
the number to be compared and on the result of converting the string-value of
that node to a number using the number function is true. If one object to be
compared is a node-set and the other is a string, then the comparison will be
true if and only if there is a node in the node-set such that the result of
performing the comparison on the string-value of the node and the other string
is true. If one object to be compared is a node-set and the other is a boolean,
then the comparison will be true if and only if the result of performing the
comparison on the boolean and on the result of converting the node-set to a
boolean using the boolean function is true.
NOTE: If $x is bound to a node-set, then $x="foo" does not mean the same
as not($x!="foo"): the former is true if and only if some node in $x has the
string-value foo; the latter is true if and only if all nodes in $x have the string-
value foo.
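The existential reading of node-set comparisons can be sketched as below. A
node-set is modeled simply as a list of string-values, and the function names
are invented for the illustration.

```python
def nodeset_eq_string(string_values, s):
    """$x = s : true if some node's string-value equals s."""
    return any(v == s for v in string_values)

def nodeset_ne_string(string_values, s):
    """$x != s : true if some node's string-value differs from s."""
    return any(v != s for v in string_values)

x = ["foo", "bar"]  # string-values of the nodes bound to $x
some_foo = nodeset_eq_string(x, "foo")         # $x="foo"
all_foo = not nodeset_ne_string(x, "foo")      # not($x!="foo")
# For an empty node-set the two forms also disagree.
empty_some = nodeset_eq_string([], "foo")
empty_all = not nodeset_ne_string([], "foo")
print(some_foo, all_foo, empty_some, empty_all)
```

For an empty node-set, $x="foo" is false because no node exists to satisfy it,
while not($x!="foo") is true because no node exists to contradict it.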
NOTE: The effect of the above grammar is that the order of precedence is
(lowest precedence first):
or
and
=, !=
<=, <, >=, >
and the operators are all left associative. For example, 3 > 2 > 1 is equivalent
to (3 > 2) > 1, which evaluates to false.
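A short sketch of why 3 > 2 > 1 is false, assuming the XPath rule that the
operands of a relational operator are converted to numbers, with true becoming
1 and false becoming 0. The helper names are invented for the example. Python's
own chained comparison is shown for contrast, since it follows a different
rule.

```python
def to_number(value):
    """Convert a boolean or number to a number (True -> 1.0, False -> 0.0)."""
    return float(value)

def xpath_gt(left, right):
    """Relational > : both operands are converted to numbers first."""
    return to_number(left) > to_number(right)

left_assoc = xpath_gt(xpath_gt(3, 2), 1)  # (3 > 2) > 1, i.e. 1 > 1
python_chain = 3 > 2 > 1                  # Python reads this as 3 > 2 and 2 > 1
print(left_assoc, python_chain)
```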
3.5 Numbers
The mod operator returns the remainder from a truncating division. For
example,
5 mod 2 returns 1
5 mod -2 returns 1
-5 mod 2 returns -1
-5 mod -2 returns -1
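These results can be reproduced in Python with math.fmod, which also takes the
sign of the first operand. Python's % operator is shown for contrast because it
uses floored division and gives different signs. The wrapper name xpath_mod is
an assumption for the example; note also that math.fmod raises ValueError for a
zero divisor, so the sketch does not model that case.

```python
import math

def xpath_mod(a, b):
    """Remainder of a truncating division; the result takes the sign of a."""
    return math.fmod(a, b)

results = [xpath_mod(5, 2), xpath_mod(5, -2), xpath_mod(-5, 2), xpath_mod(-5, -2)]
python_percent = [5 % 2, 5 % -2, -5 % 2, -5 % -2]
print(results)
print(python_percent)
```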
| '-' UnaryExpr
3.6 Strings
The following special tokenization rules must be applied in the order specified
to disambiguate the ExprToken grammar:
| NameTest
| NodeType
| Operator
| FunctionName
| AxisName
| Literal
| Number
| VariableReference
| '.' Digits
| MultiplyOperator
| '/' | '//' | '|' | '+' | '-' | '=' | '!=' | '<' | '<=' | '>' | '>='
| QName
| 'text'
| 'processing-instruction'
| 'node'
The last function returns a number equal to the context size from the
expression evaluation context.
The position function returns a number equal to the context position from the
expression evaluation context.
The count function returns the number of nodes in the argument node-set.
The id function selects elements by their unique ID (see [5.2.1 Unique IDs]).
When the argument to id is of type node-set, then the result is the union of the
result of applying id to the string-value of each of the nodes in the argument
node-set. When the argument to id is of any other type, the argument is
converted to a string as if by a call to the string function; the string is split into
a whitespace-separated list of tokens (whitespace is any sequence of
characters matching the production S); the result is a node-set containing the
elements in the same document as the context node that have a unique ID
equal to any of the tokens in the list.
The local-name function returns the local part of the expanded-name of the
node in the argument node-set that is first in document order. If the argument
node-set is empty or the first node has no expanded-name, an empty string is
returned. If the argument is omitted, it defaults to a node-set with the context
node as its only member.
NOTE: The string returned by the name function will be the same as the
string returned by the local-name function except for element nodes and
attribute nodes.
4.2 String Functions
If the argument is omitted, it defaults to a node-set with the context node as its
only member.
NOTE: The string function is not intended for converting numbers into strings
for presentation to users. The format-number function and xsl:number element
in [XSLT] provide this functionality.
The contains function returns true if the first argument string contains the
second argument string, and otherwise returns false.
The substring-after function returns the substring of the first argument string
that follows the first occurrence of the second argument string in the first
argument string, or the empty string if the first argument string does not
contain the second argument string. For example, substring-
after("1999/04/01","/") returns 04/01, and substring-
after("1999/04/01","19") returns 99/04/01.
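A minimal Python equivalent of the behavior just described, with the function
name chosen for the example:

```python
def substring_after(s, sep):
    """Return the part of s after the first occurrence of sep, else ''."""
    index = s.find(sep)
    if index < 0:
        return ""
    return s[index + len(sep):]

print(substring_after("1999/04/01", "/"))
print(substring_after("1999/04/01", "19"))
```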
The substring function returns the substring of the first argument starting at
the position specified in the second argument with length specified in the third
argument. For example, substring("12345",2,3) returns "234". If the third
argument is not specified, it returns the substring starting at the position
specified in the second argument and continuing to the end of the string. For
example, substring("12345",2) returns "2345".
More precisely, each character in the string (see [3.6 Strings]) is considered
to have a numeric position: the position of the first character is 1, the position
of the second character is 2 and so on.
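The position-based definition can be sketched as follows. This is a simplified
illustration that assumes integer arguments; rounding of non-integer arguments
and the NaN and infinity cases are not modeled, and the function name is
invented for the example.

```python
def xpath_substring(s, start, length=None):
    """Keep characters whose 1-based position p satisfies
    start <= p < start + length (or start <= p when length is omitted)."""
    return "".join(
        ch for p, ch in enumerate(s, 1)
        if p >= start and (length is None or p < start + length)
    )

print(xpath_substring("12345", 2, 3))
print(xpath_substring("12345", 2))
print(xpath_substring("12345", 0, 3))
```

Defining the result by position rather than by slicing explains the last call:
with a start of 0 and a length of 3, only positions 1 and 2 fall inside the
range.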
The string-length function returns the number of characters in the string (see [3.6
Strings]). If the argument is omitted, it defaults to the context node converted
to a string, in other words the string-value of the context node.
The translate function returns the first argument string with occurrences of
characters in the second argument string replaced by the character at the
corresponding position in the third argument string. For
example, translate("bar","abc","ABC") returns the string BAr. If there is a
character in the second argument string with no character at a corresponding
position in the third argument string (because the second argument string is
longer than the third argument string), then occurrences of that character in
the first argument string are removed. For example, translate("--
aaa--","abc-","ABC") returns "AAA". If a character occurs more than once in the
second argument string, then the first occurrence determines the replacement
character. If the third argument string is longer than the second argument
string, then excess characters are ignored.
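The replacement and removal rules for translate can be sketched in Python as
below. The function name is an assumption for the example; None is used
internally to mark characters that have no counterpart in the third argument
and are therefore removed.

```python
def xpath_translate(s, src, dst):
    """Replace or remove characters of s according to src and dst."""
    table = {}
    for i, ch in enumerate(src):
        if ch in table:
            continue  # the first occurrence determines the replacement
        # None marks a character to remove (no counterpart in dst).
        table[ch] = dst[i] if i < len(dst) else None
    out = []
    for ch in s:
        if ch in table:
            if table[ch] is not None:
                out.append(table[ch])
        else:
            out.append(ch)  # characters not listed in src are kept
    return "".join(out)

print(xpath_translate("bar", "abc", "ABC"))
print(xpath_translate("--aaa--", "abc-", "ABC"))
```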
NOTE: The translate function is not a sufficient solution for case conversion
in all languages. A future version of XPath may provide additional functions for
case conversion.
4.3 Boolean Functions
The not function returns true if its argument is false, and false otherwise.
The lang function returns true or false depending on whether the language of
the context node as specified by xml:lang attributes is the same as or is a
sublanguage of the language specified by the argument string. The language
of the context node is determined by the value of the xml:lang attribute on the
context node, or, if the context node has no xml:lang attribute, by the value of
the xml:lang attribute on the nearest ancestor of the context node that has
an xml:lang attribute. If there is no such attribute, then lang returns false. If
there is such an attribute, then lang returns true if the attribute value is equal
to the argument ignoring case, or if there is some suffix starting with - such
that the attribute value is equal to the argument ignoring that suffix of the
attribute value and ignoring case. For example, lang("en") would return true if
the context node is any of these five elements:
<para xml:lang="en"/>
<div xml:lang="en"><para/></div>
<para xml:lang="EN"/>
<para xml:lang="en-us"/>
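The matching rule can be sketched as a small Python function. It assumes the
caller has already located the nearest xml:lang value (or None when there is
none); the function name is invented for the example.

```python
def lang_matches(xml_lang, arg):
    """xml_lang is the nearest xml:lang value, or None if there is none."""
    if xml_lang is None:
        return False
    value, wanted = xml_lang.lower(), arg.lower()
    # Equal ignoring case, or equal once a suffix starting with '-' is dropped.
    return value == wanted or value.startswith(wanted + "-")

print([lang_matches(v, "en") for v in ("en", "EN", "en-us", "eng", None)])
```

The value eng does not match en because the ignored suffix must start with a
hyphen.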
4.4 Number Functions
If the argument is omitted, it defaults to a node-set with the context node as its
only member.
NOTE: The number function should not be used for conversion of numeric
data occurring in an element in an XML document unless the element is of a
type that represents numeric data in a language-neutral format (which would
typically be transformed into a language-specific format for presentation to a
user). In addition, the number function cannot be used unless the language-
neutral format used by the element is consistent with the XPath syntax for
a Number.
The sum function returns the sum, for each node in the argument node-set, of
the result of converting the string-values of the node to a number.
Function: number floor(number)
The floor function returns the largest (closest to positive infinity) number that
is not greater than the argument and that is an integer.
The ceiling function returns the smallest (closest to negative infinity) number
that is not less than the argument and that is an integer.
The round function returns the number that is closest to the argument and
that is an integer. If there are two such numbers, then the one that is closest
to positive infinity is returned. If the argument is NaN, then NaN is returned. If
the argument is positive infinity, then positive infinity is returned. If the
argument is negative infinity, then negative infinity is returned. If the argument
is positive zero, then positive zero is returned. If the argument is negative
zero, then negative zero is returned. If the argument is less than zero, but
greater than or equal to -0.5, then negative zero is returned.
NOTE: For these last two cases, the result of calling the round function is not
the same as the result of adding 0.5 and then calling the floor function.
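A sketch of the rounding rules in Python, written to keep the sign of zero as
described above. The function name is an assumption for the example;
math.copysign is used only to make the sign of a zero result visible.

```python
import math

def xpath_round(x):
    """Round to the nearest integer; ties go toward positive infinity."""
    if math.isnan(x) or math.isinf(x) or x == 0:
        return x  # NaN, infinities and both zeros are returned unchanged
    if -0.5 <= x < 0:
        return -0.0  # negative zero, unlike floor(x + 0.5)
    lower = math.floor(x)
    return float(lower + 1) if x - lower >= 0.5 else float(lower)

print(xpath_round(2.5), xpath_round(-2.5), xpath_round(-0.2))
print(math.copysign(1.0, xpath_round(-0.2)))
```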
5 Data Model
XPath operates on an XML document as a tree. This section describes how
XPath models an XML document as a tree. This model is conceptual only and
does not mandate any particular implementation. The relationship of this
model to the XML Information Set [XML Infoset] is described in [B XML
Information Set Mapping].
root nodes
element nodes
text nodes
attribute nodes
namespace nodes
processing instruction nodes
comment nodes
NOTE: For element nodes and root nodes, the string-value of a node is not
the same as the string returned by the DOM nodeValue method (see [DOM]).
Root nodes and element nodes have an ordered list of child nodes. Nodes
never share children: if one node is not the same node as another node, then
none of the children of the one node will be the same node as any of the
children of another node. Every node other than the root node has exactly
one parent, which is either an element node or the root node. A root node or
an element node is the parent of each of its child nodes. The descendants of
a node are the children of the node and the descendants of the children of the
node.
The root node is the root of the tree. A root node does not occur except as the
root of the tree. The element node for the document element is a child of the
root node. The root node also has as children processing instruction and
comment nodes for processing instructions and comments that occur in the
prolog and after the end of the document element.
NOTE: In the notation of Appendix A.3 of [XML Names], the local part of the
expanded-name corresponds to the type attribute of the ExpEType element; the
namespace URI of the expanded-name corresponds to the ns attribute of
the ExpEType element, and is null if the ns attribute of the ExpEType element is
omitted.
The children of an element node are the element nodes, comment nodes,
processing instruction nodes and text nodes for its content. Entity references
to both internal and external entities are expanded. Character references are
resolved.
NOTE: If a document does not have a DTD, then no element in the document
will have a unique ID.
5.3 Attribute Nodes
Each element node has an associated set of attribute nodes; the element is
the parent of each of these attribute nodes; however, an attribute node is not
a child of its parent element.
NOTE: This is different from the DOM, which does not treat the element
bearing an attribute as the parent of the attribute (see [DOM]).
Elements never share attribute nodes: if one element node is not the same
node as another element node, then none of the attribute nodes of the one
element node will be the same node as the attribute nodes of another element
node.
NOTE: The = operator tests whether two nodes have the same
value, not whether they are the same node. Thus attributes of two different
elements may compare as equal using =, even though they are not the same
node.
Some attributes, such as xml:lang and xml:space, have the semantics that they
apply to all elements that are descendants of the element bearing the
attribute, unless overridden with an instance of the same attribute on another
descendant element. However, this does not affect where attribute nodes
appear in the tree: an element has attribute nodes only for attributes that were
explicitly specified in the start-tag or empty-element tag of that element or that
were explicitly declared in the DTD with a default value.
An attribute node has an expanded-name and a string-value. The expanded-
name is computed by expanding the QName specified in the tag in the XML
document in accordance with the XML Namespaces Recommendation [XML
Names]. The namespace URI of the attribute's name will be null if
the QName of the attribute does not have a prefix.
NOTE: In the notation of Appendix A.3 of [XML Names], the local part of the
expanded-name corresponds to the name attribute of the ExpAName element; the
namespace URI of the expanded-name corresponds to the ns attribute of
the ExpAName element, and is null if the ns attribute of the ExpAName element is
omitted.
Each element has an associated set of namespace nodes, one for each
distinct namespace prefix that is in scope for the element (including
the xml prefix, which is implicitly declared by the XML Namespaces
Recommendation [XML Names]) and one for the default namespace if one is
in scope for the element. The element is the parent of each of these
namespace nodes; however, a namespace node is not a child of its parent
element. Elements never share namespace nodes: if one element node is not
the same node as another element node, then none of the namespace nodes
of the one element node will be the same node as the namespace nodes of
another element node. This means that an element will have a namespace
node:
for every attribute on the element whose name starts with xmlns:;
for every attribute on an ancestor element whose name
starts xmlns: unless the element itself or a nearer ancestor redeclares
the prefix;
for an xmlns attribute, if the element or some ancestor has
an xmlns attribute, and the value of the xmlns attribute for the nearest
such element is non-empty
There is a comment node for every comment, except for any comment that
occurs within the document type declaration.
The string-value of a comment node is the content of the comment not including the
opening <!-- or the closing -->.
Each character within a CDATA section is treated as character data. Thus,
<![CDATA[<]]> in the source document will be treated the same as &lt;. Both will
result in a single < character in a text node in the tree. Thus, a CDATA section
is treated as if the <![CDATA[ and ]]> were removed and every occurrence
of < and & were replaced by &lt; and &amp; respectively.
NOTE: When a text node that contains a < character is written out as XML,
the < character must be escaped by, for example, using &lt;, or including it in
a CDATA section.
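The equivalence of a CDATA section and an escaped character can be checked with
Python's xml.etree.ElementTree, used here only as a convenient parser; the
element name a is an assumption for the example.

```python
import xml.etree.ElementTree as ET

from_cdata = ET.fromstring("<a><![CDATA[<]]></a>")
from_entity = ET.fromstring("<a>&lt;</a>")
# Both sources produce the same single-character text.
same_text = from_cdata.text == from_entity.text == "<"
# Writing the tree out escapes the character again.
written = ET.tostring(from_cdata, encoding="unicode")
print(same_text, written)
```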
6 Conformance
XPath is intended primarily as a component that can be used by other
specifications. Therefore, XPath relies on specifications that use XPath (such
as [XPointer] and [XSLT]) to specify criteria for conformance of
implementations of XPath and does not define any conformance criteria for
independent implementations of XPath.
A References
A.1 Normative References
IEEE 754
Institute of Electrical and Electronics Engineers. IEEE Standard for
Binary Floating-Point Arithmetic. ANSI/IEEE Std 754-1985.
RFC2396
T. Berners-Lee, R. Fielding, and L. Masinter. Uniform Resource
Identifiers (URI): Generic Syntax. IETF RFC 2396.
See http://www.ietf.org/rfc/rfc2396.txt.
XML
World Wide Web Consortium. Extensible Markup Language (XML)
1.0. W3C Recommendation. See http://www.w3.org/TR/1998/REC-xml-19980210
XML Names
World Wide Web Consortium. Namespaces in XML. W3C
Recommendation. See http://www.w3.org/TR/REC-xml-names
A.2 Other References
Character Model
World Wide Web Consortium. Character Model for the World Wide
Web. W3C Working Draft. See http://www.w3.org/TR/WD-charmod
DOM
World Wide Web Consortium. Document Object Model (DOM) Level 1
Specification. W3C Recommendation. See
http://www.w3.org/TR/REC-DOM-Level-1
JLS
J. Gosling, B. Joy, and G. Steele. The Java Language Specification.
See http://java.sun.com/docs/books/jls/index.html.
ISO/IEC 10646
ISO (International Organization for Standardization). ISO/IEC 10646-1:1993,
Information technology -- Universal Multiple-Octet Coded
Character Set (UCS) -- Part 1: Architecture and Basic Multilingual
Plane. International Standard. See http://www.iso.ch/cate/d18741.html.
TEI
C.M. Sperberg-McQueen, L. Burnard Guidelines for Electronic Text
Encoding and Interchange. See http://etext.virginia.edu/TEI.html.
Unicode
Unicode Consortium. The Unicode Standard.
See http://www.unicode.org/unicode/standard/standard.html.
XML Infoset
World Wide Web Consortium. XML Information Set. W3C Working Draft.
See http://www.w3.org/TR/xml-infoset
XPointer
World Wide Web Consortium. XML Pointer Language (XPointer). W3C
Working Draft. See http://www.w3.org/TR/WD-xptr
XQL
J. Robie, J. Lapp, D. Schach. XML Query Language (XQL).
See http://www.w3.org/TandS/QL/QL98/pp/xql.html
XSLT
World Wide Web Consortium. XSL Transformations (XSLT). W3C
Recommendation. See http://www.w3.org/TR/xslt
NOTE: A new version of the XML Information Set Working Draft, which will
replace the May 17 version, was close to completion at the time when the
preparation of this version of XPath was completed and was expected to be
released at the same time or shortly after the release of this version of XPath.
The mapping is given for this new version of the XML Information Set Working
Draft. If the new version of the XML Information Set Working Draft has not yet
been released, W3C members may consult the internal Working Group version
http://www.w3.org/XML/Group/1999/09/WD-xml-infoset-19990915.html (members only).
The root node comes from the document information item. The children
of the root node come from the children and children -
comments properties.
An element node comes from an element information item. The children
of an element node come from the children and children -
comments properties. The attributes of an element node come from
the attributes property. The namespaces of an element node come from
the in-scope namespaces property. The local part of the expanded-
name of the element node comes from the local name property. The
namespace URI of the expanded-name of the element node comes
from the namespace URI property. The unique ID of the element node
comes from the children property of the attribute information item in
the attributes property that has an attribute type property equal to ID.
An attribute node comes from an attribute information item. The local
part of the expanded-name of the attribute node comes from the local
name property. The namespace URI of the expanded-name of the
attribute node comes from the namespace URI property. The string-
value of the node comes from concatenating the character
code property of each member of the children property.
A text node comes from a sequence of one or more consecutive
character information items. The string-value of the node comes from
concatenating the character code property of each of the character
information items.
A processing instruction node comes from a processing instruction
information item. The local part of the expanded-name of the node
comes from the target property. (The namespace URI part of
the expanded-name of the node is null.) The string-value of the node
comes from the content property. There are no processing instruction
nodes for processing instruction items that are children of the document
type declaration information item.
A comment node comes from a comment information item. The string-
value of the node comes from the content property. There are no
comment nodes for comment information items that are children of the
document type declaration information item.
A namespace node comes from a namespace declaration information
item. The local part of the expanded-name of the node comes from
the prefix property. (The namespace URI part of the expanded-name of
the node is null.) The string-value of the node comes from
the namespace URI property.