XmlStarlet Command Line XML Toolkit User’s Guide
by Mikhail Grushinskiy
Table of Contents
1. Introduction
1.1. About XmlStarlet
1.2. Main Features
1.3. Supported Platforms
2. Installation
2.1. Installation on Linux
2.2. Installation on Solaris
2.3. Installation on MacOS X
2.4. Installation on Windows
3. Getting Started
3.1. Basic Command-Line Options
3.2. Studying Structure of XML Document
4. XmlStarlet Reference
4.1. Querying XML documents
4.2. Transforming XML documents
4.3. Editing XML documents
4.4. Validating XML documents
4.5. Formatting XML documents
4.6. Canonicalization of XML documents
4.7. XML and PYX format
4.8. Escape/Unescape special XML characters
4.9. List directory as XML
5. Common problems
5.1. Namespaces and default namespace
5.2. Special characters
5.3. Sorting
5.4. Validation
Chapter 1. Introduction
1.1. About XmlStarlet
XMLStarlet is a set of command line utilities for those who deal with many XML documents at the UNIX
shell command prompt, as well as for automated XML processing with shell scripts.
The XMLStarlet command line utility is written in C and uses libxml2 and libxslt from http://xmlsoft.org/.
Its extensive choice of options was only possible because of the rich feature set of libxml2 and
libxslt (many thanks to the developers of those libraries for their great work).
The ’diff’ and ’patch’ options are not currently implemented. Other features need some work too. Please send
an email to the project administrator (see http://sourceforge.net/projects/xmlstar/) if you wish to help.
XMLStarlet is linked statically to both libxml2 and libxslt, so generally all you need to process XML
documents is one executable file. To run the XmlStarlet utility, simply type ’xml’ on the command line
to see the list of available options.
XMLStarlet is open source freeware under the MIT license, which allows free use and distribution for both
commercial and non-commercial projects.
We welcome user feedback on this project, which greatly helps us to improve its quality.
Comments, suggestions, feature requests, and bug reports can be submitted via the SourceForge project web
site (see the XMLStarlet SourceForge forums (http://sourceforge.net/forum/?group_id=66612) or the XMLStarlet
mailing list (http://lists.sourceforge.net/lists/listinfo/xmlstar-devel/)).
1.2. Main Features
• Check or validate XML files (simple well-formedness check, DTD, XSD, RelaxNG)
• Calculate values of XPath expressions on XML files (such as running sums, etc)
• Search XML files for matches to given XPath expressions
• Apply XSLT stylesheets to XML documents (including EXSLT support, and passing parameters to
stylesheets)
• Query XML documents (e.g. query for the value of some elements or attributes, sorting, etc.)
• Modify or edit XML documents (e.g. delete some elements)
• Format or "beautify" XML documents (such as changing indentation, etc.)
• Fetch XML documents using http:// or ftp:// URLs
• Browse tree structure of XML documents (in similar way to ’ls’ command for directories)
• Include one XML document into another using XInclude
• XML c14n canonicalization
• Escape/unescape special XML characters in input text
• Print directory as XML document
• Convert XML into PYX format (based on ESIS - ISO 8879), and vice versa
1.3. Supported Platforms
• Linux
• Solaris
• Windows
• MacOS X
• FreeBSD
• HP-UX
Chapter 2. Installation
2.1. Installation on Linux
rpm -i xmlstarlet-x.x.x-1.i386.rpm
2.2. Installation on Solaris
gunzip xmlstarlet-x.x.x-sol8-sparc-local.gz
pkgadd -d xmlstarlet-x.x.x-sol8-sparc-local all
Chapter 3. Getting Started
3.1. Basic Command-Line Options
bash-2.03$ xml
XMLStarlet Toolkit: Command line utilities for XML
Usage: xml [<options>] <command> [<cmd-options>]
where <command> is one of:
ed (or edit) - Edit/Update XML document(s)
sel (or select) - Select data or query XML document(s) (XPATH, etc)
tr (or transform) - Transform XML document(s) using XSLT
val (or validate) - Validate XML document(s) (well-formed/DTD/XSD/RelaxNG)
fo (or format) - Format XML document(s)
el (or elements) - Display element structure of XML document
c14n (or canonic) - XML canonicalization
ls (or list) - List directory as XML
esc (or escape) - Escape special XML characters
unesc (or unescape) - Unescape special XML characters
pyx (or xmln) - Convert XML into PYX format (based on ESIS - ISO 8879)
p2x (or depyx) - Convert PYX into XML
<options> are:
--version - show version
--help - show help
Wherever file name mentioned in command help it is assumed
that URL can be used instead as well.
3.2. Studying Structure of XML Document
<xml>
<table>
<rec id="1">
<numField>123</numField>
<stringField>String Value</stringField>
</rec>
<rec id="2">
<numField>346</numField>
<stringField>Text Value</stringField>
</rec>
<rec id="3">
<numField>-23</numField>
<stringField>stringValue</stringField>
</rec>
</table>
</xml>
xml el table.xml
xml
xml/table
xml/table/rec
xml/table/rec/numField
xml/table/rec/stringField
xml/table/rec
xml/table/rec/numField
xml/table/rec/stringField
xml/table/rec
xml/table/rec/numField
xml/table/rec/stringField
Every line in this output is an XPath expression which indicates a ’path’ to elements in the XML document.
You can use these XPath expressions to navigate through your XML documents in other XmlStarlet
options.
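As a minimal sketch of how a path printed by ’el’ can be reused in another option, the commands below recreate the sample document and pass one of the listed paths to ’sel’. The variable names XML and total, and the fallback from the ’xmlstarlet’ binary name to ’xml’, are assumptions made for this illustration; adjust them to your installation.

```shell
# Assumption: the tool is installed as "xmlstarlet" or as "xml" (the name used in this guide).
XML=$(command -v xmlstarlet || command -v xml)

# Recreate the sample document from this chapter.
cat > table.xml <<'EOF'
<xml>
  <table>
    <rec id="1"><numField>123</numField><stringField>String Value</stringField></rec>
    <rec id="2"><numField>346</numField><stringField>Text Value</stringField></rec>
    <rec id="3"><numField>-23</numField><stringField>stringValue</stringField></rec>
  </table>
</xml>
EOF

# List the element paths of the document.
"$XML" el table.xml

# Reuse one printed path (xml/table/rec/numField) inside an XPath function.
# The leading "/" anchors the path at the document root.
total=$("$XML" sel -t -v "sum(/xml/table/rec/numField)" table.xml)
echo "$total"    # 123 + 346 - 23 = 446
```

The paths printed by ’el’ are relative, so prefixing them with ’/’ is a convenient way to make them unambiguous when they are used in other options.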
XML documents can be pretty large but have a very simple structure. (This is especially true for
data-driven XML documents, e.g. the XML-formatted result of a select from an SQL table.) If you are interested
just in the structure but not the order of the elements, you can use the -u switch combined with the ’el’ option.
EXAMPLE:
xml el -u table.xml
Output:
xml
xml/table
xml/table/rec
xml/table/rec/numField
xml/table/rec/stringField
If you are interested not just in the elements of your XML document but want to see attributes as well,
you can use the -a switch with the ’el’ option. Every line of the output will still be a valid XPath expression.
EXAMPLE:
xml el -a table.xml
Output:
xml
xml/table
xml/table/rec
xml/table/rec/@id
xml/table/rec/numField
xml/table/rec/stringField
xml/table/rec
xml/table/rec/@id
xml/table/rec/numField
xml/table/rec/stringField
xml/table/rec
xml/table/rec/@id
xml/table/rec/numField
xml/table/rec/stringField
If you are looking for attribute values as well, use the -v switch of the ’el’ option. Again, every line of
the output is a valid XPath expression.
EXAMPLE:
xml el -v table.xml
Output:
xml
xml/table
xml/table/rec[@id='1']
xml/table/rec/numField
xml/table/rec/stringField
xml/table/rec[@id='2']
xml/table/rec/numField
xml/table/rec/stringField
xml/table/rec[@id='3']
xml/table/rec/numField
xml/table/rec/stringField
Chapter 4. XmlStarlet Reference
<global-options> are:
-C or --comp - display generated XSLT
-R or --root - print root element <xsl-select>
-T or --text - output is text (default is XML)
-I or --indent - indent output
-D or --xml-decl - do not omit xml declaration line
-B or --noblanks - remove insignificant spaces from XML tree
-N <name>=<value> - predefine namespaces (name without ’xmlns:’)
ex: xsql=urn:oracle-xsql
Multiple -N options are allowed.
--net - allow fetch DTDs or entities over network
--help - display help
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:call-template name="t1"/>
<xsl:call-template name="t2"/>
</xsl:template>
<xsl:template name="t1">
<xsl:copy-of select="xpath0"/>
<xsl:for-each select="xpath1">
<xsl:for-each select="xpath2">
<xsl:value-of select="xpath3"/>
</xsl:for-each>
</xsl:for-each>
</xsl:template>
<xsl:template name="t2">
<xsl:for-each select="xpath4">
<xsl:copy-of select="xpath5"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
The ’select’ option basically lets you avoid writing an XSLT stylesheet to perform some queries on XML
documents. That is, various combinations of command line parameters let you generate an XSLT
stylesheet and apply it to XML documents with a single command line. Very often you do not really
care what XSLT was created for your ’select’ command, but in those cases when you do, you can always
use the -C or --comp switch, which lets you see exactly which XSLT is applied to your input.
Here are a few examples which will help you understand how ’xml select’ works:
EXAMPLE:
Input (table.xml):
<xml>
<table>
<rec id="1">
<numField>123</numField>
<stringField>String Value</stringField>
</rec>
<rec id="2">
<numField>346</numField>
<stringField>Text Value</stringField>
</rec>
<rec id="3">
<numField>-23</numField>
<stringField>stringValue</stringField>
</rec>
</table>
</xml>
Output:
Let’s take a closer look at what it did internally. For that we will use the ’-C’ option:
<xsl:call-template name="t1"/>
</xsl:template>
<xsl:template name="t1">
<xsl:value-of select="count(/xml/table/rec/numField)"/>
</xsl:template>
</xsl:stylesheet>
Every -t option is mapped to an XSLT template. Options after ’-t’ are mapped to XSLT elements:
• -v to <xsl:value-of>
• -c to <xsl:copy-of>
• -e to <xsl:element>
• -a to <xsl:attribute>
• -s to <xsl:sort>
• -m to <xsl:for-each>
• -i to <xsl:if>
• and so on
By default, subsequent options (for instance, -m) result in nested XSLT elements
(<xsl:for-each> for ’-m’). To break this nesting, put ’-b’ or ’--break’ after the first ’-m’.
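The mapping described above can be tried with a short sketch. It assumes the sample table.xml from Chapter 3; the variable names (XML, count, rows) and the fallback from the ’xmlstarlet’ binary name to ’xml’ are assumptions made for this illustration only.

```shell
# Assumption: the tool is installed as "xmlstarlet" or as "xml" (the name used in this guide).
XML=$(command -v xmlstarlet || command -v xml)

# Recreate the sample document.
cat > table.xml <<'EOF'
<xml>
  <table>
    <rec id="1"><numField>123</numField><stringField>String Value</stringField></rec>
    <rec id="2"><numField>346</numField><stringField>Text Value</stringField></rec>
    <rec id="3"><numField>-23</numField><stringField>stringValue</stringField></rec>
  </table>
</xml>
EOF

# -t opens a template and -v maps to <xsl:value-of>.
count=$("$XML" sel -t -v "count(/xml/table/rec/numField)" table.xml)
echo "$count"    # 3 for the sample document

# -C prints the generated stylesheet instead of applying it.
"$XML" sel -C -t -v "count(/xml/table/rec/numField)" table.xml

# -m maps to <xsl:for-each>; the -v and -n that follow are nested inside it,
# so one line is printed per <rec>.
rows=$("$XML" sel -t -m "/xml/table/rec" -v "concat(@id,':',numField)" -n table.xml)
echo "$rows"
```

Comparing the -C output of the last two commands is a quick way to see how each option after ’-t’ ends up as an XSLT element.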
EXAMPLE
Count all nodes in XML documents. Print input name and node count after it.
Output:
xml/table.xml 32
xml/tab-obj.xml 41
EXAMPLE
Result output:
xml/tab-obj.xml
EXAMPLE
Result output:
1000
EXAMPLE
Result Output:
<xml><child data="value"/></xml>
EXAMPLE
Result Output:
3|-23|stringValue
2|346|Text Value
1|123|String Value
Equivalent stylesheet
EXAMPLE
Input (xsql/jobserve.xsql)
$ cat xsql/jobserve.xsql
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="jobserve.xsl"?>
<xsql:query connection="jobs" xmlns:xsql="urn:oracle-xsql" max-rows="5">
SELECT substr(title,1,26) short_title, title, location, skills
FROM job
WHERE UPPER(title) LIKE '%ORACLE%'
ORDER BY first_posted DESC
</xsql:query>
Result output
EXAMPLE
Print structure of XML element using xml sel (advanced XPath expressions and xml sel command usage)
Input (xml/structure.xml)
<a1>
<a11>
<a111>
<a1111/>
</a111>
<a112>
<a1121/>
</a112>
</a11>
<a12/>
<a13>
<a131/>
</a13>
</a1>
Result Output:
a1
a1.a11
a1.a11.a111
a1.a11.a111.a1111
a1.a11.a112
a1.a11.a112.a1121
a1.a12
a1.a13
a1.a13.a131
<xsl:value-of select="'.'"/>
</xsl:if>
</xsl:for-each>
<xsl:value-of select="' '"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
EXAMPLE
Sample output
EXAMPLE:
Input xsl/params1.xsl
Output
Count=3
<global-options> are:
-P (or --pf) - preserve original formatting
-S (or --ps) - preserve non-significant spaces
-O (or --omit-decl) - omit XML declaration (<?xml ...?>)
-N <name>=<value> - predefine namespaces (name without ’xmlns:’)
ex: xsql=urn:oracle-xsql
Multiple -N options are allowed.
-N options must be last global options.
--help or -h - display help
where <action>
-d or --delete <xpath>
-i or --insert <xpath> -t (--type) elem|text|attr -n <name> -v (--value) <value>
-a or --append <xpath> -t (--type) elem|text|attr -n <name> -v (--value) <value>
-s or --subnode <xpath> -t (--type) elem|text|attr -n <name> -v (--value) <value>
-m or --move <xpath1> <xpath2>
-r or --rename <xpath1> -v <new-name>
-u or --update <xpath> -v (--value) <value>
-x (--expr) <xpath> (-x is not implemented yet)
EXAMPLE:
Input
<xml>
<table>
<rec id="1">
<numField>123</numField>
<stringField>String Value</stringField>
</rec>
<rec id="2">
<numField>346</numField>
<stringField>Text Value</stringField>
</rec>
<rec id="3">
<numField>-23</numField>
<stringField>stringValue</stringField>
</rec>
</table>
</xml>
Output
<xml>
<table>
<rec id="1">
<numField>123</numField>
<stringField>String Value</stringField>
</rec>
<rec id="3">
<numField>-23</numField>
<stringField>stringValue</stringField>
</rec>
</table>
</xml>
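The output above corresponds to removing the record with id 2. A minimal sketch of such a deletion with the -d action follows; the file names, the variable names and the fallback from the ’xmlstarlet’ binary name to ’xml’ are assumptions made for this illustration.

```shell
# Assumption: the tool is installed as "xmlstarlet" or as "xml" (the name used in this guide).
XML=$(command -v xmlstarlet || command -v xml)

# Recreate the sample input.
cat > table.xml <<'EOF'
<xml>
  <table>
    <rec id="1"><numField>123</numField><stringField>String Value</stringField></rec>
    <rec id="2"><numField>346</numField><stringField>Text Value</stringField></rec>
    <rec id="3"><numField>-23</numField><stringField>stringValue</stringField></rec>
  </table>
</xml>
EOF

# -d deletes every node matched by the XPath; the result goes to standard output,
# so the input file itself is left untouched.
"$XML" ed -d "/xml/table/rec[@id='2']" table.xml > table-no-rec2.xml

# Check the result: two <rec> elements should remain.
remaining=$("$XML" sel -t -v "count(/xml/table/rec)" table-no-rec2.xml)
echo "$remaining"    # 2
```

Because ’ed’ writes the edited document to standard output, redirecting to a new file keeps the original available for comparison.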
EXAMPLE
Output
<x id="1">
<a>
<b/>
</a>
</x>
EXAMPLE
# Rename attributes
xml ed -r "//*/@id" -v ID xml/tab-obj.xml
Output:
<xml>
<table>
<rec ID="1">
<numField>123</numField>
<stringField>String Value</stringField>
<object name="Obj1">
<property name="size">10</property>
<property name="type">Data</property>
</object>
</rec>
<rec ID="2">
<numField>346</numField>
<stringField>Text Value</stringField>
</rec>
<rec ID="3">
<numField>-23</numField>
<stringField>stringValue</stringField>
</rec>
</table>
</xml>
EXAMPLE
# Rename elements
xml ed -r "/xml/table/rec" -v record xml/tab-obj.xml
Output:
<xml>
<table>
<record id="1">
<numField>123</numField>
<stringField>String Value</stringField>
<object name="Obj1">
<property name="size">10</property>
<property name="type">Data</property>
</object>
</record>
<record id="2">
<numField>346</numField>
<stringField>Text Value</stringField>
</record>
<record id="3">
<numField>-23</numField>
<stringField>stringValue</stringField>
</record>
</table>
</xml>
EXAMPLE
Output:
<xml>
<table>
<rec id="1">
<numField>123</numField>
<stringField>String Value</stringField>
<object name="Obj1">
<property name="size">10</property>
<property name="type">Data</property>
</object>
</rec>
<rec id="2">
<numField>346</numField>
<stringField>Text Value</stringField>
</rec>
<rec id="5">
<numField>-23</numField>
<stringField>stringValue</stringField>
</rec>
</table>
</xml>
EXAMPLE
Output:
<xml>
<table>
<rec id="1">
<numField>0</numField>
<stringField>String Value</stringField>
<object name="Obj1">
<property name="size">10</property>
<property name="type">Data</property>
</object>
</rec>
<rec id="2">
<numField>346</numField>
<stringField>Text Value</stringField>
</rec>
<rec id="3">
<numField>-23</numField>
<stringField>stringValue</stringField>
</rec>
</table>
</xml>
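The output above shows numField of the first record set to 0. A sketch of such an update with the -u action is given below. It uses the smaller table.xml sample instead of xml/tab-obj.xml, and the file names, variable names and binary-name fallback are assumptions made for this illustration.

```shell
# Assumption: the tool is installed as "xmlstarlet" or as "xml" (the name used in this guide).
XML=$(command -v xmlstarlet || command -v xml)

# Recreate the sample input.
cat > table.xml <<'EOF'
<xml>
  <table>
    <rec id="1"><numField>123</numField><stringField>String Value</stringField></rec>
    <rec id="2"><numField>346</numField><stringField>Text Value</stringField></rec>
    <rec id="3"><numField>-23</numField><stringField>stringValue</stringField></rec>
  </table>
</xml>
EOF

# -u selects the nodes to change and -v supplies the new value.
"$XML" ed -u "/xml/table/rec[@id='1']/numField" -v 0 table.xml > table-updated.xml

# Read the value back from the edited copy.
updated=$("$XML" sel -t -v "/xml/table/rec[@id='1']/numField" table-updated.xml)
echo "$updated"    # 0
```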
NOTE: XML Schemas are not fully supported yet due to its incomplete
support in libxml (see http://xmlsoft.org)
EXAMPLE
Output:
EXAMPLE
Output:
xml/tab-obj.xml
1
EXAMPLE
Output:
<xml>
<table>
<rec id="1">
<numField>123</numField>
<stringField>String Value</stringField>
<object name="Obj1">
<property name="size">10</property>
<property name="type">Data</property>
</object>
</rec>
<rec id="2">
<numField>346</numField>
<stringField>Text Value</stringField>
</rec>
<rec id="3">
<numField>-23</numField>
<stringField>stringValue</stringField>
</rec>
</table>
</xml>
EXAMPLE
Input:
<test_output>
<test_name>foo</testname>
<subtest>...</subtest>
</test_output>
Output:
<test_output>
<test_name>foo</test_name>
<subtest>...</subtest>
</test_output>
EXAMPLE
# XML canonicalization
xml c14n --with-comments ../examples/xml/structure.xml ; echo $?
Input ../examples/xml/structure.xml
<a1>
<a11>
<a111>
<a1111/>
</a111>
<a112>
<a1121/>
</a112>
</a11>
<a12/>
<a13>
<a131/>
</a13>
</a1>
Output
<a1>
<a11>
<a111>
<a1111></a1111>
</a111>
<a112>
<a1121></a1121>
</a112>
</a11>
<a12></a12>
<a13>
<a131></a131>
</a13>
</a1>
0
EXAMPLE
Input
../examples/xml/c14n.xml
<n0:pdu xmlns:n0='http://a.example.com'>
<n1:elem1 xmlns:n1='http://b.example'>
content
</n1:elem1>
</n0:pdu>
../examples/xml/c14n.xpath
Output
<n1:elem1 xmlns:n1="http://b.example">
content
</n1:elem1>
XMLStarlet Toolkit: Convert XML into PYX format (based on ESIS - ISO 8879)
Usage: xml pyx {<xml-file>}
where
<xml-file> - input XML document file name (stdin is used if missing)
EXAMPLE
Input (input.xml)
<books>
<book type='hardback'>
<title>Atlas Shrugged</title>
<author>Ayn Rand</author>
<isbn id='1'>0525934189</isbn>
</book>
</books>
Output
(books
-\n
(book
Atype hardback
-\n
(title
-Atlas Shrugged
)title
-\n
(author
-Ayn Rand
)author
-\n
(isbn
Aid 1
-0525934189
)isbn
-\n
)book
-\n
)books
PYX is a line-oriented format for XML files which can be helpful (and very efficient) when used in
combination with regular line-oriented UNIX commands such as sed, grep, and awk.
The ’depyx’ option is used for conversion back from PYX into XML.
EXAMPLE (Delete all attributes). This should work really fast for very large XML documents.
Output
<books>
<book>
<title>Atlas Shrugged</title>
<author>Ayn Rand</author>
<isbn>0525934189</isbn>
</book>
</books>
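A pipeline of the following shape would produce output like the one above. It is a hedged sketch: in PYX every attribute is written on its own line starting with ’A’, so filtering those lines with grep removes all attributes before ’p2x’ converts the stream back to XML. The variable names and the fallback from the ’xmlstarlet’ binary name to ’xml’ are assumptions made for this illustration.

```shell
# Assumption: the tool is installed as "xmlstarlet" or as "xml" (the name used in this guide).
XML=$(command -v xmlstarlet || command -v xml)

# Recreate the sample input from this section.
cat > input.xml <<'EOF'
<books>
<book type='hardback'>
<title>Atlas Shrugged</title>
<author>Ayn Rand</author>
<isbn id='1'>0525934189</isbn>
</book>
</books>
EOF

# XML -> PYX, drop attribute lines (they start with "A"), PYX -> XML.
# Text lines start with "-", so element content is never affected by the filter.
stripped=$("$XML" pyx input.xml | grep -v '^A' | "$XML" p2x)
echo "$stripped"
```

Since each PYX line describes exactly one event, the same pattern works with sed or awk for other line-level edits.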
Here is an article which describes how the PYX format can be used to grep XML:
http://www-106.ibm.com/developerworks/xml/library/x-matters17.html
EXAMPLE
Input
<a1>
<a11>
<a111>
<a1111/>
</a111>
<a112>
<a1121/>
</a112>
</a11>
<a12/>
<a13>
<a131/>
</a13>
</a1>
Output
<a1>
<a11>
<a111>
<a1111/>
</a111>
<a112>
<a1121/>
</a112>
</a11>
<a12/>
<a13>
<a131/>
</a13>
</a1>
EXAMPLE
xml ls
Output
<xml>
<d a="rwxr-xr-x" acc="2004.02.13 00:06:03" mod="2004.02.13 00:06:00" sz="4096" n="."/>
<d a="rwxr-xr-x" acc="2004.02.12 23:54:35" mod="2004.02.13 00:00:09" sz="4096" n=".."/>
<f a="rw-r--r--" acc="2004.02.12 23:54:58" mod="2004.02.12 23:54:58" sz="0" n="resume.xm
<f a="rw-r--r--" acc="2004.02.12 23:54:58" mod="2004.02.12 23:54:58" sz="0" n="resume-20
<d a="rwxr-xr-x" acc="2004.02.13 00:04:52" mod="2004.02.13 00:04:52" sz="4096" n="old-resum
</xml>
Chapter 5. Common problems
5.1. Namespaces and default namespace
For example, the following XHTML document has a default namespace declaration:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Query Page</title>
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta http-equiv="Content-Script-Type" content="text/javascript" />
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<meta name="robots" content="noindex,nofollow" />
</head>
<body>
...
</body>
</html>
And the following query to print all links (which looks correct at first)
would return nothing. The problem with this query is that it does not address the element <a> in the right
namespace. XPath requires that all namespaces used in an XPath expression be defined. So for the
namespace declared as <html xmlns="http://www.w3.org/1999/xhtml"> in the input XML, you have to do the same
for XPath (or XSLT). There is another important detail: namespace equivalence is determined not by the
namespace prefix, but by the URI. The query below would return the expected result.
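A minimal sketch of both situations follows, using the -N option to bind a prefix to the XHTML namespace URI. The small input document, the prefix ’x’, the variable names and the fallback from the ’xmlstarlet’ binary name to ’xml’ are assumptions made for this illustration.

```shell
# Assumption: the tool is installed as "xmlstarlet" or as "xml" (the name used in this guide).
XML=$(command -v xmlstarlet || command -v xml)

# A small XHTML document with a default namespace and one link (hypothetical sample).
cat > page.xhtml <<'EOF'
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Query Page</title></head>
<body><a href="http://example.com/">Example</a></body>
</html>
EOF

# Without a namespace binding, //a looks for <a> in no namespace and matches nothing.
plain=$("$XML" sel -t -m "//a" -v "@href" -n page.xhtml)
echo "unqualified query: [$plain]"

# Bind an arbitrary prefix to the same URI with -N and use it in the XPath.
links=$("$XML" sel -N x="http://www.w3.org/1999/xhtml" -t -m "//x:a" -v "@href" -n page.xhtml)
echo "qualified query:   [$links]"
```

The prefix does not have to match anything in the input document; only the URI given to -N has to be the same as the one declared there.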
Delete namespace declarations and all elements from non default namespace from the following XML
document:
Command:
Output
<doc>
<A>test</A>
<B/>
</doc>
5.2. Special characters
Do not forget that your command lines are executed by the shell, and the shell performs
substitutions of its own special characters too. So, for example, one may ask:
The answer lies in the way the shell substitutes ’foo’, which simply becomes foo before the command is run.
So the correct way to write that would be
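The same quoting rule can be illustrated on the sample table.xml from Chapter 3. This is a hedged sketch rather than the exact command discussed above; the XPath, the file name, the variable names and the fallback from the ’xmlstarlet’ binary name to ’xml’ are assumptions chosen for the illustration.

```shell
# Assumption: the tool is installed as "xmlstarlet" or as "xml" (the name used in this guide).
XML=$(command -v xmlstarlet || command -v xml)

# Recreate the sample document.
cat > table.xml <<'EOF'
<xml>
  <table>
    <rec id="1"><numField>123</numField><stringField>String Value</stringField></rec>
    <rec id="2"><numField>346</numField><stringField>Text Value</stringField></rec>
    <rec id="3"><numField>-23</numField><stringField>stringValue</stringField></rec>
  </table>
</xml>
EOF

# Unprotected: the shell removes the single quotes, so XPath sees
# stringField=stringValue and compares against a child element named
# stringValue, which does not exist. Nothing is selected.
wrong=$("$XML" sel -t -v /xml/table/rec[stringField='stringValue']/@id table.xml)
echo "unprotected: [$wrong]"

# Protected: double quotes around the whole expression keep the inner
# single quotes intact, so XPath sees a string literal.
id=$("$XML" sel -t -v "/xml/table/rec[stringField='stringValue']/@id" table.xml)
echo "protected:   [$id]"    # 3
```

Wrapping the whole XPath expression in double quotes and using single quotes for string literals inside it is usually the least error-prone combination in a POSIX-style shell.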
Another example involves XML special characters. Question: How to search for ' in text nodes?
5.3. Sorting
Let’s take a look at XSLT produced by the following ’xml sel’ command:
The -s option of the ’xml sel’ command controls the ’order’, ’data-type’, and ’case-order’ attributes of the
<xsl:sort/> element.
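A minimal sketch of the -s option in use, assuming the sample table.xml from Chapter 3. The operand D:N:- requests descending order with a numeric data-type and leaves case-order at its default. The variable names and the fallback from the ’xmlstarlet’ binary name to ’xml’ are assumptions made for this illustration.

```shell
# Assumption: the tool is installed as "xmlstarlet" or as "xml" (the name used in this guide).
XML=$(command -v xmlstarlet || command -v xml)

# Recreate the sample document.
cat > table.xml <<'EOF'
<xml>
  <table>
    <rec id="1"><numField>123</numField><stringField>String Value</stringField></rec>
    <rec id="2"><numField>346</numField><stringField>Text Value</stringField></rec>
    <rec id="3"><numField>-23</numField><stringField>stringValue</stringField></rec>
  </table>
</xml>
EOF

# -m iterates over the records, -s sorts them by @id (descending, numeric),
# -v prints one pipe-separated line per record, -n ends each line.
sorted=$("$XML" sel -T -t -m "/xml/table/rec" -s D:N:- "@id" \
    -v "concat(@id,'|',numField,'|',stringField)" -n table.xml)
echo "$sorted"
```

Adding -C to the same command line prints the generated stylesheet, which shows how the three parts of the operand end up as attributes of <xsl:sort/>.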
5.4. Validation
Many questions are asked about XSD (XML Schema) validation. XmlStarlet relies on libxml2,
which has incomplete support for XML Schemas. Until that support is complete in libxml2, it will not be
available in XmlStarlet.