Get Off To A Fast Start With Db2 V9 Purexml, Part 2
Get Off To A Fast Start With Db2 V9 Purexml, Part 2
Get Off To A Fast Start With Db2 V9 Purexml, Part 2
Cynthia M. Saracco
Senior Software Engineer
IBM
02 Mar 2006
The IBM® DB2® V9 for Linux®, UNIX®, and Windows® features significant new
support for storing, managing, and searching XML data, referred to as pureXML®.
This series helps you master these new XML features quickly through several
step-by-step articles that explain how to accomplish fundamental tasks. In this article,
learn how to create database objects for managing your XML data and how to
populate your DB2 database with XML data. [11 Mar 2010: Originally written in 2006,
this article has been updated to include changes in DB2 versions 9.5 and 9.7.--Ed.]
You've probably heard the buzz about DB2's V9 -- IBM's first database management
system to support both tabular (SQL-based) and hierarchical (XML-based) data
structures. If you're curious about DB2's new native support for XML and want to get
off to a fast start, you've come to the right place.
To help you quickly get up to speed on DB2's native XML features, this article walks
through several common tasks, such as:
Future articles will cover other topics, such as querying, updating, and deleting DB2
XML data with SQL, querying DB2 XML data with XQuery, and developing Java
applications and Web components that access DB2 XML data.
To create a new DB2 Unicode test database, open a DB2 command window and
issue a statement specifying a Unicode codeset and a supported territory, as shown
in Listing 1.
Once you create a database, you don't need to issue any special commands or take
any further action to enable DB2 to store XML data in its native hierarchical format.
Your DB2 system is ready to go.
To store XML data, you create tables that contain one or more XML columns. These
tables serve as logical containers for collections of documents. Behind the scenes,
DB2 actually uses a different storage scheme for XML and non-XML data. However,
using tables as a logical object for managing all forms of supported data simplifies
administration and application development issues, particularly when different forms
of data need to be integrated in a single query.
You can define DB2 tables to contain only XML columns, only columns of traditional
SQL types, or a combination of both. This article models the latter. The example in
Listing 2 connects to the test database and creates two tables. The first is an Items
table that tracks information about items for sale and comments that customers have
made about them. The second table tracks information about Clients, including
contact data. Note that Comments and Contactinfo are based on the new DB2 XML
data type, while all other columns in the tables are based on traditional SQL data
types.
connect to test;
create table items (
id int primary key not null,
brandname varchar(30),
itemname varchar(30),
sku int,
srp decimal(7,2),
comments xml
);
create table clients(
id int primary key not null,
name varchar(50),
status varchar(10),
contactinfo xml
);
If you look closely at these table definition examples, you'll notice that neither
specified the internal structure of the XML documents to be stored in the Comments
or Contactinfo columns. This is an important DB2 feature. Users do not need to
pre-define an XML data structure (or, more accurately, an XML schema) in order to
store their data. Indeed, DB2 can store any well-formed XML document in a single
column, which means that XML documents of different schemas (or documents not
associated with any registered schema) can be stored within the same DB2 column.
This article discusses this feature more when it discusses how to store data in DB2.
The option to store smaller XML documents inline was introduced in V9.5. If the XML
document is small enough to fit into the page size, it can be stored with the other
SQL elements. If it is not small enough to fit into a page, it will be stored separately.
Along with the inline keyword, you supply the maximum size of the XML to be
inlined. Base this value on the page size and on the size of the other relational
columns. Listing 3 shows the code snippet to do this:
Listing 3. Creating tables for XML data with the inline option
connect to test;
create table items (
id int primary key not null,
brandname varchar(30),
itemname varchar(30),
sku int,
srp decimal(7,2),
comments xml inline length 10240
);
Creating views
Optionally, you can create views over tables containing XML data, just as you can
create views over tables containing only traditional SQL data types. The example in
Listing 4 creates a view of clients with a Gold status:
Finally, note that you can create specialized indexes on your XML columns to speed
searches of your data. Because this is an introductory article and the sample data is
small, this article will not be covering that topic. However, in production
environments, defining appropriate indexes can be critical to achieving optimal
performance. See Resources for help on how to learn more about DB2's new
indexing technology.
With INSERT, you supply DB2 with the raw XML data directly. That's perhaps
easiest to do if you've written an application and stored the XML data in a variable.
But if you're just getting started with DB2 and don't want to write an application, you
can issue your INSERT statements interactively. (I find it convenient to use the DB2
Command Editor, although you can also use the command line processor, if you'd
prefer.)
To use the DB2 Command Editor, launch the DB2 Control Center. From the Tools
pull-down menu at the top, select Command Editor. A separate window appears,
as shown in Figure 1.
connect to test;
insert into clients values (77, 'John Smith', 'Gold',
'<addr>111 Main St., Dallas, TX, 00112</addr>')
In this case, the input document was quite simple. If the document was large or
complex, it would be impractical to type the XML data into the INSERT statement as
shown. In most cases, you'd write an application to insert the data using a host
variable or a parameter marker. You'll find a brief Java coding example that
accompanies this article. However, this introductory tutorial does not cover
application development topics in detail. Instead, we'll discuss another option for
populating DB2 XML columns with data—using the IMPORT facility.
If you already have your XML data in files, the DB2 IMPORT facility provides a
simple way for you to populate your DB2 tables with this data. You don't need to
write an application. You just need to create a delimited ASCII file containing the
data you want to load into your table. For XML data stored in files, a parameter
specifies the appropriate file names.
You can create the delimited ASCII file using the text editor of your choice. (By
convention, such files are usually of type .del.) Each line in your file represents a row
of data to be imported into your table. If your line contains an XML Data Specifier
(XDS), IMPORT will read the data contained in the referenced XML file and import
that into DB2. For example, the first line in Listing 6 contains information for Ella
Kimpton, including her ID, name, and customer status. Her contact information is
included in the Client3227.xml file.
The content of the Client3227.xml file is shown in Listing 7. The file contains XML
elements for Ella Kimpton's address, phone numbers, fax number, and email.
<?xml version="1.0"?>
<Client
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance
xsi:noNamespaceSchemaLocation="http://bogus">
<Address>
<street>5401 Julio Ave</street>
<city>San Jose</city>
<state>CA</state>
<zip>95116</zip>
</Address>
<phone>
<work>4084630000</work>
<home>4081111111</home>
<cell>4082222222</cell>
</phone>
<fax>4087776666</fax>
<email>love2shop@yahoo.com</email>
</Client>
Perhaps you're curious about importing data if you don't have XML files for all the
rows you wish to insert. That's easy to do. Omit the XDS information from your input
file. For example, the items.del file in Listing 8 omits the name of an XML file for Item
3641 (the Dress to Impress suit). As a result, the XML column for this row will not
contain any data.
items.del file
With your XML files and delimited ASCII files available, you're now ready to use DB2
IMPORT. The statement in Listing 9 imports the contents specified in the clients.del
file into the C:/XMLFILES directory into the clients table.
import from clients.del of del xml from C:/XMLFILES insert into user1.clients;
The clients.del file shown in Listing 6 contains data for six rows, including references
to six XML files. Successfully executing an IMPORT command results in output
similar to Listing 10.
import from clients.del of del xml from C:/XMLFiles insert into saracco.clients
SQL3109N The utility is beginning to load data from file "clients.del".
SQL3110N The utility has completed processing. "6" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "6".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "6" rows were processed from the input file. "6" rows were
successfully inserted into the table. "0" rows were rejected.
Number of rows read = 6
Number of rows skipped = 0
Number of rows inserted = 6
Number of rows updated = 0
Number of rows rejected = 0
Number of rows committed = 6
Independent software vendors offer tools to help you convert Microsoft® Word,
Acrobat PDF, and other document formats into XML for import into DB2. See
Resources for more information about ISVs.
The INSERT and IMPORT examples just discussed can write any well-formed XML
data to your tables. They don't validate that data. In other words, they don't verify
that the data conforms to a particular XML schema and therefore adheres to a
certain structure. It is possible to direct DB2 to do that, however. Here is one
approach:
To validate XML data, you need to define an XML schema that specifies acceptable
XML elements, their order and data types, and so on. XML schemas are a W3C
industry standard and are written in XML. While it is beyond the scope of this article
to explain the features of XML schemas, various tutorials are available (see
Resources).
There are many ways to develop XML schemas, ranging from using your favorite
text editor to manually create your schema to using tools to graphically design or
generate a schema. Independent software vendors provide such XML tools, and
IBM also offers XML schema generation support through Java™-integrated
development environments.
For example, with IBM Rational® Application Developer or IBM Rational Software
Architect, you can import an xml file into a Web project. The xml file used in this
example was taken from the customer table in the sample database of DB2.
Right-click the project, and select Generate > XML Schema. This generates a valid
XML schema for your particular input file, as shown in Figure 2 (larger image). You
can then modify the file (if necessary) and register it with DB2.
Assume you need to make your XML schema rather flexible so that you can collect
different types of contact information for different customers. For example, some
customers might provide you with multiple phone numbers or email addresses, while
others might not. The XML schema shown in Listing 11, which was derived from the
schema that IBM Rational Software Architect generated, allows for this flexibility. It
includes additional specifications about the minimum and maximum number of
occurrences (minOccurs and maxOccurs) allowed for a given element. In this case,
the customer isn't required to give you any of the contact information you'd like to
collect. However, if a customer chooses to give you email information, this schema
enables conforming documents to contain up to five email addresses (that is, five
email element values).
<xsd:element name="Client">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="Address" minOccurs="0" maxOccurs="5"/>
<xsd:element ref="phone" minOccurs="0" maxOccurs="1"/>
<xsd:element ref="fax" minOccurs="0" maxOccurs="1"/>
<xsd:element ref="email" minOccurs="0" maxOccurs="5"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name="cell type="xsd:string"/>
<xsd:element name="city type="xsd:string"/>
<xsd:element name="email" type="xsd:string"/>
<xsd:element name="fax" type="xsd:string"/>
<xsd:element name="home" type="xsd:string"/>
<xsd:element name="phone">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="work" minOccurs="0"/>
<xsd:element ref="home" minOccurs="0"/>
<xsd:element ref="cell" minOccurs="0"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
. . .
</xsd:schema>
XML schemas also contain type information. The schema shown in Listing 11
specifies that all base elements are treated as strings. However, most production
XML schemas make use of other data types as well, such as integer, decimal, date,
and so on. If you validate XML documents against a given schema as part of your
INSERT or IMPORT operation, DB2 automatically adds type annotations to your XML
documents.
Once you have created an appropriate XML schema, you need to register the
schema with DB2. IBM provides multiple ways to do this. You can launch graphical
wizards from the DB2 Control Center to guide you through the process, invoke
system-supplied stored procedures, or issue DB2 commands directly. For this
example, use the latter method, because it might help you more readily understand
what DB2 is doing behind the scenes on your behalf.
If your schema is very large, you may need to increase your application heap size
before attempting to register it. For example, issue the following statements:
connect to test;
update db cfg using applheapsz 10000;
Next, register your XML schema. If your XML schema does not reference other XML
schemas, you can register and complete the process with a single command.
Otherwise, you need to issue individual commands to register your primary XML
schema, add the other required schemas, and complete the registration process.
When a schema document becomes very large, it's common to divide its content into
multiple files to improve maintenance, readability, and reuse. This is akin to breaking
up a complex application or component into multiple modules. For details on this
topic, refer to the W3C XML Schema primer.
This article uses a simple, independent XML schema. You can register it with DB2
using the following command:
In this example, ClientInfo.xsd is the name of the XML schema file. It is located in
the C:/XMLFiles directory. This XML schema will be registered in DB2's internal
repository under the SQL schema user1 and the XML schema mysample. The
http://mysample.org parameter is just a placeholder in this example. It specifies the
uniform resource indicator (URI) referenced by XML instance documents. Many XML
documents use namespaces, which are specified using a URI. Finally, the
complete clause instructs DB2 to complete the XML schema registration process
so that the schema can be used to validate XML data.
Note that the schema registration process does not involve specifying table columns
to which the schema will be applied. In other words, schemas are not the equivalent
of SQL column constraints. A given schema can validate data for a variety of XML
columns in different tables. However, validation is not automatic. DB2 allows any
well-formed XML document to be stored in an XML column. If you want to validate
your data against a registered schema prior to storage, you need to instruct DB2 to
do so.
With an XML schema created and completely registered in DB2, you're now ready to
have DB2 validate XML data when inserting or importing it into a table. Revisit the
earlier IMPORT scenario with schema validation in mind.
If you've already populated your Clients table, you might find it convenient to delete
its contents or drop and recreate the table. This is only necessary if you plan to add
the same data to the table as you did previously. Recall that clients were defined
with a primary key on the client ID column, so attempting to import duplicate rows
will fail.
To validate the XML data while importing it into the Clients table, use the
XMLVALIDATE clause of DB2 IMPORT. The statement in Listing 14 instructs DB2 to
use your previously registered XML schema (user1.mysample) as the default XDS
(XML Data Specifier) for validating the XML files specified in the clients.del file
before inserting them into the Clients table.
import from clients.del of del xml from C:/XMLFILES xmlvalidate using xds default
user1.mysample insert into user1.clients;
If DB2 determines that an XML document does not conform to the specified schema,
the entire row associated with that document is rejected. Listing15 illustrates sample
output from an IMPORT operation in which one row of six was rejected because its
XML document did not conform to the specified schema.
SQL3149N "6 rows were processed from the input file. "5" rows were
successfully inserted into the table. "1" rows were rejected.
Number of rows read = 6
Number of rows skipped = 0
Number of rows inserted = 5
Number of rows updated = 0
Number of rows rejected = 1
Number or rows committed = 6
Note that XMLVALIDATE can also be used with INSERT statements to instruct DB2
to validate XML data before inserting it. The syntax is similar to the IMPORT example
just shown in that you specify a registered (and completed) XML schema when
invoking the XMLVALIDATE clause. (See "A simple Java example" for more
information.)
Summary
DB2 V9 provides significant new capabilities for supporting XML, including a new
XML data type and underlying engine-level components that automatically store and
process XML data in an efficient manner. To help you get up to speed quickly on
these features, this article described how to create a test database and sample
tables for storing XML documents. It also reviewed how you can populate your
database with XML data. Finally, it summarized DB2's ability to validate XML data
against user-supplied XML schemas and provided examples to show you how to get
started.
Now that you've learned how to store XML data using DB2's native XML capabilities,
you're ready to query that data. You'll see how to do that in subsequent articles,
which will introduce you to DB2's XQuery support and to its XML extensions to SQL
(sometimes called SQL/XML).
Acknowledgments
Thanks to Rav Ahuja, Matthias Nicola, and Gary Robinson for their comments on
this paper.
Downloads
Description Name Size Download
method
Sample code samples.zip 4KB HTTP
Resources
Learn
• Explore all the pieces of this series.
• XML Database - DB2 pureXML Learn more about DB2's XML support.
• "What's new in DB2 Viper: XML to the Core" (developerWorks, February 2006):
Get an overview of the new XML technologies.
• Exegenix offers tools that can help you convert Word, PDF, and other document
formats into XML for import into DB2.
• XML schemas:
• Various tutorials are available on the Web that explain the features of XML
schemas.
• W3C XML Schema primer provides an easily readable description of the XML
Schema facilities and is oriented towards quickly understanding how to create
schemas using the XML Schema language.
• "Firing up the Hybrid Engine" (DB2 Magazine, Quarter 3, 2005): Read more
about IBM's hybrid database management system.
• System RX: One Part Relational, One Part XML (SIGMOD conference, 2005):
Learn about the architecture and design aspects of building a hybrid relational
and XML DBMS.
• "Native XML Support in DB2 Universal Database" (VLDB conference, 2005):
Read more about DB2 XML support.
• "Managing XML for Maximum Return" (IBM, November 2005): This white paper
explores the business benefits of DB2's XML support.
• "Use DB2 native XML with PHP" (developerWorks, October 2005): Compare
and contrast DB2's new XML support with traditional relational database
technology.
• Stay current with developerWorks wiki on periodic pureXML topics given by the
experts .
• Learn about DB2 Express-C, the no-charge version of DB2 Express Edition for
the community.
• Learn more about Information Management at the developerWorks Information
Management zone. Find technical documentation, how-to articles, education,
downloads, product information, and more.
• Stay current with developerWorks technical events and webcasts.