
Commit 113bb9b

committed
XML conversion utility, requires expat library.
John Gray
1 parent d4cafeb commit 113bb9b

8 files changed

+764
-1
lines changed

contrib/README

+5-1
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
11

22
The PostgreSQL contrib tree
3-
~~~~~~~~~~~~~~~~~~~~~~~~~~~
3+
---------------------------
44

55
This subtree contains tools, modules, and examples that are not
66
maintained as part of the core PostgreSQL system, mainly because
@@ -177,3 +177,7 @@ userlock -
177177
vacuumlo -
178178
Remove orphaned large objects
179179
by Peter T Mount <peter@retep.org.uk>
180+
181+
xml -
182+
Storing XML in PostgreSQL
183+
by John Gray <jgray@beansindustry.co.uk>

contrib/xml/Makefile

+43
@@ -0,0 +1,43 @@
1+
#-------------------------------------------------------------------------
2+
#
3+
# Makefile--
4+
# Adapted from tutorial makefile
5+
#-------------------------------------------------------------------------
6+
7+
subdir = contrib/xml
8+
top_builddir = ../..
9+
include $(top_builddir)/src/Makefile.global
10+
11+
override CFLAGS+= $(CFLAGS_SL)
12+
13+
14+
#
15+
# DLOBJS is the dynamically-loaded object files. The "funcs" queries
16+
# include CREATE FUNCTIONs that load routines from these files.
17+
#
18+
DLOBJS= pgxml$(DLSUFFIX)
19+
20+
21+
QUERIES= pgxml.sql
22+
23+
all: $(DLOBJS) $(QUERIES)
24+
25+
# Requires the expat library
26+
27+
%.so: %.o
28+
$(CC) -shared -o $@ $< -lexpat
29+
30+
31+
%.sql: %.source
32+
if [ -z "$$USER" ]; then USER=$$LOGNAME; fi; \
33+
if [ -z "$$USER" ]; then USER=`whoami`; fi; \
34+
if [ -z "$$USER" ]; then echo 'Cannot deduce $$USER.'; exit 1; fi; \
35+
rm -f $@; \
36+
C=`pwd`; \
37+
sed -e "s:_CWD_:$$C:g" \
38+
-e "s:_OBJWD_:$$C:g" \
39+
-e "s:_DLSUFFIX_:$(DLSUFFIX):g" \
40+
-e "s/_USER_/$$USER/g" < $< > $@
41+
42+
clean:
43+
rm -f $(DLOBJS) $(QUERIES)
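
The `%.sql: %.source` rule above fills in four placeholders in a template file with `sed`. As a rough illustration of what that substitution does, here is a minimal Python sketch. The function name `expand_template` and the template line are hypothetical; the real `pgxml.source` file is not part of this excerpt.

```python
def expand_template(text, cwd, dlsuffix, user):
    """Apply the same four placeholder substitutions as the sed command."""
    replacements = [
        ("_CWD_", cwd),
        ("_OBJWD_", cwd),        # the rule passes the same directory for both
        ("_DLSUFFIX_", dlsuffix),
        ("_USER_", user),
    ]
    for placeholder, value in replacements:
        text = text.replace(placeholder, value)
    return text


# Hypothetical template line, for illustration only.
template = "AS '_OBJWD_/pgxml_DLSUFFIX_' -- built by _USER_ in _CWD_"
print(expand_template(template, "/tmp/contrib/xml", ".so", "postgres"))
```

The rule itself also derives the user name from `$USER`, `$LOGNAME`, or `whoami` before substituting; the sketch takes it as a plain argument instead.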

contrib/xml/README

+78
@@ -0,0 +1,78 @@
1+
This package contains a couple of simple routines for hooking the
2+
expat XML parser up to PostgreSQL. This is a work-in-progress and all
3+
very basic at the moment (see the file TODO for some outline of what
4+
remains to be done).
5+
6+
At present, two functions are defined, one which checks
7+
well-formedness, and the other which performs very simple XPath-type
8+
queries.
9+
10+
Prerequisite:
11+
12+
expat parser 1.95.0 or newer (http://expat.sourceforge.net)
13+
14+
I used a shared library version, though I'm sure a static library
15+
would work too. I had no problems compiling from source.
16+
17+
Function documentation and usage:
18+
---------------------------------
19+
20+
pgxml_parse(text) returns bool
21+
parses the provided text and returns true if it is well-formed,
22+
false if it is not. It returns NULL if the parser could not be
23+
created for any reason.
24+
25+
pgxml_xpath(text doc, text xpath, int n) returns text
26+
parses doc and returns the cdata of the nth occurrence of
27+
the "XPath" listed. See below for details on the syntax.
28+
29+
30+
Example:
31+
32+
Given a table docstore:
33+
34+
Attribute | Type | Modifier
35+
-----------+---------+----------
36+
docid | integer |
37+
document | text |
38+
39+
containing documents such as (these are archaeological site
40+
descriptions, in case anyone is wondering):
41+
42+
<?xml version="1.0"?>
43+
<site provider="Foundations" sitecode="ak97" version="1">
44+
<name>Church Farm, Ashton Keynes</name>
45+
<invtype>watching brief</invtype>
46+
<location scheme="osgb">SU04209424</location>
47+
</site>
48+
49+
one can type:
50+
51+
select docid,
52+
pgxml_xpath(document,'/site/name',1) as sitename,
53+
pgxml_xpath(document,'/site/location',1) as location
54+
from docstore;
55+
56+
and get as output:
57+
58+
docid | sitename | location
59+
-------+-----------------------------+------------
60+
1 | Church Farm, Ashton Keynes | SU04209424
61+
2 | Glebe Farm, Long Itchington | SP41506500
62+
(2 rows)
63+
64+
65+
"XPath" syntax supported
66+
------------------------
67+
68+
At present it only supports paths of the form:
69+
'tag1/tag2' or '/tag1/tag2'
70+
71+
The first case will find any <tag2> within a <tag1>, the second will
72+
find any <tag2> within a <tag1> at the top level of the document.
73+
74+
The real XPath is much more complex (see TODO file).
75+
76+
77+
John Gray <jgray@azuli.co.uk> 26 July 2001
78+
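
The README above describes the two functions only in prose, and the C source is not shown in this excerpt. The following Python sketch mirrors that description using the standard library's `xml.parsers.expat` binding. It is an approximation under stated assumptions, not the module's implementation: it assumes "cdata" means the text directly inside the matched element, that a malformed document yields no result, and it does not model the NULL return for a parser that cannot be created.

```python
import xml.parsers.expat


def pgxml_parse(document):
    """Return True if document is well-formed XML, False otherwise."""
    parser = xml.parsers.expat.ParserCreate()  # one fresh parser per document
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


def pgxml_xpath(document, path, n):
    """Return the text of the nth (1-based) element matching path, or None."""
    anchored = path.startswith("/")           # '/tag1/tag2' starts at the root
    wanted = [tag for tag in path.split("/") if tag]
    if not wanted or n < 1:
        return None
    stack = []          # names of the currently open elements
    open_matches = []   # (depth, text chunks) for matches still open
    results = []        # text of finished matches, in end-tag order

    def start(name, attrs):
        stack.append(name)
        candidate = stack if anchored else stack[-len(wanted):]
        if candidate == wanted:
            open_matches.append((len(stack), []))

    def chars(data):
        # Assumption: collect only text directly inside the matched element.
        if open_matches and open_matches[-1][0] == len(stack):
            open_matches[-1][1].append(data)

    def end(name):
        if open_matches and open_matches[-1][0] == len(stack):
            results.append("".join(open_matches.pop()[1]))
        stack.pop()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        return None     # assumption: no result for a malformed document
    return results[n - 1] if n <= len(results) else None


doc = (
    '<?xml version="1.0"?>\n'
    '<site provider="Foundations" sitecode="ak97" version="1">\n'
    '<name>Church Farm, Ashton Keynes</name>\n'
    '<invtype>watching brief</invtype>\n'
    '<location scheme="osgb">SU04209424</location>\n'
    '</site>\n'
)
print(pgxml_parse(doc))
print(pgxml_xpath(doc, "/site/name", 1))
print(pgxml_xpath(doc, "site/location", 1))
```

The sample document uses the lowercase `<?xml` declaration, which is the form the XML specification requires and which expat accepts.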

contrib/xml/TODO

+83
@@ -0,0 +1,83 @@
1+
PGXML TODO List
2+
===============
3+
4+
Some of these items still require much more thought! The data model
5+
for XML documents and the parsing model of expat don't really fit so
6+
well with a standard SQL model.
7+
8+
1. Generalised XML parsing support
9+
10+
Allow a user to specify handlers (in any PL) to be used by the parser.
11+
This must permit distinct sets of parser settings: a user may want some
12+
documents in a database to be parsed with one set of handlers, others
13+
with a different set.
14+
15+
i.e. the pgxml_parse function would take as parameters (document,
16+
parsername) where parsername was the identifier for a collection of
17+
handler etc. settings.
18+
19+
"Stub" handlers in the pgxml code would invoke the functions through
20+
the standard fmgr interface. The parser interface would define the
21+
prototype for these functions. How does the handler function know
22+
which document/context has resulted in it being called?
23+
24+
Mechanism for defining collection of parser settings (in a table? -but
25+
maybe copied for efficiency into a structure when first required by a
26+
query?)
27+
28+
2. Support for other parsers
29+
30+
Expat may not be the best choice as a parser because a new parser
31+
instance is needed for each document i.e. all the handlers must be set
32+
again for each document. Another parser may have a more efficient way
33+
of parsing a set of documents identically.
34+
35+
3. XPath support
36+
37+
Proper XPath support. I really need to sit down and plough
38+
through the specification...
39+
40+
The very simple text comparison system currently used is too
41+
basic. Need to convert the path to an ordered list of nodes. Each node
42+
is an element qualifier, and may have a list of attribute
43+
qualifications attached. This probably requires a lex/yacc combination.
44+
(James Clark has written a yacc grammar for XPath). Not all the
45+
features of XPath are necessarily relevant.
46+
47+
An option to return subdocuments (i.e. subelements AND cdata, not just
48+
cdata). This should maybe be the default.
49+
50+
4. Multiple occurrences of elements.
51+
52+
This section is all very sketchy, and has various weaknesses.
53+
54+
Is there a good way to optimise/index the results of certain XPath
55+
operations to make them faster?:
56+
57+
select docid, pgxml_xpath(document,'/site/location',1) as location
58+
from docstore where pgxml_xpath(document,'/site/name',1) = 'Church Farm';
59+
60+
and with multiple element occurrences in a document?
61+
62+
select d.docid, pgxml_xpath(d.document,'/site/location',1)
63+
from docstore d,
64+
pgxml_xpaths('docstore','document','feature/type','docid') ft
65+
where ft.key = d.docid and ft.value ='Limekiln';
66+
67+
pgxml_xpaths params are relname, attrname, xpath, returnkey. It would
68+
return a set of two-element tuples (key,value) consisting of the value of
69+
returnkey, and the cdata value of the xpath. The XML document would be
70+
defined by relname and attrname.
71+
72+
The pgxml_xpaths function could be the basis of a functional index,
73+
which could speed up the above query very substantially, working
74+
through the normal query planner mechanism. Syntax above is fragile
75+
through using names rather than OID.
76+
77+
John Gray <jgray@azuli.co.uk>
78+
79+
80+
81+
82+
83+
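
Item 4 of the TODO proposes a `pgxml_xpaths` function that returns (key, value) tuples, one per occurrence of a path. It is only a proposal in this commit, so the sketch below is a guess at its behaviour rather than a description of existing code. It takes already-fetched `(docid, document)` rows instead of the relname/attrname/returnkey parameters, and the helper `xpath_values` and the sample documents are hypothetical.

```python
import xml.parsers.expat


def xpath_values(document, path):
    """Return the text of every element matching a 'tag1/tag2' style path."""
    anchored = path.startswith("/")
    wanted = [tag for tag in path.split("/") if tag]
    stack, open_matches, values = [], [], []

    def start(name, attrs):
        stack.append(name)
        candidate = stack if anchored else stack[-len(wanted):]
        if wanted and candidate == wanted:
            open_matches.append((len(stack), []))

    def chars(data):
        # Assumption: only text directly inside the matched element counts.
        if open_matches and open_matches[-1][0] == len(stack):
            open_matches[-1][1].append(data)

    def end(name):
        if open_matches and open_matches[-1][0] == len(stack):
            values.append("".join(open_matches.pop()[1]))
        stack.pop()

    parser = xml.parsers.expat.ParserCreate()  # expat needs a new parser per document
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(document, True)
    return values


def pgxml_xpaths(rows, path):
    """Yield (key, value) for each occurrence of path in each (key, document) row."""
    for key, document in rows:
        for value in xpath_values(document, path):
            yield key, value


# Hypothetical rows standing in for the docstore table.
rows = [
    (1, "<site><feature><type>Limekiln</type></feature>"
        "<feature><type>Ditch</type></feature></site>"),
    (2, "<site><feature><type>Pit</type></feature></site>"),
]
pairs = list(pgxml_xpaths(rows, "feature/type"))
limekiln_docids = [key for key, value in pairs if value == "Limekiln"]
print(pairs)
print(limekiln_docids)
```

Filtering the pairs by value plays the role of the `ft.value = 'Limekiln'` join condition in the query above; precomputing and storing such pairs is roughly what the functional index mentioned in the TODO would amount to.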
