 contrib/README       | +5 -1
 contrib/xml/Makefile | +43
 contrib/xml/README   | +78
 contrib/xml/TODO     | +83
@@ -1,6 +1,6 @@
 
 The PostgreSQL contrib tree
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+---------------------------
 
 This subtree contains tools, modules, and examples that are not
 maintained as part of the core PostgreSQL system, mainly because
@@ -177,3 +177,7 @@ userlock -
 vacuumlo -
 	Remove orphaned large objects
 	by Peter T Mount <peter@retep.org.uk>
+
+xml -
+	Storing XML in PostgreSQL
+	by John Gray <jgray@beansindustry.co.uk>
@@ -0,0 +1,43 @@
+#-------------------------------------------------------------------------
+#
+# Makefile--
+# Adapted from tutorial makefile
+#-------------------------------------------------------------------------
+
+subdir = contrib/xml
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+
+override CFLAGS+= $(CFLAGS_SL)
+
+
+#
+# DLOBJS is the dynamically-loaded object files.  The "funcs" queries
+# include CREATE FUNCTIONs that load routines from these files.
+#
+DLOBJS= pgxml$(DLSUFFIX)
+
+
+QUERIES= pgxml.sql
+
+all: $(DLOBJS) $(QUERIES)
+
+# Requires the expat library
+
+%.so: %.o
+	$(CC) -shared -o $@ $< -lexpat
+
+
+%.sql: %.source
+	if [ -z "$$USER" ]; then USER=$$LOGNAME; fi; \
+	if [ -z "$$USER" ]; then USER=`whoami`; fi; \
+	if [ -z "$$USER" ]; then echo 'Cannot deduce $$USER.'; exit 1; fi; \
+	rm -f $@; \
+	C=`pwd`; \
+	sed -e "s:_CWD_:$$C:g" \
+	    -e "s:_OBJWD_:$$C:g" \
+	    -e "s:_DLSUFFIX_:$(DLSUFFIX):g" \
+	    -e "s/_USER_/$$USER/g" < $< > $@
+
+clean:
+	rm -f $(DLOBJS) $(QUERIES)
@@ -0,0 +1,78 @@
+This package contains a couple of simple routines for hooking the
+expat XML parser up to PostgreSQL. This is a work-in-progress and all
+very basic at the moment (see the file TODO for some outline of what
+remains to be done).
+
+At present, two functions are defined, one which checks
+well-formedness, and the other which performs very simple XPath-type
+queries.
+
+Prerequisite:
+
+expat parser 1.95.0 or newer (http://expat.sourceforge.net)
+
+I used a shared library version; I'm sure you could use a static
+library if you wished, though. I had no problems compiling from source.
+
+Function documentation and usage:
+---------------------------------
+
+pgxml_parse(text) returns bool
+  parses the provided text and returns true if it is well-formed,
+false if it is not. It returns NULL if the parser couldn't be
+created for any reason.
+
+pgxml_xpath(text doc, text xpath, int n) returns text
+  parses doc and returns the cdata of the nth occurrence of
+the "XPath" listed. See below for details on the syntax.
+
+
+Example:
+
+Given a table docstore:
+
+ Attribute |  Type   | Modifier 
+-----------+---------+----------
+ docid     | integer | 
+ document  | text    | 
+
+containing documents such as (these are archaeological site
+descriptions, in case anyone is wondering):
+
+<?xml version="1.0"?>
+<site provider="Foundations" sitecode="ak97" version="1">
+   <name>Church Farm, Ashton Keynes</name>
+   <invtype>watching brief</invtype>
+   <location scheme="osgb">SU04209424</location>
+</site>
+
+one can type:
+
+select docid, 
+pgxml_xpath(document,'/site/name',1) as sitename,
+pgxml_xpath(document,'/site/location',1) as location
+ from docstore;
+ 
+and get as output:
+
+ docid |          sitename           |  location  
+-------+-----------------------------+------------
+     1 | Church Farm, Ashton Keynes  | SU04209424
+     2 | Glebe Farm, Long Itchington | SP41506500
+(2 rows)
+
+
+"XPath" syntax supported
+------------------------
+
+At present it only supports paths of the form:
+'tag1/tag2' or '/tag1/tag2'
+
+The first case will find any <tag2> within a <tag1>; the second will
+find any <tag2> within a <tag1> at the top level of the document.
+
+The real XPath is much more complex (see TODO file).
+
+
+John Gray <jgray@azuli.co.uk>  26 July 2001
+
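The behaviour the README describes for pgxml_parse and pgxml_xpath can be sketched outside the database. The following is a minimal illustration in Python using the standard library's expat binding, not the package's C code. The names is_well_formed and simple_xpath are hypothetical, and two details are assumptions because the README does not settle them: only text directly inside the matching element is collected, and a document that fails to parse yields None.

```python
import xml.parsers.expat


def is_well_formed(text):
    """Return True if text parses as well-formed XML, False otherwise."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


def simple_xpath(doc, path, n):
    """Return the character data of the nth (1-based) element matching path.

    'tag1/tag2' matches a <tag2> directly inside a <tag1> anywhere;
    '/tag1/tag2' matches only when <tag1> is the document element.
    """
    absolute = path.startswith("/")
    wanted = [step for step in path.split("/") if step]
    stack = []    # names of the currently open elements
    buffers = []  # one entry per open element: a text list, or None
    found = []    # text chunks of each match, in document order

    def start(name, attrs):
        stack.append(name)
        tail = stack if absolute else stack[-len(wanted):]
        if wanted and tail == wanted:
            found.append([])
            buffers.append(found[-1])
        else:
            buffers.append(None)

    def end(name):
        stack.pop()
        buffers.pop()

    def chars(data):
        # Assumption: keep only text directly inside the matching element.
        if buffers and buffers[-1] is not None:
            buffers[-1].append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    try:
        parser.Parse(doc, True)
    except xml.parsers.expat.ExpatError:
        return None  # assumption: a malformed document gives no result
    if 1 <= n <= len(found):
        return "".join(found[n - 1])
    return None
```

Called on the site document above, simple_xpath(document, '/site/name', 1) gives the site name, mirroring the first column of the example query. A fresh parser is created per call because an expat parser instance handles a single document.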
@@ -0,0 +1,83 @@
+PGXML TODO List
+===============
+
+Some of these items still require much more thought! The data model
+for XML documents and the parsing model of expat don't really fit so
+well with a standard SQL model.
+
+1. Generalised XML parsing support
+
+Allow a user to specify handlers (in any PL) to be used by the parser.
+This must permit distinct sets of parser settings: a user may want some
+documents in a database to be parsed with one set of handlers, others
+with a different set.
+
+i.e. the pgxml_parse function would take as parameters (document,
+parsername) where parsername was the identifier for a collection of
+handler etc. settings.
+
+"Stub" handlers in the pgxml code would invoke the functions through
+the standard fmgr interface. The parser interface would define the
+prototype for these functions. How does the handler function know
+which document/context has resulted in it being called?
+
+Mechanism for defining collection of parser settings (in a table? -but
+maybe copied for efficiency into a structure when first required by a
+query?)
+
+2. Support for other parsers
+
+Expat may not be the best choice as a parser because a new parser
+instance is needed for each document i.e. all the handlers must be set
+again for each document. Another parser may have a more efficient way
+of parsing a set of documents identically.
+
+3. XPath support
+
+Proper XPath support. I really need to sit down and plough
+through the specification...
+
+The very simple text comparison system currently used is too
+basic. Need to convert the path to an ordered list of nodes. Each node
+is an element qualifier, and may have a list of attribute
+qualifications attached. This probably requires a lex/yacc combination.
+(James Clark has written a yacc grammar for XPath). Not all the
+features of XPath are necessarily relevant.
+
+An option to return subdocuments (i.e. subelements AND cdata, not just
+cdata). This should maybe be the default.
+
+4. Multiple occurrences of elements
+
+This section is all very sketchy, and has various weaknesses.
+ 
+Is there a good way to optimise/index the results of certain XPath
+operations to make them faster?:
+
+select docid, pgxml_xpath(document,'/site/location',1) as location
+from docstore where pgxml_xpath(document,'/site/name',1) = 'Church Farm';
+
+and with multiple element occurrences in a document?
+
+select d.docid, pgxml_xpath(d.document,'/site/location',1) 
+from docstore d, 
+pgxml_xpaths('docstore','document','feature/type','docid') ft 
+where ft.key = d.docid and ft.value ='Limekiln';
+
+pgxml_xpaths params are relname, attrname, xpath, returnkey. It would
+return a set of two-element tuples (key,value) consisting of the value of
+returnkey, and the cdata value of the xpath. The XML document would be
+defined by relname and attrname.
+
+The pgxml_xpaths function could be the basis of a functional index,
+which could speed up the above query very substantially, working
+through the normal query planner mechanism. The syntax above is fragile
+because it uses names rather than OIDs.
+ 
+John Gray <jgray@azuli.co.uk>
+
+
+
+
+
+
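The (key, value) result set proposed for pgxml_xpaths in item 4 can be illustrated with a small Python sketch. This is not part of the package and only mimics the idea: the helper names all_matches and xpaths are hypothetical, rows stand in for (docid, document) pairs read from the table, and the sample documents in the usage note are invented for illustration.

```python
import xml.parsers.expat


def all_matches(doc, path):
    """Return the text of every element matching path, in document order."""
    absolute = path.startswith("/")
    wanted = [step for step in path.split("/") if step]
    stack, buffers, found = [], [], []

    def start(name, attrs):
        stack.append(name)
        tail = stack if absolute else stack[-len(wanted):]
        if wanted and tail == wanted:
            found.append([])
            buffers.append(found[-1])
        else:
            buffers.append(None)

    def end(name):
        stack.pop()
        buffers.pop()

    def chars(data):
        # Collect only text directly inside a matching element.
        if buffers and buffers[-1] is not None:
            buffers[-1].append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(doc, True)
    return ["".join(chunks) for chunks in found]


def xpaths(rows, path):
    """Yield one (key, value) pair per match, for each (key, document) row."""
    for key, document in rows:
        for value in all_matches(document, path):
            yield key, value
```

With hypothetical rows such as (1, '<site><feature><type>Limekiln</type></feature></site>'), filtering the pairs from xpaths(rows, 'feature/type') on value = 'Limekiln' plays the role of the ft.value condition in the query above, and the returned key is what would be joined against docid.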