| 1 | +PGXML TODO List |
| 2 | +=============== |
| 3 | + |
| 4 | +Some of these items still require much more thought! The data model |
| 5 | +for XML documents and the parsing model of expat don't really fit so |
| 6 | +well with a standard SQL model. |
| 7 | + |
| 8 | +1. Generalised XML parsing support |
| 9 | + |
| 10 | +Allow a user to specify handlers (in any PL) to be used by the parser. |
| 11 | +This must permit distinct sets of parser settings: a user may want some |
| 12 | +documents in a database to be parsed with one set of handlers, others |
| 13 | +with a different set. |
| 14 | + |
| 15 | +That is, the pgxml_parse function would take the parameters (document, |
| 16 | +parsername), where parsername identifies a collection of handler and |
| 17 | +other parser settings. |
| 18 | + |
| 19 | +"Stub" handlers in the pgxml code would invoke the functions through |
| 20 | +the standard fmgr interface. The parser interface would define the |
| 21 | +prototype for these functions. How does the handler function know |
| 22 | +which document/context has resulted in it being called? |
| 23 | + |
| 24 | +A mechanism is needed for defining a collection of parser settings (in a |
| 25 | +table, but perhaps copied for efficiency into a structure when first |
| 26 | +required by a query?) |
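As a rough illustration of the named-settings idea, the sketch below uses Python's standard xml.parsers.expat binding rather than the C fmgr interface. The registry, register_parser and the Python pgxml_parse are hypothetical stand-ins, not existing pgxml code. It also shows one possible answer to the context question: each handler is created as a closure over a per-document context object, so it always knows which parse invoked it.

```python
import xml.parsers.expat

# Hypothetical registry: parser name -> factory that builds a handler set.
PARSER_SETTINGS = {}

def register_parser(name, make_handlers):
    """Store a named collection of handler settings."""
    PARSER_SETTINGS[name] = make_handlers

def pgxml_parse(document, parsername):
    """Parse document with the handlers registered under parsername."""
    # The per-document context is captured by the handlers as a closure.
    context = {"parsername": parsername, "result": []}
    handlers = PARSER_SETTINGS[parsername](context)
    parser = xml.parsers.expat.ParserCreate()
    for attr, fn in handlers.items():
        setattr(parser, attr, fn)
    parser.Parse(document, True)
    return context["result"]

def element_names(context):
    def start(name, attrs):
        context["result"].append(name)
    return {"StartElementHandler": start}

def text_chunks(context):
    def chardata(data):
        context["result"].append(data)
    return {"CharacterDataHandler": chardata}

register_parser("names", element_names)
register_parser("text", text_chunks)

DOC = "<site><name>Church Farm</name><location>unknown</location></site>"
```

Calling pgxml_parse(DOC, "names") and pgxml_parse(DOC, "text") parses the same document with two different handler sets.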
| 27 | + |
| 28 | +2. Support for other parsers |
| 29 | + |
| 30 | +Expat may not be the best choice as a parser because a new parser |
| 31 | +instance is needed for each document, i.e. all the handlers must be set |
| 32 | +again for each document. Another parser may have a more efficient way |
| 33 | +of parsing a set of documents identically. |
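The per-document cost described above can be seen in a small sketch using Python's xml.parsers.expat binding. The helper names make_parser and count_elements are made up for illustration. Every document gets its own parser instance, and the handlers are attached again each time.

```python
import xml.parsers.expat

def make_parser(handlers):
    """Create a fresh expat parser and attach every handler to it."""
    parser = xml.parsers.expat.ParserCreate()
    for attr, fn in handlers.items():
        setattr(parser, attr, fn)
    return parser

def count_elements(documents):
    """Count start tags per document, using one parser instance per document."""
    counts = []
    parsers_created = 0
    for doc in documents:
        seen = []

        def start(name, attrs, seen=seen):
            seen.append(name)

        # Handlers have to be set again on every new parser.
        parser = make_parser({"StartElementHandler": start})
        parsers_created += 1
        parser.Parse(doc, True)
        counts.append(len(seen))
    return counts, parsers_created
```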
| 34 | + |
| 35 | +3. XPath support |
| 36 | + |
| 37 | +Proper XPath support. I really need to sit down and plough |
| 38 | +through the specification... |
| 39 | + |
| 40 | +The very simple text comparison system currently used is too |
| 41 | +basic. Need to convert the path to an ordered list of nodes. Each node |
| 42 | +is an element qualifier, and may have a list of attribute |
| 43 | +qualifications attached. This probably requires a lex/yacc combination. |
| 44 | +(James Clark has written a yacc grammar for XPath). Not all the |
| 45 | +features of XPath are necessarily relevant. |
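The "ordered list of nodes" representation could look something like the sketch below. It is a hand-rolled regular-expression splitter, not a lex/yacc grammar and not real XPath: it assumes absolute paths, simple element names, predicates of the form [@attr='value'] only, and no "/" inside attribute values. Step and parse_path are hypothetical names.

```python
import re
from typing import NamedTuple

class Step(NamedTuple):
    element: str   # element qualifier
    attrs: dict    # attribute qualifications attached to this step

_NAME = r"[A-Za-z_][\w.-]*"
_STEP = re.compile(r"^(" + _NAME + r")((?:\[@" + _NAME + r"='[^']*'\])*)$")
_PRED = re.compile(r"\[@(" + _NAME + r")='([^']*)'\]")

def parse_path(path):
    """Convert a simple absolute path into an ordered list of Steps."""
    if not path.startswith("/"):
        raise ValueError("only absolute paths are handled in this sketch")
    steps = []
    for part in path.strip("/").split("/"):
        m = _STEP.match(part)
        if m is None:
            raise ValueError(f"unsupported step: {part!r}")
        steps.append(Step(m.group(1), dict(_PRED.findall(m.group(2)))))
    return steps
```

For example, parse_path("/site/feature[@type='Limekiln']/name") yields three steps, the second carrying the qualification type = 'Limekiln'.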
| 46 | + |
| 47 | +An option to return subdocuments (i.e. subelements AND cdata, not just |
| 48 | +cdata). This should maybe be the default. |
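The difference between the two return modes can be sketched with Python's xml.etree.ElementTree. The function name pgxml_xpath_sketch, its subdocument flag and the sample document are assumptions made for illustration, and only plain /a/b/c paths are handled.

```python
import xml.etree.ElementTree as ET

def pgxml_xpath_sketch(document, path, subdocument=False):
    """Return the cdata of the first match, or the whole subdocument."""
    root = ET.fromstring(document)
    names = path.strip("/").split("/")
    if names[0] != root.tag:
        return None
    el = root if len(names) == 1 else root.find("/".join(names[1:]))
    if el is None:
        return None
    if subdocument:
        el.tail = None  # drop text that follows the element itself
        return ET.tostring(el, encoding="unicode")
    # cdata only: text directly inside the element, skipping subelements
    return (el.text or "") + "".join(child.tail or "" for child in el)

# Made-up sample document with mixed content.
DOC = "<site><location>near <grid>SE123456</grid> beck</location></site>"
```

With subdocument=False the grid element and its content are dropped; with subdocument=True the location element is returned as markup, subelements included.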
| 49 | + |
| 50 | +4. Multiple occurrences of elements |
| 51 | + |
| 52 | +This section is all very sketchy, and has various weaknesses. |
| 53 | + |
| 54 | +Is there a good way to optimise/index the results of certain XPath |
| 55 | +operations to make them faster? For example: |
| 56 | + |
| 57 | +select docid, pgxml_xpath(document,'/site/location',1) as location |
| 58 | +from docstore where pgxml_xpath(document,'/site/name',1) = 'Church Farm'; |
| 59 | + |
| 60 | +and with multiple element occurrences in a document? |
| 61 | + |
| 62 | +select d.docid, pgxml_xpath(d.document,'/site/location',1) |
| 63 | +from docstore d, |
| 64 | +pgxml_xpaths('docstore','document','feature/type','docid') ft |
| 65 | +where ft.key = d.docid and ft.value ='Limekiln'; |
| 66 | + |
| 67 | +The pgxml_xpaths parameters are relname, attrname, xpath and returnkey. It |
| 68 | +would return a set of two-element tuples (key,value) consisting of the value |
| 69 | +of returnkey and the cdata value of the xpath. The XML document would be |
| 70 | +defined by relname and attrname. |
| 71 | + |
| 72 | +The pgxml_xpaths function could be the basis of a functional index, |
| 73 | +which could speed up the above query very substantially, working |
| 74 | +through the normal query planner mechanism. The syntax above is fragile |
| 75 | +because it uses names rather than OIDs. |
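A minimal sketch of the (key,value) idea, in Python with xml.etree.ElementTree. Unlike the proposed SQL function, this pgxml_xpaths takes the rows directly instead of relname/attrname/returnkey. build_index and the sample rows are made up, and the dictionary only imitates what a functional index would offer the planner.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def pgxml_xpaths(rows, xpath):
    """Yield (key, value) for every element matching xpath in every row."""
    for key, document in rows:
        root = ET.fromstring(document)
        for el in root.findall(xpath):  # path is relative to the root element
            yield key, (el.text or "")

def build_index(pairs):
    """Map each value to the set of keys whose documents contain it."""
    index = defaultdict(set)
    for key, value in pairs:
        index[value].add(key)
    return index

# Made-up stand-in for the docstore table: (docid, document) rows.
DOCSTORE = [
    (1, "<site><name>Church Farm</name>"
        "<feature><type>Limekiln</type></feature>"
        "<feature><type>Barn</type></feature></site>"),
    (2, "<site><name>Mill Lane</name>"
        "<feature><type>Limekiln</type></feature></site>"),
]
```

Looking up build_index(pgxml_xpaths(DOCSTORE, "feature/type"))["Limekiln"] gives the docids to join against, without reparsing every document per query.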
| 76 | + |
| 77 | +John Gray <jgray@azuli.co.uk> |