| 1 | +PGXML TODO List |
| 2 | +=============== |
| 3 | + |
| 4 | +Some of these items still require much more thought! Since the first |
| 5 | +release, the XPath support has improved (because I'm no longer using a |
| 6 | +homemade algorithm!). |
| 7 | + |
| 8 | +1. Performance considerations |
| 9 | + |
| 10 | +At present each document is parsed to produce the DOM tree on every query. |
| 11 | + |
| 12 | +Pros: |
| 13 | + Easy |
| 14 | + No persistent memory or storage allocation for parsed trees |
| 15 | + (libxml docs suggest representation of a document might |
| 16 | + be 4 times the size of the text) |
| 17 | + |
| 18 | +Cons: |
| 19 | + Slow/CPU-intensive to parse. |
| 20 | + Makes it difficult for PLs to apply libxml manipulations to create |
| 21 | + new documents or amend existing ones. |
| 22 | + |
| 23 | + |
| 24 | +2. XQuery |
| 25 | + |
| 26 | +I'm not sure if the addition of XQuery would be best as a function or |
| 27 | +as a new front-end parser. This is one to think about, but with a |
| 28 | +decent implementation of XPath, one of the prerequisites is covered. |
| 29 | + |
| 30 | +3. DOM Interfaces |
| 31 | + |
| 32 | +Expose more aspects of the DOM to user functions/PLs. This would |
| 33 | +allow a procedure in a PL to run some queries and then use exposed |
| 34 | +interfaces to libxml to create an XML document out of the query |
| 35 | +results. I accept the argument that this might be more properly |
| 36 | +performed on the client side. |
| 37 | + |
| 38 | +4. Returning sets of documents from XPath queries. |
| 39 | + |
| 40 | +Although the current implementation allows you to amalgamate the |
| 41 | +returned results into a single document, it's quite possible that |
| 42 | +you'd like to use the returned set of nodes as a source for FROM. |
| 43 | + |
| 44 | +Is there a good way to optimise/index the results of certain XPath |
| 45 | +operations to make them faster? For example: |
| 46 | + |
| 47 | +select docid, pgxml_xpath(document,'//site/location/text()','','') as location |
| 48 | +from docstore where pgxml_xpath(document,'//site/name/text()','','') = 'Church Farm'; |
| 49 | + |
| 50 | +And with multiple occurrences of an element in a document? |
| 51 | + |
| 52 | +select d.docid, pgxml_xpath(d.document,'//site/location/text()','','') |
| 53 | +from docstore d, |
| 54 | +pgxml_xpaths('docstore','document','//feature/type/text()','docid') ft |
| 55 | +where ft.key = d.docid and ft.value ='Limekiln'; |
| 56 | + |
| 57 | +The proposed pgxml_xpaths parameters are relname, attrname, xpath, returnkey. |
| 58 | +It would return a set of two-element tuples (key,value): key is the value of |
| 59 | +returnkey, and value is the cdata value matched by the xpath. The XML document |
| 60 | +would be identified by relname and attrname. |
| 61 | + |
| 62 | +The pgxml_xpaths function could be the basis of a functional index, |
| 63 | +which could speed up the above query very substantially, working |
| 64 | +through the normal query planner mechanism. |
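The intended (key, value) semantics of the proposed pgxml_xpaths function, and how its output could feed an index, can be sketched as follows. This is an assumption-laden illustration, not the C implementation: it uses Python's standard `xml.etree.ElementTree`, whose limited XPath subset means `.//feature/type` plus `.text` stands in for `//feature/type/text()`. The names `DOCSTORE`, `xpaths` and `build_index`, and the sample rows, are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical contents of the docstore relation (made-up sample rows).
DOCSTORE = [
    {"docid": 1,
     "document": "<site><name>Church Farm</name>"
                 "<feature><type>Limekiln</type></feature>"
                 "<feature><type>Barn</type></feature></site>"},
    {"docid": 2,
     "document": "<site><name>Mill Lane</name>"
                 "<feature><type>Limekiln</type></feature></site>"},
]


def xpaths(rows, attrname, path, returnkey):
    """Yield one (key, value) tuple per node matched in each row's document."""
    for row in rows:
        root = ET.fromstring(row[attrname])
        for node in root.findall(path):
            yield (row[returnkey], node.text)


def build_index(pairs):
    """Map each value to the keys it occurs in; a stand-in for a real index."""
    index = defaultdict(list)
    for key, value in pairs:
        index[value].append(key)
    return index


pairs = list(xpaths(DOCSTORE, "document", ".//feature/type", "docid"))
index = build_index(pairs)
```

A document with several matching elements contributes several tuples, so a lookup such as `index["Limekiln"]` returns every docid containing that feature type without reparsing any document. A real functional index would do the equivalent work inside the query planner.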
| 65 | + |
| 66 | +5. Return type support. |
| 67 | + |
| 68 | +Better support for returning e.g. numeric or boolean values. I need to |
| 69 | +get to grips with the returned data from libxml first. |
| 70 | + |
| 71 | + |
| 72 | +John Gray <jgray@azuli.co.uk> 16 August 2001 |
| 73 | + |
| 74 | + |
| 75 | + |
| 76 | + |
| 77 | + |
| 78 | + |