Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '93, 1993
Structured texts (for example dictionaries and user manuals) typically have a hierarchical (tree-like) structure. We describe a query language for retrieving information from collections of hierarchical text. The language is based on a tree pattern matching notion called tree inclusion. Tree inclusion allows easy expression of queries that use the structure and the content of the document. In using it ...
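Tree inclusion asks whether a pattern tree can be obtained from a document tree by deleting nodes, where deleting a node makes its children children of its parent. The following is a minimal sketch of the ordered variant, assuming labelled ordered trees; the `Node` type and function names are hypothetical, and this naive recursive check takes exponential time in the worst case. It is an illustration of the definition, not the algorithm used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical labelled ordered tree node."""
    label: str
    children: tuple = ()


def forest_included(pattern, target):
    """True if the ordered forest `pattern` can be obtained from `target` by deleting nodes."""
    if not pattern:
        return True   # the empty forest is included in any forest
    if not target:
        return False  # pattern nodes remain but no target nodes are left
    p, t = pattern[0], target[0]
    # Option 1: delete the root of t, promoting its children into its place.
    # (Repeating this also covers skipping the whole subtree of t.)
    if forest_included(pattern, t.children + target[1:]):
        return True
    # Option 2: map the root of p onto the root of t; labels must agree,
    # p's children must embed below t, and the remaining patterns to the right.
    return (
        p.label == t.label
        and forest_included(p.children, t.children)
        and forest_included(pattern[1:], target[1:])
    )


def tree_included(pattern, target):
    return forest_included((pattern,), (target,))
```

For the document a(b(c), d), the pattern a(c, d) is included (delete b), while a(d, c) is not, because inclusion preserves left-to-right order.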
Computers support the management of large collections of text documents, but efficient reuse of document collections for producing new documents remains inherently difficult. We describe and discuss the design and implementation of a document assembly system ...
The following learning task is considered: Given a set S of strings consisting of basic symbols and a set C of patterns consisting of basic symbols and variables, compute a concise set C' ⊆ C such that each string in S is obtained from some pattern in C' by substituting basic symbols for the variables. We apply ...
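To make the task concrete, here is a sketch under explicit assumptions: patterns are strings in which the hypothetical marker "?" denotes a variable, each variable occurrence is replaced independently by exactly one basic symbol, and "concise" is approximated by a greedy set cover. The greedy heuristic is a swapped-in technique chosen for illustration; it is not the method of the paper.

```python
VAR = "?"  # hypothetical marker for a variable position; assumed not to be a basic symbol


def matches(pattern, s):
    """True if s is obtained from pattern by replacing each "?" with one basic symbol."""
    return len(pattern) == len(s) and all(p == VAR or p == c for p, c in zip(pattern, s))


def greedy_cover(strings, patterns):
    """Greedy set cover: repeatedly pick the pattern explaining the most uncovered strings."""
    uncovered = set(strings)
    chosen = []
    while uncovered:
        best = max(patterns, key=lambda p: sum(matches(p, s) for s in uncovered), default=None)
        covered = {s for s in uncovered if best is not None and matches(best, s)}
        if not covered:
            raise ValueError("some strings are not instances of any candidate pattern")
        chosen.append(best)
        uncovered -= covered
    return chosen
```

For S = {ab, ac, bb} and C = {a?, ?b, ab, bb}, this picks a? (covering ab and ac) and then ?b (covering bb).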
The Standard Generalized Markup Language (SGML) allows users to define document type definitions (DTDs), which are essentially extended context-free grammars in a notation that is similar to extended Backus–Naur form. The right-hand side of a production is called a content model, and its semantics can be modified by exceptions. We give precise definitions of the semantics of exceptions and prove that they do not ...
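In SGML, an inclusion exception, written +(...), lets the listed elements occur anywhere inside an element, and an exclusion exception, written -(...), forbids them. The sketch below checks a single element's children under a deliberately simplified reading of these rules: the content model is encoded as a Python regular expression over "name " tokens (a hypothetical encoding), exclusions take precedence, and included elements are set aside before matching. It ignores inheritance of exceptions by descendants and the finer interactions that the paper defines precisely.

```python
import re


def valid_children(children, model, inclusions=frozenset(), exclusions=frozenset()):
    """Check one element's child sequence against a content model plus exceptions.

    children   -- child element names in document order
    model      -- content model as a regex over "name " tokens (hypothetical encoding)
    inclusions -- names allowed anywhere (+(...) exceptions in effect)
    exclusions -- names forbidden (-(...) exceptions in effect)
    """
    # Simplification: an excluded name is rejected even if the model or an inclusion allows it.
    if any(name in exclusions for name in children):
        return False
    # Included names may occur anywhere, so drop them before matching the content model.
    core = [name for name in children if name not in inclusions]
    return re.fullmatch(model, "".join(name + " " for name in core)) is not None
```

With the model (title, para*) encoded as r"(title )(para )*" and footnote included, the children title, para, footnote, para are accepted.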
Validation of XML documents is often treated as a major operation, performed only at major transitions in the document's life cycle, after it has been created or when it enters some new stage of processing. Users editing XML documents, on the other hand, would appreciate instantaneous feedback on the correctness of the document each time anything changes. Such on-the-fly validation can be implemented in an XML editor using the current version of Java and freely available XML tools. Our experience is that on-the-fly validation can be implemented easily without introducing observable delays, even on relatively large documents. To demonstrate this, we have built an experimental XML editor that validates documents on-the-fly after every modification. The editor supports editing of DTDs and validation according to DTDs and according to schemas written in W3C XML Schema and Relax NG.
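The core loop of on-the-fly checking is simply "re-check after every edit". The sketch below shows that loop in Python with a hypothetical `EditorBuffer` class. It checks only well-formedness with the standard library's ElementTree parser, because Python's standard library does not ship a DTD, XML Schema or RELAX NG validator; the editor described above is written in Java and performs full validation.

```python
import xml.etree.ElementTree as ET


class EditorBuffer:
    """Hypothetical text buffer that re-checks the document after every modification."""

    def __init__(self, text=""):
        self.text = text
        self.error = self._check()

    def _check(self):
        # Well-formedness only; full validation would need a validating parser.
        try:
            ET.fromstring(self.text)
            return None
        except ET.ParseError as exc:
            return str(exc)

    def insert(self, pos, s):
        self.text = self.text[:pos] + s + self.text[pos:]
        self.error = self._check()
        return self.error

    def delete(self, start, end):
        self.text = self.text[:start] + self.text[end:]
        self.error = self._check()
        return self.error
```

Typing "<p>" into "<doc></doc>" yields an error message immediately, and it disappears once the matching "</p>" is inserted.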
SUMMARY We study different efficient implementations of an Aho–Corasick pattern matching automaton when searching for patterns in Unicode text. Much of the previous research has been based on the assumption of a relatively small alphabet, for example 7-bit ASCII. Our ...
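With a Unicode alphabet, a full transition table per state is impractical, so transitions must be stored sparsely. The following is a minimal Aho–Corasick sketch that keeps each state's goto transitions in a dictionary, which is only one of several possible representations; it is not one of the specific implementations compared in the paper. Patterns are assumed to be non-empty.

```python
from collections import deque


def build(patterns):
    """Build goto (dict per state), failure links, and output lists."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:  # assumes non-empty patterns
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(pat)
    # Breadth-first: failure links of shallower states are ready when needed.
    queue = deque(goto[0].values())
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s].extend(out[fail[s]])  # inherit matches ending here via the failure link
    return goto, fail, out


def search(text, patterns):
    """Return (start index, pattern) for every occurrence, in order of end position."""
    goto, fail, out = build(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for pat in out[s]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

For example, searching "ushers" for he, she, his and hers reports she at 1, he at 2 and hers at 2; the same code works unchanged on text such as "日本語のテキスト".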
... References [AHH+98a] Helena Ahonen, Barbara Heikkinen, Oskari Heinonen, Jani Jaakkola, Pekka Kilpeläinen, and Greger Lindén. ... [AHH+98b] Helena Ahonen, Barbara Heikkinen, Oskari Heinonen, Jani Jaakkola, and Mika Klemettinen. ...
ABSTRACT TranSID is a tree-based SGML transformation language, which can also be used for other SGML processing: for performing queries and for limited formatting. An evaluator of the TranSID language has been implemented and tested to run in the Linux and Solaris environments. This report serves as a reference manual for the TranSID language. The report describes the syntax and informal semantics of the language and its built-in functions, as of version 0.038 of the evaluator.
... by Helena Ahonen, Barbara Heikkinen, Oskari Heinonen, Jani Jaakkola, Pekka Kilpeläinen, Greger Lindén, Heikki Mannila. In Proc. of SGML Finland. ...
In structured text databases, documents are represented as parse trees, and different tree matching notions can be used as primitives for query languages. Two useful notions of tree matching, tree inclusion and tree pattern matching, both seem to require superlinear time. In this paper we give a general sufficient condition for a tree matching problem to be solvable in linear ...
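For contrast with tree inclusion, classical tree pattern matching requires the pattern to match a subtree exactly, except at variable leaves, which match any subtree. The sketch below is the straightforward check at every node, which takes O(nm) time for a target of n nodes and a pattern of m nodes; the wildcard label "?" and all names are hypothetical, and the sketch does not implement the linear-time condition from the paper.

```python
from dataclasses import dataclass

WILDCARD = "?"  # hypothetical: a pattern leaf labelled "?" matches any subtree


@dataclass(frozen=True)
class Node:
    label: str
    children: tuple = ()


def matches_at(p, t):
    """True if pattern p matches the subtree rooted at t."""
    if p.label == WILDCARD and not p.children:
        return True
    return (
        p.label == t.label
        and len(p.children) == len(t.children)
        and all(matches_at(pc, tc) for pc, tc in zip(p.children, t.children))
    )


def occurrences(p, t, path=()):
    """List the paths (tuples of child indices from the root) of all nodes where p matches."""
    found = [path] if matches_at(p, t) else []
    for i, child in enumerate(t.children):
        found += occurrences(p, child, path + (i,))
    return found
```

In f(g(a), g(b)), the pattern g(?) occurs at both children of the root, while g(c) does not occur at all.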
Papers by Pekka Kilpeläinen