Abstract. As part of the INEX 2006 Adhoc Track, we conducted a range of experiments with filtering and clustering XML element retrieval results. Our basic retrieval engine retrieves arbitrary elements from the collection (corresponding to the Thorough Task). These runs are filtered to ...
Use of XML offers a structured approach for representing information while maintaining separation of form and content. XML information retrieval is different from standard text retrieval in two aspects: the XML structure may be of interest as part of the query; and the information does not have to be text. In this paper, we describe an investigation of approaches to retrieve text and images from a large collection of XML documents, performed in the course of our participation in the INEX 2006 Ad Hoc and Multimedia tracks. We evaluate three information retrieval similarity measures: Pivoted Cosine, Okapi BM25 and Dirichlet. We show that on the INEX 2006 Ad Hoc queries Okapi BM25 is the most effective among the three similarity measures used for retrieving text only, while Dirichlet is more suitable when retrieving heterogeneous (text and image) data.
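For readers unfamiliar with Okapi BM25, the sketch below scores tokenized text units (for example, the text of individual XML elements) against a query. It is an illustration only: the parameters `k1 = 1.2` and `b = 0.75` are common textbook defaults, not the settings used in the paper. The IDF shown is the non-negative variant used by Lucene rather than the classic form. Pivoted Cosine and Dirichlet smoothing are not shown.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document (e.g. an XML element's text) against a tokenized query.

    k1 and b are assumed textbook defaults, not values taken from the paper.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (as in Lucene); the classic form omits the "+ 1"
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and length normalization (b)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

For example, with the query `["xml"]`, an element containing "xml" twice outranks an equally long element containing it once, and an element without the term scores zero.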
XML files contain data in a well-formatted manner, and studying the format or semantics of the grammar can help speed up data retrieval. Many algorithms have been described for searching data in XML files. A number of approaches rely on the document structure, while others are related to the contents of the document. In the structure-based approaches, the user must know the structure of the document; information retrieval techniques using NLP, by contrast, deal only with the content of the document. As a result, the results may be irrelevant or incomplete, and the search may take more time. This paper presents fast XML retrieval techniques using a new indexing technique and the concept of RXML. When indexing an XML document, the system takes into account both the document content and the document structure, and assigns a value to each tag in the file. To query the system, the user is not constrained to a fixed query format. Keywords: XML Retrieval, Indexed Search, Information Retrieval.
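The abstract does not specify how RXML indexes content and structure together. As a generic illustration only, and not the paper's technique, the sketch below builds an inverted index whose postings record both the document and the tag path where each term occurs. This lets a query optionally restrict matches to a structural context. The function names `build_index` and `search` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of (doc_id, tag_path) pairs where it occurs (hypothetical helper)."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)

        def walk(elem, path):
            path = f"{path}/{elem.tag}"
            # Direct text of this element is attributed to its full tag path
            for term in (elem.text or "").lower().split():
                index[term].add((doc_id, path))
            for child in elem:
                walk(child, path)
                # Text following a child (its "tail") belongs to the current element
                for term in (child.tail or "").lower().split():
                    index[term].add((doc_id, path))

        walk(root, "")
    return index

def search(index, term, path_suffix=None):
    """Return sorted hits for a term, optionally restricted to paths ending in path_suffix."""
    hits = index.get(term.lower(), set())
    if path_suffix is not None:
        hits = {h for h in hits if h[1].endswith(path_suffix)}
    return sorted(hits)
```

A keyword-only query uses `search(index, "xml")`; adding a suffix such as `"/sec"` narrows the results to terms found inside `sec` elements, so the user is not forced to spell out the full path.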
Since 2002, INEX has been working towards the goal of establishing an infrastructure, in the form of a large XML test collection and appropriate scoring methods, for the evaluation of content-oriented XML retrieval systems. This paper provides an overview of the work carried out as part of INEX 2005.
XML is a flexible and powerful tool that enables information and security sharing in heterogeneous environments. Scalable technologies are needed to effectively manage the growing volumes of XML data. A wide variety of methods exist for storing and searching XML data; the two most common techniques are conventional tree-based and relational approaches. Tree-based approaches represent XML as a tree
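To contrast the two storage styles, the sketch below takes a tree parsed with `xml.etree.ElementTree` and shreds it into a single relational "edge" table, where each element becomes a row with a pointer to its parent. Parent/child navigation then becomes a SQL self-join. This is one common, simple relational mapping chosen for illustration; the abstract does not say which mapping the authors evaluate. The table and function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(xml_text, conn):
    """Store every element as a row (id, parent_id, tag, text) in a hypothetical 'edge' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edge ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, text TEXT)"
    )

    def insert(elem, parent_id):
        # id is assigned by SQLite, so several documents can share one table
        cur = conn.execute(
            "INSERT INTO edge (parent_id, tag, text) VALUES (?, ?, ?)",
            (parent_id, elem.tag, (elem.text or "").strip()),
        )
        node_id = cur.lastrowid
        for child in elem:
            insert(child, node_id)

    insert(ET.fromstring(xml_text), None)  # the root has no parent (NULL)
    conn.commit()

# Tree navigation "article/title" expressed as a relational self-join
TITLE_QUERY = (
    "SELECT c.text FROM edge p JOIN edge c ON c.parent_id = p.id "
    "WHERE p.tag = 'article' AND c.tag = 'title'"
)
```

Usage: `conn = sqlite3.connect(":memory:")`, then `shred(xml_text, conn)` and `conn.execute(TITLE_QUERY).fetchall()`. The trade-off is visible even at this size: the relational form reuses a mature query engine, but every step down the tree costs a join.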
A realistic measure of relevance is necessary for meaningful comparison of alternative XML retrieval approaches. Previous studies have shown that the current INEX relevance definition, comprising two dimensions based on topical relevance, is too hard for users to understand. In this paper, we propose and evaluate a new relevance definition that uses a five-point scale to assess the
This paper describes the retrieval approach proposed by the SIG/EVI group of the IRIT research centre in the INEX 2004 evaluation. The approach uses a voting method, combined with additional processing steps, to answer content-only (CO) and content-and-structure (CAS) queries. It builds on our previous work on automatic text categorization.
As opposed to traditional Information Retrieval (IR), which views whole documents as atomic units of retrieval, XML IR processes XML elements as possible units of retrieval. Many open issues arise when considering Relevance Feedback (RF) in XML documents. They are mainly related to the form of XML documents, which mix content and structure, and to the new granularity of information processed by Information Retrieval Systems (IRS). Most of the RF approaches proposed in XML retrieval are simple adaptations of traditional RF to the new granularity of information: they enrich queries by adding terms extracted from relevant elements instead of terms extracted from whole documents. In this paper, we propose to expand the initial query by adding both content and structural constraints. Experiments carried out within the INEX evaluation campaign show the value of our method.
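The contrast between term-only feedback and feedback that also adds structure can be sketched in a few lines. The heuristic below is hypothetical and is not the method proposed in the paper. It adds the `k` most frequent new terms from the elements the user judged relevant (content expansion) and then targets the element type that was most often relevant (structural expansion). The output is a NEXI-style `//tag[about(., terms)]` string. The function name and the frequency-based selection are assumptions for illustration.

```python
from collections import Counter

def expand_query(query_terms, relevant_elements, k=2):
    """Hypothetical content + structure expansion.

    relevant_elements: non-empty list of (tag, tokens) pairs judged relevant by the user.
    """
    term_counts = Counter()
    tag_counts = Counter()
    for tag, tokens in relevant_elements:
        tag_counts[tag] += 1
        # Only count terms not already in the query
        term_counts.update(t for t in tokens if t not in query_terms)
    # Content expansion: k most frequent new terms (ties keep first-seen order)
    new_terms = [t for t, _ in term_counts.most_common(k)]
    # Structural expansion: the element type most often judged relevant
    best_tag = tag_counts.most_common(1)[0][0]
    terms = " ".join(list(query_terms) + new_terms)
    return f"//{best_tag}[about(., {terms})]"
```

Classic element-level feedback would stop after `new_terms`; the extra structural step is what turns a keyword query into one that also says *where* to look.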
Purpose – Information retrieval (IR) and feedback in Extensible Markup Language (XML) are rather new fields for researchers, and natural questions arise, such as: how good are the feedback algorithms in XML IR? Can they be evaluated with standard evaluation tools? Even though some evaluation methods have been proposed in the literature, it is still not clear which of them
Abstract. Different scenarios of XML retrieval are analysed in the INEX 2005 ad hoc track, which reflect different query interpretations and user behaviours that may be observed during XML retrieval. The RMIT University group's participation in the INEX 2005 ad hoc track in- ...