Abstract
We study the expressiveness and performance of full-text search languages. Our motivation is to provide a formal basis for comparing full-text search languages and to develop a model for full-text search that can be tightly integrated with structured search. We design a model based on the positions of tokens (words) in the input text, and develop a full-text calculus (FTC) and a full-text algebra (FTA) with equivalent expressive power; this suggests a notion of completeness for full-text search languages. We show that existing full-text languages are incomplete and identify a practical subset of the FTC and FTA that is more powerful than existing languages, but which can still be evaluated efficiently.
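To make the idea of a position-based model concrete, the following is a minimal illustrative sketch, not the paper's FTC or FTA. It assumes whitespace tokenization and lowercasing. The helper names `token_positions`, `within_distance`, and `contains_phrase` are hypothetical and chosen only for this example. It shows how two common full-text predicates (proximity and phrase matching) can be expressed purely in terms of token positions.

```python
from collections import defaultdict


def token_positions(text):
    """Map each lowercased token to the list of positions where it occurs.

    Assumption: tokens are whitespace-separated words; position 0 is the first token.
    """
    index = defaultdict(list)
    for pos, token in enumerate(text.lower().split()):
        index[token].append(pos)
    return dict(index)


def within_distance(index, a, b, k):
    """True if some occurrence of `a` and some occurrence of `b` are at most k positions apart."""
    return any(
        abs(p - q) <= k
        for p in index.get(a, [])
        for q in index.get(b, [])
    )


def contains_phrase(index, words):
    """True if `words` occur at consecutive positions, in the given order."""
    if not words:
        return False
    return any(
        all(start + i in index.get(w, []) for i, w in enumerate(words))
        for start in index.get(words[0], [])
    )


if __name__ == "__main__":
    idx = token_positions("full text search languages support search over text")
    print(within_distance(idx, "full", "languages", 3))
    print(contains_phrase(idx, ["search", "over"]))
```

Both predicates are defined only over positions, which is the kind of uniform treatment that lets a position-based model sit alongside structured (relational-style) query operators.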
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Botev, C., Amer-Yahia, S., Shanmugasundaram, J. (2006). Expressiveness and Performance of Full-Text Search Languages. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_23
DOI: https://doi.org/10.1007/11687238_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32960-2
Online ISBN: 978-3-540-32961-9
eBook Packages: Computer Science (R0)