Abstract
Term-partitioned indexes are generally inefficient for evaluating conjunctive queries, because they require long posting lists to be communicated. Document-partitioned indexes, on the other hand, incur excessive overhead because every processor participates in the evaluation of every query, so they do not scale adequately for real systems. We propose arranging a set of processors in a two-dimensional array, applying term partitioning at the row level and document partitioning at the column level. This paper addresses how to choose the number of rows and columns for a given number of available processors, and how to select appropriate ways of partitioning the index over that topology.
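As a rough illustration of the two-dimensional arrangement described above, the following sketch places each posting on a grid of `rows` x `cols` processors: the term selects the row and the document selects the column. It is one possible reading of the abstract, not the authors' implementation. The names `term_row`, `doc_column`, `build_grid_index`, and `conjunctive_query`, the CRC32 hash for terms, and the modulo rule for documents are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def term_row(term, rows):
    """Assumed rule: a deterministic hash of the term picks the row."""
    return zlib.crc32(term.encode("utf-8")) % rows


def doc_column(doc_id, cols):
    """Assumed rule: the document identifier modulo `cols` picks the column."""
    return doc_id % cols


def build_grid_index(docs, rows, cols):
    """Map each (row, col) processor to its local inverted index.

    `docs` maps an integer document id to a list of terms. A posting for
    (term, doc_id) is stored on processor (term_row(term), doc_column(doc_id)).
    """
    grid = defaultdict(lambda: defaultdict(list))
    for doc_id in sorted(docs):
        col = doc_column(doc_id, cols)
        for term in set(docs[doc_id]):
            grid[(term_row(term, rows), col)][term].append(doc_id)
    return grid


def conjunctive_query(grid, terms, rows, cols):
    """Evaluate an AND query and report which processors were contacted.

    Every column is visited (document partitioning), but inside a column
    only the rows that own a query term take part (term partitioning), and
    each of them holds only that column's fragment of the posting list.
    """
    results = []
    touched = set()
    for col in range(cols):
        candidates = None
        for term in terms:
            proc = (term_row(term, rows), col)
            touched.add(proc)
            postings = set(grid.get(proc, {}).get(term, []))
            candidates = postings if candidates is None else candidates & postings
        results.extend(candidates or [])
    return sorted(results), touched
```

Under these assumptions, a query with k terms contacts at most min(k, rows) x cols processors. Setting `rows = 1` reduces the sketch to pure document partitioning, and `cols = 1` reduces it to pure term partitioning, which is why the choice of grid shape matters.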
This research was funded by a Yahoo! Research Alliance Grant.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Feuerstein, E., Marin, M., Mizrahi, M., Gil-Costa, V., Baeza-Yates, R. (2009). Two-Dimensional Distributed Inverted Files. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds) String Processing and Information Retrieval. SPIRE 2009. Lecture Notes in Computer Science, vol 5721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03784-9_20
DOI: https://doi.org/10.1007/978-3-642-03784-9_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03783-2
Online ISBN: 978-3-642-03784-9
eBook Packages: Computer Science, Computer Science (R0)