DOI: 10.1145/2009916.2009994
research-article

Indexing strategies for graceful degradation of search quality

Published: 24 July 2011

Abstract

Large web search engines process billions of queries each day over tens of billions of documents, often under very stringent requirements for a user's search experience, in particular low latency and highly relevant search results. Index generation and serving are key to satisfying both of these requirements. For example, the load on search engines can vary drastically when popular events happen around the world. When the load exceeds what the search engine can serve, queries get dropped, which results in an ungraceful degradation in search quality. Another factor that can increase the query load and affect the user's search experience is ambiguous queries, which often result in the execution of multiple query alterations in the back end.
In this paper, we look into the problem of designing robust indexing strategies, i.e., strategies that allow for a graceful degradation of search quality in both of the above scenarios. We study the problems of index generation and serving using the notions of document allocation, server selection, and document replication. We explore the space of efficient algorithms for these problems and empirically corroborate existing theory that it is hard to optimally solve the allocation and selection problems without any replication. We propose a greedy replication algorithm and study its performance under different choices of allocation and selection. Further, we show that under random selection and allocation, our algorithm is optimal.
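
The abstract names three notions (document allocation, server selection, and document replication) but does not spell out the greedy replication algorithm itself. The following Python sketch is therefore only an illustration of how these notions can interact, not the authors' method. It assumes a hypothetical per-document importance weight, random allocation of one copy per document, and random selection of `num_selected` out of `num_servers` servers per query. Under those assumptions, a simple greedy rule spends each extra replica on the document whose weighted probability of being reachable increases the most. All function and parameter names are made up for this example.

```python
import math
import random


def reach_probability(copies, num_servers, num_selected):
    """Probability that at least one replica sits on a selected server when
    num_selected of num_servers servers are chosen uniformly at random."""
    # math.comb returns 0 when fewer than num_selected servers lack the document.
    missed = math.comb(num_servers - copies, num_selected)
    return 1.0 - missed / math.comb(num_servers, num_selected)


def allocate_randomly(weights, num_servers, rng):
    """Random allocation: place one copy of every document on a random server."""
    return {doc: {rng.randrange(num_servers)} for doc in weights}


def replicate_greedily(placement, weights, num_servers, num_selected, budget):
    """Illustrative greedy rule (an assumption, not the paper's algorithm):
    each extra copy goes to the document with the largest weighted gain in
    reach probability, on the least-loaded server that lacks it."""
    for _ in range(budget):
        best_doc, best_gain = None, 0.0
        for doc, servers in placement.items():
            copies = len(servers)
            if copies == num_servers:
                continue  # already on every server
            gain = weights[doc] * (
                reach_probability(copies + 1, num_servers, num_selected)
                - reach_probability(copies, num_servers, num_selected)
            )
            if gain > best_gain:
                best_doc, best_gain = doc, gain
        if best_doc is None:
            break  # no replica can improve expected quality any further
        load = [0] * num_servers
        for servers in placement.values():
            for server in servers:
                load[server] += 1
        candidates = [s for s in range(num_servers) if s not in placement[best_doc]]
        placement[best_doc].add(min(candidates, key=lambda s: load[s]))
    return placement


def expected_quality(placement, weights, num_servers, num_selected):
    """Expected fraction of total document weight reachable by a query."""
    reached = sum(
        weights[doc] * reach_probability(len(servers), num_servers, num_selected)
        for doc, servers in placement.items()
    )
    return reached / sum(weights.values())


# Hypothetical toy instance: three documents, four servers, two queried per request.
rng = random.Random(0)
weights = {"a": 5.0, "b": 1.0, "c": 1.0}
placement = allocate_randomly(weights, num_servers=4, rng=rng)
before = expected_quality(placement, weights, num_servers=4, num_selected=2)
replicate_greedily(placement, weights, num_servers=4, num_selected=2, budget=1)
after = expected_quality(placement, weights, num_servers=4, num_selected=2)
print(before, after)
```

In this toy model the expected quality is a weighted sum of per-document reach probabilities, so the extra replica goes to the heavily weighted document first. Degradation under load (fewer servers selected) is then softer than with a single copy per document.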

    Published In

    SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
    July 2011
    1374 pages
    ISBN:9781450307574
    DOI:10.1145/2009916
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. graceful degradation
    2. indexing strategies
    3. search quality

    Qualifiers

    • Research-article

    Conference

    SIGIR '11
    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 5
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 01 Jan 2025.

    Cited By

    • (2018) Brownout CC: Cascaded Control for Bounding the Response Times of Cloud Applications. 2018 Annual American Control Conference (ACC), pages 3354-3361. DOI: 10.23919/ACC.2018.8431282. Online publication date: Jun-2018.
    • (2018) Cloud Application Predictability through Integrated Load-Balancing and Service Time Control. 2018 IEEE International Conference on Autonomic Computing (ICAC), pages 51-60. DOI: 10.1109/ICAC.2018.00015. Online publication date: Sep-2018.
    • (2017) Power-aware cloud brownout: Response time and power consumption control. 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pages 2686-2691. DOI: 10.1109/CDC.2017.8264049. Online publication date: Dec-2017.
    • (2015) Scalability Challenges in Web Search Engines. Synthesis Lectures on Information Concepts, Retrieval, and Services, 7(6):1-138. DOI: 10.2200/S00662ED1V01Y201508ICR045. Online publication date: 29-Dec-2015.
