Caching and database scaling in distributed shared-nothing information retrieval systems

Published: 01 June 1993

Abstract

A common class of existing information retrieval systems provides access to abstracts. For example, Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this paper, this database is studied using a trace-driven simulation. We focus on physical index design, inverted index caching, and database scaling in a distributed shared-nothing system. All three issues are shown to have a strong effect on response time and throughput. Database scaling is explored in two ways. The first assumes an “optimal” configuration for a single host and then linearly scales the database by duplicating the host architecture as needed. The second determines the optimal number of hosts given a fixed database size.
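
To make the idea of trace-driven evaluation of inverted index caching concrete, the following is a minimal sketch and not the simulator used in the paper. It replays a recorded sequence of term lookups against a fixed-size cache of inverted lists and reports the hit rate. The LRU replacement policy, the function name `simulate_lru_hit_rate`, and the sample trace are all assumptions made for illustration; the abstract does not state which cache policy or trace format the authors used.

```python
from collections import OrderedDict


def simulate_lru_hit_rate(trace, capacity):
    """Replay a trace of term lookups against an LRU cache of inverted lists.

    trace    -- iterable of query terms in arrival order (hypothetical data)
    capacity -- maximum number of inverted lists kept in memory
    Returns the fraction of lookups served from the cache.
    """
    cache = OrderedDict()  # term -> placeholder standing in for its inverted list
    hits = 0
    lookups = 0
    for term in trace:
        lookups += 1
        if term in cache:
            hits += 1
            cache.move_to_end(term)  # mark as most recently used
        else:
            cache[term] = True  # stands in for reading the list from disk
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used list
    return hits / lookups if lookups else 0.0


# Hypothetical trace: each entry is one term looked up by some query.
sample_trace = ["db", "cache", "db", "index", "cache", "db"]
print(simulate_lru_hit_rate(sample_trace, capacity=3))  # prints 0.5
```

Running the same trace at several capacities gives a hit-rate curve, which is the kind of evidence a trace-driven study can use to judge how much memory is worth devoting to cached inverted lists on each host.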

Published In

ACM SIGMOD Record, Volume 22, Issue 2
June 1, 1993
558 pages
ISSN: 0163-5808
DOI: 10.1145/170036
  • SIGMOD '93: Proceedings of the 1993 ACM SIGMOD international conference on Management of data
    June 1993
    566 pages
    ISBN: 0897915925
    DOI: 10.1145/170035
Publisher

Association for Computing Machinery

New York, NY, United States

Cited By

  • (2022) Maximizing Bigdata Retrieval: Block as a Value for NoSQL over SQL. Proceedings of the 2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pages 556-563. DOI: 10.1109/ASONAM55673.2022.10068692. Online publication date: 10-Nov-2022
  • (2015) Scalability Challenges in Web Search Engines. Synthesis Lectures on Information Concepts, Retrieval, and Services, 7:6, pages 1-138. DOI: 10.2200/S00662ED1V01Y201508ICR045. Online publication date: 29-Dec-2015
  • (2014) Distributed text search using suffix arrays. Parallel Computing, 40:9, pages 471-495. DOI: 10.1016/j.parco.2014.06.007. Online publication date: 1-Oct-2014
  • (2012) A five-level static cache architecture for web search engines. Information Processing and Management: an International Journal, 48:5, pages 828-840. DOI: 10.1016/j.ipm.2010.12.007. Online publication date: 1-Sep-2012
  • (2011) Scalability Challenges in Web Search Engines. Advanced Topics in Information Retrieval, pages 27-50. DOI: 10.1007/978-3-642-20946-8_2. Online publication date: 2011
  • (2010) A refreshing perspective of search engine caching. Proceedings of the 19th international conference on World wide web, pages 181-190. DOI: 10.1145/1772690.1772710. Online publication date: 26-Apr-2010
  • (2001) Rank-preserving two-level caching for scalable search engines. Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pages 51-58. DOI: 10.1145/383952.383959. Online publication date: 1-Sep-2001
  • (2019) Resource-Efficient Index Shard Replication in Large Scale Search Engines. IEEE Transactions on Parallel and Distributed Systems, 30:12, pages 2820-2835. DOI: 10.1109/TPDS.2019.2924423. Online publication date: 1-Dec-2019
  • (2018) Power Efficient High Performance Packet I/O. Proceedings of the 47th International Conference on Parallel Processing, pages 1-10. DOI: 10.1145/3225058.3225129. Online publication date: 13-Aug-2018
  • (2018) Index Shard Replication Strategies for Improving Resource Utilization in Large Scale Search Engines. Proceedings of the 47th International Conference on Parallel Processing, pages 1-10. DOI: 10.1145/3225058.3225102. Online publication date: 13-Aug-2018
