Caching and database scaling in distributed shared-nothing information retrieval systems

Authors:

Anthony Tomasic,

Hector Garcia-Molina

ACM SIGMOD Record, Volume 22, Issue 2

Pages 129 - 138

https://doi.org/10.1145/170036.170063

Published: 01 June 1993

Abstract

A common class of existing information retrieval systems provides access to abstracts. For example, Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this paper, this database is studied using a trace-driven simulation. We focus on physical index design, inverted index caching, and database scaling in a distributed shared-nothing system. All three issues are shown to have a strong effect on response time and throughput. Database scaling is explored in two ways. The first assumes an “optimal” configuration for a single host and then linearly scales the database by duplicating the host architecture as needed. The second determines the optimal number of hosts given a fixed database size.
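As a rough illustration of what a trace-driven study of inverted index caching involves, the sketch below replays a recorded sequence of queries against a cache of inverted lists and counts hits and misses. It is not the authors' simulator: the LRU replacement policy, the trace format (a list of term lists), the byte sizes, and the name `simulate_lru_cache` are all assumptions made for this example, since the abstract does not specify them.

```python
from collections import OrderedDict


def simulate_lru_cache(trace, list_sizes, capacity):
    """Replay a query trace against an LRU cache of inverted lists.

    trace      -- iterable of queries, each a list of terms (hypothetical format)
    list_sizes -- dict mapping term -> inverted-list size in bytes (assumed known)
    capacity   -- cache capacity in bytes
    Returns (hits, misses), counted once per term lookup.
    """
    cache = OrderedDict()  # term -> size, least recently used first
    used = 0
    hits = misses = 0
    for query in trace:
        for term in query:
            if term in cache:
                hits += 1
                cache.move_to_end(term)  # mark as most recently used
                continue
            misses += 1
            size = list_sizes[term]
            if size > capacity:
                continue  # list can never fit; it is always read from disk
            # Evict least recently used lists until the new one fits.
            while used + size > capacity:
                _, evicted_size = cache.popitem(last=False)
                used -= evicted_size
            cache[term] = size
            used += size
    return hits, misses


# Toy trace and list sizes, invented purely for illustration.
trace = [["cache", "disk"], ["cache", "index"], ["disk", "cache"]]
sizes = {"cache": 40, "disk": 50, "index": 30}
hits, misses = simulate_lru_cache(trace, sizes, capacity=100)
```

Rerunning the same trace with different `capacity` values gives a hit-ratio curve, which is the kind of measurement a caching study would then relate to response time and throughput.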

Published In

ACM SIGMOD Record Volume 22, Issue 2

June 1, 1993

558 pages

ISSN:0163-5808

DOI:10.1145/170036

Editors: Peter Buneman (Univ. of Pennsylvania), Sushil Jajodia, Won Kim (UniSQL, Inc., Austin, TX)

SIGMOD '93: Proceedings of the 1993 ACM SIGMOD international conference on Management of data
June 1993
566 pages
ISBN:0897915925
DOI:10.1145/170035
Editors: Peter Buneman (Univ. of Pennsylvania), Sushil Jajodia

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article
