Cache Design of SSD-Based Search Engine Architectures: An Experimental Study

Published: 28 October 2014

    Abstract

    Caching is an important optimization in search engine architectures. Existing caching techniques for search engines are mostly biased towards reducing random accesses to disk, because random accesses are known to be much more expensive than sequential accesses on traditional magnetic hard disk drives (HDDs). Recently, solid-state drives (SSDs) have emerged as a new kind of secondary storage medium, and some search engines, such as Baidu, have already replaced HDDs with SSDs throughout their infrastructure. One notable property of an SSD is that its random access latency is comparable to its sequential access latency. Replacing HDDs with SSDs in a search engine infrastructure may therefore invalidate the cache management strategies of existing search engines. In this article, we carry out a series of empirical experiments to study the impact of SSDs on search engine cache management. Based on the results, we offer insights to practitioners and researchers on how to adapt the infrastructure and caching policies for SSD-based search engines.

      Published In

      ACM Transactions on Information Systems, Volume 32, Issue 4
      October 2014
      198 pages
      ISSN: 1046-8188
      EISSN: 1558-2868
      DOI: 10.1145/2684820
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 28 October 2014
      Received: 01 September 2014
      Accepted: 01 August 2014
      Revised: 01 June 2014
      Published in TOIS Volume 32, Issue 4

      Author Tags

      1. Search engine
      2. cache
      3. query processing
      4. solid-state drive

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Cited By

      • (2023) ISP Agent: A Generalized In-storage-processing Workload Offloading Framework by Providing Multiple Optimization Opportunities. ACM Transactions on Architecture and Code Optimization 21:1 (1-24). DOI: 10.1145/3632951. Online publication date: 14-Nov-2023
      • (2022) NDANN: efficient SSD-based approximate nearest neighbor search through navigation. International Conference on Mechanisms and Robotics (ICMAR 2022) (63). DOI: 10.1117/12.2652299. Online publication date: 10-Nov-2022
      • (2022) An NVM SSD-based High Performance Query Processing Framework for Search Engines. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2022.3160557. Online publication date: 2022
      • (2022) Distributed and Decentralized Edge Caching in 5G Networks Using Non-Volatile Memory Systems. 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS) (425-435). DOI: 10.1109/ICDCS54860.2022.00048. Online publication date: Jul-2022
      • (2021) Evaluating List Intersection on SSDs for Parallel I/O Skipping. 2021 IEEE 37th International Conference on Data Engineering (ICDE) (1823-1828). DOI: 10.1109/ICDE51399.2021.00161. Online publication date: Apr-2021
      • (2021) Three-level Compact Caching for Search Engines Based on Solid State Drives. 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys) (16-25). DOI: 10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00030. Online publication date: Dec-2021
      • (2021) Building an Accessible, Usable, Scalable, and Sustainable Service for Scholarly Big Data. 2021 IEEE International Conference on Big Data (Big Data) (141-152). DOI: 10.1109/BigData52589.2021.9671612. Online publication date: 15-Dec-2021
      • (2021) Exploiting temporal changes in query submission behavior for improving the search engine result cache performance. Information Processing & Management 58:3 (102533). DOI: 10.1016/j.ipm.2021.102533. Online publication date: May-2021
      • (2020) blockNDP. Proceedings of the 21st International Middleware Conference Industrial Track (8-15). DOI: 10.1145/3429357.3430519. Online publication date: 7-Dec-2020
      • (2020) An NVM SSD-Optimized Query Processing Framework. Proceedings of the 29th ACM International Conference on Information & Knowledge Management (935-944). DOI: 10.1145/3340531.3412010. Online publication date: 19-Oct-2020