research-article

An update-aware storage system for low-locality update-intensive workloads

Published: 03 March 2012

Abstract

Traditional storage systems provide a simple read/write interface, which is inadequate for low-locality update-intensive workloads because it limits disk scheduling flexibility and uses buffer memory and raw disk bandwidth inefficiently. This paper describes an update-aware disk access interface that allows applications to explicitly specify disk update requests and to associate with them call-back functions that are invoked when the requested disk blocks are brought into memory. Because call-back functions offer a continuation mechanism after the requested blocks are retrieved, storage systems supporting this interface have more flexibility in scheduling pending disk update requests. In particular, this interface enables a simple but effective technique called Batching mOdifications with Sequential Commit (BOSC), which greatly improves the sustained throughput of a storage system under low-locality update-intensive workloads. In addition, combined with a space-efficient low-latency disk logging technique, BOSC delivers the same durability guarantee as synchronous disk updates. Empirical measurements show that the random update throughput of a BOSC-based B+ tree is more than an order of magnitude higher than that of the same B+ tree implementation on a traditional storage system.
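
The idea of queuing update requests with call-backs and committing them in one sequential pass can be illustrated with a small sketch. This is not the paper's interface or implementation: the class `UpdateAwareStore`, its methods `update` and `commit`, the helper `add_to_byte`, and the in-memory list standing in for a disk are all hypothetical names chosen for illustration, and the sketch omits the logging that the abstract pairs with BOSC for durability.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Block = bytearray
UpdateFn = Callable[[Block], None]  # call-back that mutates a block in place


class UpdateAwareStore:
    """Toy in-memory stand-in for a disk; all names here are hypothetical."""

    def __init__(self, num_blocks: int, block_size: int = 16) -> None:
        self.disk: List[Block] = [bytearray(block_size) for _ in range(num_blocks)]
        self.pending: Dict[int, List[UpdateFn]] = defaultdict(list)
        self.block_reads = 0  # counts simulated block fetches

    def update(self, block_no: int, callback: UpdateFn) -> None:
        """Queue an update instead of doing a read-modify-write right away."""
        if not 0 <= block_no < len(self.disk):
            raise IndexError(f"block {block_no} out of range")
        self.pending[block_no].append(callback)

    def commit(self) -> None:
        """Sweep queued blocks in ascending order, applying each block's call-backs."""
        for block_no in sorted(self.pending):
            block = self.disk[block_no]  # simulated fetch of the block
            self.block_reads += 1
            for callback in self.pending[block_no]:
                callback(block)
        self.pending.clear()


def add_to_byte(offset: int, delta: int) -> UpdateFn:
    """Build a call-back that adds delta to one byte of whichever block it receives."""
    def apply(block: Block) -> None:
        block[offset] = (block[offset] + delta) % 256
    return apply


store = UpdateAwareStore(num_blocks=8)
for block_no in (5, 1, 5, 3, 1, 5):  # scattered, low-locality updates
    store.update(block_no, add_to_byte(0, 1))
store.commit()
```

In this toy run, six scattered update requests touch only three distinct blocks, so the commit pass fetches each of those blocks once and in ascending order, rather than performing one random read-modify-write per request.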

      Published In

      ACM SIGARCH Computer Architecture News, Volume 40, Issue 1
      ASPLOS '12
      March 2012
      453 pages
      ISSN: 0163-5964
      DOI: 10.1145/2189750

      ASPLOS XVII: Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
      March 2012
      476 pages
      ISBN: 9781450307598
      DOI: 10.1145/2150976
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. B+ trees
      2. BOSC
      3. buffered writes
      4. fast logging
      5. hard disks
      6. low-locality
      7. storage
      8. update interface
      9. update-intensive
