Consistent and durable data structures for non-volatile byte-addressable memory

Published: 15 February 2011 Publication History

Abstract

The predicted shift to non-volatile, byte-addressable memory (e.g., Phase Change Memory and Memristor), the growth of "big data", and the subsequent emergence of frameworks such as memcached and NoSQL systems require us to rethink the design of data stores. To derive the maximum performance from these new memory technologies, this paper proposes the use of single-level data stores. For these systems, where no distinction is made between a volatile and a persistent copy of data, we present Consistent and Durable Data Structures (CDDSs) that, on current hardware, allow programmers to safely exploit the low-latency and non-volatile aspects of new memory technologies. CDDSs use versioning to allow atomic updates without requiring logging. The same versioning scheme also enables rollback for failure recovery. When compared to a memory-backed Berkeley DB B-Tree, our prototype-based results show that a CDDS B-Tree can increase put and get throughput by 74% and 138%, respectively. When compared to Cassandra, a two-level data store, Tembo, a CDDS B-Tree enabled distributed Key-Value system, increases throughput by up to 250%-286%.
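
The idea of versioned updates without a log can be illustrated with a small toy model. The sketch below is not the authors' implementation: the class `VersionedStore`, the `Entry` fields, and the single `committed` counter are hypothetical names chosen for illustration, and the abstract does not describe the actual layout. It assumes each entry records the version that created it and the version that superseded it, so a commit reduces to advancing one counter and recovery reduces to discarding anything newer than that counter. A real implementation on non-volatile memory would also have to order and flush writes, which is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional

OPEN = None  # end version of an entry that is still live


@dataclass
class Entry:
    key: str
    value: str
    start: int                  # version that created this entry
    end: Optional[int] = OPEN   # version that superseded it, if any


class VersionedStore:
    """Toy append-only store; every name here is hypothetical."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []
        self.committed = 0  # last consistent version

    def put(self, key: str, value: str) -> None:
        """Stage a write in version committed + 1; old values are kept."""
        pending = self.committed + 1
        for e in self.entries:
            if e.key == key and e.end is OPEN:
                e.end = pending  # mark the old entry as superseded
        self.entries.append(Entry(key, value, start=pending))

    def commit(self) -> None:
        """Publish all staged writes by advancing a single counter."""
        self.committed += 1

    def get(self, key: str) -> Optional[str]:
        """Read the value visible at the last committed version."""
        v = self.committed
        for e in self.entries:
            if e.key == key and e.start <= v and (e.end is OPEN or e.end > v):
                return e.value
        return None

    def recover(self) -> None:
        """Roll back to the last committed version after a simulated crash."""
        v = self.committed
        # Drop entries created by an uncommitted version.
        self.entries = [e for e in self.entries if e.start <= v]
        # Re-open entries that an uncommitted version had superseded.
        for e in self.entries:
            if e.end is not OPEN and e.end > v:
                e.end = OPEN
```

In this model a reader never observes a half-applied update, because staged entries carry a version greater than `committed` until `commit()` runs, and `recover()` needs no log because the version tags alone identify what to undo.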

Published In

FAST'11: Proceedings of the 9th USENIX conference on File and storage technologies
February 2011
20 pages
ISBN:9781931971829

Sponsors

  • OFS: OrangeFS
  • NetApp
  • Google Inc.
  • DELL
  • USENIX Association

Publisher

USENIX Association

United States


View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media