Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3034786.3056117acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
research-article
Public Access

Write-Optimized Skip Lists

Published: 09 May 2017 Publication History

Abstract

The skip list is an elegant dictionary data structure that is commonly deployed in RAM. A skip list with N elements supports searches, inserts, and deletes in O(log N) operations with high probability (w.h.p.) and range queries returning K elements in O(log N + K) operations w.h.p.
A seemingly natural way to generalize the skip list to external memory with block size B is to "promote" with probability 1/B, rather than 1/2. However, there are practical and theoretical obstacles to getting the skip list to retain its efficient performance, space bounds, and high-probability guarantees.
We give an external-memory skip list that achieves write-optimized bounds. That is, for 0 < ε < 1, range queries take O(logBε N + K/B) I/Os w.h.p. and insertions and deletions take O((logBε N) / B1-ε) amortized I/Os w.h.p.
Our write-optimized skip list inherits the virtue of simplicity from RAM skip lists. Moreover, it matches or beats the asymptotic bounds of prior write-optimized data structures such as the Bε & tree or LSM trees, which are deployed in high-performance databases and file systems.
The main technical challenge in proving our bounds comes from the fact that there are so few levels in the skip list, an aspect of the data structure that is essential to getting strong external-memory bounds. We use extremal-graph coloring to show that it is possible to decompose paths in the skip list into uncorrelated groups, regardless of the insertion/deletion pattern. Thus, we achieve our bounds by averaging over these uncorrelated paths rather than by averaging over uncorrelated levels, as in the standard skip list.

References

[1]
I. Abraham, J. Aspnes, and J. Yuan. Skip B-trees. In Proc. 9th Annual International Conference on Principles of Distributed Systems (OPODIS), page 366, 2006.
[2]
A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116--1127, Sept. 1988.
[3]
A. Anagnostopoulos, M. Goodrich, and R. Tamassia. Persistent authenticated dictionaries and their applications. Information Security, pages 379--393, 2001.
[4]
L. Arge. The buffer tree: A technique for designing batched external data structures. Algorithmica, 37(1):1--24, 2003.
[5]
L. Arge, D. Eppstein, and M. T. Goodrich. Skip-webs: efficient distributed data structures for multi-dimensional data sets. In Proc. 24th Annual ACM Symposium on Principles of Distributed Computing (PODC), pages 69--76, 2005.
[6]
J. Aspnes and G. Shah. Skip graphs. ACM Transactions on Algorithms, 3(4):37, 2007.
[7]
R. Bayer and E. M. McCreight. Organization and maintenance of large ordered indexes. Acta Informatica, 1:173--189, 1972.
[8]
M. A. Bender, J. W. Berry, R. Johnson, T. M. Kroeger, S. McCauley, C. A. Phillips, B. Simon, S. Singh, and D. Zage. Anti-persistence on persistent storage: History-independent sparse tables and dictionaries. In Proc. 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS), pages 289--302, 2016.
[9]
M. A. Bender, M. Farach-Colton, J. T. Fineman, Y. R. Fogel, B. C. Kuszmaul, and J. Nelson. Cache-oblivious streaming B-trees. In Proc. 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 81--92, 2007.
[10]
M. A. Bender, M. Farach-Colton, W. Jannen, R. Johnson, B. C. Kuszmaul, D. E. Porter, J. Yuan, and Y. Zhan. An introduction to Bε-trees and write-optimization. :login; magazine, 40(5):22--28, October 2015.
[11]
M. A. Bender, M. Farach-Colton, R. Johnson, R. Kraner, B. C. Kuszmaul, D. Medjedovic, P. Montes, P. Shetty, R. P. Spillane, and E. Zadok. Don't thrash: How to cache your hash on flash. Proc. of the VLDB Endowment (PVLDB), 5(11):1627--1637, 2012.
[12]
M. A. Bender, J. T. Fineman, S. Gilbert, and B. C. Kuszmaul. Concurrent cache-oblivious B-trees. In Proc. 17th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 228--237, 2005.
[13]
G. S. Brodal, E. D. Demaine, J. T. Fineman, J. Iacono, S. Langerman, and J. I. Munro. Cache-oblivious dynamic dictionaries with update/query tradeoffs. In Proc.\ 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1448--1456, 2010.
[14]
G. S. Brodal and R. Fagerberg. Lower bounds for external memory dictionaries. In Proc. 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 546--554, 2003.
[15]
A. L. Buchsbaum, M. Goldwasser, S. Venkatasubramanian, and J. R. Westbrook. On external memory graph traversal. In Proc. 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 859--860, 2000.
[16]
P. Callahan, M. T. Goodrich, and K. Ramaiyer. Topology B-trees and their applications. In Proc. 4th International Workshop on Algorithms and Data Structures (WADS), pages 381--392, 1995.
[17]
V. Ciriani, P. Ferragina, F. Luccio, and S. Muthukrishnan. Static optimality theorem for external memory string access. In Proc. 43rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 219--227, 2002.
[18]
L. Devroye. A limit theory for random skip lists. The Annals of Applied Probability, pages 597--609, 1992.
[19]
M. Fomitchev and E. Ruppert. Lock-free linked lists and skip lists. In Proc. 23rd Annual ACM Symposium on Principles of Distributed Computing (PODC), pages 50--59, 2004.
[20]
D. Golovin. The B-skip-list: A simpler uniquely represented alternative to B-trees. arXiv preprint arXiv:1005.0662, 2010.
[21]
M. T. Goodrich and R. Tamassia. Efficient authenticated dictionaries with skip lists and commutative hashing. US Patent App, 10(416,015), 2000.
[22]
G. Graefe. Write-optimized B-trees. In Proc. 30th International Conference on Very Large Data Bases (VLDB), pages 672--683, 2004.
[23]
G. Graefe. B-tree indexes for high update rates. SIGMOD Rec., 35(1):39--44, Mar. 2006.
[24]
M. Herlihy, Y. Lev, V. Luchangco, and N. Shavit. A provably correct scalable concurrent skip list. In Proc. Conference On Principles of Distributed Systems (OPODIS), 2006.
[25]
M. Herlihy, Y. Lev, V. Luchangco, and N. Shavit. A simple optimistic skiplist algorithm. Proc. 14th Annual Colloquium on Structural Information and Communication Complexity (SIROCCO), page 124, 2007.
[26]
R. Jacob, A. Richa, C. Scheideler, S. Schmid, and H. Täubig. A distributed polylogarithmic time algorithm for self-stabilizing skip graphs. In Proc. 28th ACM Symposium on Principles of Distributed Computing (PODC), pages 131--140, 2009.
[27]
C. Jermaine, A. Datta, and E. Omiecinski. A novel index supporting high volume data warehouse insertion. In M. P. Atkinson, M. E. Orlowska, P. Valduriez, S. B. Zdonik, and M. L. Brodie, editors, Proc. 25th International Conference on Very Large Data Bases (VLDB), pages 235--246, 1999.
[28]
P. Kirschenhofer and H. Prodinger. The path length of random skip lists. Acta Informatica, 31(8):775--792, 1994.
[29]
G. Korland, N. Shavit, and P. Felber. Noninvasive concurrency with Java S™. In Third Workshop on Programmability Issues for Multi-Core Computers (MULTIPROG), 2010.
[30]
Y. Li, B. He, Q. Luo, and K. Yi. Tree indexing on flash disks. In Proc. 2009 IEEE International Conference on Data Engineering (ICDE), pages 1303--1306, 2009.
[31]
P. A. Martin. Method and apparatus for implementing a lock-free skip list that supports concurrent accesses, Dec. 11 2007. US Patent 7,308,448.
[32]
M. M. Michael. High performance dynamic lock-free hash tables and list-based sets. In Proc. 14th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), pages 73--82, 2002.
[33]
J. Nievergelt and E. M. Reingold. Binary search trees of bounded balance. SIAM Journal on Computing, 2(1):33--43, 1973.
[34]
P. O'Neil, E. Cheng, D. Gawlic, and E. O'Neil. The log-structured merge-tree (LSM-tree). Acta Informatica, 33(4):351--385, 1996.
[35]
R. Oshman and N. Shavit. The SkipTrie: low-depth concurrent search without rebalancing. In Proc. 32nd Annual ACM Symposium on Principles of Distributed Computing (PODC), pages 23--32, 2013.
[36]
T. Papadakis, J. I. Munro, and P. V. Poblete. Analysis of the expected search cost in skip lists. In Proc. 2nd Scandinavian Workshop on Algorithm Theory (SWAT), pages 160--172, 1990.
[37]
T. Peierls, B. Goetz, J. Bloch, J. Bowbeer, D. Lea, and D. Holmes. ConcurrentSkipList. In Java concurrency in practice. Pearson Education, 2006.
[38]
W. Pugh. Concurrent maintenance of lists. Technical report, Dept. of Computer Science, University of Maryland, College Park, 1990.
[39]
W. Pugh. Skip lists: a probabilistic alternative to balanced trees. Communications of the ACM, 33(6):668--676, 1990.
[40]
R. Sears, M. Callaghan, and E. A. Brewer. Rose: compressed, log-structured replication. Proc. of the VLDB Endowment, 1(1):526--537, 2008.
[41]
R. Sears and R. Ramakrishnan. bLSM: a general purpose log structured merge tree. In Proc. ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 217--228, 2012.
[42]
N. Shamgunov. The MemSQL in-memory database system. In Proc. 2nd International Workshop on In Memory Data Management and Analytics (IMDM), 2014.
[43]
N. Shavit and I. Lotan. Skiplist-based concurrent priority queues. In Proc. 14th International Parallel and Distributed Processing Symposium (IPDPS), pages 263--268, 2000.
[44]
D. D. Sleator and R. E. Tarjan. Amortized efficiency of list update and paging rules. Communications of the ACM, 28(2):202--208, Feb. 1985.
[45]
G. L. Steele Jr, A. T. Garthwaite, P. A. Martin, N. N. Shavit, M. S. Moir, and D. L. Detlefs. Lock-free implementation of concurrent shared object with dynamic node allocation and distinguishing pointer value, Nov. 30 2004. US Patent 6,826,757.
[46]
H. Sundell and P. Tsigas. Scalable and lock-free concurrent dictionaries. In Proc. 2004 ACM Symposium on Applied Computing (SAC), pages 1438--1445, 2004.
[47]
J. D. Valois. Lock-free linked lists using compare-and-swap. In Proc. 14th Annual ACM Symposium on Principles of Distributed Computing (PODC), pages 214--222, 1995.
[48]
K. Yi. Dynamic indexability and the optimality of b-trees. Journal of the ACM, 59(4):21:1--21:19, Aug. 2012.

Cited By

View all
  • (2024)Brief Announcement: Root-to-Leaf Scheduling in Write-Optimized TreesProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3660514(475-477)Online publication date: 17-Jun-2024
  • (2023)WARM-tree: Making Quadtrees Write-efficient and Space-economic on Persistent MemoriesACM Transactions on Embedded Computing Systems10.1145/360803322:5s(1-26)Online publication date: 31-Oct-2023
  • (2023)External Memory Fully Persistent Search TreesProceedings of the 55th Annual ACM Symposium on Theory of Computing10.1145/3564246.3585140(1410-1423)Online publication date: 2-Jun-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PODS '17: Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
May 2017
458 pages
ISBN:9781450341981
DOI:10.1145/3034786
© 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 May 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. databases
  2. external memory
  3. skip list
  4. write optimization

Qualifiers

  • Research-article

Funding Sources

Conference

SIGMOD/PODS'17
Sponsor:

Acceptance Rates

PODS '17 Paper Acceptance Rate 29 of 101 submissions, 29%;
Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)255
  • Downloads (Last 6 weeks)27
Reflects downloads up to 03 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Brief Announcement: Root-to-Leaf Scheduling in Write-Optimized TreesProceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3626183.3660514(475-477)Online publication date: 17-Jun-2024
  • (2023)WARM-tree: Making Quadtrees Write-efficient and Space-economic on Persistent MemoriesACM Transactions on Embedded Computing Systems10.1145/360803322:5s(1-26)Online publication date: 31-Oct-2023
  • (2023)External Memory Fully Persistent Search TreesProceedings of the 55th Annual ACM Symposium on Theory of Computing10.1145/3564246.3585140(1410-1423)Online publication date: 2-Jun-2023
  • (2021)Timely Reporting of Heavy Hitters Using External MemoryACM Transactions on Database Systems10.1145/347239246:4(1-35)Online publication date: 15-Nov-2021
  • (2021)External-memory Dictionaries in the Affine and PDAM ModelsACM Transactions on Parallel Computing10.1145/34706358:3(1-20)Online publication date: 20-Sep-2021
  • (2021)Copy-on-Abundant-Write for Nimble File System ClonesACM Transactions on Storage10.1145/342349517:1(1-27)Online publication date: 29-Jan-2021
  • (2021)Write-Optimized and Consistent Skiplists for Non-Volatile MemoryIEEE Access10.1109/ACCESS.2021.30778989(69850-69859)Online publication date: 2021
  • (2020)Transition Scale-SpacesNeural Computation10.1162/neco_a_0125532:2(330-394)Online publication date: 1-Feb-2020
  • (2020)Timely Reporting of Heavy Hitters using External MemoryProceedings of the 2020 ACM SIGMOD International Conference on Management of Data10.1145/3318464.3380598(1431-1446)Online publication date: 11-Jun-2020
  • (2019)Small Refinements to the DAM Can Have Big Consequences for Data-Structure DesignThe 31st ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3323165.3323210(265-274)Online publication date: 17-Jun-2019
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media