DOI: 10.1145/1583991.1584055

Dynamic external hashing: the limit of buffering

Published: 11 August 2009

    Abstract

    Hash tables are among the most fundamental data structures in computer science, in both theory and practice. They are especially useful in external memory, where their query performance approaches the ideal cost of just one disk access. Knuth [16] gave an elegant analysis showing that with simple collision resolution strategies such as linear probing or chaining, the expected average number of disk I/Os of a lookup is merely $1 + 1/2^{\Omega(b)}$, where each I/O can read and/or write a disk block containing $b$ items. Inserting a new item into the hash table also costs $1 + 1/2^{\Omega(b)}$ I/Os, which is again almost the best one can do if the hash table is entirely stored on disk. However, this assumption is unrealistic, since any algorithm operating on an external hash table must have some internal memory (at least $\Omega(1)$ blocks) to work with. The availability of a small internal memory buffer can dramatically reduce the amortized insertion cost to $o(1)$ I/Os for many external memory data structures. In this paper we study the inherent query-insertion tradeoff of external hash tables in the presence of a memory buffer. In particular, we show that for any constant $c > 1$, if the expected average successful query cost is targeted at $1 + O(1/b^c)$ I/Os, then it is not possible to support insertions in less than $1 - O(1/b^{(c-1)/6})$ I/Os amortized, which means that the memory buffer is essentially useless. If, on the other hand, the query cost is relaxed to $1 + O(1/b^c)$ I/Os for any constant $c < 1$, there is a simple dynamic hash table with $o(1)$ insertion cost.
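
    The unbuffered baseline described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's construction: it assumes a chained hash table whose buckets are chains of blocks holding up to b items, counts one I/O per block touched (a read and write-back of the same block counts once, as in the abstract's cost model), and uses uniformly random integer keys. The names BlockedHashTable and measure are hypothetical, introduced only for this illustration.

```python
import random


class BlockedHashTable:
    """Toy model of an on-disk chained hash table (illustrative only)."""

    def __init__(self, num_buckets, b):
        self.b = b  # block capacity in items
        # Each bucket is a chain (list) of blocks; each block is a list of keys.
        self.buckets = [[[]] for _ in range(num_buckets)]
        self.ios = 0  # number of simulated block I/Os

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key):
        chain = self._chain(key)
        for block in chain:
            self.ios += 1  # read the block (and write it back if it has room)
            if len(block) < self.b:
                block.append(key)
                return
        # Every block in the chain is full: write a fresh overflow block.
        chain.append([key])
        self.ios += 1

    def lookup(self, key):
        for block in self._chain(key):
            self.ios += 1  # one I/O per block scanned
            if key in block:
                return True
        return False


def measure(num_buckets, b, n, seed=0):
    """Return (average insertion cost, average successful lookup cost) in I/Os."""
    rng = random.Random(seed)
    keys = rng.sample(range(10**9), n)  # distinct random keys
    table = BlockedHashTable(num_buckets, b)
    for k in keys:
        table.insert(k)
    insert_cost = table.ios / n
    table.ios = 0
    for k in keys:
        assert table.lookup(k)
    lookup_cost = table.ios / n
    return insert_cost, lookup_cost


if __name__ == "__main__":
    # Load factor 1/2: 6400 items spread over 200 buckets of capacity 64.
    print(measure(num_buckets=200, b=64, n=6400, seed=1))
```

    With the table half full, overflow blocks are rare, so both averages stay very close to one I/O per operation. The question studied in the paper is whether a memory buffer can push the insertion cost well below that while keeping queries this cheap.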

    References

    [1]
    A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116--1127, 1988.
    [2]
    L. Arge. The buffer tree: A technique for designing batched external data structures. Algorithmica, 37(1):1--24, 2003.
    [3]
    L. Arge, M. Bender, E. Demaine, B. Holland-Minkley, and J. I. Munro. Cache-oblivious priority-queue and graph algorithms. In Proc. ACM Symposium on Theory of Computing, pages 268--276, 2002.
    [4]
    L. Arge, V. Samoladas, and K. Yi. Optimal external memory planar point enclosure. Algorithmica, 54(3), 2009.
    [5]
    J. L. Bentley. Decomposable searching problems. Information Processing Letters, 8(5):244--251, 1979.
    [6]
    B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422--426, 1970.
    [7]
    G. S. Brodal and R. Fagerberg. Lower bounds for external memory dictionaries. In Proc. ACM-SIAM Symposium on Discrete Algorithms, pages 546--554, 2003.
    [8]
    J. Carter and M. Wegman. Universal classes of hash functions. Journal of Computer and System Sciences, 18:143--154, 1979.
    [9]
    M. Dietzfelbinger, A. Karlin, K. Mehlhorn, F. Meyer auf der Heide, H. Rohnert, and R. E. Tarjan. Dynamic perfect hashing: upper and lower bounds. SIAM Journal on Computing, 23:738--761, 1994.
    [10]
    R. Fadel, K. V. Jakobsen, J. Katajainen, and J. Teuhola. Heaps and heapsort on secondary storage. Theoretical Computer Science, 220(2):345--362, 1999.
    [11]
    R. Fagin, J. Nievergelt, N. Pippenger, and H. Strong. Extendible hashing---a fast access method for dynamic files. ACM Transactions on Database Systems, 4(3):315--344, 1979.
    [12]
    M. L. Fredman, J. Komlos, and E. Szemeredi. Storing a sparse table with O(1) worst case access time. Journal of the ACM, 31(3):538--544, 1984.
    [13]
    H. Garcia-Molina, J. D. Ullman, and J. Widom. Database Systems: The Complete Book. Prentice Hall, 2008.
    [14]
    J. M. Hellerstein, E. Koutsoupias, D. Miranker, C. H. Papadimitriou, and V. Samoladas. On a model of indexability and its bounds for range queries. Journal of the ACM, 49(1):35--55, 2002.
    [15]
    M. S. Jensen and R. Pagh. Optimality in external memory hashing. Algorithmica, 52(3):403--411, 2008.
    [16]
    D. E. Knuth. Sorting and Searching, volume 3 of The Art of Computer Programming. Addison-Wesley, Reading, MA, 1973.
    [17]
    W. Litwin. Linear hashing: a new tool for file and table addressing. In Proc. International Conference on Very Large Databases, pages 212--223, 1980.
    [18]
    R. Pagh and F. F. Rodler. Cuckoo hashing. Journal of Algorithms, 51:122--144, 2004.
    [19]
    J. S. Vitter. Algorithms and Data Structures for External Memory. Now Publishers, 2008.
    [20]
    A. C. Yao. Probabilistic computations: Towards a unified measure of complexity. In Proc. IEEE Symposium on Foundations of Computer Science, 1977.
    [21]
    K. Yi. Dynamic indexability and lower bounds for dynamic one-dimensional range query indexes. In Proc. ACM Symposium on Principles of Database Systems, 2009.

    Published In

    SPAA '09: Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
    August 2009
    370 pages
    ISBN:9781605586069
    DOI:10.1145/1583991
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dynamic hash table
    2. lower bound
    3. successful query

    Qualifiers

    • Research-article

    Conference

    SPAA '09

    Acceptance Rates

    Overall Acceptance Rate 447 of 1,461 submissions, 31%

    Cited By

    • (2020) On the I/O Complexity of the k-Nearest Neighbors Problem. Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 205-212. DOI: 10.1145/3375395.3387649. Online publication date: 14-Jun-2020
    • (2019) Can we overcome the n log n barrier for oblivious sorting? Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2419-2438. DOI: 10.5555/3310435.3310583. Online publication date: 6-Jan-2019
    • (2018) Cache-oblivious and data-oblivious sorting and applications. Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2201-2220. DOI: 10.5555/3174304.3175448. Online publication date: 7-Jan-2018
    • (2014) Cache-Oblivious Hashing. Algorithmica, 69(4):864-883. DOI: 10.1007/s00453-013-9763-6. Online publication date: 1-Aug-2014
    • (2013) The Limits of Buffering. SIAM Journal on Computing, 42(1):212-229. DOI: 10.1137/110842211. Online publication date: 1-Jan-2013
    • (2011) An Email Server Optimized for Storage Issues. Proceedings of the 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, pages 1437-1443. DOI: 10.1109/TrustCom.2011.197. Online publication date: 16-Nov-2011
    • (2010) On the cell probe complexity of dynamic membership. Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete algorithms, pages 123-133. DOI: 10.5555/1873601.1873613. Online publication date: 17-Jan-2010
    • (2010) Cheap and large CAMs for high performance data-intensive networked systems. Proceedings of the 7th USENIX conference on Networked systems design and implementation, page 29. DOI: 10.5555/1855711.1855740. Online publication date: 28-Apr-2010
    • (2010) Cache-oblivious hashing. Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 297-304. DOI: 10.1145/1807085.1807124. Online publication date: 6-Jun-2010
    • (2010) The limits of buffering. Proceedings of the forty-second ACM symposium on Theory of computing, pages 447-456. DOI: 10.1145/1806689.1806752. Online publication date: 5-Jun-2010
