DOI: 10.1145/3183713.3183737
research-article
Public Access

Persistent Bloom Filter: Membership Testing for the Entire History

Published: 27 May 2018

Abstract

Membership testing is the problem of deciding whether an element belongs to a set. Performing the test exactly is expensive in space, since it requires storing every element of the set. Many applications instead want approximate testing that is fast and uses little space. The Bloom filter (BF) was designed for this purpose and has seen great success across numerous application domains. However, there is no compact structure that supports set membership testing for temporal queries, e.g., has person A visited a web server between 9:30am and 9:40am? And has the same person visited the web server again between 9:45am and 9:50am? It is possible to support such "temporal membership testing" using a BF, but we will show that this is fairly expensive. To that end, this paper presents the persistent Bloom filter (PBF), a novel data structure for temporal membership testing with compact space.
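To make the cost of the BF-based approach concrete, here is a minimal sketch of one plausible baseline, assuming time is cut into fixed-length epochs with one standard Bloom filter per epoch. It is a hypothetical illustration, not the PBF the paper designs; the names `BloomFilter`, `EpochBloomBaseline`, and `epoch_len` are introduced here for the example only. A range query must probe every epoch filter that overlaps the range, so query time grows with the range length. If each filter has false-positive rate p and the probes are treated as independent, a query spanning r epochs errs with probability about 1-(1-p)^r.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter with m bits and k hash positions per item."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str):
        # Derive k positions from two 64-bit slices of a SHA-256 digest
        # using double hashing: h1 + i * h2 (mod m).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # Never a false negative; false positives are possible.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class EpochBloomBaseline:
    """Hypothetical baseline: one Bloom filter per time epoch of length epoch_len."""

    def __init__(self, epoch_len: int, m: int, k: int) -> None:
        self.epoch_len = epoch_len
        self.m = m
        self.k = k
        self.filters: dict[int, BloomFilter] = {}

    def insert(self, item: str, t: int) -> None:
        epoch = t // self.epoch_len
        if epoch not in self.filters:
            self.filters[epoch] = BloomFilter(self.m, self.k)
        self.filters[epoch].add(item)

    def query(self, item: str, start: int, end: int) -> bool:
        # Probe every epoch overlapping [start, end]. Cost is linear in the
        # number of epochs, and each probe is another chance of a false positive.
        # With epoch_len > 1, boundary epochs may also match items that
        # arrived just outside [start, end].
        for epoch in range(start // self.epoch_len, end // self.epoch_len + 1):
            bf = self.filters.get(epoch)
            if bf is not None and item in bf:
                return True
        return False


# Timestamps are minutes since midnight: 9:35am = 575.
log = EpochBloomBaseline(epoch_len=1, m=1024, k=4)
log.insert("personA", 575)
print(log.query("personA", 570, 580))  # True: visit falls in 9:30-9:40
print(log.query("personA", 585, 590))  # False: no epoch in 9:45-9:50 has a filter
```

With `epoch_len=1` the baseline keeps minute-level precision but stores one filter per active minute; larger epochs save space at the price of blurred range boundaries and still require a probe per epoch, which is the kind of trade-off the abstract says makes the BF approach expensive.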

Information

Published In

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN: 9781450347037
DOI: 10.1145/3183713
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bloom filter
  2. persistent bloom filter
  3. persistent data structure

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '18

Acceptance Rates

SIGMOD '18 paper acceptance rate: 90 of 461 submissions (20%)
Overall acceptance rate: 785 of 4,003 submissions (20%)

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 123
  • Downloads (last 6 weeks): 12
Reflects downloads up to 30 Aug 2024

Cited By

  • (2024) Wormhole Filters: Caching Your Hash on Persistent Memory. Proceedings of the Nineteenth European Conference on Computer Systems, pages 456-471. DOI: 10.1145/3627703.3629590. Online publication date: 22-Apr-2024
  • (2024) Unbiased Real-Time Traffic Sketching. IEEE Transactions on Network Science and Engineering, 11(3):2371-2383. DOI: 10.1109/TNSE.2023.3284004. Online publication date: May-2024
  • (2024) Enabling space-time efficient range queries with REncoder. The VLDB Journal. DOI: 10.1007/s00778-024-00873-w. Online publication date: 7-Aug-2024
  • (2023) Dichotomy Graph Sketch: Summarizing Graph Streams with High Accuracy Based on Deep Learning. Applied Sciences, 13(24):13306. DOI: 10.3390/app132413306. Online publication date: 16-Dec-2023
  • (2023) A Learned Cuckoo Filter for Approximate Membership Queries over Variable-sized Sliding Windows on Data Streams. Proceedings of the ACM on Management of Data, 1(4):1-26. DOI: 10.1145/3626758. Online publication date: 12-Dec-2023
  • (2023) Double-Anonymous Sketch: Achieving Top-K-fairness for Finding Global Top-K Frequent Items. Proceedings of the ACM on Management of Data, 1(1):1-26. DOI: 10.1145/3588933. Online publication date: 30-May-2023
  • (2023) MicroscopeSketch: Accurate Sliding Estimation Using Adaptive Zooming. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2660-2671. DOI: 10.1145/3580305.3599432. Online publication date: 6-Aug-2023
  • (2023) BurstSketch: Finding Bursts in Data Streams. IEEE Transactions on Knowledge and Data Engineering, 35(11):11126-11140. DOI: 10.1109/TKDE.2022.3223686. Online publication date: 1-Nov-2023
  • (2023) HoppingSketch: More Accurate Temporal Membership Query and Frequency Query. IEEE Transactions on Knowledge and Data Engineering, 35(9):9067-9072. DOI: 10.1109/TKDE.2022.3221111. Online publication date: 1-Sep-2023
  • (2023) Variable-length Encoding Framework: A Generic Framework for Enhancing the Accuracy of Approximate Membership Queries. 2023 IEEE International Conference on Data Mining (ICDM), pages 61-70. DOI: 10.1109/ICDM58522.2023.00015. Online publication date: 1-Dec-2023