DOI: 10.1145/2018436.2018462
research-article

What's the difference?: efficient set reconciliation without prior context

Published: 15 August 2011

Abstract

We describe a synopsis structure, the Difference Digest, that allows two nodes to compute the elements belonging to the set difference in a single round with communication overhead proportional to the size of the difference times the logarithm of the keyspace. While set reconciliation can be done efficiently using logs, logs require overhead for every update and scale poorly when multiple users are to be reconciled. By contrast, our abstraction assumes no prior context and is useful in networking and distributed systems applications such as trading blocks in a peer-to-peer network, and synchronizing link-state databases after a partition.
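The communication bound can be made concrete with a back-of-the-envelope count. This assumes, beyond what the abstract states, that the digest is an invertible Bloom filter (the structure named in the author tags) with m = αd cells, where d = |A \ B| + |B \ A| for the two nodes' sets A and B, and α > 1 is an assumed constant overhead factor. Each cell is assumed to carry a c-bit count, a key field of about log2 |U| bits for keyspace U, and an h-bit checksum. The numbers in the example are illustrative, not measured.

```latex
\[
  \text{bits} \;=\; m\,\bigl(c + \lceil \log_2 |U| \rceil + h\bigr)
  \;=\; \alpha d\,\bigl(c + \lceil \log_2 |U| \rceil + h\bigr)
  \;=\; O\bigl(d \log |U|\bigr) \quad \text{for constant } \alpha, c, h.
\]
\[
  \alpha = 2,\; d = 100,\; |U| = 2^{32},\; c = h = 32:\qquad
  200 \cdot (32 + 32 + 32) = 19\,200 \text{ bits} = 2\,400 \text{ bytes}.
\]
```

Under this assumed layout the message size depends on d and the key width, not on |A| or |B|.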
Our basic set-reconciliation method resembles the peeling algorithm used to decode Tornado codes [6]; this is not surprising, given the intimate connection between set difference and coding. Beyond set reconciliation, an essential component of our Difference Digest is a new estimator for the size of the set difference that outperforms min-wise sketches [3] for small set differences.
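A minimal Python sketch of the two mechanisms this paragraph mentions, under assumptions the abstract does not state. The digest is modeled as an invertible Bloom filter (the structure named in the author tags) whose cells hold a count, an XOR of keys, and an XOR of key checksums: subtracting one node's filter from the other's cancels shared keys, and decoding repeatedly "peels" a pure cell (count ±1 with a matching checksum), much as a Tornado-code decoder resolves check nodes with a single unresolved neighbor. The size estimator shown is one plausible design, with keys split into strata by the trailing zeros of a hash in the spirit of Flajolet and Martin [12] and each stratum holding a small IBF; it is not necessarily the estimator the paper describes. The names (h64, IBF, StrataEstimator), the SHA-256-based hashing, and all parameter values are hypothetical.

```python
import hashlib

CHECK_SEED = 0xFFFF   # hypothetical seed for the per-key checksum hash
STRATA_SEED = 0xABCD  # hypothetical seed for assigning keys to strata


def h64(key: int, seed: int) -> int:
    """64-bit hash of a non-negative key below 2**64, derived from SHA-256."""
    data = seed.to_bytes(4, "little") + key.to_bytes(8, "little")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "little")


class IBF:
    """Invertible Bloom filter: k sub-tables of cells [count, key XOR, checksum XOR]."""

    def __init__(self, size: int = 80, k: int = 4):
        self.k, self.part = k, size // k
        self.cells = [[0, 0, 0] for _ in range(k * self.part)]

    def _indices(self, key: int) -> list[int]:
        # One cell per sub-table, so a key always occupies k distinct cells.
        return [i * self.part + h64(key, i) % self.part for i in range(self.k)]

    def toggle(self, key: int, delta: int = 1) -> None:
        chk = h64(key, CHECK_SEED)
        for j in self._indices(key):
            cell = self.cells[j]
            cell[0] += delta
            cell[1] ^= key
            cell[2] ^= chk

    def minus(self, other: "IBF") -> "IBF":
        """Cell-wise difference; keys present in both filters cancel exactly."""
        out = IBF(len(self.cells), self.k)
        out.cells = [[a[0] - b[0], a[1] ^ b[1], a[2] ^ b[2]]
                     for a, b in zip(self.cells, other.cells)]
        return out

    def _pure(self, j: int) -> bool:
        count, key, chk = self.cells[j]
        return count in (1, -1) and chk == h64(key, CHECK_SEED)

    def decode(self) -> tuple[bool, set[int], set[int]]:
        """Peel pure cells (mutates self); returns (success, only_in_self, only_in_other)."""
        mine, theirs = set(), set()
        stack = [j for j in range(len(self.cells)) if self._pure(j)]
        while stack:
            j = stack.pop()
            if not self._pure(j):
                continue  # emptied or altered by an earlier peel
            sign, key = self.cells[j][0], self.cells[j][1]
            (mine if sign == 1 else theirs).add(key)
            self.toggle(key, -sign)  # remove the key from all k of its cells
            stack.extend(i for i in self._indices(key) if self._pure(i))
        return all(c == [0, 0, 0] for c in self.cells), mine, theirs


class StrataEstimator:
    """Stratum i gets a key with probability 1/2**(i+1); each stratum is a small IBF."""

    def __init__(self, strata: int = 32, ibf_size: int = 80):
        self.strata = [IBF(ibf_size) for _ in range(strata)]

    def insert(self, key: int) -> None:
        h = h64(key, STRATA_SEED)
        zeros = (h & -h).bit_length() - 1 if h else len(self.strata) - 1
        self.strata[min(zeros, len(self.strata) - 1)].toggle(key)

    def estimate(self, other: "StrataEstimator") -> int:
        count = 0
        for i in range(len(self.strata) - 1, -1, -1):  # sparsest stratum first
            ok, mine, theirs = self.strata[i].minus(other.strata[i]).decode()
            if not ok:
                # Strata above i hold about a 1/2**(i+1) sample of the difference.
                return 2 ** (i + 1) * count
            count += len(mine) + len(theirs)
        return count


common = set(range(1, 10_001))
set_a, set_b = common | {20_001, 20_002}, common | {30_001}
ibf_a, ibf_b = IBF(90, 3), IBF(90, 3)
se_a, se_b = StrataEstimator(), StrataEstimator()
for x in set_a:
    ibf_a.toggle(x)
    se_a.insert(x)
for x in set_b:
    ibf_b.toggle(x)
    se_b.insert(x)
ok, only_a, only_b = ibf_a.minus(ibf_b).decode()
est = se_a.estimate(se_b)
print(ok, sorted(only_a), sorted(only_b), est)
```

Giving each hash function its own sub-table is a deliberate choice in this sketch: with one shared table, a key hashed twice into the same cell would XOR itself out of the key field while the count rose by 2, so that cell could never become pure. Decoding is probabilistic, so a filter sized too small for the difference can fail and report `ok == False`.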
Our experiments show that the Difference Digest is more efficient than prior approaches such as Approximate Reconciliation Trees [5] and Characteristic Polynomial Interpolation [17]. We use Difference Digests to implement a generic KeyDiff service in Linux that runs over TCP and returns the sets of keys that differ between machines.

Supplementary Material

MP4 File (sigcomm_7_1.mp4)

References

[1] Data Domain. http://www.datadomain.com/.
[2] B. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13:422--426, 1970.
[3] A. Broder. On the resemblance and containment of documents. In Compression and Complexity of Sequences, 1997.
[4] A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. J. Comput. Syst. Sci., 60:630--659, 2000.
[5] J. Byers, J. Considine, M. Mitzenmacher, and S. Rost. Informed content delivery across adaptive overlay networks. In SIGCOMM, 2002.
[6] J. W. Byers, M. Luby, M. Mitzenmacher, and A. Rege. A digital fountain approach to reliable distribution of bulk data. In SIGCOMM, 1998.
[7] G. Cormode and S. Muthukrishnan. What's new: finding significant differences in network data streams. IEEE/ACM Trans. Netw., 13:1219--1232, 2005.
[8] G. Cormode, S. Muthukrishnan, and I. Rozenbaum. Summarizing and mining inverse distributions on data streams via dynamic inverse sampling. In VLDB, 2005.
[9] D. Eppstein and M. Goodrich. Straggler Identification in Round-Trip Data Streams via Newton's Identities and Invertible Bloom Filters. IEEE Trans. Knowl. Data Eng., 23:297--306, 2011.
[10] L. Fan, P. Cao, J. Almeida, and A. Broder. Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Trans. Netw., 8(3):281--293, 2000.
[11] J. Feigenbaum, S. Kannan, M. J. Strauss, and M. Viswanathan. An approximate L1-difference algorithm for massive data streams. SIAM J. Comput., 32(1):131--151, 2002.
[12] P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 31(2):182--209, 1985.
[13] M. T. Goodrich and M. Mitzenmacher. Invertible Bloom Lookup Tables. arXiv e-prints, arXiv:1101.2245, 2011.
[14] P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In STOC, 1998.
[15] M. Karpovsky, L. Levitin, and A. Trachtenberg. Data verification and reconciliation with generalized error-control codes. IEEE Trans. Info. Theory, 49(7), July 2003.
[16] P. Kulkarni, F. Douglis, J. Lavoie, and J. M. Tracey. Redundancy elimination within large collections of files. In USENIX ATC, 2004.
[17] Y. Minsky, A. Trachtenberg, and R. Zippel. Set reconciliation with nearly optimal communication complexity. IEEE Trans. Info. Theory, 49(9):2213--2218, 2003.
[18] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge Univ. Press, 1995.
[19] S. Muthukrishnan. Data streams: algorithms and applications. Found. Trends Theor. Comput. Sci., 2005.
[20] R. Schweller, Z. Li, Y. Chen, Y. Gao, A. Gupta, Y. Zhang, P. A. Dinda, M.-Y. Kao, and G. Memik. Reversible sketches: enabling monitoring and analysis over high-speed data streams. IEEE/ACM Trans. Netw., 15:1059--1072, 2007.
[21] B. Zhu, K. Li, and H. Patterson. Avoiding the disk bottleneck in the Data Domain deduplication file system. In FAST, 2008.

Cited By

  • (2024) Practical Rateless Set Reconciliation. Proceedings of the ACM SIGCOMM 2024 Conference, pp. 595-612. DOI: 10.1145/3651890.3672219. Online publication date: 4-Aug-2024.
  • (2023) Auditing of Outsourced Data Integrity - A Taxonomy. Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering), 16(8), pp. 805-824. DOI: 10.2174/2352096516666230118153211. Online publication date: Dec-2023.
  • (2023) A Shifting Filter Framework for Dynamic Set Queries. IEEE/ACM Transactions on Networking, 31(5), pp. 2329-2344. DOI: 10.1109/TNET.2023.3247628. Online publication date: Oct-2023.

Information

Published In

SIGCOMM '11: Proceedings of the ACM SIGCOMM 2011 conference
August 2011
502 pages
ISBN: 9781450307970
DOI: 10.1145/2018436
  • ACM SIGCOMM Computer Communication Review, Volume 41, Issue 4 (SIGCOMM '11)
    August 2011
    480 pages
    ISSN: 0146-4833
    DOI: 10.1145/2043164
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. difference digest
  2. invertible bloom filter
  3. set difference

Qualifiers

  • Research-article

Conference

SIGCOMM '11
Sponsor:
SIGCOMM '11: ACM SIGCOMM 2011 Conference
August 15 - 19, 2011
Toronto, Ontario, Canada

Acceptance Rates

SIGCOMM '11 Paper Acceptance Rate 32 of 223 submissions, 14%;
Overall Acceptance Rate 462 of 3,389 submissions, 14%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 207
  • Downloads (last 6 weeks): 65
Reflects downloads up to 08 Feb 2025.

Citations

  • (2023) SREP: Out-Of-Band Sync of Transaction Pools for Large-Scale Blockchains. 2023 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), pp. 1-9. DOI: 10.1109/ICBC56567.2023.10174977. Online publication date: 1-May-2023.
  • (2022) SoK: The evolution of distributed dataset synchronization solutions in NDN. Proceedings of the 9th ACM Conference on Information-Centric Networking, pp. 33-44. DOI: 10.1145/3517212.3558092. Online publication date: 6-Sep-2022.
  • (2022) Empirical Comparison of Block Relay Protocols. IEEE Transactions on Network and Service Management, 19(4), pp. 3960-3974. DOI: 10.1109/TNSM.2022.3195976. Online publication date: Dec-2022.
  • (2022) GenSync: A New Framework for Benchmarking and Optimizing Reconciliation of Data. IEEE Transactions on Network and Service Management, 19(4), pp. 4408-4423. DOI: 10.1109/TNSM.2022.3164369. Online publication date: Dec-2022.
  • (2022) Bloom Filter with Noisy Coding Framework for Multi-Set Membership Testing. IEEE Transactions on Knowledge and Data Engineering, pp. 1-14. DOI: 10.1109/TKDE.2022.3199646. Online publication date: 2022.
  • (2021) Trust schemas and ICN. Proceedings of the 8th ACM Conference on Information-Centric Networking, pp. 95-106. DOI: 10.1145/3460417.3482972. Online publication date: 22-Sep-2021.
  • (2021) MCFsyn: A Multi-Party Set Reconciliation Protocol With the Marked Cuckoo Filter. IEEE Transactions on Parallel and Distributed Systems, 32(11), pp. 2705-2718. DOI: 10.1109/TPDS.2021.3074440. Online publication date: 1-Nov-2021.