DOI: 10.1145/1142473.1142476

Reconciling while tolerating disagreement in collaborative data sharing

Published: 27 June 2006

Abstract

In many data sharing settings, such as within the biological and biomedical communities, global data consistency is not always attainable: different sites' data may be dirty, uncertain, or even controversial. Collaborators are willing to share their data, and in many cases they also want to selectively import data from others, but they must occasionally diverge when they disagree about uncertain or controversial facts or values. For this reason, traditional data sharing and data integration approaches are not applicable, since they require a globally consistent data instance. Additionally, many of these approaches do not allow participants to make updates; if they do, concurrency control algorithms or inconsistency repair techniques must be used to ensure a consistent view of the data for all users.

In this paper, we develop and present a fully decentralized model of collaborative data sharing, in which participants publish their data on an ad hoc basis and simultaneously reconcile updates with those published by others. Individual updates are associated with provenance information, and each participant accepts only updates with a sufficient authority ranking, meaning that each participant may have a different (though conceptually overlapping) data instance. We define a consistency semantics for database instances under this model of disagreement, present algorithms that perform reconciliation for distributed clusters of participants, and demonstrate their ability to handle typical update and conflict loads in settings involving the sharing of curated data.
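
The idea that each participant accepts only sufficiently authoritative updates, and may therefore hold a different instance from its peers, can be illustrated with a toy sketch. This is not the paper's reconciliation algorithm: the `Update` record, the `reconcile` function, the integer authority ranks, the `threshold` parameter, and the rule that equal-rank conflicts are deferred are all assumptions made for illustration, and transactions and update ordering are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """A published update; `origin` is a stand-in for provenance (hypothetical)."""
    origin: str
    key: str
    value: str


def reconcile(published, authority, threshold=1):
    """Reconcile published updates for ONE participant.

    `authority` maps an origin to this participant's rank for it (assumed
    integer scale; missing origins rank 0). Returns (accepted, deferred):
    accepted maps key -> value, deferred lists keys left in conflict.
    """
    best = {}  # key -> (highest rank seen, set of values published at that rank)
    for u in published:
        rank = authority.get(u.origin, 0)
        if rank < threshold:
            continue  # insufficient authority: ignore this update
        top = best.get(u.key)
        if top is None or rank > top[0]:
            best[u.key] = (rank, {u.value})
        elif rank == top[0]:
            top[1].add(u.value)
    # A key is accepted only if its top-ranked updates agree on one value.
    accepted = {k: next(iter(v)) for k, (_, v) in best.items() if len(v) == 1}
    deferred = sorted(k for k, (_, v) in best.items() if len(v) > 1)
    return accepted, deferred


published = [
    Update("lab_a", "gene42.function", "kinase"),
    Update("lab_b", "gene42.function", "phosphatase"),
    Update("lab_c", "gene7.organism", "mouse"),
]

# Two participants with different (hypothetical) authority rankings.
alice = reconcile(published, {"lab_a": 2, "lab_b": 1, "lab_c": 1})
bob = reconcile(published, {"lab_a": 1, "lab_b": 1})
print(alice)
print(bob)
```

Under these assumptions, the first participant accepts `lab_a`'s value for `gene42.function` along with `lab_c`'s update, while the second ranks `lab_a` and `lab_b` equally, ignores `lab_c`, and so defers the conflicting key. The two end up with different but overlapping instances drawn from the same published updates.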

    Published In

    SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data
    June 2006
    830 pages
    ISBN:1595934340
    DOI:10.1145/1142473
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. collaborative data sharing
    2. data integration
    3. peer-to-peer
    4. reconciliation
    5. transactions
    6. updates

    Qualifiers

    • Article

    Conference

    SIGMOD/PODS06
    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%
