DOI: 10.1145/3200842.3200854
Research article

Datafusion: taking source confidences into account

Published: 16 March 2018

Abstract

Data fusion is a form of information integration that combines large amounts of data mined from sources such as web sites, Twitter feeds, Facebook postings, blogs, email messages, news streams, and the like. Such data is inherently uncertain and unreliable: the sources have different degrees of accuracy, and the data mining process itself introduces additional uncertainty. The main goal of data fusion is to discover the correct data among the uncertain and possibly conflicting mined data.
We investigate a data fusion approach that, in addition to the accuracy of sources, incorporates the correctness (confidence) measures that most data mining approaches associate with mined data. Incorporating these confidences has several advantages. First, we do not require a separately provided training set; the initial training set is obtained using the confidence measures. More importantly, taking the confidences into account can yield a more accurate fusion. We present an approach to determine the correctness threshold using users' feedback, and show that it can significantly improve the accuracy of data fusion. We evaluate the performance and accuracy of our data fusion approach in two groups of experiments. In the first group, data sources contain random (unintentional) errors. In the second group, data sources contain intentional falsifications.
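The general idea of combining source accuracy with per-claim mining confidences can be illustrated with a small sketch. This is not the algorithm from the paper, whose details the abstract does not give; it is a generic confidence-weighted voting loop written under stated assumptions. The function name `fuse`, the tuple layout `(source, item, value, confidence)`, the default threshold of 0.8, and the sample claims are all hypothetical.

```python
from collections import defaultdict


def fuse(claims, threshold=0.8, rounds=5):
    """Pick one value per item from (source, item, value, confidence) claims.

    Illustrative sketch only: weighted voting, not the paper's method.
    """
    # Bootstrap (assumption): a source's initial accuracy is the share of its
    # claims whose mining confidence reaches the threshold, so no labelled
    # training set is needed.
    hits, totals = defaultdict(int), defaultdict(int)
    for source, _item, _value, conf in claims:
        totals[source] += 1
        hits[source] += conf >= threshold
    accuracy = {s: hits[s] / totals[s] for s in totals}

    fused = {}
    for _ in range(rounds):
        # Vote: each claim contributes source accuracy times mining confidence.
        scores = defaultdict(lambda: defaultdict(float))
        for source, item, value, conf in claims:
            scores[item][value] += accuracy[source] * conf
        fused = {item: max(votes, key=votes.get) for item, votes in scores.items()}

        # Re-estimate each source's accuracy as its agreement with the fused values.
        agree = defaultdict(int)
        for source, item, value, _conf in claims:
            agree[source] += fused[item] == value
        accuracy = {s: agree[s] / totals[s] for s in totals}
    return fused, accuracy


# Hypothetical mined claims: (source, item, value, mining confidence).
claims = [
    ("A", "capital_fr", "Paris", 0.90),
    ("B", "capital_fr", "Paris", 0.85),
    ("C", "capital_fr", "Lyon", 0.60),
    ("A", "capital_de", "Berlin", 0.95),
    ("C", "capital_de", "Bonn", 0.50),
]
fused, accuracy = fuse(claims)
```

In this sketch the threshold is a fixed parameter; the abstract instead describes determining the correctness threshold from users' feedback, which is not modelled here.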


Cited By

  • (2024) Stochastic Fusion Techniques for State Estimation. Computation 12:10 (209). DOI: 10.3390/computation12100209. Online publication date: 17-Oct-2024
  • (2020) A survey on data fusion: what for? in what form? what is next? Journal of Intelligent Information Systems. DOI: 10.1007/s10844-020-00627-4. Online publication date: 2-Nov-2020

    Published In

    ICIST '18: Proceedings of the 8th International Conference on Information Systems and Technologies
    March 2018
    84 pages
    ISBN:9781450364041
    DOI:10.1145/3200842
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data fusion
    2. fusion precision
    3. source confidence
    4. source trustworthiness

    Qualifiers

    • Research-article

    Conference

    ICIST '18
