DOI: 10.1145/2882903.2899403
research-article

SourceSight: Enabling Effective Source Selection

Published: 26 June 2016

Abstract

Recently there has been a rapid increase in the number of data sources and data services, such as cloud-based data markets and data portals, that facilitate the collection, publishing and trading of data. Data sources typically exhibit large heterogeneity in the type and quality of data they provide. Unfortunately, when the number of data sources is large, it is difficult for users to reason about the actual usefulness of sources for their applications and the trade-offs between the benefits and costs of acquiring and integrating sources. In this demonstration we present SourceSight, a system that allows users to interactively explore a large number of heterogeneous data sources, and discover valuable sets of sources for diverse integration tasks. SourceSight uses a novel multi-level source quality index that enables effective source selection at different granularity levels, and introduces a collection of new techniques to discover and evaluate relevant sources for integration.
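
To make the benefit-versus-cost trade-off mentioned in the abstract concrete, the following is a minimal sketch of one generic way to pick a set of sources under a budget. It is not SourceSight's method: the abstract does not describe the system's index or selection algorithms. The sketch assumes that "benefit" is the number of newly covered items, that every source has a positive acquisition cost, and that a simple greedy gain-per-cost heuristic is acceptable. The function name, source names, items, and costs are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def greedy_select(
    sources: Dict[str, Tuple[Set[str], float]],
    budget: float,
) -> List[str]:
    """Greedily pick sources by marginal coverage gain per unit cost.

    `sources` maps a (hypothetical) source name to a pair
    (items the source covers, acquisition cost). Costs are assumed positive.
    Selection stops when no affordable source adds any new item.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = budget
    while True:
        best_name: Optional[str] = None
        best_ratio = 0.0
        for name, (items, cost) in sources.items():
            if name in chosen or cost > remaining:
                continue  # already selected, or too expensive for what is left
            gain = len(items - covered)  # items this source would newly cover
            ratio = gain / cost
            if ratio > best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            break  # nothing affordable adds new coverage
        items, cost = sources[best_name]
        covered |= items
        remaining -= cost
        chosen.append(best_name)
    return chosen


# Hypothetical example data: three sources with overlapping coverage.
example_sources = {
    "A": ({"e1", "e2", "e3"}, 3.0),
    "B": ({"e3", "e4"}, 1.0),
    "C": ({"e1", "e2"}, 2.5),
}
selected = greedy_select(example_sources, budget=4.0)
print(selected)  # ['B', 'C']
```

With a budget of 4.0, the cheap source B is taken first (two new items for cost 1.0), then C, because A's extra item no longer justifies its cost once e3 is covered. A real selector would also need to weigh quality dimensions such as accuracy and freshness, which this sketch ignores.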

    Published In

    SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
    June 2016
    2300 pages
    ISBN:9781450335317
    DOI:10.1145/2882903
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tag

    1. SourceSight

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS'16: International Conference on Management of Data
    June 26 - July 1, 2016
    San Francisco, California, USA

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Cited By

    • (2024) Data distribution tailoring revisited: cost-efficient integration of representative data. The VLDB Journal 33:5 (1283-1306). DOI: 10.1007/s00778-024-00849-w. Online publication date: 12-Apr-2024
    • (2021) Tailoring data source distributions for fairness-aware data integration. Proceedings of the VLDB Endowment 14:11 (2519-2532). DOI: 10.14778/3476249.3476299. Online publication date: 27-Oct-2021
    • (2020) Pairwise comparisons or constrained optimization? A usability evaluation of techniques for eliciting decision priorities. International Transactions in Operational Research 29:5 (3190-3206). DOI: 10.1111/itor.12907. Online publication date: 18-Nov-2020
    • (2020) HASSO: A Highly-Automated Source Selection and Ordering System Based on Data Quality Factors. 2020 International Conference on Advanced Computer Science and Information Systems (ICACSIS) (155-164). DOI: 10.1109/ICACSIS51025.2020.9263243. Online publication date: 17-Oct-2020
    • (2019) Crowdsourced Targeted Feedback Collection for Multicriteria Data Source Selection. Journal of Data and Information Quality 11:1 (1-27). DOI: 10.1145/3284934. Online publication date: 4-Jan-2019
    • (2018) Computational fact checking. Proceedings of the VLDB Endowment 11:12 (2110-2113). DOI: 10.14778/3229863.3229880. Online publication date: 1-Aug-2018
    • (2018) SOURCERY. Proceedings of the 27th ACM International Conference on Information and Knowledge Management (1947-1950). DOI: 10.1145/3269206.3269209. Online publication date: 17-Oct-2018
    • (2018) Source Selection Languages. Proceedings of the Workshop on Human-In-the-Loop Data Analytics (1-6). DOI: 10.1145/3209900.3209906. Online publication date: 10-Jun-2018
    • (2017) Targeted Feedback Collection Applied to Multi-Criteria Source Selection. Advances in Databases and Information Systems (136-150). DOI: 10.1007/978-3-319-66917-5_10. Online publication date: 25-Aug-2017
