DOI: 10.1145/1559845.1559894
research-article

Indexing correlated probabilistic databases

Published: 29 June 2009

    Abstract

    With large amounts of correlated probabilistic data being generated in a wide range of application domains, including sensor networks, information extraction, and event detection, effectively managing and querying such data has become an important research direction. While there is an extensive body of literature on querying independent probabilistic data, supporting efficient queries over large-scale, correlated databases remains a challenge. In this paper, we develop efficient data structures and indexes for supporting inference and decision support queries over such databases. Our proposed hierarchical data structure is suitable for both in-memory and disk-resident databases. We represent the correlations in the probabilistic database using a junction tree over the tuple-existence or attribute-value random variables, and use tree partitioning techniques to build an index structure over it. We show how to efficiently answer inference and aggregation queries using such an index, resulting in orders-of-magnitude performance benefits in most cases. In addition, we develop novel algorithms for efficiently keeping the index structure up to date as changes (inserts, updates) are made to the probabilistic database. We present a comprehensive experimental study illustrating the benefits of our approach to query processing in probabilistic databases.
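    The idea of precomputing summaries over partitions of the correlation structure can be illustrated on a deliberately simplified case. The sketch below assumes the correlations form a Markov chain of binary tuple-existence variables, so that the junction tree degenerates to a path. The names `ChainIndex`, `block_size`, and `conditional` are hypothetical, and this is not the paper's data structure, only an analogy for how an index can let an inference query skip over whole partitions.

```python
from functools import reduce

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def matmul(a, b):
    """Multiply two small dense matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def chain_product(mats):
    """Left-to-right product of a list of 2x2 matrices (identity if empty)."""
    return reduce(matmul, mats, IDENTITY)


class ChainIndex:
    """Hypothetical index over a Markov chain X_0 -> X_1 -> ... -> X_n.

    transitions[i][a][b] is assumed to hold P(X_{i+1} = b | X_i = a).
    The chain is cut into consecutive blocks of `block_size` transitions,
    and the product of each block is precomputed once.
    """

    def __init__(self, transitions, block_size):
        self.transitions = transitions
        self.block_size = block_size
        self.block_products = [
            chain_product(transitions[start:start + block_size])
            for start in range(0, len(transitions), block_size)
        ]

    def conditional(self, i, j):
        """Return P(X_j | X_i) for i <= j as a matrix indexed [x_i][x_j]."""
        if not 0 <= i <= j <= len(self.transitions):
            raise ValueError("need 0 <= i <= j <= number of transitions")
        result = IDENTITY
        pos = i
        while pos < j:
            b = self.block_size
            if pos % b == 0 and pos + b <= j:
                # The whole block lies inside the query range: reuse its summary.
                result = matmul(result, self.block_products[pos // b])
                pos += b
            else:
                # Partial block at either end: fall back to single transitions.
                result = matmul(result, self.transitions[pos])
                pos += 1
        return result
```

    With blocks of size b, a query spanning m transitions multiplies roughly m/b + 2b matrices instead of m. A hierarchical index would apply the same trick recursively over groups of blocks.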

    Information

    Published In

    SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
    June 2009
    1168 pages
    ISBN: 9781605585512
    DOI: 10.1145/1559845
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. caching
    2. indexing
    3. inference queries
    4. junction trees
    5. probabilistic databases

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '09: International Conference on Management of Data
    June 29 - July 2, 2009
    Providence, Rhode Island, USA

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Cited By

    • (2024) Space-Efficient Indexes for Uncertain Strings. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (4828-4842). DOI: 10.1109/ICDE60146.2024.00367. Online publication date: 13-May-2024
    • (2022) Probabilistic Databases. DOI: 10.1007/978-3-031-01879-4. Online publication date: 2-Mar-2022
    • (2021) Workload-aware Materialization for Efficient Variable Elimination on Bayesian Networks. 2021 IEEE 37th International Conference on Data Engineering (ICDE) (1152-1163). DOI: 10.1109/ICDE51399.2021.00104. Online publication date: Apr-2021
    • (2020) A survey of uncertain data management. Frontiers of Computer Science: Selected Publications from Chinese Universities 14:1 (162-190). DOI: 10.1007/s11704-017-7063-z. Online publication date: 1-Feb-2020
    • (2019) Efficient User Guidance for Validating Participatory Sensing Data. ACM Transactions on Intelligent Systems and Technology 10:4 (1-30). DOI: 10.1145/3326164. Online publication date: 17-Jul-2019
    • (2017) Graphical Models for Uncertain Data Management. Encyclopedia of Database Systems (1-8). DOI: 10.1007/978-1-4899-7993-3_80741-1. Online publication date: 19-Sep-2017
    • (2015) Fast Best-Effort Search on Graphs with Multiple Attributes. IEEE Transactions on Knowledge and Data Engineering 27:3 (755-768). DOI: 10.1109/TKDE.2014.2345482. Online publication date: 3-Feb-2015
    • (2015) Threshold-Based Shortest Path Query over Large Correlated Uncertain Graphs. Journal of Computer Science and Technology 30:4 (762-780). DOI: 10.1007/s11390-015-1559-5. Online publication date: 8-Jul-2015
    • (2015) Mining Frequent Itemsets in Correlated Uncertain Databases. Journal of Computer Science and Technology 30:4 (696-712). DOI: 10.1007/s11390-015-1555-9. Online publication date: 8-Jul-2015
    • (2014) Transducing Markov sequences. Journal of the ACM 61:5 (1-48). DOI: 10.1145/2630065. Online publication date: 8-Sep-2014

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media