DOI: 10.1145/2661829.2661845
demonstration

VFDS: An Application to Generate Fast Sample Databases

Published: 03 November 2014

Abstract

Large amounts of data often require expensive and time-consuming analysis. Therefore, highly scalable and efficient techniques are necessary to process, analyze, and discover useful information. Database sampling has proven to be a powerful method for overcoming these limitations. Using only a sample of the original large database yields useful information faster, at the potential expense of lower accuracy. In this paper, we demonstrate VFDS, a novel fast database sampling system that maintains the referential integrity of the data. The system is built on top of the open-source database management system MySQL. We present various scenarios to demonstrate the effectiveness of VFDS in terms of approximate query answering, sample size, and execution time, on both real and synthetic databases.
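
To make the idea of a sample that "maintains the referential integrity of the data" concrete, the following minimal sketch may help. It is not the VFDS algorithm, which the abstract does not describe. It is a generic illustration under stated assumptions: a hypothetical two-table schema (customers, orders), Python's built-in sqlite3 module standing in for MySQL so the example is self-contained, and a simple strategy of sampling child rows uniformly at random and then pulling in every parent row they reference.

```python
import random
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with one foreign key.

    The schema and data are hypothetical; they are not from the paper.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, 21)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, (i % 20) + 1, 10.0 * i) for i in range(1, 101)],
    )
    conn.commit()
    return conn


def sample_with_integrity(conn: sqlite3.Connection, rate: float, seed: int = 0):
    """Sample a fraction of `orders`, then add every customer they reference.

    Assumed strategy for illustration only: child rows are drawn uniformly
    at random, and parents are included as needed so that no sampled
    foreign key points at a missing row.
    """
    rng = random.Random(seed)
    orders = conn.execute(
        "SELECT id, customer_id, amount FROM orders ORDER BY id"
    ).fetchall()
    k = max(1, round(rate * len(orders)))
    sampled_orders = rng.sample(orders, k)

    # Collect the parent keys that the sampled child rows depend on.
    needed = sorted({row[1] for row in sampled_orders})
    placeholders = ",".join("?" * len(needed))
    sampled_customers = conn.execute(
        f"SELECT id, name FROM customers WHERE id IN ({placeholders}) ORDER BY id",
        needed,
    ).fetchall()
    return sampled_customers, sampled_orders


if __name__ == "__main__":
    conn = build_demo_db()
    customers, orders = sample_with_integrity(conn, rate=0.1, seed=42)
    parent_ids = {c[0] for c in customers}
    # Every sampled order must reference a sampled customer.
    assert all(o[1] in parent_ids for o in orders)
    print(len(orders), "orders sampled,", len(customers), "customers retained")
```

In this sketch the sample of the parent table is driven entirely by the child sample, so the result is consistent but the parent table is not itself a uniform random sample. That is the kind of trade-off between sample size, accuracy, and consistency that the abstract alludes to.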

Cited By

  • (2020) A Regular Expression-based DGL for Meaningful Synthetic Data Generation. 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), pages 396-401. DOI: 10.1109/BigComp48618.2020.00-42. Online publication date: Feb-2020
  • (2019) A Collaborative Framework for Similarity Enforcement in Synthetic Scaling of Relational Datasets. 2019 IEEE 35th International Conference on Data Engineering (ICDE), pages 1686-1689. DOI: 10.1109/ICDE.2019.00173. Online publication date: Apr-2019
  • (2018) A collaborative framework for tweaking properties in a synthetic dataset. Proceedings of the VLDB Endowment, 11:12, pages 2010-2013. DOI: 10.14778/3229863.3236247. Online publication date: 1-Aug-2018
  • (2016) Dscaler. Proceedings of the VLDB Endowment, 9:14, pages 1671-1682. DOI: 10.14778/3007328.3007333. Online publication date: 1-Oct-2016

    Published In

    CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
    November 2014
    2152 pages
    ISBN:9781450325981
    DOI:10.1145/2661829
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. database sampling
    2. random
    3. relational database

    Qualifiers

    • Demonstration

    Conference

    CIKM '14
    Acceptance Rates

    CIKM '14 Paper Acceptance Rate 175 of 838 submissions, 21%;
    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
