DOI: 10.1145/2448556.2448568
research-article

Towards realistic sampling: generating dependencies in a relational database

Published: 17 January 2013 Publication History

Abstract

Managing large amounts of information is expensive, time-consuming, and non-trivial, and it usually requires expert knowledge. In a wide range of application areas, such as data mining, histogram construction, approximate query evaluation, and software validation, handling exponentially growing databases has become a difficult challenge, and working on a subset of the data is generally preferred. Database sampling from the available operational data has proved to be a powerful technique for addressing this challenge. However, none of the existing sampling approaches considers the dependencies between the data in a relational database. In this paper, we propose a novel approach towards constructing a realistic testing environment: we analyze the distribution of data in the original database along these dependencies before sampling, so that the sample database is representative of the original database.
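
The idea of sampling along dependencies can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not detail. It assumes a single foreign-key dependency between a parent table and a child table, measures how many child rows each parent row has (its fan-out), and samples parents stratum by stratum so that the fan-out distribution of the sample resembles that of the original. The function names `fanout_histogram` and `sample_preserving_fanout`, the tuple layout, and the `rate` parameter are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def fanout_histogram(parent_ids, child_fks):
    """Count how many parents have 0, 1, 2, ... children.

    parent_ids: iterable of parent primary keys.
    child_fks:  iterable of foreign-key values, one per child row.
    """
    children_per_parent = Counter(child_fks)
    return Counter(children_per_parent.get(pid, 0) for pid in parent_ids)


def sample_preserving_fanout(parent_ids, children, rate, seed=0):
    """Sample parents per fan-out stratum, then keep all their children.

    children: list of (child_id, parent_id) tuples (assumed layout).
    rate:     fraction of parents to keep in each stratum.
    """
    rng = random.Random(seed)

    # Index the child rows by the parent they reference.
    by_parent = defaultdict(list)
    for child_id, parent_id in children:
        by_parent[parent_id].append(child_id)

    # Group parents into strata that share the same fan-out.
    strata = defaultdict(list)
    for pid in parent_ids:
        strata[len(by_parent.get(pid, []))].append(pid)

    # Draw the same fraction from every stratum (at least one row each),
    # so rare fan-out values are not lost by chance.
    sampled_parents = []
    for fanout in sorted(strata):
        group = strata[fanout]
        k = max(1, round(rate * len(group)))
        sampled_parents.extend(rng.sample(group, k))

    # Keep every child of a sampled parent so no foreign key dangles.
    kept = set(sampled_parents)
    sampled_children = [(c, p) for c, p in children if p in kept]
    return sampled_parents, sampled_children
```

Plain uniform sampling of the two tables independently would typically break foreign keys and distort the fan-out distribution; the stratified draw above is one simple way to avoid both, at the cost of handling only one dependency at a time.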


Cited By

  • (2015) Data Summarization Techniques for Big Data—A Survey. Handbook on Data Centers, pages 1109-1152. DOI: 10.1007/978-1-4939-2092-1_38. Online publication date: 17-Mar-2015

Published In

ICUIMC '13: Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication
January 2013
772 pages
ISBN: 9781450319584
DOI: 10.1145/2448556
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. database sampling
  2. relational database
  3. test

Qualifiers

  • Research-article

Conference

ICUIMC '13
Acceptance Rates

Overall Acceptance Rate 251 of 941 submissions, 27%
