DOI: 10.1145/974044.974060
Article

MUDD: a multi-dimensional data generator

Published: 01 January 2004

Abstract

Today's business intelligence systems consist of hundreds of processors with disk subsystems able to sustain multiple gigabytes of I/O bandwidth. These systems usually contain terabytes of data. Evaluating the database performance of such systems often requires generating synthetic data with well-defined statistical properties. To simulate different scenarios, it is important to vary these statistical properties, including the row counts of tables. Above all, to analyze large-scale systems, data generators need to be able to produce hundreds of terabytes of data in a timely fashion. In this paper we present MUDD, a multi-dimensional data generator. Originally designed for TPC-DS, a decision support benchmark being developed by the TPC, MUDD is able to generate up to 100 terabytes of flat-file data in hours, utilizing modern multiprocessor architectures, including clusters. Its novel design separates data generation algorithms from data distribution definitions, enabling users to adjust their workload to individual needs and different scenarios.
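The separation of generation algorithms from distribution definitions can be illustrated with a small sketch. This is not MUDD's code or file format: the distribution table, the column names, the `pick`, `generate_row`, and `generate_chunk` helpers, and the per-row seeding scheme are all hypothetical, chosen only to show how a generator might read distributions as data and still split row ranges across independent workers.

```python
import random
from typing import Dict, List

# Hypothetical distribution definitions. They are plain data, so a user
# could change the weights or values without touching the generator code.
DISTRIBUTIONS: Dict[str, Dict[str, float]] = {
    "store_state": {"CA": 0.5, "NY": 0.3, "TX": 0.2},
    "item_category": {"Books": 0.25, "Music": 0.25, "Sports": 0.5},
}


def pick(dist_name: str, rng: random.Random) -> str:
    """Draw one value from a named weighted distribution."""
    dist = DISTRIBUTIONS[dist_name]
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


def generate_row(row_id: int, seed: int = 42) -> str:
    """Build one pipe-delimited flat-file row.

    Assumption: the random stream is seeded per row, so a row's content
    depends only on (seed, row_id) and not on which worker produces it.
    """
    rng = random.Random(seed * 1_000_003 + row_id)
    state = pick("store_state", rng)
    category = pick("item_category", rng)
    return f"{row_id}|{state}|{category}"


def generate_chunk(start: int, stop: int, seed: int = 42) -> List[str]:
    """Generate rows in [start, stop); chunks can be produced independently."""
    return [generate_row(i, seed) for i in range(start, stop)]


if __name__ == "__main__":
    # Two "workers" each produce half of the table; concatenating their
    # output matches what a single worker would have produced.
    for line in generate_chunk(0, 3) + generate_chunk(3, 6):
        print(line)
```

Because each row is derived from its own seed, the row range can be divided among processes or cluster nodes without coordination, and editing `DISTRIBUTIONS` changes the data's statistical shape without changing the generation logic.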


Published In

WOSP '04: Proceedings of the 4th international workshop on Software and performance
January 2004
313 pages
ISBN:1581136730
DOI:10.1145/974044
  • ACM SIGSOFT Software Engineering Notes, Volume 29, Issue 1
    January 2004
    300 pages
    ISSN:0163-5948
    DOI:10.1145/974043

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. TPC-DS
  2. decision support
  3. performance analysis

Qualifiers

  • Article

Conference

WOSP04
WOSP04: Fourth International Workshop on Software and Performance 2004
January 14 - 16, 2004
Redwood Shores, California

Acceptance Rates

WOSP '04 Paper Acceptance Rate 38 of 70 submissions, 54%;
Overall Acceptance Rate 149 of 241 submissions, 62%

Cited By

  • (2023) Synthetic Data Generation for Enterprise DBMS. 2023 IEEE 39th International Conference on Data Engineering (ICDE), pp. 3585-3588. DOI: 10.1109/ICDE55515.2023.00274. Online publication date: Apr-2023
  • (2020) SmartBench. Proceedings of the VLDB Endowment 13:12, pp. 1807-1820. DOI: 10.14778/3407790.3407791. Online publication date: 14-Sep-2020
  • (2019) Data generators: a short survey of techniques and use cases with focus on testing. 2019 IEEE 9th International Conference on Consumer Electronics (ICCE-Berlin), pp. 189-194. DOI: 10.1109/ICCE-Berlin47944.2019.8966202. Online publication date: Sep-2019
  • (2015) Just can't get enough. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1457-1462. DOI: 10.1145/2723372.2735378. Online publication date: 27-May-2015
  • (2015) Chronos: An elastic parallel framework for stream benchmark generation and simulation. 2015 IEEE 31st International Conference on Data Engineering, pp. 101-112. DOI: 10.1109/ICDE.2015.7113276. Online publication date: Apr-2015
  • (2015) PopulAid: In-Memory Test Data Generation. Big Data Benchmarking, pp. 101-108. DOI: 10.1007/978-3-319-20233-4_10. Online publication date: 14-Jun-2015
  • (2014) A test-suite generator for database systems. 2014 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-6. DOI: 10.1109/HPEC.2014.7040957. Online publication date: Sep-2014
  • (2013) Rapid development of data generators using meta generators in PDGF. Proceedings of the Sixth International Workshop on Testing Database Systems, pp. 1-6. DOI: 10.1145/2479440.2479441. Online publication date: 24-Jun-2013
  • (2012) Efficient update data generation for DBMS benchmarks. Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering, pp. 169-180. DOI: 10.1145/2188286.2188315. Online publication date: 22-Apr-2012
  • (2012) Big Data Generation. Revised Selected Papers of the First Workshop on Specifying Big Data Benchmarks - Volume 8163, pp. 20-27. DOI: 10.1007/978-3-642-53974-9_3. Online publication date: 17-Dec-2012
