PopulAid: In-Memory Test Data Generation

Teusner, Ralf; Perscheid, Michael; Appeltauer, Malte; Enderlein, Jonas; Klingbeil, Thomas; Kusber, Michael

doi:10.1007/978-3-319-20233-4_10

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8991)

Included in the following conference series:

Workshop on Big Data Benchmarks

Abstract

During software development, it is often necessary to access real customer data in order to validate requirements and performance thoroughly. However, company and legal policies often restrict access to such sensitive information. Without real data, developers have to either create their own customized test data manually or rely on standardized benchmarks. While the former tends to lack scalability and edge cases, the latter solves these issues but cannot reflect the production data distributions of a company.

In this paper, we propose PopulAid, a tool that allows developers to create customized benchmarks. We offer a convenient data generator that incorporates specific characteristics of real-world applications to generate synthetic data. Companies thus need not reveal sensitive data, yet developers still have access to important development artifacts. We demonstrate our approach by generating a customized test set with medical information for developing SAP’s healthcare solution.

Notes

1. More information (including a screencast) can be found at: https://epic.hpi.uni-potsdam.de/Home/PopulAid.
2. More information concerning the tables and the included attributes is accessible under: http://help.sap.com/saphelp_crm60/helpdata/de/09/a4d2f5270f4e58b2358fc5519283be/content.htm.

Acknowledgments

We thank Janusch Jacoby, Benjamin Reissaus, Kai-Adrian Rollmann, and Hendrik Folkerts for their valuable contributions during the development of PopulAid.

Author information

Authors and Affiliations

Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
Ralf Teusner & Jonas Enderlein
SAP Innovation Center, Potsdam, Germany
Michael Perscheid, Malte Appeltauer, Thomas Klingbeil & Michael Kusber

Corresponding author

Correspondence to Ralf Teusner.

Editor information

Editors and Affiliations

University of Toronto, Toronto, Ontario, Canada
Tilmann Rabl
SAP SE, Köln, Germany
Kai Sachs
Server Technologies, Oracle Corporation, Redwood Shores, California, USA
Meikel Poess
University of California at San Diego, La Jolla, CA, USA
Chaitanya Baru
Middleware Systems Research Group, Toronto, Ontario, Canada
Hans-Arno Jacobsen

About this paper

Cite this paper

Teusner, R., Perscheid, M., Appeltauer, M., Enderlein, J., Klingbeil, T., Kusber, M. (2015). PopulAid: In-Memory Test Data Generation. In: Rabl, T., Sachs, K., Poess, M., Baru, C., Jacobsen, HA. (eds) Big Data Benchmarking. WBDB 2014. Lecture Notes in Computer Science, vol 8991. Springer, Cham. https://doi.org/10.1007/978-3-319-20233-4_10

DOI: https://doi.org/10.1007/978-3-319-20233-4_10
Published: 14 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20232-7
Online ISBN: 978-3-319-20233-4
eBook Packages: Computer Science, Computer Science (R0)
