Abstract
During software development, it is often necessary to access real customer data in order to validate requirements and performance thoroughly. However, company and legal policies often restrict access to such sensitive information. Without real data, developers have to either create their own customized test data manually or rely on standardized benchmarks. While the former tends to lack scalability and edge cases, the latter addresses both issues but cannot reflect the data distributions found in a company's production systems.
In this paper, we propose PopulAid, a tool that allows developers to create customized benchmarks. We offer a convenient data generator that incorporates specific characteristics of real-world applications to generate synthetic data. Companies therefore do not need to reveal sensitive data, yet developers still have access to important development artifacts. We demonstrate our approach by generating a customized test set with medical information for developing SAP’s healthcare solution.
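The general idea of generating synthetic data from distribution characteristics, rather than from the sensitive records themselves, can be illustrated with a minimal sketch. This is not PopulAid's implementation; the function names `summarize` and `generate`, the example diagnosis codes, and the choice of a simple frequency profile are all assumptions made for illustration. The sketch also assumes that the distinct values of the column are themselves safe to share.

```python
import random
from collections import Counter


def summarize(values):
    """Reduce a column to relative frequencies (hypothetical helper).

    Only this profile, not the original rows, would leave the
    production system in this sketch.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def generate(profile, n, seed=None):
    """Draw n synthetic values that follow the given frequency profile."""
    rng = random.Random(seed)
    values = list(profile)
    weights = [profile[v] for v in values]
    return rng.choices(values, weights=weights, k=n)


# Made-up stand-in for a sensitive production column.
production_column = ["A10"] * 70 + ["B20"] * 25 + ["C30"] * 5

profile = summarize(production_column)
synthetic = generate(profile, 10_000, seed=42)
```

The synthetic column can be scaled to any size through `n` while keeping roughly the same value distribution as the original, which is the property that standardized benchmarks cannot provide for a specific company.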
Notes
1. More information (including a screencast) can be found at: https://epic.hpi.uni-potsdam.de/Home/PopulAid.
2. More information concerning the tables and the included attributes is accessible under: http://help.sap.com/saphelp_crm60/helpdata/de/09/a4d2f5270f4e58b2358fc5519283be/content.htm.
Acknowledgments
We thank Janusch Jacoby, Benjamin Reissaus, Kai-Adrian Rollmann, and Hendrik Folkerts for their valuable contributions during the development of PopulAid.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Teusner, R., Perscheid, M., Appeltauer, M., Enderlein, J., Klingbeil, T., Kusber, M. (2015). PopulAid: In-Memory Test Data Generation. In: Rabl, T., Sachs, K., Poess, M., Baru, C., Jacobson, HA. (eds) Big Data Benchmarking. WBDB 2014. Lecture Notes in Computer Science, vol 8991. Springer, Cham. https://doi.org/10.1007/978-3-319-20233-4_10
DOI: https://doi.org/10.1007/978-3-319-20233-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20232-7
Online ISBN: 978-3-319-20233-4
eBook Packages: Computer Science, Computer Science (R0)