A DSL for Automated Data Quality Monitoring
Pages 89–105
Abstract
Data is becoming ever more ubiquitous, and its importance is rising with it. The quality of business decisions depends directly on the accuracy of the data used in predictions. High data quality in the database systems that support these decisions is therefore essential; poor quality risks commercial loss or even legal consequences.
In this paper we focus on automating advanced data quality monitoring, in particular on expressing and evaluating rules for good data quality. We present RADAR, a domain-specific language (DSL) for data quality rules that fulfills our main requirements: reusability of check logic, separation of concerns for different user groups, and support for heterogeneous data sources as well as advanced data quality rules such as time series rules. It can also automatically suggest potential rules based on an analysis of historic data. Furthermore, we show initial optimization approaches for executing rules on large data sets and evaluate our language based on these optimizations.
Overall, the language offers a novel approach to flexible and powerful data quality management in practical applications, while remaining pragmatic and efficient enough to meet the needs of data quality managers.
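The requirements named in the abstract can be illustrated with a minimal Python sketch. This is not RADAR syntax, which is not shown here; every name below (`Check`, `Rule`, `within_history`, the sample sources) is hypothetical and chosen only for illustration. It assumes that check logic is written once, that a separate binding step attaches it to an attribute of a concrete source, and that a time series rule compares a current value with historic values.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping, Sequence

Row = Mapping[str, object]


@dataclass(frozen=True)
class Check:
    """Reusable check logic, written once and independent of any source."""
    name: str
    predicate: Callable[[object], bool]


@dataclass(frozen=True)
class Rule:
    """Binds a check to one attribute of one data source (hypothetical design)."""
    check: Check
    source: Callable[[], Iterable[Row]]
    attribute: str

    def violations(self) -> list[Row]:
        # A row violates the rule when the predicate rejects its attribute value.
        return [
            row for row in self.source()
            if not self.check.predicate(row.get(self.attribute))
        ]


def within_history(history: Sequence[float], current: float,
                   tolerance: float = 0.5) -> bool:
    """Simple time series rule: the current value may deviate from the
    historic mean by at most `tolerance` (a fraction of that mean)."""
    mean = sum(history) / len(history)
    return abs(current - mean) <= tolerance * abs(mean)


# One check, reused across two differently shaped sources.
not_null = Check("not_null", lambda v: v is not None and v != "")


def customers_db() -> list[Row]:
    # Stand-in for a relational source (sample data, assumed).
    return [{"id": 1, "email": "a@example.org"}, {"id": 2, "email": None}]


CSV_TEXT = "id,email\n3,c@example.org\n4,\n"


def customers_csv() -> list[Row]:
    # Stand-in for a file-based source (sample data, assumed).
    return list(csv.DictReader(io.StringIO(CSV_TEXT)))


rules = [
    Rule(not_null, customers_db, "email"),
    Rule(not_null, customers_csv, "email"),
]

if __name__ == "__main__":
    for rule in rules:
        print(rule.check.name, rule.source.__name__, rule.violations())
    print(within_history([100.0, 110.0, 90.0], 105.0))
```

Keeping `Check` separate from `Rule` mirrors the separation of concerns mentioned above: one group of users could maintain the check logic while another binds it to concrete sources.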
Information
Published In
Sep 2020
468 pages
ISBN:978-3-030-59002-4
DOI:10.1007/978-3-030-59003-1
© Springer Nature Switzerland AG 2020.
Publisher
Springer-Verlag
Berlin, Heidelberg
Publication History
Published: 14 September 2020