DOI: 10.1007/978-3-030-59003-1_6
Article

A DSL for Automated Data Quality Monitoring

Published: 14 September 2020

Abstract

Data is becoming ever more ubiquitous, and its importance is rising with it. The quality and outcome of business decisions are directly related to the accuracy of the data used in predictions. High data quality in the database systems that support business decisions is therefore essential; otherwise, commercial losses or even legal consequences loom.
In this paper we focus on automating advanced data quality monitoring, in particular on expressing and evaluating rules for good data quality. We present RADAR, a domain-specific language (DSL) for data quality rules that fulfills our main requirements: reusability of check logic, separation of concerns for different user groups, and support for heterogeneous data sources as well as for advanced data quality rules such as time series rules. It can also automatically suggest potential rules based on an analysis of historic data. Furthermore, we show initial optimization approaches for executing rules on large data sets and evaluate our language based on these optimizations.
All in all, the language presents a novel approach to flexible and powerful management of data quality in practical applications, while meeting the needs of data quality managers by being pragmatic and efficient.
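
The abstract does not show RADAR's syntax, so the following Python sketch is not RADAR. It is only a minimal illustration, under assumed names, of two of the stated requirements: check logic that is written once and reused, and rules that bind such a check to different data sources. The classes `Check` and `Rule` and the sample sources `customers` and `orders` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Mapping

Row = Mapping[str, object]


@dataclass(frozen=True)
class Check:
    """Reusable check logic, written without reference to any data source (hypothetical)."""
    name: str
    predicate: Callable[[object], bool]


@dataclass(frozen=True)
class Rule:
    """Binds a reusable check to one column of one concrete source (hypothetical)."""
    check: Check
    source: Callable[[], Iterable[Row]]  # any callable yielding rows, e.g. a SQL or CSV reader
    column: str

    def violations(self) -> List[Row]:
        # A row violates the rule when the check's predicate rejects its column value.
        return [row for row in self.source()
                if not self.check.predicate(row[self.column])]


# The check is defined once ...
not_null = Check("not_null", lambda value: value is not None)


# ... and applied to two unrelated sources (in-memory stand-ins for real systems).
def customers() -> List[Row]:
    return [{"id": 1, "email": "a@example.org"}, {"id": 2, "email": None}]


def orders() -> List[Row]:
    return [{"id": 10, "total": 99.5}, {"id": 11, "total": None}, {"id": 12, "total": None}]


email_rule = Rule(not_null, customers, "email")
total_rule = Rule(not_null, orders, "total")

print(len(email_rule.violations()), len(total_rule.violations()))
```

In this sketch the author of `not_null` needs no knowledge of where the data lives, and the author of a `Rule` needs no knowledge of how the check is implemented, which is one way to read the separation of concerns named in the abstract.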


Published In

Database and Expert Systems Applications: 31st International Conference, DEXA 2020, Bratislava, Slovakia, September 14–17, 2020, Proceedings, Part I
Sep 2020
468 pages
ISBN: 978-3-030-59002-4
DOI: 10.1007/978-3-030-59003-1
Editors: Sven Hartmann, Josef Küng, Gabriele Kotsis, A Min Tjoa, Ismail Khalil

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Data quality
  2. Domain specific language
  3. Data quality monitoring
  4. Rule based data quality
  5. Data heterogeneity

Qualifiers

  • Article
