
Requirements for Data Quality Metrics

Published: 22 January 2018

Abstract

Data quality and especially the assessment of data quality have been intensively discussed in research and practice alike. To support an economically oriented management of data quality and decision making under uncertainty, it is essential to assess the data quality level by means of well-founded metrics. However, if not adequately defined, these metrics can lead to wrong decisions and economic losses. Therefore, based on a decision-oriented framework, we present a set of five requirements for data quality metrics. These requirements are relevant for a metric that aims to support an economically oriented management of data quality and decision making under uncertainty. We further demonstrate the applicability and efficacy of these requirements by evaluating five data quality metrics for different data quality dimensions. Moreover, we discuss practical implications when applying the presented requirements.



    Published In

    Journal of Data and Information Quality  Volume 9, Issue 2
    Challenge Paper, Experience Paper and Research Paper
    June 2017
    77 pages
    ISSN:1936-1955
    EISSN:1936-1963
    DOI:10.1145/3155015
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 January 2018
    Accepted: 01 September 2017
    Revised: 01 August 2017
    Received: 01 July 2016
    Published in JDIQ Volume 9, Issue 2

    Author Tags

    1. Data quality
    2. data quality assessment
    3. data quality metrics
    4. requirements for metrics

    Qualifiers

    • Research-article
    • Research
    • Refereed
