Towards a content agnostic computable knowledge repository for data quality assessment

Published: 01 August 2019

Highlights

We identified research gaps in the data quality literature with respect to automating data quality assessment (DQA) methods.
We designed, developed and implemented a computable data quality knowledge repository for assessing quality and characterizing data in health data repositories.
We leveraged a service-oriented architecture to provide a scalable, reproducible framework for data quality assessment across disparate biomedical data sources.

Abstract

Background and objective

In recent years, several conceptual frameworks for assessing the quality of data have been proposed across the Data Quality and Information Quality domains. These frameworks are diverse, ranging from simple lists of concepts to complex ontological and taxonomical representations of data quality concepts. The goal of this study is to design, develop and implement a platform-agnostic, computable data quality knowledge repository for data quality assessments.

Methods

We identified computable data quality concepts by performing a comprehensive literature review of articles indexed in three major bibliographic data sources. From this corpus, we extracted data quality concepts, their definitions, applicable measures and their computability, and identified the conceptual relationships among them. We used these relationships to design and develop a data quality meta-model and implemented it in a quality knowledge repository.

Results

We identified three primitives for programmatically performing data quality assessments: a data quality concept, its definition, and its measure or rule for data quality assessment, together with the associations among them. We modeled a computable data quality metadata repository and extended this framework to adapt, store, retrieve and automate the assessments of other existing data quality assessment models.
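The three primitives and their associations can be illustrated with a minimal Python sketch. This is not the authors' meta-model: the class names (`Concept`, `Measure`, `Repository`), the helper `non_null_fraction`, the example definition text, and the `birth_date` field are all hypothetical, chosen only to show how a concept, its definition, and a computable rule might be stored together and then executed against records.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Concept:
    """A data quality concept paired with its definition (hypothetical)."""
    name: str
    definition: str


@dataclass(frozen=True)
class Measure:
    """A computable rule mapping a list of records to a score in [0, 1]."""
    name: str
    rule: Callable[[List[dict]], float]


@dataclass
class Repository:
    """Stores concepts, measures, and the associations between them."""
    concepts: Dict[str, Concept] = field(default_factory=dict)
    measures: Dict[str, List[Measure]] = field(default_factory=dict)

    def register(self, concept: Concept, measure: Measure) -> None:
        # Associate a measure with a concept, adding the concept if new.
        self.concepts[concept.name] = concept
        self.measures.setdefault(concept.name, []).append(measure)

    def assess(self, records: List[dict]) -> Dict[str, Dict[str, float]]:
        # Run every registered measure and group the scores by concept.
        return {
            cname: {m.name: m.rule(records) for m in ms}
            for cname, ms in self.measures.items()
        }


def non_null_fraction(field_name: str) -> Callable[[List[dict]], float]:
    """Build a rule scoring the fraction of records with a non-null field."""
    def rule(records: List[dict]) -> float:
        if not records:
            return 0.0
        present = sum(1 for r in records if r.get(field_name) is not None)
        return present / len(records)
    return rule


repo = Repository()
completeness = Concept(
    "completeness",
    "Illustrative definition: the extent to which expected values are present.",
)
repo.register(
    completeness,
    Measure("birth_date_present", non_null_fraction("birth_date")),
)

# Assumed sample records; two of the four carry a birth date.
records = [
    {"birth_date": "1980-01-01"},
    {"birth_date": None},
    {},
    {"birth_date": "1975-06-30"},
]
scores = repo.assess(records)
print(scores)
```

Keeping the rule as a plain callable is one possible design choice: the repository holds only names and associations, so measures taken from other assessment models could in principle be registered without changing the storage structure.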

Conclusion

We identified research gaps in the data quality literature with respect to automating data quality assessment methods. In this process, we designed, developed and implemented a computable data quality knowledge repository for assessing quality and characterizing data in health data repositories. We leverage this knowledge repository in a service-oriented architecture to provide a scalable and reproducible framework for data quality assessments in disparate biomedical data sources.

Cited By

  • (2023) Five-dimensional evaluation system and perceptron intelligent computing performance measurement methods based on medical heterogeneous equipment health data. Neural Computing and Applications 35:35 (24651-24664). https://doi.org/10.1007/s00521-023-08316-3. Online publication date: 1-Dec-2023

Published In

Computer Methods and Programs in Biomedicine, Volume 177, Issue C
Aug 2019
285 pages

Publisher

Elsevier North-Holland, Inc.

United States

Author Tags

  1. Data Quality Metadata Repository
  2. Knowledge representation
  3. Data quality assessment
  4. Data quality dimensions
  5. Data quality framework

Qualifiers

  • Research-article
