REPORT
A systemic biologic model for healthcare data quality
Hamid Moghaddasi and Forough Rahimi
Poor documentation of healthcare data, including the
recording of inaccurate and false information, is a concern for
health data collections at all levels, whether large hospitals
or small clinics (WHO 2003). The National Committee on
Vital and Health Statistics (Meyer 2000; Center for Health
Information Quality 2002) stated that high quality healthcare
depends on access to accurate and comprehensive medical
records. Such information is essential for effective diagnosis
and treatment, measurement and improvement of healthcare quality, advancement of public health, improvement of
healthcare productivity, and facilitation of cost reimbursement.
Rigby et al. (1998) commented on the beneficial effects of high
quality information on healthcare quality, arguing that data
quality is at the very ‘heart’ of medicine. Providing high quality
healthcare, using available resources efficiently, and providing
ongoing high quality services that satisfy the needs of society
can be achieved only through effective communication and
public responsibility to resolve patients’ problems. Without
reliable information, high quality healthcare is difficult to
achieve (Shortliffe & Cimino 2014).
Correct, timely and accessible healthcare data play a
significant role in developmental planning and support of
healthcare services. It is essential to maximise quality and
timely distribution of data to facilitate provision of health care
to individuals at an optimal level (irrespective of the level at
which services are provided). Data quality is also important in
monitoring performance of healthcare institutions and staff.
Data collected and utilised should be accurate, complete,
reliable, and accessible to authorised users, including doctors,
healthcare teams, healthcare institutions, legal authorities, and
state, province, and national governmental healthcare authorities. WHO (2003) emphasised the importance of data quality,
with accurate and reliable healthcare data required for the
following purposes:
• continuing future patient health care at all levels of
healthcare
• medico-legal purposes for patients, physicians and
healthcare institutions
• providing correct, accurate and reliable information about
diseases treated and surgical procedures performed in a
hospital and in the society, together with immunisation
and screening programs, including the number and type of
participants
• clinical and healthcare services research and healthcare
intervention outcomes
• complete, correct, accurate and reliable statistical data on
the use of healthcare services in the society
• training healthcare professionals
• designing employment requirements and planning
healthcare services.
Literature
The healthcare environment is dependent on computerised
information systems and the amount of collected, used and
stored data is increasing. Computerised systems and related
databases are faster and larger than a decade ago. Systems
are more operational and complex, providing access to data
directly and immediately. However, larger, faster and more
complex systems are not necessarily better. Remember the
classic example about computer systems: if it’s garbage in, then
it’s garbage out. Entering low quality data into a system that
is getting larger and more complex will only generate more
garbage, which will then be distributed to a wide range of
users. Data with inherent quality are those that are comprehensive, current, relevant, accurate, timely, and appropriate.
This raises the question: ‘What are the characteristics of data
quality in a healthcare environment?’
The Institute of Medicine (IOM) in the US has reported
results of numerous studies investigating different aspects of
data quality that indicate poor documentation of patients’
medical records. The importance of the quality of collected,
stored, and processed data intensifies when considerable
amounts of erroneous data are entered into information
systems that are rapidly becoming faster, larger and more
complex. Fundamental questions are:
• Can we protect our healthcare databases from the impact
of low-quality data?
• Can organisations estimate their costs based on
incomplete, inconsistent, or unreliable data?
• Can care providers make appropriate clinical decisions if
patients’ records are incomplete, unhelpful, and unreliable
or lack necessary data?
• Can public health institutions carry out their projects,
identify and prevent diseases without current,
comprehensive, accurate, and relevant data?
• Can we monitor the quality and clinical care outcomes
without data repositories that are necessary, current,
appropriate, accurate, complete, reliable, comprehensive,
and relevant?
High quality data lead to commercial or strategic success
in the marketplace. It is essential that healthcare organisations
strive to maintain high quality data and develop processes,
policies, and guidelines to protect the value of the data (Johns
2002; Davis & Lacour 2002). According to Johns, characteristics of data quality are often explained in terms of relevancy,
completeness, accuracy, precision and accessibility. More often,
HIM-INTERCHANGE Vol 6 No 1 2016 ISSN 1838-8620 (PRINT) ISSN 1838-8639 (ONLINE)
‘integrity’ is used as a generic term to cover all characteristics
of data quality. One of the most complete views about aspects
of data quality was presented by Redman (1992, cited in Johns
2002). Redman’s model was based on assessing characteristics
of the end users’ sub-schema or conceptual views of the data.
In other words, data quality (as with other types of quality), is
an entity that is defined by incorporating the conceptual view
of end users of the data.
Discussion
Characteristics or attributes of data quality used in this article
correspond with views of experts and accredited organisations that analyse their meaning and concept.
Accuracy
Data should be accurate. If a patient’s gender is male, it must
be documented as male in his record. If the patient’s name is
James Russell, the same name must be written in full in the
record. Data in any format must accurately reflect what really
happened (Johns 2002). Clearly (2001, cited in Moghaddasi et
al. 2014) recommended involving patients in confirming the
accuracy of some, especially demographic, data.
According to Ippilito (cited in Orli 1996), accuracy is the
degree of agreement between the value of a datum and a
source that is assumed to be correct. Ippilito defined it as ‘a
qualitative assessment of freedom from error’. Johns (2002)
rejected Ippilito’s view, arguing that accuracy and the
determination of the degree of accuracy are two distinct concepts.
To determine the accuracy of a data element its value needs
to be verified. How can this be achieved in the absence of the
data source? We can only hope that accuracy of data will
increase through implementing database management techniques,
such as checking for consistency in the data and limitations
of the values of the field.
Timeliness
The need for timely data depends on context, that is, the
situation and the conceptual view of the user (Johns 2002).
In the ICU of a hospital, timeliness of data is so important
that we may need to collect and interpret data every second,
whereas data used in a patient’s regular physical examination
may not be as time-sensitive. However, timely registration of
these data still plays an important role in effective treatment
of patients (Clearly 2001, cited in Moghaddasi et al. 2014),
and entering test results into the computer, registration of
diagnosis and reporting surgery on time generates usable
information. Clearly also recommended that deadlines for data
entry be based on national consensus and not be subjective,
while also acknowledging that recording of data should not
delay urgent treatment of a patient. WHO (2003) emphasised
that information, especially clinical information, and treatment
or results of treatment, should be documented immediately
as delays result in omissions and errors. Ippilito (cited in
Orli 1996) defined this characteristic of data quality as the
extent to which data elements can be made available within a
specified timeframe. So, in this sense one of the meanings of
‘timeliness’ is ‘currency’.
Completeness
Completeness can be interpreted as ‘the existence of all
the necessary data’ (WHO 2003), and each mandatory
data element within a data set should be completed, even if
data entry is delayed because of (for example) unforeseen
emergency circumstances. The data element is a guide term
that is explained by clear, obvious, and standard words
and it is data absorbent. Words such as ‘name’, ‘sex’, ‘age’,
‘description of operation’, ‘cause of death’, and ‘main diagnosis’
are all data elements, each of which absorbs the related
data (Moghaddasi 2009). Johns (2002) and Abdelhak (2010)
explained the necessity of recording all essential data through
the expression of ‘comprehensiveness’. They did not consider
‘completeness’ as the characteristic of data quality; rather,
it was a prerequisite for data quality. Johns suggested that
necessary data should be collected based on a specific domain
or scope that is referred to as ‘data set’.
Relevancy
Johns (2002) emphasised the importance of ‘data relevancy’.
This characteristic indicates that the meaning of the data
should predicate the implementation of the process or the
application for which the data are collected. One of the
processes covered at admission is to collect demographic
data of patients so that they can be distinguished from each
other. Recording a patient’s name, date of birth, and gender
is appropriate, while data such as their leisure activities and
names of their pets are irrelevant. Once the conceptual view
of the user has been developed, all data should be distinguished
as either relevant or unnecessary. Abdelhak (2010) emphasised
the need for a significant relationship between data and the
process or application for which these data are collected. The
International Standards Organization (ISO) (2008) identified
this as being ‘fit for purpose’. If there is not a significant
relationship between the data and the process or application for
which the data are collected, the data will not be useful, usable,
or fit for purpose.
Consistency
When different entities have common or similar attributes,
it is expected that the value of the attributes will also be
identical. The number of a patient’s medical record must
be the same in all reports during a care episode. A type of
inconsistency occurs when two related, but not identical,
information entities do not match. For example, ‘hysterectomy’
and ‘gender’ are related terms and this surgery is relevant only
to women. There is inconsistency if it is recorded that
hysterectomy is performed on a male patient.
Ippilito (cited in Orli 1996) maintained that consistency
exists when data do not contradict each other. Consistency
can be viewed as internal correspondence between individual
datum elements and the data. As consistency emphasises
internal consistency of an information entity, the existence of
consistency and a logical relationship between elements of
data, it follows that inconsistency must be due to inaccuracy.
It can be argued that there is no difference between the two
characteristics of consistency and accuracy, and the presence
or absence of one means the presence or absence of the
other. The Center for Health Information Quality (2002) also
identified accuracy, relevancy and clearness as the three main
components of data quality, with consistency considered a
component of accuracy.
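
The completeness and consistency checks described above can be illustrated with a minimal Python sketch. It is not drawn from any of the cited sources: the list of mandatory elements, the rule table and the function names are all hypothetical, chosen only to mirror the examples in the text (mandatory data elements, a single record number across reports, and hysterectomy recorded for a male patient).

```python
# Hypothetical mandatory data elements for an admission data set.
MANDATORY_ELEMENTS = ("record_number", "name", "sex", "date_of_birth")

# Hypothetical rule table: procedures that are consistent with only one sex.
SEX_SPECIFIC_PROCEDURES = {"hysterectomy": "female"}


def missing_elements(record: dict) -> list:
    """Return the mandatory data elements that are absent or empty."""
    return [e for e in MANDATORY_ELEMENTS if record.get(e) in (None, "")]


def inconsistencies(record: dict, reports: list) -> list:
    """Return descriptions of contradictions between a record and its reports."""
    problems = []
    for report in reports:
        # Related but not identical entities must not contradict each other.
        required_sex = SEX_SPECIFIC_PROCEDURES.get(report.get("procedure"))
        if required_sex and record.get("sex") != required_sex:
            problems.append(
                f"{report['procedure']} recorded for {record.get('sex')} patient"
            )
        # The record number must be the same in all reports of a care episode.
        if report.get("record_number") != record.get("record_number"):
            problems.append("record number differs between reports")
    return problems
```

Such rules can only flag contradictions. They cannot say which of the two conflicting values is the inaccurate one, which still requires checking against the data source.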
Definition
Abdelhak (2010) believed that data and data elements should
be deined so that both current and future users could
understand them. Each data element should have a clear
meaning and an acceptable value (AHIMA 2009). A concise and
clear definition of data and data elements facilitates accurate
data collection. Completely clear and illustrative definitions
for the data and the determination of acceptable values for
them result in universal understanding of the data and data
elements.
Some experts consider the characteristics of ‘uniqueness’
and ‘precision’ to be independent attributes of data quality,
but according to the meaning of these terms, they are actually
components of its definition. Ippilito (cited in Orli 1996)
considered uniqueness as a key value of data. Johns (2002) also
maintained that acceptable values or range of values should be
determined for each attribute and values should be sufficient
to support relevant application or process. This characteristic
of data quality is precision. For example, regarding the thickness
of the needle used in a catheter, the precision or scope of
values range between 16 and 22; and for the amount of insulin
prescribed, the precision or the scope of values range from
1 to 12. Providing data in as much detail as possible helps to
reduce the risk of error.
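
As an illustration of defining acceptable values for a data element, the following Python sketch attaches a range to each element and rejects values outside it. The class and element names are hypothetical. Only the two ranges (16 to 22 and 1 to 12) come from the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """A named data element with an acceptable range of values."""
    name: str
    minimum: float
    maximum: float

    def accepts(self, value: float) -> bool:
        # A value is acceptable only if it falls inside the defined range.
        return self.minimum <= value <= self.maximum


# Ranges taken from the examples in the text; the element names are hypothetical.
NEEDLE_THICKNESS = DataElement("catheter_needle_thickness", 16, 22)
INSULIN_AMOUNT = DataElement("insulin_amount", 1, 12)
```

Checking a value at the point of data entry, for example with `NEEDLE_THICKNESS.accepts(25)`, lets an out-of-range entry be questioned before it is stored.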
Abdelhak (2010) confirmed Johns’s view that acceptable
values be determined for each data element. For instance, the
determined values for gender are male, female, and unknown.
Both Johns (2002) and Abdelhak (2010) considered the more
accurate and detailed explanation of data and their values as
a characteristic of data quality called ‘granularity’, and that
these characteristics and data values should be determined
correctly and precisely. A patient’s body temperature should
be recorded to the precision of a tenth of a degree; there is a
significant physiological difference between, say, a temperature
of 101.1°F and 101.9°F. Hence, values of data such as the
patient’s body temperature should be determined precisely
and accurately. As a result, the concept of ‘granularity’, emphasising the more accurate and precise data and their values, is
considered as a component of ‘definition’.
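
The effect of granularity can be shown with a short Python sketch. The function name is hypothetical and the reading 101.4 is an added illustration. The point is only that two readings which differ at a tenth of a degree stay distinct when stored at that granularity, while coarser storage can make different readings indistinguishable.

```python
from decimal import Decimal, ROUND_HALF_UP


def record_temperature(reading: str, granularity: str = "0.1") -> Decimal:
    """Round a reading to the chosen granularity (default: a tenth of a degree)."""
    return Decimal(reading).quantize(Decimal(granularity), rounding=ROUND_HALF_UP)


# Stored to a tenth of a degree, the two readings from the text remain distinct.
fine = (record_temperature("101.1"), record_temperature("101.9"))

# Stored to whole degrees, 101.1 and 101.4 both become 101 and the
# difference between them is lost.
coarse = (record_temperature("101.1", "1"), record_temperature("101.4", "1"))
```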
Reliability
This feature emphasises the repeatability of data collection, processing, storage, and representation;
consistent results depend on the consistency of the input
data. As discussed earlier, the concept of ‘consistency’ implies
‘accuracy’; and the concept of ‘accuracy’ covers ‘reliability’.
Thus, according to the definition provided by the organisation,
this feature depends on ‘consistency’ or ‘accuracy’. It should
also be mentioned that according to the above definition, this
characteristic is a combination of the following attributes:
‘definition’, ‘completeness’, ‘accuracy’, ‘relevancy’ and ‘timeliness’.
Coverage
The concept of ‘coverage’ was proposed by Clearly (2002)
and according to this characteristic, data should reflect
whatever is done by the healthcare institution. Regardless of
Clearly’s specific definition based on what he maintained was
the attribute of ‘completeness’, it can be understood that the
characteristic of ‘coverage’ implies the concept of ‘completeness’.
Comparability
The feature of ‘comparability’, also introduced by the Canadian
Institute for Health Information (Canadian Institute for Health
Information 2009; AHIMA 2003), is a function of the characteristic of ‘definition’ of the data. If a datum or a fact is not
meaningful, it cannot be used to compare or justify similarities or differences in entities. The characteristic of ‘definition’
creates the attribute of ‘precision’ of the data and causes the
data to be meaningful and understandable. Meaningfulness
of data causes the characteristic of ‘comparability’ of data to
appear.
Validity
Davis (2002) believed that data recorded in any format may be
affected by human error. To be useful, data must be valid. The
characteristic of ‘validity’ implies that data are in accordance
with acceptable and expected scope. In the American postal
system ‘ABCDE’ is not a recognised postal code because US
postal codes only include digits.
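
The postal code example can be expressed as a simple format check. The sketch below is illustrative only: the function name is hypothetical, and the pattern (five digits, optionally followed by a hyphen and four more digits) is an assumption about the expected format, not something specified in the sources cited here.

```python
import re

# Assumed pattern: five digits, optionally followed by a hyphen and four digits.
US_POSTAL_CODE = re.compile(r"\d{5}(-\d{4})?")


def is_valid_us_postal_code(value: str) -> bool:
    """Return True if the value has the expected digit-only format."""
    return US_POSTAL_CODE.fullmatch(value) is not None
```

A check of this kind confirms only that a value falls within the acceptable and expected scope. It does not confirm that the code exists or belongs to the patient, which is a question of accuracy.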
AHIMA (2009) maintained that data should be in an
agreed format and based on national standards. Clearly
(2002) considered a need for data to be in accordance with
national standards as the ‘validity of data’. As the characteristic
of ‘definition’ represents clear and illustrative explanations,
determining an acceptable range of values for data, and data
elements (in an agreed format and based on national or
international standards), it seems that the characteristic of
‘definition’ covers ‘validity’.
Data representation format
Redman (1992, cited in Johns 2002) added this feature to the
set of characteristics based on the conceptual view of the
end user to evaluate data quality. It is a set of rules used to
record data. The value of the datum should be represented in
a specific format and through a specific language. As the main
core of data quality, the data representation format includes
several characteristics such as type of language (character,
symbol, picture), the size of character, symbol or picture,
font style, ink, color, rules, margin, horizontal or vertical
spacing. The most important characteristic is legibility; all data,
whether handwritten, typed, or printed must be legible.
According to Johns (2002), ‘representation format’, the
format through which data are represented to the end user,
affects data quality. By format we do not only mean the size
of a computer screen or printed reports; we also mean the
language (symbols) used to transfer the meaning of the data.
Laboratory results usually include characters. Results may also
be highlighted (when data are not in the normal range). An
example of the graphic symbols used in the medical documentation is ♂ (male) and ♀ (female). Using metaphors in the
representation of data value can be useful.
The representation format functions as ‘body’ for the
nature or content of data, without which data cannot be
tangible, as the soul needs a body through which to communicate. Unlike Johns (2002), who argued that representation
format is different from characteristics of data, it can be
argued that without the representation format data are
meaningless and even if we consider these data meaningful
(which is impossible), they cannot represent meaning without
the representation format. Humans use their minds to sort
and process data as mediators or observers, and people often
need to transfer mental images to words, gestures, and even
visualisation in order to communicate. Thus, data are a series
of content and representational formats. Content is like the
soul and the representation format is like its body.
Ambiguity in concepts
It is worth mentioning that accessibility, security and confidentiality, introduced by some organisations and individuals as
characteristics of data quality, are not definitively related to
factors forming the nature of data and cannot be considered
as characteristics of data quality itself. These characteristics
are related to the process of storage, retrieval, and distribution of information and are characteristics of the quality of
information management.
In discussing ‘accessibility’, Abdelhak (2010) maintained that
all data should be easily accessible and usable (for all clinical,
administrative, and organisational purposes) and collecting
these data should be legal. If the data are not available, the
value of collecting and documenting them correctly disappears.
The ‘security of data’ emphasises secure storage in order
not to harm the data. ‘Confidentiality’, which is the patient’s
right, represents the internal and external disclosure policies
of the organisation. It implies conditional release of information.
Conclusion
In studies covered in this article, 24 attributes or characteristics from various sources originating in the US, Canada, the
UK, Australia and WHO were proposed for the quality of
healthcare data. Some attributes were found to result from
one or more main features, while others were the result of
combining a number of key features. Some were not related
to factors forming the nature of the data; they were associated
with information management processes and were considered
as the quality of information management.
Redman (1992) introduced a set of features including
accuracy, timeliness, currency, precision, relevancy, consistency,
comprehensiveness, and granularity based on the conceptual
view of the end user, and added data representation format
to the list. Thus it became known as Redman’s model. The
Center for Health Information Quality (2002) also introduced a model consisting of three core elements, including:
accuracy, relevancy, and clearness, each of which has its own
component(s):
• accurate (consistent, continuity, current, reliable)
• relevant (accessible, appropriate, patient involvement)
• clear (appearance of text, presentation, content).
Figure 1: A systemic biologic model of data quality
In the present study the proposed model, analogous to a
biological entity (such as humans), is that data are entities that
consist of two main components: content (nature-soul), and
representation format (corpus-body) (see Figure 1). The
characteristics of data quality are determined based on factors
forming each of these two components. The foundation of
this model is based on studies conducted by various organisations and individuals. In addition to the five characteristics of
accuracy, completeness, timeliness, relevance, and definition,
which are the result of the analysis of the 24 characteristics in
the present study and are related to the content of the data, it
is also necessary to add the characteristic of ‘logical linkage’ to
each of the two main components of data quality (content and
format). By ‘logical linkage’ we mean that there is a strong and
logical relationship between the two main components of data
quality (content and data representation format), a strong and
logical relationship among each of the relevant characteristics
of the two components (if needed), and a strong and logical
relationship between the characteristics forming each of
them. Since the system is a set of components connected and
coordinated to achieve a specific purpose or purposes, the
characteristic of ‘logical linkage’ in the set of characteristics of
data quality completes the existence of data as a system and
improves the data quality as a system. In terms of completeness of data, there should be a relationship and a logical
sequence among the necessary data. If we look at a set of the
characteristics of definition, accuracy, completeness, relevancy,
and timeliness generally, the strong and logical relationship
and coordination among them must be clearly elicited. The
absence of this feature shows that the components of the data
are torn and the characteristics of data quality are detached
from each other.
References
Abdelhak, M. (2010). Health information: management of a strategic resource. USA, W.B. Saunders Company.
American Health Information Management Association (AHIMA) (2009). Practice brief: data quality management model. Available at: http://www.umass.edu/eei/2009Workshop/pdfs/Data%20Quality%20Management%20Model.pdf (accessed 27 Nov 2015).
American Health Information Management Association (AHIMA) (2003). Management and improving data quality. Journal of AHIMA 74(7): 64A-C.
Canadian Institute for Health Information (CIHI) (2009). The CIHI Data Quality Framework. Ottawa, Ontario: CIHI.
Center for Health Information Quality (CHIQ) (2002). Guidelines for health information quality. Available at: www.hfht.org/chiq.
Clearly, S. (2002). The NISHA Hospitals NHS Trust; Data Quality Policy. (Available from the authors upon request).
Davis, N. and Lacour, M. (2002). Introduction to health information technology. USA, W.B. Saunders Company.
International Standards Organization (ISO) (2008). ISO 9001:2008. Quality management systems requirements. Available at: http://www.iso.org/iso/catalogue_detail?csnumber=46486 (accessed 27 Nov 2015).
Johns, L.M. (2002). Information management for health professions. USA, Delmar Publishers.
Meyer, C. (2000). Uniform data standards for patient medical records information. Available at: http://www.ncvhs.hhs.gov/wp-content/uploads/2014/08/hipaa000706.pdf (accessed 27 Nov 2015).
Moghaddasi, H., Rabiei, R. and Sadeghi, S. (2014). Improving the quality of clinical coding: a comprehensive audit model. Journal of Health Management and Informatics 1(2): http://jhmi.sums.ac.ir/index.php/JHMI/article/view/16.
Moghaddasi, H. (2009). Health data processing. Tehran, Vajehpardaz.
Orli, J.R. (1996). Data quality methods. Available at: http://kismeta.com/cleand1.html (accessed 27 Nov 2015).
Rigby, M., Roberts, R., Williams, J., Clark, J., Savill, A., Lervy, B. and Mooney, G. (1998). Integrated record keeping as an essential aspect of a primary care led health service. British Medical Journal 317(7158): 579–582.
Shortliffe, H.E. and Cimino, J.J. (2014). Biomedical informatics. USA, Springer-Verlag.
World Health Organization (2003). Improving data quality: a guide for developing countries. Available at: http://www.phinnetwork.org/portals/0/improving_data_quality.pdf (accessed 27 Nov 2015).
Corresponding author:
Hamid Moghaddasi, PhD
Associate Professor of Health Information Management and
Medical Informatics
Head of Health Information Management and Medical
Informatics Department
College of Paramedical Sciences
Post Code 19395
Shahid Beheshti University of Medical Sciences
Darband Street, Ghods Square (Tajrish),Tehran, Iran
Moghaddasi@sbmu.ac.ir
Forough Rahimi, PhD
Assistant Professor, College of Paramedical Sciences
Shahid Beheshti University of Medical Sciences
Tehran, Iran
frahimi@sbmu.ac.ir