
Clinica Chimica Acta 346 (2004) 3-11

www.elsevier.com/locate/clinchim

The truth about quality: medical usefulness and analytical reliability of laboratory tests
James O. Westgard a,*, Teresa Darcy b
a Department of Pathology and Laboratory Medicine, University of Wisconsin Medical School, 1300 University Avenue, Madison, WI 53706, USA
b Department of Pathology and Laboratory Medicine, University of Wisconsin Hospital and Clinics, 600 Highland Avenue, Madison, WI 53792, USA
Received 18 December 2003; accepted 23 December 2003

Abstract
Background: In this age of evidence-based medicine, nothing is more important than the quality of laboratory tests. It is
commonly thought that laboratory tests provide two-thirds to three-fourths of the information used for making medical
decisions. If so, test results had better tell the truth about what is happening with our patients. Methods: The age-old truth
standard for the quality of evidence describes three dimensions that are important: a test should tell the truth, the whole truth,
and nothing but the truth. This three-dimensional model can be used to characterize the clinical and analytical reliability of
laboratory tests and guide the translation of outcome criteria, or quality goals, into practical specifications for method
performance. Results: Clinical reliability, or medical usefulness, should assess the correctness of patient classifications based on
stated test interpretation guidelines, taking into account the precision and accuracy of the laboratory method, and allowing for
the known within-subject biologic variation and the QC needed to detect method instability. Analytical reliability should assess
the correctness of a test result based on a stated error limit, taking into account the precision and accuracy of the method and
allowing for the QC necessary to detect method instability. These assessments challenge the reliability of current tests for
cholesterol, glucose, and glycated hemoglobin in the implementation of U.S. national clinical guidelines. Conclusions:
Evidence-based medicine must employ scientific methodology for translating test interpretation guidelines into practical, bench-level, operating specifications for the imprecision and inaccuracy allowable for a method and the QC necessary to detect method
instability.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Clinical and analytical reliability; Quality control; Laboratory tests

* Corresponding author. Tel.: +1-608-263-9976; fax: +1-608-262-9520.
E-mail address: jowestgard@med.wisc.edu (J.O. Westgard).
0009-8981/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.cccn.2003.12.034

1. Introduction
There is a long history of discussions of quality
requirements in laboratory medicine, beginning in the
1960s with the work of Barnett [1] and Tonks [2],


evolving with the work on biologic variability by
Harris [3,4] in the 1970s and Fraser [5,6] in the
1980s, and culminating in the 1990s with a global
consensus on a hierarchy of quality goals and specifications [7]. In spite of all this discussion, there
remains a serious gap between theory and practice.
Most service laboratories today still do not utilize
objective quality goals to plan and implement their
testing processes. More seriously, recommendations
emerging from evidence-based medicine do not properly translate test interpretation guidelines into specifications for precision, accuracy, and quality control.
Thus, national testing guidelines and recommendations for test interpretation may assume a level of
quality that cannot be guaranteed in healthcare laboratories today.
In this age of evidence-based medicine, the quality
of evidence must be carefully examined [8]. The age-old standard for the quality of evidence, "tell the
truth, the whole truth, and nothing but the truth,"
provides an important insight that has been missing in
evidence-based medicine. The standard emphasizes
three dimensions or characteristics of truth. Evidence-based medicine typically focuses only on the first and
second dimensions and neglects the third dimension
of "nothing but the truth."
For example, in evaluating the medical usefulness
of a new diagnostic test, the truth standard suggests
the following characteristics would be important:


• First, a relationship to the disease of interest;
• Second, a reliable measurement process that
  provides correct test results; and
• Third, the meaning of the test result should not be
  confounded by other factors such as non-specificity due to other diseases, changes expected due to
  individual biologic variation, changes due to
  method instability, etc.


In this context, one might wonder about the suitability of a test like high sensitivity C-reactive protein
(hs-CRP) for cardiac screening, as recently recommended in the U.S. by the Centers for Disease Control
(CDC) and the American Heart Association (AHA)
[9]. There is an inherent problem with false positives
because hs-CRP is part of the innate immune response [10] to a wide variety of infectious, inflammatory, and necrotic disease processes. While the first
dimension of truth is satisfied by the epidemiologic
studies in the literature and the second dimension is
satisfied by the availability of high sensitivity measurement procedures, the third dimension reveals that
the test may not be reliable because of variation due to
known confounding factors. Those factors identified
in a recent review [10] include the following: non-specificity of disease, within-subject biologic variation, smoking, obesity, alcohol consumption, statin
therapy, hormone replacement therapy, specimen
collection effects due to non-fasting and anticoagulants, differences between analytical methods because of the need for better standardization, need
for better analytical precision, and the need for better
quality assessment schemes to monitor and control
method performance in the field. These factors all
relate to the "nothing but the truth" dimension of
quality that is critical in preventing wrong results
and wrongful treatment of patients.
The purpose of this paper is to demonstrate the
importance of this third dimension of truth in translating outcome criteria for laboratory tests into practical specifications for laboratory methods. Common
confounding factors, such as within-subject biologic
variability and the QC needed to detect medically
important method instability, must always be considered in establishing the specifications for routine
operation of a method. In effect, a margin of safety
is necessary to allow for possible effects of known
variables that will otherwise cause a hazardous
situation for the patient [11]. Patient safety is a
major concern in healthcare today [12]; therefore,
laboratories need to assure that the test results
reported are not hazardous to the health of their
patients. The proper definition and implementation of
clinical and analytical quality requirements is critical
in that endeavor.

2. Materials and methods


Outcome criteria (also called quality goals, quality
requirements) must be translated into operating specifications in order to manage a process [13]. These
bench-level operating specifications should describe
the precision and accuracy that are allowable for a
method and the QC that is necessary to detect method
instability that would otherwise cause medically important errors in the test results.

3. Computer tools
A requirement for analytical quality, stated in
the form of an allowable total error (TEa), can be
allocated over the analytical components or factors,
such as precision, accuracy, and quality control. A
requirement for clinical quality, stated in the form
of a medical decision interval (Dint) or "gray
zone" between two different actions based on a
test result, can be allocated over both analytical
and pre-analytical factors, which include sampling
variables such as the patient's own biologic variation. These allocations can be carried out using
quality-planning models, which define a mathematical relationship between the factors involved
[14,15]. A graphical representation of the models
can be provided by a chart of operating specifications, or OPSpecs chart [16,17], which can be
prepared with available computer programs ([18,
19], QC Validator or EZ Rules, Westgard QC,
Madison, WI 53717). Power function and/or critical
error graphs can also be prepared to describe the
performance of different QC procedures (rules and
Ns).
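
As a rough illustration of what a power function describes, the sketch below computes the probability of rejecting a run for a single control rule of the 1_ks type (reject if any one of N control measurements falls outside the mean ± k SD) when the method mean has shifted by a given number of SDs. It assumes Gaussian, independent control measurements and covers only single-rule procedures; it does not reproduce the multirule procedures or the published power curves [20], and all function names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rejection_probability(shift_sd: float, limit_sd: float, n_controls: int) -> float:
    """Probability that at least one of n controls exceeds +/- limit_sd
    when the method mean has shifted by shift_sd (in SD units)."""
    p_inside = normal_cdf(limit_sd - shift_sd) - normal_cdf(-limit_sd - shift_sd)
    return 1.0 - p_inside ** n_controls


def shift_detected(limit_sd: float, n_controls: int, target: float = 0.90) -> float:
    """Smallest shift (in SD units) rejected with probability >= target,
    found by bisection; rejection probability rises with the shift."""
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if rejection_probability(mid, limit_sd, n_controls) < target:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    for limit_sd, n_controls in [(3.0, 2), (2.5, 2), (3.0, 4)]:
        false_rej = rejection_probability(0.0, limit_sd, n_controls)
        needed = shift_detected(limit_sd, n_controls)
        print(f"1_{limit_sd}s, N={n_controls}: "
              f"false rejection {false_rej:.4f}, 90% detection at {needed:.2f} SD")
```

Under these assumptions, a 1_3s rule with two controls needs a shift of about 3.5 SD before it is detected with 90% probability, which lies within the 1.7 to 3.7 SD range quoted in the text for commonly used QC procedures.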

4. Analytical model
The analytical reliability of a laboratory test
depends on the following:

• Correct result within the allowable total error for
  the test;
• Appropriate method specifications for precision
  and accuracy; and
• Appropriate safety margin to allow for the
  insensitivity of the particular statistical control
  rules and number of control measurements needed
  to assure detection of any method instability that
  would cause a misinterpretation of test results and
  false classification of the patient.

5. Clinical model
The clinical reliability and medical usefulness of a
test depend on the following:

• Correct classification of a patient based on the
  upper and lower limits of a decision interval,
  allowing for the known within-subject biological
  variation;
• Appropriate method specifications for the SD (or
  CV) and bias; and
• Appropriate safety margin to allow for the known
  insensitivity of the statistical control rules and
  number of control measurements.

6. QC safety margin
The need for a safety margin for QC is documented
by power curves [20] for the particular control
rules and numbers of control measurements, as shown
in Fig. 1. For QC procedures commonly used in
laboratories today (shown in the key at the right side
of the figure), changes or systematic errors must be as
large as 1.7 to 3.7 times the method standard deviation
to be detected reliably (with a probability of 0.90, or
90% assurance).

7. Results

Three tests (cholesterol, glucose, and glycated
hemoglobin) were investigated to assess their
expected analytical and clinical reliability based on
performance specifications and test interpretation
guidelines that are recommended in practice standards
published in the U.S.

8. Truth about cholesterol


Fig. 2 shows a chart of operating specifications
(OPSpecs chart) for a cholesterol test having an
allowable total error (TEa) of 10%, which is the
CLIA criterion for acceptable performance in proficiency testing. The OPSpecs chart shows the relationship between the precision (%CV on the x-axis)
and accuracy (%bias on the y-axis) allowable for a
method and the QC necessary to detect a medically
important systematic error (the different lines for
various rules and Ns identified in the key at the
right side of the chart). For a QC procedure to
reliably detect medically important errors, the line
should have limits for allowable bias and allowable
imprecision above the operating point of the
method. A method's operating point is given by a
y-coordinate that represents the observed bias (in %)
and an x-coordinate that represents the observed SD
or CV (in %). Note that a method having the
NCEP's maximum allowable CV of 3.0% and maximum allowable bias of 3.0% cannot be adequately
controlled to assure the quality required in proficiency testing.

Fig. 1. Power curves for commonly used QC procedures. Probability of rejecting an analytical run is plotted on the y-axis versus the size of
systematic errors on the x-axis. The different curves (top to bottom) correspond to the different control rules and numbers of control
measurements listed in the key at the right.
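
The cholesterol example in this section (TEa of 10%, CV of 3.0%, bias of 3.0%) can be checked numerically. The sketch below assumes a simplified analytical quality-planning model, TEa = bias + ΔSE·s + z·s, with z = 1.65; neither the simplified form nor the z-value is stated in this paper, and the function names are hypothetical. The 1.7 and 3.7 SD values are the detectable shifts quoted in the QC safety margin section.

```python
Z_DEFECT = 1.65  # assumed z-value (about 5% of results beyond the error limit)


def critical_systematic_error(tea_pct: float, bias_pct: float, cv_pct: float) -> float:
    """Shift, in multiples of the method SD, that QC must detect so that
    results stay within the allowable total error (assumed model)."""
    return (tea_pct - abs(bias_pct)) / cv_pct - Z_DEFECT


def allowable_bias(tea_pct: float, cv_pct: float, detectable_shift_sd: float) -> float:
    """Allowable bias at a given CV for a QC procedure that reliably detects
    shifts of detectable_shift_sd; one line of an OPSpecs-style chart."""
    return tea_pct - (detectable_shift_sd + Z_DEFECT) * cv_pct


if __name__ == "__main__":
    tea, bias, cv = 10.0, 3.0, 3.0  # CLIA limit and NCEP maximum CV and bias
    print(f"critical shift: {critical_systematic_error(tea, bias, cv):.2f} SD")
    for shift in (1.7, 3.7):
        limit = allowable_bias(tea, cv, shift)
        verdict = "inside" if bias <= limit else "outside"
        print(f"QC detecting {shift} SD: allowable bias {limit:.2f}% "
              f"-> operating point {verdict} the limit")
```

Under these assumptions the critical shift is about 0.68 SD, well below the 1.7 SD that even the most sensitive procedures detect reliably, which is consistent with the operating point lying above every line on the chart.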

Fig. 2. Chart of operating specifications for a cholesterol test having an allowable total error (TEa) of 10%. Allowable method bias (in %) is
shown on the y-axis versus allowable method CV (in %) on the x-axis. The different lines define the limits of allowable bias and CV for different
QC procedures, whose rules and numbers of control measurements are listed in the key at the right.


The OPSpecs chart in Fig. 3 represents a cholesterol test and the NCEP guidelines for test
interpretation and method performance specifications
[21]. The NCEP test interpretation guidelines indicate that a patient cholesterol result of 5.17 mmol/l
(200 mg/dl) or less is okay, but a value of 6.23
mmol/l (240 mg/dl) requires follow-up to identify
the problem and the treatment. The difference between these limits corresponds to a decision interval
(Dint) of approximately 20% [100 × (6.23 - 5.17)/
5.17]. Given that the within-subject biologic CV is
known to be 6.5%, much of the clinical decision
interval is consumed by pre-analytical variation,
leaving approximately 9.3% available for analytical
components, as shown by the y-intercepts of the
lines for the various QC procedures. The operating
point that is shown again represents the NCEP
method performance specifications of a maximum
allowable CV of 3.0% and a maximum allowable
bias of 3.0%. The only appropriate way to QC this
method would be a multirule procedure with six
measurements per run. Given that the U.S. CLIA
regulations only require laboratories to measure two
controls per run, there is no guarantee that a
cholesterol test will have the quality needed to lead
to the correct clinical interpretation.
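
The arithmetic behind the cholesterol decision interval can be sketched as follows. The code assumes a simplified clinical quality-planning model, Dint = bias + ΔSE·s_meas + z·sqrt(s_wsub² + s_meas²), with z = 1.65, one specimen, and one measurement per specimen. This form and the z-value are assumptions, chosen because they reproduce the 9.3% y-intercept quoted above; the function names are hypothetical.

```python
import math

Z_DEFECT = 1.65  # assumed z-value; reproduces the 9.3% intercept in the text


def decision_interval_pct(lower: float, upper: float) -> float:
    """Width of the decision interval as a percentage of the lower limit."""
    return 100.0 * (upper - lower) / lower


def bias_intercept(dint_pct: float, cv_within: float) -> float:
    """Budget left for analytical components when the method CV is zero
    (the y-intercept of every line on the chart under the assumed model)."""
    return dint_pct - Z_DEFECT * cv_within


def allowable_bias_clinical(dint_pct: float, cv_meas: float, cv_within: float,
                            detectable_shift_sd: float) -> float:
    """Allowable bias at a given method CV for a QC procedure that reliably
    detects shifts of detectable_shift_sd (in multiples of the method SD)."""
    spread = Z_DEFECT * math.sqrt(cv_within ** 2 + cv_meas ** 2)
    return dint_pct - detectable_shift_sd * cv_meas - spread


if __name__ == "__main__":
    print(f"Dint from limits: {decision_interval_pct(5.17, 6.23):.1f}%")
    print(f"y-intercept: {bias_intercept(20.0, 6.5):.1f}%")
    for shift in (1.7, 2.0, 3.7):
        limit = allowable_bias_clinical(20.0, 3.0, 6.5, shift)
        print(f"QC detecting {shift} SD: allowable bias at CV 3.0% = {limit:.2f}%")
```

Under these assumptions, a method with a CV of 3.0% can carry a bias of 3.0% only if QC reliably detects shifts of about 1.7 SD, the most sensitive end of the range quoted for common procedures; a procedure that needs a 2.0 SD shift already leaves less than 3.0% for bias.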

9. Truth about glucose


New guidelines for the use of laboratory tests in the
diagnosis and treatment of diabetes were published in
2002 [22]. The American Diabetes Association
(ADA) together with the U.S. Health and Human
Services (HSS) have recommended that a normal
fasting glucose should be < 110 mg/dl and that an
individual whose fasting result is >126 mg/dl should
be classified as hyperglycemic, which if confirmed
establishes a diagnosis of diabetes.
A test for glucose must clearly separate a patient
whose homeostatic set point is 126 mg/dl from one
whose set point is 110 mg/dl. This difference from
110 to 126 mg/dl is a decision interval of 14.5%
[(16/110) × 100]. The first dimension of truth
requires that a glucose test clearly distinguish between values that would classify a patient as normal
or abnormal, taking into account the known within-subject biological variation of 6.5% [6]. The second
dimension of truth has to do with the performance
specifications for the method, where the ADA/HSS
have recommended a CV of 2.2% and a bias of 0%
as desirable. The third dimension has to do with
factors that might cause a misclassification of a
patient, such as the minimal QC performed in
laboratories today. Fig. 4 shows the OPSpecs chart
for a decision interval of 14.5% and a known
within-subject biologic variation of 6.5%. The maximum allowable CV for a glucose method is about
1.1% if method bias were zero. Note that the
operating point for a method with a CV of 2.2% is
actually off the chart, which indicates that QC
procedures having as many as six controls per run
will still not be sufficient to guarantee the quality
needed for a glucose test.

Fig. 3. Chart of operating specifications for a cholesterol test having a clinical decision interval of 20%.

Fig. 4. Chart of operating specifications for a glucose test having a clinical decision interval of 14.5%.
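
To illustrate why a glucose method with a CV of 2.2% falls off the chart, the sketch below computes the largest systematic shift that can go undetected before misclassification becomes too frequent. It assumes Gaussian variation, within-subject and analytical CVs combined in quadrature, a single specimen and measurement, and z = 1.65 (about 5% of results misclassified); these assumptions and the function names are not taken from the paper.

```python
import math

Z_DEFECT = 1.65  # assumed z-value (about 5% misclassification allowed)


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clinical_critical_shift(dint_pct: float, bias_pct: float,
                            cv_meas: float, cv_within: float) -> float:
    """Shift, in multiples of the method SD, that QC must detect under the
    assumed model Dint = bias + shift*s_meas + z*sqrt(s_wsub^2 + s_meas^2)."""
    spread = Z_DEFECT * math.sqrt(cv_within ** 2 + cv_meas ** 2)
    return (dint_pct - abs(bias_pct) - spread) / cv_meas


def misclassification_probability(dint_pct: float, bias_pct: float, shift_sd: float,
                                  cv_meas: float, cv_within: float) -> float:
    """Chance that a patient whose set point is the lower limit is reported
    above the upper limit, given a bias and an undetected shift."""
    total_cv = math.sqrt(cv_within ** 2 + cv_meas ** 2)
    displacement = abs(bias_pct) + shift_sd * cv_meas
    return 1.0 - normal_cdf((dint_pct - displacement) / total_cv)


if __name__ == "__main__":
    dint = 100.0 * (126.0 - 110.0) / 110.0  # about 14.5%
    shift = clinical_critical_shift(dint, 0.0, 2.2, 6.5)
    p_stable = misclassification_probability(dint, 0.0, 0.0, 2.2, 6.5)
    p_shifted = misclassification_probability(dint, 0.0, shift, 2.2, 6.5)
    print(f"decision interval: {dint:.1f}%")
    print(f"critical shift: {shift:.2f} SD")
    print(f"misclassified, stable method: {p_stable:.3f}")
    print(f"misclassified at critical shift: {p_shifted:.3f}")
```

Under these assumptions the critical shift is roughly 1.5 SD, smaller than the 1.7 SD that the most sensitive common QC procedures detect reliably, so no procedure in the quoted range would protect the decision interval.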

Fig. 5. Performance observed for field methods as shown on a chart of operating specifications for a glucose test having a clinical decision
interval of 14.5%.

Fig. 5 shows data on the performance of routine
glucose methods, as obtained from the New York
State Department of Health proficiency survey in
February 2002 [23]. The actual method CVs and
biases are seen to be much larger than allowable for
methods commonly in use at the time of publication
of the new glucose guidelines. Bias itself ranges
from 1.4% to >3.0%, with the most common estimate being approximately 2.0% for the largest method subgroups. Accuracy is therefore still an issue in
glucose testing. On the basis of the known performance of glucose methods at the time of the publication of the new guidelines, glucose will not be a
reliable test when used with the new ADA/HSS
interpretation guidelines. Method performance and
laboratory QC cannot guarantee that patients will
be classified correctly.

10. Truth about glycated hemoglobin


The best monitor of long-term control of glucose
and the best indicator of patient outcome is glycated
hemoglobin. U.S. national guidelines [22] recommend
that glycated hemoglobin be measured twice a year,
describe a level of 7% or lower as desirable, and
recommend that a value of ≥ 8.0% should be grounds
for re-assessing the patient's treatment plan. The first
dimension of truth is that a difference from 7.0% to
8.0% must be distinguishable by the measurement
procedure to correctly classify a patient, allowing for
the known within-subject biological CV of 4.1% [24]
to 5.6% [6]. The second dimension relates to method
performance where the recommendations include a
desirable CV of 3.0% and a maximum allowable CV
of 5.0% [22]. Method bias is assumed to be minimal if
the method is properly calibrated and certified by the
National Glycohemoglobin Standardization Program
(NGSP). The third dimension of truth must consider
the known insensitivity of QC procedures to detect
changes in method performance that would otherwise
cause misclassifications.
Fig. 6 shows the operating specifications for a
glycated hemoglobin testing process designed for a
decision interval of 14.3% (100 × 1/7) and accounting for within-subject biologic CV of 4.1%. The
desirable method CV is actually 1.9-2.2% if only
two control measurements are used and up to 2.7%
with four control measurements. The recommended
method CV of 3.0% will not assure proper classification of individual patients unless the method is
controlled with a multirule procedure having an N of 6. Note that this assessment assumes a bias of zero,
which is undoubtedly optimistic given the wide
variety of methods available in the field today,
including home-test methods.

Fig. 6. Chart of operating specifications for a glycated hemoglobin test having a clinical decision interval of 14.3%.
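
The dependence of the allowable CV on the sensitivity of the QC procedure can be sketched by solving a planning model for the method CV. The code below assumes zero bias, the simplified clinical model Dint = ΔSE·s_meas + z·sqrt(s_wsub² + s_meas²) with z = 1.65, and detectable shifts of 1.7 and 3.7 SD taken from the QC safety margin section. The model form, z-value, and function names are assumptions, so the output only approximates the values read from the chart.

```python
import math

Z_DEFECT = 1.65  # assumed z-value (about 5% misclassification allowed)


def budget_used(cv_meas: float, cv_within: float, detectable_shift_sd: float) -> float:
    """Part of the decision interval consumed by an undetected shift plus
    combined within-subject and analytical variation (bias assumed zero)."""
    spread = Z_DEFECT * math.sqrt(cv_within ** 2 + cv_meas ** 2)
    return detectable_shift_sd * cv_meas + spread


def max_allowable_cv(dint_pct: float, cv_within: float, detectable_shift_sd: float) -> float:
    """Largest method CV that still fits inside the decision interval,
    found by bisection; returns 0.0 if biologic variation alone exceeds it."""
    if budget_used(0.0, cv_within, detectable_shift_sd) >= dint_pct:
        return 0.0
    lo, hi = 0.0, dint_pct  # budget_used(dint_pct, ...) always exceeds dint_pct
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if budget_used(mid, cv_within, detectable_shift_sd) <= dint_pct:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    dint = 100.0 * 1.0 / 7.0  # 7.0% to 8.0% glycated hemoglobin, about 14.3%
    for shift in (3.7, 2.5, 1.7):
        cv = max_allowable_cv(dint, 4.1, shift)
        print(f"QC detecting {shift} SD: maximum method CV about {cv:.1f}%")
```

Under these assumptions, the least sensitive procedures allow a CV of about 1.9% and the most sensitive about 3.3%, broadly in line with the chart values quoted above, where a CV of 3.0% is acceptable only with the most demanding QC.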

11. Discussion
"It is extremely unfortunate that clinical guidelines
continue to recommend CVs for analytical imprecision (often unspecified as to type, within-run, day-to-day, etc.) that bear no resemblance to the objective
criteria related to biological variation and are not
based on medical use of outcomes studies." This
statement by Bruns and Oosterhuis in the year 2003
[8, Chapter 10, p. 201] demonstrates that establishing quality requirements for laboratory tests and performance specifications for laboratory methods is still a major problem.
We believe one difficulty is the lack of consideration of factors that confound the interpretation of a
test result, the "nothing but the truth" dimension of
the truth standard. While existence of these factors is
often acknowledged, the factors are not considered
quantitatively when method specifications are established. At a minimum, it is always important to
account for the within-subject biologic variation,
which is a pre-analytical factor, and the quality control
necessary to detect medically important errors, which
is an analytical factor. We have demonstrated that
these factors can be accounted for by the use of a
graphical tool known as a chart of operating specifications. We recommend that a tool like this be
adopted as part of the scientific methodology for
translating test interpretation guidelines into method
performance specifications.
Many readers will find it surprising that laboratory
methods for common tests such as cholesterol, glucose, and glycated hemoglobin may not be reliable for
current clinical practice guidelines. We are likewise
astonished that the promoters of evidence-based medicine have not come to grips with the fundamentals for
establishing correct specifications for laboratory tests.
It may be that the ease of making measurements with
today's highly automated methods makes people think
that measurements are simple, and by extension, they
must be reliable. Unfortunately, the ease of making
measurements hides the complexity of the measurement process and actually misleads people into thinking that nothing can go wrong. Current regulations
also reinforce this thinking by allowing personnel
with less and less training to perform more and more
laboratory tests with less and less quality control. In
short, the quality of laboratory tests today is being
assumed, not assured!
We believe that evidence-based medicine is moving in the right direction and will eventually transform
the art of medicine into the science of medicine.
However, there is still a need to make improvements
and to apply scientific principles to fill the gaps where
experimental studies and data are not available or not
sufficient. Price and Christenson [8] point out that the
randomized clinical trial is not the remedy to our
problem with quality specifications. A different methodology is needed, which we suggest requires the use
of mathematical planning models to translate test
interpretation guidelines into bench-level method
specifications for precision, accuracy, and quality
control.

References
[1] Barnett RN. Medical significance of laboratory results. Am J Clin Pathol 1968;50:671-6.
[2] Tonks DB. A study on the accuracy and precision of clinical chemistry determinations in 170 Canadian laboratories. Clin Chem 1963;9:217-33.
[3] Cotlove E, Harris EK, Williams GZ. Biological and analytic components of variation in long term studies of serum constituents in normal subjects: III. Physiological and medical implications. Clin Chem 1970;26:1028-32.
[4] Harris EK, Boyd JC. Statistical bases of reference values in laboratory medicine. New York: Marcel Dekker; 1995. Chapter 8.
[5] Fraser CG. The application of theoretical goals based on biological variation data in clinical chemistry. Arch Pathol Lab Med 1988;112:404-15.
[6] Fraser CG. Biological variation: from principles to practice. Washington (DC): AACC Press; 2001.
[7] Hyltoft Petersen P, Fraser CG, Kallner A, Kenny D. Strategies to set global analytical quality specifications in laboratory medicine. Scand J Clin Lab Invest 1999;59(7):475-585.
[8] Price CP, Christenson RH. Evidence-based laboratory medicine: from principles to outcomes. Washington (DC): AACC Press; 2003.
[9] Pearson TA, Mensah FA, Alexander RW, Anderson JL, Cannon III RO, Criqui M, et al. Markers of inflammation and cardiovascular disease: application to clinical and public health practice; A statement for healthcare professionals from the Centers for Disease Control and Prevention and the American Heart Association. Circulation 2003;107:499-511.
[10] Ledue TB, Rifai N. Preanalytic and analytic sources of variation in C-reactive protein measurement: implications for cardiovascular disease risk assessment. Clin Chem 2003;49:1258-71.
[11] Westgard JO. Error budgets for quality management: practical tools for planning and assuring the analytical quality of laboratory testing processes. Clin Lab Manage Rev 1996;10:377-403.
[12] Kohn LT, Corrigan JH, Donaldson MS, editors. To err is human: building a safer health system. Institute of medicine report. Washington (DC): National Academy Press; 2000.
[13] Westgard JO, Burnett RW, Bowers GN. Quality management science in clinical chemistry: a dynamic framework for continuous improvement of quality. Clin Chem 1990;36:1712-6.
[14] Westgard JO, Hyltoft Petersen P, Wiebe DA. Laboratory process specifications for assuring quality in the US National Cholesterol Education Program. Clin Chem 1991;37:656-61.
[15] Westgard JO, Wiebe DA. Cholesterol operational process specifications for assuring the quality required by CLIA proficiency testing. Clin Chem 1991;37:1938-44.
[16] Westgard JO. Charts of operational process specifications (OPSpecs Charts) for assessing the precision, accuracy, and quality control needed to satisfy proficiency testing criteria. Clin Chem 1992;38:1226-33.
[17] Westgard JO. Assuring analytical quality through process planning and quality control. Arch Pathol Lab Med 1992;116:765-9.
[18] Westgard JO, Stein B, Westgard SA, Kennedy R. QC Validator 2.0: a computer program for automatic selection of statistical QC procedures for applications in healthcare laboratories. Comput Methods Programs Biomed 1997;53:175-86.
[19] Westgard JO, Stein B. Automated selection of statistical quality-control procedures to assure meeting clinical or analytical quality requirements. Clin Chem 1997;43:400-3.
[20] Westgard JO, Groth T. Power functions for statistical control rules. Clin Chem 1979;25:863-9.
[21] National Cholesterol Education Program Laboratory Standardization Panel. Current status of blood cholesterol measurements in clinical laboratories in the United States. Clin Chem 1988;34:193-201.
[22] Sacks DB, Bruns DE, Goldstein DE, Maclaren NK, McDonald JM, Parrott M. Guidelines and recommendations for laboratory analysis in the diagnosis and management of diabetes mellitus. Clin Chem 2002;48:436-72.
[23] New York Department of Health website: http://www.wadsworth.org/chemheme/chem/gencc/ptframes.htm.
[24] Lytken Larsen M, Fraser CG, Hyltoft Petersen P. A comparison of analytical goals for haemoglobin A1c assays derived using different strategies. Ann Clin Biochem 1991;28:272-8.
