Accreditation and Quality Assurance

https://doi.org/10.1007/s00769-019-01423-6

GENERAL PAPER

Reference versus consensus values in proficiency testing of clinical chemistry: a statistical comparison based on laboratories results in Colombia
Clara Morales1 · Ramón Giraldo2 

Received: 13 October 2019 / Accepted: 26 December 2019


© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
Proficiency testing, or external quality control, provides an additional means of ensuring the quality of laboratory testing results. Various methods can be considered in practice to fulfill this objective. The most commonly applied is the comparison of laboratory results with reference or consensus values. In this work, we study the concordance between these two schemes based on the review of a large dataset of clinical chemistry proficiency testing results. The analysis is carried out using several statistical methodologies (diagnostic tests, contingency tables, and hypothesis tests). Results indicate that the conclusions obtained from these schemes can, for several analytes, be markedly different. This is possibly because some statistical assumptions required to apply PT based on consensus values are violated.

Keywords  Consensus value · Proficiency testing · Reference value · Test of hypothesis

* Ramón Giraldo
rgiraldoh@unal.edu.co
Clara Morales
cmorales@proasecal.com
1 PROASECAL SAS, Bogotá, Colombia
2 Statistics Department, Universidad Nacional de Colombia, Bogotá, Colombia

Introduction

Proficiency testing (PT), also known as external quality assessment, is essential to ensure the analytical quality of laboratories [1]. A PT involves testing the same control material in more than one laboratory so that results can be compared [1]. In this context, the control material must be analyzed under the same conditions as patients’ samples to reflect the actual condition of the process [2]. The results of a PT allow laboratories to assess their competency and take corrective actions whenever required [1]. Hence, it must be ensured that the assigned values are reliable [3]. A false positive not only has an economic impact but also motivates the laboratory to make changes that may lead to incorrect reports. On the other hand, a false negative exposes the laboratory to reporting data that can put patients’ lives at risk [4].

The performance evaluation of laboratories requires assigned values, obtained, among other sources, from consensus or certified reference material [1]. PROASECAL [5] is a Colombian provider of PT that uses reference materials acquired from accredited providers. This company has adopted two schemes for carrying out PT. The results of participants are compared with both reference values (RV), given by accredited providers, and consensus values (CV), calculated by robust methods [6]. In both cases, RV and CV are defined according to the methods used by laboratories. The combined use of these schemes is adopted for several reasons. In Latin America, and particularly in Colombia, a huge variety of equipment and reagents is employed, which makes it difficult to form groups of participants working under the same experimental conditions. It is usual for laboratories to change equipment and reagents throughout the year. Additionally, the PT can include laboratories with a low complexity level that use open technology equipment. Given the wide range of possible scenarios mentioned above, it might be risky to draw conclusions based exclusively on CV. Several authors have considered the problem of comparing results obtained by means of RV and CV. For example, Wong [7] evaluates the performance of CV in PT programs using Monte Carlo simulation techniques. He concludes that the deviation of the consensus value from the true value could be as large as 40 %, depending on the analytes


under investigation. Koch and Baumeister [4] mention that the best way to avoid potentially biased assigned values is to use RV, thus ensuring that the assigned value is close to the true value. In the same sense, Baldan et al. [8], based on a particular dataset, carried out an economic assessment of RV and CV, concluding that the use of CV does not necessarily reduce the costs of a PT and that the quality assessment of laboratories is frequently better when RV is used.

In this work, we compare clinical chemistry PT results based on CV and RV. We use a large dataset collected by PROASECAL [5] to carry out the analysis. The main objective is to quantify and characterize, through the use of several statistical methodologies, the possible discrepancies between these schemes for the particular case of laboratories in Colombia.

The article is organized as follows. The “Data and methodology” section describes the dataset studied and the methodology used for carrying out the analysis. The “Results and discussion” section gives a discussion based on the results. The paper ends with a “Conclusions” section.

Data and methodology

Performance assessment of laboratories in PT is generally based on the observation of four statistics (all of these leading to the same conclusion). These are the difference (D), percentage difference (Drel), percentage of allowed deviation (PA) and Z score [9]. Given a set of reported values, the statistics can be calculated based on CV (consensus mean (x∗) and consensus standard deviation (s∗)) or RV (xpt and 𝜎pt). The expression used to calculate each of these statistics is shown in Table 1. Using these statistics, we compare PT results obtained under both criteria (CV and RV) from data collected by PROASECAL [5] over six years (2013–2018). Specifically, we analyze the reported values of a large number of laboratories for several analytes (Table 2). The number of participants varies from one analyte to another (see the first two columns in Table 2). Using the statistics in Table 1, the reported value of each laboratory for a particular analyte is classified, under each scheme (CV and RV), as satisfactory or unsatisfactory. For example, using the Z score criterion, the result of a particular laboratory (xi) is unsatisfactory when |Zi| is greater than 3 [10]. The rules for establishing whether the value reported by a laboratory is satisfactory or unsatisfactory when the statistics D, Drel and PA are used can be reviewed in [7].

Table 1  Statistics used in proficiency testing for assessing the participants’ performance: difference (D), percentage difference (Drel), percentage of allowed deviation (PA) and Z score (Z)

Statistic    Consensus value                            Reference value
$D_i$        $x_i - x^*$                                $x_i - x_{pt}$
$D_{i,rel}$  $\frac{x_i - x^*}{x^*} \times 100$         $\frac{x_i - x_{pt}}{x_{pt}} \times 100$
$PA_i$       $\frac{D_i}{\hat{\delta}_E} \times 100$    $\frac{D_i}{\delta_E} \times 100$
$Z_i$        $\frac{x_i - x^*}{s^*}$                    $\frac{x_i - x_{pt}}{\sigma_{pt}}$

These are calculated based on $x_i$ (the value reported by the i-th participant), consensus values ($x^*$, $s^*$ and $\hat{\delta}_E = 3 \times s^*$), and reference values ($x_{pt}$, $\sigma_{pt}$ and $\delta_E = 3\sigma_{pt}$)

From the values reported by laboratories in various rounds, we can build a table with the counts of concordances and discordances between the two schemes. The possible results of these counts are sketched in Fig. 1, where the values a, b, c and d correspond to the following counts of events:

1. Number of true positives (a): Number of laboratories with unsatisfactory performance according to both schemes.
2. Number of false positives (b): Number of laboratories with satisfactory performance according to RV and unsatisfactory result under CV.
3. Number of false negatives (c): Number of laboratories with unsatisfactory performance according to RV and satisfactory results under CV.
4. Number of true negatives (d): Number of laboratories with satisfactory performance according to both schemes.

Table 2 shows these counts for each of the analytes considered. These datasets can be arranged in 2×2 contingency tables (as described in Fig. 1) of matched-pair studies [11], where the performance of each laboratory is a dichotomous variable whose results (satisfactory or unsatisfactory) are evaluated by both CV and RV. For example, in the case of uric acid (Table 2), there were 13027 reports from laboratories (collected during the period of study), and the performance of each one was classified as satisfactory or unsatisfactory according to both CV and RV. Of these, 918 (7 % of cases) and 11033 (88 % of cases) were classified, respectively, as true positives and true negatives. In the remaining cases, the classifications based on CV and RV differed (2 % were false positives and 3 % false negatives). In statistics, there are several tests appropriate for the treatment of dichotomous variables such as those mentioned above. Among others, the 𝜒², Fisher and McNemar tests [11] can be used in this scenario. The latter is particularly useful when two tests are performed on the same group of patients [11]. The datasets shown in Table 2 have this structure. For each analyte, two tests (based on CV and RV) are performed on the same group of patients (laboratories). With this in mind, here we use McNemar’s test to assess the dependence between CV and RV results. The null hypothesis is that, in the population of laboratories that can be studied with both


Table 2  For each analyte is given the number of participants in all the rounds considered and the values of a (participants with unsatisfactory performance according to both schemes), b (participants with satisfactory performance according to reference values and unsatisfactory under consensus values), c (participants with unsatisfactory performance according to reference values and satisfactory under consensus values) and d (participants with satisfactory performance in both cases)

Analyte               n      a            b           c            d             P-value
Uric acid             13027  918 (0.07)   253 (0.02)  319 (0.03)   11033 (0.88)  0.007
Albumin               8247   715 (0.09)   102 (0.01)  453 (0.06)   6665 (0.84)   0.000
Alat                  11683  814 (0.07)   214 (0.02)  258 (0.02)   9942 (0.89)   0.048
Asat                  11653  787 (0.07)   297 (0.02)  209 (0.02)   9912 (0.88)   0.000
Total bilirubin       11503  905 (0.08)   279 (0.03)  318 (0.03)   9519 (0.86)   0.119
Cholesterol           13432  1360 (0.11)  177 (0.01)  494 (0.04)   10905 (0.84)  0.000
Creatinine            13887  1029 (0.08)  241 (0.02)  392 (0.03)   11684 (0.88)  0.000
Glucose               14104  1198 (0.09)  248 (0.02)  561 (0.04)   11568 (0.85)  0.000
Ureic nitrogen        13230  1956 (0.15)  124 (0.01)  1038 (0.08)  9667 (0.76)   0.000
Total proteins        8028   795 (0.10)   116 (0.02)  360 (0.05)   6451 (0.84)   0.000
Triglycerides         13380  595 (0.05)   646 (0.04)  71 (0.01)    11507 (0.90)  0.000
Calcium               6191   567 (0.10)   108 (0.02)  202 (0.03)   5041 (0.85)   0.000
CK                    6767   210 (0.03)   338 (0.05)  53 (0.01)    5889 (0.91)   0.000
Chlorine              6375   542 (0.09)   74 (0.01)   267 (0.04)   5247 (0.85)   0.000
Alkaline phosphatase  9535   529 (0.06)   281 (0.03)  146 (0.02)   8187 (0.91)   0.000
Phosphorous           5100   327 (0.07)   106 (0.02)  116 (0.02)   4349 (0.91)   0.546
Iron                  3024   292 (0.10)   100 (0.03)  88 (0.03)    2398 (0.83)   0.422
LDH                   7293   451 (0.06)   203 (0.03)  123 (0.02)   6183 (0.89)   0.000
Magnesium             4621   262 (0.06)   119 (0.03)  101 (0.02)   3933 (0.89)   0.252
Sodium                6508   922 (0.15)   16 (0.00)   469 (0.07)   4876 (0.78)   0.000
Potassium             6521   222 (0.04)   266 (0.04)  43 (0.01)    5702 (0.91)   0.000
HDL cholesterol       8915   525 (0.06)   299 (0.03)  133 (0.02)   7598 (0.89)   0.000
Amylase               7627   294 (0.04)   344 (0.05)  95 (0.01)    6548 (0.90)   0.000
GGT                   3073   139 (0.05)   51 (0.02)   59 (0.02)    2677 (0.91)   0.505
Direct bilirubin      11173  983 (0.09)   323 (0.03)  447 (0.04)   8970 (0.84)   0.000
Ionized Calcium       1873   74 (0.04)    53 (0.03)   29 (0.02)    1613 (0.91)   0.011

The corresponding proportions with respect to the total number of laboratories are given in brackets
The last column shows, for each analyte, the P-value of McNemar’s test
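As an illustration, the following minimal Python sketch shows one way the last column of Table 2 and the agreement proportions reported in Table 3 could be recomputed from the counts a, b, c and d. It assumes the chi-square form of McNemar's test with a continuity correction; the paper does not state which variant was used, although this variant agrees with the published P-values for the rows tried here. The function names are illustrative only.

```python
import math


def mcnemar_p_value(b: int, c: int, continuity: bool = True) -> float:
    """Two-sided McNemar P-value from the discordant counts b and c.

    Chi-square approximation with 1 degree of freedom. The continuity
    correction is an assumption of this sketch, not something the paper states.
    """
    if b + c == 0:
        return 1.0  # no discordant pairs, so there is nothing to test
    diff = max(abs(b - c) - (1 if continuity else 0), 0)
    stat = diff ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(stat / 2))


def agreement(a: int, b: int, c: int, d: int):
    """Sensitivity, specificity and predictive values from the 2x2 counts."""
    sensitivity = a / (a + c)      # pi
    specificity = d / (b + d)      # gamma
    positive_pv = a / (a + b)      # T+
    negative_pv = d / (c + d)      # T-
    return sensitivity, specificity, positive_pv, negative_pv


# Counts taken from Table 2
p_uric = mcnemar_p_value(b=253, c=319)
p_alat = mcnemar_p_value(b=214, c=258)
print(f"uric acid: {p_uric:.3f}, Alat: {p_alat:.3f}")  # 0.007 and 0.048, as in Table 2
print([round(v, 2) for v in agreement(918, 253, 319, 11033)])  # uric acid row of Table 3
```

Only the discordant counts b and c enter the test statistic, which is why analytes with very different numbers of concordant results can still give similar P-values.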

schemes (CV and RV), the proportion of them that would obtain unsatisfactory performance from RV (call it P1) is the same as the proportion receiving unsatisfactory performance from CV (call it P2); that is, H0: P1 = P2 versus Ha: P1 ≠ P2 [12]. Based on the values defined in Fig. 1, these proportions can be estimated as

$$\hat{P}_1 = \frac{a + c}{n}, \qquad \hat{P}_2 = \frac{a + b}{n},$$

with n the total number of laboratories. Alternatively, the hypothesis could also be stated as H0: 𝜓 = 1 versus Ha: 𝜓 ≠ 1, where 𝜓 is the population ratio estimated by

Fig. 1  Possible classifications of laboratories in a round of a PT program according to their performance (satisfactory or unsatisfactory) on both reference and consensus values. This corresponds to a 2×2 contingency table of paired data

                             Reference value
                             Unsatisfactory   Satisfactory
Consensus   Unsatisfactory   a                b
            Satisfactory     c                d
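The cells of Fig. 1 can be filled by scoring every reported value twice, once against the consensus values and once against the reference values. The sketch below uses the Z score from Table 1 with the |Z| > 3 rule; the function names and all numbers in the example are hypothetical and are not taken from the dataset.

```python
def z_score(x: float, assigned: float, sd: float) -> float:
    """Z score from Table 1: (reported value - assigned value) / standard deviation."""
    return (x - assigned) / sd


def tally(results, x_star, s_star, x_pt, sigma_pt, limit=3.0):
    """Count the cells a, b, c, d of the 2x2 table for one analyte."""
    counts = {"a": 0, "b": 0, "c": 0, "d": 0}
    for x in results:
        unsat_cv = abs(z_score(x, x_star, s_star)) > limit    # consensus scheme
        unsat_rv = abs(z_score(x, x_pt, sigma_pt)) > limit    # reference scheme
        if unsat_cv and unsat_rv:
            counts["a"] += 1  # unsatisfactory under both schemes
        elif unsat_cv:
            counts["b"] += 1  # satisfactory under RV, unsatisfactory under CV
        elif unsat_rv:
            counts["c"] += 1  # unsatisfactory under RV, satisfactory under CV
        else:
            counts["d"] += 1  # satisfactory under both schemes
    return counts


# Hypothetical reported values and assigned values, for illustration only
reported = [100, 103, 90, 115.5, 120, 98]
counts = tally(reported, x_star=100, s_star=5, x_pt=104, sigma_pt=4)
print(counts)  # {'a': 1, 'b': 1, 'c': 1, 'd': 3}
```

In this toy example the two schemes disagree on the values 90 and 115.5 because the consensus mean and standard deviation differ from the reference ones, which is the kind of discordance counted in cells b and c.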


b/c (Fig. 1). We conduct a McNemar’s test (for each analyte) from the values a, b, c and d given in Table 2 to test the null hypothesis described above. The P-values of these tests are shown in the last column of Table 2.

Sensitivity (𝜋), specificity (𝛾) and predictive values (positive (T+) and negative (T−)) are commonly used for screening and diagnostic tests [13]. These values allow measuring agreement between the results of a test under evaluation and those of the reference standard [14]. This principle can be adapted to the context of PT to assess the agreement between consensus results and those obtained from reference values. In this scenario, the test under evaluation is the consensus and the reference standard is the reference value. A proficiency test based on consensus will be better insofar as there is a high degree of agreement between its results and those obtained by using the reference values (xpt and 𝜎pt). Based on the a, b, c and d values defined in Fig. 1 and shown in Table 2, 𝜋, 𝛾, T+ and T− are estimated, respectively, as

$$\pi = \frac{a}{a + c}, \qquad \gamma = \frac{d}{b + d}, \qquad T^{+} = \frac{a}{a + b}, \qquad T^{-} = \frac{d}{c + d}.$$

In this scenario, 𝜋 is the proportion of laboratories with unsatisfactory performance according to RV that also have an unsatisfactory performance according to CV (proportion of true positives). On the other hand, 𝛾 is the proportion of laboratories with satisfactory performance according to RV that also have satisfactory performance under CV (proportion of true negatives). T+ is the proportion of laboratories with unsatisfactory performance according to CV that actually have unsatisfactory performance according to RV. T− is the proportion of laboratories with satisfactory performance according to CV that also have satisfactory performance under RV [15]. The values of these measures (for each analyte) are shown in Table 3. These are used in the "Results and discussion" section to help explain the differences detected between the CV and RV schemes.

Table 3  Proportions 𝜋, 𝛾, T+ and T− for each analyte

Analyte               𝜋     𝛾     T+    T−
Uric acid             0.74  0.98  0.78  0.97
Albumin               0.61  0.98  0.88  0.94
Alat                  0.76  0.98  0.79  0.97
Asat                  0.79  0.97  0.73  0.98
Total bilirubin       0.74  0.97  0.76  0.97
Cholesterol           0.73  0.98  0.88  0.96
Creatinine            0.72  0.98  0.81  0.97
Glucose               0.68  0.98  0.83  0.95
Ureic nitrogen        0.65  0.99  0.94  0.90
Total proteins        0.69  0.98  0.87  0.95
Triglycerides         0.89  0.95  0.48  0.99
Calcium               0.74  0.98  0.84  0.96
CK                    0.80  0.95  0.38  0.99
Chlorine              0.67  0.99  0.88  0.95
Alkaline phosphatase  0.78  0.97  0.65  0.98
Phosphorous           0.74  0.98  0.76  0.97
Iron                  0.77  0.96  0.74  0.96
LDH                   0.79  0.97  0.69  0.98
Magnesium             0.72  0.97  0.69  0.97
Sodium                0.66  1.00  0.98  0.91
Potassium             0.84  0.96  0.45  0.99
HDL cholesterol       0.80  0.96  0.64  0.98
Amylase               0.76  0.95  0.46  0.99
GGT                   0.70  0.98  0.73  0.98
Direct bilirubin      0.69  0.97  0.75  0.95
Ionized Calcium       0.72  0.97  0.58  0.98

Note that the null hypothesis H0: P1 = P2 for McNemar’s test can now be stated as H0: T+ = 𝜋 versus Ha: T+ ≠ 𝜋. Using the observations a, b, c, d and the definitions mentioned above, we can show this equivalence:

$$
\begin{aligned}
T^{+} &= \pi \\
\frac{a}{a + b} &= \frac{a}{a + c} \\
a(a + c) &= a(a + b) \\
\frac{a + c}{n} &= \frac{a + b}{n} \\
\hat{P}_1 &= \hat{P}_2,
\end{aligned}
\tag{1}
$$

In Eq. 1, T+ and 𝜋 are estimates of the real unknown proportions.

Another way of comparing the CV and RV schemes is by means of classical statistical tests. The probability distribution of the data can have an impact on performance assessment [2]. According to [16], the distribution of the results reported by the participating laboratories is expected to be normal or at least unimodal and reasonably symmetric. If the associated probability distribution is not normal, the assessment by consensus value may be impaired [17]. It is also reasonable to think that, in addition to normality, there should be a similarity between the mean and standard deviation of consensus (x∗ and s∗,


respectively) and those given by reference values (xpt and 𝜎pt, respectively). The mean and standard deviation of consensus need to be reliable. When these are not correctly estimated, the PT could be considered inconsistent [2]. All these aspects can be studied statistically through hypothesis testing procedures. Specifically, a Lilliefors normality test [18], a one-sample t test (to compare x∗ with xpt) and a chi-square test (to compare s∗ with 𝜎pt) can be used. A review of these tests can be found in [12]. The results of these tests (with the data corresponding to each analyte) are shown in the “Results and discussion” section. In all cases, the tests are carried out after algorithm A [19] has been applied to the data.

Results and discussion

McNemar’s test P-values (Table 2), estimates of 𝜋, 𝛾, T+ and T− (Table 3) and P-values corresponding to normality tests, t tests and chi-square tests (Tables 4 and 5) are used in this section to describe the relationship between CV and RV. We consider in all cases a significance level 𝛼 = 5 %.

Several aspects of Table 2 are remarkable. The percentage of laboratories with satisfactory results according to RV, (b + d)/n, is in almost all cases equal to or greater than 85 % (ureic nitrogen and sodium are exceptions to this pattern). This result has two positive interpretations. On the one hand, it is an empirical indicator that laboratories’ performances are in general reliable (i.e., Z score lower than 3 in absolute value in a high proportion of cases), and on the other hand, it shows (taking into account that the percentage of false positives is low) that PT based on CV performs well in identifying true negatives (d values in Table 2). This point can also be established with the 𝛾 values (≥ 95 % in all cases, Table 3).

The other side of the coin, however, is that false negatives (c values, Table 2) are relatively high compared with true positives (a values, Table 2). This can also be evidenced by the 𝜋 and T+ values. We can observe in Table 3 that these proportions are much lower than those of 𝛾 and T−. In this sense, some critical results are given in analytes such as amylase, potassium, CK and triglycerides, where the T+ values are even lower than 50 %. These results indicate that, for several analytes, PT based on CV has deficiencies in estimating

Table 4  P-values obtained from three tests of hypothesis (Lilliefors, one-mean t test and 𝜒² test, respectively) based on data reported for participants at one round of a proficiency testing carried out in 2018

Analyte               (xpt, x∗)        (𝜎pt, s∗)     Lilliefors  t test  𝜒² test
Uric acid             (5.7, 5.8)       (0.4, 0.5)    0.000       0.004   0.000
Albumin               (4.3, 4.3)       (0.3, 0.3)    0.028       0.000   0.014
Alat                  (36.0, 34.9)     (4.0, 3.3)    0.000       0.000   0.000
Asat                  (36.0, 37.5)     (3.5, 4.5)    0.024       0.000   0.019
Total bilirubin       (1.6, 1.6)       (0.1, 0.1)    0.540       0.945   0.246
Cholesterol           (163.0, 159.6)   (10.5, 10.7)  0.011       0.000   0.022
Creatinine            (1.4, 1.4)       (0.1, 0.1)    0.047       0.653   0.012
Glucose               (110.0, 110.2)   (8.5, 6.4)    0.057       0.000   0.000
Ureic nitrogen        (20.0, 20.3)     (2.2, 2.2)    0.000       0.118   0.041
Total proteins        (5.8, 5.9)       (0.6, 0.2)    0.164       0.000   0.000
Triglycerides         (101, 98.5)      (8.2, 10.6)   0.000       0.000   0.001
Calcium               (9.6, 9.7)       (0.5, 0.6)    0.257       0.017   0.056
CK                    (189, 187.6)     (17, 18.4)    0.579       0.479   0.637
Chlorine              (97, 97)         (2.4, 2.4)    0.027       0.371   0.170
Alkaline phosphatase  (284, 283.7)     (21.5, 30.8)  0.149       0.000   0.000
Phosphorous           (5.3, 5.3)       (0.4, 0.3)    0.027       0.000   0.000
Iron                  (96.7, 98.5)     (8.6, 10.1)   0.534       0.182   0.748
LDH                   (418.9, 418.9)   (46.9, 46.9)  0.563       0.998   0.249
Magnesium             (2.2, 2.3)       (0.1, 0.2)    0.682       0.088   0.017
Sodium                (141, 142)       (3.5, 3.3)    0.627       0.004   0.046
Potassium             (3.9, 3.9)       (0.2, 0.1)    0.176       0.534   0.000
HDL cholesterol       (53.7, 52.2)     (4.1, 7.5)    0.362       0.041   0.000
Amylase               (108, 99.6)      (8.0, 11.5)   0.571       0.000   0.000
GGT                   (49.0, 47.7)     (3.5, 4.8)    0.374       0.000   0.000
Direct bilirubin      (1.1, 1.1)       (0.2, 0.2)    0.037       0.993   0.123
Ionized Calcium       (1, 1)           (0.08, 0.08)  0.392       0.942   0.564

xpt and 𝜎pt correspond to the reference values. A significance level 𝛼 = 5 % is used in all cases
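The consensus values x∗ and s∗ listed in Tables 4 and 5 are robust estimates. As a rough illustration of how such estimates can be obtained, the sketch below implements the classic Algorithm A of ISO 13528 (start from the median and scaled median absolute deviation, then repeatedly winsorise at x∗ ± 1.5 s∗). This is an assumption made for illustration: the paper cites a modified version of the algorithm [19], which is not reproduced here, and the data, tolerance and function name are hypothetical.

```python
import statistics


def algorithm_a(values, tol=1e-6, max_iter=100):
    """Robust mean x* and standard deviation s* (classic ISO 13528 Algorithm A)."""
    # Initial estimates: median and scaled median absolute deviation
    x_star = statistics.median(values)
    s_star = 1.483 * statistics.median([abs(v - x_star) for v in values])
    for _ in range(max_iter):
        delta = 1.5 * s_star
        # Winsorise: pull extreme values in to x* - delta or x* + delta
        clipped = [min(max(v, x_star - delta), x_star + delta) for v in values]
        new_x = statistics.mean(clipped)
        new_s = 1.134 * statistics.stdev(clipped)
        converged = abs(new_x - x_star) < tol and abs(new_s - s_star) < tol
        x_star, s_star = new_x, new_s
        if converged:
            break
    return x_star, s_star


# Hypothetical results for one analyte, including one outlying laboratory
values = [5.6, 5.7, 5.8, 5.8, 5.9, 6.0, 9.5]
x_star, s_star = algorithm_a(values)
print(f"robust:  x* = {x_star:.2f}, s* = {s_star:.2f}")
print(f"classic: mean = {statistics.mean(values):.2f}, sd = {statistics.stdev(values):.2f}")
```

The outlying value barely moves the robust estimates, whereas it inflates the ordinary mean and standard deviation. This is why consensus values are usually computed robustly, and also why they can still drift from the reference values when many participants share the same bias.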


Table 5  P-values obtained from three tests of hypothesis (Lilliefors, one-mean t test and 𝜒² test, respectively) based on data reported for participants at one round of a proficiency testing carried out in 2018

Analyte               (xpt, x∗)       (𝜎pt, s∗)       Lilliefors  t test  𝜒² test
Uric acid             (9.36, 9.52)    (0.41, 0.73)    0.124       0.002   0.000
Albumin               (2.98, 3.03)    (0.15, 0.25)    0.017       0.033   0.000
Alat                  (125, 119)      (8.3, 14.0)     0.072       0.000   0.000
Asat                  (157, 150.7)    (10.67, 14.73)  0.229       0.000   0.000
Total bilirubin       (5.28, 5.25)    (0.37, 0.59)    0.073       0.684   0.000
Cholesterol           (262, 261.14)   (11, 14.26)     0.039       0.318   0.004
Creatinine            (3.85, 3.87)    (0.26, 0.30)    0.019       0.294   0.844
Glucose               (272, 268)      (13.3, 16.6)    0.209       0.007   0.114
Ureic nitrogen        (52.8, 51.8)    (2.63, 4.96)    0.176       0.020   0.000
Total proteins        (4.78, 4.8)     (0.32, 0.36)    0.332       0.656   0.912
Triglycerides         (240, 242.9)    (13, 14.6)      0.000       0.000   0.837
Calcium               (12.6, 12.54)   (0.43, 0.6)     0.002       0.350   0.005
CK                    (541, 527.3)    (32.3, 41.3)    0.623       0.003   0.126
Chlorine              (115, 115.6)    (3.3, 2.8)      0.018       0.094   0.009
Alkaline phosphatase  (524, 481)      (26, 48)        0.318       0.000   0.000
Phosphorous           (7.16, 7.20)    (0.35, 0.57)    0.258       0.543   0.003
Iron                  (199, 201)      (12, 14.5)      0.041       0.362   0.495
LDH                   (721, 699)      (36, 79)        0.199       0.045   0.000
Magnesium             (4.16, 4.0)     (0.16, 0.38)    0.011       0.014   0.000
Sodium                (156, 155.4)    (2.67, 263)     0.360       0.061   0.185
Potassium             (5.99, 6.02)    (0.16, 0.18)    0.031       0.190   0.935
HDL cholesterol       (97.7, 101.2)   (4.8, 10.1)     0.014       0.001   0.000
Amylase               (310, 312.5)    (15.3, 29)      0.335       0.411   0.000
GGT                   (139, 133)      (7, 17.8)       0.082       0.063   0.000
Direct bilirubin      (1.64, 1.59)    (0.12, 0.32)    0.597       0.279   0.000
Ionized Calcium       (1.09, 1.09)    (0.01, 0.1)     0.510       0.976   0.153

xpt and 𝜎pt correspond to the reference values. A significance level 𝛼 = 5 % is used in all cases. The tests are based on pathological data

true positives. In Table 2, we can see that 21 P-values from McNemar’s test are lower than 𝛼; that is, in 81 % of cases (analytes) we reject the hypothesis that the proportion of participants with unsatisfactory performance according to RV (P1) is equal to the proportion with unsatisfactory performance under CV (P2) (equivalently, we reject the hypothesis that 𝜋 is equal to T+). This is another clear indicator that the scheme based on CV might be, for several analytes, inappropriate for identifying true positives.

Tables 4 and 5 give (for each analyte) the P-values corresponding to three tests (Lilliefors, one-sample t and chi-square). Each of these is carried out with both normal (Table 4) and pathological values (Table 5) reported by participants in one round of a PT carried out in 2018. The percentages of P-values (Lilliefors test, t test and 𝜒² test, respectively) lower than 𝛼 = 0.05 are 42.3 %, 57.7 % and 69.2 % (normal values) and 38.5 %, 46.2 % and 65.4 % (pathological values). In 76.9 % (normal values) and 92.3 % (pathological values) of the cases, at least one of the hypotheses is rejected. From these results, we can conclude that in both cases (normal and pathological data), in a high percentage of the cases, at least one of the assumptions required to apply PT based on CV is not valid. This may be one of the reasons why CV results are not satisfactory (for some analytes). Total bilirubin, CK, iron, LDH and ionized calcium (normal data) and total proteins, sodium and ionized calcium (pathological data) are exceptions to the general pattern mentioned above (possibly due to the use in these cases of equipment with cutting-edge technology). In many cases, CV assumptions are violated because, among other reasons, the participants use different batches of calibrators, technologies, reagents and standardization protocols. There are also variations in preventive and corrective maintenance of equipment, and the procedures are highly influenced by the expertise of operational staff.

Conclusions

The analysis of a large dataset of PT in clinical chemistry, obtained by PROASECAL SAS in Colombia, based on reference values allows us to identify that laboratories have


in general suitable performances. The percentages of satisfactory performance are in some cases (several analytes) even greater than 90 %. However, a comparison of proficiency testing results obtained by consensus and reference values shows discrepancies between these approaches when the laboratories have unsatisfactory results. This is possibly because some statistical assumptions required to apply PT based on consensus values are violated. The statistical analysis carried out indicates that, for several analytes, there are differences between the mean values (x∗ and xpt) and uncertainty measures (s∗ and 𝜎pt) of these schemes. Also, in many cases, there is no statistical evidence of normality, which is a required assumption to apply a PT based on consensus. Many practical aspects not evaluated in this work, such as the training of analysts or the standardization of laboratory processes, could be generating these differences.

Acknowledgements  We would like to thank the PROASECAL SAS company for providing us with the dataset analyzed in the article.

References

1. ISO/IEC 17043 (2010) Conformity assessment-general requirements for proficiency testing. Standard, International Organization for Standardization, Geneva
2. Medeiros de Albano F, Schwengber ten Caten C (2014) Proficiency tests for laboratories: a systematic review. Accred Qual Assur 19:245–257
3. Szewczak E, Bondarzewski A (2016) Is the assessment of interlaboratory comparison results for a small number of tests and limited number of participants reliable and rational? Accred Qual Assur 21(2):91–100
4. Koch M, Baumeister F (2012) On the use of consensus means as assigned values. Accred Qual Assur 17(4):395–398
5. PROASECAL SAS (2019) https://www.proasecal.com/
6. Hund E, Massart DL, Smeyers-Verbeke J (2000) Inter-laboratory studies in analytical chemistry. Anal Chim Acta 423(2):145–165
7. Wong S (2005) Evaluation of the use of consensus values in proficiency testing programmes. Accred Qual Assur 10(8):409–414
8. Baldan A, van der Veen AM, Prauß D, Recknagel A, Boley N, Evans S, Woods D (2001) Economy of proficiency testing: reference versus consensus values. Accred Qual Assur 6(4–5):164–167
9. Wong S (2007) A comparison of performance statistics for proficiency testing programmes. Accred Qual Assur 12:59–66
10. ISO 13528 (2015) Statistical methods for use in proficiency testing by interlaboratory comparison. Standard, International Organization for Standardization, Geneva
11. Hollander M, Wolfe DA (1999) Nonparametric statistical methods. Wiley, London
12. Zar J (1999) Biostatistical analysis. Pearson Education India, Bengaluru
13. Lalkhen AG, McCluskey A (2008) Clinical tests: sensitivity and specificity. Contin Educ Anaesth Crit Care Pain 8(6):221–223
14. Reitsma J, Glas A, Rutjes A, Scholten R, Bossuyt P, Zwinderman A (2005) Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 58:982–990
15. Kim S, Lee W (2017) Does McNemar’s test compare the sensitivities and specificities of two diagnostic tests? Stat Methods Med Res 26(1):142–154
16. Wong S (2016) Review of the new edition of ISO 13528. Accred Qual Assur 21(4):249–254
17. Willink R (2005) Forming a comparison reference value from different distributions of belief. Metrologia 43(1):12
18. Razali N, Wah Y (2011) Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. J Stat Model Anal 2(1):21–33
19. Carobbi C (2017) A modified ISO 13528 robust analysis (algorithm A) that takes measurement uncertainty into account. Measurement 110:296–306

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
