Testing Procedures for Multilevel Models with Administrative Data
(Verifiche di Ipotesi per Multilevel con Dati Amministrativi)

Giorgio Vittadini, Department of Statistics, University of Milano-Bicocca, via Bicocca degli Arcimboldi 8, 20126 Milano, Italy, giorgio.vittadini@unimib.it
Maurizio Sanarico, Noustat s.r.l., m.sanarico@noustat.it
Paolo Berta, C.R.I.S.P., crisp.statistica@unimib.it

Abstract: Recent relative-effectiveness studies in the health sector have strongly criticized hierarchical rankings of hospitals. As an alternative, they propose a multi-faceted approach that evaluates the quality and characteristics of hospital services. In this direction, the use of administrative data has proven highly useful. These data are less precise than clinical data but are more effective in describing general situations. However, the sheer size of the population renders all parameters significant in linear-model tests. We must therefore use resampling schemes to verify hypotheses concerning the significance of the parameters in suitably drawn subsamples.

Keywords: Hospital Effectiveness, Administrative Data, Testing Procedures for Multilevel Models, Heterogeneous Samples, Bootstrap.

1 Hospital Effectiveness with Administrative Data

Several recent statistical papers deal with risk-adjusted comparisons based on mortality or morbidity outcomes, corrected by means of multilevel models in order to take into account the different case-mix of patients (Goldstein and Spiegelhalter (1996); Vittadini et al. (2003); Vittadini et al. (2004)). These papers, extremely accurate from the methodological point of view, are all based on small samples of patients with particular pathologies. Other medical papers propose risk-adjusted comparisons as a method for evaluating the quality and effectiveness of health structures (Iezzoni (1997)). Moreover, in some countries private or public external health agencies gather larger ad-hoc data sets and use linear and logistic models to validate quality indicators (AHRQ (2003); JCAHO (2004)). In other cases, they benchmark health structures by means of risk-adjusted comparisons (CIHI (2003); National Health Service (2004)).

Recently, this use of risk-adjusted comparisons for benchmarking health structures has been strongly criticized (Lilford et al. (2004)). In particular, it has been stated that: "The sensitivity of an institution's position in league tables to the method of risk adjustment used suggests that comparisons of outcomes are unlikely to tell us about the quality of care." It has therefore been suggested that "The agencies should facilitate the development and dissemination of a database for best practice and improvement based on the results for primary and secondary research." (Lilford et al. (2004)). In this direction, administrative data, used by payors to pay bills and manage operations, can be very useful. In fact: "These data are typically computerized, making it easy to collect and use large quantities of information. Administrative data have been used to examine geographic variation in utilization of surgical and medical procedures, monitor the use of health services, assess the effects of a policy change on health expenditures, evaluate the relationships between hospital death rates and hospital characteristics" (Damberg et al. (1998)).

Which problems do linear models and, in particular, multilevel models involve when they are used with administrative data? Besides problems connected with the accuracy of the data (e.g., coding accuracy, uncertainty in the timing of diagnoses) (Damberg et al. (1998)), there is a relevant methodological problem. It is known that with large data sets the significance tests associated with linear models reject the null hypothesis in all cases. The sample size influences the results and, beyond a certain threshold, it is the only determining factor of the test. Every explicative variable seems to be significant in explaining the outcomes, and this result is particularly misleading for the topics mentioned above (Vroman Battle and Rakow (1993)).

We therefore need appropriate testing procedures able to verify hypotheses regarding the significance of explicative variables in samples drawn from the population associated with administrative data. In general terms, we must devise inference methods for heterogeneous samples collected from large data sets (Duncan et al. (1998)). The conclusions obtained for the tests connected with the multilevel model used for the evaluation of healthcare institutions can be generalized to linear models in a more general context.
2 The Model

Let us consider a number of outcomes obtained from hospital discharge forms. These outcomes are binary variables, and the n observations have a hierarchical structure, which leads to a logistic multilevel model. Given the variable $Y_{ij} \sim \mathrm{Bin}(n, \pi_{ij})$, we fit the following model:

$$\mathrm{logit}(\pi_{ij}) = f(X, Z) = \gamma_{00} + \gamma_{10} X_{ij} + \gamma_{01} Z_j + \gamma_{11} X_{ij} Z_j + u_{1j} X_{ij} + u_{0j} + \varepsilon_{ij} \quad (1)$$

$$\pi_{ij} = \frac{1}{1 + \exp(-f(X, Z))} \quad (2)$$

Before estimating the model, we applied a process aimed at optimising the univariate relationship between the outcomes and the predictors. This process consisted of a discretization based on the automatic identification of the linear intervals in each relationship, assigning an indicator variable to each interval. In this way, non-linearities were captured and modelled, thereby enhancing the expressive power of the independent variables. Model building starts with a model including only the dummies; in this phase we applied a chunk-wise elimination. The second step consists in adding all the remaining predictors, and a backward elimination completes the model-building process. A positive side effect of the transformation process was the weakening of the evidence calling for the inclusion of interactions. A possible explanation of this fact is that most of the variability is retained by the main effects after transformation, and that non-linearity and second-order interactions compete for the same information.
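To make the structure of model (1)-(2) concrete, the following minimal Python sketch (ours, not part of the original paper) simulates a two-level logistic data set with a random intercept u_0j and a random slope u_1j per hospital. All dimensions, coefficient values and variable names are illustrative assumptions, and the patient-level term ε_ij of equation (1) is treated as absorbed into the Bernoulli sampling, as is usual for logistic multilevel models.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative dimensions: J hospitals (level 2), n_j patients per hospital (level 1).
    J, n_j = 160, 200

    # Hypothetical fixed effects (gamma) and random-effect standard deviations.
    g00, g10, g01, g11 = -2.0, 0.8, 0.3, -0.2
    sd_u0, sd_u1 = 0.5, 0.2

    Z = rng.binomial(1, 0.4, size=J).astype(float)  # level-2 covariate Z_j (e.g. private status)
    u0 = rng.normal(0.0, sd_u0, size=J)             # random intercepts u_0j
    u1 = rng.normal(0.0, sd_u1, size=J)             # random slopes u_1j
    X = rng.normal(size=(J, n_j))                   # level-1 covariate X_ij (e.g. case-mix score)

    # Equation (1): linear predictor f(X, Z) on the logit scale
    # (epsilon_ij is absorbed into the Bernoulli draw below).
    f = (g00 + g10 * X + g01 * Z[:, None] + g11 * X * Z[:, None]
         + u1[:, None] * X + u0[:, None])

    # Equation (2): pi_ij = 1 / (1 + exp(-f)).
    pi = 1.0 / (1.0 + np.exp(-f))

    Y = rng.binomial(1, pi)                         # binary outcome Y_ij (e.g. 30-day mortality)
    print("overall event rate:", Y.mean())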
3 Problems Involved in Tests with Large Data Sets

Because of the high number of degrees of freedom, all first-level effects turned out to be significant, whereas this is not true for the hospital-level effects. This result is inherent in the classical estimation procedure, designed for small to medium samples (at most a few thousand observations), and it is due to the indefinite narrowing of the standard errors as the sample size increases. The imbalance between the first- and second-level analyses has been overcome by using an empirical testing procedure that looks first at the size of the effects, then at their relative variability (gauged by confidence intervals), and finally at the statistical significance of the effects. This procedure adheres to the following logic: first evaluate the practical importance, then the extent of statistical variability associated with the effect (uncertainty), and only then use significance as a standard measure of the deviation from a null effect.

The next step in the present work tries to go beyond what is described above. Statistical inference based on very large samples (> 100,000 observations) containing many heterogeneous groups makes statistical testing irrelevant because of the excessive power. We think that the real interest lies in disentangling the complex data structure. In summary, "failing to reject the null hypothesis" is not the same as "accepting the null hypothesis" or as "rejecting the alternative hypothesis" (Vroman Battle and Rakow (1993)): because of the large size of the sample the null hypothesis is rejected, but this does not mean that the alternative hypothesis of significance is accepted.
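The following sketch (ours, with purely illustrative numbers) reproduces the phenomenon just described: for a fixed, practically negligible effect, the p-value of a standard test collapses toward zero as the sample grows, while moderately sized subsamples, in the spirit of the resampling schemes advocated here, give a more honest picture of the effect size. It assumes numpy and scipy are available; the subsample size m and the number of replications B are arbitrary choices.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    beta = 0.01  # a practically negligible effect on a noisy outcome

    def t_pvalue(n):
        """OLS slope p-value for y = beta*x + noise on a sample of size n."""
        x = rng.normal(size=n)
        y = beta * x + rng.normal(size=n)
        return stats.linregress(x, y).pvalue

    for n in (1_000, 100_000, 1_000_000):
        print(n, t_pvalue(n))
    # As n grows, the p-value tends to collapse toward 0 even though the
    # effect is negligible: significance becomes a function of sample size.

    # A resampling alternative in the spirit of the paper: estimate the effect
    # on many moderately sized subsamples and inspect the distribution of
    # estimates rather than a single p-value.
    x = rng.normal(size=1_000_000)
    y = beta * x + rng.normal(size=1_000_000)
    m, B = 2_000, 200  # subsample size and number of replications (illustrative)
    slopes = []
    for _ in range(B):
        idx = rng.choice(x.size, size=m, replace=False)
        slopes.append(stats.linregress(x[idx], y[idx]).slope)
    lo, hi = np.percentile(slopes, [2.5, 97.5])
    print(f"95% subsample interval for the slope: [{lo:.4f}, {hi:.4f}]")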
4 The New Proposal

We propose a scheme of analysis in which we first attempt to discover and represent the heterogeneity, and then model the detailed data incorporating the structure which has emerged, performing the inference with a number of competing approaches: conducting the analysis within each subsample and patching together the results, using and comparing a standard approach, a Bayesian approach, and resampling-based approaches. The standard approach has many drawbacks; in this paper it is used to provide a reference point. The Bayesian approach, by modelling the probabilities directly, seems to be immune to the problems discussed above (Albert and Chib (1997)). However, apart from being computationally very expensive, it is not clear how this approach performs in the presence of large samples. The structured resampling-based approach attempts to overcome the problems described by combining resampling-based techniques (for example, various versions of bootstrap or boosting) with a representation of the heterogeneity in the sample, which can be obtained either by a data-driven approach (useful for achieving a statistically representative analysis of the heterogeneity) or by a knowledge-driven approach (useful for testing hypotheses or exploring specific well-identified subsamples) (DiCiccio and Efron (1996); Efron (1996)).

Multilevel models represent the statistical relationships existing between a given dependent variable or response function and a set of predictors, taking into account the objects of different size to which such predictors are associated and the relationships (most of the time hierarchical) between these objects. This allows the different sources of variability in the data to be taken into account correctly. However, the multilevel paradigm is not able to capture all of the variability and heterogeneity in the data. For example, it is not able to explain the heterogeneous behaviour of a given agent (hospital), considered through the response function conditional on the set of predictors, relative to other similar agents. Agents that remain alike after the variability accounted for by the multilevel model has been removed are probably alike from a managerial point of view. Given this, we considered a complementary approach in which no predefined structure is superimposed on the observations. This methodology is called cluster-weighted modelling, or soft clustering. The idea is to combine the results in order to highlight cases that behave in a well-characterized way (belonging to a single domain of influence, or cluster) and cases whose response has characteristics partially shared by more than one cluster. In the remainder of this section we define the soft-clustering methodology in further detail.

First, the basic assumptions of the approach:

• The clusters do not interact, and they describe the data locally with respect to the maximum of the joint probability.
• There is no prior information. An arbitrary cost function E_ij is used to express the energy associated with z_i in the cluster C_j with centre μ_j^z.
• An iterative process with many clusters is used to achieve a satisfactory partitioning of the data space through a sequential fusion of the clusters.

The probability that point z_i belongs to cluster C_j is expressed by p_ij. The total average cost is therefore:

$$\langle E \rangle = \sum_{j=1}^{M} \sum_{i=1}^{N} p_{ij} E_{ij} \quad (3)$$

Equation (3) acts as a boundary condition on the data distribution. To find a stable distribution we follow the maximum entropy principle during each step of the iterative process. The entropy

$$H = -\sum_{j=1}^{M} \sum_{i=1}^{N} p_{ij} \log(p_{ij})$$

is maximized subject to the constraints

$$\sum_{i=1}^{N} p_{ij} = 1, \qquad \sum_{i=1}^{N} \sum_{j=1}^{M} p_{ij} E_{ij} = \langle E \rangle = \text{const} \quad (4)$$

The maximizing p_ij follow the Boltzmann distributions:

$$p_{ij} = \frac{\exp(-\beta E_{ij})}{Z_j}, \qquad Z_j = \sum_{i=1}^{N} \exp(-\beta E_{ij}) \quad (5)$$

where β is a Lagrange multiplier. Using a thermodynamic analogy, β ∝ 1/T, where T is the "temperature" of the system: with increasing β the system tends to be frozen and only the closer points influence each other; with decreasing β we have a more disordered system (observations have a higher degree of interaction). The assumption of independence between the clusters and between the p_ij of different clusters allows us to define the free energy F_j for the cluster C_j:

$$F_j = -\frac{1}{\beta} \log Z_j, \qquad \frac{\partial F_j}{\partial \mu_j^z} = 0 \quad \forall j \quad (6)$$

Considering the squared Euclidean distance

$$E_{ij} = \left| z_i - \mu_j^z \right|^2 = \left| y_i - \mu_j^y \right|^2 + \left| x_i - \mu_j^x \right|^2 \quad (7)$$

we obtain:

$$\mu_j^z = \frac{\sum_{i=1}^{N} z_i \exp\left(-\beta \left| z_i - \mu_j^z \right|^2\right)}{\sum_{i=1}^{N} \exp\left(-\beta \left| z_i - \mu_j^z \right|^2\right)} \quad (8)$$

Equation (8) cannot be solved analytically. A solution can be obtained through fixed-point iteration of the following formula:

$$\mu_j^z(n+1) = \frac{\sum_{i=1}^{N} z_i \exp\left(-\beta \left| z_i - \mu_j^z(n) \right|^2\right)}{\sum_{i=1}^{N} \exp\left(-\beta \left| z_i - \mu_j^z(n) \right|^2\right)} \quad (9)$$

which is iterated until a stable μ_j^z is obtained. The process converges to a local minimum with respect to the specified initial conditions and β, which reflects the number of clusters used to represent the data.
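A minimal numpy sketch of the fixed-point iteration (9) follows; it is our illustration under stated assumptions, not code from the paper. The number of clusters, the value of β and the synthetic "residuals" are arbitrary; the weights implement the Boltzmann factors of equation (5), and the centre update implements equation (9) exactly as written.

    import numpy as np

    def soft_cluster_centres(z, m, beta, n_iter=200, tol=1e-8, seed=0):
        """Fixed-point iteration of equation (9): each centre mu_j is a weighted
        mean of all points, with weights exp(-beta * |z_i - mu_j|^2).
        z: (N, d) data matrix; m: number of clusters; beta: inverse temperature."""
        rng = np.random.default_rng(seed)
        mu = z[rng.choice(len(z), size=m, replace=False)]  # initial centres
        for _ in range(n_iter):
            d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # E_ij, eq. (7)
            w = np.exp(-beta * d2)                                # Boltzmann weights
            new_mu = (w[:, :, None] * z[:, None, :]).sum(0) / w.sum(0)[:, None]  # eq. (9)
            if np.abs(new_mu - mu).max() < tol:
                break
            mu = new_mu
        # Membership probabilities p_ij, normalized within each cluster as in eq. (5).
        p = w / w.sum(0, keepdims=True)
        return mu, p

    # Illustrative use on synthetic 2-d "level-2 residuals" for 160 hypothetical hospitals.
    rng = np.random.default_rng(42)
    residuals = np.concatenate([rng.normal(-0.5, 0.10, (80, 2)),
                                rng.normal(0.6, 0.15, (80, 2))])
    centres, p = soft_cluster_centres(residuals, m=4, beta=20.0)
    print(centres)

Note that redundant centres may converge to the same location, which is one way the "sequential fusion of the clusters" mentioned above can arise in practice.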
A topic currently under investigation is the application of cluster-weighted modelling to all the data, taking into account the patient as well as the hospital level, thus constructing a multilevel cluster-weighted framework of analysis able to answer some very interesting questions without the need for ad-hoc procedures. Below we give a brief account of the theory. Let us consider the data vectors {y_n, x_n, z_n}, where y is the response function, x the level-1 variables and z the level-2 variables. We then infer the joint probability density p(y, x, z). This density is expanded over a sum of clusters C_k, each cluster containing an input distribution, a local model, and an output distribution:

$$p(y, x, z) = \sum_{k=1}^{K} p(y, x, z, c_k) = \sum_{k=1}^{K} p(y, x, z \mid c_k)\, p(c_k) = \sum_{k=1}^{K} p(y \mid x, z, c_k)\, p(x \mid z, c_k)\, p(z \mid c_k)\, p(c_k) \quad (10)$$

with the normalization condition $\sum_{k=1}^{K} p(c_k) = 1$. In the presence of both discrete and continuous predictors we must further partition them accordingly. The next point is to associate a specific density with each of the terms in the formula. Normally, the input distributions p(x | z, c_k) and p(z | c_k) are taken to be Gaussian with appropriate covariance matrices (for example, diagonal or structured). The output distribution p(y | x, z, c_k) depends on whether y is continuous-valued or discrete, and on the type of local model connecting x, z and y: f(x, z, b_k). In most cases a linear function is enough, since the composition of many local functions (as many as required by the data distribution) can represent complex nonlinear functions.

5 An Application

The study is based on the administrative data provided by the Lombardy Regional Health Care Directorate regarding 1,152,266 admissions to 160 hospitals. The data consist of regional population registry records, administrative hospital discharge records and the hospitals' structural characteristics. The response variables are: in-hospital and post-discharge mortality, discharges against medical advice, transfers to other hospitals, unscheduled hospital re-admissions, and unscheduled returns to the operating room. The patients' case mix and the hospitals' characteristics are also collected from the same sources. We use a logistic multilevel model to investigate the best and worst practices of hospitals in connection with their characteristics (e.g., size, private vs. public status, general vs. specialized). The testing procedures mentioned above are used to evaluate the significance of the parameters of the explicative variables in the context of large populations.

Multilevel models produce a variety of useful results, and in the context of health care effectiveness evaluation the level-2 residuals are particularly important. In the present case we have 160 residuals, one for each hospital, with a considerable level of heterogeneity, indicating either a possible difference in managerial effectiveness or some other source of variability. Is there further information, perhaps at a higher level not well defined at the sampling design stage, that can reasonably account for a significant portion of the level-2 residual variation? Is it possible to identify combinations of conditions associated with specific portions of the level-2 residual distribution? In order to answer these questions, a simple but effective strategy identifies a set of variables able to determine the aforementioned patterns. We then apply a Bonferroni-adjusted multiple comparison procedure to assess their statistical significance. Three criteria are used: the effect size, the adjusted p-value and the standard errors (confidence intervals) of the effects. In the analysis we also include second-order interaction effects. The procedure described is well accepted in the literature and yields interpretable results consistent with experience-backed beliefs.

At this point we pursue an integrated approach combining the multilevel and cluster-weighted models. We therefore present an outline of a practical application of the association between multilevel modelling and cluster-weighted clustering, namely applying the soft clustering to the level-2 residuals to obtain automatically the main patterns of variation in their joint distribution. As a further interesting result, we obtain the probability that each hospital belongs to each cluster. This allows us to discern well-characterized hospitals from less clearly defined ones. The case study concerns the soft-clustering analysis of the outcome "mortality rate within 30 days after discharge". The procedure detected 4 clusters. All the observations were well characterized by a single cluster, with one exception, as already mentioned.
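As an illustration of these membership probabilities, the short sketch below (reusing the hypothetical residuals and centres from the sketch at the end of Section 4) normalizes the Boltzmann weights across clusters for each hospital. Equation (5) normalizes within clusters instead, so this row-wise normalization is our assumption about how "the probability that each hospital belongs to each cluster" is read off the weights.

    import numpy as np

    def memberships(z, centres, beta):
        """Row-normalized Boltzmann weights: P(cluster | hospital)."""
        d2 = ((z[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        w = np.exp(-beta * d2)
        return w / w.sum(1, keepdims=True)  # each row sums to 1

    prob = memberships(residuals, centres, beta=20.0)  # objects from the earlier sketch
    # A hospital is "well characterized" when one probability dominates its row:
    print((prob.max(1) > 0.9).mean())  # share of clearly assigned hospitals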
The tables below show an analysis of variance on these clusters and their composition (values are reported as extracted from the output):

Parameter          Estimate        Standard Error   t Value   Pr > |t|
Intercept          -0.3277484344   0.20144526       -1.63     0.7347222
Cluster cluster0    0.0340194876   0.37985182        0.09     6.45
Cluster cluster1    0.5136519229   0.22580980        2.27     0.16875
Cluster cluster2   -0.1232972155   0.25478708       -0.48     4.36875
Cluster cluster3    0.0000000000   .                 .        .
DEAS                0.6099373653   0.17460280        3.49     0.0006
PRS                 0.4559875042   0.18319776        2.49     0.0965278
IRCCS              -0.4917529255   0.16183465       -3.04     0.0028
UNI                -0.1586361703   0.14348553       -1.11     1.8798611
PRIV               -0.0270973074   0.34221433       -0.08     6.5069444

Cluster    N Obs   Variable   Mean   Std Dev   Minimum   Median   Maximum
cluster0     55    PRIV       1.00   0.00      1.00      1.00     1.00
                   IRCCS      0.00   0.00      0.00      0.00     0.00
                   DEAS       0.16   0.37      0.00      0.00     1.00
                   PRS        0.15   0.36      0.00      0.00     1.00
                   RR         1.00   0.59      0.24      0.06     3.05
cluster1     24    PRIV       0.00   0.00      0.00      0.00     0.00
                   IRCCS      0.25   0.44      0.00      0.00     1.00
                   DEAS       0.00   0.00      0.00      0.00     0.00
                   PRS        0.00   0.00      0.00      0.00     0.00
                   RR         1.21   0.05      0.24      1.10     0.175
cluster2     49    PRIV       0.02   0.14      0.00      0.00     1.00
                   IRCCS      0.08   0.28      0.00      0.00     1.00
                   DEAS       1.00   0.00      1.00      1.00     1.00
                   PRS        0.00   0.00      0.00      0.00     0.00
                   RR         1.16   0.41      0.27      1.11     2.17
cluster3     32    PRIV       0.03   0.18      0.00      0.00     1.00
                   IRCCS      0.03   0.18      0.00      0.00     1.00
                   DEAS       0.00   0.00      0.00      0.00     0.00
                   PRS        1.00   0.00      1.00      1.00     1.00
                   RR         1.25   0.55      0.39      1.23     2.56

The risk within clusters presents significant heterogeneity. The presence of DEAS and emergency units remains significant in explaining the heterogeneity. Another interesting result is that most of the hospitals with a given profile have similar patterns of residual variation (hospital-specific effectiveness), whereas some deviate from this norm. This is a significant finding, allowing the proposal of innovations for the improvement of quality in specific health care facilities.

References

AHRQ (2003) Guide to Inpatient Quality Indicators, www.ahrq.gov/dat/hcup.
Albert J. and Chib S. (1997) Bayesian tests and model diagnostics in conditionally independent hierarchical models, Journal of the American Statistical Association, 92, 916–925.
CIHI (2003) Hospital Report 2002, Acute Care Technical Summary, secure.cihi.ca/cihiweb/splash.html.
Damberg C., Kerr E.A. and McGlynn E. (1998) Description of data sources and related issues, in: Health Information Systems: Design Issues and Analytical Applications, McGlynn E.A., Damberg C.L., Kerr E.A. and Brook R., eds., RAND Health Corporation, 43–76.
DiCiccio T.J. and Efron B. (1996) Bootstrap confidence intervals, Statistical Science, 11, 189–212.
Duncan C., Jones K. and Moon G. (1998) Context, composition and heterogeneity: using multilevel models in health research, Social Science and Medicine, 46, 97–117.
Efron B. (1996) Empirical Bayes methods for combining likelihoods, Journal of the American Statistical Association, 91, 538–565.
Goldstein H. and Spiegelhalter D.J. (1996) League tables and their limitations: statistical issues in comparisons of institutional performance, Journal of the Royal Statistical Society A, 159, 385–443.
Iezzoni L. (1997) The risk of risk adjustment, JAMA, 278 (19), 1600–1607.
JCAHO (2004) www.jcaho.com.
Lilford R., Mohammed M., Spiegelhalter D.J. and Thomson R. (2004) Use and misuse of process and outcome data in managing performance of acute medical care: avoiding institutional stigma, The Lancet, 364, 1147–1154.
National Health Service (2004) A Commentary on Star Ratings 2002–2003, www.chi.nhs.uk/ratings.
Vittadini G., Carabalona R. and Rossi C. (2004) Metodologie statistiche di valutazione dell'efficacia delle strutture sanitarie [Statistical methodologies for evaluating the effectiveness of health care structures], in: Qualità e Valutazione delle Strutture Sanitarie, Pagano M. and Vittadini G., eds., Etas, Milano, 269–282.
Vittadini G., Rossi C. and Sanarico M. (2003) Recenti sviluppi nella metodologia statistica per la valutazione dell'efficacia degli ospedali [Recent developments in statistical methodology for the evaluation of hospital effectiveness], Statistica, 1, 1–24.
Vroman Battle M. and Rakow E.A. (1993) Zen and the art of reporting differences in data that are not statistically significant, IEEE Transactions on Professional Communication, 39(2), 75–80.