Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Academia.eduAcademia.edu

Multiple Events Time Data: A Bayesian Recourse

2005, Handbook of Statistics

In many long-term clinical trials subjects often experience a number of events all of which might serve as important endpoints for medical studies. The analysis of such multiple events can be beneficial from both medical and statistical perspectives. We review and discuss some recent developments in the area of Bayesian methodologies for the analysis of multiple event times data, focusing more on semiparametric models based on the intensity functions. The nonparametric part of such models is modeled as a realization of a stochastic process; while the parametric part, which may include a regression parameter and a parameter quantifying the heterogeneity (i.e., frailty) of a population, is assumed to have a parametric distribution with possibly unknown hyperparameters. Numerical computations required to obtain posterior estimates of the parameters are generally based on a class of sampling based methods, broadly known as the Markov Chain Monte Carlo (MCMC) algorithms. The methodologies discussed in this chapter are motivated by and aimed at analyzing some common types of multiple event times data that are often observed in medical studies. Bayesian Exploratory Data Analysis (EDA) methods and diagnostics for model selection and model assessment are considered for each of these cases.

Multiple Events Time Data: A Bayesian Recourse Debajyoti Sinha Department of Biostatistics, Bioinformatics & Epidemiology, MUSC Sujit K. Ghosh Department of Statistics, NCSU Final revision: January 29, 2005 Abstract In many long-term clinical trials subjects often experience a number of events all of which might serve as important endpoints for medical studies. The analysis of such multiple events can be beneficial from both medical and statistical perspectives. We review and discuss some recent developments in the area of Bayesian methodologies for the analysis of multiple event times data, focusing more on semiparametric models based on the intensity functions. The nonparametric part of such models is modeled as a realization of a stochastic process; while the parametric part, which may include a regression parameter and a parameter quantifying the heterogeneity (i.e., frailty) of a population, is assumed to have a parametric distribution with possibly unknown hyperparameters. Numerical computations required to obtain posterior estimates of the parameters are generally based on a class of sampling based methods, broadly known as the Markov Chain Monte Carlo (MCMC) algorithms. The methodologies discussed in this chapter are motivated by and aimed at analyzing some common types of multiple event times data that are often observed in medical studies. Bayesian Exploratory Data Analysis (EDA) methods and diagnostics for model selection and model assessment are considered for each of these cases. Key Words and phrases: Censored Data; Markov Chain Monte Carlo; Mixture Models; Prior Processes; Semiparametric Models; Survival Analysis. 1 1 Introduction In medical studies, interest often centers on the relation between time to an event or ‘failure’ of interest (such as occurrence of a tumor) and the explanatory variables or covariates (such as treatments, design variables, etc.). During a given followup time, a subject may experience more than one events; which may consist of repeated occurrences (recurrent events) or events from different sources. In such cases, we may be interested in the process N (t), counting the number of these important but possibly non-fatal events up to time t. This counting process N (t) for each subject may serve as a measure of disease morbidity. There are several kinds of multiple event times data encountered by statisticians in medical studies. We discuss some major categories of multiple events data primarily based on informative and noninformative termination. Several practical examples are provided that motivated us to develop various statistical models. Several statistical methods are available to analyze multiple events data from both frequentist and Bayesian perspectives. Wei and Glidden (1997) and Hougaard (2000) have provided some recent overviews of the frequentist methods for multiple events data. In this review, we focus on the semiparametric Bayesian models primarily based on the so-called intensity functions which is the risk of a new event given the history of events in the past. In semiparametric Bayes methods, the nonparametric part of the model is assumed to be a realization of a stochastic process summarizing the available prior information about the unknown intensity function. The parametric part is assumed to have a prior distribution with possibly unknown hyperparameters. The semiparametric nature of the model allows considerable generality and applicability but enough structure for useful physical interpretation and understanding for particular applications in the medical research. The advantages of models based on the intensity functions are explained briefly by Wang et al. (2001). In Section 2, we provide several motivating examples to illustrate various types of multiple events data encountered in practice. These examples serve as a basis to consider complex multiple events models and indicate the challenges to the statisticians posed by typical modern day multiple event times data problems. In addition, such complex models though suitable for complex data structure are not easy to handle and usually requires computational tools sophisticated enough to deal with complex data structures. The semiparametric Bayesian methodologies, despite being comparatively new concepts, have enormous potential for success in 2 handling such complex models. Section 3 reviews some existing and newly developed semiparametric models to address various features of complex data structures and designs as illustrated by the examples in Section 2. In Section 4 we briefly review some of the recently developed frequentist statistical methods to analyze these type of complex data structures, along with their limitations in the scope of applications and related computational difficulties. Any Bayesian inference requires prior specification and the inference based on such Bayesian methods depends crucially on such prior specifications. In Section 5, we briefly discuss the commonly used prior processes to model the nonparametric part (e.g., the intensity function). Section 6 presents some existing and newly developed semiparametric Bayesian models suitable for the examples introduced in Section 2. In Section 7, we present a case study to illustrate the implementation of a Bayesian semiparametric model. Section 8 summarizes and concludes with a discussion about future research directions and potential new problems. 2 Practical Examples This section presents examples of the major categories of multiple event times data that are commonly encountered in practice. These examples are not intended to cover all possible types of such data encountered in medical studies, but at least these examples illustrate the three major classifications among these data problems: (1) multiple events data with single event type and noninformative termination; (2) multiple events data with multiple event types and noninformative termination; and (3) multiple events data with dependent termination. These examples also illustrate one important and challenging aspect of multiple events data, viz., termination and censoring of different kinds depending on the particular design and method of data collection used in medical studies. (1) An example of multiple event times data with noninformative censoring is given by Gail et al. (1980). The observations are the times to occurrence of mammary tumors in 48 rats injected with a carcinogen and subsequently randomized to receive either the treatment or the control. In addition, this study had regular follow-ups (only twice in each week) and hence for each rat only the number of new tumors between two consecutive follow up times was 3 recorded. These highly grouped data contain many ties among event times within the same rat and across different rats due to imprecise measurements (specifically, twice a week scheduled observations) of tumor times. All rats were terminated from observation (right censored) after the 182nd day. (2) Many methods for analyzing multiple events data are based on the assumption that any subject’s risk of being under observation at any time is independent of the history of recurrent events experienced by the subject. However, this assumption of independent censoring may not be valid when the termination is caused by informative dropouts or failure events. These terminating events may not be of direct interest, however, the informative censoring induced by such terminating events, if not accounted for, may result into biased estimates. An example of such data is the cohort study of AIDS Link to Intravenous Experiences (ALIVE) discussed by Wang et al. (2001), where the history of recurrent hospitalizations for each patient is recorded. The interest is to study the effects of race and drug user status on the rate of hospitalization, however, for each patient, the time of dropout from the study is believed to be associated with the history of hospitalization. These type of data structures are not limited to medical studies. For instance in industrial settings, for a reliability study of repairable systems, each system may experience recurrent repairable failures until it is censored from observation due to some reason related to its failure pattern. In product testing, the recurrent adverse events may be detection of repairable faults and the censoring may be the final release of the product. (3) Finally, we consider a case where the recurrent non-fatal events, i.e., the informative termination is also an event of interest. For example, in some bladder-cancer trials, after the surgical removal of a patients’ primary tumors, each cancer patient is monitored through regularly scheduled clinic visits for occurrence of recurrent but benign superficial bladder tumors till their termination from observation due to cure from the disease. For such a study, the recurrent adverse events are non-fatal, but their intensity may signify the underlying quality-of-life experience and disease state of the patient, whereas the termination due to cure is a positive outcome. Such a study often includes regular follow-ups, say, monthly, and for each patient, the number of new recurrent events between each pair of consecutive clinic visits is recorded. 4 Bladder cancer data from the VACURG (Sun and Wei, 2000) is a good example of such a multiple-events data with dependent termination. The data set is available at the web-site http://www.blackwellpulishers.co.uk/rss. The time-independent covariates available for each patient consist of a binary variable of treatment (e.g., 1 for treatment and 0 for placebo), the number of tumors at the entry and the size of the largest tumor at the entry. One of our main focuses is to investigate the effects of treatment on the risk of superficial tumor as well as on the risk of discharge from monitoring. In addition, we are also interested in the possible association between the history of nonfatal tumor occurrences and the risk of termination from regular monitoring of tumors. In this review article, we also consider methodologies applicable for timevarying covariates, but, unless mentioned otherwise, the covariates below are assumed to be constant over time within the follow up time. After developing different competing methodologies for time-independent covariates, we extend the methodologies to models with time-dependent covariate. We also explore methodologies that deal with interval censoring and panel count data based on regularly scheduled follow-up times. 3 Semiparametric Models Based on Intensity Functions One of the early scholarly reviews of several models and methodologies proposed for multiple event time data in the context of frequentist inference is available in Oakes (1992). Most of the early multiple events models, as described and proposed in Andersen and Gill (1982), often make the unrealistic assumption that the risk of an event for a given subject is unaffected by all of the earlier events that a given subject has experienced. For a more recent comprehensive review of frequentist models for multiple events, see Hougaard (2001). Some of these models cannot assess heterogeneity between individuals nor can they handle ties and grouping of event times. We review some fully specified stochastic models which avoid such limitations. A fully specified semiparametric model of multiple event times data, the socalled proportional intensity model, is presented in Sinha (1993). In this model, 5 number of recurrent events experienced by subject i, within the time interval (0, t], denoted by Ni (t) is modeled as a mixed non-homogeneous Poisson process (Ross, 1983) with the conditional intensity function, lim δ↓0 Pr[Ni (t, t + δ) = 1|wi , xi ] = bi (t|xi , wi ) = b0 (t) exp(βxi )wi , δ (1) given the unobserved subject-specific random ‘frailty’ wi and the observed timeindependent covariate xi , where Ni (t, t + δ) = Ni (t + δ) − Ni (t). The frailty random variables wi ’s are assumed to be independent and identically distributed (i.i.d.) as Gamma variates in Sinha (1993). However, any random variable with unit mean and support on [0, ∞) can be used to model the frailty wi . Alternatively, we can characterize the frailty distribution using the moment generating function (mgf) of the wi , Ψ(t) = E[exp(tW )], assuming that the mgf exists. To assure identifiability, we assume that E[W ] = k Ψ(1) (0) = 1, where Ψ(k) (t) = dtd k Ψ(t) denotes the k-th derivative of the mgf Ψ(t). One advantage of characterizing the frailty distribution by its mgf is that it helps to express the conditional risk of a new event given history of events N (t) = {N (s) : s < t} at any time t as, Ψ(N (t)+1) (−B(t)) 1 , lim Pr[N (t, t + δ) = 1|N (t)] = h(t|N (t)) = δ↓0 δ Ψ(N (t)) (−B(t)) (2) where N (t, t+δ) is the number of recurrent events within the time interval (t, t+δ], Rt N (t) = N (0, t) and B(t) = 0 b(u)du is the cumulative intensity of the nonhomogeneous Poisson process. In (2), we have suppressed the role of covariate x in the expression to simplify the notations. It is to be noted that in absence of covariate Rt x we have B(t) = 0 b0 (u)du, otherwise we have b(u) = b0 (u) exp(βx). This result expresses the Markovian structure of the recurrent events process in the sense that the risk of future event at a given time depends on the past history N (t) only through N (t), current count of the past events. We denote the unknown variance of W by κ, quantifying the dependencies among inter-event times within each subject and the heterogeneity of the intensity processes in different subjects. Under the Gamma frailty model of Sinha (1993), Ψ(t) = (1 − tκ)−1/κ and it easily follows that, h(t|N (t), x) = {1 + κ−1 N (t)}h1 (t|x), 6 (3) where h1 (t|x) = {1 + B(t|x)}−1 is the hazard for the first event. One can clearly see how the variance, κ of frailty establishes the relationship between the risk of a future event and the number of past events. Motivated by the result in (3), Oakes (1992) proposed a marginal model for multiple events data with conditional hazard, h(t|N (t), x) = {1 + κ−1 N (t)}h0 (t) exp(βx) . It is clear that in the presence of any covariate x, this model is not equivalent to the mixed Poisson process model of Sinha (1993) because h1 (t|x) is not in a proportional hazards structure when the conditional intensity is given by (1). The model in (1) is useful when the focus is on the influence of the covariates (or treatments) and on the heterogeneity among subjects related to the rate of occurrence of events in each subject. The model also assumes that the heterogeneity among subjects remains constant over time, and hence it works best when each subject experiences several events, but a moderate number of subjects are observed in a relatively short span of time. The M -site model of Gail et al. (1980) assumes that each subject has fixed number, M , of potential i.i.d. events at risk. Assuming that the underlying hazard for each potential event at risk is h0 (t) exp(βx), the conditional hazard for a new event at time t, given the history N (t), is given by, lim δ↓0 P [N (t, t + δ) = 1|N (t)] = h(t|N (t), x) = {M − N (t)}h0 (t) exp(βx). δ (4) This model may not be applicable for most of the clinical studies involving multiple events as we can see from the above expression, that the marginal risk of the future event to any subject decreases with the number of previous events N (t). However, it may be an ideal feature for some other situations, for example, when we are dealing with the number of crashes to a software with fixed but unknown M number of undetected bugs at the onset of testing. An extension of M-site model is when M is assumed to vary among subjects with probability distribution P [M = m|φ] = g(m|φ) for m = 0, 1, 2, · · · , where φ is the parameter of this discrete parametric distribution. If we look at only the time to first event T1 under this model, we can show that S1 (t) = P [T1 > t] = ΨM [log S(t)] where ΨM (t) is the mgf of M and S(t) is the common survival function corresponding to each one of M i.i.d. latent events at risk. Note that when g(0|φ) > 0 there is a probability that a subject may not be at risk of any recurrent event at any time point. When M follows a Poisson distribution, this S1 (t) is same 7 as the cure-rate model of Chen et al. (2001). To avoid this positive probability of a subject being not at risk of any event, we consider only the distributions with gM (0) = 0 (for example, a Poisson distribution truncated at 0). For this extended M-site model, the conditional hazard for a new event at time t given the history N (t), can be obtained as, h(t|N (t), x) = h(t|x) E[M × · · · × (M − N ){S(t)}M ]1[M >N ] , E[M × · · · × (M − N + 1){S(t)}M ]1[M ≥N ] (5) where h(t|x) = h0 (t) exp(βx) and 1A is the indicator function. We can further simplify this as, h(t|N (t), x) = h(t|x) E[M − N |M ≥ N, N (t) = N ] , (6) where the expectation in (6) is taken with respect to the conditional probability distribution P [M − N = m|M ≥ N, N (t) = N ] = g ∗ (m|N ) given by  g(N + k){S(t)}N +k N k+k ∗ g (k|N ) = P∞ (7) ′ . ′ ){S(t)}N +k′ N +k g(N + k ′ ′ k =0 k The expression in (6) shows that the relationship between risk of a future event and events history is complicated for this extended M-site model and it depends to a great extend on the distribution gM (·) of M . Markov and semi-Markov models of Lagakos, Sommer and Zelen (1978) often have many parameters and can be hard to interpret. Another class of models ‘modulated renewal processes’ (Cox, 1972b) - has not been widely used, perhaps because of the difficulty in estimating the asymptotic covariance matrix of the parameter estimates (Oakes and Cui, 1994). Besides the assumption of (1), the methodology described in Sinha (1993) further assumes that the hazard of the termination time Ti at time t lim δ↓0 P [Ti ∈ (t, t + δ)|Ni (t), xi , Ti > t] = hT (t|Ni (t), xi ) δ (8) is independent of the history of point process Ni (t) = {Ni (s) : s < t}. This underlying assumption, not often precisely mentioned, can be expressed as hT (t|Ni (t), xi , Ti > t) = hT (t|Ti > t, xi ) . (9) For various practical situations, when the termination Ti is induced by a terminal or fatal event such as death, this assumption in (9) may not be valid at all. Because, 8 in such a case, the pattern of recurrent events in Ni (·) may indicate the progression of the morbidity of the subject. This is the situation when there is an informative termination (e.g., Wang et al. , 2001). A generalization of (1) in Sinha and Maiti (2004) assumes that the hazard of Ti at time t depends on the the previous non-fatal events via the frailty wi . But, given the random effect wi , the hazard of termination at time t does not depend on the history (Ni (t)) of recurrent events. For the case with only a discrete distribution for the termination time, the discretized conditional hazard is given by, exp(β2 xi )wiα P [Ti = aj |xi , wi , Ni (aj ), Ti > aj−1 ] = 1 − φj , (10) where φj = P [T = aj |x = 0, w = 1, N (aj ), T > aj−1 ] is the discretized baseline conditional probability of not termination at time point aj for a patient at risk of termination at that time point. This is assuming that the termination can occur only at one of the scheduled inspection times a1 < · · · < ag . Also, β2 is the regression parameter of the covariate effect on the conditional hazard of termination. We are using a discretized version of the proportional hazards (Cox, 1972a) structure in (10). Also, α ∈ R quantifies the nature of dependence between Ni (t) and the hazard of Ti . For α = 0, we have P (Ti = aj |xi , Ti > aj−1 , Ni (t)) free of Ni (t) and we are in the situation similar to (9) when the risk of termination is independent of the events history. For α > 0, we have positive association between the Ni and Ti and hence, given the covariate, subjects with higher than usual rates of non-fatal events are more likely to be terminated earlier than others. Similarly, α < 0 indicates a negative relationship between termination time and rate of non-fatal events. The special case α = 0 further indicates that the heterogeneity among subjects in terms of rates of non-fatal events is same as the heterogeneity in terms of their termination times. Hence, α > 1 implies more variability in their termination times compared to the variability in their rates of non-fatal events. Similarly, 0 < α < 1 implies more variability in the later. For the special case of α = 1 and R aj φj = exp[− aj−1 b0 (u)du], we get the model of Lancaster and Intrator (1998). Their model has the restrictive assumptions that the hazard function of termination is proportional to the conditional baseline intensity B0 (t) and more importantly, the variability of the subjects in terms of their proneness to non-fatal events is same as their variability in termination times. In Sinha and Maiti (2004), independent Ga(η, η) density is used for the frailty wi . 9 4 Frequentist Methods for Analyzing Multiple Event Data Statistical methodologies for multiple events data can be classified into two groups: conditional and marginal approaches. Conditional models and approaches are used mainly in semiparametric Bayesian and likelihood based methods of analysis. Marginal methods are more popular with generalized estimating equations (GEE) methodologies. This section discusses the present status of the frequentist methods to analyze the examples in Section 2 using the above mentioned models. We focus on the main difficulties faced by the frequentist methods and later describe how these difficulties are overcome by Bayesian analyses. First, we discuss existing frequentist methods for analysis of multiple events data under noninformative or independent terminating event. Oakes (1992) gives a comprehensive review of the several frequentist methods for analyzing multiple event time data and discusses limitations of the methodologies of Andersen and Gill (1982), Lagakos et al. (1978) and the modulated renewal process (Cox, 1972b). His proposed marginal model for multiple event time data applies only to non-grouped data. Thall’s (1988) Poisson regression model for analyzing grouped multiple event-time data does not use the hierarchical structure for the random frailty and he estimates the frailty for each subject instead of a single hyperparameter quantifying the overall variability of the random frailties. Also, Thall and Vail’s (1990) quasi-likelihood method to analyze grouped data cannot be used to estimate efficiently the parameter of the frailty distribution and the standard error of the treatment effect. One common problem present in all these methods for multiple event time data is the lack of a suitable method to verify the modeling assumptions. Most recent review of the conventional frequentist methods for analysis of multiple data can be found in Hougaard (2001). These methods may allow a time-dependent covariate, but usually assume the covariate effect is constant over time. However, there is no specific frequentist methodology to deal with interval-censored data when the scheduled inspection times are not same for each subject. The frequentist methodologies to verify the modeling assumptions, such as the assumption of proportional hazards, constancy of the covariate effect over time, etc., are also scarce for multiple events data. For many currently popular statistical procedures for multiple events data, the final estimates are often highly dependent on starting values, and in practice, it 10 is often difficult to agree on a starting value. In the presence of many nuisance parameters, the asymptotic properties of the maximum likelihood estimate (MLE) and other frequentist estimate of the parameter of interest, say, the regression parameter β, await more extensive results. The small sample properties of these frequentist estimates have yet to be determined. When the risk of termination depends on the history of recurrent events, existing methods in literature are not so common. When the terminating event is of no interest and it is treated only as a sort of informative censoring of recurrent event times, there are some existing frequentist methods to deal with such cases, most notably of them is Wang et al. (2001). However, these approaches are not based of fully specified models. Depending on their goals of analysis, these procedures can be categorized in three groups—(1) based on the inference about the rate function, which is P [N (t, t + δ) = 1|x]; (2) based on the mean function E[N (t)|x]; (3) based on the intensity function P [N (t, t + δ) = 1|x, N (t)]. However, these methods do not precisely mention the model for hT (t|N (t), x). One can not automatically estimate this hazard function using these approaches. It is difficult to know the loss in efficiency for not specifying the model for termination and it is also not possible to check the sensitivity of the assumptions about termination, though the expectation is that the approach used is somewhat ‘robust’ as it is free of any precise modeling assumptions on termination. When we have multiple events with dependent important terminating events (e.g., Sinha and Maiti, 2004) and we are interested about both types of events, there is even bigger need to use a fully specified stochastic model for hT (t|N (t), x). One possible strategy is to focus on either the mean function or the rate function of 1[T >t] N (t) (called the termination modulated recurrent events process, e.g., Ghosh and Lin, 2004). However, this can be misleading if we are interested about understanding the effects of covariates on recurrent events as well as on the termination events. We instead will focus on fully specified stochastic models which can be used for complete understanding of the covariate-effects, prediction, model selection and for understanding the process generating the data. 5 Prior Processes in Semiparametric Model The most popular way to model a prior process on the baseline cumulative intensity function (as well as for the baseline cumulative hazard function) is to use the Levy 11 process introduced by Kalbfleisch (1978), Ferguson and Phadia (1979)and Burridge (1981). They advocated the use of a process with positive independent increments in disjoint intervals (Levy process) to model the prior process of the cumulative Rt hazard and intensity function B0 (t) = 0 b0 (u)du. Following Sinha (1993) we can assume that B0 (t) is the realization of a positive non-decreasing independent increment process (Levy process). That is, the increments of B0 (t) in disjoint intervals are positive and mutually independent. Similarly, the cumulative hazard Rt h (u)du = H0 (t) can be modeled by a Levy process when the cdf is absolutely 0 0 continuous. 5.1 Gamma Processes The most commonly used Levy process for modeling B0 (t) is the Gamma process (Sinha, 1993), denoted by B0 (t) ∼ GP (|Lambda∗ , c), when B0 (t2 ) − B0 (t1 ) has a ∗ ∗ (t ) 1 for t2 > t1 gamma distribution with mean Λ∗ (t2 )−Λ∗ (t1 ) and variance Λ (t2 )−Λ c ∗ and c > 0. Here the non-decreasing positive function, Λ , is the mean function of the random prior process and the hyper-parameter c can be interpreted as a measure of confidence attached to the initial guess Λ∗ (·) of B0 (·), the expected number of events by time t at baseline covariate value and in absence of any frailty effect. It is assumed that Λ∗ is completely specified except possibly for a few unknown parameters. For example, one can consider a Gamma process model with Λ∗ (t) = λt for some hyperparameter λ, possibly unknown. For ordinary univariate survival data, Burridge (1981) explains how a Gamma process model may give wrong inferences if applied to continuous survival data. Similar caution should be used for application of the Gamma process while dealing with continuously observed multiple events data with no ties. Two unsatisfactory features of a Gamma process are: (1) independent increments in adjacent intervals, and (2) realization of the sample path is discrete with probability one. However, this process has found a vast number of applications in literature for modeling different types of multiple events data. 5.2 Correlated Prior Processes This model was introduced by Leonard (1978) for modeling density functions with prior information on smoothness. A discretized approximation of this process assumes a piecewise constant function b0 (t). Over a set of prespecified grouping 12 intervals Ij = (aj−1 , aj ] for j = 1, 2, · · · , g with 0 = t0 < a1 < a2 < · · · < ag−1 < ag , the baseline intensity function is given as b0 (t) = λj for t ∈ Ij . We introduce some smoothness for the values of the λj ’s in adjacent intervals using a first order random walk process for αj = logλj with αj+1 = αj + ej+1 for j = 1, · · · , g − 1 . (11) The ej ’s are i.i.d. normal random variables with 0 mean and variance σ 2 . Higher values of σ imply less prior information on the discretized linear spline smoothing of the λj ’s. The prior information can even be incomplete, because without the specification of the marginal prior of α1 , the joint prior distribution of the α’s remains undefined. Original model of Gamerman (1991), introduced in the context of modeling hazard of an univariate survival time, used the so-called linear Bayesian method which requires only a partial specification (mean 0 and finite variance) of the distribution of ej+1 in (11). It is also important to note that, Gray (1994) again in the context of hazard function, used the prior process structure similar to (11) on λj ’s itself and used a ‘flat’ prior for λ1 . See also Leonard (1978) for a more formal justification for using this kind of incomplete prior information and prior process in theoretical nonparametric Bayesian inference. Various other types of correlated prior processes involving linear and higher order discretized spline smoothing of the αj ’s can be used to summarize different types of prior information about the underlying baseline function. One interesting class of prior process is where a local linear trend is modeled as αj+1 = αj + δj + ej and δj+1 = δj + e′j (12) for j = 1, · · · , g, where ej ∼ N (0, σ12 ), e′j ∼ N (0, σ22 ), and ej ’s and e′j ’s are mutually independent. The model in (12) will reduce to a smoothed linear trend model by taking σ2 = 0. This process allows the intensity to be somewhat linear and smooth without forcing the form to be completely parametric. The completely parametric form with σ1 = σ2 = 0 will be insensitive to any surprising trend of the baseline intensity function suggested by the data. Despite some of their nice physical properties, correlated prior processes have so far enjoyed only limited applications for modeling multiple events data. The correlated prior process is also related to the so-called martingale process introduced by Arjas and Gasbarra (1994), where b0 (t) is modeled as a piecewise constant function with jumps. The jump times are realizations of a time homogeneous Poisson process, and the sequence of b0 (t) values in different jump intervals 13 form a discrete martingale. This martingale process is very similar to the model of (11) if we relax the condition of the jump times being random, and instead assume that the hazard rates change in each grouping interval. The condition of the hazard rates being martingale can be relaxed also to get a sub-martingale process similar to that in (12). A slight modification of the prior process in (11), with αj+1 − α0j = γ(αj − α0j ) + ej+1 , where α0j is the prior mean of αj and γ is another hyperparameter of deviation, is a discretized integrated Ornstein-Uhlenbeck (DIOU) process discussed by Cumberland and Sykes (1982). However, to the best of our knowledge in the current literature, these martingale processes have not been used in multiple events data. 6 Bayesian solution As an illustration of the implementation and applicability of the Bayesian solution for the analysis of multiple events data, we concentrate on the analysis of the bladder-cancer data (Wei and Sun, 2001). Bladder cancer data from the VACURG is a good example of multiple events data with dependent termination process. After their entry at the trial, the patients in this study were monitored every month during their clinic visit for number of new (since the previous clinic visit) superficial bladder tumors till their termination time due to significant improvement in health status. The time-independent covariates used for each patient are binary variable of treatment (1 for treatment and 0 for placebo), number of tumors at the entry and the size of the largest tumor at the entry. To model the recurrent events process N (t), we use the mixed Poisson process model of Sinha (1993) with conditional intensity function of (1). We use the Gamma frailty for wi . For the discrete termination time T , we use the model (10). The baseline intensity function b0 (t) is assumed to have the Gamma process prior. Because the observed recurrent events history is based only on observations recorded at regular time intervals (based on regularly scheduled clinic visits), the likelihood based on the sampling distributions of the observed history of recurrent bladder-tumors given the frailty variables w = (w1 , · · · , wn ) and termination times is given by L1 (β1 , µ, w|D) ∝ n Y Y [exi β1 wi µj ]Nij exp[−exi β1 wi µj ] , i=1 j∈Mi 14 (13) where Mi is the set of inspection intervals when the subject i is under observation and Nij = Ni (aj ) − Ni (aj−1 ) is the number of new bladder-tumors to subject i observed during the interval between aj−1 and aj . It is obvious that the inspection intervals in Mi are all below the observed termination time Ti . For this case, the baseline cumulative intensity B0 (t) enters the likelihood L1 via the increments in these pre-specified time points, say, 0 = a0 < a1 < · · · < ag . One consequence of this fact is that the property of the Gamma process to have discrete sample path does not turn out to be any worrying factor in this application. When B0 (t) is a priori GP (Λ∗ , c), the priors of µj = B0 (aj ) − B0 (aj−1 ) for j = 1, · · · , g are independent Gamma densities with mean γj = Λ∗ (aj ) − Λ∗ (aj−1 ) and variance γj /c. Instead of the Gamma priors, one can also use alternative prior processes such as the correlated prior processes defined via the joint prior distribution of (µ1 , · · · , µg ). Similarly, the likelihood L2 (β2 , π, w|D) based on the sampling distributions of the Ti is given as L2 (β2 , φ, w|D) ∝ n Y wα exp(β2 xi ) exp(β x )wα (1 − φ(i)i )S(i) 2 i i , (14) i=1 where ti = a(i) is the discrete termination time of the subject i. The joint posterior for this model is given by p(β, w, η, µ, φ, α|D) ∝ L1 (β1 , µ, w|D) × L2 (β2 , φ, w|D) × Y f (w|η) × π(β, φ, η, α) × [π(µj |γj , c)], (15) j Q where f (w|η) = i [η η wiη−1 exp(−wi η)/Γ(η)], the product of Gamma densities of the w1 , · · · , wn given η = κ−1 , and π(β, φ, η, α) is obtained via the joint prior distribution for φ = (φ1 , · · · , φg ) and specification of the conditional priors of other parameters given φ. Note that this model here is identical to the one used by Sinha and Maiti (2004). We further refer this article for details about how to specify the priors and about the method of eliciting such priors. The details about numerical integration of (15) to obtain the marginal posterior using MCMC tools are also given in Sinha and Maiti (2004). At each iterative stage of the MCMC, we only need to sample from the conditional posteriors. For further convenience we use N (β10 , σ1 ) for π(β1 ), N (β20 , σ2 ) for π(β2 ), and independent Beta(φj |Aj , Bj ) densities for π(φj ). The π(α) can be assumed as a N (α0 , σ3 ) and we need to take α0 = 0 as we are not sure whether the association between the two classes of events 15 should be either positive or negative. We have also assumed that these parameters are apriori independent of each other. To use the MCMC method, one prefers the conditional posteriors that are quite easy to sample from. These forms of the prior are flexible, yet they facilitate easy to sample conditional posteriors. The expressions and the derivations of the conditional posteriors of the parameters used in the MCMC are given in Sinha and Maiti (2004). Except µj all other full conditional distributions are not in standard form. The full conditional distributions for φ and α are log-concave and an adaptive rejection sampling method by Gilks and Wild (1992) was used to sample from these log-concave densities. Metropolis-Hastings algorithm was used to sample wi ’s. 7 Analysis of the data-example Sinha and Maiti (2004) illustrates the kind of approximate Bayesian method with ten parallel Gibbs chain and 20,000 iterations in each chain. The previous analyses of this data ignored the possible association between the history of nonfatal tumor occurrences and the risk of termination. We would like to emphasize that to deal with missing observations in some of the months, Sinha and Maiti (2004) had to make some appropriate changes in our likelihood based on the assumption of noninformative missingness. Priors with zero mean and high variance (1000) for the regression parameters β1 and β2 have been used to reflect prior opinions of no covariate effects on the nonfatal events intensity as well as on the risk of termination. For the gamma priors of λ, ν and η, selecting small scales and shapes make the prior distributions very diffuse. One needs to evaluate the sensitivity of the posterior analysis over these choices by varying the values of the hyperparameters over a range of possible values. In this analysis, another crucial elicitation step is to choose the hyperparameters associated with the gamma process, we set, τ = 1/c and γ1 , · · · , γg , where γj = γ0 (aj ) − γ0 (aj−1 ). One possible rational behind eliciting these hyperparameters is given in the original paper. Table 1, reproduced from Sinha and Maiti (2004) presents the results of the data analysis. There is not much evidence for the treatment effect on time of termination, because the credible interval for β2 contains zero. But, there is good evidence for a treatment effect on the rate of nonfatal events. The posterior mean of α lies below zero indicating some evidence of negative association between nonfatal 16 events intensity and the termination time. Negative value of α implies that patients who are more prone to nonfatal events are likely to be terminated later. But, the evidence for negative α is not strong at the 95% posterior interval for α does contain zero. There is also strong evidence that the baseline hazard rate for the termination is increasing over time. Please note that one of the major advantages of a Bayesian analysis is that one can compute the prediction distributions of the future nonfatal events and termination time given the values of the covariate. Table 1. Summary of Posterior Analysis Parameter Mean S.E. 95% Credible interval β1 β2 α 1.491 -.0227 -.0137 .577 .249 .0295 (.508, 2.77) (-.520, .468) (-.081, .037) Model validation diagnostics are very important in any data analysis. Many model diagnostics have been developed in the literature. For detailed reviews of Bayesian model validation methods in multiple events data analysis, we may refer to Sinha and Dey (1997), Ibrahim, Chen and Sinha (2001) and Sinha and Maiti (2004). Most popular is a method for Bayesian model validation via conditional predictive ordinate (CPO) (see, Gelfand, Dey and Chang, 1996). For the subject i, the cross-validated posterior predictive probability evaluated at the observed data point is given by CP O( i) = P (Ni = Ni,obs , Ti = ti |D(−i) ) = {E[P (Ni = Ni,obs , Ti = ti |θ)−1 |D]}−1 , (16) where D(−i) is the observed data with the patient i removed, θ = (µ, β1 , β2 , η, α, λ, ν) and Ni = (Ni1 , · · · , NiTi ). The second expression of the CPO in (16) follows from a result of Gelfand, Dey and Chang (1996). This expression actually enables us to use the samples from the posterior p(θ|D) to compute the CPOs. Sinha and Maiti (2004) again illustrates the algorithm we used for calculating CPO’s for the Bladder-cancer data example. CPOs are treated just as the same way residuals are treated in frequentist survival analysis and CPOs are plotted against either their covariate values or to the termination times to check whether the CPO values have any relationship with the covariates and the termination times. 17 8 Discussions and Future Research We presented several Bayesian approaches to model the non-parametric part of a survival model based on intensity functions. There is no clear choice for the best model among these different models. The selection of the model depends mostly on the goal of the study, type of the data set and the amount of the computations required. Some of the merits and demerits of different processes are discussed below. Two unsatisfactory features of a Gamma process are (i) independent increments, and (ii) the realizations are discrete with probability one, so that the resulting marginal distributions imply a positive probability of tied failure times. As pointed out by Burridge (1981), this disadvantage mentioned in (ii) above and some other deficiencies of the Levy process model disappear when applied to grouped data. The specification of any one of the Levy processes and random mixture process requires the knowledge of the mean function of the prior process and the prior uncertainty around the mean. The correlated prior process does not necessarily require the direct knowledge of the mean of the prior process nor any confidence around the mean. But, the implementations of the Bayesian methodologies using the correlated prior processes are often computationally more intensive due to the apparent loss of the conjugacy of the conditional posteriors of the process parameters in most of the practical situations that we usually come across. But, our personal experience suggests that in general the methodologies based on correlated process are less computationally intensive compared to the methodologies based on random mixture processes. Random mixture processes have the advantages that the posterior estimate of the hazard (or intensity) function is continuous and one can compute the posterior estimates of the hazard curve and also any transformation thereof. Another type of semiparametric approach of Bayesian methods of survival data are being explored in the literature of competing risks models. See Cox and Oakes (1984) for the background of a typical competing risks problem. Recent work of McIntosh (1996) and Ghosh and Gelfand (1998b) consider few problems of this type, that is when the failure of the subject is due to one of many competing causes (also see Gelfand et al. , 2000). The subject may be observed to fail from one of these causes (failure types), otherwise the subject will be observed to be censored 18 at some point of time. The interest is in evaluating the effect of the treatment which is intended to act in only one of the possible failure types. Such failure types are often dependent (Moeschberger and Klein, 1995), and in fact, the order of the types might determine the appropriate model. In such contexts, Ghosh and Gelfand (1998a) propose a number of complex shock models incorporating covariates and time-dependent intensities. Semiparametric Bayesian analysis of multiple events is an expanding field of research. Lot of the works mentioned in this review are still in the process of development and getting more refined. Some of the subtopics, where there are enormous scopes of future research, can be very easily identified. We are yet to see a comprehensive Bayesian analysis with model diagnostics for some of the important multiple event survival models, such as stable frailty models and copula models. Another interesting future research problem is the Bayesian analysis of multivariate survival data when there is multiple level of clustering (Liang et al. , 1995; Peng and Oakes, 1996). The modeling techniques, computational tools, model diagnostics and model selection methods discussed can also be extended to more complicated problems in survival analysis, and there lies a wide cope of the future researches on adaptive and widely applicable methodologies. References Andersen, P. K. and Gill, R. D. (1982). Cox’s regression models for counting processes: a large sample study, Annals of Statistics, 10, 1100-1120. Arjas, E. and Gasbarra, D. (1994). Nonparametric Bayesian inference from right censored survival data, using the Gibbs sampler, Statistica Sinica, 4, 505-524. Burridge, J. (1981). Empirical Bayes analysis of survival time data, Journal of the Royal Statistical Society, B, 43, 65-75. Chen, M. H., Ibrahim, J. G. and Sinha, D. (2002). Bayesian inference for multivariate survival data with a surviving fraction, Journal of Multivariate Analysis, 80, 101-126 Cox, D. R. (1972a). Regression models and life tables, Journal of the Royal Statistical Society, B, 34, 187-220. 19 Cox, D. R. (1972b). The statistical analysis of dependencies in point processes, In: Stochastic Point Processes (ed P.A.W. Lewis), 55-66, Wiley, New York. Cox, D.R. and Oakes. D.O. (1984). Analysis of survival data, Chapman & Hall, London. Cumberland, W. G., and Sykes, Z. M. (1982). Weak convergence of an autoregressive process used in modeling population growth, Journal of Applied Probability, 19, 450-455. Ferguson, T. S. and Phadia, E. G. (1979). Bayesian nonparametric estimation based on censored data, Annals of Statistics, 7, 163-186. Gail, M. H., Santner, T.J. and Brown, C. C. (1980). An analysis of comparative carcinogenesis experiments with multiple times to tumor, Biometrics, 36, 255-266. Gamerman, D. (1991). Dynamic Bayesian models for survival data, Applied Statistics, 40, 63-79. Gelfand, A. E., Dey, D. K. and Chang, H. (1992). Model determination using predictive distributions with implementation via sampling-based methods, Bayesian Statistics, 4, Ed. J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, 147-167. Oxford University Press. Gelfand, A. E., Ghosh, S. K., Christiansen C., Soumerai, S. B. and McLaughlin, T. J. (2000). Proportional Hazards Models: A Latent Competing Risk Approach, Journal of Royal Statistical Society, series C (Applied Statistics), 49, 385-397. Ghosh, D. and Lin. D.Y. (2004). Semiparametric Analysis of Recurrent Events Data in the Presence of Dependent Censoring, Biometrics, 59, 877-885. Ghosh, S. K. and Gelfand, A. E. (1998a). A Latent Risk Approach for Modeling Individual Level Data Consisting of Multiple Event Times, Journal of Statistical Research, 32, 23-39. Ghosh, S. K. and Gelfand, A. E. (1998b). Latent Waiting Time Models For Bivariate Event Times With Censoring, Sankhya series B, 60, 31-47. 20 Gilks, W.R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling, Applied Statistics, 41, 337-348. Gray, R. J. (1994). A Kernel Method for Incorporating Information on Disease Progression in the Analysis of Survival, Biometrika, 81, 527-539. Hougaard, P. (2000). Analysis of Multivariate Survival Data, Springer-Verlag, New York. Ibrahim, J.G., Chen, M-H. and Sinha, D. (2001), Bayesian Survival Analysis, Springer-Verlag. Kalbfleisch, J. D. (1978). Nonparametric Bayesian analysis of survival time data, Journal of Royal Statistical Society, B, 40, 214-221. Lagakos, S. W., Sommer, C. J. and Zelen, M. (1978). Semi-markov models for partially censored data, Biometrika, 65, 311-317. Lancaster, T. and Intrator, O. (1998). Panel data with survival: hospitalization of HIV-positive patients, Journal of the American Statistical Association, 93, 46-51. Leonard, T. (1978). Density Estimation, Stochastic Processes and Prior Information, Journal of the Royal Statistical Society, Series B, 40, 113-146. Moeschberger, M. L. and Klein, J. P. (1995). Statistical methods for dependent competing risk, Lifetime Data Analysis, 1, 195-204. McIntosh, M. W. (1996). The population risk as an explanatory variable in research synthesis of clinical trials, Statistics in Medicine, 15, 1713-1728. Oakes, D. (1992). Frailty models for multiple event times, J. P. Klein and P. K. Goel (eds.), Survival Analysis: State of the Art, 371-379, Kluwer Academic Publisher: Netherlands. Oakes, D. and Cui, L. (1994). On Semiparametric Inference for Modulated Renewal Processes, Biometrika, 81, 83-90. Peng, F. P. and Oakes, D. (1996). Bayesian analysis of multiple event time data under a nested frailty model, Technical Report, Department of Biostatistics, University of Rochester. 21 Ross, S.M. (1983). Stochastic Process, Wiley, New York. Sinha, D. (1993). Semiparametric Bayesian analysis of multiple event time data, Journal of the American Statistical Association, 88, 979-983. Sinha, D. and Dey, D. K. (1997). Semiparametric Bayesian analysis of survival data. Journal of the American Statistical Association, 92, 1195-1212. Sinha, D. and Maiti, T. (2004). Analyzing panel-count data with dependent termination: a Bayesian approach, Biometrics, vol. 60, 34-40. Sun, J., and L. J. Wei (2000). Regression analysis of panel count data with covariate-dependent observation and censoring times, Journal of the Royal Statistical Society, B, 62, 293-302. Thall, P. F. (1988). Mixed Poisson likelihood regression models for longitudinal count data with overdispersion, Biometrics, 44, 197-209. Thall, P. F. and Vail, S. C. (1990). Some covariance models for longitudinal count data with overdispersion, Biometrics, 46, 657-671. Wang, M-C., Qin, J. and Chang, C-T. (2001). Analyzing recurrent event data with informative censoring, Journal of the American Statistical Association, 96, 1057-1065. Wei, L.J., and Glidden, D.V. (1997). An overview of statistical methods for multiple failure time data in clinical trials (with discussion), Statistics in Medicine, 16, 833-839. 22 View publication stats