Multiple Events Time Data: A Bayesian Recourse
Debajyoti Sinha
Department of Biostatistics, Bioinformatics & Epidemiology, MUSC
Sujit K. Ghosh
Department of Statistics, NCSU
Final revision: January 29, 2005
Abstract
In many long-term clinical trials subjects often experience a number of events
all of which might serve as important endpoints for medical studies. The
analysis of such multiple events can be beneficial from both medical and
statistical perspectives. We review and discuss some recent developments
in the area of Bayesian methodologies for the analysis of multiple event
times data, focusing more on semiparametric models based on the intensity
functions. The nonparametric part of such models is modeled as a realization
of a stochastic process; while the parametric part, which may include a
regression parameter and a parameter quantifying the heterogeneity (i.e.,
frailty) of a population, is assumed to have a parametric distribution with
possibly unknown hyperparameters. Numerical computations required to
obtain posterior estimates of the parameters are generally based on a class
of sampling based methods, broadly known as the Markov Chain Monte
Carlo (MCMC) algorithms. The methodologies discussed in this chapter are
motivated by and aimed at analyzing some common types of multiple event
times data that are often observed in medical studies. Bayesian Exploratory
Data Analysis (EDA) methods and diagnostics for model selection and model
assessment are considered for each of these cases.
Key Words and phrases: Censored Data; Markov Chain Monte Carlo; Mixture
Models; Prior Processes; Semiparametric Models; Survival Analysis.
1
1
Introduction
In medical studies, interest often centers on the relation between time to an event
or ‘failure’ of interest (such as occurrence of a tumor) and the explanatory variables
or covariates (such as treatments, design variables, etc.). During a given followup time, a subject may experience more than one events; which may consist of
repeated occurrences (recurrent events) or events from different sources. In such
cases, we may be interested in the process N (t), counting the number of these
important but possibly non-fatal events up to time t. This counting process N (t)
for each subject may serve as a measure of disease morbidity. There are several
kinds of multiple event times data encountered by statisticians in medical studies.
We discuss some major categories of multiple events data primarily based on informative and noninformative termination. Several practical examples are provided
that motivated us to develop various statistical models. Several statistical methods
are available to analyze multiple events data from both frequentist and Bayesian
perspectives. Wei and Glidden (1997) and Hougaard (2000) have provided some recent overviews of the frequentist methods for multiple events data. In this review,
we focus on the semiparametric Bayesian models primarily based on the so-called
intensity functions which is the risk of a new event given the history of events in
the past. In semiparametric Bayes methods, the nonparametric part of the model
is assumed to be a realization of a stochastic process summarizing the available
prior information about the unknown intensity function. The parametric part is
assumed to have a prior distribution with possibly unknown hyperparameters. The
semiparametric nature of the model allows considerable generality and applicability but enough structure for useful physical interpretation and understanding for
particular applications in the medical research. The advantages of models based
on the intensity functions are explained briefly by Wang et al. (2001).
In Section 2, we provide several motivating examples to illustrate various types
of multiple events data encountered in practice. These examples serve as a basis
to consider complex multiple events models and indicate the challenges to the
statisticians posed by typical modern day multiple event times data problems. In
addition, such complex models though suitable for complex data structure are not
easy to handle and usually requires computational tools sophisticated enough to
deal with complex data structures. The semiparametric Bayesian methodologies,
despite being comparatively new concepts, have enormous potential for success in
2
handling such complex models.
Section 3 reviews some existing and newly developed semiparametric models
to address various features of complex data structures and designs as illustrated
by the examples in Section 2. In Section 4 we briefly review some of the recently
developed frequentist statistical methods to analyze these type of complex data
structures, along with their limitations in the scope of applications and related
computational difficulties.
Any Bayesian inference requires prior specification and the inference based on
such Bayesian methods depends crucially on such prior specifications. In Section 5,
we briefly discuss the commonly used prior processes to model the nonparametric
part (e.g., the intensity function). Section 6 presents some existing and newly
developed semiparametric Bayesian models suitable for the examples introduced
in Section 2.
In Section 7, we present a case study to illustrate the implementation of a
Bayesian semiparametric model. Section 8 summarizes and concludes with a discussion about future research directions and potential new problems.
2
Practical Examples
This section presents examples of the major categories of multiple event times
data that are commonly encountered in practice. These examples are not intended
to cover all possible types of such data encountered in medical studies, but at
least these examples illustrate the three major classifications among these data
problems: (1) multiple events data with single event type and noninformative
termination; (2) multiple events data with multiple event types and noninformative
termination; and (3) multiple events data with dependent termination. These
examples also illustrate one important and challenging aspect of multiple events
data, viz., termination and censoring of different kinds depending on the particular
design and method of data collection used in medical studies.
(1) An example of multiple event times data with noninformative censoring is
given by Gail et al. (1980). The observations are the times to occurrence
of mammary tumors in 48 rats injected with a carcinogen and subsequently
randomized to receive either the treatment or the control. In addition, this
study had regular follow-ups (only twice in each week) and hence for each rat
only the number of new tumors between two consecutive follow up times was
3
recorded. These highly grouped data contain many ties among event times
within the same rat and across different rats due to imprecise measurements
(specifically, twice a week scheduled observations) of tumor times. All rats
were terminated from observation (right censored) after the 182nd day.
(2) Many methods for analyzing multiple events data are based on the assumption that any subject’s risk of being under observation at any time is independent of the history of recurrent events experienced by the subject. However,
this assumption of independent censoring may not be valid when the termination is caused by informative dropouts or failure events. These terminating events may not be of direct interest, however, the informative censoring
induced by such terminating events, if not accounted for, may result into
biased estimates. An example of such data is the cohort study of AIDS Link
to Intravenous Experiences (ALIVE) discussed by Wang et al. (2001), where
the history of recurrent hospitalizations for each patient is recorded. The
interest is to study the effects of race and drug user status on the rate of hospitalization, however, for each patient, the time of dropout from the study
is believed to be associated with the history of hospitalization.
These type of data structures are not limited to medical studies. For instance
in industrial settings, for a reliability study of repairable systems, each system may experience recurrent repairable failures until it is censored from
observation due to some reason related to its failure pattern. In product
testing, the recurrent adverse events may be detection of repairable faults
and the censoring may be the final release of the product.
(3) Finally, we consider a case where the recurrent non-fatal events, i.e., the
informative termination is also an event of interest. For example, in some
bladder-cancer trials, after the surgical removal of a patients’ primary tumors, each cancer patient is monitored through regularly scheduled clinic
visits for occurrence of recurrent but benign superficial bladder tumors till
their termination from observation due to cure from the disease. For such a
study, the recurrent adverse events are non-fatal, but their intensity may signify the underlying quality-of-life experience and disease state of the patient,
whereas the termination due to cure is a positive outcome. Such a study often
includes regular follow-ups, say, monthly, and for each patient, the number of
new recurrent events between each pair of consecutive clinic visits is recorded.
4
Bladder cancer data from the VACURG (Sun and Wei, 2000) is a good example of such a multiple-events data with dependent termination. The data set
is available at the web-site http://www.blackwellpulishers.co.uk/rss.
The time-independent covariates available for each patient consist of a binary variable of treatment (e.g., 1 for treatment and 0 for placebo), the
number of tumors at the entry and the size of the largest tumor at the entry.
One of our main focuses is to investigate the effects of treatment on the risk
of superficial tumor as well as on the risk of discharge from monitoring. In
addition, we are also interested in the possible association between the history of nonfatal tumor occurrences and the risk of termination from regular
monitoring of tumors.
In this review article, we also consider methodologies applicable for timevarying covariates, but, unless mentioned otherwise, the covariates below are assumed to be constant over time within the follow up time. After developing different competing methodologies for time-independent covariates, we extend the
methodologies to models with time-dependent covariate. We also explore methodologies that deal with interval censoring and panel count data based on regularly
scheduled follow-up times.
3
Semiparametric Models Based on Intensity Functions
One of the early scholarly reviews of several models and methodologies proposed
for multiple event time data in the context of frequentist inference is available in
Oakes (1992). Most of the early multiple events models, as described and proposed
in Andersen and Gill (1982), often make the unrealistic assumption that the risk
of an event for a given subject is unaffected by all of the earlier events that a given
subject has experienced. For a more recent comprehensive review of frequentist
models for multiple events, see Hougaard (2001). Some of these models cannot
assess heterogeneity between individuals nor can they handle ties and grouping of
event times. We review some fully specified stochastic models which avoid such
limitations.
A fully specified semiparametric model of multiple event times data, the socalled proportional intensity model, is presented in Sinha (1993). In this model,
5
number of recurrent events experienced by subject i, within the time interval (0, t],
denoted by Ni (t) is modeled as a mixed non-homogeneous Poisson process (Ross,
1983) with the conditional intensity function,
lim
δ↓0
Pr[Ni (t, t + δ) = 1|wi , xi ]
= bi (t|xi , wi ) = b0 (t) exp(βxi )wi ,
δ
(1)
given the unobserved subject-specific random ‘frailty’ wi and the observed timeindependent covariate xi , where Ni (t, t + δ) = Ni (t + δ) − Ni (t). The frailty random
variables wi ’s are assumed to be independent and identically distributed (i.i.d.) as
Gamma variates in Sinha (1993). However, any random variable with unit mean
and support on [0, ∞) can be used to model the frailty wi . Alternatively, we can
characterize the frailty distribution using the moment generating function (mgf)
of the wi ,
Ψ(t) = E[exp(tW )],
assuming that the mgf exists. To assure identifiability, we assume that E[W ] =
k
Ψ(1) (0) = 1, where Ψ(k) (t) = dtd k Ψ(t) denotes the k-th derivative of the mgf
Ψ(t). One advantage of characterizing the frailty distribution by its mgf is that
it helps to express the conditional risk of a new event given history of events
N (t) = {N (s) : s < t} at any time t as,
Ψ(N (t)+1) (−B(t))
1
,
lim Pr[N (t, t + δ) = 1|N (t)] = h(t|N (t)) =
δ↓0 δ
Ψ(N (t)) (−B(t))
(2)
where N (t, t+δ) is the number of recurrent events within the time interval (t, t+δ],
Rt
N (t) = N (0, t) and B(t) = 0 b(u)du is the cumulative intensity of the nonhomogeneous Poisson process. In (2), we have suppressed the role of covariate x in the
expression to simplify the notations. It is to be noted that in absence of covariate
Rt
x we have B(t) = 0 b0 (u)du, otherwise we have b(u) = b0 (u) exp(βx). This result
expresses the Markovian structure of the recurrent events process in the sense that
the risk of future event at a given time depends on the past history N (t) only
through N (t), current count of the past events.
We denote the unknown variance of W by κ, quantifying the dependencies
among inter-event times within each subject and the heterogeneity of the intensity
processes in different subjects. Under the Gamma frailty model of Sinha (1993),
Ψ(t) = (1 − tκ)−1/κ and it easily follows that,
h(t|N (t), x) = {1 + κ−1 N (t)}h1 (t|x),
6
(3)
where h1 (t|x) = {1 + B(t|x)}−1 is the hazard for the first event. One can clearly
see how the variance, κ of frailty establishes the relationship between the risk of
a future event and the number of past events. Motivated by the result in (3),
Oakes (1992) proposed a marginal model for multiple events data with conditional
hazard,
h(t|N (t), x) = {1 + κ−1 N (t)}h0 (t) exp(βx) .
It is clear that in the presence of any covariate x, this model is not equivalent
to the mixed Poisson process model of Sinha (1993) because h1 (t|x) is not in a
proportional hazards structure when the conditional intensity is given by (1).
The model in (1) is useful when the focus is on the influence of the covariates
(or treatments) and on the heterogeneity among subjects related to the rate of
occurrence of events in each subject. The model also assumes that the heterogeneity among subjects remains constant over time, and hence it works best when
each subject experiences several events, but a moderate number of subjects are
observed in a relatively short span of time.
The M -site model of Gail et al. (1980) assumes that each subject has fixed
number, M , of potential i.i.d. events at risk. Assuming that the underlying hazard
for each potential event at risk is h0 (t) exp(βx), the conditional hazard for a new
event at time t, given the history N (t), is given by,
lim
δ↓0
P [N (t, t + δ) = 1|N (t)]
= h(t|N (t), x) = {M − N (t)}h0 (t) exp(βx).
δ
(4)
This model may not be applicable for most of the clinical studies involving multiple
events as we can see from the above expression, that the marginal risk of the future
event to any subject decreases with the number of previous events N (t). However,
it may be an ideal feature for some other situations, for example, when we are
dealing with the number of crashes to a software with fixed but unknown M
number of undetected bugs at the onset of testing.
An extension of M-site model is when M is assumed to vary among subjects
with probability distribution P [M = m|φ] = g(m|φ) for m = 0, 1, 2, · · · , where
φ is the parameter of this discrete parametric distribution. If we look at only
the time to first event T1 under this model, we can show that S1 (t) = P [T1 >
t] = ΨM [log S(t)] where ΨM (t) is the mgf of M and S(t) is the common survival
function corresponding to each one of M i.i.d. latent events at risk. Note that when
g(0|φ) > 0 there is a probability that a subject may not be at risk of any recurrent
event at any time point. When M follows a Poisson distribution, this S1 (t) is same
7
as the cure-rate model of Chen et al. (2001). To avoid this positive probability of
a subject being not at risk of any event, we consider only the distributions with
gM (0) = 0 (for example, a Poisson distribution truncated at 0).
For this extended M-site model, the conditional hazard for a new event at time
t given the history N (t), can be obtained as,
h(t|N (t), x) = h(t|x)
E[M × · · · × (M − N ){S(t)}M ]1[M >N ]
,
E[M × · · · × (M − N + 1){S(t)}M ]1[M ≥N ]
(5)
where h(t|x) = h0 (t) exp(βx) and 1A is the indicator function. We can further
simplify this as,
h(t|N (t), x) = h(t|x) E[M − N |M ≥ N, N (t) = N ] ,
(6)
where the expectation in (6) is taken with respect to the conditional probability
distribution P [M − N = m|M ≥ N, N (t) = N ] = g ∗ (m|N ) given by
g(N + k){S(t)}N +k N k+k
∗
g (k|N ) = P∞
(7)
′ .
′ ){S(t)}N +k′ N +k
g(N
+
k
′
′
k =0
k
The expression in (6) shows that the relationship between risk of a future event
and events history is complicated for this extended M-site model and it depends
to a great extend on the distribution gM (·) of M .
Markov and semi-Markov models of Lagakos, Sommer and Zelen (1978) often
have many parameters and can be hard to interpret. Another class of models ‘modulated renewal processes’ (Cox, 1972b) - has not been widely used, perhaps
because of the difficulty in estimating the asymptotic covariance matrix of the
parameter estimates (Oakes and Cui, 1994).
Besides the assumption of (1), the methodology described in Sinha (1993) further assumes that the hazard of the termination time Ti at time t
lim
δ↓0
P [Ti ∈ (t, t + δ)|Ni (t), xi , Ti > t]
= hT (t|Ni (t), xi )
δ
(8)
is independent of the history of point process Ni (t) = {Ni (s) : s < t}. This
underlying assumption, not often precisely mentioned, can be expressed as
hT (t|Ni (t), xi , Ti > t) = hT (t|Ti > t, xi ) .
(9)
For various practical situations, when the termination Ti is induced by a terminal
or fatal event such as death, this assumption in (9) may not be valid at all. Because,
8
in such a case, the pattern of recurrent events in Ni (·) may indicate the progression
of the morbidity of the subject. This is the situation when there is an informative
termination (e.g., Wang et al. , 2001).
A generalization of (1) in Sinha and Maiti (2004) assumes that the hazard of Ti
at time t depends on the the previous non-fatal events via the frailty wi . But, given
the random effect wi , the hazard of termination at time t does not depend on the
history (Ni (t)) of recurrent events. For the case with only a discrete distribution
for the termination time, the discretized conditional hazard is given by,
exp(β2 xi )wiα
P [Ti = aj |xi , wi , Ni (aj ), Ti > aj−1 ] = 1 − φj
,
(10)
where φj = P [T = aj |x = 0, w = 1, N (aj ), T > aj−1 ] is the discretized baseline
conditional probability of not termination at time point aj for a patient at risk
of termination at that time point. This is assuming that the termination can
occur only at one of the scheduled inspection times a1 < · · · < ag . Also, β2
is the regression parameter of the covariate effect on the conditional hazard of
termination. We are using a discretized version of the proportional hazards (Cox,
1972a) structure in (10). Also, α ∈ R quantifies the nature of dependence between
Ni (t) and the hazard of Ti .
For α = 0, we have P (Ti = aj |xi , Ti > aj−1 , Ni (t)) free of Ni (t) and we are
in the situation similar to (9) when the risk of termination is independent of the
events history. For α > 0, we have positive association between the Ni and Ti and
hence, given the covariate, subjects with higher than usual rates of non-fatal events
are more likely to be terminated earlier than others. Similarly, α < 0 indicates a
negative relationship between termination time and rate of non-fatal events. The
special case α = 0 further indicates that the heterogeneity among subjects in
terms of rates of non-fatal events is same as the heterogeneity in terms of their
termination times. Hence, α > 1 implies more variability in their termination
times compared to the variability in their rates of non-fatal events. Similarly,
0 < α < 1 implies more variability in the later. For the special case of α = 1 and
R aj
φj = exp[− aj−1
b0 (u)du], we get the model of Lancaster and Intrator (1998). Their
model has the restrictive assumptions that the hazard function of termination is
proportional to the conditional baseline intensity B0 (t) and more importantly, the
variability of the subjects in terms of their proneness to non-fatal events is same
as their variability in termination times. In Sinha and Maiti (2004), independent
Ga(η, η) density is used for the frailty wi .
9
4
Frequentist Methods for Analyzing Multiple
Event Data
Statistical methodologies for multiple events data can be classified into two groups:
conditional and marginal approaches. Conditional models and approaches are
used mainly in semiparametric Bayesian and likelihood based methods of analysis. Marginal methods are more popular with generalized estimating equations
(GEE) methodologies. This section discusses the present status of the frequentist
methods to analyze the examples in Section 2 using the above mentioned models. We focus on the main difficulties faced by the frequentist methods and later
describe how these difficulties are overcome by Bayesian analyses.
First, we discuss existing frequentist methods for analysis of multiple events
data under noninformative or independent terminating event. Oakes (1992) gives
a comprehensive review of the several frequentist methods for analyzing multiple event time data and discusses limitations of the methodologies of Andersen
and Gill (1982), Lagakos et al. (1978) and the modulated renewal process (Cox,
1972b). His proposed marginal model for multiple event time data applies only to
non-grouped data. Thall’s (1988) Poisson regression model for analyzing grouped
multiple event-time data does not use the hierarchical structure for the random
frailty and he estimates the frailty for each subject instead of a single hyperparameter quantifying the overall variability of the random frailties. Also, Thall and
Vail’s (1990) quasi-likelihood method to analyze grouped data cannot be used to
estimate efficiently the parameter of the frailty distribution and the standard error of the treatment effect. One common problem present in all these methods
for multiple event time data is the lack of a suitable method to verify the modeling assumptions. Most recent review of the conventional frequentist methods
for analysis of multiple data can be found in Hougaard (2001). These methods
may allow a time-dependent covariate, but usually assume the covariate effect is
constant over time. However, there is no specific frequentist methodology to deal
with interval-censored data when the scheduled inspection times are not same for
each subject. The frequentist methodologies to verify the modeling assumptions,
such as the assumption of proportional hazards, constancy of the covariate effect
over time, etc., are also scarce for multiple events data.
For many currently popular statistical procedures for multiple events data, the
final estimates are often highly dependent on starting values, and in practice, it
10
is often difficult to agree on a starting value. In the presence of many nuisance
parameters, the asymptotic properties of the maximum likelihood estimate (MLE)
and other frequentist estimate of the parameter of interest, say, the regression
parameter β, await more extensive results. The small sample properties of these
frequentist estimates have yet to be determined.
When the risk of termination depends on the history of recurrent events, existing methods in literature are not so common. When the terminating event is
of no interest and it is treated only as a sort of informative censoring of recurrent event times, there are some existing frequentist methods to deal with such
cases, most notably of them is Wang et al. (2001). However, these approaches are
not based of fully specified models. Depending on their goals of analysis, these
procedures can be categorized in three groups—(1) based on the inference about
the rate function, which is P [N (t, t + δ) = 1|x]; (2) based on the mean function
E[N (t)|x]; (3) based on the intensity function P [N (t, t + δ) = 1|x, N (t)]. However,
these methods do not precisely mention the model for hT (t|N (t), x). One can not
automatically estimate this hazard function using these approaches. It is difficult
to know the loss in efficiency for not specifying the model for termination and it
is also not possible to check the sensitivity of the assumptions about termination,
though the expectation is that the approach used is somewhat ‘robust’ as it is free
of any precise modeling assumptions on termination.
When we have multiple events with dependent important terminating events
(e.g., Sinha and Maiti, 2004) and we are interested about both types of events,
there is even bigger need to use a fully specified stochastic model for hT (t|N (t), x).
One possible strategy is to focus on either the mean function or the rate function of 1[T >t] N (t) (called the termination modulated recurrent events process, e.g.,
Ghosh and Lin, 2004). However, this can be misleading if we are interested about
understanding the effects of covariates on recurrent events as well as on the termination events. We instead will focus on fully specified stochastic models which
can be used for complete understanding of the covariate-effects, prediction, model
selection and for understanding the process generating the data.
5
Prior Processes in Semiparametric Model
The most popular way to model a prior process on the baseline cumulative intensity
function (as well as for the baseline cumulative hazard function) is to use the Levy
11
process introduced by Kalbfleisch (1978), Ferguson and Phadia (1979)and Burridge
(1981). They advocated the use of a process with positive independent increments
in disjoint intervals (Levy process) to model the prior process of the cumulative
Rt
hazard and intensity function B0 (t) = 0 b0 (u)du. Following Sinha (1993) we
can assume that B0 (t) is the realization of a positive non-decreasing independent
increment process (Levy process). That is, the increments of B0 (t) in disjoint
intervals are positive and mutually independent. Similarly, the cumulative hazard
Rt
h (u)du = H0 (t) can be modeled by a Levy process when the cdf is absolutely
0 0
continuous.
5.1
Gamma Processes
The most commonly used Levy process for modeling B0 (t) is the Gamma process
(Sinha, 1993), denoted by B0 (t) ∼ GP (|Lambda∗ , c), when B0 (t2 ) − B0 (t1 ) has a
∗
∗ (t )
1
for t2 > t1
gamma distribution with mean Λ∗ (t2 )−Λ∗ (t1 ) and variance Λ (t2 )−Λ
c
∗
and c > 0. Here the non-decreasing positive function, Λ , is the mean function
of the random prior process and the hyper-parameter c can be interpreted as a
measure of confidence attached to the initial guess Λ∗ (·) of B0 (·), the expected
number of events by time t at baseline covariate value and in absence of any frailty
effect. It is assumed that Λ∗ is completely specified except possibly for a few
unknown parameters. For example, one can consider a Gamma process model
with Λ∗ (t) = λt for some hyperparameter λ, possibly unknown.
For ordinary univariate survival data, Burridge (1981) explains how a Gamma
process model may give wrong inferences if applied to continuous survival data.
Similar caution should be used for application of the Gamma process while dealing
with continuously observed multiple events data with no ties. Two unsatisfactory
features of a Gamma process are: (1) independent increments in adjacent intervals,
and (2) realization of the sample path is discrete with probability one. However,
this process has found a vast number of applications in literature for modeling
different types of multiple events data.
5.2
Correlated Prior Processes
This model was introduced by Leonard (1978) for modeling density functions with
prior information on smoothness. A discretized approximation of this process
assumes a piecewise constant function b0 (t). Over a set of prespecified grouping
12
intervals Ij = (aj−1 , aj ] for j = 1, 2, · · · , g with 0 = t0 < a1 < a2 < · · · < ag−1 < ag ,
the baseline intensity function is given as b0 (t) = λj for t ∈ Ij . We introduce some
smoothness for the values of the λj ’s in adjacent intervals using a first order random
walk process for αj = logλj with
αj+1 = αj + ej+1 for j = 1, · · · , g − 1 .
(11)
The ej ’s are i.i.d. normal random variables with 0 mean and variance σ 2 . Higher
values of σ imply less prior information on the discretized linear spline smoothing
of the λj ’s. The prior information can even be incomplete, because without the
specification of the marginal prior of α1 , the joint prior distribution of the α’s
remains undefined. Original model of Gamerman (1991), introduced in the context of modeling hazard of an univariate survival time, used the so-called linear
Bayesian method which requires only a partial specification (mean 0 and finite
variance) of the distribution of ej+1 in (11). It is also important to note that, Gray
(1994) again in the context of hazard function, used the prior process structure
similar to (11) on λj ’s itself and used a ‘flat’ prior for λ1 . See also Leonard (1978)
for a more formal justification for using this kind of incomplete prior information
and prior process in theoretical nonparametric Bayesian inference.
Various other types of correlated prior processes involving linear and higher
order discretized spline smoothing of the αj ’s can be used to summarize different
types of prior information about the underlying baseline function. One interesting
class of prior process is where a local linear trend is modeled as
αj+1 = αj + δj + ej
and δj+1 = δj + e′j
(12)
for j = 1, · · · , g, where ej ∼ N (0, σ12 ), e′j ∼ N (0, σ22 ), and ej ’s and e′j ’s are mutually
independent. The model in (12) will reduce to a smoothed linear trend model
by taking σ2 = 0. This process allows the intensity to be somewhat linear and
smooth without forcing the form to be completely parametric. The completely
parametric form with σ1 = σ2 = 0 will be insensitive to any surprising trend
of the baseline intensity function suggested by the data. Despite some of their
nice physical properties, correlated prior processes have so far enjoyed only limited
applications for modeling multiple events data.
The correlated prior process is also related to the so-called martingale process
introduced by Arjas and Gasbarra (1994), where b0 (t) is modeled as a piecewise
constant function with jumps. The jump times are realizations of a time homogeneous Poisson process, and the sequence of b0 (t) values in different jump intervals
13
form a discrete martingale. This martingale process is very similar to the model
of (11) if we relax the condition of the jump times being random, and instead assume that the hazard rates change in each grouping interval. The condition of the
hazard rates being martingale can be relaxed also to get a sub-martingale process
similar to that in (12). A slight modification of the prior process in (11), with
αj+1 − α0j = γ(αj − α0j ) + ej+1 , where α0j is the prior mean of αj and γ is another hyperparameter of deviation, is a discretized integrated Ornstein-Uhlenbeck
(DIOU) process discussed by Cumberland and Sykes (1982). However, to the best
of our knowledge in the current literature, these martingale processes have not
been used in multiple events data.
6
Bayesian solution
As an illustration of the implementation and applicability of the Bayesian solution
for the analysis of multiple events data, we concentrate on the analysis of the
bladder-cancer data (Wei and Sun, 2001). Bladder cancer data from the VACURG
is a good example of multiple events data with dependent termination process.
After their entry at the trial, the patients in this study were monitored every month
during their clinic visit for number of new (since the previous clinic visit) superficial
bladder tumors till their termination time due to significant improvement in health
status. The time-independent covariates used for each patient are binary variable
of treatment (1 for treatment and 0 for placebo), number of tumors at the entry
and the size of the largest tumor at the entry.
To model the recurrent events process N (t), we use the mixed Poisson process
model of Sinha (1993) with conditional intensity function of (1). We use the
Gamma frailty for wi . For the discrete termination time T , we use the model
(10). The baseline intensity function b0 (t) is assumed to have the Gamma process
prior. Because the observed recurrent events history is based only on observations
recorded at regular time intervals (based on regularly scheduled clinic visits), the
likelihood based on the sampling distributions of the observed history of recurrent
bladder-tumors given the frailty variables w = (w1 , · · · , wn ) and termination times
is given by
L1 (β1 , µ, w|D) ∝
n Y
Y
[exi β1 wi µj ]Nij exp[−exi β1 wi µj ] ,
i=1 j∈Mi
14
(13)
where Mi is the set of inspection intervals when the subject i is under observation
and Nij = Ni (aj ) − Ni (aj−1 ) is the number of new bladder-tumors to subject i
observed during the interval between aj−1 and aj . It is obvious that the inspection
intervals in Mi are all below the observed termination time Ti . For this case, the
baseline cumulative intensity B0 (t) enters the likelihood L1 via the increments in
these pre-specified time points, say, 0 = a0 < a1 < · · · < ag . One consequence
of this fact is that the property of the Gamma process to have discrete sample
path does not turn out to be any worrying factor in this application. When B0 (t)
is a priori GP (Λ∗ , c), the priors of µj = B0 (aj ) − B0 (aj−1 ) for j = 1, · · · , g are
independent Gamma densities with mean γj = Λ∗ (aj ) − Λ∗ (aj−1 ) and variance
γj /c. Instead of the Gamma priors, one can also use alternative prior processes
such as the correlated prior processes defined via the joint prior distribution of
(µ1 , · · · , µg ).
Similarly, the likelihood L2 (β2 , π, w|D) based on the sampling distributions of
the Ti is given as
L2 (β2 , φ, w|D) ∝
n
Y
wα exp(β2 xi )
exp(β x )wα
(1 − φ(i)i
)S(i) 2 i i ,
(14)
i=1
where ti = a(i) is the discrete termination time of the subject i. The joint posterior
for this model is given by
p(β, w, η, µ, φ, α|D) ∝ L1 (β1 , µ, w|D) × L2 (β2 , φ, w|D) ×
Y
f (w|η) × π(β, φ, η, α) × [π(µj |γj , c)],
(15)
j
Q
where f (w|η) = i [η η wiη−1 exp(−wi η)/Γ(η)], the product of Gamma densities of
the w1 , · · · , wn given η = κ−1 , and π(β, φ, η, α) is obtained via the joint prior
distribution for φ = (φ1 , · · · , φg ) and specification of the conditional priors of
other parameters given φ. Note that this model here is identical to the one used
by Sinha and Maiti (2004). We further refer this article for details about how to
specify the priors and about the method of eliciting such priors. The details about
numerical integration of (15) to obtain the marginal posterior using MCMC tools
are also given in Sinha and Maiti (2004). At each iterative stage of the MCMC,
we only need to sample from the conditional posteriors. For further convenience
we use N (β10 , σ1 ) for π(β1 ), N (β20 , σ2 ) for π(β2 ), and independent Beta(φj |Aj , Bj )
densities for π(φj ). The π(α) can be assumed as a N (α0 , σ3 ) and we need to take
α0 = 0 as we are not sure whether the association between the two classes of events
15
should be either positive or negative. We have also assumed that these parameters
are apriori independent of each other. To use the MCMC method, one prefers the
conditional posteriors that are quite easy to sample from. These forms of the prior
are flexible, yet they facilitate easy to sample conditional posteriors.
The expressions and the derivations of the conditional posteriors of the parameters used in the MCMC are given in Sinha and Maiti (2004). Except µj all other
full conditional distributions are not in standard form. The full conditional distributions for φ and α are log-concave and an adaptive rejection sampling method
by Gilks and Wild (1992) was used to sample from these log-concave densities.
Metropolis-Hastings algorithm was used to sample wi ’s.
7
Analysis of the data-example
Sinha and Maiti (2004) illustrates the kind of approximate Bayesian method with
ten parallel Gibbs chain and 20,000 iterations in each chain. The previous analyses of this data ignored the possible association between the history of nonfatal
tumor occurrences and the risk of termination. We would like to emphasize that
to deal with missing observations in some of the months, Sinha and Maiti (2004)
had to make some appropriate changes in our likelihood based on the assumption
of noninformative missingness. Priors with zero mean and high variance (1000) for
the regression parameters β1 and β2 have been used to reflect prior opinions of no
covariate effects on the nonfatal events intensity as well as on the risk of termination. For the gamma priors of λ, ν and η, selecting small scales and shapes make
the prior distributions very diffuse. One needs to evaluate the sensitivity of the
posterior analysis over these choices by varying the values of the hyperparameters
over a range of possible values.
In this analysis, another crucial elicitation step is to choose the hyperparameters
associated with the gamma process, we set, τ = 1/c and γ1 , · · · , γg , where γj =
γ0 (aj ) − γ0 (aj−1 ). One possible rational behind eliciting these hyperparameters is
given in the original paper.
Table 1, reproduced from Sinha and Maiti (2004) presents the results of the
data analysis. There is not much evidence for the treatment effect on time of
termination, because the credible interval for β2 contains zero. But, there is good
evidence for a treatment effect on the rate of nonfatal events. The posterior mean of
α lies below zero indicating some evidence of negative association between nonfatal
16
events intensity and the termination time. Negative value of α implies that patients
who are more prone to nonfatal events are likely to be terminated later. But, the
evidence for negative α is not strong at the 95% posterior interval for α does
contain zero. There is also strong evidence that the baseline hazard rate for the
termination is increasing over time. Please note that one of the major advantages
of a Bayesian analysis is that one can compute the prediction distributions of the
future nonfatal events and termination time given the values of the covariate.
Table 1. Summary of Posterior Analysis
Parameter Mean S.E. 95% Credible interval
β1
β2
α
1.491
-.0227
-.0137
.577
.249
.0295
(.508, 2.77)
(-.520, .468)
(-.081, .037)
Model validation diagnostics are very important in any data analysis. Many
model diagnostics have been developed in the literature. For detailed reviews of
Bayesian model validation methods in multiple events data analysis, we may refer
to Sinha and Dey (1997), Ibrahim, Chen and Sinha (2001) and Sinha and Maiti
(2004). Most popular is a method for Bayesian model validation via conditional
predictive ordinate (CPO) (see, Gelfand, Dey and Chang, 1996). For the subject i,
the cross-validated posterior predictive probability evaluated at the observed data
point is given by
CP O( i) = P (Ni = Ni,obs , Ti = ti |D(−i) ) = {E[P (Ni = Ni,obs , Ti = ti |θ)−1 |D]}−1 ,
(16)
where D(−i) is the observed data with the patient i removed, θ = (µ, β1 , β2 , η, α, λ, ν)
and Ni = (Ni1 , · · · , NiTi ). The second expression of the CPO in (16) follows from
a result of Gelfand, Dey and Chang (1996). This expression actually enables us
to use the samples from the posterior p(θ|D) to compute the CPOs. Sinha and
Maiti (2004) again illustrates the algorithm we used for calculating CPO’s for the
Bladder-cancer data example. CPOs are treated just as the same way residuals are
treated in frequentist survival analysis and CPOs are plotted against either their
covariate values or to the termination times to check whether the CPO values have
any relationship with the covariates and the termination times.
17
8
Discussions and Future Research
We presented several Bayesian approaches to model the non-parametric part of a
survival model based on intensity functions. There is no clear choice for the best
model among these different models. The selection of the model depends mostly
on the goal of the study, type of the data set and the amount of the computations
required. Some of the merits and demerits of different processes are discussed
below.
Two unsatisfactory features of a Gamma process are (i) independent increments, and (ii) the realizations are discrete with probability one, so that the resulting marginal distributions imply a positive probability of tied failure times. As
pointed out by Burridge (1981), this disadvantage mentioned in (ii) above and some
other deficiencies of the Levy process model disappear when applied to grouped
data.
The specification of any one of the Levy processes and random mixture process
requires the knowledge of the mean function of the prior process and the prior
uncertainty around the mean. The correlated prior process does not necessarily
require the direct knowledge of the mean of the prior process nor any confidence
around the mean. But, the implementations of the Bayesian methodologies using
the correlated prior processes are often computationally more intensive due to
the apparent loss of the conjugacy of the conditional posteriors of the process
parameters in most of the practical situations that we usually come across. But, our
personal experience suggests that in general the methodologies based on correlated
process are less computationally intensive compared to the methodologies based
on random mixture processes. Random mixture processes have the advantages
that the posterior estimate of the hazard (or intensity) function is continuous
and one can compute the posterior estimates of the hazard curve and also any
transformation thereof.
Another type of semiparametric approach of Bayesian methods of survival data
are being explored in the literature of competing risks models. See Cox and Oakes
(1984) for the background of a typical competing risks problem. Recent work
of McIntosh (1996) and Ghosh and Gelfand (1998b) consider few problems of this
type, that is when the failure of the subject is due to one of many competing causes
(also see Gelfand et al. , 2000). The subject may be observed to fail from one of
these causes (failure types), otherwise the subject will be observed to be censored
18
at some point of time. The interest is in evaluating the effect of the treatment
which is intended to act in only one of the possible failure types. Such failure
types are often dependent (Moeschberger and Klein, 1995), and in fact, the order
of the types might determine the appropriate model. In such contexts, Ghosh
and Gelfand (1998a) propose a number of complex shock models incorporating
covariates and time-dependent intensities.
Semiparametric Bayesian analysis of multiple events is an expanding field of
research. Lot of the works mentioned in this review are still in the process of
development and getting more refined. Some of the subtopics, where there are
enormous scopes of future research, can be very easily identified. We are yet to
see a comprehensive Bayesian analysis with model diagnostics for some of the
important multiple event survival models, such as stable frailty models and copula
models. Another interesting future research problem is the Bayesian analysis of
multivariate survival data when there is multiple level of clustering (Liang et al. ,
1995; Peng and Oakes, 1996).
The modeling techniques, computational tools, model diagnostics and model
selection methods discussed can also be extended to more complicated problems
in survival analysis, and there lies a wide cope of the future researches on adaptive
and widely applicable methodologies.
References
Andersen, P. K. and Gill, R. D. (1982). Cox’s regression models for counting
processes: a large sample study, Annals of Statistics, 10, 1100-1120.
Arjas, E. and Gasbarra, D. (1994). Nonparametric Bayesian inference from right
censored survival data, using the Gibbs sampler, Statistica Sinica, 4, 505-524.
Burridge, J. (1981). Empirical Bayes analysis of survival time data, Journal of
the Royal Statistical Society, B, 43, 65-75.
Chen, M. H., Ibrahim, J. G. and Sinha, D. (2002). Bayesian inference for multivariate survival data with a surviving fraction, Journal of Multivariate Analysis, 80, 101-126
Cox, D. R. (1972a). Regression models and life tables, Journal of the Royal
Statistical Society, B, 34, 187-220.
19
Cox, D. R. (1972b). The statistical analysis of dependencies in point processes,
In: Stochastic Point Processes (ed P.A.W. Lewis), 55-66, Wiley, New York.
Cox, D.R. and Oakes. D.O. (1984). Analysis of survival data, Chapman & Hall,
London.
Cumberland, W. G., and Sykes, Z. M. (1982). Weak convergence of an autoregressive process used in modeling population growth, Journal of Applied
Probability, 19, 450-455.
Ferguson, T. S. and Phadia, E. G. (1979). Bayesian nonparametric estimation
based on censored data, Annals of Statistics, 7, 163-186.
Gail, M. H., Santner, T.J. and Brown, C. C. (1980). An analysis of comparative
carcinogenesis experiments with multiple times to tumor, Biometrics, 36,
255-266.
Gamerman, D. (1991). Dynamic Bayesian models for survival data, Applied
Statistics, 40, 63-79.
Gelfand, A. E., Dey, D. K. and Chang, H. (1992). Model determination using
predictive distributions with implementation via sampling-based methods,
Bayesian Statistics, 4, Ed. J. M. Bernardo, J. O. Berger, A. P. Dawid and
A. F. M. Smith, 147-167. Oxford University Press.
Gelfand, A. E., Ghosh, S. K., Christiansen C., Soumerai, S. B. and McLaughlin, T. J. (2000). Proportional Hazards Models: A Latent Competing Risk
Approach, Journal of Royal Statistical Society, series C (Applied Statistics),
49, 385-397.
Ghosh, D. and Lin. D.Y. (2004). Semiparametric Analysis of Recurrent Events
Data in the Presence of Dependent Censoring, Biometrics, 59, 877-885.
Ghosh, S. K. and Gelfand, A. E. (1998a). A Latent Risk Approach for Modeling Individual Level Data Consisting of Multiple Event Times, Journal of
Statistical Research, 32, 23-39.
Ghosh, S. K. and Gelfand, A. E. (1998b). Latent Waiting Time Models For
Bivariate Event Times With Censoring, Sankhya series B, 60, 31-47.
20
Gilks, W.R. and Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling,
Applied Statistics, 41, 337-348.
Gray, R. J. (1994). A Kernel Method for Incorporating Information on Disease
Progression in the Analysis of Survival, Biometrika, 81, 527-539.
Hougaard, P. (2000). Analysis of Multivariate Survival Data, Springer-Verlag,
New York.
Ibrahim, J.G., Chen, M-H. and Sinha, D. (2001), Bayesian Survival Analysis,
Springer-Verlag.
Kalbfleisch, J. D. (1978). Nonparametric Bayesian analysis of survival time data,
Journal of Royal Statistical Society, B, 40, 214-221.
Lagakos, S. W., Sommer, C. J. and Zelen, M. (1978). Semi-markov models for
partially censored data, Biometrika, 65, 311-317.
Lancaster, T. and Intrator, O. (1998). Panel data with survival: hospitalization
of HIV-positive patients, Journal of the American Statistical Association, 93,
46-51.
Leonard, T. (1978). Density Estimation, Stochastic Processes and Prior Information, Journal of the Royal Statistical Society, Series B, 40, 113-146.
Moeschberger, M. L. and Klein, J. P. (1995). Statistical methods for dependent
competing risk, Lifetime Data Analysis, 1, 195-204.
McIntosh, M. W. (1996). The population risk as an explanatory variable in
research synthesis of clinical trials, Statistics in Medicine, 15, 1713-1728.
Oakes, D. (1992). Frailty models for multiple event times, J. P. Klein and
P. K. Goel (eds.), Survival Analysis: State of the Art, 371-379, Kluwer Academic Publisher: Netherlands.
Oakes, D. and Cui, L. (1994). On Semiparametric Inference for Modulated Renewal Processes, Biometrika, 81, 83-90.
Peng, F. P. and Oakes, D. (1996). Bayesian analysis of multiple event time data
under a nested frailty model, Technical Report, Department of Biostatistics,
University of Rochester.
21
Ross, S.M. (1983). Stochastic Process, Wiley, New York.
Sinha, D. (1993). Semiparametric Bayesian analysis of multiple event time data,
Journal of the American Statistical Association, 88, 979-983.
Sinha, D. and Dey, D. K. (1997). Semiparametric Bayesian analysis of survival
data. Journal of the American Statistical Association, 92, 1195-1212.
Sinha, D. and Maiti, T. (2004). Analyzing panel-count data with dependent
termination: a Bayesian approach, Biometrics, vol. 60, 34-40.
Sun, J., and L. J. Wei (2000). Regression analysis of panel count data with
covariate-dependent observation and censoring times, Journal of the Royal
Statistical Society, B, 62, 293-302.
Thall, P. F. (1988). Mixed Poisson likelihood regression models for longitudinal
count data with overdispersion, Biometrics, 44, 197-209.
Thall, P. F. and Vail, S. C. (1990). Some covariance models for longitudinal count
data with overdispersion, Biometrics, 46, 657-671.
Wang, M-C., Qin, J. and Chang, C-T. (2001). Analyzing recurrent event data
with informative censoring, Journal of the American Statistical Association,
96, 1057-1065.
Wei, L.J., and Glidden, D.V. (1997). An overview of statistical methods for
multiple failure time data in clinical trials (with discussion), Statistics in
Medicine, 16, 833-839.
22
View publication stats