Latent Stick-Breaking Processes

David Dunson

Latent Stick-Breaking Processes

2010, Journal of the American Statistical Association

Supplementary materials for this article are available online. Please click the JASA link at http://pubs.amstat.org. Latent Stick-Breaking Processes Abel R ODRÍGUEZ, David B. D UNSON, and Alan E. G ELFAND We develop a model for stochastic processes with random marginal distributions. Our model relies on a stick-breaking construction for the marginal distribution of the process, and introduces dependence across locations by using a latent Gaussian copula model as the mechanism for selecting the atoms. The resulting latent stick-breaking process (LaSBP) induces a random partition of the index space, with points closer in space having a higher probability of being in the same cluster. We develop an efficient and straightforward Markov chain Monte Carlo (MCMC) algorithm for computation and discuss applications in financial econometrics and ecology. This article has supplementary material online. KEY WORDS: Nonparametric Bayes; Option pricing; Point-referenced counts; Random probability measure; Random stochastic processes. 1. INTRODUCTION Recent literature in Bayesian nonparametrics repeatedly focused on models for collections of distributions based on extensions of the Dirichlet process and other stick-breaking priors (Ferguson 1973; Sethuraman 1994; Ishwaran and James 2001; Ongaro and Cattaneo 2004). These extensions are generated by introducing dependence in the weights and/or atoms of the stick-breaking construction (MacEachern 2000; DeIorio et al. 2004; Gelfand, Kottas, and MacEachern 2005; Teh et al. 2006; Griffin and Steel 2006b; Duan, Guindani, and Gelfand 2007; Dunson and Park 2008; Rodriguez, Dunson, and Gelfand 2008), or by considering mixtures of independent samples from one or more Dirichlet processes (Müller, Quintana, and Rosner 2004; Griffin and Steel 2006a; Dunson, Pillai, and Park 2007). In contrast, this article focuses on models for prior distributions on stochastic processes on an index space D ⊂ Rd with rich but constant marginal distributions. We no longer focus on building different distributions for each possible value of the index space, but instead construct a stochastic process where observations at different locations are dependent, but have a common (albeit unknown) marginal distribution. We call the resulting construction a latent stick-breaking process (LaSBP). As the name indicates, we rely on a stick-breaking construction to represent the unknown marginal distribution, while introducing an underlying (latent) Gaussian process on D to drive the selection of the atoms at each location s ∈ D, as suggested in MacEachern (2007). As a motivation, consider the problem of estimating species abundance from point counts at pre-specified locations. It is clear that abundance changes spatially with the specific characteristics of micro-habitats, and modeling these changes is key to understanding population intensities and dynamics. However, biologists are interested in not only the spatial distribution of a species, but also its local and global abundance. When studying abundance of European Starlings (a nonnative bird species) Abel Rodriguez is Assistant Professor, Department of Applied Mathematics and Statistics, University of California, Mailstop SOE2, Santa Cruz, CA 95064 (E-mail: abel@soe.ucsc.edu). David B. Dunson is Professor (E-mail: dunson@stat.duke.edu) and Alan E. Gelfand is Professor (E-mail: alan@stat.duke.edu), Department of Statistical Sciences, Duke University, Box 90251, Durham, NC 27708. The work of the first author was supported in part by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences and NSF DMS 0915272. The work of the third author was supported in part by NSF DMS 0504953. The authors thank the associate editor and two anonymous referees, whose comments greatly contributed to improve the quality of the manuscript. in North Carolina (see Section 7.2), one may be interested in the marginal distribution of the number of individual birds observed at randomly selected locations. In this context, the underlying Gaussian process introduced previously can be interpreted as a latent, location-specific habitat quality score, which changes spatially making higher rates more likely in certain regions. Similarly, we can consider the price of a financial asset to be a random variable whose volatility changes in time according to an underlying score reflecting market sentiments, with the common marginal distribution providing a summary of the return levels that can be expected from the asset (see Section 7.1). Current nonparametric Bayes models for related problems focus on priors for dependent collections of conditional distributions, and therefore, assume that the marginal distribution of the process changes at each location. Instead, we are interested in models that assume a constant marginal distribution across the index space, but allow samples from it to be dependent. Due to the discrete nature of its marginal distribution, LaSBP models induce a random segmentation of the index space D. This segmentation is spatially coherent, in the sense that, the closer two locations are in space, the larger the probability of them being assigned to the same cluster. Typically, the segmentation has an interesting interpretation in specific applications. For example, the partitions can be identified with micro-habitats in species distribution applications. Nonetheless, smooth interpolations and predictions can be recovered through model averaging. The literature in signal and image segmentation is enormous, some interesting reviews include Pal and Pal (1993), Pham, Xu, and Prince (2000), and Suri, Wilson, and Laxminarayan (2005). Mixture models are the most relevant for our discussion. In contrast to our method, many of them do not incorporate information on spatial location in the prior for allocating observations to mixture components. One interesting exception is the model developed in Figueiredo (2005) and Figueiredo, Cheng, and Murino (2007) for image segmentation, which uses a latent conditionally autoregressive (CAR) structure to define spatially dependent weights, in a similar spirit to Rodriguez, Dunson, and Taylor (2009). Since the selection process in the LaSBP inherits the characteristics of the underlying Gaussian process, our models can 647 © 2010 American Statistical Association Journal of the American Statistical Association June 2010, Vol. 105, No. 490, Theory and Methods DOI: 10.1198/jasa.2010.tm08241 648 easily accommodate a rich variety of correlation structures, including nonstationary processes. This is especially interesting because (1) typical change-point models (like those used in finance) assume a stationary Markovian structure to simplify computation, and (2) nonstationary dependence structures are hard to specify in other nonparametric models like the orderdependent Dirichlet process (Griffin and Steel 2006b). Also, the model does not require replicates at each location, giving it an advantage over Gelfand, Kottas, and MacEachern (2005) for the purpose of our applications. In addition, the use of order constraints in the definition of the LaSBP allows us to generate skewed mixing distributions. For scale mixtures (as in Section 7.1), this feature yields models that favor heavier tails for the marginal distribution of the observations than those obtained by dependent Dirichlet process (DDP) mixtures with the same baseline measure. For location mixtures (as in Section 7.2), the LaSBP favors distributions that have a heavy right or left tail. This is appealing in many applications, but can be avoided through suitable adjustment of the distributions of the stick-breaking weights if desired. The LaSBP can also be formulated as a Gaussian copula process with a common and unknown marginal distribution, which is assigned a stick-breaking prior. The literature on copula modeling is extensive; Nelsen (1999), among others, provided an introduction. Within a quasi-Bayesian setting, Hoff (2007) recently developed a multivariate nonparametric model for mixed discrete and continuous data that uses Gaussian copulas and marginal set likelihoods. Although similar in spirit to the LaSBP, this model avoids the specification of the marginal distributions by using a pseudo-likelihood, and is formulated as a procedure to obtain a finite-dimensional distribution and not a process. Also, Kottas, Müller, and Quintana (2005) formulated a model for ordinal data using a nonparametric specification for the copula. Again, the goals of this model are different from ours, as they focus on contingency tables and build a mixture model for the copula describing the correlation across entries in the table. The LaSBP is related to the hybrid Dirichlet process (hDP) models in Petrone, Guindani, and Gelfand (2009), who also proposed the use of Gaussian copula processes to build dependence across locations. However, our construction is conceptually simpler because it uses scalar atoms rather than atoms that are process realizations. In addition, while we focus on problems where prediction and interpolation are the main goal, the hDP is designed for local surface clustering. Although the hDP is, in principle, a more flexible model (as it allows for more general atoms than the LaSBP), the underlying latent field lacks interpretability and can produce multimodal predictive distributions as an artifact (see Section 3 and Figure 2 in it). This is a direct consequence of the use of process realizations in the definition of the hDP, as an adjacent switch of labels does not imply any closeness for the adjacent atoms. For some applications (e.g., the stochastic volatility model in Section 7.1), the infinite hidden Markov model (iHMM) (Beal, Ghahramani, and Rasmussen 2001; Fox et al. 2008; van Gael et al. 2008), which is closely connected to the hierarchical Dirichlet process (Teh et al. 2006), is a viable alternative to the LaSBP. However, the iHMM assumes by construction that the index space D for the stochastic process is discrete and unidimensional, which prevents its use in other interesting applications Journal of the American Statistical Association, June 2010 such as the one presented in Section 7.2. In addition, the iHMM relies on a Markovian assumption, which only allows a very restrictive local dependence structure in allocation to mixture components, while the LaSBP is much more flexible in using a latent Gaussian process to drive the allocation. This article is organized as follows: Section 2 reviews the literature on stick-breaking processes and sets the stage for the LaSBP model. Sections 3 and 4 define the latent stick-breaking process and discusses some of its properties, with Section 5 describing an efficient Markov chain Monte Carlo (MCMC) algorithm for posterior computation. Section 6 discusses extensions of the model to multivariate problems. Section 7 presents two applications in financial econometrics and ecology. Finally, in Section 8 we present a short discussion and some future directions. 2. STICK–BREAKING PRIORS Let G be a random probability measure on (, B), with B the Borel sets of . Then G has a stick-breaking prior if G(·) = L wl δθl∗ (·), l=1 iid θl∗ ∼ H for some baseline probability measure H on with and wl = vl l−1 k=1 (1 − vk ) where vl ∼ beta(al , bl ) for l < L and vl = 1 for l = L. The number of atoms L can be finite (either known or unknown) or infinite. The class of stick-breaking priors includes some well-known processes. For example, taking L = ∞, al = a and bl = b + l(1 − a) for 0 ≤ a < 1 and b > −(1 − a) yields the two-parameter Poisson–Dirichlet Process (Pitman 1996), with the choice a = 1 resulting in the Dirichlet process with baseline measure H and precision parameter b (Sethuraman 1994). For L < ∞, setting vL = 1 ensures that Ll=1 wl = 1 with probability one. If L = ∞, Ishwaran and James (2001) showed that the weights add up to 1 (and therefore the process yields a well-defined probability distribution) almost surely if and only if ∞ E(log(1 − vk )) = −∞. For any measurable set B ∈ B, l=1 G(B) is a random variable whose moments are given by E(G(B)) = H(B), Var(G(B)) = H(B)(1 − H(B)) L al (1 + al )(al + bl ) × (al + bl )2 (al + bl + 1) l=1 × l−1 k=1 bk (1 + bk )(ak + bk ) . (ak + bk )2 (ak + bk + 1) Therefore, H describes the expected shape of the distributions, while the sequences {al }Ll=1 and {bl }Ll=1 control the variance of the sampled distributions around H. To increase the flexibilityof the model, most applications use convolutions of the form K(·|θ )G(dθ ), where G follows a stick-breaking prior and K is a kernel, to model unknown probability distributions. Under fairly general conditions on the kernel K, mixtures of this type yield models that are dense (in the L1 sense) in the space of absolutely continuous distributions (Lo 1984). Rodríguez, Dunson, and Gelfand: Latent Stick-Breaking Processes 649 One way to extend the stick-breaking priors to collections of distributions on a space D is to replace the random variables involved in the definition of the random distribution with stochastic processes on D. This is the strategy behind the dependent Dirichlet process (DP) (MacEachern 2000; DeIorio et al. 2004; Gelfand, Kottas, and MacEachern 2005; Griffin and Steel 2006b). Inherently, all of these approaches (including those that allow the weights to depend on the location) assume that, conditional on the stick-breaking parameters, the probability of assigning observations to atoms is known. For example, for the spatial Dirichlet process (SDP) (Gelfand, Kottas, and MacEachern 2005), Pr θ (s) = θi∗ (s), θ (s′ ) = θj∗ (s′ )|{θl∗ (s)}Ll=1 , {wl (s)}Ll=1 = 1, 0, i = j, i = j, for all s, s′ ∈ D and all i, j ∈ {1, . . . , L}. Therefore, conditional on the atoms and weights, the selection mechanism is constant for all locations s ∈ S. In the sequel, we explore an alternative approach, which constructs a process where the selection probabilities are not conditionally independent across locations while the atoms and weights are. 3. UNIVARIATE LATENT STICK–BREAKING PROCESS Our goal is to construct a stochastic process θ = {θ (s) : s ∈ D ⊂ Rd } where θ (s) ∈ ⊂ R for all s such that, marginally, θ (s)|G ∼ G for all s ∈ D and some unknown G, but Cov(θ (s), θ (s′ )|G) = 0. To do this, consider the general approach motivated by copula models: Define a stochastic process U = {U(s) : s ∈ D} with uniform marginals, then define θ (s) = G− (U(s)), where G is a random discrete probability distribution and G− is its generalized inverse. Then clearly θ (s) is distributed according to G for any given s and θ (s) and θ (s′ ) are correlated. The question of model development becomes how to choose the process U and a prior for G. In the sequel, we do this through an independent triplet, {{z(s) : s ∈ D}, {vl }Ll=1 , {θl∗ }Ll=1 }. (1) Here, the latent Gaussian process z is such that z(s) ∼ N(0, 1) for all s ∈ D and Cor(z(s), z(s′ )) = γ (s, s′ ), then U(s) = (z(s)) and the sequences {vl }Ll=1 and {θl∗ }Ll=1 are used to define a random distribution G(·) = L wl δθl∗ (·), and {vl }Ll=1 as the marginal prior probability of observing the different rates. Now, for any finite set of locations s1 , . . . , sn ∈ D, and any n let Pr θ (s1 ) = θl∗1 , . . . , θ(sn ) = θl∗n |{θl∗ }Ll=1 , {vl }Ll=1 = Pr z(s1 ) ∈ −1 πl1 −1 , −1 πl1 , . . . , z(sn ) ∈ −1 πln −1 , −1 πln , (2) where (·) denotes thecumulative standard normal distribution function and πl = 1 − k≤l (1 − vk ) is the proportion of the unit stick assigned to the first l atoms, with π0 = 0. We call the prior distribution induced by the hierarchical model above on the law P of the process a latent stick-breaking process (LaSBP), and write θ|P ∼ P with P ∼ LaSBPL ({al }Ll=1 , {bl }Ll=1 , H, γ ). The definition in Equation (2) makes clear that we are modeling a joint distribution, which encourages closeness among θ (s) and θ (s′ ). This is very different from the effect of DDP models, which encourage closeness among the distributions at s and s′ , but not necessarily between θ (s) and θ (s′ ). Theorem 1. For every n > 1, P ∼ LaSBPL ({al }Ll=1 , {bl }Ll=1 , H, γ ) satisfies Kolmogorov’s conditions almost surely, (1) symmetry: Ps1 ,...,sn (θ (s1 ) ∈ A1 , . . . , θ(sn ) ∈ An ) = Psν1 ,...,sνn θ sν1 ∈ Aν1 , . . . , θ sνn ∈ Aνn for any permutation ν = (ν1 , . . . , νn ) of the set {1, . . . , n}; (2) consistency: Ps1 ,...,sn+1 (θ (s1 ) ∈ A1 , . . . , θ(sn ) ∈ An , θ(sn+1 ) ∈ (−∞, ∞)) = Ps1 ,...,sn (θ (s1 ) ∈ A, . . . , θ (sn ) ∈ An ), where A1 , . . . , An are measurable subsets of . The proof is in the Appendix. Note that for any single given location Ps (·) = Ll=1 wl δθl∗ (·) = G(·), and therefore, the marginal distribution of the process is constant. However, draws at two different locations s and s′ are correlated because z(s) and z(s′ ) are correlated. For example, predictions on the value θ (s′ ) for an unobserved location s′ depend on the predictions for z(s′ ), in particular l=1 where the probability weights are defined as wl = vl k<l (1 − vk ) and satisfy l wl = 1 almost surely. The sequences of stick breaking ratios {vl }Ll=1 is such that vl ∼ beta(al , bl ) for l < L and vL = 1. The sequence of atoms {θl∗ }Ll=1 is constructed by imposing an order constraint on a sample from a baseline measure H on R using the following mechanism: Sample θ1∗ ∼ H and for l > 1 draw θl∗ ∼ Hl , where Hl is de∗ }, i.e., fined as the restriction of H to the set Sl = {θ : θ > θl−1 Hl (B) = H(B ∩ Sl )/H(Sl ) for any measurable set B ∈ B. Referring back to our example in ecology, we can think of z(s) as the quality of the habitat at location s, {θl∗ }Ll=1 as the (ordered) set of possible abundance rates that can potentially be observed, Pr θ (s′ ) = θl∗ |θ (s1 ), . . . , θ (sn ), {z(si )}ni=1 , {θl∗ }Ll=1 , {vl }Ll=1 = Pr z(s′ ) ∈ [−1 (πl−1 ), −1 (πl ))|z(s1 ), . . . , z(sn ) −1 (πl ) ′ −1 φ(t| ′s′ ,s −1 = s,s z; 1 − s′ ,s s,s s′ ,s ) dt, −1 (πl−1 ) where φ(·|a; b) denotes the density of the univariate Gaussian distribution with mean a and variance b, s′ ,s is a column vector with entries [ s′ ,s ]i = γ (si , s′ ), s,s is a n × n matrix with entries [ s,s ]ij = γ (si , sj ), and z = (z(s1 ), . . . , z(sn ))′ . An alternative but equivalent construction for the LaSBP can be obtained by introducing the labeling process ξ = {ξ(s) : s ∈ 650 Journal of the American Statistical Association, June 2010 D} with ξ(s) ∈ {1, . . . , L} such that ξ |{vl }Ll=1 ∼ Q and Q is characterized by the finite-dimensional distributions Qs1 ,...,sn (l1 , . . . , ln ) = Pr(ξ(s1 ) = l1 , . . . , ξ(sn ) = ln |{vl }Ll=1 ) = Pr z(s1 ) ∈ −1 πl1 −1 , −1 πl1 , . . . , z(sn ) ∈ −1 πln −1 , −1 πln . For example, Qs (k) = Pr(ξ(s) = k|{vl }Ll=1 ) = wk and Qs,s′ (l, k) = γ (s,s′ ) (−1 (πl ), −1 (πk )) − γ (s,s′ ) (−1 (πl−1 ), −1 (πk )) − γ (s,s′ ) (−1 (πl ), −1 (πk−1 )) + γ (s,s′ ) (−1 (πl−1 ), −1 (πk−1 )), where r (·, ·) denotes the cumulative distribution of the standard bivariate normal with correlation r. Conditionally on the ∗ . Intelabeling process we can define θ (s)|ξ(s), {θl∗ }Ll=1 = θξ(s) grating out ξ we get θ|P ∼ P and P ∼ LaSBP({al }Ll=1 , {bl }Ll=1 , H, γ ) as before, however, by explicitly introducing the latent labeling process ξ we immediately realize that the LaSBP induces a segmentation of the θ surface by discretizing a Gaussian process. To gain some intuition into this construction, we show in Figure 1 three random realizations of the θ surface on D = [0, 1]2 associated with three different marginal distributions P sharing the same underlying latent process z, which was generated using an exponential correlation function. For this simulation, we took L = 100 and al = 1, bl = α, for three different values of α. The latent process was sampled at 3600 locations over an evenly spaced 60 × 60 grid, and we used a standard Gaussian baseline measure H to generate the atoms of the process. The resulting surfaces are tiled, segmenting the space according to the level of the underlying Gaussian process. The parameter α influences the roughness of the segmentation (essentially, the number of different levels that the random surface takes), the baseline measure H controls the height of the tiles, and the correlation function γ influences the relative size of the different tiles. Figure 1 also provides insight into the order constraint we introduced in the definition of the process. The order constraint allows us to link the behavior of the atoms of the stick-breaking construction to the behavior of the latent process. In particular, it enforces the internal consistency condition z(s) > z(s′ ) ⇒ θ (s) ≥ θ (s′ ), which simplifies interpretation. In the species abundance application, this means that higher habitat quality scores should imply a larger number of sightings. In a similar spirit, a more nervous market should be subject to larger price changes. Without the order constraint forcing adjacent atoms to have similar values, interpretability is lost and label switching can unexpectedly lead to very wiggly estimates of the underlying latent field z and multimodal predictive distributions at new locations as an artifact. Figure 1. Realizations from a random law P drawn from a LaSBP process on [0, 1]2 with a standard Gaussian baseline measure. We illustrate the effect of different concentrations while keeping the underlying Gaussian process (shown in the upper left panel) constant to simplify interpretation. In the color scale, white corresponds to large values, yellow to medium values, and red to low values of the process. A color version of this figure is available in the electronic version of this article. Rodríguez, Dunson, and Gelfand: Latent Stick-Breaking Processes 651 L = l=1 al bk E(ψ(θl∗ )) a l + bl a k + bk k<l EH (ψ(θ1∗ )). > Therefore, due to the order constraint, some care must be exercised when interpreting the baseline measure in our model. In particular, the average level of the process will depend not only on the expected value of the baseline measure H, but also on the stick-breaking parameters {al }Ll=1 and {bl }Ll=1 . To understand the consequences of this behavior, consider the case al = 1 and bl = b. In that case, if b → ∞ then E(θ (s)) → ∞. In turn, for a finite b, the mean value of the process is strictly greater than the mean of the baseline measure. Example 1. Consider θ|P ∼ P and P ∼ LaSBPL (a, b, H, γ ), where H is an exponential baseline measure with mean λ. This coincides with the illustrations we present in Section 7. In ∗ follows a shifted exponential distribution and this case, θl∗ |θl−1 ∗ ∗ ∗ . This implies that E(θ ∗ ) = lλ and E(θl |θl−1 ) = λ + θl−1 l Figure 2. Locations in our toy example. Black points correspond to locations where data are observed, the white point is a location where prediction is desired. As an illustration of this phenomenon, consider a very simple setup where L = 3 and the process is observed at three locations (labeled s1 , s2 , and s3 in Figure 2). Let the stickbreaking weights be such that −1 (π1 ) = −1 and −1 (π2 ) = 1 and the underlying z take the values z(s1 ) = −2, z(s2 ) = 2, and z(s3 ) = 0. This implies that the component membership indicators are ξ(s1 ) = 1, ξ(s2 ) = 3, and ξ(s3 ) = 2. Also, let the atoms take the value θ1∗ = −5, θ2∗ = 5, and θ3∗ = 0 (this is a valid combination of values if the order constraint is removed). Consider now the value of the underlying process and the corresponding predictions at location s4 . Since the Gaussian process is continuous, predictions for z(s4 ) will be close to 0, making ξ(s4 ) = 2 (and a prediction close to 5) highly likely, even though s3 is far away from s1 , s2 , and s4 . This is undesirable if we are interested in convex interpolations, as in that case we will expect θ (s4 ) < 0. This feature can be particularly acute with small samples or low spatial coverage, as there may be little information to distinguish between multiple latent surfaces consistent with different label permutations. In most applications, the LaSBP will not be used to model the observable Y(s) directly, but will be used to construct LaSBP mixtures. Briefly, we will assume that Y(s)|θ (s) ∼ f (·|θ (s)), where θ (s) is a scalar, and f (·|θ (s)) is a density function, e.g., N(μ, 1/θ(s)) (as in Section 7.1) or Poi(θ (s)) (as in Section 7.2), and let θ |P ∼ P where P ∼ LaSBPL ({al }Ll=1 , {bl }Ll=1 , H, γ ). This implies that Y(s) and Y(s′ ) are correlated since θ (s) ′ and correlated, and that marginally, Y(s)|G ∼ L θ (s ) are ∗ ), which defines the same, possibly multimodal, w f (·|θ l=1 l l marginal for any s. 4. PROPERTIES First, consider the expectation of θ (s). If ψ is a measurable function on (, B), E ψ(θ(s)) = L l=1 E(wl )E(ψ(θl∗ )) E(θ (s)) = Note that L λl l=1 l−1 a b a+b a+b L+1 a+b b =λ 1− a a+b L a b − (L + 1) . a+b a+b a+b . a Therefore, the process is well defined (in the sense that it has a finite first moment) even with an infinite number of atoms as long as b is finite. A similar argument yields a+b 2 lim E (θ (s))2 = 2λ2 L→∞ a lim E[θ (s)] = λ L→∞ and lim Var[θ (s)] = λ L→∞ 2 a+b a 2 . Example 1 illustrates that in many cases we can interpret finite LaSBP’s as truncations of infinite processes. In this situation, the theory of truncations developed in Ishwaran and James (2001, 2003) or Rodriguez, Dunson, and Gelfand (2008) can be directly modified for the LaSBP. The following theorem gives sufficient conditions for the limiting process to have finite expectation. Theorem 2. Let θ |P ∼ P and P ∼ LaSBPL (a, b, H, γ ), where H is such that, for large l, ∗ ∗ ) ≤ μ + λθl−1 E(θl∗ |θl−1 for some real constants μ and λ such that 0 < λ ≤ 1. Then lim E(θ (s)) < ∞. L→∞ Corollary 1. If E(θl∗ ) < c1 lp + c2 λl for some real constants c1 , c2 , and λ with 0 < λ ≤ 1 and integer constant p, then limL→∞ E(θ (s)) < ∞. 652 Journal of the American Statistical Association, June 2010 Corollary 2. If H limL→∞ E(θ (s)) < ∞. has a compact support then The details of the proof are shown in the Appendix, and it can be easily extended to provide conditions for finite higher-order moments. As the following two examples show, the condition in Theorem 2 is generally satisfied when H corresponds to a distribution with light tails. Example 2. Let θ |P ∼ P and P ∼ LaSBPL (a, b, H, γ ) where H is a Gaussian distribution with mean μ and variance σ 2 . In this case ∗ − μ)/σ ) φ((θl−1 ∗ E(θl∗ |θl−1 )=μ+σ ∗ − μ)/σ ) , 1 − ((θl−1 where φ denotes the density of the standard normal distribution. Using some well-known properties (Abramowitz and Stegun 1965, p. 298), σ ∗ − μ)/σ ) φ((θl−1 ∗ − μ)/σ ) 1 − ((θl−1 1 ∗ ∗ − μ)2 + 2σ 2 ] ≤ √ [(θl−1 − μ) + (θl−1 2π 2 ∗ ≈ √ (θl−1 − μ) 2π √ ∗ . Since 2/ 2π < 1, Theorem 1 applies and for large θl−1 limL→∞ E(θ (s)) < ∞. Example 3. Consider now a Pareto baseline measure with density α η α+1 , θ ≥ η. p(θ ) = η θ The conditional density reduces to ∗ α+1 α θl−1 ∗ )= ∗ , p(θl∗ |θl−1 θl−1 θl∗ ∗ θl∗ ≥ θl−1 , ∗ ) = α θ ∗ . Therefore lim and E(θl∗ |θl−1 L→∞ E(θ (s)) < ∞ iff α−1 l−1 b/(a + b) < (α − 1)/α. In addition to the mean of the process, it is important to understand the covariance structure induced by our construction. The covariance of the θ process is given by Cov(θ (s), θ (s′ )) = L L E(Qs,s′ (l, k))E(θl∗ θk∗ ) l=1 k=1 − E(Qs (l))E(Qs′ (k))E(θl∗ )E(θk∗ ) . (3) Although we do not have a closed-form expression for the covariance function, it is clear from Equation (3) that the process on θ is isotropic, stationary or nonstationary if z has these properties. Additional properties for the selection process U can be seen in the Appendix. Realizations from P are discontinuous almost surely, which is a consequence of the stick-breaking construction. However, note that Qs,s′ (l, k) → Qs (l)Qs′ (k) as γ (s, s′ ) → 0, Qs (l), l = k, Qs,s′ (l, k) → 0, otherwise as γ (s, s′ ) → 1. In particular, this implies that Pr(θ (s) = θ (s′ )) → 1 as s′ → s almost surely, and the selection mechanism is spatially coherent. This result also shows that, as the range of the latent Gaussian process goes to zero, samples at different locations become iid observations from G. On the other hand, as the range grows, the random surfaces drawn from P are constant and the corresponding LaSBP mixture reduces to a parametric model with a prior H on the unknown parameter. It is important to note that θ (s) and θ (s′ ) are never independent. Indeed, even if γ (s, s′ ) = 0, implying that Pr(ξ(s) = l, ξ(s′ ) = k|{vr }Lr=1 ) = Pr(ξ(s) = l|{vr }Lr=1 )Pr(ξ(s′ ) = k|{vr }Lr=1 ) = Qs (l)Qs′ (k), it is clear from Equation (3) that, in general, Cov(θ (s), θ (s′ )) = 0 because Cov(E(θ (s)|G), E(θ (s′ )| G)) = 0. It is also interesting to formally consider the effect of the stick-breaking distribution on the surface. Assume for simplicity that al = a and bl = b for all l, which we will do in most a practical applications. Note that as a+b → 1 we have w1 → 1 and realizations from P will correspond to flat random surfaces, θ (s) = θ1∗ for all s ∈ D. Finally, we also note that, due to the order constraint on the atoms, the LaSBP with constant a and b tends to generate skewed marginal distributions as the process will also encourage larger weights for smaller values of θ . This provides our model with a particular flavor that differentiates it from the regular Dirichlet process and the Dirichlet-multinomial model underlying the hDP in Petrone, Guindani, and Gelfand (2009). More precisely, the weights arising from a stick-breaking construction are stochastically ordered. Under the order constraint for the θl∗ ’s, this implies that smaller θl∗ ’s receive stochastically larger weights. For scale LaSBP mixtures (as in Sections 7.1 and 7.2), this feature yields models that favor heavier tails for the marginal distribution of the observations than those obtained by dependent Dirichlet process (DDP) mixtures. For location mixtures, the LaSBP favors distributions that have a heavy right or left tail, which in our experience, can be appealing property in many applications. Indeed, there is often prior information available that one tail of the distribution is heavier and is more subject to secondary modes, and our specification allows a natural incorporation of such prior information. In the applications we considered, this was the case. To provide a few additional examples: (1) In studies of premature delivery, the natural response is the gestational age at delivery. The right tail tends to be very stable across populations and is very light, while the left tail (corresponding to premature deliveries) is quite heavy and varies substantially across ethnic groups and with other factors; (2) in studies of the level of a pollutant, the left tail (corresponding to unusually low pollution levels) tends to be very light relative to the right tail (corresponding to high pollution levels); and (3) similar behavior is observed typically for continuous adverse health responses, such as the frequency of DNA strand breaks in cells. In applications where this property not appealing, it can be mitigated through appropriate choices of the nonconstant sequences {al }Ll=1 and {bl }Ll=1 controlling the stick breaking weights. 5. INFERENCE FOR MIXTURES OF LASBP For inference purposes, we develop an MCMC algorithm. We assume al = a, bl = b, L < ∞ and use a blocked sampler (Ishwaran and James 2001). If L = ∞, a retrospective sampler Rodríguez, Dunson, and Gelfand: Latent Stick-Breaking Processes (Roberts and Papaspiliopoulos 2008) can be easily obtained as an extension. To facilitate computation, the latent processes, z and ξ are sampled explicitly. Assuming that n data points y(s1 ), . . . , y(sn ) were collected, one for each location s1 , . . . , sn , and using the familiar bracket notation, the joint distribution is given in this case by n n [y(si )|θ ∗ , ξ(si )] × [ξ(si )|z(si ), v] × [z(s1 ), . . . , z(sn )|λ] i=1 i=1 × ∞ l=1 [vi |α] × ∞ l=1 [θl ] × [α] × [λ]. After setting up initial values for all parameters in the model, the algorithm proceeds by sequentially updating the parameters according to the following steps: (1) Jointly update the latent processes ξ and z, one location at a time, by first sampling ξ(si ) from a discrete distribution such that Pr(ξ(si ) = l) ∝ p(y(si )|θl )Pr z(si ) ∈ [−1 (πl−1 ), −1 (πl ))|z(s1 ), . . . , z(si−1 ), z(si+1 ), . . . , z(sn ) , where the prior probability of component l can be obtained by a univariate integration, and then sampling z(si ) from the restricted univariate normal distribution defined by [z(si )|z(s1 ), . . . , z(si−1 ), z(si+1 ), . . . , z(sn )]1l , where l = {z(si ) : z(si ) ∈ [−1 (πξ(si )−1 ), −1 (πξ(si ) ))}. Note that, if si = sj for some j, then the prior probability for observation i being assigned to component ξ(sj ) is one, and therefore, ξ(si ) = ξ(sj ) and z(si ) = z(sj ), as expected. (2) Sample the stick-breaking ratios one at a time from their full conditional vl |v(−l) ∝ va−1 (1 − vl )b−1 1Al , l where Al = {vl : qll ≤ vl < qul } and qll = max qul = {i:ξ(si )≥l} min 1 − (z(si )) , 1− k≤ξ(si ),k=l (1 − vk ) {i:ξ(si )≥l+1} 1 − (z(si )) . k≤ξ(si )−1,k=l (1 − vk ) 1− Note that vl depends on v(−l) only through the constraint on the support. Also, vl depends on z only through {z(si ) : ξ(si ) ≥ l}. Therefore, for l > l∗ = maxi {ξ(si )}, vl is conditionally independent from v(−l) . (3) Sample the atoms {θl }. The set of full conditionals is given by p(θ1 | · · ·) ∝ h(θ1 ) p(y(si )|θ1 ) {i:ξ(si )=1} and p(θl | · · ·) ∝ h(θl |θl−1 ) p(y(si )|θl ), l > 1, {i:ξ(si )=l} where hl is the density associated with the measure Hl . 653 (4) The prior parameters on the stick-breaking ratios a and b can be jointly sampled using a random-walk Metropolis– Hastings algorithm. (5) The parameters of the underlying Gaussian process can be sampled conditional on the current realization z(s1 ), . . . , z(sn ) using a random-walk Metropolis–Hastings algorithm, as is customary in Gaussian process models. (6) Since the conditional moves in step (1) can raise some concerns about mixing rates, we suggest to additionally sample the whole Gaussian process jointly. This can be easily done by noting that the joint full conditional is given by [z(s1 ), . . . , z(sn )| · · ·] · · · ∼ N(0, )1 , where []ij = γ (si , sj ), = ni=1 i and i = {z(si ) : −1 (πξ(si )−1 ) ≤ z(si ) < −1 (πξ(si ) )} as before. Due to the form of the restrictions, this is a relatively straightforward step. The supplemental material available at http://pubs.amstat. org/loi/jasa discusses some details related to efficient computational implementation for this model. 6. MULTIVARIATE LATENT STICK–BREAKING PROCESSES The order constraint in the definition of the LaSBP makes it hard to directly extend our formulation to multivariate atoms. One alternative we explore here is to use a latent multivariate Gaussian process to induce dependence in the selection mechanism of multiple stick-breaking representations. As an example, consider a two-dimensional process θ = {(θ1 (s), θ2 (s))′ : s ∈ D} with marginal distributions G1 and G2 , which have stickbreaking representations Gi (·) = Ll=1 wil δθil∗ (·) with θil∗ ≤ ∗ for i = 1, 2 as defined in Section 3 for the univariθi,l+1 ate LaSBP, with components independent for i = 1, 2. Also, let (z1 (s), z2 (s))′ = z(s) = Ax(s), where xi is a standardized Gaussian process with correlation function γi (s, s′ ) and A is a matrix such that 1 0 1 η ′ A= . ⇒ AA= η 1 η 1 − η2 Letting aj be the jth column of A, the corresponding crosscovariance matrix for z is given by z(s),z(s′ ) = γ1 (s, s′ )a1 a′1 + γ1 (s, s′ )a2 a′2 . In particular, this implies that Var(zi (s)) = 1 for i = 1, 2 and Cov(z1 (s), z2 (s)) = Cor(z1 (s), z2 (s)) = η. As in Section 3, we can define finite-dimensional distributions Ps1 ,...,sn θ1 (s1 ) = θ1l∗ 1 , . . . , θ1 (sn ) = θ1l∗ n , θ2 (s1 ) = θ2l∗ n+1 , . . . , θ2 (sn ) = θ2l∗ 2n | {θ1l , θ2l }Ll=1 , {v1l , v2l }Ll=1 = Pr z1 (s1 ) ∈ −1 (π1,l1 −1 ), −1 π1,l1 , . . . , z1 (sn ) ∈ −1 π1,ln −1 , −1 π1,ln , z2 (s1 ) ∈ −1 π2,ln+1 −1 , −1 π2,ln+1 , . . . , z2 (sn ) ∈ −1 π2,l2n −1 , −1 π2,l2n . Choosing η = 0 implies that θ1 (s) is independent from θ2 (s). More generally, we can calculate the cross-covariance function 654 Journal of the American Statistical Association, June 2010 for this process by noting that ′ Cov(θi (s), θj (s )) = L L ij E[Qs,s′ (l, k)]E(θil∗ )E(θjk∗ ) l=1 k=1 j − E[Qis (l)]E[Qs (k)]E(θil∗ )E(θjk∗ ), (4) ij where as in Section 4, Qs,s′ (l, k) = Pr((zi (s)) ∈ [πi,l−1 , πi,l ), (zj (s′ )) ∈ [πj,k−1 , πj,k )|{v1l , v2l }Ll=1 ) and Qis (l) = Pr((zi (s)) ∈ [πi,l−1 , πi,l )|{vil }Ll=1 ) = wil . As expected, Equation (4) reduces to Equation (3) when i = j. The use of dependent Gaussian processes typically provides an interesting interpretation for the model. For example, in the context of the stochastic volatility model described in Section 7.1, the dependence between latent Gaussian processes measures the correlation between market sentiments across different assets. Additionally, sampling for this multivariate extension can proceed along the lines described in Section 5, with an additional step to sample the parameters in A conditionally on all other parameters. The multivariate extension we describe in this section is really just a straightforward modification of the univariate case in that the order restrictions are placed on the atoms in each marginal in the same manner as in the univariate case. Hence, the assumptions are not more stringent than in the univariate case, so our previous comments comparing the LaSBP with the hDP in the univariate case still apply. As we discussed in Section 4, it is very often the case that such order restrictions are justified. This is also true in the multivariate case. For example, instead of one pollutant, we may measure multiple pollutants, all of which should have a heavier right tail in their concentration distribution a priori. A posteriori these assumptions are not strictly enforced; if there is sufficient evidence in the data to refute our a priori (informed) guess of the shape of the distribution, the posterior estimate will be appropriately adapted. 7. ILLUSTRATIONS 7.1 Modeling Stochastic Volatility In this section we consider an application of the LaSBP to the modeling of stock-market returns. This example is interesting as it allows us to compare the LaSBP with other more standard models in a real life example. The dataset under consideration consists of the weekly returns of the S&P500 index covering the 10-year period between April 21, 1997 and April 9, 2007, for a total of 520 observations. Figure 3 shows the evolution of these returns. The series does not exhibit any long-term trend, but different levels of volatility can be clearly seen. In particular, two slightly different regimes are apparent; before May 2003, periods of high volatility are relatively frequent, while after May 2003, we can appreciate longer low-volatility periods. Indeed, it is well known that financial time series typically exhibit heavy tails and that periods of low/high volatility tend to cluster together. Similarly to other approaches in the literature (Bollerslev 1986; Jacquier, Polson, and Rossi 1994), we model r(t), the log-return of the S&P500 at time t, as conditionally following a normal distribution, i.e., r(t)|μ, θ (t) ∼ N(μ, θ (t)−1 ), where θ|P ∼ P and the law P follows a LaSBP with an exponential baseline measure with mean 1/φ0 and a decreasing (rather than Figure 3. Weekly returns on the S&P500 index between April 21, 1997 and April 9, 2007. A color version of this figure is available in the electronic version of this article. increasing) order constraint. Also, we take al = 1 and bl = α for all l. We noted before that this choice places stochastically larger probabilities on larger variances. A prior with this structure is well motivated in this application, because low volatility (high precision) is the norm, while high volatility (low precision) is associated with sporadic shocks to the system. To demonstrate the methodology we use an exponential correlation function with range parameter λ for the latent process, i.e., ′| γ (t, t′ ) = exp{− |t−t λ }; however, implementations using more general correlation functions (like Matérn, power exponential or some nonstationary family) is just as straightforward. This model results in a marginal distribution for the returns that is a scale mixture of Gaussian distributions, yielding a rich model on the class of unimodal distributions that can accommodate heavy tails. Also, the structure of the LaSBP naturally induces volatility clustering, while potentially allowing for richer dependence structures than Markov-switching models. We complete the model by specifying priors for the different parameters. The mean of the returns is given a normal distribution with mean μ0 and variance τ 2 , the spatial range is given a Gam(aλ , bλ ) distribution, the precision α is assigned a Gam(aα , bα ) prior and φ0 is given a Gam(δ, φ00 ) prior. Since the historical annual volatility of the S&P500 is traditionally estimated to be in the range of 12% –15%, we take τ 2 = 0.152 /52 and E(1/φ0 ) = 0.122 /52. The prior mean for the average return is taken to be μ0 = 0, and the degrees of freedom are set to δ = 2. We picked aα = bα = 1 so that the prior for the precision parameter α is centered around 1 and has variance 1. The prior for the temporal correlation is set as aλ = 1 and bλ = 1/5, implying that the a priori range of the latent process driving the correlation is approximately 15 weeks. Forty atoms (L = 40) were used in the stick-breaking construction. A small sensitivity analysis was conducted by changing the hyperpriors on σ02 , Rodríguez, Dunson, and Gelfand: Latent Stick-Breaking Processes α, and λ, as well as L, and results seemed mostly robust to the selection. All results are based on 50,000 iterations obtained after a burn-in period of 5000 samples. We resampled the latent process every 10 iterations to improve mixing. Visual inspection of the trace plots does not reveal any obvious mixing or convergence problems. Sensitivity analysis was performed by considering values for E(1/φ0 ) in the range between 0.052 /52 and 0.42 /52, leading to essentially the same results. The model identifies between 2 and 11 volatility regimes, with models having between 3 and 6 components receiving the most weight (estimated posterior probabilities 0.27, 0.33, 0.20, and 0.10, respectively). The mean of the precision α is estimated to be 0.700 [median 0.650, symmetric 90% credible interval (0.260; 1.312)]. The posterior distribution of the correlation parameter λ has a mean of 110.72 [90% credible interval (34.05; 241.53)], showing strong evidence of long-range dependence in the sample. This feature, which is well known in financial time series, is hard to capture with stochastic volatility models based on low-order autoregressive processes. The posterior mean weekly return is estimated to be 0.00194 [credible interval (0.00050; 0.00336)], a reasonable value given historical evidence. Figure 4 shows the posterior mean for the volatility path estimated by the LaSBP model, along with the sample standard deviation and the volatility path obtained from the “standard” stochastic volatility described in Jacquier, Polson, and Rossi (1994). Note that, although under the LaSBP each realization of the volatility path is piecewise constant, by taking an average over multiple realizations of the posterior distribution we obtain a smoother path. Also, although both models provide similar volatility estimates, there are some interesting differences. For example, the low volatility period between October 2003 and December 2006 is estimated by the Figure 4. Smoothed weekly volatilities for the S&P500 using the LaSBP stochastic volatility model, a standard stochastic volatility model fitted to the whole series, and separate standard stochastic volatility models for the periods before and after 06/16/2003. The flat dotted line corresponds to the empirical standard deviation of the sample. A color version of this figure is available in the electronic version of this article. 655 LaSBP analysis to be a period with essentially constant volatility, while the regular volatility model implies frequent fluctuations. Nonetheless, both models rapidly adapt to the rise in volatility in early 2007. Also the three high volatility peaks in 2000, 2001, and 2002 are more pronounced under our model. To better understand the behavior of these models, we also fitted separate standard stochastic volatility models for the periods before and after 06/16/2003 (labeled “separate” in Figure 4). Note that the separate stochastic volatility models fit a smoother volatility estimate than the standard model to the second period. This suggests that lack of smoothness in the second period for the standard stochastic volatility model is induced by the lack of flexibility in the model. Unlike standard volatility models, the LaSBP automatically produces a nonparametric estimate of the marginal distribution of returns on S&P500, which is shown in Figure 5. As expected from historical evidence, the tails of the estimated marginal distribution are heavier than the tails of a normal distribution with equivalent mean and variance. 7.2 Understanding Species Abundance: A Model for Point-Referenced Counts This section concentrates on models for spatial count data, in the spirit of Richardson and Green (2001). The Christmas Bird Count (CBC) is an annual census of early-winter bird populations conducted by over 50,000 observers each year between December 14th and January 5th. The primary objective of the CBC is “to monitor the status and distribution of bird populations across the Western Hemisphere.” Parties of volunteers follow specified routes through a designated 15-mile diameter circle, counting every bird they see or hear. The parties are organized by compilers who are also responsible for reporting total counts to the organization that sponsors the activity, the Figure 5. Marginal distribution of returns from the S&P500, compared with an empirical fit to the Gaussian distribution. A color version of this figure is available in the electronic version of this article. 656 Audubon Society. Data and additional details about this survey are available at http:// www.audubon.org/ bird/ cbc/ index.html. In this illustration, we concentrate on jointly modeling the abundance of two bird species in North Carolina during the 2006–2007 winter season: Sturnus vulgaris, also known as European Starling, and Zenaida macroura, also known as Mourning Dove. In this period, forty-four circles were active within the state. Since the diameter of the circles is very small compared to the size of the region under study, we treat the data as point referenced to the center of the circle. We elaborate on the model described in Section 6 to construct a bivariate LaSBP mixture of Poisson kernels. Specifically, letting yi (s) stand for the number of birds observed at location s (expressed in latitude and longitude) belonging to species i, we assume yi (s) ∼ Poi(h(s)θi (s)), where h(s) represents the number of man-hours invested at location s. Next, we assume that (θ1 , θ2 )|P and P follows a bivariate H20 with LaSBP with exponential baseline measures H10 and means θ10 and θ20 , stick-breaking weights wil = vil k<l (1 − vik ) with vil ∼ beta(1, αi ), and exponential correlation functions γ1 (·, ·) and γ2 (·, ·) with range ρ1 and ρ2 , respectively. We allow for a hierarchical structure on the mean of the baseline distributions H10 and H20 , whose inverse is assumed to follow a gamma distribution, 1/θi0 ∼ Gam(2, 2θi00 ). Based on historical data, we set θ100 = 10 and θ200 = 4. The correlation η and range parameters ρ1 and ρ2 defining the latent Gaussian processes are assumed unknown, and priors η ∼ Uni[−1, 1], ρ1 ∼ Gam(1, 1/2) and ρ2 ∼ Gam(1, 1/2) are, respectively, assigned. Precision parameters are assigned independent priors αi ∼ Gam(1, 1). Results are based on 100,000 iterations obtained after a burn-in period of 20,000 samples, and the latent process was resampled every 10 iterations. Visual inspection of trace plots reveals no obvious convergence problems, and autocorrelations seem within reasonable levels. Figure 6 shows the posterior mean surfaces corresponding to the per-man-hour sighting rates for each of the two species. As expected, rates for both species tend to be higher in the coastal plains region; the European Starling and the Mourning Dove tend to be associated with man-altered environments, preferring open country and grasslands, and avoiding woodlands, deserts, and swamps. The latent processes appear to be positively correlated, the posterior expectation for η is 0.26 and (0.12; 0.44) is a symmetric, 90% credible interval. However, the difference in the level of spatial dependence between species is striking; while suitable habitats for the Mourning Dove seem to cover large geographical regions, European Starlings seem to concentrate on small microhabitats. Figure 7 shows estimates of the marginal distribution of the number of sightings in 1 man-hour for both species under consideration. Both distributions are bimodal and heavy tailed. The smaller mode, which can be interpreted as the baseline rate of sightings, falls around four or six birds per man per hour for the European Starling and around two to four birds for the Mourning Dove. The higher mode, which can be interpreted as corresponding to specially suited habitats, is around 45 sightings per man-hour for the European Sterling and around 23 sightings for the Mourning Dove. We wish like to emphasize that parametric alternatives to our proposed model like a Poisson log-linear model do not allow for a multimodal marginal distribution like the one shown previously. On the other hand, nonparametric alternatives like Journal of the American Statistical Association, June 2010 (a) (b) Figure 6. Estimates of the spatial abundance (a) the European Starling and (b) Mourning Dove in North Carolina. A color version of this figure is available in the electronic version of this article. the SDP (Gelfand, Kottas, and MacEachern 2005) and the order dependent DP (Griffin and Steel 2006b) allow for flexible location-specific distributions for the number of sightings, but again do not provide a common estimate like the one in Figure 7. 8. DISCUSSION We develop and illustrate a novel mechanism to generate stochastic processes with random marginal distributions. This is done by representing the common marginal distribution using a stick-breaking construction and inducing dependence in the selection mechanism of the atoms. Implementation of these models is straightforward using standard MCMC algorithms. One important difference between our model and other hybridization schemes is its ability to provide reasonable predictions and interpolations. Indeed, we show that some sort of identifiability constraint on the atoms is necessary to facilitate interpretation and avoid multimodality in the posterior distribution due to label switching. LaSBP’s are very flexible. Unlike other nonparametric models in the literature, implementation of anisotropic and nonstationary models is straightforward, as is the modeling of discrete or continuous data. In particular, we are currently exploring hierarchical specifications for the latent process that also allow Rodríguez, Dunson, and Gelfand: Latent Stick-Breaking Processes (a) 657 (in the Kolmogorov sense). Note that, from Equation (2), the finitedimensional distributions can be represented as mixtures. Therefore, it is enough to prove that the process satisfies Kolmogorov’s conditions conditional on {vl }Ll=1 and {θl∗ }Ll=1 , as mixtures of consistent processes are consistent. Symmetry is immediate, G θ sν1 = θl∗ν , . . . , θ sνn = θl∗ν |{θl∗ }Ll=1 , {vl }Ll=1 n 1 ∈ −1 πlν −1 , −1 πlν 1 1 = Pr z sν1 z sν1 ∈ −1 πlνn −1 , −1 πlνn ,..., = Pr z(s1 ) ∈ −1 πl1 −1 , −1 πl1 , . . . , z(sn ) ∈ −1 πln −1 , −1 πln = G θ(s1 ) = θl∗1 , . . . , θ (sn ) = θl∗n |{θl∗ }Ll=1 , {vl }Ll=1 . For consistency G θ(s1 ) = θ1∗ , . . . , θ (sn ) = θn∗ , θ(sn+1 ) = u| u∈{θl∗ }Ll=1 (b) {θl∗ }Ll=1 , {vl }Ll=1 = L ln+1 =1 Pr z(s1 ) ∈ −1 πl1 −1 , −1 πl1 , . . . , z(sn+1 ) ∈ −1 πln+1 −1 , −1 πln+1 = Pr z(s1 ) ∈ −1 πl1 −1 , −1 πl1 , . . . , z(sn+1 ) ∈ (−∞, ∞) = Pr z(s1 ) ∈ −1 πl1 −1 , −1 πl1 , . . . , z(sn ) ∈ −1 πln −1 , −1 πln = G(θ (s1 ) = θ1∗ , . . . , θ (sn ) = θn∗ |{θl∗ }Ll=1 , {vl }Ll=1 ). A.2 Proof of Theorem 2 If E(θl |θl−1 ) ≤ μ + λθl−1 then solving the recursion E(θl ) ≤ μ l−1 k=0 λk + λl E(θ1 ) = μ + λl (E(θ1 ) − μ) 1−λ and E(θ (s)) = Figure 7. Marginal distribution of sightings for (a) the European Starling and (b) the Mourning Dove in North Carolina. efficient high-dimensional specifications where the correlation between processes varies in space. An alternative construction for the LaSBP can be obtained by using the order statistics from a sample of size L from the baseline measure as the atoms of G. This approach, which was suggested by one of the referees, will be explored elsewhere. APPENDIX A.1 Proof of Theorem 1 This is a simple consequence of the properties of a Gaussian process, and can be extended for any other consistent selection process ∞ l=1 E(θl ) l−1 a b a+b a+b ≤ l−1 ∞ b μ a + λl (E(θ1 ) − μ) 1−λ a+b a+b = aλ(E(θ ) − μ) μ + , 1−λ a + b(1 − λ) l=1 only if λb/(a + b) < 1. This result can be easily extended to the case where E(θl ) ≤ c1 lp + c2 λl by noting that the expected value of the first term corresponds to the value of the polylogarithm function of order p (Abramowitz and Stegun 1965), which is finite. A.3 Properties of the Selection Process Let {z(s) : s ∈ D} be a standard Gaussian process with correlation function γ (s, s′ ) and define U(s) = (z(s)). We call U(s) the selection process, as it controls which atom is associated with each location. This section describes some of the properties of U(s). 658 Journal of the American Statistical Association, June 2010 By construction, the marginals of U(s) are uniform on [0, 1], therefore, E(U(s)) = 1/2 for all s ∈ D. Equivalently, we can obtain this result through computing the expectation, ∞ z x2 z2 1 1 dx dz. E(U(s)) = √ exp − √ exp − 2 2 2π −∞ −∞ 2π If we define random variables X and Z such that (X, Z)′ ∼ N(0, I), it turns out that ∞ z x2 z2 1 1 dx dz = Pr(Z − X > 0) √ exp − √ exp − 2 2 2π −∞ −∞ 2π = 1/2, since Z − X ∼ N(0, 2). We can use a similar approach to compute E(U(s)U(s′ )). Indeed, E(U(s)U(s′ )) = x2 + x22 1 exp − 1 2 −∞ −∞ −∞ −∞ 2π ∞ ∞ z1 z2 1 1 ′ −1 −1/2 || × exp − z z dx dz, 2π 2 where z = (z1 , z2 )′ and = 1 γ (s, s′ ) Define ⎛X ⎞ 1 ⎛⎛ 0 ⎞ ⎛ 1 ⎜ X2 ⎟ ⎜0⎟ ⎜0 ⎜ ⎟ ∼ N⎜ ⎝⎝ ⎠ , ⎝ ⎝Z ⎠ 0 0 1 0 0 Z 2 γ (s, s′ ) . 1 0 0 0 ⎞⎞ 1 0 0 ⎟⎟ ⎠⎠ , 0 1 γ (s, s′ ) ′ 1 0 γ (s, s ) then E(U(s)U(s′ )) = Pr(Z1 − X1 > 0, Z2 − X2 > 0), where (Z1 − X1 > 0, Z2 − X2 > 0) ∼ N 2 0 , γ (s, s′ ) 0 γ (s, s′ ) 2 . Therefore, 1 Cov(U(s), U(s′ )) = (s,s′ ) (0, 0) − , 4 where denotes the cumulative distribution of a bivariate normal random variable with covariance matrix and (s, s′ ) = γ (s,s′ ) 2 . Clearly, if γ (s, s′ ) = 0 then E(U(s)U(s′ )) = γ (s,s′ ) 2 Pr(Z1 − X1 > 0, Z2 − X2 > 0) = Pr(Z1 − X1 > 0)Pr(Z2 − X2 > 0) = E(U(s))E(U(s′ )), and therefore, Cov(U(s), U(s′ )) = 0. Higher-order moments of the process can be similarly computed, resulting in expressions where the ith moment depends on the cumulative distribution of an ith variate normal random variable. SUPPLEMENTAL MATERIALS Short note: Discusses details pertaining to the efficient implementation of the Markov chain Monte Carlo sampler discussed in Section 5, including the efficient generation of pseudo-random numbers from truncated multivariate and conditional normal distributions. (supplemental_material. pdf) [Received April 2008. Revised November 2009.] REFERENCES Abramowitz, M., and Stegun, I. A. (eds.) (1965), Handbook of Mathematical Functions (1st ed.), New York: Dover. [652,657] Beal, M. J., Ghahramani, Z., and Rasmussen, C. E. (2001), “The Infinite Hidden Markov Model,” in Proceedings of Fourteenth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada. [648] Bollerslev, T. (1986), “Generalized Autoregressive Conditional Heteroscedasticity,” Journal of Econometrics, 31, 307–327. [654] DeIorio, M., Müller, P., Rosner, G. L., and MacEachern, S. N. (2004), “An ANOVA Model for Dependent Random Measures,” Journal of the American Statistical Association, 99, 205–215. [647,649] Duan, J. A., Guindani, M., and Gelfand, A. E. (2007), “Generalized Spatial Dirichlet Process Models,” Biometrika, 94, 809–825. [647] Dunson, D. B., and Park, J.-H. (2008), “Kernel Stick-Breaking Processes,” Biometrika, 95, 307–323. [647] Dunson, D. B., Pillai, N., and Park, J.-H. (2007), “Bayesian Density Regression,” Journal of the Royal Statistical Society, Ser. B, 69, 163–183. [647] Ferguson, T. S. (1973), “A Bayesian Analysis of Some Nonparametric Problems,” The Annals of Statistics, 1, 209–230. [647] Figueiredo, M. A. T. (2005), “Bayesian Image Segmentation Using Gaussian Field Priors,” in Energy Minimization Methods in Computer Vision and Pattern Recognition, eds. A. Rangarajan, B. Vemuri, and A. L. Yuille, Berlin, Heidelberg: Springer-Verlag, pp. 74–89. [647] Figueiredo, M. A., Cheng, D. S., and Murino, V. (2007), “Clustering Under Prior Knowledge With Application to Image Segmentation,” in Advances in Neural Information Processing Systems 19, eds. B. Schölkopf, J. Platt, and T. Hoffman, Cambridge, MA: MIT Press, pp. 401–408. [647] Fox, E., Sudderth, E., Jordan, M. I., and Willsky, A. (2008), “An HDP-HMM for Systems With State Persistence,” in Proceedings of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland. [648] Gelfand, A. E., Kottas, A., and MacEachern, S. N. (2005), “Bayesian Nonparametric Spatial Modeling With Dirichlet Process Mixing,” Journal of the American Statistical Association, 100, 1021–1035. [647-649,656] Griffin, J. E., and Steel, M. F. J. (2006a), “Nonparametric Inference in Time Series Problems,” technical report, University of Warwick, Dept. of Statistics. [647] (2006b), “Order-Based Dependent Dirichlet Processes,” Journal of the American Statistical Association, 101, 179–194. [647-649,656] Hoff, P. D. (2007), “Extending the Rank Likelihood for Semiparametric Copula Estimation,” The Annals of Applied Statistics, 1, 265–283. [648] Ishwaran, H., and James, L. F. (2001), “Gibbs Sampling Methods for StickBreaking Priors,” Journal of the American Statistical Association, 96, 161– 173. [647,648,651,652] (2003), “Some Further Developments for Stick-Breaking Priors: Finite and Infinite Clustering and Classification,” Sankhya, 65 577–592. [651] Jacquier, E., Polson, N. G., and Rossi, P. E. (1994), “Bayesian Analysis of Stochastic Volatility Models,” Journal of Business and Economic Statistics, 12, 371–389. [654,655] Kottas, A., Müller, P., and Quintana, F. (2005), “Nonparametric Bayesian Modeling for Multivariate Ordinal Data,” Journal of Computational and Graphical Statistics, 14, 610–625. [648] Lo, A. Y. (1984), “On a Class of Bayesian Nonparametric Estimates: I. Density Estimates,” The Annals of Statistics, 12, 351–357. [648] MacEachern, S. N. (2000), “Dependent Dirichlet Processes,” technical report, Ohio State University, Dept. of Statistics. [647,649] (2007), Discussion of “Bayesian Nonparametric Modelling for Spatial Data Using Dirichlet Processes” by A. E. Gelfand, M. Guindani, and S. Petrone, in Bayesian Statistics 8, eds. J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith, and M. West, New York: Oxford University Press, pp. 196–198. [647] Müller, P., Quintana, F., and Rosner, G. (2004), “Hierarchical Meta-Analysis Over Related Non-Parametric Bayesian Models,” Journal of the Royal Statistical Society, Ser. B, 66, 735–749. [647] Nelsen, R. B. (1999), An Introduction to Copulas, New York: Springer. [648] Ongaro, A., and Cattaneo, C. (2004), “Discrete Random Probability Measures: A General Framework for Nonparametric Bayesian Inference,” Statistics and Probability Letters, 67, 33–45. [647] Pal, N. R., and Pal, S. K. (1993), “A Review on Image Sementation Techniques,” Pattern Recognition, 26, 1277–1294. [647] Petrone, S., Guindani, M., and Gelfand, A. E. (2009), “Hybrid Dirichlet Mixture Models for Functional Data,” Journal of the Royal Statistical Society, Ser. B, 71, 755–782. [648,652] Pham, D. L., Xu, C., and Prince, J. L. (2000), “A Survey of Current Methods in Medial Image Segmentation,” Review of Biomedical Engineering, 2, 315– 337. [647] Pitman, J. (1996), “Some Developments of the Blackwell–Macqueen Urn Scheme,” in Statistics, Probability and Game Theory. Papers in Honor of David Blackwell, eds. T. S. Ferguson, L. S. Shapeley, and J. B. MacQueen, Hayward, CA: IMS, pp. 245–268. [648] Rodríguez, Dunson, and Gelfand: Latent Stick-Breaking Processes Richardson, S., and Green, P. J. (2001), “Hidden Markov Models and Disease Mapping,” Journal of the American Statistical Association, 97, 1055–1070. [655] Roberts, G., and Papaspiliopoulos, O. (2008), “Retrospective Markov Chain Monte Carlo Methods for Dirichlet Process Hierarchical Models,” Biometrika, 95, 169–186. [653] Rodriguez, A., Dunson, D. B., and Gelfand, A. E. (2008), “The Nested Dirichlet Process” (with discussion), Journal of the American Statistical Association, 103, 1131–1154. [647,651] Rodriguez, A., Dunson, D. B., and Taylor, J. (2009), “Bayesian Hierarchically Weighted Finite Mixtures Models for Samples of Distributions,” Biostatistics, 10, 155–171. [647] 659 Sethuraman, J. (1994), “A Constructive Definition of Dirichelt Priors,” Statistica Sinica, 4, 639–650. [647,648] Suri, J. S., Wilson, D., and Laxminarayan, S. (eds.) (2005), Handbook of Biomedical Image Analysis. Volume 2: Segmentation Models. Part B, New York: Springer. [647] Teh, Y. W., Jordan, M. I., Beal, M. J., and Blei, D. M. (2006), “Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes,” Journal of the American Statistical Association, 101, 1566–1581. [647,648] van Gael, J., Saatci, Y., Teh, Y.-W., and Ghahramani, Z. (2008), “Beam Sampling for the Infinite Hidden Markov Model,” in Proceedings of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland. [648]

Log In

Latent Stick-Breaking Processes

Related papers

Related papers

Related topics