Abstract
Latent feature models are a powerful tool for modeling data with globally-shared features. Nonparametric distributions over exchangeable sets of features, such as the Indian Buffet Process, offer modeling flexibility by letting the number of latent features be unbounded. However, current models impose implicit distributions over the number of latent features per data point, and these implicit distributions may not match our knowledge about the data. In this work, we demonstrate how the restricted Indian buffet process circumvents this restriction, allowing arbitrary distributions over the number of features in an observation. We discuss several alternative constructions of the model and apply the insights to develop Markov Chain Monte Carlo and variational methods for simulation and posterior inference.
Notes
Technically, a CRM can also include a deterministic, non-atomic component; however, we ignore this for simplicity.
The directing measure may also have a fixed-location part; however, we ignore this in our analysis.
In its original formulation (Griffiths and Ghahramani 2011), the IBP imposes an ordering on the features which breaks the exchangeability in the more abstract feature-allocation representation. Here, we slightly modify the construction to refer to the more flexible feature allocation representation.
Arguably, the tilted Bernoulli process nomenclature is a better fit for the R-IBP, since for arbitrary f the “restricted Bernoulli process” is in fact a mixture of restricted distributions. However, the tilting interpretation was not apparent when the models described in this paper were first introduced in Williamson et al. (2013), so we continue to use the original term “restricted” for consistency.
More generally, Hanif and Brewer (1983) list over 50 ways to sample without replacement with unequal weights in the finite case.
The wall-clock time difference between the draw-by-draw procedure using inclusion probabilities and the approximate rejection samplers may be due in part to Matlab vectorization; a draw-by-draw procedure requires a loop to sequentially compute whether each feature is present, while the rejection sampler can sample all elements of \(Z_n\) together.
References
Aires, N.: Algorithms to find exact inclusion probabilities for conditional Poisson sampling and Pareto \(\pi \)ps sampling designs. Methodol. Comput. Appl. Probab. 1, 457–469 (1999)
Aldous, D.: Exchangeability and related topics. In: Ecole d’Ete St Flour, number 1117 in Springer Lecture Notes in Mathematics, pp. 1–198. Springer (1983)
Brix, A.: Generalized gamma measures and shot-noise Cox processes. Adv. Appl. Probab. 31(4), 929–953 (1999)
Broderick, T., Mackey, L., Paisley, J., Jordan, M.I.: Combinatorial clustering and the beta negative binomial process. IEEE Trans. Pattern Anal. Mach. Intell. 37(2), 290–306 (2015)
Broderick, T., Wilson, A., Jordan, M.: Posteriors, conjugacy, and exponential families for completely random measures. Bernoulli (2014). arXiv:1410.6843v1
Brostrom, G., Nilsson, L.: Acceptance-rejection sampling from the conditional distribution of independent discrete random variables, given their sum. Stat. J. Theor. Appl. Stat. 34, 247–257 (2000)
Caron, F.: Bayesian nonparametric models for bipartite graphs. In: Advances in Neural Information Processing Systems, vol. 25, pp. 2051–2059. (2012)
Chen, S.X.: General properties and estimation of conditional Bernoulli models. J. Multivar. Anal. 74, 69–87 (2000)
Doshi, F., Miller, K.T., Van Gael, J., Teh, Y.W.: Variational inference for the Indian buffet process. In: Artificial Intelligence and Statistics, vol. 12, pp. 137–144. (2009)
Doshi-Velez, F., Ghahramani, Z.: Correlated non-parametric latent feature models. In: Uncertainty in Artificial Intelligence, vol. 12, pp. 143–150. (2009)
Ferguson, T.S., Klass, M.J.: A representation of independent increment processes without Gaussian components. Ann. Math. Stat. 43(5), 1634–1643 (1972)
Fox, E., Jordan, M., Sudderth, E., Willsky, A.: Sharing features among dynamical systems with beta processes. In: Advances in Neural Information Processing Systems, vol. 22, pp. 549–557. (2009)
Gerber, H.U., Shiu, E.S.: Option pricing by Esscher transforms. HEC Ecole des hautes études commerciales (1993)
Görür, D., Jäkel, F., Rasmussen, C.E.: A choice model with infinitely many latent features. In: International Conference of Machine Learning, vol. 23, pp. 361–368. (2006)
Griffiths, T.L., Ghahramani, Z.: The Indian buffet process: an introduction and review. J. Mach. Learn. Res. 12, 1185–1224 (2011)
Gupta, S., Phung, D., Venkatesh, S.: Factorial multi-task learning: a Bayesian nonparametric approach. In: International Conference of Machine Learning, vol. 30, pp. 657–665. (2013)
Hanif, M., Brewer, K.R.W.: Sampling with Unequal Probabilities. Springer-Verlag, New York (1983)
Hjort, N.L.: Nonparametric Bayes estimators based on beta processes in models for life history data. Ann. Stat. 18, 1259–1294 (1990)
James, L., Lijoi, A., Prünster, I.: Posterior analysis for normalized random measures with independent increments. Scand. J. Stat. 36(1), 76–97 (2009)
James, L.F.: Functionals of Dirichlet processes, the Cifarelli-Regazzini identity and beta-gamma processes. Ann. Stat. 33(2), 647–660 (2005)
Kingman, J.: Completely random measures. Pac. J. Math. 21(1), 59–78 (1967)
Knowles, D., Ghahramani, Z.: Infinite sparse factor analysis and infinite independent components analysis. In: Independent Component Analysis and Signal Separation, vol. 7, pp. 381–388. (2007)
Lau, J.W.: A conjugate class of random probability measures based on tilting and with its posterior analysis. Bernoulli 19(5B), 2590–2626 (2013)
Miller, K.T., Griffiths, T., Jordan, M.I.: The phylogenetic Indian buffet process: a non-exchangeable nonparametric prior for latent features. In: Uncertainty in Artificial Intelligence, vol. 24, pp. 403–410. (2008)
Miller, K.T., Griffiths, T.L., Jordan, M.I.: Nonparametric latent feature models for link prediction. In: Advances in Neural Information Processing Systems, vol. 22, pp. 1276–1284. (2009)
Orbanz, P.: Construction of nonparametric Bayesian models from parametric Bayes equations. In: Advances in Neural Information Processing Systems, vol. 22, pp. 1392–1400. (2009)
Papaspiliopoulos, O., Roberts, G.O.: Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika 95(1), 169–186 (2008)
Pitman, J., Yor, M.: The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 25(2), 855–900 (1997)
Rosiński, J.: Series representations of Lévy processes from the perspective of point processes. In: Barndorff-Nielsen, O., Resnick, S., Mikosch, T. (eds.) Lévy Processes, pp. 401–415. Birkhäuser, Boston (2001)
Ruiz, F., Valera, I., Blanco, C., Perez-Cruz, F.: Bayesian nonparametric comorbidity analysis of psychiatric disorders. J. Mach. Learn. Res. 15, 1215–1247 (2014)
Saeedi, A., Bouchard-Côté, A.: Priors over recurrent continuous time processes. In: Advances in Neural Information Processing Systems, vol. 24, pp. 2052–2060. (2011)
Teh, Y.: A hierarchical Bayesian language model based on Pitman–Yor processes. In: International Conference on Computational Linguistics, vol. 21, pp. 985–992. (2006)
Teh, Y. W., Görür, D.: Indian buffet processes with power-law behavior. In: Advances in Neural Information Processing Systems, vol. 22, pp. 1838–1846. (2009)
Teh, Y. W., Görür, D., Ghahramani, Z.: Stick-breaking construction for the Indian buffet process. In: Artificial Intelligence and Statistics, vol. 11, pp. 556–563. (2007)
Thibaux, R., Jordan, M. I.: Hierarchical beta processes and the Indian buffet process. In: Artificial Intelligence and Statistics, vol. 11, pp. 564–571. (2007)
Titsias, M.: The infinite gamma-Poisson feature model. In: Advances in Neural Information Processing Systems, vol. 21, pp. 1513–1520. (2008)
Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008)
Williamson, S.A., MacEachern, S.N., Xing, E.P.: Restricting exchangeable nonparametric distributions. In: Advances in Neural Information Processing Systems, vol. 26, pp. 2598–2606. (2013)
Zhou, M., Chen, H., Paisley, J., Ren, L., Sapiro, G., Carin, L.: Non-parametric Bayesian dictionary learning for sparse image representations. In: Advances in Neural Information Processing Systems, vol. 22, pp. 2295–2303. (2009)
Zhou, M., Hannah, L., Dunson, D., Carin, L.: Beta-negative binomial process and Poisson factor analysis. In: Artificial Intelligence and Statistics, vol. 15, pp. 1462–1471. (2012)
Acknowledgments
The authors would like to thank Ryan P. Adams for numerous helpful discussions and suggestions, and Jeff Miller for suggesting the link to tilted random measures.
Appendix: Impact of truncation level on approximation quality when simulating from the R-IBP
In Sect. 5, we described two approximate methods for sampling from the R-IBP, which make use of a finite-dimensional approximation to the beta process-distributed measure \(\mu \). As the dimensionality I of the approximation tends to infinity, these methods give exact samples from the R-IBP; however, a fixed finite I will introduce errors. In this appendix, we discuss the errors introduced in both cases and provide an error bound for the inclusion probability sampler.
1.1 Impact of truncation level in an inclusion probability sampler
If a size-ordered stick-breaking representation is used to approximate the weights \(\pi \), then we can directly bound the errors on the inclusion probabilities as functions of the truncation level I, the size of the smallest instantiated weight \(\pi _I\), and the function f. To do so, we first expand the expression for the probabilities \(S^\infty _J\), starting with Eq. 15:
where \(A_J(I)\) is the set of feature allocations in which all J instantiated features are associated with one of the I largest atoms in \(\mu \). The second line follows because the probability that none of the features associated with the remaining atoms are selected is \(\exp (-\pi _I \alpha )\).
Since the probability that at least one feature outside the most significant I features appears is \(1 - \exp (-\pi _I\alpha )\), the second term is bounded between 0 and \(1 - \exp (-\pi _I\alpha )\). Thus we can bound the inclusion probabilities from above and below.
As expected, the quality of the approximation depends not only on the truncation level I (and the associated \(\pi _I\)) but also on the values \(S^I_J\). If the probability of sampling J elements from the first I is low, then the approximation will be poor, because it is likely that additional features would have been required to sample J elements. These bounds can be used in situations where one can use approximate, rather than exact, probabilities.
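For reference, writing \(S^I_J\) for the probability of drawing exactly J features from the I retained atoms (the truncated analogue of \(S^\infty _J\)), the decomposition and bound described above can be summarized as follows; the notation here is a paraphrase and may differ slightly from that of Eq. 15:

\[
S^\infty_J \;=\; e^{-\pi_I\alpha}\, S^I_J \;+\; P\Big(\textstyle\sum_{i=1}^\infty z_{ni} = J,\ \sum_{i>I} z_{ni} > 0\Big),
\]

and, since the second term lies in \([0,\, 1 - e^{-\pi_I\alpha}]\),

\[
e^{-\pi_I\alpha}\, S^I_J \;\le\; S^\infty_J \;\le\; e^{-\pi_I\alpha}\, S^I_J \;+\; 1 - e^{-\pi_I\alpha}.
\]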
1.2 Impact of truncation level in a rejection sampler
As \(I\rightarrow \infty \), both the weak-limit approximation of Eq. 12 and the stick-breaking construction of Eq. 13 will give exact samples from the R-IBP; however, a finite I will introduce errors. When a stick-breaking representation for \(\mu \) is used, we know that all weights \(\pi _j\), \(j > I\), will be less than \(\pi _I\). In particular, the iterative nature of the stick-breaking construction means that, if we exclude the first I atoms \(\pi _1,\dots , \pi _I\) and rescale the remaining atoms by \(\pi _I\), we are left with a (strictly ordered) sample from the beta process.
We can assess the error introduced by this construction by considering the values of \(z_{nj}\) that are excluded due to the truncation. If there are any non-zero elements \(z_{nj}\) for \(j > I\), our rejection probability will not be correct. Since the weights \(\pi _j\), \(j>I\), are described by a scaled beta process, we know that the number of excluded non-zero elements will be distributed as \(\text{ Poisson }(\alpha \pi _I)\). So, with probability \(1-\text{ Poisson }(0;\alpha \pi _I) =1 - \exp (-\pi _I\alpha )\), the true sum \(\sum _{i=1}^\infty z_{ni} \ne \sum _{i=1}^I z_{ni}\), and thus we may incorrectly reject or accept a proposal. Conditioned on a desired number of features J, we can further break down the probability of incorrectly rejecting a proposal with \(\sum _{i=1}^I z_{i}^*<J\), or of incorrectly accepting a proposal with \(\sum _{i=1}^I z_{i}^*=J\), by considering the following possible scenarios (a sketch of this rejection step is given after the list):
1. \(\sum _{i=1}^I z_{i}^* > J\): We reject the proposal. This is always correct.
2. \(\sum _{i=1}^I z_{i}^* = J\): We accept the proposal. However, if the truncated tail has \(\sum _{i=I+1}^\infty z_{i}^*>0\), we should really have rejected. Our decision is correct with probability \(P(\sum _{i=I+1}^\infty z_i^* = 0) = \exp (-\pi _I \alpha )\).
3. \(\sum _{i=1}^I z_{i}^* < J\): We reject the proposal. However, if \(\sum _{i=1}^{I}z^*_{i}=J-k\) but the truncated tail has \(\sum _{i=I+1}^\infty z_{i}^*=k\), we really should have accepted. Our decision is correct with probability \(1-P(\sum _{i=I+1}^\infty z_i^* = J-\sum _{i=1}^I z_{i}^* )= 1-\text{ Poisson }( J - \sum _{i=1}^I z^*_{i} ; \pi _I \alpha )\).
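For illustration only, the following is a minimal Python sketch of the truncated rejection step analysed above; it is not the implementation used in our experiments, and the function name, the NumPy dependency, and the max_tries cap are arbitrary choices. Each proposal draws \(z_i \sim \text{Bernoulli}(\pi _i)\) over the I retained atoms and is accepted only when exactly J features are active, so it is subject to exactly the truncation error quantified in cases 2 and 3.

```python
import numpy as np

def truncated_rejection_sample(pi, J, max_tries=10000, rng=None):
    """Rejection-sample a binary feature vector with exactly J active features.

    pi : weights (pi_1, ..., pi_I) of the I largest instantiated atoms.
    J  : desired number of features for this observation.

    Proposals are drawn independently as z_i ~ Bernoulli(pi_i); a proposal is
    accepted when its number of active features equals J (case 2 above) and
    rejected otherwise (cases 1 and 3). Features j > I are never proposed,
    which is the source of the truncation error discussed in this appendix.
    """
    rng = np.random.default_rng() if rng is None else rng
    pi = np.asarray(pi, dtype=float)
    for _ in range(max_tries):
        z = rng.random(pi.shape) < pi  # vectorised Bernoulli proposal over all I atoms
        if z.sum() == J:
            return z.astype(int)       # accept
    raise RuntimeError("no proposal with exactly J active features after max_tries")
```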
Cite this article
Doshi-Velez, F., Williamson, S.A. Restricted Indian buffet processes. Stat Comput 27, 1205–1223 (2017). https://doi.org/10.1007/s11222-016-9681-y