Abstract
Latent feature models are a powerful tool for modeling data with globally-shared features. Nonparametric distributions over exchangeable sets of features, such as the Indian Buffet Process, offer modeling flexibility by letting the number of latent features be unbounded. However, current models impose implicit distributions over the number of latent features per data point, and these implicit distributions may not match our knowledge about the data. In this work, we demonstrate how the restricted Indian buffet process circumvents this restriction, allowing arbitrary distributions over the number of features in an observation. We discuss several alternative constructions of the model and apply the insights to develop Markov Chain Monte Carlo and variational methods for simulation and posterior inference.
Notes
Technically, a CRM can also include a deterministic, non-atomic component; however, we ignore this for simplicity.
The directing measure may also have a fixed-location part; however, we ignore this in our analysis.
In its original formulation (Griffiths and Ghahramani 2011), the IBP imposes an ordering on the features which breaks the exchangeability in the more abstract feature-allocation representation. Here, we slightly modify the construction to refer to the more flexible feature allocation representation.
Arguably, the tilted Bernoulli process nomenclature is a better fit for the R-IBP, since for arbitrary f the “restricted Bernoulli process” is in fact a mixture of restricted distributions. However, the tilting interpretation was not apparent when the models described in this paper were first introduced in Williamson et al. (2013), so we continue to use the original term “restricted” for consistency.
More generally, Hanif and Brewer (1983) list over 50 ways to sample without replacement with unequal weights in the finite case.
The wall-clock time difference between the draw-by-draw procedure using inclusion probabilities and the approximate rejection samplers may be due in part to Matlab vectorization; a draw-by-draw procedure requires a loop to sequentially compute whether each feature is present, while the rejection sampler can sample all elements of \(Z_n\) together.
References
Aires, N.: Algorithms to find exact inclusion probabilities for conditional Poisson sampling and Pareto \(\pi \)ps sampling designs. Methodol. Comput. Appl. Probab. 1, 457–469 (1999)
Aldous, D.: Exchangeability and related topics. In: Ecole d’Ete St Flour, number 1117 in Springer Lecture Notes in Mathematics, pp. 1–198. Springer (1983)
Brix, A.: Generalized gamma measures and shot-noise Cox processes. Adv. Appl. Probab. 31(4), 929–953 (1999)
Broderick, T., Mackey, L., Paisley, J., Jordan, M.I.: Combinatorial clustering and the beta negative binomial process. IEEE Trans. Pattern Anal. Mach. Intell. 37(2), 290–306 (2015)
Broderick, T., Wilson, A., Jordan, M.: Posteriors, conjugacy, and exponential families for completely random measures. Bernoulli (2014). arXiv:1410.6843v1
Brostrom, G., Nilsson, L.: Acceptance-rejection sampling from the conditional distribution of independent discrete random variables, given their sum. Stat. J. Theor. Appl. Stat. 34, 247–257 (2000)
Caron, F.: Bayesian nonparametric models for bipartite graphs. In: Advances in Neural Information Processing Systems, vol. 25, pp. 2051–2059. (2012)
Chen, S.X.: General properties and estimation of conditional Bernoulli models. J. Multivar. Anal. 74, 69–87 (2000)
Doshi, F., Miller, K.T., Van Gael, J., Teh, Y.W.: Variational inference for the Indian buffet process. In: Artificial Intelligence and Statistics, vol. 12, pp. 137–144. (2009)
Doshi-Velez, F., Ghahramani, Z.: Correlated non-parametric latent feature models. In: Uncertainty in Artificial Intelligence, vol. 12, pp. 143–150. (2009)
Ferguson, T.S., Klass, M.J.: A representation of independent increment processes without Gaussian components. Ann. Math. Stat. 43(5), 1634–1643 (1972)
Fox, E., Jordan, M., Sudderth, E., Willsky, A.: Sharing features among dynamical systems with beta processes. In: Advances in Neural Information Processing Systems, vol. 22, pp. 549–557. (2009)
Gerber, H.U., Shiu, E.S.: Option pricing by Esscher transforms. HEC Ecole des hautes études commerciales (1993)
Görür, D., Jäkel, F., Rasmussen, C.E.: A choice model with infinitely many latent features. In: International Conference of Machine Learning, vol. 23, pp. 361–368. (2006)
Griffiths, T.L., Ghahramani, Z.: The Indian buffet process: an introduction and review. J. Mach. Learn. Res. 12, 1185–1224 (2011)
Gupta, S., Phung, D., Venkatesh, S.: Factorial multi-task learning: a Bayesian nonparametric approach. In: International Conference of Machine Learning, vol. 30, pp. 657–665. (2013)
Hanif, M., Brewer, K.R.W.: Sampling with Unequal Probabilities. Springer-Verlag, New York (1983)
Hjort, N.L.: Nonparametric Bayes estimators based on beta processes in models for life history data. Ann. Stat. 18, 1259–1294 (1990)
James, L., Lijoi, A., Prünster, I.: Posterior analysis for normalized random measures with independent increments. Scand. J. Stat. 36(1), 76–97 (2009)
James, L.F.: Functionals of Dirichlet processes, the Cifarelli-Regazzini identity and beta-gamma processes. Ann. Stat. 33(2), 647–660 (2005)
Kingman, J.: Completely random measures. Pac. J. Math. 21(1), 59–78 (1967)
Knowles, D., Ghahramani, Z.: Infinite sparse factor analysis and infinite independent components analysis. In: Independent Component Analysis and Signal Separation, vol. 7, pp. 381–388. (2007)
Lau, J.W.: A conjugate class of random probability measures based on tilting and with its posterior analysis. Bernoulli 19(5B), 2590–2626 (2013)
Miller, K.T., Griffiths, T., Jordan, M.I.: The phylogenetic Indian buffet process: a non-exchangeable nonparametric prior for latent features. In: Uncertainty in Artificial Intelligence, vol. 24, pp. 403–410. (2008)
Miller, K.T., Griffiths, T.L., Jordan, M.I.: Nonparametric latent feature models for link prediction. In: Advances in Neural Information Processing Systems, vol. 22, pp. 1276–1284. (2009)
Orbanz, P.: Construction of nonparametric Bayesian models from parametric Bayes equations. In: Advances in Neural Information Processing Systems, vol. 22, pp. 1392–1400. (2009)
Papaspiliopoulos, O., Roberts, G.O.: Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika 95(1), 169–186 (2008)
Pitman, J., Yor, M.: The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 25(2), 855–900 (1997)
Rosiński, J.: Series representations of Lévy processes from the perspective of point processes. In: Barndorff-Nielsen, O., Resnick, S., Mikosch, T. (eds.) Lévy Processes, pp. 401–415. Birkhäuser, Boston (2001)
Ruiz, F., Valera, I., Blanco, C., Perez-Cruz, F.: Bayesian nonparametric comorbidity analysis of psychiatric disorders. J. Mach. Learn. Res. 15, 1215–1247 (2014)
Saeedi, A., Bouchard-Côté, A.: Priors over recurrent continuous time processes. In: Advances in Neural Information Processing Systems, vol. 24, pp. 2052–2060. (2011)
Teh, Y.: A hierarchical Bayesian language model based on Pitman–Yor processes. In: International Conference on Computational Linguistics, vol. 21, pp. 985–992. (2006)
Teh, Y. W., Görür, D.: Indian buffet processes with power-law behavior. In: Advances in Neural Information Processing Systems, vol. 22, pp. 1838–1846. (2009)
Teh, Y. W., Görür, D., Ghahramani, Z.: Stick-breaking construction for the Indian buffet process. In: Artificial Intelligence and Statistics, vol. 11, pp. 556–563. (2007)
Thibaux, R., Jordan, M. I.: Hierarchical beta processes and the Indian buffet process. In: Artificial Intelligence and Statistics, vol. 11, pp. 564–571. (2007)
Titsias, M.: The infinite gamma-Poisson feature model. In: Advances in Neural Information Processing Systems, vol. 21, pp. 1513–1520. (2008)
Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008)
Williamson, S.A., MacEachern, S.N., Xing, E.P.: Restricting exchangeable nonparametric distributions. In: Advances in Neural Information Processing Systems, vol. 26, pp. 2598–2606. (2013)
Zhou, M., Chen, H., Paisley, J., Ren, L., Sapiro, G., Carin, L.: Non-parametric Bayesian dictionary learning for sparse image representations. In: Advances in Neural Information Processing Systems, vol. 22, pp. 2295–2303. (2009)
Zhou, M., Hannah, L., Dunson, D., Carin, L.: Beta-negative binomial process and Poisson factor analysis. In: Artificial Intelligence and Statistics, vol. 15, pp. 1462–1471. (2012)
Acknowledgments
The authors would like to thank Ryan P. Adams for numerous helpful discussions and suggestions, and Jeff Miller for suggesting the link to tilted random measures.
Appendix: Impact of truncation level on approximation quality when simulating from the R-IBP
In Sect. 5, we described two approximate methods for sampling from the R-IBP, which make use of a finite-dimensional approximation to the beta process-distributed measure \(\mu \). As the dimensionality I of the approximation tends to infinity, these methods give exact samples from the R-IBP; however, a fixed finite I will introduce errors. In this appendix, we discuss the errors introduced in both cases and provide an error bound for the inclusion probability sampler.
1.1 Impact of truncation level in an inclusion probability sampler
If a size-ordered stick-breaking representation is used to approximate the weights \(\pi \), then we can directly bound the errors on the inclusion probabilities as functions of the truncation level I, the size of the smallest instantiated weight \(\pi _I\), and the function f. To do so, we first expand the expression for the probabilities \(S^\infty _J\), starting with Eq. 15:
where \(A_J(I)\) is the set of feature allocations in which all J instantiated features are associated with one of the I largest atoms in \(\mu \). The second line follows because the probability that none of the features associated with the remaining atoms are selected is \(\exp (-\pi _I \alpha )\).
Since the probability that at least one feature outside the most significant I features appears is \(1 - \exp (-\pi _I\alpha )\), the second term is bounded between 0 and \(1 - \exp (-\pi _I\alpha )\). Thus we can bound the inclusion probabilities from above and below.
As expected, the quality of the approximation depends not only on the truncation level I (and the associated \(\pi _I\)) but also on the values \(S^I_J\). If the probability of sampling J elements from the first I is low, then the approximation will be poor, because it is likely that additional features would have been required to sample J elements. These bounds can be used in situations where one can use approximate, rather than exact, probabilities.
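For reference, writing \(S^I_J\) for the probability of drawing exactly J features from the I retained atoms (the truncated analogue of \(S^\infty _J\)), the decomposition and bound described above can be summarized as follows; the notation here is a paraphrase and may differ slightly from that of Eq. 15:

\[
S^\infty_J \;=\; e^{-\pi_I\alpha}\, S^I_J \;+\; P\Big(\textstyle\sum_{i=1}^\infty z_{ni} = J,\ \sum_{i>I} z_{ni} > 0\Big),
\]

and, since the second term lies in \([0,\, 1 - e^{-\pi_I\alpha}]\),

\[
e^{-\pi_I\alpha}\, S^I_J \;\le\; S^\infty_J \;\le\; e^{-\pi_I\alpha}\, S^I_J \;+\; 1 - e^{-\pi_I\alpha}.
\]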
1.2 Impact of truncation level in a rejection sampler
As \(I\rightarrow \infty \), both the weak-limit approximation of Eq. 12 and the stick-breaking construction of Eq. 13 will give exact samples from the R-IBP; however, a finite I will introduce errors. When a stick-breaking representation for \(\mu \) is used, we know that all weights \(\pi _j\), \(j > I\), will be less than \(\pi _I\). In particular, the iterative nature of the stick-breaking construction means that, if we exclude the first I atoms \(\pi _1,\dots , \pi _I\) and rescale the remaining atoms by \(\pi _I\), we are left with a (strictly ordered) sample from the beta process.
We can assess the error introduced by this construction by considering the values of \(z_{nj}\) that are excluded due to the truncation. If there are any non-zero elements \(z_{nj}\) for \(j > I\), our rejection probability will not be correct. Since the weights \(\pi _j\), \(j>I\), are described by a scaled beta process, we know that the number of excluded non-zero elements will be distributed as \(\text{ Poisson }(\alpha \pi _I)\). So, with probability \(1-\text{ Poisson }(0;\alpha \pi _I) =1 - \exp (-\pi _I\alpha )\), the true sum \(\sum _{i=1}^\infty z_{ni} \ne \sum _{i=1}^I z_{ni}\), and thus we may incorrectly reject or accept a proposal. Conditioned on a desired number of features J, we can further break down the probability of incorrectly rejecting a proposal with \(\sum _{i=1}^I z_{i}^*<J\), or of incorrectly accepting a proposal with \(\sum _{i=1}^I z_{i}^*=J\), by considering the following possible scenarios (a sketch of this rejection step is given after the list):
1. \(\sum _{i=1}^I z_{i}^* > J\): We reject the proposal. This is always correct.
2. \(\sum _{i=1}^I z_{i}^* = J\): We accept the proposal. However, if the truncated tail has \(\sum _{i=I+1}^\infty z_{i}^*>0\), we should really have rejected. Our decision is correct with probability \(P(\sum _{i=I+1}^\infty z_i^* = 0) = \exp (-\pi _I \alpha )\).
3. \(\sum _{i=1}^I z_{i}^* < J\): We reject the proposal. However, if \(\sum _{i=1}^{I}z^*_{i}=J-k\) but the truncated tail has \(\sum _{i=I+1}^\infty z_{i}^*=k\), we really should have accepted. Our decision is correct with probability \(1-P(\sum _{i=I+1}^\infty z_i^* = J-\sum _{i=1}^I z_{i}^* )= 1-\text{ Poisson }( J - \sum _{i=1}^I z^*_{i} ; \pi _I \alpha )\).
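For illustration only, the following is a minimal Python sketch of the truncated rejection step analysed above; it is not the implementation used in our experiments, and the function name, the NumPy dependency, and the max_tries cap are arbitrary choices. Each proposal draws \(z_i \sim \text{Bernoulli}(\pi _i)\) over the I retained atoms and is accepted only when exactly J features are active, so it is subject to exactly the truncation error quantified in cases 2 and 3.

```python
import numpy as np

def truncated_rejection_sample(pi, J, max_tries=10000, rng=None):
    """Rejection-sample a binary feature vector with exactly J active features.

    pi : weights (pi_1, ..., pi_I) of the I largest instantiated atoms.
    J  : desired number of features for this observation.

    Proposals are drawn independently as z_i ~ Bernoulli(pi_i); a proposal is
    accepted when its number of active features equals J (case 2 above) and
    rejected otherwise (cases 1 and 3). Features j > I are never proposed,
    which is the source of the truncation error discussed in this appendix.
    """
    rng = np.random.default_rng() if rng is None else rng
    pi = np.asarray(pi, dtype=float)
    for _ in range(max_tries):
        z = rng.random(pi.shape) < pi  # vectorised Bernoulli proposal over all I atoms
        if z.sum() == J:
            return z.astype(int)       # accept
    raise RuntimeError("no proposal with exactly J active features after max_tries")
```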
Cite this article
Doshi-Velez, F., Williamson, S.A. Restricted Indian buffet processes. Stat Comput 27, 1205–1223 (2017). https://doi.org/10.1007/s11222-016-9681-y