Restricted Indian buffet processes

Abstract

Latent feature models are a powerful tool for modeling data with globally-shared features. Nonparametric distributions over exchangeable sets of features, such as the Indian Buffet Process, offer modeling flexibility by letting the number of latent features be unbounded. However, current models impose implicit distributions over the number of latent features per data point, and these implicit distributions may not match our knowledge about the data. In this work, we demonstrate how the restricted Indian buffet process circumvents this restriction, allowing arbitrary distributions over the number of features in an observation. We discuss several alternative constructions of the model and apply the insights to develop Markov Chain Monte Carlo and variational methods for simulation and posterior inference.

Notes

  1. Technically, a CRM can also include a deterministic, non-atomic component; however, we ignore this for simplicity.

  2. The directing measure may also have a fixed-location part; however, we ignore this in our analysis.

  3. In its original formulation (Griffiths and Ghahramani 2011), the IBP imposes an ordering on the features which breaks the exchangeability in the more abstract feature-allocation representation. Here, we slightly modify the construction to refer to the more flexible feature allocation representation.

  4. Arguably, the tilted Bernoulli process nomenclature is a better fit for the R-IBP, since for arbitrary f the “restricted Bernoulli process” is in fact a mixture of restricted distributions. However, the tilting interpretation was not apparent when the models described in this paper were first introduced by Williamson et al. (2013), so we continue to use the original term “restricted” for consistency.

  5. More generally, Hanif and Brewer (1983) list over 50 ways to sample without replacement with unequal weights in the finite case.

  6. The wall-clock time difference between the draw-by-draw procedure using inclusion probabilities and the approximate rejection samplers may be due in part to Matlab vectorization; a draw-by-draw procedure requires a loop to sequentially compute whether each feature is present, while the rejection sampler can sample all elements of \(Z_n\) together.

  7. Source: http://www.npr.org/api/queryGenerator.php

References

  • Aires, N.: Algorithms to find exact inclusion probabilities for conditional Poisson sampling and Pareto \(\pi \)ps sampling designs. Methodol. Comput. Appl. Probab. 1, 457–469 (1999)

  • Aldous, D.: Exchangeability and related topics. In: Ecole d’Ete St Flour, number 1117 in Springer Lecture Notes in Mathematics, pp. 1–198. Springer (1983)

  • Brix, A.: Generalized gamma measures and shot-noise Cox processes. Adv. Appl. Probab. 31(4), 929–953 (1999)

  • Broderick, T., Mackey, L., Paisley, J., Jordan, M., et al.: Combinatorial clustering and the beta negative binomial process. Pattern. Anal. Mach. Intell. 37(2), 290–306 (2015)

  • Broderick, T., Wilson, A., Jordan, M.: Posteriors, conjugacy, and exponential families for completely random measures. Bernoulli (2014). arXiv:1410.6843v1

  • Brostrom, G., Nilsson, L.: Acceptance-rejection sampling from the conditional distribution of independent discrete random variables, given their sum. Stat. J. Theor. Appl. Stat. 34, 247–257 (2000)

  • Caron, F.: Bayesian nonparametric models for bipartite graphs. In: Advances in Neural Information Processing Systems, vol. 25, pp. 2051–2059. (2012)

  • Chen, S.X.: General properties and estimation of conditional Bernoulli models. J. Multivar. Anal. 74, 69–87 (2000)

  • Doshi, F., Miller, K.T., Van Gael, J., Teh, Y.W.: Variational inference for the Indian buffet process. In: Artificial Intelligence and Statistics, vol. 12, pp. 137–144. (2009)

  • Doshi-Velez, F., Ghahramani, Z.: Correlated non-parametric latent feature models. In: Uncertainty in Artificial Intelligence, vol. 12, pp. 143–150. (2009)

  • Ferguson, T.S., Klass, M.J.: A representation of independent increment processes without Gaussian components. Ann. Math. Stat. 43(5), 1634–1643 (1972)

  • Fox, E., Jordan, M., Sudderth, E., Willsky, A.: Sharing features among dynamical systems with beta processes. In: Advances in Neural Information Processing Systems, vol. 22, pp. 549–557. (2009)

  • Gerber, H.U., Shiu, E.S.: Option pricing by Esscher transforms. HEC Ecole des hautes études commerciales (1993)

  • Görür, D., Jäkel, F., Rasmussen, C.E.: A choice model with infinitely many latent features. In: International Conference of Machine Learning, vol. 23, pp. 361–368. (2006)

  • Griffiths, T.L., Ghahramani, Z.: The Indian buffet process: an introduction and review. J. Mach. Learn. Res. 12, 1185–1224 (2011)

  • Gupta, S., Phung, D., Venkatesh, S.: Factorial multi-task learning: a Bayesian nonparametric approach. In: International Conference of Machine Learning, vol. 30, pp. 657–665. (2013)

  • Hanif, M., Brewer, K.R.W.: Sampling with Unequal Probabilities. Springer-Verlag, New York (1983)

  • Hjort, N.L.: Nonparametric Bayes estimators based on beta processes in models for life history data. Ann. Stat. 18, 1259–1294 (1990)

  • James, L., Lijoi, A., Prünster, I.: Posterior analysis for normalized random measures with independent increments. Scand. J. Stat. 36(1), 76–97 (2009)

  • James, L.F.: Functionals of Dirichlet processes, the Cifarelli-Regazzini identity and beta-gamma processes. Ann. Stat. 33(2), 647–660 (2005)

  • Kingman, J.: Completely random measures. Pac. J. Math. 21(1), 59–78 (1967)

  • Knowles, D., Ghahramani, Z.: Infinite sparse factor analysis and infinite independent components analysis. In: Independent Component Analysis and Signal Separation, vol. 7, pp. 381–388. (2007)

  • Lau, J.W.: A conjugate class of random probability measures based on tilting and with its posterior analysis. Bernoulli 19(5B), 2590–2626 (2013)

  • Miller, K.T., Griffiths, T., Jordan, M.I.: The phylogenetic Indian buffet process: a non-exchangeable nonparametric prior for latent features. In: Uncertainty in Artificial Intelligence, vol. 24, pp. 403–410. (2008)

  • Miller, K.T., Griffiths, T.L., Jordan, M.I.: Nonparametric latent feature models for link prediction. In: Advances in Neural Information Processing Systems, vol. 22, pp. 1276–1284. (2009)

  • Orbanz, P.: Construction of nonparametric Bayesian models from parametric Bayes equations. In: Advances in Neural Information Processing Systems, vol. 22, pp. 1392–1400. (2009)

  • Papaspiliopoulos, O., Roberts, G.O.: Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika 95(1), 169–186 (2008)

  • Pitman, J., Yor, M.: The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 25(2), 855–900 (1997)

  • Rosiński, J.: Series representations of Lévy processes from the perspective of point processes. In: Barndorff-Nielsen, O., Resnick, S., Mikosch, T. (eds.) Lévy Processes, pp. 401–415. Birkhäuser, Boston (2001)

  • Ruiz, F., Valera, I., Blanco, C., Perez-Cruz, F.: Bayesian nonparametric comorbidity analysis of psychiatric disorders. J. Mach. Learn. Res. 15, 1215–1247 (2014)

  • Saeedi, A., Bouchard-Côté, A.: Priors over recurrent continuous time processes. In: Advances in Neural Information Processing Systems, vol. 24, pp. 2052–2060. (2011)

  • Teh, Y.: A hierarchical Bayesian language model based on Pitman–Yor processes. In: International Conference on Computational Linguistics, vol. 21, pp. 985–992. (2006)

  • Teh, Y. W., Görür, D.: Indian buffet processes with power-law behavior. In: Advances in Neural Information Processing Systems, vol. 22, pp. 1838–1846. (2009)

  • Teh, Y. W., Görür, D., Ghahramani, Z.: Stick-breaking construction for the Indian buffet process. In: Artificial Intelligence and Statistics, vol. 11, pp. 556–563. (2007)

  • Thibaux, R., Jordan, M. I.: Hierarchical beta processes and the Indian buffet process. In: Artificial Intelligence and Statistics, vol. 11, pp. 564–571. (2007)

  • Titsias, M.: The infinite gamma-Poisson feature model. In: Advances in Neural Information Processing Systems, vol. 21, pp. 1513–1520. (2008)

  • Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008)

  • Williamson, S.A., MacEachern, S.N., Xing, E.P.: Restricting exchangeable nonparametric distributions. In: Advances in Neural Information Processing Systems, vol. 26, pp. 2598–2606. (2013)

  • Zhou, M., Chen, H., Paisley, J., Ren, L., Sapiro, G., Carin, L.: Non-parametric Bayesian dictionary learning for sparse image representations. In: Advances in Neural Information Processing Systems, vol. 22, pp. 2295–2303. (2009)

  • Zhou, M., Hannah, L., Dunson, D., Carin, L.: Beta-negative binomial process and Poisson factor analysis. In: Artificial Intelligence and Statistics, vol. 15, pp. 1462–1471. (2012)

Acknowledgments

The authors would like to thank Ryan P. Adams for numerous helpful discussions and suggestions, and Jeff Miller for suggesting the link to tilted random measures.

Author information

Corresponding author

Correspondence to Sinead A. Williamson.

Appendix: Impact of truncation level on approximation quality when simulating from the R-IBP

In Sect. 5, we described two approximate methods for sampling from the R-IBP that make use of a finite-dimensional approximation to the beta process-distributed measure \(\mu \). As the dimensionality I of the approximation tends to infinity, these methods give exact samples from the R-IBP; however, a fixed finite I introduces errors. In this appendix, we discuss the errors introduced in both cases and provide an error bound for the inclusion probability sampler.

1.1 Impact of truncation level in an inclusion probability sampler

If a size-ordered stick-breaking representation is used to approximate the weights \(\pi \), then we can directly bound the errors on the inclusion probabilities as functions of the truncation level I, the size of the smallest instantiated weight \(\pi _I\), and the function f. To do so, we first expand the expression for the probabilities \(S^\infty _J\), starting with Eq. 15:

$$\begin{aligned} S_J^\infty&= \sum _{s \in A_J(I)} \prod _{k \in s} \pi _k \prod _{j \notin s} ( 1 - \pi _j ) + \sum _{s \notin A_J(I)} \prod _{k \in s} \pi _k \prod _{j \notin s} ( 1 - \pi _j ) \\&= \exp (-\pi _I\alpha ) \sum _{s \in A_J(I)} \prod _{k \in s} \pi _k \prod _{j \notin s, j \le I} ( 1 - \pi _j ) + \sum _{s \notin A_J(I)} \prod _{k \in s} \pi _k \prod _{j \notin s} ( 1 - \pi _j ) \\&= \exp (-\pi _I\alpha )\, S^I_J + \sum _{s \notin A_J(I)} \prod _{k \in s} \pi _k \prod _{j \notin s} ( 1 - \pi _j ) \end{aligned}$$

where \(A_J(I)\) is the set of feature allocations in which all J instantiated features are associated with one of the I largest atoms in \(\mu \). The second line follows because the probability that none of the features associated with the remaining atoms is selected is \(\exp (-\pi _I \alpha )\).

Since the probability that at least one feature outside the I most significant features appears is \(1 - \exp (-\pi _I\alpha )\), and every allocation in the second sum contains at least one such feature, the second term is bounded between 0 and \(1 - \exp (-\pi _I\alpha )\). Thus we can bound the inclusion probabilities:

$$\begin{aligned} \eta _{k;J}&= \pi _k \frac{ S_{J-1}^{\infty }( \pi _1,\dots ,\pi _{k-1},\pi _{k+1},\dots ,\pi _I ) }{ S_J^{\infty }( \pi _1,\dots ,\pi _I ) }, \\ \eta _{k;J}&\ge \pi _k \frac{ e^{-\pi _I\alpha }S_{J-1}^{I-1}( \pi _1,\dots ,\pi _{k-1},\pi _{k+1},\dots ,\pi _I ) }{ e^{-\pi _I\alpha }S_J^{I}( \pi _1,\dots ,\pi _I ) + ( 1 - e^{-\pi _I\alpha } ) }, \\ \eta _{k;J}&\le \pi _k \frac{ e^{-\pi _I\alpha }S_{J-1}^{I-1}( \pi _1,\dots ,\pi _{k-1},\pi _{k+1},\dots ,\pi _I ) + ( 1 - e^{-\pi _I\alpha }) }{ e^{-\pi _I\alpha }S_J^{I}( \pi _1,\dots ,\pi _I ) }. \end{aligned}$$

As expected, the quality of the approximation depends not only on the truncation level I (and the associated \(\pi _I\)) but also on the values \(S^I_J\). If the probability of sampling J elements from the first I is low, then the approximation will be poor, because additional features beyond the truncation would likely have been required to obtain J elements. These bounds can be used in situations where approximate, rather than exact, probabilities suffice.
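For illustration only (this is not the implementation used in the paper), the quantities above are straightforward to compute numerically: \(S_J^I\) is the Poisson-binomial probability that exactly J of the I instantiated Bernoulli\((\pi _k)\) variables equal one, and the truncated inclusion probabilities and the bounds above follow directly from it. The sketch below assumes Python with NumPy, size-ordered weights so that the last entry of pi is \(\pi _I\), and function names of our own choosing.

```python
import numpy as np

def poisson_binomial_pmf(pi, J):
    """S_J^I: probability that exactly J of the independent Bernoulli(pi_k) draws equal one,
    computed with the standard dynamic program over the instantiated weights."""
    if J < 0:
        return 0.0
    dp = np.zeros(J + 1)
    dp[0] = 1.0
    for p in pi:
        # count j arises from old count j (z_k = 0) or old count j - 1 (z_k = 1)
        dp[1:] = dp[1:] * (1 - p) + dp[:-1] * p
        dp[0] *= (1 - p)
    return dp[J]

def inclusion_probability(pi, k, J):
    """Truncated inclusion probability eta_{k;J} based on the first I = len(pi) weights only."""
    others = np.delete(pi, k)
    return pi[k] * poisson_binomial_pmf(others, J - 1) / poisson_binomial_pmf(pi, J)

def inclusion_probability_bounds(pi, k, J, alpha):
    """Lower and upper bounds on the exact (untruncated) eta_{k;J} from the appendix;
    assumes pi is size-ordered, so pi[-1] = pi_I is the smallest instantiated weight."""
    tail = 1.0 - np.exp(-pi[-1] * alpha)   # prob. any feature beyond the truncation appears
    num = (1.0 - tail) * poisson_binomial_pmf(np.delete(pi, k), J - 1)
    den = (1.0 - tail) * poisson_binomial_pmf(pi, J)
    return pi[k] * num / (den + tail), pi[k] * (num + tail) / den
```

Each evaluation of the dynamic program costs \(O(IJ)\) operations.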

1.2 Impact of truncation level in a rejection sampler

As \(I\rightarrow \infty \), both the weak-limit approximation of Eq. 12 and the stick-breaking construction of Eq. 13 give exact samples from the R-IBP; a finite I, however, introduces errors. When a stick-breaking representation for \(\mu \) is used, we know that all weights \(\pi _j\) with \(j > I\) are less than \(\pi _I\). In particular, the iterative nature of the stick-breaking construction means that, if we exclude the first I atoms \(\pi _1,\dots , \pi _I\) and scale the remaining atoms by \(\pi _I\), we are left with a (strictly ordered) sample from the beta process.
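For concreteness, here is a minimal sketch of the size-ordered stick-breaking weights (Teh et al. 2007) under the one-parameter beta process used here; the self-similarity exploited above is that dropping the first I sticks and rescaling the remainder by \(\pi _I\) leaves a draw of the same form. The function name is illustrative, not taken from the paper.

```python
import numpy as np

def stick_breaking_weights(alpha, I, rng=None):
    """First I weights of the size-ordered stick-breaking construction for the beta process
    (Teh et al. 2007): nu_k ~ Beta(alpha, 1) i.i.d. and pi_k = prod_{l<=k} nu_l,
    so pi_1 >= pi_2 >= ... >= pi_I almost surely."""
    rng = np.random.default_rng() if rng is None else rng
    nu = rng.beta(alpha, 1.0, size=I)
    return np.cumprod(nu)
```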

We can assess the error introduced by this construction by considering the values of \(z_{nj}\) that are excluded by the truncation. If there are any non-zero elements \(z_{nj}\) for \(j > I\), our rejection probability will not be correct. Since the weights \(\pi _j\), \(j>I\), are described by a scaled beta process, the number of excluded non-zero elements is distributed as \(\text{ Poisson }(\alpha \pi _I)\). So, with probability \(1-\text{ Poisson }(0;\alpha \pi _I) = 1 - \exp (-\pi _I\alpha )\), the true sum \(\sum _{i=1}^\infty z_{ni} \ne \sum _{i=1}^I z_{ni}\), and thus we may incorrectly reject or accept a proposal. Conditioned on a desired number of features J, we can further break down the probability of incorrectly rejecting a proposal with \(\sum _{i=1}^I z_{i}^* < J\), or of incorrectly accepting a proposal with \(\sum _{i=1}^I z_{i}^* = J\), by considering the following possible scenarios (a short sketch of the resulting decision rule follows the list):

  1. \(\sum _{i=1}^I z_{i}^* > J\): We reject the proposal. This is always correct.

  2. \(\sum _{i=1}^I z_{i}^* = J\): We accept the proposal. However, if the truncated tail has \(\sum _{i=I+1}^\infty z_{i}^*>0\), we should have rejected. Our decision is correct with probability \(P(\sum _{i=I+1}^\infty z_i^* = 0) = \exp (-\pi _I \alpha )\).

  3. \(\sum _{i=1}^I z_{i}^* < J\): We reject the proposal. However, if \(\sum _{i=1}^{I}z^*_{i}=J-k\) and the truncated tail has \(\sum _{i=I+1}^\infty z_{i}^*=k\), we really should have accepted. Our decision is correct with probability \(1-P(\sum _{i=I+1}^\infty z_i^* = J-\sum _{i=1}^I z_{i}^* )= 1-\text{ Poisson }( J - \sum _{i=1}^I z^*_{i} ; \pi _I \alpha )\).
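As an illustrative sketch (not the paper's code), the three scenarios above translate into the following decision rule, which also returns the probability, over the unobserved Poisson\((\alpha \pi _I)\) tail, that the truncated decision agrees with the untruncated sampler:

```python
import numpy as np
from scipy.stats import poisson

def truncated_accept(z_trunc, J, pi_I, alpha):
    """Accept/reject a truncated proposal z_trunc = (z_1*, ..., z_I*) targeting J features,
    and return the probability (over the unseen Poisson(alpha * pi_I) tail) that this
    decision agrees with the untruncated sampler."""
    partial = int(np.sum(z_trunc))
    if partial > J:
        # Scenario 1: reject; correct whatever the tail contains.
        return False, 1.0
    if partial == J:
        # Scenario 2: accept; correct only if the tail contributes no features.
        return True, float(np.exp(-pi_I * alpha))
    # Scenario 3: reject; incorrect only if the tail contributes exactly the J - partial missing features.
    return False, float(1.0 - poisson.pmf(J - partial, pi_I * alpha))
```

Averaging the returned probability over proposals gives a rough diagnostic of the error introduced by a given truncation level I.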

Cite this article

Doshi-Velez, F., Williamson, S.A. Restricted Indian buffet processes. Stat Comput 27, 1205–1223 (2017). https://doi.org/10.1007/s11222-016-9681-y
