
Latent features in similarity judgments: A nonparametric Bayesian approach

Published: 01 November 2008

Abstract

One of the central problems in cognitive science is determining the mental representations that underlie human inferences. Solutions to this problem often rely on the analysis of subjective similarity judgments, on the assumption that recognizing likenesses between people, objects, and events is crucial to everyday inference. One such solution is provided by the additive clustering model, which is widely used to infer the features of a set of stimuli from their similarities, on the assumption that similarity is a weighted linear function of common features. Existing approaches for implementing additive clustering often lack a complete framework for statistical inference, particularly with respect to choosing the number of features. To address these problems, this article develops a fully Bayesian formulation of the additive clustering model, using methods from nonparametric Bayesian statistics to allow the number of features to vary. We use this to explore several approaches to parameter estimation, showing that the nonparametric Bayesian approach provides a straightforward way to obtain estimates of both the number of features and their importance.
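
As an illustration of the assumption that similarity is a weighted linear function of common features, the following minimal sketch computes predicted similarities from a binary feature matrix. The function name, the toy data, the additive constant, and the choice to leave self-similarities at zero are all assumptions made for illustration; they are not taken from the article.

```python
def additive_clustering_similarity(features, weights, additive_constant=0.0):
    """Predicted similarity s_ij = sum_k w_k * f_ik * f_jk + c for i != j.

    features: list of rows, one per stimulus, each a list of 0/1 feature flags.
    weights: one non-negative weight per feature.
    additive_constant: assumed baseline similarity shared by all pairs.
    """
    n = len(features)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: self-similarities are not modeled
            # Only features possessed by both stimuli contribute their weight.
            shared = sum(
                w * fi * fj
                for w, fi, fj in zip(weights, features[i], features[j])
            )
            sim[i][j] = shared + additive_constant
    return sim


# Hypothetical toy example: 3 stimuli described by 2 binary features.
features = [[1, 0], [1, 1], [0, 1]]
weights = [0.5, 0.3]
predicted = additive_clustering_similarity(features, weights, additive_constant=0.1)
```

In this toy case the first two stimuli share only the first feature, so their predicted similarity is that feature's weight plus the constant. Inferring the feature matrix and weights from observed similarities, including how many features there are, is the problem the article addresses.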



Published In

Neural Computation, Volume 20, Issue 11
November 2008
265 pages

Publisher

MIT Press

Cambridge, MA, United States

Qualifiers

  • Article


