Abstract
The authors present a case study to demonstrate the possibility of discovering complex and interesting latent structures using hierarchical latent class (HLC) models. A similar effort was made earlier by Zhang (2002), but that study involved only small applications with 4 or 5 observed variables and no more than 2 latent variables due to the lack of efficient learning algorithms. Significant progress has been made since then on algorithmic research, and it is now possible to learn HLC models with dozens of observed variables. This allows us to demonstrate the benefits of HLC models more convincingly than before. The authors have successfully analyzed the CoIL Challenge 2000 data set using HLC models. The model obtained consists of 22 latent variables, and its structure is intuitively appealing. It is exciting to know that such a large and meaningful latent structure can be automatically inferred from data.
References
N. L. Zhang, Hierarchical latent class models for cluster analysis, in Proceedings of the 18th National Conference on Artificial Intelligence, AAAI Press, Menlo Park, 2002, 230–237.
N. L. Zhang, Hierarchical latent class models for cluster analysis, Journal of Machine Learning Research, 2004, 5(Jun): 697–723.
P. F. Lazarsfeld and N. W. Henry, Latent Structure Analysis, Houghton Mifflin, Boston, 1968.
J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, Palo Alto, 1988.
D. J. Bartholomew and M. Knott, Latent Variable Models and Factor Analysis (2nd edition), Arnold, London, 1999.
R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, 1998.
G. Elidan, N. Lotner, N. Friedman, and D. Koller, Discovering hidden variables: a structure-based approach, in Advances in Neural Information Processing Systems 13, MIT Press, Cambridge, 2001, 479–485.
R. Silva, R. Scheines, C. Glymour, and P. Spirtes, Learning measurement models for unobserved variables, in Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence, 2003, 545–555.
P. van der Putten and M. van Someren, A bias-variance analysis of a real world learning problem: the CoIL Challenge 2000, Machine Learning, 2004, 57(1–2): 177–195.
J. K. Vermunt and J. Magidson, Latent class cluster analysis, in Applied Latent Class Analysis, Cambridge University Press, Cambridge, 2002, 89–106.
G. Schwarz, Estimating the dimension of a model, Annals of Statistics, 1978, 6(2): 461–464.
H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control, 1974, 19(6): 716–723.
P. Cheeseman and J. Stutz, Bayesian classification (AutoClass): theory and results, in Advances in Knowledge Discovery and Data Mining (ed. by U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy), AAAI Press, Menlo Park, 1996.
R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter, Probabilistic Networks and Expert Systems, Springer, New York, 1999.
D. Geiger, D. Heckerman, and C. Meek, Asymptotic model selection for directed networks with hidden variables, in Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, San Francisco, 1996, 283–290.
T. Kočka and N. L. Zhang, Dimension correction for hierarchical latent class models, in Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (ed. by A. Darwiche and N. Friedman), Morgan Kaufmann Publishers, San Francisco, 2002, 267–274.
N. L. Zhang and T. Kočka, Efficient learning of hierarchical latent class models, in Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence, IEEE Computer Society, Los Alamitos, CA, 2004, 585–593.
N. Friedman, Learning belief networks in the presence of missing values and hidden variables, in Proceedings of the 14th International Conference on Machine Learning, Morgan Kaufmann Publishers, San Francisco, 1997, 125–133.
D. M. Chickering, Learning equivalence classes of Bayesian-network structures, Journal of Machine Learning Research, 2002, 2(Feb): 445–498.
Additional information
*The research is supported by Hong Kong Grants Council Grants #622105 and #622307, and the National Basic Research Program of China (aka the 973 Program) under project No. 2003CB517106.
Cite this article
ZHANG, N.L., WANG, Y. & CHEN, T. Discovery of latent structures: Experience with the CoIL Challenge 2000 data set*. J Syst Sci Complex 21, 172–183 (2008). https://doi.org/10.1007/s11424-008-9101-2
DOI: https://doi.org/10.1007/s11424-008-9101-2