Discovery of latent structures: Experience with the CoIL Challenge 2000 data set*

ZHANG, Nevin L.; WANG, Yi; CHEN, Tao

doi:10.1007/s11424-008-9101-2


Published: 29 May 2008

Volume 21, pages 172–183, (2008)

Journal of Systems Science and Complexity


Abstract

The authors present a case study to demonstrate the possibility of discovering complex and interesting latent structures using hierarchical latent class (HLC) models. A similar effort was made earlier by Zhang (2002), but that study involved only small applications with 4 or 5 observed variables and no more than 2 latent variables due to the lack of efficient learning algorithms. Significant progress has been made since then on algorithmic research, and it is now possible to learn HLC models with dozens of observed variables. This allows us to demonstrate the benefits of HLC models more convincingly than before. The authors have successfully analyzed the CoIL Challenge 2000 data set using HLC models. The model obtained consists of 22 latent variables, and its structure is intuitively appealing. It is exciting to know that such a large and meaningful latent structure can be automatically inferred from data.


References

N. L. Zhang, Hierarchical latent class models for cluster analysis, in Proceedings of the 18th National Conference on Artificial Intelligence, AAAI Press, Menlo Park, 2002, 230–237.
N. L. Zhang, Hierarchical latent class models for cluster analysis, Journal of Machine Learning Research, 2004, 5(Jun): 697–723.
P. F. Lazarsfeld and N. W. Henry, Latent Structure Analysis, Houghton Mifflin, Boston, 1968.
J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, Palo Alto, 1988.
D. J. Bartholomew and M. Knott, Latent Variable Models and Factor Analysis (2nd edition), Arnold, London, 1999.
R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, 1998.
G. Elidan, N. Lotner, N. Friedman, and D. Koller, Discovering hidden variables: a structure-based approach, in Advances in Neural Information Processing Systems 13, MIT Press, Cambridge, 2001, 479–485.
R. Silva, R. Scheines, C. Glymour, and P. Spirtes, Learning measurement models for unobserved variables, in Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence, 2003, 545–555.
P. van der Putten and M. van Someren, A bias-variance analysis of a real world learning problem: the CoIL Challenge 2000, Machine Learning, 2004, 57(1–2): 177–195.
J. K. Vermunt and J. Magidson, Latent class cluster analysis, in Applied Latent Class Analysis, Cambridge University Press, Cambridge, 2002, 89–106.
G. Schwarz, Estimating the dimension of a model, Annals of Statistics, 1978, 6(2): 461–464.
H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control, 1974, 19(6): 716–723.
P. Cheeseman and J. Stutz, Bayesian classification (AutoClass): theory and results, in Advances in Knowledge Discovery and Data Mining (ed. by U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy), AAAI Press, Menlo Park, 1996.
R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter, Probabilistic Networks and Expert Systems, Springer, New York, 1999.
D. Geiger, D. Heckerman, and C. Meek, Asymptotic model selection for directed networks with hidden variables, in Proceedings of the 12th Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers, San Francisco, 1996, 283–290.
T. Kočka and N. L. Zhang, Dimension correction for hierarchical latent class models, in Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (ed. by A. Darwiche and N. Friedman), Morgan Kaufmann Publishers, San Francisco, 2002, 267–274.
N. L. Zhang and T. Kočka, Efficient learning of hierarchical latent class models, in Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence, IEEE Computer Society, Los Alamitos, CA, 2004, 585–593.
N. Friedman, Learning belief networks in the presence of missing values and hidden variables, in Proceedings of the 14th International Conference on Machine Learning, Morgan Kaufmann Publishers, San Francisco, 1997, 125–133.
D. M. Chickering, Learning equivalence classes of Bayesian-network structures, Journal of Machine Learning Research, 2002, 2(Feb): 445–498.


Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
Nevin L. ZHANG, Yi WANG & Tao CHEN


Corresponding author

Correspondence to Nevin L. ZHANG.

Additional information

*The research is supported by Hong Kong Grants Council Grants #622105 and #622307, and the National Basic Research Program of China (aka the 973 Program) under project No. 2003CB517106.


About this article

Cite this article

ZHANG, N.L., WANG, Y. & CHEN, T. Discovery of latent structures: Experience with the CoIL Challenge 2000 data set*. J Syst Sci Complex 21, 172–183 (2008). https://doi.org/10.1007/s11424-008-9101-2


Received: 13 August 2007
Revised: 10 October 2007
Published: 29 May 2008
Issue Date: February 2008
DOI: https://doi.org/10.1007/s11424-008-9101-2

