DOI: 10.1145/1363686.1363891

A hierarchical model-based approach to co-clustering high-dimensional data

Published: 16 March 2008

Abstract

We propose a hierarchical, model-based co-clustering framework for handling high-dimensional datasets. The technique views the dataset as a joint probability distribution over row and column variables. Our approach starts by clustering tuples in a dataset, where each cluster is characterized by a different probability distribution. Subsequently, the conditional distribution of attributes over tuples is exploited to discover natural co-clusters in the data. An intensive empirical evaluation highlights the effectiveness of our approach.
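
The two-stage idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: plain k-means stands in for their model-based clustering, and every function name here (`joint_distribution`, `farthest_point_init`, `kmeans`, `co_cluster`) is hypothetical. It assumes a non-negative data matrix in which no row or column sums to zero. Rows are first clustered by their conditional distribution over columns; columns are then grouped by how their probability mass is spread across the row clusters.

```python
import numpy as np

def joint_distribution(data):
    """Normalize a non-negative matrix so its entries sum to 1: p(row, col)."""
    data = np.asarray(data, dtype=float)
    return data / data.sum()

def assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) for each point."""
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    return np.array(centers)

def kmeans(points, k, n_iter=20):
    """Plain k-means; an assumed stand-in for a model-based clustering step."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iter):
        labels = assign(points, centers)
        for c in range(k):
            members = points[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return assign(points, centers)

def co_cluster(data, n_row_clusters, n_col_clusters):
    p = joint_distribution(data)
    # Stage 1: cluster rows by their conditional distribution p(col | row).
    row_labels = kmeans(p / p.sum(axis=1, keepdims=True), n_row_clusters)
    # Stage 2: aggregate rows into clusters to get p(row_cluster, col), then
    # cluster columns by the conditional p(row_cluster | col).
    p_rc = np.vstack([p[row_labels == c].sum(axis=0)
                      for c in range(n_row_clusters)])
    col_profiles = (p_rc / p_rc.sum(axis=0, keepdims=True)).T
    col_labels = kmeans(col_profiles, n_col_clusters)
    return row_labels, col_labels

# Toy matrix with two visible blocks (rows 0-1 with columns 0-1, rows 2-3 with columns 2-3).
data = [[5, 4, 0, 0],
        [4, 5, 0, 0],
        [0, 0, 3, 6],
        [0, 1, 6, 3]]
row_labels, col_labels = co_cluster(data, n_row_clusters=2, n_col_clusters=2)
print(row_labels, col_labels)
```

On the toy matrix, a row cluster paired with a column cluster that shares most of its probability mass corresponds to one block, which is the intuition behind a co-cluster.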

Published In

SAC '08: Proceedings of the 2008 ACM symposium on Applied computing
March 2008
2586 pages
ISBN: 9781595937537
DOI: 10.1145/1363686
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

SAC '08
SAC '08: The 2008 ACM Symposium on Applied Computing
March 16 - 20, 2008
Fortaleza, Ceara, Brazil

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 10 Nov 2024

Citations

Cited By

  • (2024) Co-clustering: A Survey of the Main Methods, Recent Trends, and Open Problems. ACM Computing Surveys 57:2 (1-33). DOI: 10.1145/3698875. Online publication date: 4-Oct-2024
  • (2022) Hierarchical Bayesian text modeling for the unsupervised joint analysis of latent topics and semantic clusters. International Journal of Approximate Reasoning 147 (23-39). DOI: 10.1016/j.ijar.2022.05.002. Online publication date: Aug-2022
  • (2020) An overview of clustering methods for geo-referenced time series: from one-way clustering to co- and tri-clustering. International Journal of Geographical Information Science 34:9 (1822-1848). DOI: 10.1080/13658816.2020.1726922. Online publication date: 16-Feb-2020
  • (2018) Topical Cluster Discovery in Semistructured Healthcare Data. 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI) (772-775). DOI: 10.1109/WI.2018.00014. Online publication date: Dec-2018
  • (2016) Structure-Oriented Techniques for XML Document Partitioning. Novel Applications of Intelligent Systems (167-182). DOI: 10.1007/978-3-319-14194-7_9. Online publication date: 28-Jan-2016
  • (2015) Fully-Automatic XML Clustering by Structure-Constrained Phrases. Proceedings of the 2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI) (146-153). DOI: 10.1109/ICTAI.2015.34. Online publication date: 9-Nov-2015
  • (2015) Mining Clusters in XML Corpora Based on Bayesian Generative Topic Modeling. 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) (515-520). DOI: 10.1109/ICMLA.2015.148. Online publication date: Dec-2015
  • (2014) A unified generative bayesian model for community discovery and role assignment based upon latent interaction factors. Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (93-100). DOI: 10.5555/3191835.3191854. Online publication date: 17-Aug-2014
  • (2014) XML Document Co-clustering via Non-negative Matrix Tri-factorization. Proceedings of the 2014 IEEE 26th International Conference on Tools with Artificial Intelligence (607-614). DOI: 10.1109/ICTAI.2014.96. Online publication date: 10-Nov-2014
  • (2014) A unified generative Bayesian model for community discovery and role assignment based upon latent interaction factors. 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014) (93-100). DOI: 10.1109/ASONAM.2014.6921566. Online publication date: Aug-2014
