DOI: 10.1145/2566486.2567975

Comment-based multi-view clustering of web 2.0 items

Published: 07 April 2014

Abstract

Clustering Web 2.0 items (i.e., web resources like videos, images) into semantic groups benefits many applications, such as organizing items, generating meaningful tags and improving web search. In this paper, we systematically investigate how user-generated comments can be used to improve the clustering of Web 2.0 items. In our preliminary study of Last.fm, we find that the two data sources extracted from user comments -- the textual comments and the commenting users -- provide complementary evidence to the items' intrinsic features. These sources have varying levels of quality, but, importantly, we find that incorporating all three sources improves clustering. To accommodate such quality imbalance, we invoke multi-view clustering, in which each data source represents a view, aiming to best leverage the utility of different views.
To combine multiple views under a principled framework, we propose CoNMF (Co-regularized Non-negative Matrix Factorization), which extends NMF for multi-view clustering by jointly factorizing the multiple matrices through co-regularization. Under our CoNMF framework, we devise two paradigms -- pair-wise CoNMF and cluster-wise CoNMF -- and propose iterative algorithms for their joint factorization. Experimental results on Last.fm and Yelp datasets demonstrate the effectiveness of our solution. In Last.fm, CoNMF betters k-means with a statistically significant F1 increase of 14%, while achieving comparable performance with the state-of-the-art multi-view clustering method CoSC (Co-regularized Spectral Clustering). On a Yelp dataset, CoNMF outperforms the best baseline CoSC with a statistically significant performance gain of 7%.
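
To make the idea of joint factorization through co-regularization concrete, the following is a minimal NumPy sketch of what a pair-wise variant could look like. It is not the paper's algorithm: the abstract does not state the objective, so the sketch assumes one of the form sum_s ||X_s - W_s H_s||^2 + lam * sum_{s<t} ||W_s - W_t||^2 and minimizes it with Lee-Seung style multiplicative updates. The function name pairwise_conmf, the parameters lam, n_iter and eps, the averaging of the W_s into a consensus, and the absence of any normalization step are all assumptions made for illustration.

```python
import numpy as np


def pairwise_conmf(views, k, lam=1.0, n_iter=200, eps=1e-9, seed=0):
    """Illustrative pair-wise co-regularized NMF (assumed objective, see lead-in).

    views : list of non-negative arrays X_s, each of shape (n_items, d_s)
    k     : number of clusters (shared inner dimension of every factorization)
    lam   : weight of the pair-wise disagreement penalty ||W_s - W_t||^2
    """
    rng = np.random.default_rng(seed)
    n, m = views[0].shape[0], len(views)
    # One item-by-cluster matrix W_s and one cluster-by-feature matrix H_s per view.
    Ws = [rng.random((n, k)) for _ in views]
    Hs = [rng.random((k, X.shape[1])) for X in views]

    def objective():
        fit = sum(np.linalg.norm(X - W @ H) ** 2 for X, W, H in zip(views, Ws, Hs))
        reg = sum(np.linalg.norm(Ws[s] - Ws[t]) ** 2
                  for s in range(m) for t in range(s + 1, m))
        return fit + lam * reg

    history = [objective()]
    for _ in range(n_iter):
        for s, X in enumerate(views):
            W, H = Ws[s], Hs[s]
            # Sum of the other views' W matrices: the term that pulls W_s toward them.
            others = sum(Ws[t] for t in range(m) if t != s)
            # Multiplicative update: negative gradient part over positive gradient part.
            W *= (X @ H.T + lam * others) / (W @ (H @ H.T) + lam * (m - 1) * W + eps)
            H *= (W.T @ X) / (W.T @ W @ H + eps)
        history.append(objective())

    # Assumed read-out: average the per-view W_s and take the strongest cluster per item.
    consensus = sum(Ws) / m
    labels = consensus.argmax(axis=1)
    return labels, Ws, history
```

Called as labels, Ws, history = pairwise_conmf([X_text, X_users, X_features], k=10), with three hypothetical non-negative item-by-feature matrices standing in for the three views, the function returns one cluster label per item together with the per-view factors and the objective value after each iteration. Because NMF factors are only determined up to scaling, a practical implementation would likely normalize W_s or H_s before comparing views; that step is omitted here for brevity.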

    Published In

    WWW '14: Proceedings of the 23rd international conference on World wide web
    April 2014
    926 pages
    ISBN:9781450327442
    DOI:10.1145/2566486

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. CoNMF
    2. co-regularized nmf
    3. comment-based clustering
    4. multi-view clustering

    Qualifiers

    • Research-article

    Conference

    WWW '14
    Sponsor:
    • IW3C2

    Acceptance Rates

    WWW '14 Paper Acceptance Rate 84 of 645 submissions, 13%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Downloads (Last 12 months)36
    • Downloads (Last 6 weeks)2
    Reflects downloads up to 01 Jan 2025

    Cited By

    • (2024) Community Detection for Heterogeneous Multiple Social Networks. IEEE Transactions on Computational Social Systems 11:5 (6966-6981). DOI: 10.1109/TCSS.2024.3399784. Online publication date: Oct-2024
    • (2024) A comprehensive survey on community detection methods and applications in complex information networks. Social Network Analysis and Mining 14:1. DOI: 10.1007/s13278-024-01246-5. Online publication date: 18-Apr-2024
    • (2024) Community detection in multiplex networks by deep structure-preserving non-negative matrix factorization. Applied Intelligence 55:1. DOI: 10.1007/s10489-024-05870-8. Online publication date: 26-Nov-2024
    • (2024) Asymmetric Short-Text Clustering via Prompt. New Generation Computing 42:4 (599-615). DOI: 10.1007/s00354-024-00244-7. Online publication date: 19-Feb-2024
    • (2023) A Comprehensive Survey on Multi-View Clustering. IEEE Transactions on Knowledge and Data Engineering 35:12 (12350-12368). DOI: 10.1109/TKDE.2023.3270311. Online publication date: 1-Dec-2023
    • (2023) Graph-based Multi-view Clustering for Web services. 2023 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom) (567-572). DOI: 10.1109/ISPA-BDCloud-SocialCom-SustainCom59178.2023.00106. Online publication date: 21-Dec-2023
    • (2023) Multi-View Clustering Using Sparse Non-Negative Matrix Factorization for Recommendation Systems. 2023 International Conference on Machine Learning and Applications (ICMLA) (1287-1294). DOI: 10.1109/ICMLA58977.2023.00194. Online publication date: 15-Dec-2023
    • (2023) Optimal neighborhood kernel clustering with adaptive local kernels and block diagonal property. Neural Computing and Applications 35:30 (22297-22312). DOI: 10.1007/s00521-023-08885-3. Online publication date: 7-Aug-2023
    • (2022) Interest-Based Content Clustering for Enhancing Searching and Recommendations on Smart TV. Wireless Communications & Mobile Computing 2022. DOI: 10.1155/2022/3896840. Online publication date: 1-Jan-2022
    • (2022) Unbalanced Incomplete Multi-View Clustering Via the Scheme of View Evolution: Weak Views are Meat; Strong Views Do Eat. IEEE Transactions on Emerging Topics in Computational Intelligence 6:4 (913-927). DOI: 10.1109/TETCI.2021.3077909. Online publication date: Aug-2022
