DOI: 10.1145/2783258.2783264
research-article

Diversifying Restricted Boltzmann Machine for Document Modeling

Published: 10 August 2015
Abstract

    The Restricted Boltzmann Machine (RBM) has shown great effectiveness in document modeling. It uses hidden units to discover latent topics and can learn compact semantic representations of documents, which greatly facilitates document retrieval, clustering, and classification. The popularity (or frequency) of topics in text corpora usually follows a power-law distribution: a few dominant topics occur very frequently, while most topics (those in the long-tail region) have low probabilities. Because of this imbalance, RBM tends to learn multiple redundant hidden units that best represent the dominant topics while ignoring those in the long tail, which renders the learned representations redundant and uninformative. To solve this problem, we propose the Diversified RBM (DRBM), which diversifies the hidden units so that they cover not only the dominant topics but also those in the long-tail region. We define a diversity metric and use it as a regularizer to encourage the hidden units to be diverse. Since the diversity metric is hard to optimize directly, we instead optimize its lower bound and prove that maximizing the lower bound with projected gradient ascent increases the diversity metric. Experiments on document retrieval and clustering demonstrate that, with diversification, the document modeling power of DRBM is greatly improved.
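    The abstract does not spell out the paper's diversity metric, but the redundancy it penalizes can be illustrated with a common proxy: the mean pairwise cosine similarity among the hidden units' weight vectors (column vectors of the RBM weight matrix). A sketch, under that assumption — redundant units score near 1, diverse units near 0:

    ```python
    import numpy as np

    def diversity_penalty(W):
        """Mean pairwise cosine similarity among hidden units' weight vectors.

        W: (n_visible, n_hidden) RBM weight matrix; column j is hidden unit j.
        Lower values indicate more diverse (less redundant) hidden units.
        """
        # Normalize each hidden unit's weight vector to unit length.
        norms = np.linalg.norm(W, axis=0, keepdims=True)
        Wn = W / np.maximum(norms, 1e-12)
        # Gram matrix of cosine similarities between hidden units.
        G = Wn.T @ Wn
        h = W.shape[1]
        # Average the off-diagonal entries (exclude each unit's self-similarity).
        return (G.sum() - np.trace(G)) / (h * (h - 1))

    rng = np.random.default_rng(0)
    W_random = rng.standard_normal((100, 16))                       # roughly diverse units
    W_redundant = np.tile(rng.standard_normal((100, 1)), (1, 16))   # all units identical

    print(diversity_penalty(W_random))     # near 0
    print(diversity_penalty(W_redundant))  # 1.0 (maximally redundant)
    ```

    A quantity like this could be subtracted from the RBM training objective (weighted by a regularization coefficient) to discourage hidden units from collapsing onto the same dominant topics; the paper's actual regularizer and its optimized lower bound are defined in the full text.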

    Supplementary Material

    MP4 File (p1315.mp4)





      Published In

      KDD '15: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
      August 2015
      2378 pages
      ISBN:9781450336642
      DOI:10.1145/2783258
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. diversified restricted boltzmann machine
      2. diversity
      3. document modeling
      4. power-law distribution
      5. topic modeling

      Qualifiers

      • Research-article

      Conference

      KDD '15

      Acceptance Rates

      KDD '15 Paper Acceptance Rate 160 of 819 submissions, 20%;
      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%



      Cited By

      • (2024) The importance of nodal plane orientation diversity for earthquake focal mechanism stress inversions. Geological Society, London, Special Publications, 546(1). DOI: 10.1144/SP546-2023-63. Online publication date: 5-Jan-2024.
      • (2024) Automatic approach for breast cancer detection based on deep belief network using histopathology images. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-18949-8. Online publication date: 25-Mar-2024.
      • (2022) Investigating the Efficient Use of Word Embedding with Neural-Topic Models for Interpretable Topics from Short Texts. Sensors, 22(3), 852. DOI: 10.3390/s22030852. Online publication date: 23-Jan-2022.
      • (2022) Darwin's Theory of Censorship. Proceedings of the 21st Workshop on Privacy in the Electronic Society, 103-108. DOI: 10.1145/3559613.3563206. Online publication date: 7-Nov-2022.
      • (2022) Multi Task Mutual Learning for Joint Sentiment Classification and Topic Detection. IEEE Transactions on Knowledge and Data Engineering, 34(4), 1915-1927. DOI: 10.1109/TKDE.2020.2999489. Online publication date: 1-Apr-2022.
      • (2022) Spherical Zero-Shot Learning. IEEE Transactions on Circuits and Systems for Video Technology, 32(2), 634-645. DOI: 10.1109/TCSVT.2021.3067067. Online publication date: Feb-2022.
      • (2022) Eccentric regularization: minimizing hyperspherical energy without explicit projection. 2022 International Joint Conference on Neural Networks (IJCNN), 1-9. DOI: 10.1109/IJCNN55064.2022.9892944. Online publication date: 18-Jul-2022.
      • (2021) A Lightweight Knowledge Graph Embedding Framework for Efficient Inference and Storage. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 1909-1918. DOI: 10.1145/3459637.3482224. Online publication date: 26-Oct-2021.
      • (2021) A Survey of Automatic Text Summarization: Progress, Process and Challenges. IEEE Access, 9, 156043-156070. DOI: 10.1109/ACCESS.2021.3129786. Online publication date: 2021.
      • (2021) A Topic Coverage Approach to Evaluation of Topic Models. IEEE Access, 9, 123280-123312. DOI: 10.1109/ACCESS.2021.3109425. Online publication date: 2021.
