DOI: 10.1145/2806416.2806637

Enhanced Word Embeddings from a Hierarchical Neural Language Model

Published: 17 October 2015

Abstract

This paper proposes a neural language model that captures the interaction of text units at different levels, i.e., documents, paragraphs, sentences, and words, in a hierarchical structure. Within each level, the model incorporates the Markov property, while each higher-level unit hierarchically influences the units it contains. Such an architecture enables the learned word embeddings to encode both global and local information. We evaluate the learned word embeddings, and experiments demonstrate the effectiveness of our model.
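The abstract gives no implementation details, but the core idea — scoring the next word from local (Markov) context combined with vectors for the enclosing sentence, paragraph, and document — can be illustrated with a minimal sketch. Everything below (the embedding tables, the additive combination, and all names) is our own assumption for illustration, not the paper's actual model:

```python
import math
import random

random.seed(0)
DIM, VOCAB = 8, 20

def rand_vec():
    return [random.gauss(0, 1) for _ in range(DIM)]

# Hypothetical parameters: input and output word embedding tables.
word_emb = [rand_vec() for _ in range(VOCAB)]
out_emb = [rand_vec() for _ in range(VOCAB)]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def predict_next(context_ids, sent_vec, para_vec, doc_vec):
    # Local (Markov) evidence: average of the preceding words' embeddings.
    local = [sum(word_emb[i][d] for i in context_ids) / len(context_ids)
             for d in range(DIM)]
    # Global evidence: vectors for the sentence, paragraph, and document
    # that hierarchically contain this position.
    h = add(add(add(local, sent_vec), para_vec), doc_vec)
    scores = [sum(o[d] * h[d] for d in range(DIM)) for o in out_emb]
    return softmax(scores)

# Random placeholder vectors stand in for learned higher-level units.
probs = predict_next([1, 2, 3], rand_vec(), rand_vec(), rand_vec())
```

In a trained model the sentence, paragraph, and document vectors would be learned jointly with the word embeddings (in the spirit of paragraph-vector models); here they are random placeholders that only show how global and local signals would combine in a prediction.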


Cited By

  • (2018) Learning to Rank for Coordination Detection. Computational Linguistics and Intelligent Text Processing, pages 145-157. DOI: 10.1007/978-3-319-77113-7_12. Online publication date: 10-Oct-2018
  • (2017) Enhanced word embedding with multiple prototypes. 2017 4th International Conference on Industrial Economics System and Industrial Security Engineering (IEIS), pages 1-5. DOI: 10.1109/IEIS.2017.8078651. Online publication date: Jul-2017

Published In

CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
October 2015
1998 pages
ISBN:9781450337946
DOI:10.1145/2806416
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distributed representations
  2. hierarchical model
  3. neural network
  4. word embeddings

Qualifiers

  • Short-paper

Conference

CIKM'15

Acceptance Rates

CIKM '15 Paper Acceptance Rate 165 of 646 submissions, 26%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 15 Oct 2024

