DOI: 10.1145/2806416.2806637

Enhanced Word Embeddings from a Hierarchical Neural Language Model

Published: 17 October 2015

Abstract

This paper proposes a neural language model that captures the interaction of text units at different levels, i.e., documents, paragraphs, sentences, and words, in a hierarchical structure. Within each level, the model incorporates the Markov property, while each higher-level unit hierarchically influences the units it contains. Such an architecture enables the learned word embeddings to encode both global and local information. We evaluate the learned word embeddings, and experiments demonstrate the effectiveness of our model.
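The abstract gives no implementation details, but the core idea — scoring the next word from local (Markov) context combined with vectors for the enclosing sentence, paragraph, and document — can be illustrated with a minimal sketch. Everything below (the embedding tables, the additive combination, and all names) is our own assumption for illustration, not the paper's actual model:

```python
import math
import random

random.seed(0)
DIM, VOCAB = 8, 20

def rand_vec():
    return [random.gauss(0, 1) for _ in range(DIM)]

# Hypothetical parameters: input and output word embedding tables.
word_emb = [rand_vec() for _ in range(VOCAB)]
out_emb = [rand_vec() for _ in range(VOCAB)]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def predict_next(context_ids, sent_vec, para_vec, doc_vec):
    # Local (Markov) evidence: average of the preceding words' embeddings.
    local = [sum(word_emb[i][d] for i in context_ids) / len(context_ids)
             for d in range(DIM)]
    # Global evidence: vectors for the sentence, paragraph, and document
    # that hierarchically contain this position.
    h = add(add(add(local, sent_vec), para_vec), doc_vec)
    scores = [sum(o[d] * h[d] for d in range(DIM)) for o in out_emb]
    return softmax(scores)

# Random placeholder vectors stand in for learned higher-level units.
probs = predict_next([1, 2, 3], rand_vec(), rand_vec(), rand_vec())
```

In a trained model the sentence, paragraph, and document vectors would be learned jointly with the word embeddings (in the spirit of paragraph-vector models); here they are random placeholders that only show how global and local signals would combine in a prediction.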


Cited By

  • (2018) Learning to Rank for Coordination Detection. Computational Linguistics and Intelligent Text Processing, pages 145-157. DOI: 10.1007/978-3-319-77113-7_12. Online publication date: 10-Oct-2018
  • (2017) Enhanced word embedding with multiple prototypes. 2017 4th International Conference on Industrial Economics System and Industrial Security Engineering (IEIS), pages 1-5. DOI: 10.1109/IEIS.2017.8078651. Online publication date: Jul-2017

Published In

CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
October 2015
1998 pages
ISBN:9781450337946
DOI:10.1145/2806416
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. distributed representations
  2. hierarchical model
  3. neural network
  4. word embeddings

Qualifiers

  • Short-paper

Conference

CIKM'15

Acceptance Rates

CIKM '15 Paper Acceptance Rate 165 of 646 submissions, 26%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Article Metrics

  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 15 Oct 2024

