Word representations: a simple and general method for semi-supervised learning

Authors:

Yoshua Bengio

ACL '10: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Pages 384 - 394

Published: 11 July 2010

Abstract

If we take an existing supervised NLP system, a simple and general way to improve accuracy is to use unsupervised word representations as extra word features. We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking. We use near state-of-the-art supervised baselines, and find that each of the three word representations improves the accuracy of these baselines. We find further improvements by combining different word representations. You can download our word features, for off-the-shelf use in existing NLP systems, as well as our code, here: http://metaoptimize.com/projects/wordreprs/
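As a rough illustration of the idea in the abstract, the sketch below adds word-representation features to a token's existing feature dictionary. It is not the authors' released code: the function name `augment_features`, the toy lookup tables, the cluster-path prefix lengths, and the `scale` parameter are all assumptions made for this example. It relies only on the fact that Brown clustering is hierarchical, so each word can be described by a bit-string path, and that an embedding is a real-valued vector.

```python
from typing import Dict, List, Sequence

# Hypothetical toy resources; a real system would load these from
# pre-trained word-representation files.
BROWN_PATHS: Dict[str, str] = {
    "london": "1101001110",
    "paris": "1101001111",
    "runs": "0010110",
}
EMBEDDINGS: Dict[str, List[float]] = {
    "london": [0.12, -0.40],
    "paris": [0.10, -0.35],
}


def augment_features(
    word: str,
    base_features: Dict[str, float],
    brown_paths: Dict[str, str],
    embeddings: Dict[str, List[float]],
    prefix_lengths: Sequence[int] = (4, 6),  # assumed values, for illustration
    scale: float = 1.0,                      # assumed scaling hyperparameter
) -> Dict[str, float]:
    """Return a copy of base_features extended with word-representation features."""
    features = dict(base_features)
    key = word.lower()

    # Cluster features: one indicator per prefix of the word's bit-string path,
    # so coarser and finer levels of the cluster hierarchy are both visible.
    path = brown_paths.get(key)
    if path is not None:
        for n in prefix_lengths:
            features[f"brown[:{n}]={path[:n]}"] = 1.0

    # Embedding features: one real-valued feature per embedding dimension.
    vector = embeddings.get(key)
    if vector is not None:
        for i, value in enumerate(vector):
            features[f"emb[{i}]"] = scale * value

    return features


# Usage: extend the baseline features of one token.
example = augment_features("London", {"is_capitalized": 1.0}, BROWN_PATHS, EMBEDDINGS)
print(sorted(example))
```

Words missing from either table simply keep their baseline features, which is one simple way to handle out-of-vocabulary tokens in a sketch like this.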

References

[1] Ando, R., & Zhang, T. (2005). A high-performance semi-supervised learning method for text chunking. ACL.

[2] Bengio, Y. (2008). Neural net language models. Scholarpedia, 3, 3881.

[3] Bengio, Y., Ducharme, R., & Vincent, P. (2001). A neural probabilistic language model. NIPS.

[4] Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3, 1137–1155.

[5] Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. ICML.

[6] Bengio, Y., & Sénécal, J.-S. (2003). Quick training of probabilistic neural nets by importance sampling. AISTATS.

[7] Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.

[8] Brown, P. F., deSouza, P. V., Mercer, R. L., Pietra, V. J. D., & Lai, J. C. (1992). Class-based n-gram models of natural language. Computational Linguistics, 18, 467–479.

[9] Candito, M., & Crabbé, B. (2009). Improving generative statistical parsing with semi-supervised word clustering. IWPT (pp. 138–141).

[10] Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. ICML.

[11] Deschacht, K., & Moens, M.-F. (2009). Semi-supervised semantic role labeling using the Latent Words Language Model. EMNLP (pp. 21–29).

[12] Dumais, S. T., Furnas, G. W., Landauer, T. K., Deerwester, S., & Harshman, R. (1988). Using latent semantic analysis to improve access to textual information. SIGCHI Conference on Human Factors in Computing Systems (pp. 281–285). ACM.

[13] Elman, J. L. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48, 781–799.

[14] Goldberg, Y., Tsarfaty, R., Adler, M., & Elhadad, M. (2009). Enhancing unlexicalized parsing performance using a wide coverage lexicon, fuzzy tag-set mapping, and EM-HMM-based lexical probabilities. EACL.

[15] Honkela, T. (1997). Self-organizing maps of words for natural language processing applications. Proceedings of the International ICSC Symposium on Soft Computing.

[16] Honkela, T., Pulkki, V., & Kohonen, T. (1995). Contextual relations of words in Grimm tales, analyzed by self-organizing map. ICANN.

[17] Huang, F., & Yates, A. (2009). Distributional representations for handling sparsity in supervised sequence labeling. ACL.

[18] Kaski, S. (1998). Dimensionality reduction by random mapping: Fast similarity computation for clustering. IJCNN (pp. 413–418).

[19] Koo, T., Carreras, X., & Collins, M. (2008). Simple semi-supervised dependency parsing. ACL (pp. 595–603).

[20] Krishnan, V., & Manning, C. D. (2006). An effective two-stage model for exploiting non-local dependencies in named entity recognition. COLING-ACL.

[21] Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 259–284.

[22] Li, W., & McCallum, A. (2005). Semi-supervised sequence modeling with syntactic topic models. AAAI.

[23] Liang, P. (2005). Semi-supervised learning for natural language. Master's thesis, Massachusetts Institute of Technology.

[24] Lin, D., & Wu, X. (2009). Phrase clustering for discriminative learning. ACL-IJCNLP (pp. 1030–1038).

[25] Lund, K., & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28, 203–208.

[26] Lund, K., Burgess, C., & Atchley, R. A. (1995). Semantic and associative priming in high-dimensional semantic space. Cognitive Science Proceedings, LEA (pp. 660–665).

[27] Martin, S., Liermann, J., & Ney, H. (1998). Algorithms for bigram and trigram word clustering. Speech Communication, 24, 19–37.

[28] Miller, S., Guinness, J., & Zamanian, A. (2004). Name tagging with word clusters and discriminative training. HLT-NAACL (pp. 337–342).

[29] Mnih, A., & Hinton, G. E. (2007). Three new graphical models for statistical language modelling. ICML.

[30] Mnih, A., & Hinton, G. E. (2009). A scalable hierarchical distributed language model. NIPS (pp. 1081–1088).

[31] Morin, F., & Bengio, Y. (2005). Hierarchical probabilistic neural network language model. AISTATS.

[32] Pereira, F., Tishby, N., & Lee, L. (1993). Distributional clustering of English words. ACL (pp. 183–190).

[33] Ratinov, L., & Roth, D. (2009). Design challenges and misconceptions in named entity recognition. CoNLL.

[34] Ritter, H., & Kohonen, T. (1989). Self-organizing semantic maps. Biological Cybernetics, 241–254.

[35] Sahlgren, M. (2001). Vector-based semantic analysis: Representing word meanings based on random labels. Proceedings of the Semantic Knowledge Acquisition and Categorisation Workshop, ESSLLI.

[36] Sahlgren, M. (2005). An introduction to random indexing. Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering (TKE).

[37] Sahlgren, M. (2006). The word-space model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. Doctoral dissertation, Stockholm University.

[38] Sang, E. T., & Buchholz, S. (2000). Introduction to the CoNLL-2000 shared task: Chunking. CoNLL.

[39] Schwenk, H., & Gauvain, J.-L. (2002). Connectionist language modeling for large vocabulary continuous speech recognition. International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 765–768). Orlando, Florida.

[40] Sha, F., & Pereira, F. C. N. (2003). Shallow parsing with conditional random fields. HLT-NAACL.

[41] Spitkovsky, V., Alshawi, H., & Jurafsky, D. (2010). From baby steps to leapfrog: How "less is more" in unsupervised dependency parsing. NAACL-HLT.

[42] Suzuki, J., & Isozaki, H. (2008). Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data. ACL-08: HLT (pp. 665–673).

[43] Suzuki, J., Isozaki, H., Carreras, X., & Collins, M. (2009). An empirical study of semi-supervised structured conditional models for dependency parsing. EMNLP.

[44] Turian, J., Ratinov, L., Bengio, Y., & Roth, D. (2009). A preliminary evaluation of word representations for named-entity recognition. NIPS Workshop on Grammar Induction, Representation of Language and Language Learning.

[45] Turney, P. D., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research.

[46] Ushioda, A. (1996). Hierarchical clustering of words. COLING (pp. 1159–1162).

[47] Väyrynen, J., & Honkela, T. (2005). Comparison of independent component analysis and singular value decomposition in word context analysis. AKRR'05, International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning.

[48] Väyrynen, J. J., & Honkela, T. (2004). Word category maps based on emergent features created by ICA. Proceedings of the STeP'2004 Cognition + Cybernetics Symposium (pp. 173–185). Finnish Artificial Intelligence Society.

[49] Väyrynen, J. J., Honkela, T., & Lindqvist, L. (2007). Towards explicit semantic features using independent component analysis. Proceedings of the Workshop Semantic Content Acquisition and Representation (SCAR). Stockholm, Sweden: Swedish Institute of Computer Science.

[50] Řehůřek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora. LREC.

[51] Zhang, T., & Johnson, D. (2003). A robust risk minimization based named entity recognition system. CoNLL.

[52] Zhao, H., Chen, W., Kit, C., & Zhou, G. (2009). Multilingual dependency learning: a huge feature engineering method to semantic dependency parsing. CoNLL (pp. 55–60).

Index Terms

Word representations: a simple and general method for semi-supervised learning
1. Applied computing
  1. Arts and humanities
    1. Language translation
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning

Published In

ACL '10: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

July 2010

1618 pages

Program Chair:
Jan Hajič
Charles University in Prague, Czech Republic

Publisher

Association for Computational Linguistics

United States

Qualifiers

Research-article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%

Article Metrics

Total Citations: 202
Total Downloads: 7,854
Downloads (Last 12 months): 185
Downloads (Last 6 weeks): 23

Reflects downloads up to 01 Sep 2024

Cited By

Zhao X, Deng Y, Yang M, Wang L, Zhang R, Cheng H, Lam W, Shen Y, Xu R (2024). A Comprehensive Survey on Relation Extraction: Recent Advances and New Frontiers. ACM Computing Surveys, 56:11 (1-39). Online publication date: 24-Jun-2024. https://dl.acm.org/doi/10.1145/3674501

Lu X, Wang Y, Fu Y, Sun Q, Ma X, Zheng X, Zhuo C, Baeza-Yates R, Bonchi F (2024). MISP: A Multimodal-based Intelligent Server Failure Prediction Model for Cloud Computing Systems. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (5509-5520). Online publication date: 25-Aug-2024. https://dl.acm.org/doi/10.1145/3637528.3671568

Rai P, Chatterji S, Kim B (2023). Deep Learning-based Sequence Labeling Tools for Nepali. ACM Transactions on Asian and Low-Resource Language Information Processing, 22:8 (1-23). Online publication date: 23-Aug-2023. https://dl.acm.org/doi/10.1145/3606696

Zhang T, Chandrasekaran D, Thung F, Lo D, Rastogi A, Tufano R, Bavota G, Arnaoudova V, Haiduc S (2022). Benchmarking library recognition in tweets. Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension (343-353). Online publication date: 16-May-2022. https://dl.acm.org/doi/10.1145/3524610.3527916

Sarkhel R, Nandi A (2021). Improving information extraction from visually rich documents using visual span representations. Proceedings of the VLDB Endowment, 14:5 (822-834). Online publication date: 1-Jan-2021. https://dl.acm.org/doi/10.14778/3446095.3446104

Odakura F, Kobayashi K, Wakabayashi K (2021). Active Learning for Extracting Technical Terms Covering Multiword Phrases. The 23rd International Conference on Information Integration and Web Intelligence (311-318). Online publication date: 29-Nov-2021. https://dl.acm.org/doi/10.1145/3487664.3487706

Mariani L, Mohebbi A, Pezzè M, Terragni V, Cadar C, Zhang X (2021). Semantic matching of GUI events for test reuse: are we there yet? Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (177-190). Online publication date: 11-Jul-2021. https://dl.acm.org/doi/10.1145/3460319.3464827

Zhou Q, Hui T, Wang R, Hu H, Liu S (2021). Attentive Excitation and Aggregation for Bilingual Referring Image Segmentation. ACM Transactions on Intelligent Systems and Technology, 12:2 (1-17). Online publication date: 26-Feb-2021. https://dl.acm.org/doi/10.1145/3446345

Bashar M, Nayak R (2021). Active Learning for Effectively Fine-Tuning Transfer Learning to Downstream Task. ACM Transactions on Intelligent Systems and Technology, 12:2 (1-24). Online publication date: 11-Feb-2021. https://dl.acm.org/doi/10.1145/3446343

Brahma A, Potluri P, Kanapaneni M, Prabhu S, Teki S (2021). Identification of Food Quality Descriptors in Customer Chat Conversations using Named Entity Recognition. Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD) (257-261). Online publication date: 2-Jan-2021. https://dl.acm.org/doi/10.1145/3430984.3431041
