research-article

A unified architecture for natural language processing: deep neural networks with multitask learning

Authors: Ronan Collobert, Jason Weston

ICML '08: Proceedings of the 25th international conference on Machine learning

Pages 160 - 167

https://doi.org/10.1145/1390156.1390177

Published: 05 July 2008

Abstract

We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words, and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. The entire network is trained jointly on all these tasks using weight-sharing, an instance of multitask learning. All the tasks use labeled data except the language model, which is learnt from unlabeled text and represents a novel form of semi-supervised learning for the shared tasks. We show how both multitask learning and semi-supervised learning improve the generalization of the shared tasks, resulting in state-of-the-art performance.
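The weight-sharing idea in the abstract can be illustrated with a toy sketch. It is not the authors' network: it omits the convolutional layers and the language model, and the two tiny tasks, their labels, the alternating training schedule, and every name in the code are assumptions made for illustration. It only shows how several task-specific classifiers can read from, and send gradients into, one shared word-embedding table.

```python
import math
import random

random.seed(0)

EMBED_DIM = 4
VOCAB = ["the", "cat", "sat", "paris"]

# Shared layer: one embedding vector per word, used by every task.
shared_embeddings = {
    w: [random.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)] for w in VOCAB
}

# Hypothetical toy tasks; the labels and examples are made up for this sketch.
TASKS = {
    "pos": {"labels": ["DET", "NOUN", "VERB"],
            "data": [("the", "DET"), ("cat", "NOUN"),
                     ("sat", "VERB"), ("paris", "NOUN")]},
    "ner": {"labels": ["O", "LOC"],
            "data": [("the", "O"), ("cat", "O"),
                     ("sat", "O"), ("paris", "LOC")]},
}

# Task-specific layers: one linear softmax classifier (no bias) per task.
task_weights = {
    name: [[random.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
           for _ in task["labels"]]
    for name, task in TASKS.items()
}


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_probs(task_name, word):
    x = shared_embeddings[word]
    scores = [sum(w_i * x_i for w_i, x_i in zip(row, x))
              for row in task_weights[task_name]]
    return softmax(scores)


def sgd_step(task_name, word, label, lr=0.2):
    """One cross-entropy SGD step: updates the task head AND the shared embedding."""
    labels = TASKS[task_name]["labels"]
    x = shared_embeddings[word]
    W = task_weights[task_name]
    probs = predict_probs(task_name, word)
    # Gradient of the loss w.r.t. the scores: probs - one_hot(label).
    grad_scores = [p - (1.0 if lab == label else 0.0)
                   for p, lab in zip(probs, labels)]
    # Gradient w.r.t. the shared embedding, computed before W is modified.
    grad_x = [sum(grad_scores[k] * W[k][i] for k in range(len(labels)))
              for i in range(EMBED_DIM)]
    for k in range(len(labels)):
        for i in range(EMBED_DIM):
            W[k][i] -= lr * grad_scores[k] * x[i]
    for i in range(EMBED_DIM):
        x[i] -= lr * grad_x[i]


def mean_loss(task_name):
    labels = TASKS[task_name]["labels"]
    data = TASKS[task_name]["data"]
    return sum(-math.log(predict_probs(task_name, w)[labels.index(y)])
               for w, y in data) / len(data)


def train(steps=3000):
    task_names = sorted(TASKS)
    for step in range(steps):
        name = task_names[step % len(task_names)]  # alternate between tasks
        word, label = random.choice(TASKS[name]["data"])
        sgd_step(name, word, label)


initial_losses = {name: mean_loss(name) for name in TASKS}
train()
final_losses = {name: mean_loss(name) for name in TASKS}
print(initial_losses, final_losses)
```

Because both heads update the same `shared_embeddings` entries, a step taken for one task changes the representation the other task sees, which is the mechanism the abstract refers to as weight-sharing.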

Published In

ICML '08: Proceedings of the 25th international conference on Machine learning

July 2008

1310 pages

ISBN: 9781605582054

DOI: 10.1145/1390156

General Chair: William Cohen (Carnegie Mellon University)

Program Chairs: Andrew McCallum (University of Massachusetts Amherst), Sam Roweis (University of Toronto and Google)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

Pascal
University of Helsinki
Xerox
Federation of Finnish Learned Societies
Google Inc.
NSF
Machine Learning Journal/Springer
Microsoft Research
Intel
Yahoo!
Helsinki Institute for Information Technology
IBM

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

ICML '08

Sponsor:

Microsoft Research
Intel
IBM

ICML '08: The 25th Annual International Conference on Machine Learning

July 5 - 9, 2008

Helsinki, Finland

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Article Metrics

Total Citations: 2,988
Total Downloads: 17,000
Downloads (Last 12 months): 697
Downloads (Last 6 weeks): 50

Reflects downloads up to 03 Feb 2025
