DOI: 10.1145/1390156.1390177
research-article

A unified architecture for natural language processing: deep neural networks with multitask learning

Published: 05 July 2008

Abstract

We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. The entire network is trained jointly on all these tasks using weight-sharing, an instance of multitask learning. All the tasks use labeled data except the language model, which is learnt from unlabeled text and represents a novel form of semi-supervised learning for the shared tasks. We show how both multitask learning and semi-supervised learning improve the generalization of the shared tasks, resulting in state-of-the-art performance.
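The abstract's core idea, one network whose lower layers are shared by several supervised tasks, can be illustrated with a deliberately tiny sketch. It is not the paper's architecture: the names `MultitaskModel`, `train` and `mean_loss`, the task labels "pos" and "chunk", the toy data and the round-robin task schedule are all assumptions made for illustration. The shared part is reduced to a single word-embedding lookup table feeding one linear softmax head per task, with no convolution and no language-model task.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MultitaskModel:
    """Hypothetical toy weight-sharing model: one word-embedding lookup
    table shared by every task, plus a task-specific linear softmax head."""

    def __init__(self, vocab_size, dim, labels_per_task, seed=0):
        rng = random.Random(seed)
        # Shared lookup table: every task reads and updates these rows.
        self.emb = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                    for _ in range(vocab_size)]
        # One output layer per task; only that task ever updates it.
        self.heads = {
            task: [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                   for _ in range(n_labels)]
            for task, n_labels in labels_per_task.items()
        }

    def probs(self, task, word):
        x = self.emb[word]
        logits = [sum(w * v for w, v in zip(row, x)) for row in self.heads[task]]
        return softmax(logits)

    def loss(self, task, word, label):
        # Negative log-likelihood of the gold label.
        return -math.log(self.probs(task, word)[label])

    def sgd_step(self, task, word, label, lr):
        x = self.emb[word]  # a reference, so updates below modify the table
        head = self.heads[task]
        p = self.probs(task, word)
        # Gradient of the NLL w.r.t. each logit: p_k - 1[k == label].
        g = [p_k - (1.0 if k == label else 0.0) for k, p_k in enumerate(p)]
        # Gradient w.r.t. the shared embedding, using the pre-update head.
        gx = [sum(g[k] * head[k][j] for k in range(len(head)))
              for j in range(len(x))]
        for k, row in enumerate(head):
            for j in range(len(row)):
                row[j] -= lr * g[k] * x[j]
        for j in range(len(x)):
            x[j] -= lr * gx[j]


def train(model, datasets, steps, lr=0.1, seed=0):
    rng = random.Random(seed)
    tasks = list(datasets)
    for step in range(steps):
        task = tasks[step % len(tasks)]           # select the next task
        word, label = rng.choice(datasets[task])  # random example for it
        model.sgd_step(task, word, label, lr)     # one gradient step


def mean_loss(model, datasets):
    losses = [model.loss(t, w, y) for t, data in datasets.items() for w, y in data]
    return sum(losses) / len(losses)


# Hypothetical toy data: 6 word ids, two tasks with different label sets.
datasets = {
    "pos": [(w, w % 2) for w in range(6)],
    "chunk": [(w, w % 3) for w in range(6)],
}

demo = MultitaskModel(vocab_size=6, dim=4, labels_per_task={"pos": 2, "chunk": 3})
print("mean loss before:", mean_loss(demo, datasets))
train(demo, datasets, steps=3000)
print("mean loss after:", mean_loss(demo, datasets))
```

Because both tasks push gradients into the same embedding rows, a training example from one task changes the word representations that the other task's head reads; that coupling is what weight sharing amounts to in this toy. The paper additionally learns a language model from unlabeled text through the shared layers, which this sketch leaves out.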

Information

Published In

ACM Other conferences
ICML '08: Proceedings of the 25th international conference on Machine learning
July 2008
1310 pages
ISBN:9781605582054
DOI:10.1145/1390156
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

  • Pascal
  • University of Helsinki
  • Xerox
  • Federation of Finnish Learned Societies
  • Google Inc.
  • NSF
  • Machine Learning Journal/Springer
  • Microsoft Research
  • Intel
  • Yahoo!
  • Helsinki Institute for Information Technology
  • IBM

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

ICML '08
Sponsors:
  • Microsoft Research
  • Intel
  • IBM

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 697
  • Downloads (last 6 weeks): 50
Reflects downloads up to 03 Feb 2025

Cited By

  • (2025) Augmenting LLMs to Securely Retrieve Information for Construction and Facility Management. Information, 16:2 (76). DOI: 10.3390/info16020076. Online publication date: 22-Jan-2025.
  • (2025) Debiased Device Sampling for Federated Edge Learning in Wireless Networks. IEEE Transactions on Mobile Computing, 24:2 (709-721). DOI: 10.1109/TMC.2024.3464740. Online publication date: Feb-2025.
  • (2025) Multitask Sequence-to-Sequence Learning for Preprocessing Medical Textstable. 2025 19th International Conference on Ubiquitous Information Management and Communication (IMCOM), (1-8). DOI: 10.1109/IMCOM64595.2025.10857530. Online publication date: 3-Jan-2025.
  • (2025) Enhancing Essay Scoring: An Analytical and Holistic Approach With Few-Shot Transformer-Based Models. IEEE Access, 13 (12483-12501). DOI: 10.1109/ACCESS.2025.3530272. Online publication date: 2025.
  • (2025) Using natural language processing to analyse text data in behavioural science. Nature Reviews Psychology. DOI: 10.1038/s44159-024-00392-z. Online publication date: 2-Jan-2025.
  • (2025) A dynamic decoder with speculative termination for low latency inference in spiking neural networks. Neurocomputing, 624 (129458). DOI: 10.1016/j.neucom.2025.129458. Online publication date: Apr-2025.
  • (2025) A comprehensive review of network pruning based on pruning granularity and pruning time perspectives. Neurocomputing, (129382). DOI: 10.1016/j.neucom.2025.129382. Online publication date: Jan-2025.
  • (2025) SMSMO: Learning to generate multimodal summary for scientific papers. Knowledge-Based Systems, 310 (112908). DOI: 10.1016/j.knosys.2024.112908. Online publication date: Feb-2025.
  • (2025) Complex network structural analysis based on information supplementation graph contrastive learning. Knowledge-Based Systems, 309 (112833). DOI: 10.1016/j.knosys.2024.112833. Online publication date: Jan-2025.
  • (2025) Deep learning-based data fusion for evaluating water dividing coefficients. Geoenergy Science and Engineering, 246 (213540). DOI: 10.1016/j.geoen.2024.213540. Online publication date: Mar-2025.