DOI: 10.5555/2390940.2390943

Deep neural network language models

Published: 08 June 2012

Abstract

In recent years, neural network language models (NNLMs) have shown gains in both perplexity and word error rate (WER) over conventional n-gram language models. Most NNLMs are trained with a single hidden layer. Deep neural networks (DNNs) with more hidden layers have been shown to capture higher-level discriminative information about input features and thus produce better networks. Motivated by the success of DNNs in acoustic modeling, we explore deep neural network language models (DNN LMs) in this paper. Results on a Wall Street Journal (WSJ) task demonstrate that DNN LMs offer improvements over a single-hidden-layer NNLM. Furthermore, our preliminary results are competitive with a Model M language model, considered one of the current state-of-the-art techniques for language modeling.
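For readers unfamiliar with the perplexity metric named in the abstract, the following is a minimal sketch of how it is commonly computed: the exponential of the average negative log-probability a model assigns to each word. The function name `perplexity` and the toy probability lists are illustrative assumptions and are not taken from the paper.

```python
import math


def perplexity(word_probs):
    """Return the perplexity of a word sequence.

    word_probs holds the probability the language model assigned to each
    word in the sequence (hypothetical values in this sketch).
    """
    if not word_probs:
        raise ValueError("need at least one word probability")
    # Average negative log-likelihood per word, using the natural log.
    avg_nll = -sum(math.log(p) for p in word_probs) / len(word_probs)
    # Exponentiating the average gives the perplexity.
    return math.exp(avg_nll)


# Toy comparison: a model that assigns higher probability to the observed
# words has lower perplexity.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
sharper = perplexity([0.5, 0.5, 0.5, 0.5])
```

Lower perplexity means the model is less "surprised" by the test text, which is why the metric is reported alongside WER when language models are compared.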

Published In

WLM '12: Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT
June 2012
68 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Research-article
