Deep neural network language models

Authors:

Tara N. Sainath,

Brian Kingsbury,

Bhuvana Ramabhadran

WLM '12: Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT

Pages 20 - 28

Published: 08 June 2012

Abstract

In recent years, neural network language models (NNLMs) have outperformed conventional n-gram language models in both perplexity and word error rate (WER). Most NNLMs are trained with one hidden layer. Deep neural networks (DNNs) with more hidden layers have been shown to capture higher-level discriminative information about input features, and thus produce better networks. Motivated by the success of DNNs in acoustic modeling, we explore deep neural network language models (DNN LMs) in this paper. Results on a Wall Street Journal (WSJ) task demonstrate that DNN LMs offer improvements over a single hidden layer NNLM. Furthermore, our preliminary results are competitive with a Model M language model, considered to be one of the current state-of-the-art techniques for language modeling.
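To make the contrast between a single-hidden-layer NNLM and a DNN LM concrete, the following is a minimal sketch of a feed-forward language model whose depth is a parameter, together with a perplexity computation. It is not the authors' implementation: the class name `FeedForwardLM`, the `tanh` activation, the layer sizes, and the random untrained weights are all assumptions chosen for illustration, and the abstract does not specify the architecture or training procedure used in the paper.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, bias) with small random weights; values are illustrative only."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(layer, x):
    """Compute W x + b for one layer."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class FeedForwardLM:
    """Hypothetical feed-forward LM: num_hidden_layers=1 is a shallow NNLM, >1 is a deeper one."""

    def __init__(self, vocab_size, context_size, embed_dim, hidden_dim,
                 num_hidden_layers, seed=0):
        rng = random.Random(seed)
        # One embedding vector per vocabulary word.
        self.embeddings = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                           for _ in range(vocab_size)]
        sizes = [context_size * embed_dim] + [hidden_dim] * num_hidden_layers
        self.hidden = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        self.output = make_layer(sizes[-1], vocab_size, rng)

    def next_word_probs(self, context):
        """Return P(w | context) over the whole vocabulary for a list of word ids."""
        # Concatenate the embeddings of the context words.
        x = [v for word_id in context for v in self.embeddings[word_id]]
        for layer in self.hidden:
            x = [math.tanh(v) for v in affine(layer, x)]  # tanh is an assumption
        return softmax(affine(self.output, x))


def perplexity(model, word_ids, context_size):
    """exp of the average negative log-probability of each predicted word."""
    log_prob = 0.0
    count = 0
    for i in range(context_size, len(word_ids)):
        probs = model.next_word_probs(word_ids[i - context_size:i])
        log_prob += math.log(probs[word_ids[i]])
        count += 1
    return math.exp(-log_prob / count)
```

In this sketch, calling `FeedForwardLM(..., num_hidden_layers=1)` and `FeedForwardLM(..., num_hidden_layers=3)` builds a shallow and a deeper model over the same inputs. Because the weights are untrained, `perplexity` will sit near the vocabulary size for both; any real comparison would require training each model on the same data.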


Publisher

Association for Computational Linguistics

United States
