Abstract
We introduce both joint training of neural higher-order linear-chain conditional random fields (NHO-LC-CRFs) and a new structured regularizer for sequence modelling. We show that this regularizer can be derived as a lower bound on the likelihood of a mixture of models sharing parts, e.g. neural sub-networks, and relate it to ensemble learning. Furthermore, it can be expressed explicitly as a regularization term in the training objective.
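To sketch the idea behind such a bound (in our own notation, not necessarily the paper's): for a mixture of \(K\) models \(p_k(y \mid x)\) with weights \(w_k\) that share parameters, Jensen's inequality applied to the concave logarithm gives a tractable lower bound on the mixture log-likelihood:

\[
\log \sum_{k=1}^{K} w_k\, p_k(y \mid x) \;\ge\; \sum_{k=1}^{K} w_k \log p_k(y \mid x).
\]

Maximizing the right-hand side trains all component models jointly through their shared parts; the gap between the two sides can then be read as an explicit regularization term added to the training objective.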
We exemplify its effectiveness by exploring the introduced NHO-LC-CRFs for sequence labeling. Higher-order LC-CRFs with linear factors are well established for this task, but they lack the ability to model non-linear dependencies. These non-linear dependencies can, however, be modeled efficiently by neural higher-order input-dependent factors. Experimental results for phoneme classification with NHO-LC-CRFs confirm this, and we achieve a state-of-the-art phoneme error rate of \(16.7\%\) on TIMIT using the new structured regularizer.
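As an illustration of the model class (a hedged sketch; the symbols are ours), a higher-order linear-chain CRF with neural factors scores label \(M\)-grams using an input-dependent neural network \(f_\theta\):

\[
p(y_{1:T} \mid x) \;=\; \frac{1}{Z(x)} \exp\!\Big( \sum_{t=1}^{T} f_\theta\big(y_{t-M+1}, \ldots, y_t,\, x,\, t\big) \Big),
\]

where \(Z(x)\) sums the exponentiated scores over all label sequences. For fixed order \(M\), inference remains tractable via a higher-order forward-backward recursion, while \(f_\theta\) captures the non-linear input dependencies that linear factors cannot.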
This work was supported by the Austrian Science Fund (FWF) under project number P25244-N15. We also thank NVIDIA for providing GPU computing resources.
© 2015 Springer International Publishing Switzerland
Cite this paper
Ratajczak, M., Tschiatschek, S., Pernkopf, F. (2015). Structured Regularizer for Neural Higher-Order Sequence Models. In: Appice, A., Rodrigues, P., Santos Costa, V., Soares, C., Gama, J., Jorge, A. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science(), vol 9284. Springer, Cham. https://doi.org/10.1007/978-3-319-23528-8_11
Print ISBN: 978-3-319-23527-1
Online ISBN: 978-3-319-23528-8