Using continuous features in the maximum entropy model

Authors:

Alex Acero

Pattern Recognition Letters, Volume 30, Issue 14

Pages 1295 - 1300

https://doi.org/10.1016/j.patrec.2009.06.005

Published: 01 October 2009

Abstract

We investigate the problem of using continuous features in the maximum entropy (MaxEnt) model. We explain why the MaxEnt model with the moment constraint (MaxEnt-MC) works well with binary features but not with continuous features. We describe how to strengthen the constraints on continuous features and show that the weights associated with continuous features should be continuous functions rather than single values. We propose a spline-based solution to the MaxEnt model with non-linear continuous weighting functions and show that the optimization problem can be converted into a standard log-linear model in a higher-dimensional space. We report empirical results on two classification tasks that contain continuous features. The results confirm our insight and show that the proposed solution consistently outperforms both the MaxEnt-MC model and the bucketing approach by significant margins.
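
The conversion to a higher-dimensional log-linear model can be illustrated with a minimal sketch. The abstract does not give the exact spline form, so this example swaps in piecewise-linear interpolation (a first-order spline) over hypothetical knots: if the weighting function is interpolated from K knot values, then the term weight(f) * f is linear in those K values, and the single continuous feature f can be replaced by K expanded features. The names `hat_basis`, `expand_feature`, `knots`, and `knot_weights` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def hat_basis(f, knots):
    """Piecewise-linear interpolation weights a_k(f) of each value in f on the knots."""
    f = np.atleast_1d(np.asarray(f, dtype=float))
    knots = np.asarray(knots, dtype=float)
    # Interpolating the k-th unit vector evaluates the k-th "hat" function at f.
    return np.stack([np.interp(f, knots, e) for e in np.eye(len(knots))], axis=1)

def expand_feature(f, knots):
    """Map one continuous feature f to K features a_k(f) * f."""
    f = np.atleast_1d(np.asarray(f, dtype=float))
    return hat_basis(f, knots) * f[:, None]

# Hypothetical setup: a feature scaled to [0, 1] and 5 evenly spaced knots.
knots = np.linspace(0.0, 1.0, 5)
f = np.array([0.1, 0.35, 0.8])
# Hypothetical knot values of the weighting function (these would be learned).
knot_weights = np.array([2.0, -1.0, 0.5, 0.0, 1.5])

expanded = expand_feature(f, knots)              # shape (3, 5)
score_expanded = expanded @ knot_weights         # ordinary linear score in 5 dimensions
score_direct = np.interp(f, knots, knot_weights) * f  # weight(f) * f computed directly

print(np.allclose(score_expanded, score_direct))
```

Because the two scores agree, a standard log-linear trainer applied to the expanded features would be fitting the knot values of a non-linear weighting function. A smoother basis such as a cubic spline would follow the same pattern with a different `hat_basis`.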

Published In

Pattern Recognition Letters, Volume 30, Issue 14 (October 2009), 95 pages

ISSN: 0167-8655

Copyright © 2009 Elsevier B.V.

Publisher: Elsevier Science Inc., United States
