Abstract
We augment the naive Bayes model with an n-gram language model to address two shortcomings of naive Bayes text classifiers. The chain augmented naive Bayes classifiers we propose have two advantages over standard naive Bayes classifiers. First, a chain augmented naive Bayes model relaxes some of the independence assumptions of naive Bayes, allowing a local Markov chain dependence in the observed variables, while still permitting efficient inference and learning. Second, smoothing techniques from statistical language modeling can be used to obtain better estimates than the Laplace smoothing usually used in naive Bayes classification. Our experimental results on three real-world data sets show substantial improvements over standard naive Bayes classification, with state-of-the-art performance that competes with the best known methods on these data sets.
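The core idea can be sketched in a few lines: train one n-gram language model per class and label a document with the class c that maximizes log P(c) + Σ log P(w_i | w_{i-1}, c). If the bigram term is replaced by a unigram P(w_i | c), this reduces to ordinary multinomial naive Bayes. This is a minimal sketch, not the authors' implementation. It assumes word-level tokens, a bigram chain (n = 2), and Witten-Bell interpolation with an add-one unigram as the lower-order model. The abstract does not specify these choices, and the names `BigramLM` and `ChainAugmentedNB` are hypothetical.

```python
import math
from collections import Counter, defaultdict


class BigramLM:
    """Per-class word bigram model (assumed: Witten-Bell interpolation)."""

    def __init__(self, docs, vocab_size):
        self.vocab_size = vocab_size          # shared vocabulary size incl. one unknown-word slot
        self.unigrams = Counter()
        self.bigrams = defaultdict(Counter)   # history word -> counts of following words
        for tokens in docs:
            padded = ["<s>"] + tokens         # <s> marks the document start
            for prev, cur in zip(padded, padded[1:]):
                self.unigrams[cur] += 1
                self.bigrams[prev][cur] += 1
        self.total = sum(self.unigrams.values())

    def unigram_prob(self, w):
        # Add-one (Laplace) unigram, used only as the lower-order distribution.
        return (self.unigrams[w] + 1) / (self.total + self.vocab_size)

    def prob(self, prev, w):
        followers = self.bigrams.get(prev)    # .get avoids creating empty entries
        p_uni = self.unigram_prob(w)
        if not followers:
            return p_uni                      # unseen history: back off entirely
        n = sum(followers.values())           # tokens observed after `prev`
        t = len(followers)                    # distinct word types observed after `prev`
        lam = n / (n + t)                     # Witten-Bell weight on the bigram estimate
        return lam * followers[w] / n + (1 - lam) * p_uni


class ChainAugmentedNB:
    """Naive Bayes whose class-conditional model is a bigram Markov chain."""

    def __init__(self, labeled_docs):
        vocab = {w for tokens, _ in labeled_docs for w in tokens}
        by_class = defaultdict(list)
        for tokens, label in labeled_docs:
            by_class[label].append(tokens)
        n_docs = len(labeled_docs)
        self.log_prior = {c: math.log(len(d) / n_docs) for c, d in by_class.items()}
        self.models = {c: BigramLM(d, len(vocab) + 1) for c, d in by_class.items()}

    def score(self, tokens, c):
        # log P(c) + sum_i log P(w_i | w_{i-1}, c)
        lm = self.models[c]
        padded = ["<s>"] + tokens
        return self.log_prior[c] + sum(
            math.log(lm.prob(p, w)) for p, w in zip(padded, padded[1:])
        )

    def predict(self, tokens):
        return max(self.models, key=lambda c: self.score(tokens, c))


# Toy usage with hypothetical training data.
train = [
    ("the match ended in a draw".split(), "sports"),
    ("the team won the match".split(), "sports"),
    ("the market fell sharply today".split(), "finance"),
    ("stocks and the market rallied".split(), "finance"),
]
clf = ChainAugmentedNB(train)
print(clf.predict("the team won".split()))  # → sports
```

Because the smoothed bigram estimate mixes in the unigram distribution, every word receives nonzero probability. This is the role that Laplace smoothing plays in standard naive Bayes. Other language-model smoothing schemes (for example, absolute discounting) could be substituted in `BigramLM.prob` without changing the classifier.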
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Peng, F., Schuurmans, D. (2003). Combining Naive Bayes and n-Gram Language Models for Text Classification. In: Sebastiani, F. (ed.) Advances in Information Retrieval. ECIR 2003. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36618-0_24
DOI: https://doi.org/10.1007/3-540-36618-0_24
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-01274-0
Online ISBN: 978-3-540-36618-8
eBook Packages: Springer Book Archive