Generating Estimates of Classification Confidence for a Case-Based Spam Filter

Delany, Sarah Jane; Cunningham, Pádraig; Doyle, Dónal; Zamolotskikh, Anton

doi:10.1007/11536406_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3620)

Included in the following conference series:

International Conference on Case-Based Reasoning

Abstract

Producing estimates of classification confidence is surprisingly difficult. One might expect that classifiers that can produce numeric classification scores (e.g. k-Nearest Neighbour, Naïve Bayes or Support Vector Machines) could readily produce confidence estimates based on thresholds. In fact, this proves not to be the case, probably because these are not probabilistic classifiers in the strict sense. The numeric scores coming from k-Nearest Neighbour, Naïve Bayes and Support Vector Machine classifiers are not well correlated with classification confidence. In this paper we describe a case-based spam filtering application that would benefit significantly from an ability to attach confidence predictions to positive classifications (i.e. messages classified as spam). We show that ‘obvious’ confidence metrics for a case-based classifier are not effective. We propose an ensemble-like solution that aggregates a collection of confidence metrics and show that this offers an effective solution in this spam filtering domain.
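The idea of aggregating several confidence metrics for a positive (spam) classification can be illustrated with a minimal sketch. The abstract does not specify the individual metrics, their thresholds, or the aggregation rule, so everything below is an assumption made for illustration: the `Neighbour` type, the three measures, the threshold values, and the unanimous-agreement rule are all hypothetical and should not be read as the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Neighbour:
    """One retrieved case: its similarity to the target message and its label."""
    similarity: float  # assumed to lie in [0, 1]
    is_spam: bool


def avg_spam_similarity(neighbours: Sequence[Neighbour]) -> float:
    """Mean similarity of the spam neighbours (0.0 if there are none)."""
    spam = [n.similarity for n in neighbours if n.is_spam]
    return sum(spam) / len(spam) if spam else 0.0


def spam_similarity_share(neighbours: Sequence[Neighbour]) -> float:
    """Share of the total similarity mass contributed by spam neighbours."""
    total = sum(n.similarity for n in neighbours)
    spam = sum(n.similarity for n in neighbours if n.is_spam)
    return spam / total if total > 0 else 0.0


def spam_vote_fraction(neighbours: Sequence[Neighbour]) -> float:
    """Fraction of the neighbours that are labelled spam."""
    if not neighbours:
        return 0.0
    return sum(n.is_spam for n in neighbours) / len(neighbours)


Measure = Callable[[Sequence[Neighbour]], float]

# Hypothetical (measure, threshold) pairs; real thresholds would have to be
# tuned on held-out data rather than chosen by hand.
MEASURES: Tuple[Tuple[Measure, float], ...] = (
    (avg_spam_similarity, 0.6),
    (spam_similarity_share, 0.8),
    (spam_vote_fraction, 1.0),
)


def is_confident_spam(
    neighbours: Sequence[Neighbour],
    measures: Sequence[Tuple[Measure, float]] = MEASURES,
) -> bool:
    """Flag a spam prediction as confident only if every measure clears its
    threshold (an assumed, deliberately conservative aggregation rule)."""
    return all(measure(neighbours) >= threshold for measure, threshold in measures)


if __name__ == "__main__":
    clear = [Neighbour(0.9, True), Neighbour(0.8, True), Neighbour(0.7, True)]
    mixed = [Neighbour(0.9, True), Neighbour(0.85, False), Neighbour(0.6, True)]
    print(is_confident_spam(clear))
    print(is_confident_spam(mixed))
```

Requiring every measure to agree trades coverage for precision: fewer spam predictions are marked confident, which suits a setting where a confidently flagged legitimate message is the costly error. Other aggregation rules (majority vote, weighted sums) would fit the same interface.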

Author information

Authors and Affiliations

Dublin Institute of Technology, Kevin Street, Dublin 8, Ireland
Sarah Jane Delany
Trinity College, University of Dublin, Dublin 2, Ireland
Pádraig Cunningham, Dónal Doyle & Anton Zamolotskikh

Editor information

Editors and Affiliations

Department of Computer Science & Engineering, Lehigh University, Bethlehem, PA 18015
Héctor Muñoz-Ávila
Free University of Bozen-Bolzano
Francesco Ricci

About this paper

Cite this paper

Delany, S.J., Cunningham, P., Doyle, D., Zamolotskikh, A. (2005). Generating Estimates of Classification Confidence for a Case-Based Spam Filter. In: Muñoz-Ávila, H., Ricci, F. (eds) Case-Based Reasoning Research and Development. ICCBR 2005. Lecture Notes in Computer Science, vol 3620. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11536406_16

DOI: https://doi.org/10.1007/11536406_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28174-0
Online ISBN: 978-3-540-31855-2
eBook Packages: Computer Science, Computer Science (R0)
