Abstract
Producing estimates of classification confidence is surprisingly difficult. One might expect that classifiers that can produce numeric classification scores (e.g. k-Nearest Neighbour, Naïve Bayes or Support Vector Machines) could readily produce confidence estimates based on thresholds. In fact, this proves not to be the case, probably because these are not probabilistic classifiers in the strict sense. The numeric scores coming from k-Nearest Neighbour, Naïve Bayes and Support Vector Machine classifiers are not well correlated with classification confidence. In this paper we describe a case-based spam filtering application that would benefit significantly from an ability to attach confidence predictions to positive classifications (i.e. messages classified as spam). We show that ‘obvious’ confidence metrics for a case-based classifier are not effective. We propose an ensemble-like solution that aggregates a collection of confidence metrics and show that this offers an effective solution in this spam filtering domain.
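The idea of aggregating several confidence measures can be illustrated with a minimal sketch. It is not the authors' implementation: the three measures, their thresholds, the unanimity rule, the Jaccard similarity and the toy case base are all assumptions chosen for illustration. Each hypothetical measure scores a spam prediction from the k nearest neighbours, and the prediction is flagged as confident only when every measure meets its own threshold.

```python
def jaccard(a, b):
    """Similarity between two token sets (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbours(case_base, query, k):
    """Return the k most similar (similarity, label) pairs, most similar first."""
    scored = [(jaccard(features, query), label) for features, label in case_base]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Hypothetical confidence measures; each scores a 'spam' prediction in [0, 1].
def vote_share(nbrs):
    """Fraction of the neighbours labelled spam."""
    return sum(1 for _, label in nbrs if label == "spam") / len(nbrs)


def similarity_share(nbrs):
    """Fraction of the total neighbour similarity contributed by spam cases."""
    total = sum(sim for sim, _ in nbrs)
    spam = sum(sim for sim, label in nbrs if label == "spam")
    return spam / total if total else 0.0


def nearest_is_spam(nbrs):
    """1.0 if the single nearest neighbour is spam, otherwise 0.0."""
    return 1.0 if nbrs[0][1] == "spam" else 0.0


# Illustrative thresholds (assumed, not taken from the paper).
MEASURES = [(vote_share, 1.0), (similarity_share, 0.9), (nearest_is_spam, 1.0)]


def confident_spam(nbrs, measures=MEASURES):
    """Flag a spam prediction as confident only if every measure agrees."""
    return all(measure(nbrs) >= threshold for measure, threshold in measures)


# Toy case base: (set of tokens, label).
case_base = [
    ({"cheap", "pills", "offer"}, "spam"),
    ({"cheap", "offer", "now"}, "spam"),
    ({"meeting", "agenda", "monday"}, "ham"),
    ({"offer", "letter", "agenda"}, "ham"),
    ({"win", "cheap", "pills"}, "spam"),
]

clear_query = {"cheap", "pills", "offer", "now"}
borderline_query = {"offer", "agenda", "cheap"}

clear_nbrs = neighbours(case_base, clear_query, k=3)
borderline_nbrs = neighbours(case_base, borderline_query, k=3)
```

In this toy setup both queries would be classified as spam by a majority vote, but only the first passes every measure. Requiring agreement among measures trades coverage for fewer false positives among the messages flagged as confidently spam.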
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Delany, S.J., Cunningham, P., Doyle, D., Zamolotskikh, A. (2005). Generating Estimates of Classification Confidence for a Case-Based Spam Filter. In: Muñoz-Ávila, H., Ricci, F. (eds) Case-Based Reasoning Research and Development. ICCBR 2005. Lecture Notes in Computer Science, vol 3620. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11536406_16
DOI: https://doi.org/10.1007/11536406_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28174-0
Online ISBN: 978-3-540-31855-2
eBook Packages: Computer Science (R0)