Generating Estimates of Classification Confidence for a Case-Based Spam Filter

  • Conference paper
Case-Based Reasoning Research and Development (ICCBR 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3620)

Abstract

Producing estimates of classification confidence is surprisingly difficult. One might expect that classifiers that can produce numeric classification scores (e.g. k-Nearest Neighbour, Naïve Bayes or Support Vector Machines) could readily produce confidence estimates based on thresholds. In fact, this proves not to be the case, probably because these are not probabilistic classifiers in the strict sense. The numeric scores coming from k-Nearest Neighbour, Naïve Bayes and Support Vector Machine classifiers are not well correlated with classification confidence. In this paper we describe a case-based spam filtering application that would benefit significantly from an ability to attach confidence predictions to positive classifications (i.e. messages classified as spam). We show that ‘obvious’ confidence metrics for a case-based classifier are not effective. We propose an ensemble-like solution that aggregates a collection of confidence metrics and show that this offers an effective solution in this spam filtering domain.
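The idea of aggregating several confidence metrics for a case-based (k-Nearest Neighbour) classifier can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the Jaccard similarity over token sets, the two measures (agreement and similarity_share), the threshold values, and the default rule that a spam prediction counts as confident only when every measure passes its threshold.

```python
from typing import Callable, FrozenSet, Iterable, List, Sequence, Tuple

Case = Tuple[FrozenSet[str], str]      # (token set, label); label is "spam" or "ham"
Neighbour = Tuple[float, str]          # (similarity to the query, label)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Token-set similarity (an illustrative choice, not the paper's)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbours(query: FrozenSet[str], case_base: Sequence[Case], k: int) -> List[Neighbour]:
    """Return the k most similar cases as (similarity, label) pairs."""
    scored = [(jaccard(query, tokens), label) for tokens, label in case_base]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def agreement(nbrs: Sequence[Neighbour]) -> float:
    """Hypothetical measure 1: fraction of retrieved neighbours labelled spam."""
    return sum(1 for _, label in nbrs if label == "spam") / len(nbrs)


def similarity_share(nbrs: Sequence[Neighbour]) -> float:
    """Hypothetical measure 2: share of total neighbour similarity held by spam cases."""
    total = sum(sim for sim, _ in nbrs)
    spam = sum(sim for sim, label in nbrs if label == "spam")
    return spam / total if total else 0.0


def confident_spam(
    query: FrozenSet[str],
    case_base: Sequence[Case],
    k: int = 3,
    thresholds: Tuple[float, float] = (1.0, 0.9),   # assumed values
    combine: Callable[[Iterable[bool]], bool] = all,  # assumed aggregation rule
) -> bool:
    """Aggregate the per-measure decisions into one confidence flag."""
    nbrs = neighbours(query, case_base, k)
    if not nbrs:
        return False  # nothing retrieved, so no basis for confidence
    checks = [
        agreement(nbrs) >= thresholds[0],
        similarity_share(nbrs) >= thresholds[1],
    ]
    return combine(checks)


CASE_BASE: List[Case] = [
    (frozenset({"cheap", "pills", "offer"}), "spam"),
    (frozenset({"cheap", "offer", "win"}), "spam"),
    (frozenset({"win", "prize", "offer"}), "spam"),
    (frozenset({"meeting", "agenda", "monday"}), "ham"),
    (frozenset({"lunch", "monday", "offer"}), "ham"),
]

clear_spam = frozenset({"cheap", "offer", "pills", "win"})
borderline = frozenset({"monday", "offer", "win"})
```

With this toy case base, confident_spam(clear_spam, CASE_BASE) returns True because all three nearest neighbours are spam, while confident_spam(borderline, CASE_BASE) returns False because one of the three nearest neighbours is a legitimate message. Passing combine=any would instead accept a prediction as confident when at least one measure passes, which trades fewer missed confident predictions for a higher risk of wrongly trusted ones.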


Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Delany, S.J., Cunningham, P., Doyle, D., Zamolotskikh, A. (2005). Generating Estimates of Classification Confidence for a Case-Based Spam Filter. In: Muñoz-Ávila, H., Ricci, F. (eds) Case-Based Reasoning Research and Development. ICCBR 2005. Lecture Notes in Computer Science, vol 3620. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11536406_16

  • DOI: https://doi.org/10.1007/11536406_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28174-0

  • Online ISBN: 978-3-540-31855-2

  • eBook Packages: Computer Science, Computer Science (R0)