An Assessment of Case-Based Reasoning for Spam Filtering

Delany, Sarah Jane; Cunningham, Pádraig; Coyle, Lorcan

doi:10.1007/s10462-005-9006-6

Published: 17 November 2005

Volume 24, pages 359–378 (2005)

Artificial Intelligence Review

Abstract

Because of the changing nature of spam, a spam filtering system that uses machine learning will need to be dynamic. This suggests that a case-based (memory-based) approach may work well. Case-Based Reasoning (CBR) is a lazy approach to machine learning where induction is delayed to run time. This means that the case base can be updated continuously and new training data is immediately available to the induction process. In this paper we present a detailed description of such a system called ECUE and evaluate design decisions concerning the case representation. We compare its performance with an alternative system that uses Naïve Bayes. We find that there is little to choose between the two alternatives in cross-validation tests on data sets. However, ECUE does appear to have some advantages in tracking concept drift over time.
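
The lazy-learning idea in the abstract can be illustrated with a minimal sketch. This is not the ECUE system: the class name `CaseBase`, the whitespace tokenizer, the Jaccard similarity measure, and the majority vote over k neighbours are all assumptions chosen for brevity. The point it shows is only that "training" amounts to storing a case, so a newly added example influences the very next classification.

```python
from collections import Counter


def tokenize(text):
    """Lower-case a message and return its set of whitespace-separated tokens."""
    return set(text.lower().split())


class CaseBase:
    """Toy lazy learner (hypothetical, not ECUE): stores cases, defers work to query time."""

    def __init__(self, k=3):
        self.k = k
        self.cases = []  # list of (token_set, label) pairs

    def add_case(self, text, label):
        # Adding a case is pure storage, so it is usable immediately.
        self.cases.append((tokenize(text), label))

    def classify(self, text):
        if not self.cases:
            raise ValueError("case base is empty")
        query = tokenize(text)

        def similarity(case):
            # Jaccard similarity between the query tokens and a stored case.
            tokens, _ = case
            union = query | tokens
            return len(query & tokens) / len(union) if union else 0.0

        # Retrieve the k most similar cases and take a majority vote.
        nearest = sorted(self.cases, key=similarity, reverse=True)[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


case_base = CaseBase(k=1)
case_base.add_case("cheap pills buy now", "spam")
case_base.add_case("meeting agenda for monday", "ham")
case_base.add_case("lunch is free on friday", "ham")

message = "claim your free lottery prize"
print(case_base.classify(message))   # nearest stored case is a legitimate email

# A new style of spam appears; storing one example updates the filter at once.
case_base.add_case("free lottery prize winner", "spam")
print(case_base.classify(message))
```

In this toy run the message is first matched to a legitimate email that happens to share the word "free", and is classified as spam once a single example of the new spam style has been stored. No retraining step is involved, which is the property the abstract relies on for handling changing spam.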

Author information

Authors and Affiliations

Dublin Institute of Technology, Kevin Street, Dublin 8, Ireland
Sarah Jane Delany & Pádraig Cunningham
University College Dublin, Belfield, Dublin 4, Ireland
Lorcan Coyle

Corresponding author

Correspondence to Sarah Jane Delany.

Additional information

This research was supported by funding from Enterprise Ireland under grant no. CFTD/03/219 and funding from Science Foundation Ireland under grant no. SFI-02IN.1I111.

About this article

Cite this article

Delany, S.J., Cunningham, P. & Coyle, L. An Assessment of Case-Based Reasoning for Spam Filtering. Artif Intell Rev 24, 359–378 (2005). https://doi.org/10.1007/s10462-005-9006-6

Issue Date: November 2005
DOI: https://doi.org/10.1007/s10462-005-9006-6
