On Effective E-mail Classification via Neural Networks

Cui, Bin; Mondal, Anirban; Shen, Jialie; Cong, Gao; Tan, Kian-Lee

doi:10.1007/11546924_9

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3588)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

To address the growing problem of junk E-mail on the Internet, this paper proposes an effective E-mail classification and cleansing method. E-mail messages can be modelled as semi-structured documents consisting of a set of fields with pre-defined semantics and a number of variable-length free-text fields. The proposed method uses both kinds of field to obtain higher accuracy. The main contributions of this work are two-fold. First, we present a new model based on the Neural Network (NN) for classifying personal E-mails. In particular, we treat E-mail files as a particular kind of plain-text file, which implies that the feature set is relatively large (since there are thousands of different terms across E-mail files). Second, we propose the use of Principal Component Analysis (PCA) as a preprocessor for the NN to reduce the data in both size and dimensionality, so that the input data become more classifiable and the training of the NN model converges faster. The results of our performance evaluation demonstrate that the proposed algorithm is effective in performing filtering with reasonable accuracy.
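
The pipeline outlined in the abstract (term features, then PCA, then a neural network) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the six toy messages, the whitespace tokenizer, the power-iteration PCA, the network size (one tanh hidden layer of three units) and all function names are hypothetical, and only the Python standard library is used.

```python
import math
import random
from collections import Counter


def term_matrix(docs):
    """Bag-of-words counts: one row per message, one column per distinct term."""
    vocab = sorted({t for d in docs for t in d.lower().split()})
    rows = [[float(Counter(d.lower().split())[t]) for t in vocab] for d in docs]
    return rows, vocab


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pca_fit(rows, k, iters=100, seed=0):
    """Top-k principal directions by power iteration with deflation."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            # w = X^T (X v): applies the scatter matrix without forming it.
            proj = [dot(r, v) for r in centered]
            w = [sum(p * r[j] for p, r in zip(proj, centered)) for j in range(d)]
            for c in components:  # deflate directions already found
                s = dot(w, c)
                w = [a - s * b for a, b in zip(w, c)]
            norm = math.sqrt(dot(w, w))
            v = [a / norm for a in w]
        components.append(v)
    return mean, components


def pca_transform(rows, mean, components):
    """Project centered rows onto the principal directions."""
    centered = [[x - m for x, m in zip(r, mean)] for r in rows]
    return [[dot(r, c) for c in components] for r in centered]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(xs, ys, hidden=3, epochs=500, lr=0.2, seed=0):
    """One tanh hidden layer, one sigmoid output, per-sample backpropagation."""
    rng = random.Random(seed)
    d = len(xs[0])
    # The last entry of each weight vector is the bias.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(d + 1)] for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h = [math.tanh(dot(w[:-1], x) + w[-1]) for w in w1]
            out = sigmoid(dot(w2[:-1], h) + w2[-1])
            delta_out = out - y  # cross-entropy gradient at the output pre-activation
            for j in range(hidden):
                delta_h = delta_out * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * delta_out * h[j]
                for i in range(d):
                    w1[j][i] -= lr * delta_h * x[i]
                w1[j][-1] -= lr * delta_h
            w2[-1] -= lr * delta_out
    return w1, w2


def predict(model, x):
    """Return the network's estimated probability that x is junk."""
    w1, w2 = model
    h = [math.tanh(dot(w[:-1], x) + w[-1]) for w in w1]
    return sigmoid(dot(w2[:-1], h) + w2[-1])


# Hypothetical toy corpus: 1 = junk, 0 = legitimate.
docs = [
    "win free money now", "free money offer", "win free prize now",
    "project meeting at noon", "meeting notes for project",
    "project report due at noon",
]
labels = [1, 1, 1, 0, 0, 0]

rows, vocab = term_matrix(docs)
mean, comps = pca_fit(rows, k=2)
reduced = pca_transform(rows, mean, comps)
model = train_mlp(reduced, labels)
scores = [predict(model, x) for x in reduced]
print(len(vocab), "terms reduced to", len(reduced[0]), "features")
```

In this sketch the network receives two PCA coordinates per message instead of one input per vocabulary term, which is the sense in which PCA acts as a preprocessor: the input layer shrinks and there are fewer weights to train. A practical system would more likely rely on an established numerical library and would evaluate on held-out messages rather than the training set.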

Author information

Authors and Affiliations

Singapore-MIT Alliance, National University of Singapore
Bin Cui & Kian-Lee Tan
University of Tokyo, Japan
Anirban Mondal
University of New South Wales, Australia
Jialie Shen
The University of Edinburgh, UK
Gao Cong

Editor information

Editors and Affiliations

Copenhagen Business School, Centre for Applied ICT, 60 Howitzvej, 2000, Frederiksberg, DK
Kim Viborg Andersen
University Of Technology Sydney, NSW 2007, Australia
John Debenham
University of Linz, Altenbergerstraße 69, 4040, Linz, Austria
Roland Wagner

About this paper

Cite this paper

Cui, B., Mondal, A., Shen, J., Cong, G., Tan, KL. (2005). On Effective E-mail Classification via Neural Networks. In: Andersen, K.V., Debenham, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2005. Lecture Notes in Computer Science, vol 3588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546924_9

DOI: https://doi.org/10.1007/11546924_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28566-3
Online ISBN: 978-3-540-31729-6
eBook Packages: Computer Science, Computer Science (R0)
