On Effective E-mail Classification via Neural Networks

  • Conference paper
Database and Expert Systems Applications (DEXA 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3588)

Abstract

To address the growing problem of junk E-mail on the Internet, this paper proposes an effective E-mail classifying and cleansing method. E-mail messages can be modelled as semi-structured documents consisting of a set of fields with pre-defined semantics and a number of variable-length free-text fields. Our proposed method handles both kinds of field in order to obtain higher accuracy. The main contributions of this work are two-fold. First, we present a new model based on the Neural Network (NN) for classifying personal E-mails. In particular, we treat E-mail files as a particular kind of plain-text file, which implies that our feature set is relatively large, since there are thousands of different terms across different E-mail files. Second, we propose the use of Principal Component Analysis (PCA) as a preprocessor for the NN to reduce the data in both size and dimensionality, so that the input data become more classifiable and the training process of the NN model converges faster. The results of our performance evaluation demonstrate that the proposed algorithm is effective in performing filtering with reasonable accuracy.
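The PCA-then-NN pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy term-count matrix, the choice of two principal components, the single tanh hidden layer, the learning rate, and every function name below are assumptions made purely for illustration. The sketch only shows the general idea of projecting high-dimensional term features onto a few principal components before training a small network.

```python
import numpy as np

# Hypothetical toy term-count matrix: 6 messages x 5 terms (made-up data).
X = np.array([
    [3, 2, 0, 0, 0],
    [2, 3, 1, 0, 0],
    [3, 3, 0, 1, 0],
    [0, 0, 1, 3, 2],
    [0, 1, 0, 2, 3],
    [0, 0, 0, 3, 3],
], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0], dtype=float)  # 1 = junk, 0 = legitimate


def pca_fit(X, k):
    """Return the feature mean and the top-k principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def pca_transform(X, mean, components):
    """Project centred data onto the principal directions."""
    return (X - mean) @ components.T


def forward(Z, params):
    """One tanh hidden layer followed by a sigmoid output unit."""
    W1, b1, W2, b2 = params
    H = np.tanh(Z @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))
    return H, p


def train(Z, y, hidden=4, lr=0.1, epochs=2000, seed=0):
    """Full-batch gradient descent on mean cross-entropy (assumed settings)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (Z.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    for _ in range(epochs):
        H, p = forward(Z, (W1, b1, W2, b2))
        g = (p - y) / len(y)                   # gradient w.r.t. output logits
        gH = np.outer(g, W2) * (1.0 - H ** 2)  # back-propagate through tanh
        W2 = W2 - lr * (H.T @ g)
        b2 = b2 - lr * g.sum()
        W1 = W1 - lr * (Z.T @ gH)
        b1 = b1 - lr * gH.sum(axis=0)
    return W1, b1, W2, b2


mean, components = pca_fit(X, k=2)        # reduce 5 term features to 2
Z = pca_transform(X, mean, components)
params = train(Z, y)
pred = (forward(Z, params)[1] > 0.5).astype(int)
```

Here the network is trained and evaluated on the same six made-up messages, so the sketch demonstrates only the data flow (term counts, PCA projection, NN classifier) and says nothing about the accuracy reported in the paper.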

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cui, B., Mondal, A., Shen, J., Cong, G., Tan, KL. (2005). On Effective E-mail Classification via Neural Networks. In: Andersen, K.V., Debenham, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2005. Lecture Notes in Computer Science, vol 3588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546924_9

  • DOI: https://doi.org/10.1007/11546924_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28566-3

  • Online ISBN: 978-3-540-31729-6

  • eBook Packages: Computer Science, Computer Science (R0)
