Abstract
In this paper, we propose an approach for email spam detection based on text semantic analysis at two levels. The first level categorizes emails into specific domains (e.g., health, education, and finance). The second level uses semantic features to detect spam within each domain. We show that the proposed method provides an efficient representation of the internal semantic structure of email content, which yields more precise and interpretable spam filtering results than existing methods.
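The two-level idea (route an email to a domain first, then apply a domain-specific spam classifier) can be illustrated with a small sketch. This is not the authors' method: the abstract does not specify their semantic features or classifier. Here the first level is assumed to be a simple overlap score against hypothetical keyword lexicons (`DOMAIN_KEYWORDS`), and the second level swaps in a multinomial naive Bayes classifier with Laplace smoothing. All names (`categorize`, `NaiveBayes`, `TwoLevelFilter`) are hypothetical and chosen for illustration only.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical domain lexicons: a stand-in for the paper's first-level
# semantic categorization, NOT the authors' actual domain model.
DOMAIN_KEYWORDS = {
    "health": {"doctor", "pills", "medication", "weight", "pharmacy"},
    "finance": {"bank", "loan", "credit", "money", "investment"},
    "education": {"degree", "course", "university", "diploma", "exam"},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(text):
    """Level 1: pick the domain whose lexicon overlaps most with the email."""
    tokens = tokenize(text)
    scores = {d: sum(t in kw for t in tokens) for d, kw in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing (a swapped-in classifier)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, doc):
        total = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            n_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total)  # log prior
            for t in tokenize(doc):
                if t in self.vocab:  # ignore words never seen in training
                    p = (self.word_counts[c][t] + 1) / (n_words + len(self.vocab))
                    score += math.log(p)
            if score > best_score:  # ties keep the alphabetically first class
                best, best_score = c, score
        return best


class TwoLevelFilter:
    """Level 1 routes each email to a domain; level 2 applies that domain's model."""

    def fit(self, docs, labels):
        by_domain = defaultdict(lambda: ([], []))
        for doc, label in zip(docs, labels):
            domain_docs, domain_labels = by_domain[categorize(doc)]
            domain_docs.append(doc)
            domain_labels.append(label)
        self.models = {d: NaiveBayes().fit(x, y) for d, (x, y) in by_domain.items()}
        # Fallback for domains that had no training emails.
        self.global_model = NaiveBayes().fit(docs, labels)
        return self

    def predict(self, doc):
        model = self.models.get(categorize(doc), self.global_model)
        return model.predict(doc)
```

Training one model per domain lets the same word carry different spam evidence in, say, finance email than in health email, which loosely mirrors the motivation for domain-specific detection; the global fallback model covers domains that received no training emails.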
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Saidani, N., Adi, K., Allili, M.S. (2017). A Supervised Approach for Spam Detection Using Text-Based Semantic Representation. In: Aïmeur, E., Ruhi, U., Weiss, M. (eds) E-Technologies: Embracing the Internet of Things. MCETECH 2017. Lecture Notes in Business Information Processing, vol 289. Springer, Cham. https://doi.org/10.1007/978-3-319-59041-7_8
DOI: https://doi.org/10.1007/978-3-319-59041-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59040-0
Online ISBN: 978-3-319-59041-7
eBook Packages: Computer Science, Computer Science (R0)