Article

Content based SMS spam filtering

Authors:

José María Gómez Hidalgo,

Guillermo Cajigas Bringas,

Enrique Puertas Sánz,

Francisco Carrero García

DocEng '06: Proceedings of the 2006 ACM symposium on Document engineering

Pages 107 - 114

https://doi.org/10.1145/1166160.1166191

Published: 10 October 2006

Abstract

In recent years, we have witnessed a dramatic increase in the volume of spam email. Related forms of spam are also emerging as significant problems, especially spam on Instant Messaging services (so-called SPIM) and Short Message Service (SMS) spam, also known as mobile spam. Like email spam, SMS spam can be approached with legal, economic, or technical measures. Among the wide range of technical measures, Bayesian filters play a key role in stopping email spam. In this paper, we analyze to what extent the Bayesian filtering techniques used to block email spam can be applied to detecting and stopping mobile spam. In particular, we have built two SMS spam test collections of significant size, one in English and one in Spanish. On these collections we have tested the effectiveness of a number of message representation techniques and Machine Learning algorithms. Our results demonstrate that Bayesian filtering techniques can be effectively transferred from email to SMS spam.
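
As an illustration of the kind of Bayesian filtering the abstract refers to, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing applied to short messages. It is not the authors' implementation or experimental setup: the class name `NaiveBayesFilter`, the word-level tokenizer, and the four toy training messages are all assumptions made purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the message and split it into word tokens.

    Word tokens are only one possible message representation; this
    choice is an assumption of the sketch.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical two-class (spam/ham) multinomial Naive Bayes filter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (1.0 = add-one smoothing)
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_score(self, text, label):
        """Return log P(label) + sum of log P(token | label)."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.alpha * len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # tokens never seen in training are skipped
                count = self.word_counts[label][tok]
                score += math.log((count + self.alpha) / denom)
        return score

    def classify(self, text):
        return max(("spam", "ham"), key=lambda lbl: self.log_score(text, lbl))


# Toy training data, invented for this example only.
TRAINING = [
    ("WIN a FREE prize now, text CLAIM to 80082", "spam"),
    ("URGENT: you have won cash, call now to claim", "spam"),
    ("Are we still meeting for lunch today?", "ham"),
    ("Call me when you get home", "ham"),
]

nb = NaiveBayesFilter()
for message, label in TRAINING:
    nb.train(message, label)

print(nb.classify("claim your free cash prize"))
print(nb.classify("see you at lunch"))
```

Working in log space avoids numeric underflow when many token probabilities are multiplied. A realistic filter would be trained on a much larger collection, and the choice of tokens is one of the design decisions such a filter has to make.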

Cited By

Yu S, Kwon Y, Kim M, Lee K (2024). Korean Voice Phishing Detection Applying NER With Key Tags and Sentence-Level N-Gram. IEEE Access 12, 52951-52962. Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3387027
Salman M, Ikram M, Kaafar M (2024). Investigating Evasive Techniques in SMS Spam Filtering: A Comparative Analysis of Machine Learning Models. IEEE Access 12, 24306-24324. Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3364671
Farek L, Benaidja A (2024). An optimal feature selection method for text classification through redundancy and synergy analysis. Multimedia Tools and Applications. Online publication date: 28-Jun-2024.
https://doi.org/10.1007/s11042-024-19736-1

Index Terms

Content based SMS spam filtering
1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
    2. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction

Published In

DocEng '06: Proceedings of the 2006 ACM symposium on Document engineering

October 2006

232 pages

ISBN:1595935150

DOI:10.1145/1166160

General Chair: Dick Bulterman (CWI, Netherlands)

Program Chair: David F. Brailsford (University of Nottingham, UK)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

DocEng06: ACM Symposium on Document Engineering

October 10 - 13, 2006

Amsterdam, The Netherlands

Acceptance Rates

Overall Acceptance Rate 178 of 537 submissions, 33%
