research-article

Mining spam email to identify common origins for forensic application

Authors: Chun Wei, Alan Sprague, Gary Warner, Anthony Skjellum

SAC '08: Proceedings of the 2008 ACM symposium on Applied computing

Pages 1433 - 1437

https://doi.org/10.1145/1363686.1364019

Published: 16 March 2008


Abstract

In recent years, spam email has become a major tool for criminals conducting illegal business on the Internet. This paper describes a new research approach that applies data mining techniques to spam email, with a focus on law enforcement forensic analysis. After extracting useful attributes from spam messages, we use a connected components clustering algorithm to form relationships between messages. These initial clusters are then refined with a weighted-edges model, in which membership in a cluster requires the edge weight to exceed a chosen threshold. Cluster membership is validated against WHOIS data, the IP address of the computer hosting the advertised sites, and a comparison of graphical images of website fetches. The technique has identified relationships between spam campaigns that human researchers had not identified, enabling additional data to be brought into a single investigation.
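The two clustering stages named in the abstract can be illustrated with a minimal sketch. The abstract does not say which attributes are extracted or how edges are weighted, so everything specific here is an assumption made for illustration: the attribute names (`subject`, `domain`), the sample messages, and the rule that an edge's weight is the number of attribute values two messages share. It is not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def components(nodes, edges):
    """Return connected components as sorted lists, using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return sorted(sorted(g) for g in groups.values())


def edge_weights(messages):
    """ASSUMPTION: edge weight = number of attribute values two messages share."""
    weights = {}
    for a, b in combinations(sorted(messages), 2):
        shared = sum(
            1 for key in messages[a]
            if key in messages[b] and messages[a][key] == messages[b][key]
        )
        if shared:
            weights[(a, b)] = shared
    return weights


def cluster(messages, threshold=0):
    """Connected components over edges whose weight exceeds `threshold`."""
    kept = [pair for pair, w in edge_weights(messages).items() if w > threshold]
    return components(list(messages), kept)


# Hypothetical sample data; attribute names and values are illustrative only.
messages = {
    "m1": {"subject": "cheap meds", "domain": "pills.example"},
    "m2": {"subject": "cheap meds", "domain": "pills.example"},
    "m3": {"subject": "cheap meds", "domain": "watches.example"},
    "m4": {"subject": "hot stocks", "domain": "stocks.example"},
}

initial = cluster(messages, threshold=0)  # any shared attribute links two messages
refined = cluster(messages, threshold=1)  # require more than one shared attribute
```

With a threshold of 0, `m3` joins the cluster of `m1` and `m2` because it shares their subject line. Raising the threshold to 1 splits it off, since a single shared attribute no longer counts as sufficient evidence of a common origin.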

Index Terms

Mining spam email to identify common origins for forensic application
1. Information systems
  1. World Wide Web
    1. Web applications
      1. Internet communications tools
2. Social and professional topics
  1. Computing / technology policy


Published In

SAC '08: Proceedings of the 2008 ACM symposium on Applied computing

March 2008

2586 pages

ISBN:9781595937537

DOI:10.1145/1363686

Conference Chairs: Roger L. Wainwright (University of Tulsa), Hisham M. Haddad (Kennesaw State University)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 March 2008


Conference

SAC '08

Sponsor:

SIGAPP

SAC '08: The 2008 ACM Symposium on Applied Computing

March 16 - 20, 2008

Fortaleza, Ceara, Brazil

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

