DOI: 10.1145/1363686.1364019
research-article

Mining spam email to identify common origins for forensic application

Published: 16 March 2008

Abstract

In recent years, spam email has become a major tool for criminals conducting illegal business on the Internet. This paper describes a new research approach that uses data mining techniques to study spam emails, with a focus on law enforcement forensic analysis. After retrieving useful attributes from spam emails, we use a connected components clustering algorithm to form relationships between messages. These initial clusters are then refined with a weighted edges model in which membership in a cluster requires the weight to exceed a chosen threshold. Cluster membership is validated against WHOIS data, the IP address of the computer hosting the advertised sites, and a comparison of graphical images of website fetches. The technique has identified relationships between spam campaigns that human researchers had not identified, enabling additional data to be brought into a single investigation.
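The two clustering stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which attributes are extracted or how edges are weighted, so the attribute names (`subject`, `domain`, `sender`), the sample messages, and the weighting rule (the number of attribute values two messages share) are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def edge_weights(messages):
    """Weight each message pair by how many attribute values they share.

    This weighting is an assumption; the abstract does not define it.
    """
    weights = {}
    for a, b in combinations(sorted(messages), 2):
        shared = sum(
            1
            for key, value in messages[a].items()
            if value is not None and messages[b].get(key) == value
        )
        if shared:
            weights[(a, b)] = shared
    return weights


def cluster(messages, threshold=0):
    """Cluster messages, keeping only edges whose weight exceeds threshold."""
    weights = edge_weights(messages)
    kept = [pair for pair, w in weights.items() if w > threshold]
    return connected_components(list(messages), kept)


# Hypothetical sample data: four messages with three extracted attributes.
messages = {
    "m1": {"subject": "cheap meds", "domain": "pills.example", "sender": "a@x.example"},
    "m2": {"subject": "cheap meds", "domain": "pills.example", "sender": "b@y.example"},
    "m3": {"subject": "hot stocks", "domain": "pills.example", "sender": "c@z.example"},
    "m4": {"subject": "hot stocks", "domain": "stocks.example", "sender": "d@w.example"},
}

initial = cluster(messages, threshold=0)  # any shared attribute links two messages
refined = cluster(messages, threshold=1)  # at least two shared attributes required
print(initial)
print(refined)
```

With a threshold of 0, a single shared attribute is enough to link two messages, so chains of weak links merge all four sample messages into one cluster. Raising the threshold to 1 keeps only the pair that shares two attributes, which is the kind of refinement the weighted edges model is described as providing.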


Published In

SAC '08: Proceedings of the 2008 ACM symposium on Applied computing
March 2008
2586 pages
ISBN:9781595937537
DOI:10.1145/1363686
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. cyber crime
  2. data mining
  3. electronic mail
  4. forensic analysis
  5. spam

Qualifiers

  • Research-article

Conference

SAC '08
SAC '08: The 2008 ACM Symposium on Applied Computing
March 16 - 20, 2008
Fortaleza, Ceara, Brazil

Acceptance Rates

Overall acceptance rate: 1,650 of 6,669 submissions (25%)

