Research article. DOI: 10.1145/2398776.2398821

Taster's choice: a comparative analysis of spam feeds

Published: 14 November 2012

Abstract

E-mail spam has been the focus of a wide variety of measurement studies, at least in part due to the plethora of spam data sources available to the research community. However, there has been little attention paid to the suitability of such data sources for the kinds of analyses they are used for. In spite of the broad range of data available, most studies use a single "spam feed" and there has been little examination of how such feeds may differ in content. In this paper we provide this characterization by comparing the contents of ten distinct contemporaneous feeds of spam-advertised domain names. We document significant variations based on how such feeds are collected and show how these variations can produce differences in findings as a result.

Supplementary Material

PDF File (237.pdf)
Summary Review Documentation for "Taster's Choice: A Comparative Analysis of Spam Feeds", Authors: A. Pitsillidis, C. Kanich, G. Voelker, K. Levchenko, S. Savage

Published In

IMC '12: Proceedings of the 2012 Internet Measurement Conference
November 2012
572 pages
ISBN: 9781450317054
DOI: 10.1145/2398776
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. domain blacklists
  2. measurement
  3. spam e-mail

Qualifiers

  • Research-article

Conference

IMC '12: Internet Measurement Conference
November 14-16, 2012
Boston, Massachusetts, USA

Acceptance Rates

Overall Acceptance Rate 277 of 1,083 submissions, 26%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 0

Reflects downloads up to 24 Dec 2024

Cited By

  • (2024) TIPCE: A Longitudinal Threat Intelligence Platform Comprehensiveness Analysis. Proceedings of the Fourteenth ACM Conference on Data and Application Security and Privacy, pp. 349-360. DOI: 10.1145/3626232.3653278. Online publication date: 19-Jun-2024
  • (2024) Harnessing TI Feeds for Exploitation Detection. 2024 IEEE International Conference on Cyber Security and Resilience (CSR), pp. 200-207. DOI: 10.1109/CSR61664.2024.10679417. Online publication date: 2-Sep-2024
  • (2021) Clairvoyance: Inferring Blocklist Use on the Internet. Passive and Active Measurement, pp. 57-75. DOI: 10.1007/978-3-030-72582-2_4. Online publication date: 30-Mar-2021
  • (2019) Cognitive triaging of phishing attacks. Proceedings of the 28th USENIX Conference on Security Symposium, pp. 1309-1326. DOI: 10.5555/3361338.3361429. Online publication date: 14-Aug-2019
  • (2019) Reading the tea leaves. Proceedings of the 28th USENIX Conference on Security Symposium, pp. 851-867. DOI: 10.5555/3361338.3361398. Online publication date: 14-Aug-2019
  • (2019) Assessing the Effectiveness of Domain Blacklisting Against Malicious DNS Registrations. 2019 IEEE Security and Privacy Workshops (SPW), pp. 199-204. DOI: 10.1109/SPW.2019.00045. Online publication date: May-2019
  • (2019) Getting Under Alexa’s Umbrella: Infiltration Attacks Against Internet Top Domain Lists. Information Security, pp. 255-276. DOI: 10.1007/978-3-030-30215-3_13. Online publication date: 2-Sep-2019
  • (2019) Clustering and the Weekend Effect: Recommendations for the Use of Top Domain Lists in Security Research. Passive and Active Measurement, pp. 161-177. DOI: 10.1007/978-3-030-15986-3_11. Online publication date: 13-Mar-2019
  • (2018) Rotten Apples or Bad Harvest? What We Are Measuring When We Are Measuring Abuse. ACM Transactions on Internet Technology 18:4, pp. 1-25. DOI: 10.1145/3122985. Online publication date: 7-Aug-2018
  • (2017) Marmite. Proceedings of the 33rd Annual Computer Security Applications Conference, pp. 91-102. DOI: 10.1145/3134600.3134604. Online publication date: 4-Dec-2017
