tutorial

Online phishing classification using adversarial data mining and signaling games

Authors:

Gaston L'Huillier,

Nicolas Figueroa

ACM SIGKDD Explorations Newsletter, Volume 11, Issue 2

Pages 92 - 99

https://doi.org/10.1145/1809400.1809421

Published: 27 May 2010

Abstract

In adversarial systems, the performance of a classifier decreases after it is deployed, as the adversary learns to defeat it. Adversarial data mining was recently introduced to address this: the classification problem is viewed as a game between an adversary and an intelligent, adaptive classifier. In recent years, phishing fraud through malicious email messages has become a serious threat to global security and the economy, and traditional spam filtering techniques have proven ineffective against it. In this domain, a game-theoretic data mining framework based on dynamic games of incomplete information is proposed in order to build an adversary-aware classifier for phishing fraud detection. To build the classifier, an online version of the Weighted Margin Support Vector Machines with a game-theoretic prior knowledge function is proposed. A new content-based feature extraction technique for phishing filtering is also described. Experiments show that the proposed classifier is highly competitive with previously proposed online classification algorithms in this adversarial environment, and that promising results were obtained using traditional machine learning techniques over the extracted features.
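To make the idea of an online, confidence-weighted margin classifier concrete, the following is a minimal sketch and not the paper's formulation. It assumes a linear model trained by online subgradient steps on a hinge loss in which each example carries a confidence value `v` in (0, 1] that scales both the required margin and the size of the update. Here `v` merely stands in for the output of a prior knowledge function; how the paper derives it from the signaling game is not stated in the abstract. The names `update`, `train_online`, and `predict`, and the parameters `lr` and `lam`, are hypothetical.

```python
from typing import Iterable, List, Tuple

# (features, label in {-1, +1}, confidence v in (0, 1]); v is an assumed input
Example = Tuple[List[float], int, float]


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def update(w: List[float], b: float, x: List[float], y: int, v: float,
           lr: float = 0.1, lam: float = 0.01) -> Tuple[List[float], float]:
    """One subgradient step on lam/2 * ||w||^2 + v * max(0, v - y * (w.x + b))."""
    score = dot(w, x) + b
    # L2 shrinkage is applied on every step
    w = [wi * (1.0 - lr * lam) for wi in w]
    if y * score < v:
        # Margin v is violated: move toward the example, scaled by its confidence
        w = [wi + lr * v * y * xi for wi, xi in zip(w, x)]
        b = b + lr * v * y
    return w, b


def train_online(stream: Iterable[Example], dim: int,
                 lr: float = 0.1, lam: float = 0.01) -> Tuple[List[float], float]:
    """Process examples one at a time, as messages would arrive at a filter."""
    w, b = [0.0] * dim, 0.0
    for x, y, v in stream:
        w, b = update(w, b, x, y, v, lr=lr, lam=lam)
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    return 1 if dot(w, x) + b >= 0 else -1


if __name__ == "__main__":
    # Toy stream: +1 for phishing, -1 for legitimate, full confidence on each
    stream = [([1.0, 0.0], 1, 1.0), ([0.0, 1.0], -1, 1.0)] * 20
    w, b = train_online(stream, dim=2)
    print(predict(w, b, [1.0, 0.0]), predict(w, b, [0.0, 1.0]))
```

In this sketch a low-confidence example both demands a smaller margin and moves the weights less, which is one simple way to let prior knowledge temper the influence of individual messages.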

Published In

ACM SIGKDD Explorations Newsletter Volume 11, Issue 2

December 2009

128 pages

ISSN:1931-0145

EISSN:1931-0153

DOI:10.1145/1809400

Copyright © 2010 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Tutorial

Article Metrics

Total Citations: 5
Total Downloads: 415
Downloads (Last 12 months): 3
Downloads (Last 6 weeks): 0

Reflects downloads up to 16 Oct 2024

Cited By

Bravo C, Thomas L, Weber R (2017). Improving credit scoring by differentiating defaulter behaviour. Journal of the Operational Research Society 66:5 (771-781). Online publication date: 21-Dec-2017.
https://doi.org/10.1057/jors.2014.50
Zareapoor M, Seeja K (2014). Text Mining for Phishing E-mail Detection. Intelligent Computing, Communication and Devices (65-71). Online publication date: 26-Aug-2014.
https://doi.org/10.1007/978-81-322-2012-1_8
Weber R (2014). From Operations Research to Dynamic Data Mining and Beyond. Zukunftsperspektiven des Operations Research (343-356). Online publication date: 25-Apr-2014.
https://doi.org/10.1007/978-3-658-05707-7_22
Peters G, Weber R, Nowatzke R (2012). Dynamic rough clustering and its applications. Applied Soft Computing 12:10 (3193-3207). Online publication date: 1-Oct-2012.
https://dl.acm.org/doi/10.1016/j.asoc.2012.05.015
Brown D, Famili F, Paass G, Smith-Miles K, Thomas L, Weber R, Baeza-Yates R, Bravo C, L'Huillier G, Maldonado S (2011). Future trends in business analytics and optimization. Intelligent Data Analysis 15:6 (1001-1017). Online publication date: 1-Nov-2011.
https://dl.acm.org/doi/10.5555/2595490.2595500
