tutorial

Online phishing classification using adversarial data mining and signaling games

Published: 27 May 2010

Abstract

In adversarial systems, the performance of a classifier degrades after it is deployed, as the adversary learns to defeat it. Adversarial data mining, introduced recently, views the classification problem as a game between an adversary and an intelligent, adaptive classifier. In recent years, phishing fraud through malicious email messages has become a serious threat to global security and the economy, and traditional spam-filtering techniques have proven ineffective against it. For this domain, a game-theoretic data mining framework based on dynamic games of incomplete information is proposed to build an adversary-aware classifier for phishing fraud detection. To build the classifier, an online version of Weighted Margin Support Vector Machines with a game-theoretic prior knowledge function is proposed. A new content-based feature extraction technique for phishing filtering is also described. Experiments show that the proposed classifier is highly competitive with previously proposed online classification algorithms in this adversarial environment, and that traditional machine learning techniques applied to the extracted features give promising results.
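The abstract names the ingredients (a signaling game, a game-theoretic prior knowledge function, an online weighted-margin SVM) but gives no formulas. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes that (1) the classifier, as receiver in a signaling game, updates its belief that a sender is a phisher by Bayes' rule; (2) a hypothetical prior knowledge function `prior_margin` turns that belief into a per-example margin target; and (3) the online learner is a Pegasos-style stochastic subgradient linear SVM (swapped in here for illustration) whose hinge loss asks for that per-example margin instead of 1. The names `posterior_phishing`, `prior_margin`, `OnlineWeightedMarginSVM` and all numbers are hypothetical.

```python
from typing import List, Sequence


def posterior_phishing(prior: float, p_sig_phish: float, p_sig_legit: float) -> float:
    """Receiver's belief that the sender is a phisher after observing a signal,
    computed with Bayes' rule (the belief update used in signaling games).
    The signal likelihoods are assumed inputs, not values from the paper."""
    joint_phish = prior * p_sig_phish
    evidence = joint_phish + (1.0 - prior) * p_sig_legit
    return joint_phish / evidence


def prior_margin(belief: float, y: int, floor: float = 0.1) -> float:
    """HYPOTHETICAL prior knowledge function: the required margin is the
    belief's agreement with the label y in {+1, -1}, floored so that every
    example still asks for some margin."""
    agreement = belief if y == 1 else 1.0 - belief
    return max(floor, agreement)


class OnlineWeightedMarginSVM:
    """Linear SVM without a bias term, trained one example at a time with
    Pegasos-style stochastic subgradient steps on
        lam/2 * ||w||^2 + max(0, m_i - y_i * <w, x_i>),
    where m_i in (0, 1] is a per-example margin (an assumption of this sketch)."""

    def __init__(self, n_features: int, lam: float = 0.1) -> None:
        self.w: List[float] = [0.0] * n_features
        self.lam = lam
        self.t = 0  # number of examples processed so far

    def decision_function(self, x: Sequence[float]) -> float:
        return sum(wj * xj for wj, xj in zip(self.w, x))

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self.decision_function(x) >= 0.0 else -1

    def partial_fit(self, x: Sequence[float], y: int, margin: float = 1.0) -> None:
        self.t += 1
        eta = 1.0 / (self.lam * self.t)  # Pegasos step size 1 / (lam * t)
        # Check the weighted margin with the current weights, before shrinking.
        violated = y * self.decision_function(x) < margin
        # Regulariser step: shrink w by (1 - eta * lam), i.e. (1 - 1/t).
        self.w = [(1.0 - eta * self.lam) * wj for wj in self.w]
        # Hinge-loss subgradient step, taken only when the margin is violated.
        if violated:
            self.w = [wj + eta * y * xj for wj, xj in zip(self.w, x)]


if __name__ == "__main__":
    # Toy stream of (features, label, P(signal | phisher), P(signal | legitimate)).
    # All values are made up for illustration.
    stream = [([2.0, 1.0], 1, 0.8, 0.2), ([-2.0, -1.0], -1, 0.3, 0.6)]
    clf = OnlineWeightedMarginSVM(n_features=2)
    for x, y, p_phish, p_legit in stream:
        belief = posterior_phishing(0.5, p_phish, p_legit)
        clf.partial_fit(x, y, margin=prior_margin(belief, y))
    print(clf.w, clf.predict([1.0, 1.0]))
```

The design choice is to let prior knowledge lower the margin an example must achieve: an email the belief model already finds ambiguous pulls on the weights less than one it agrees with confidently. The paper's actual weighted-margin formulation and prior knowledge function may differ in detail.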

Published In

ACM SIGKDD Explorations Newsletter  Volume 11, Issue 2
December 2009
128 pages
ISSN:1931-0145
EISSN:1931-0153
DOI:10.1145/1809400

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in SIGKDD Volume 11, Issue 2

Author Tags

  1. adversarial classification
  2. data mining
  3. email filtering
  4. game theory
  5. games of incomplete information
  6. spam and phishing detection

Cited By

  • (2017) Improving credit scoring by differentiating defaulter behaviour. Journal of the Operational Research Society 66:5, pp. 771-781. DOI: 10.1057/jors.2014.50. Online publication date: 21-Dec-2017.
  • (2014) Text Mining for Phishing E-mail Detection. Intelligent Computing, Communication and Devices, pp. 65-71. DOI: 10.1007/978-81-322-2012-1_8. Online publication date: 26-Aug-2014.
  • (2014) From Operations Research to Dynamic Data Mining and Beyond. Zukunftsperspektiven des Operations Research, pp. 343-356. DOI: 10.1007/978-3-658-05707-7_22. Online publication date: 25-Apr-2014.
  • (2012) Dynamic rough clustering and its applications. Applied Soft Computing 12:10, pp. 3193-3207. DOI: 10.1016/j.asoc.2012.05.015. Online publication date: 1-Oct-2012.
  • (2011) Future trends in business analytics and optimization. Intelligent Data Analysis 15:6, pp. 1001-1017. DOI: 10.5555/2595490.2595500. Online publication date: 1-Nov-2011.
