DOI: 10.1145/3465481.3470112
research-article
Open access

Combining Text and Visual Features to Improve the Identification of Cloned Webpages for Early Phishing Detection

Published: 17 August 2021

Abstract

Phishing attacks arrive in high numbers and often spread quickly, so after-the-fact countermeasures such as domain blacklisting have limited efficacy. Visual similarity-based approaches have the potential to detect previously unseen phishing webpages. These approaches, however, require identifying the legitimate webpage(s) that a phishing webpage reproduces. Existing approaches rely on textual feature analysis for target identification, with misclassification rates of approximately 1%; however, because most websites a user might visit are legitimate, even a small error rate can translate into many misclassified pages, and additional research is needed to further reduce classification errors. In this work, we propose a novel method for target identification that relies on both visual features (extracted from a screenshot of the webpage) and textual features (extracted from the DOM of the webpage) to identify which website a phishing webpage is replicating. We assess its effectiveness in detecting phishing websites using data from phishing aggregators such as OpenPhish, PhishTank and PhishStats. Compared to state-of-the-art text-based classifiers, our method achieves a relative reduction of 67% in the phishing misclassification rate (from 1.02% to 0.34%), for an accuracy of 99.66%. This work provides a further step toward semi-automated decision support systems for phishing detection.
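The reported figures can be checked with a few lines of arithmetic. The sketch below is not from the paper; it only recomputes the relative reduction and the accuracy from the two misclassification rates quoted in the abstract. The function name `relative_reduction` is a hypothetical helper introduced for illustration, and the accuracy line assumes that accuracy is simply 100% minus the misclassification rate.

```python
def relative_reduction(baseline_rate: float, new_rate: float) -> float:
    """Return the fraction by which new_rate improves on baseline_rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate


# Misclassification rates in percent, as quoted in the abstract.
text_only_rate = 1.02   # state-of-the-art text-based classifiers
combined_rate = 0.34    # text plus visual features

reduction = relative_reduction(text_only_rate, combined_rate)

# Assumption: accuracy is the complement of the misclassification rate.
accuracy = 100.0 - combined_rate

print(f"relative reduction: {reduction:.1%}")
print(f"accuracy: {accuracy:.2f}%")
```

The reduction works out to two thirds, which the abstract rounds to 67%. Note that this is a relative figure: in absolute terms the misclassification rate drops by 0.68 percentage points.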


Published In

ARES '21: Proceedings of the 16th International Conference on Availability, Reliability and Security
August 2021
1447 pages
ISBN:9781450390514
DOI:10.1145/3465481
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Phishing Detection
  2. Target Identification
  3. Visual Features

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • ITEA3

Conference

ARES 2021

Acceptance Rates

Overall Acceptance Rate 228 of 451 submissions, 51%
