Opening the Blackbox of VirusTotal: Analyzing Online Phishing Scan Engines

Authors:

Linhai Song, and

Gang Wang

IMC '19: Proceedings of the Internet Measurement Conference

October 2019

Pages 478 - 485

https://doi.org/10.1145/3355369.3355585

Published: 21 October 2019

Abstract

Online scan engines such as VirusTotal are heavily used by researchers to label malicious URLs and files. Unfortunately, it is not well understood how the labels are generated and how reliable the scanning results are. In this paper, we focus on VirusTotal and its 68 third-party vendors to examine their labeling process on phishing URLs. We perform a series of measurements by setting up our own phishing websites (mimicking PayPal and IRS) and submitting the URLs for scanning. By analyzing the incoming network traffic and the dynamic label changes at VirusTotal, we reveal new insights into how VirusTotal works and the quality of their labels. Among other things, we show that vendors have trouble flagging all phishing sites, and even the best vendors missed 30% of our phishing sites. In addition, the scanning results are not immediately updated to VirusTotal after the scanning, and there are inconsistent results between VirusTotal scan and some vendors' own scanners. Our results reveal the need for developing more rigorous methodologies to assess and make use of the labels obtained from VirusTotal.
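
The label-quality question raised in the abstract (how many known phishing URLs does each vendor miss?) can be illustrated with a minimal sketch. The URLs, vendor names, labels, and the `MALICIOUS_LABELS` set below are hypothetical toy values; they are not taken from the paper or from the VirusTotal API. The sketch only shows how per-vendor miss rates, and a simple "flagged by at least t vendors" rule, could be computed once labels have been collected.

```python
from collections import defaultdict

# Hypothetical toy data: every URL is assumed to be a known phishing page, and
# each inner dict maps a made-up vendor name to the label that vendor returned.
scan_results = {
    "http://phish-a.example/login": {"VendorA": "phishing", "VendorB": "clean", "VendorC": "clean"},
    "http://phish-b.example/login": {"VendorA": "phishing", "VendorB": "phishing", "VendorC": "clean"},
    "http://phish-c.example/login": {"VendorA": "clean", "VendorB": "phishing", "VendorC": "clean"},
    "http://phish-d.example/login": {"VendorA": "phishing", "VendorB": "clean", "VendorC": "clean"},
}

# Labels counted as a detection. This set is an assumption for illustration,
# not VirusTotal's own definition.
MALICIOUS_LABELS = {"phishing", "malicious", "malware"}


def miss_rates(results):
    """Return {vendor: fraction of known-phishing URLs the vendor did not flag}."""
    scanned = defaultdict(int)
    missed = defaultdict(int)
    for labels in results.values():
        for vendor, label in labels.items():
            scanned[vendor] += 1
            if label.lower() not in MALICIOUS_LABELS:
                missed[vendor] += 1
    return {vendor: missed[vendor] / scanned[vendor] for vendor in scanned}


def flagged_by_at_least(results, threshold):
    """Return the URLs flagged by at least `threshold` vendors."""
    flagged = []
    for url, labels in results.items():
        hits = sum(1 for label in labels.values() if label.lower() in MALICIOUS_LABELS)
        if hits >= threshold:
            flagged.append(url)
    return flagged


if __name__ == "__main__":
    for vendor, rate in sorted(miss_rates(scan_results).items()):
        print(f"{vendor}: missed {rate:.0%} of the phishing URLs")
    print(flagged_by_at_least(scan_results, threshold=2))
```

With a threshold rule like this, the choice of `threshold` directly trades missed phishing pages against reliance on any single vendor, which is one reason a more rigorous methodology for using such labels matters.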

Supplementary Material

peng.zip (4.43 MB)

Supplemental movie, appendix, image and software files for "Opening the Blackbox of VirusTotal: Analyzing Online Phishing Scan Engines"

Index Terms

Opening the Blackbox of VirusTotal: Analyzing Online Phishing Scan Engines
1. Security and privacy
  1. Software and application security
    1. Web application security

Published In

IMC '19: Proceedings of the Internet Measurement Conference

October 2019

497 pages

ISBN: 9781450369480

DOI: 10.1145/3355369

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Science Foundation

Conference

IMC '19

IMC '19: ACM Internet Measurement Conference

October 21 - 23, 2019

Amsterdam, Netherlands

Acceptance Rates

IMC '19 Paper Acceptance Rate: 39 of 197 submissions, 20%

Overall Acceptance Rate: 277 of 1,083 submissions, 26%

Article Metrics

Total Citations: 63
Total Downloads: 1,914
Downloads (Last 12 months): 450
Downloads (Last 6 weeks): 52

Cited By

Sharma R, Dei Thakur B, Kaushik N, Chauhan P. (2024) Securing the Web. Journal of Information Security and Cybercrimes Research 7:1 (05-28). Online publication date: 2-Jun-2024.
https://doi.org/10.26735/UGSQ6620
NAKANO H, CHIBA D, KOIDE T, FUKUSHI N, YAGI T, HARIU T, YOSHIOKA K, MATSUMOTO T. (2024) Understanding Characteristics of Phishing Reports from Experts and Non-Experts on Twitter. IEICE Transactions on Information and Systems E107.D:7 (807-824). Online publication date: 1-Jul-2024.
https://doi.org/10.1587/transinf.2023EDP7221
Choo E, Nabeel M, Kim D, De Silva R, Yu T, Khalil I. (2024) A Large Scale Study and Classification of VirusTotal Reports on Phishing and Malware URLs. ACM SIGMETRICS Performance Evaluation Review 52:1 (55-56). Online publication date: 13-Jun-2024.
https://dl.acm.org/doi/10.1145/3673660.3655042
Choo E, Nabeel M, Kim D, De Silva R, Yu T, Khalil I, Garetto M, Marin A, Ciucu F, Fanti G, Righter R. (2024) A Large Scale Study and Classification of VirusTotal Reports on Phishing and Malware URLs. Abstracts of the 2024 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems (55-56). Online publication date: 10-Jun-2024.
https://dl.acm.org/doi/10.1145/3652963.3655042
Satvat K, Gjomemo R, Venkatakrishnan V, Vilela J, Schulmann H, Li N. (2024) TIPCE: A Longitudinal Threat Intelligence Platform Comprehensiveness Analysis. Proceedings of the Fourteenth ACM Conference on Data and Application Security and Privacy (349-360). Online publication date: 19-Jun-2024.
https://dl.acm.org/doi/10.1145/3626232.3653278
Wang C, Jia Z, Benkraouda H, Zevnik C, Heuermann N, Foulger R, Handler J, Wang G. (2024) VeriSMS: A Message Verification System for Inclusive Patient Outreach against Phishing Attacks. Proceedings of the CHI Conference on Human Factors in Computing Systems (1-17). Online publication date: 11-May-2024.
https://dl.acm.org/doi/10.1145/3613904.3642027
Takao K, Hiraishi C, Tanabe R, Takada K, Fujita A, Inoue D, Gañán C, van Eeten M, Yoshioka K, Matsumoto T. (2024) VT-SOS: A Cost-effective URL Warning utilizing VirusTotal as a Second Opinion Service. NOMS 2024-2024 IEEE Network Operations and Management Symposium (1-5). Online publication date: 6-May-2024.
https://doi.org/10.1109/NOMS59830.2024.10575506
Ashawa M, Owoh N, Riley J, Osamor J, Hosseinzadeh S. (2024) An Exploration of shared code execution for malware analysis. 2024 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA) (1-9). Online publication date: 1-Feb-2024.
https://doi.org/10.1109/ACDSA59508.2024.10467679
Haider R, Aslam B, Abbas H, Iqbal Z. (2024) C2-Eye: framework for detecting command and control (C2) connection of supply chain attacks. International Journal of Information Security 23:4 (2531-2545). Online publication date: 29-Apr-2024.
https://doi.org/10.1007/s10207-024-00850-y
Xie Q, Li F. (2024) Crawling to the Top: An Empirical Evaluation of Top List Use. Passive and Active Measurement (277-306). Online publication date: 20-Mar-2024.
https://doi.org/10.1007/978-3-031-56249-5_12
