research-article

Beyond heuristics: learning to classify vulnerabilities and predict exploits

Authors:

Mehran Bozorgi,

Lawrence K. Saul,

Stefan Savage,

Geoffrey M. Voelker

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 105 - 114

https://doi.org/10.1145/1835804.1835821

Published: 25 July 2010


Abstract

The security demands on modern system administration are enormous and getting worse. Chief among these demands is monitoring the continual disclosure of software vulnerabilities that have the potential to compromise the systems an administrator manages. Such vulnerabilities include buffer overflow errors, improperly validated inputs, and other unanticipated attack modalities. In 2008, over 7,400 new vulnerabilities were disclosed--well over 100 per week. While no enterprise is affected by all of these disclosures, administrators commonly face many outstanding vulnerabilities across the software systems they manage. Vulnerabilities can be addressed by patches, reconfigurations, and other workarounds; however, these actions may incur down-time or unforeseen side-effects. Thus, a key question for systems administrators is which vulnerabilities to prioritize. From publicly available databases that document past vulnerabilities, we show how to train classifiers that predict whether and how soon a vulnerability is likely to be exploited. As input, our classifiers operate on high-dimensional feature vectors that we extract from the text fields, time stamps, cross references, and other entries in existing vulnerability disclosure reports. Compared to current industry-standard heuristics based on expert knowledge and static formulas, our classifiers predict much more accurately whether and how soon individual vulnerabilities are likely to be exploited.
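To make the feature-extraction step in the abstract concrete, the following is a minimal sketch of how one disclosure report might be turned into a sparse feature vector. The report layout, the field names, and the `field:token` naming scheme are illustrative assumptions and are not taken from the paper or from any vulnerability database schema.

```python
import re
from collections import Counter

# Hypothetical report: field names and values are made up for illustration.
report = {
    "title": "Buffer overflow in example FTP server",
    "description": "A remote attacker can trigger a buffer overflow via a long USER command.",
    "references": ["CVE-0000-0001", "http://example.com/advisory/1"],
    "disclosed": "2007-03-14",
}


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_features(report):
    """Map one report to a sparse feature vector (feature name -> value)."""
    features = Counter()
    # Text fields: token counts, prefixed by field name so the same word
    # in different fields produces different features (an assumption).
    for field in ("title", "description"):
        for token in tokenize(report.get(field, "")):
            features[f"{field}:{token}"] += 1
    # Cross references: a total count plus one binary feature per reference.
    refs = report.get("references", [])
    features["num_references"] = len(refs)
    for ref in refs:
        features[f"ref:{ref}"] = 1
    # Time stamp: the disclosure year as a binary feature.
    year = report.get("disclosed", "")[:4]
    if year:
        features[f"disclosed_year:{year}"] = 1
    return dict(features)


features = extract_features(report)
```

Applied to a full database, a scheme like this yields one dimension per distinct feature name, which is one way a report collection can give rise to very high-dimensional, sparse vectors.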

Supplementary Material

JPG File (kdd2010_bozorgi_bhlc_01.jpg), 9.26 KB

MOV File (kdd2010_bozorgi_bhlc_01.mov), 72.13 MB

Index Terms

Beyond heuristics: learning to classify vulnerabilities and predict exploits
1. Security and privacy
  1. Network security

Reviews

Reviewer: Vijay K Gurbani

Machine learning techniques are being applied to all kinds of problems in computer science. This paper applies machine learning to classifying vulnerabilities and to predicting the time to exploit a vulnerability once information on it has been released. Bozorgi et al. train a linear support vector machine (SVM) on feature vectors extracted from two publicly available vulnerability databases: the Open Source Vulnerability Database (OSVDB) and MITRE's Common Vulnerabilities and Exposures (CVE). The feature extraction process consists of a frequency count of keywords that appear in a vulnerability disclosure report. The SVM is trained on available vulnerability data from 1991 to 2005; data from 2005 to 2007 is used as the test set.

After the training, the authors test the classifier on two predictions: (a) whether a given vulnerability will be exploited at all and (b) the time to exploit a known vulnerability. The results indicate that for prediction (a), the classifier achieves a true positive (TP) rate of 95 percent, with a false positive (FP) rate of five percent. For prediction (b), the results indicate that the classifier is 98 percent accurate (TP is 98 percent and FP is two percent) in predicting whether a vulnerability will be exploited within two days; other time frames, such as seven, 14, or 30 days, yield the same result.

A final contribution of the paper is an alternative vulnerability scoring system that shows how critical a vulnerability is. Current scoring systems have differing ways of representing this and, in fact, some of them have magic numbers embedded in deriving the score. Bozorgi et al. propose using the signed distance to the maximum margin hyperplane separating positive and negative examples as a canonical score for the exploitability of a vulnerability.

The paper makes a good argument for using machine learning models to predict vulnerabilities. A more structured approach mitigates the presence of magic numbers that are found in existing manual classification schemes. To be sure, machine learning will not mitigate the importance of human intelligence in determining vulnerabilities (for instance, zero-day exploits cannot be predicted through these techniques), but it can move vulnerability assessment a bit closer to being a science rather than an art.

Online Computing Reviews Service
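The scoring idea the review describes, the signed distance of a feature vector to a separating hyperplane, can be sketched as follows. This is not the authors' implementation: the toy data, the hyperparameters, and the simple subgradient trainer (a stand-in for a real SVM solver) are all assumptions chosen to keep the example self-contained.

```python
import math


def dot(w, x):
    """Dot product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=100):
    """Fit a linear classifier by subgradient descent on the
    L2-regularized hinge loss. Labels in y must be -1 or +1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            if label * (dot(w, x) + b) < 1:
                # Margin violated: shrink w, then step toward the example.
                w = [wi - lr * (lam * wi - label * xi) for wi, xi in zip(w, x)]
                b += lr * label
            else:
                # Margin satisfied: only the regularizer contributes.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def signed_distance(w, b, x):
    """Signed Euclidean distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(dot(w, w))
    if norm == 0.0:
        raise ValueError("weight vector is zero; hyperplane is undefined")
    return (dot(w, x) + b) / norm


# Toy data (made up): +1 stands for "exploited", -1 for "not exploited".
X = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
y = [1, 1, -1, -1]

w, b = train_linear_svm(X, y)
scores = [signed_distance(w, b, x) for x in X]
```

Under this convention a larger positive score places a vulnerability deeper on the "exploited" side of the boundary, and a negative score places it on the other side, so the scores can be used to rank vulnerabilities rather than only to label them.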

Information & Contributors

Information

Published In

KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

July 2010

1240 pages

ISBN:9781450300551

DOI:10.1145/1835804

General Chairs: Bharat Rao (Siemens), Balaji Krishnapuram (Siemens)

Program Chairs: Andrew Tomkins (Google Inc.), Qiang Yang (Hong Kong University of Science and Technology)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

KDD '10

KDD '10: The 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

July 25 - 28, 2010

Washington, DC, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Article Metrics

Total Citations: 172
Total Downloads: 1,722
Downloads (Last 12 months): 87
Downloads (Last 6 weeks): 11

Reflects downloads up to 04 Oct 2024

Cited By

Jia Q, Qu X, Jiang Z, Wang C (2024). Enterprise Security Patch Management with Deep Reinforcement Learning. SSRN Electronic Journal. Online publication date: 2024.
https://doi.org/10.2139/ssrn.4816905
Yin J, Hong W, Wang H, Cao J, Miao Y, Zhang Y (2024). A Compact Vulnerability Knowledge Graph for Risk Assessment. ACM Transactions on Knowledge Discovery from Data 18:8 (1-17). Online publication date: 5-Jun-2024.
https://dl.acm.org/doi/10.1145/3671005
Iannone E, Sellitto G, Iaccarino E, Ferrucci F, De Lucia A, Palomba F (2024). Early and Realistic Exploitability Prediction of Just-Disclosed Software Vulnerabilities: How Reliable Can It Be? ACM Transactions on Software Engineering and Methodology 33:6 (1-41). Online publication date: 27-Jun-2024.
https://dl.acm.org/doi/10.1145/3654443
Elder S, Rahman M, Fringer G, Kapoor K, Williams L (2024). A Survey on Software Vulnerability Exploitability Assessment. ACM Computing Surveys 56:8 (1-41). Online publication date: 26-Apr-2024.
https://dl.acm.org/doi/10.1145/3648610
Massacci F (2024). The Holy Grail of Vulnerability Predictions. IEEE Security & Privacy 22:1 (4-6). Online publication date: Jan-2024.
https://doi.org/10.1109/MSEC.2023.3333936
Eskandari H, Bewong M, Geaur Rahman M, Ur Rehman S (2024). OutCenTR: A Method for Predicting Exploits of Cyber Vulnerabilities in High Dimensional Datasets. IEEE Access 12 (133030-133044). Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3460402
Alqahtani S (2024). Security bug reports classification using fasttext. International Journal of Information Security 23:2 (1347-1358). Online publication date: 1-Apr-2024.
https://dl.acm.org/doi/10.1007/s10207-023-00793-w
Li Z, Zhang J, Han W (2024). A Survey of Cybersecurity Knowledge Base and Its Automatic Labeling. Network Simulation and Evaluation (53-70). Online publication date: 2-Aug-2024.
https://doi.org/10.1007/978-981-97-4522-7_4
Charmanas K, Mittas N, Angelis L (2023). Exploitation of Vulnerabilities: A Topic-Based Machine Learning Framework for Explaining and Predicting Exploitation. Information 14:7 (403). Online publication date: 14-Jul-2023.
https://doi.org/10.3390/info14070403
Li Y, Yadavally A, Zhang J, Wang S, Nguyen T, Chandra S, Blincoe K, Tonella P (2023). Commit-Level, Neural Vulnerability Detection and Assessment. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (1024-1036). Online publication date: 30-Nov-2023.
https://dl.acm.org/doi/10.1145/3611643.3616346
