DOI: 10.1145/3183440.3195040

Identifying security issues in software development: are keywords enough?

Published: 27 May 2018

Abstract

Identifying security issues before attackers do has become a critical concern for software development teams and software users. While methods for finding programming errors (e.g., fuzzing, static code analysis [3], and vulnerability prediction models such as that of Scandariato et al. [10]) are valuable, identifying security issues related to the lack of secure design principles and to poor development processes could help ensure that programming errors are avoided before they are committed to source code.
Typical approaches (e.g. [4, 6--8]) to identifying security-related messages in software development project repositories use text mining based on pre-selected sets of standard security-related keywords, for instance: authentication, ssl, encryption, availability, or password. We hypothesize that these standard keywords may not capture the entire spectrum of security-related issues in a project, and that additional project-specific and/or domain-specific vocabulary may be needed to develop an accurate picture of a project's security.
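As a concrete illustration of the keyword-matching step these approaches share, the minimal Python sketch below flags a message as security-related when it contains any standard keyword. The keyword set is illustrative, taken from the examples above rather than from any cited study's exact list.

import re

# Illustrative standard security keywords; not the exact set used in
# the cited studies.
SECURITY_KEYWORDS = {"authentication", "ssl", "encryption", "availability", "password"}

def is_security_related(message: str) -> bool:
    """Return True if the message mentions any standard security keyword."""
    tokens = re.findall(r"[a-z]+", message.lower())
    return any(token in SECURITY_KEYWORDS for token in tokens)

print(is_security_related("Fix password hashing on login"))            # True
print(is_security_related("Fix ->vm_file accounting in mmap_region"))  # False

The second call shows the hypothesized blind spot: a kernel memory-mapping fix contains no standard keyword and is missed, exactly the situation in the example that follows.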
For instance, Arnold et al. [1], in a review of bug-fix patches on Linux kernel version 2.6.24, identified a commit (commit message: "Fix ->vm_file accounting, mmap_region() may do do_munmap()") with serious security consequences that was misclassified as a non-security bug. While no typical security keyword is mentioned, memory mapping ('mmap') in the domain of kernel development has significance from a security perspective, parallel to buffer overflows in languages like C/C++. Whether memory or currency is at stake, identifying changes to the assets that the software manages is potentially security-related.
The goal of this research is to support researchers and practitioners in identifying security issues in software development project artifacts by defining and evaluating a systematic scheme for identifying project-specific security vocabularies that can be used for keyword-based classification.
We derive three research questions from our goal:
RQ1: How does the vocabulary of security issues vary between software development projects?
RQ2: How well do project-specific security vocabularies identify messages related to publicly reported vulnerabilities?
RQ3: How well do existing security keywords identify project-specific security-related messages and messages related to publicly reported vulnerabilities?
To address these research questions, we collected developer emails, bug tracking records, commit messages, and CVE records from three open source projects: Dolibarr, Apache Camel, and Apache Derby. We manually classified 5,400 messages from the three projects' commit messages, bug trackers, and emails, and linked the messages to each project's public vulnerability records. Adapting techniques from Bachmann and Bernstein [2], Schermann et al. [11], and Guzzi et al. [5], we analyzed each project's security vocabulary and the vocabulary's relationship to the project's vulnerabilities. We trained two classifiers (Model.A and Model.B) on samples of the project data, and used the classifiers to predict security-related messages in the manually classified project oracles.
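The poster does not describe the internals of Model.A and Model.B, so the sketch below should be read as one plausible baseline rather than the authors' implementation: a bag-of-words Naive Bayes classifier trained on labeled messages using scikit-learn. The messages and labels are invented stand-ins for the manually classified oracle data.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: 1 = security-related, 0 = not.
messages = [
    "Fix password hashing on login",
    "Validate invoice totals before saving",
    "Bump version number for release",
]
labels = [1, 1, 0]

# Bag-of-words term counts feeding a multinomial Naive Bayes model.
model = make_pipeline(CountVectorizer(lowercase=True), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["Deadlock when writing a large blob"]))

A vocabulary-driven design like this learns project-specific terms (invoice, blob, deadlock) directly from the labeled data, which is the property the project-specific models exploit.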
Our contributions include:
• A systematic scheme for linking CVE records to related messages in software development project artifacts (see the sketch after this list)
• An empirical evaluation of project-specific security vocabulary similarities and differences between project artifacts and between projects
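The full linking scheme is not reproduced in this summary; the hypothetical sketch below shows only its most direct component, recovering explicit CVE identifiers mentioned in commit messages, bug reports, or emails. Messages that relate to a vulnerability without naming a CVE ID would need the additional linking steps the scheme defines.

import re

# CVE identifiers have the form CVE-YYYY-NNNN, with four or more
# digits in the sequence number.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")

def linked_cves(message: str) -> list[str]:
    """Return all CVE identifiers explicitly mentioned in a message."""
    return CVE_PATTERN.findall(message)

# Hypothetical commit message for illustration.
print(linked_cves("Backport the fix for CVE-2017-0001 to the 5.x branch"))
# ['CVE-2017-0001']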
To summarize our findings on RQ1, we present tables of our qualitative and quantitative results. We tabulated counts of words found in security-related messages. Traditional security keywords (e.g. password, encryption) are present, particularly in the explicit column, but each project also contains terms describing entities unique to the project: for example, 'endpoint' for Camel; 'blob' (short for 'Binary Large Object'), 'clob' ('Character Large Object'), and 'deadlock' for Derby; and 'invoice' and 'order' for Dolibarr. The presence of these terms in security-related issues suggests that they are assets worthy of careful attention during the development life cycle.
Table 1 lists the statistics for security-related messages from the three projects, broken down by security class and security property. Explicit security-related messages (messages referencing security properties) are in the minority in each project. Implicit messages represent the majority of security-related messages in each project.
In Table 2, we present the results of the classifiers built using the various project and literature security vocabularies to predict security-related messages in the oracle and CVE datasets. We have marked in bold the highest result for each performance measure for each dataset. Both Model.A and Model.B perform well across the projects when predicting on the oracle dataset of the project for which they were built. Further, the project-specific models outperform the literature-based models (Ray.vocab [9] and Pletea.vocab [7]) on the project oracle datasets. Model performance is not sustained, and is inconsistent, when a model is applied to other projects' datasets.
To summarize our findings on RQ2, Table 3 presents performance results for the project vocabulary models on the CVE datasets for each project. We have marked in bold the highest result for each performance measure for each dataset. Results for Model.A show high recall for Derby and Camel and below-average recall for Dolibarr. With Model.B, however, recall is above 60% for Dolibarr and over 85% for both Derby and Camel. We attribute the low precision to our approach of labeling only CVE-related messages as security-related, with all remaining messages labeled as not security-related. The Dolibarr results are further complicated by the low proportion of security-related messages compared with the other two projects (as reported in Table 1).
To summarize our findings on RQ3, Table 2 and Table 3 present the classifier performance results for two sets of keywords drawn from the literature, Ray.vocab and Pletea.vocab. In each case, the project vocabulary model had the highest recall, precision, and F-score on the project's oracle dataset. On the CVE datasets, the project vocabulary model has the highest recall; however, overall performance, as measured by F-score, varied by dataset, with the Ray and Pletea keywords scoring higher than the project vocabulary model. The low precision for the classifiers built on the projects' vocabularies follows the explanation provided under RQ2.
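For reference, precision, recall, and F-score in Tables 2 and 3 follow their standard definitions; the sketch below computes them from confusion-matrix counts. The counts are invented for illustration and mimic the pattern discussed above: many flagged messages, high recall, low precision.

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision = TP/(TP+FP), recall = TP/(TP+FN),
    F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Invented counts: a vocabulary that flags many messages recovers most
# CVE-related ones (high recall) but also flags many unrelated ones
# (low precision), which depresses the F-score.
print(precision_recall_f1(tp=85, fp=400, fn=15))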
Our results suggest that the domain vocabulary models achieve recall that outperforms standard security terms across our datasets. Our conjecture, supported by our data, is that augmenting standard security keywords with a project's security vocabulary yields a more accurate security picture. In future work, we aim to refine vocabulary selection to improve classifier performance, and to build tools implementing the approach in this paper to aid practitioners and researchers in identifying software project security issues.

References

[1] Jeff Arnold, Tim Abbott, Waseem Daher, Gregory Price, Nelson Elhage, Geoffrey Thomas, and Anders Kaseorg. 2009. Security Impact Ratings Considered Harmful. In HotOS.
[2] A. Bachmann and A. Bernstein. 2009. Data Retrieval, Processing and Linking for Software Process Data Analysis.
[3] Brian Chess and Gary McGraw. 2004. Static analysis for security. IEEE Security & Privacy 2, 6 (2004), 76--79.
[4] Jane Cleland-Huang, Raffaella Settimi, Xuchang Zou, and Peter Solc. 2006. The Detection and Classification of Non-Functional Requirements with Application to Early Aspects. In 14th IEEE International Requirements Engineering Conference (RE '06). Minneapolis, Minnesota, 39--48.
[5] Anja Guzzi, Alberto Bacchelli, Michele Lanza, Martin Pinzger, and Arie van Deursen. 2013. Communication in Open Source Software Development Mailing Lists. In Proceedings of the 10th Working Conference on Mining Software Repositories (MSR '13). IEEE Press, Piscataway, NJ, USA, 277--286. http://dl.acm.org/citation.cfm?id=2487085.2487139
[6] Abram Hindle, Neil A. Ernst, Michael W. Godfrey, and John Mylopoulos. 2013. Automated Topic Naming. Empirical Softw. Engg. 18, 6 (Dec. 2013), 1125--1155.
[7] Daniel Pletea, Bogdan Vasilescu, and Alexander Serebrenik. 2014. Security and Emotion: Sentiment Analysis of Security Discussions on GitHub. In Proceedings of the 11th Working Conference on Mining Software Repositories (MSR 2014). ACM, 348--351.
[8] Baishakhi Ray, Vincent Hellendoorn, Saheel Godhane, Zhaopeng Tu, Alberto Bacchelli, and Premkumar Devanbu. 2016. On the "Naturalness" of Buggy Code. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16). ACM, New York, NY, USA, 428--439.
[9] Baishakhi Ray, Daryl Posnett, Vladimir Filkov, and Premkumar Devanbu. 2014. A Large Scale Study of Programming Languages and Code Quality in GitHub. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014). ACM, 155--165.
[10] Riccardo Scandariato, James Walden, Aram Hovsepyan, and Wouter Joosen. 2014. Predicting Vulnerable Software Components via Text Mining. IEEE Transactions on Software Engineering 40, 10 (2014), 993--1006.
[11] Gerald Schermann, Martin Brandtner, Sebastiano Panichella, Philipp Leitner, and Harald Gall. 2015. Discovering Loners and Phantoms in Commit and Issue Data. In Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension (ICPC '15). IEEE Press, Piscataway, NJ, USA, 4--14. http://dl.acm.org/citation.cfm?id=2820282.2820287

Published In

ICSE '18: Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings
May 2018
231 pages
ISBN:9781450356633
DOI:10.1145/3183440
  • Conference Chair: Michel Chaudron
  • General Chair: Ivica Crnkovic
  • Program Chairs: Marsha Chechik, Mark Harman
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CVE
  2. classification model
  3. prediction
  4. security
  5. vocabulary

Qualifiers

  • Poster
