DOI: 10.1145/3183440.3195040

Identifying security issues in software development: are keywords enough?

Published: 27 May 2018

Abstract

Identifying security issues before attackers do has become a critical concern for software development teams and software users. While methods for finding programming errors (e.g., fuzzing, static code analysis [3], and vulnerability prediction models such as that of Scandariato et al. [10]) are valuable, identifying security issues related to the lack of secure design principles and to poor development processes could help ensure that programming errors are avoided before they are committed to source code.
Typical approaches (e.g. [4, 6--8]) to identifying security-related messages in software development project repositories use text mining based on pre-selected sets of standard security-related keywords, for instance: authentication, ssl, encryption, availability, or password. We hypothesize that these standard keywords may not capture the entire spectrum of security-related issues in a project, and that additional project-specific and/or domain-specific vocabulary may be needed to develop an accurate picture of a project's security.
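As a concrete illustration of the keyword-matching step these approaches share, the minimal Python sketch below flags a message as security-related when it contains any standard keyword. The keyword set is illustrative, taken from the examples above rather than from any cited study's exact list.

import re

# Illustrative standard security keywords; not the exact set used in
# the cited studies.
SECURITY_KEYWORDS = {"authentication", "ssl", "encryption", "availability", "password"}

def is_security_related(message: str) -> bool:
    """Return True if the message mentions any standard security keyword."""
    tokens = re.findall(r"[a-z]+", message.lower())
    return any(token in SECURITY_KEYWORDS for token in tokens)

print(is_security_related("Fix password hashing on login"))            # True
print(is_security_related("Fix ->vm_file accounting in mmap_region"))  # False

The second call shows the hypothesized blind spot: a kernel memory-mapping fix contains no standard keyword and is missed, exactly the situation in the example that follows.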
For instance, Arnold et al. [1], in a review of bug-fix patches on Linux kernel version 2.6.24, identified a commit (commit message: "Fix ->vm_file accounting, mmap_region() may do do_munmap()") with serious security consequences that was misclassified as a non-security bug. While no typical security keyword is mentioned, memory mapping ('mmap') in the domain of kernel development has significance from a security perspective, parallel to buffer overflows in languages like C/C++. Whether memory or currency is at stake, identifying changes to the assets that the software manages is potentially security-related.
The goal of this research is to support researchers and practitioners in identifying security issues in software development project artifacts by defining and evaluating a systematic scheme for identifying project-specific security vocabularies that can be used for keyword-based classification.
We derive three research questions from our goal:
RQ1: How does the vocabulary of security issues vary between software development projects?
RQ2: How well do project-specific security vocabularies identify messages related to publicly reported vulnerabilities?
RQ3: How well do existing security keywords identify project-specific security-related messages and messages related to publicly reported vulnerabilities?
To address these research questions, we collected developer emails, bug tracking records, commit messages, and CVE records from three open source projects: Dolibarr, Apache Camel, and Apache Derby. We manually classified 5,400 messages from the three projects' commit messages, bug trackers, and emails, and linked the messages to each project's public vulnerability records. Adapting techniques from Bachmann and Bernstein [2], Schermann et al. [11], and Guzzi et al. [5], we analyzed each project's security vocabulary and the vocabulary's relationship to the project's vulnerabilities. We trained two classifiers (Model.A and Model.B) on samples of the project data, and used the classifiers to predict security-related messages in the manually classified project oracles.
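The poster does not describe the internals of Model.A and Model.B, so the sketch below should be read as one plausible baseline rather than the authors' implementation: a bag-of-words Naive Bayes classifier trained on labeled messages using scikit-learn. The messages and labels are invented stand-ins for the manually classified oracle data.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data: 1 = security-related, 0 = not.
messages = [
    "Fix password hashing on login",
    "Validate invoice totals before saving",
    "Bump version number for release",
]
labels = [1, 1, 0]

# Bag-of-words term counts feeding a multinomial Naive Bayes model.
model = make_pipeline(CountVectorizer(lowercase=True), MultinomialNB())
model.fit(messages, labels)

print(model.predict(["Deadlock when writing a large blob"]))

A vocabulary-driven design like this learns project-specific terms (invoice, blob, deadlock) directly from the labeled data, which is the property the project-specific models exploit.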
Our contributions include:
• A systematic scheme for linking CVE records to related messages in software development project artifacts (see the sketch after this list)
• An empirical evaluation of project-specific security vocabulary similarities and differences between project artifacts and between projects
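The full linking scheme is not reproduced in this summary; the hypothetical sketch below shows only its most direct component, recovering explicit CVE identifiers mentioned in commit messages, bug reports, or emails. Messages that relate to a vulnerability without naming a CVE ID would need the additional linking steps the scheme defines.

import re

# CVE identifiers have the form CVE-YYYY-NNNN, with four or more
# digits in the sequence number.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")

def linked_cves(message: str) -> list[str]:
    """Return all CVE identifiers explicitly mentioned in a message."""
    return CVE_PATTERN.findall(message)

# Hypothetical commit message for illustration.
print(linked_cves("Backport the fix for CVE-2017-0001 to the 5.x branch"))
# ['CVE-2017-0001']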
To summarize our findings on RQ1, we present tables of our qualitative and quantitative results. We tabulated counts of words found in security-related messages. Traditional security keywords (e.g. password, encryption) are present, particularly in the explicit column, but each project also contains terms describing entities unique to the project: for example, 'endpoint' for Camel; 'blob' (short for 'Binary Large Object'), 'clob' ('Character Large Object'), and 'deadlock' for Derby; and 'invoice' and 'order' for Dolibarr. The presence of these terms in security-related issues suggests that they are assets worthy of careful attention during the development life cycle.
Table 1 lists the statistics for security-related messages from the three projects, broken down by security class and security property. Explicit security-related messages (messages referencing security properties) are in the minority in each project. Implicit messages represent the majority of security-related messages in each project.
In Table 2, we present the results of the classifiers built using the various project and literature security vocabularies to predict security-related messages in the oracle and CVE datasets. We have marked in bold the highest result for each performance measure for each dataset. Both Model.A and Model.B perform well across the projects when predicting on the oracle dataset of the project for which they were built. Further, the project-specific models outperform the literature-based models (Ray.vocab [9] and Pletea.vocab [7]) on the project oracle datasets. Model performance is not sustained, and is inconsistent, when a model is applied to other projects' datasets.
To summarize our findings on RQ2, Table 3 presents performance results for the project vocabulary models on the CVE datasets for each project. We have marked in bold the highest result for each performance measure for each dataset. Results for Model.A show high recall for Derby and Camel and below-average recall for Dolibarr. With Model.B, however, recall is above 60% for Dolibarr and over 85% for both Derby and Camel. We attribute the low precision to our approach of labeling only CVE-related messages as security-related, with all remaining messages labeled as not security-related. The Dolibarr results are further complicated by the low proportion of security-related messages compared with the other two projects (as reported in Table 1).
To summarize our findings on RQ3, Table 2 and Table 3 present the classifier performance results for two sets of keywords drawn from the literature, Ray.vocab and Pletea.vocab. In each case, the project vocabulary model had the highest recall, precision, and F-score on the project's oracle dataset. On the CVE datasets, the project vocabulary model has the highest recall; however, overall performance, as measured by F-score, varied by dataset, with the Ray and Pletea keywords scoring higher than the project vocabulary model. The low precision for the classifiers built on the projects' vocabularies follows the explanation provided under RQ2.
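For reference, precision, recall, and F-score in Tables 2 and 3 follow their standard definitions; the sketch below computes them from confusion-matrix counts. The counts are invented for illustration and mimic the pattern discussed above: many flagged messages, high recall, low precision.

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision = TP/(TP+FP), recall = TP/(TP+FN),
    F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Invented counts: a vocabulary that flags many messages recovers most
# CVE-related ones (high recall) but also flags many unrelated ones
# (low precision), which depresses the F-score.
print(precision_recall_f1(tp=85, fp=400, fn=15))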
Our results suggest that the domain vocabulary models achieve recall that outperforms standard security terms across our datasets. Our conjecture, supported by our data, is that augmenting standard security keywords with a project's security vocabulary yields a more accurate security picture. In future work, we aim to refine vocabulary selection to improve classifier performance, and to build tools implementing the approach in this paper to aid practitioners and researchers in identifying software project security issues.

References

[1] Jeff Arnold, Tim Abbott, Waseem Daher, Gregory Price, Nelson Elhage, Geoffrey Thomas, and Anders Kaseorg. 2009. Security Impact Ratings Considered Harmful. In HotOS.
[2] A. Bachmann and A. Bernstein. 2009. Data Retrieval, Processing and Linking for Software Process Data Analysis.
[3] Brian Chess and Gary McGraw. 2004. Static analysis for security. IEEE Security & Privacy 2, 6 (2004), 76--79.
[4] Jane Cleland-Huang, Raffaella Settimi, Xuchang Zou, and Peter Solc. 2006. The Detection and Classification of Non-Functional Requirements with Application to Early Aspects. In 14th IEEE International Requirements Engineering Conference (RE '06). Minneapolis, Minnesota, 39--48.
[5] Anja Guzzi, Alberto Bacchelli, Michele Lanza, Martin Pinzger, and Arie van Deursen. 2013. Communication in Open Source Software Development Mailing Lists. In Proceedings of the 10th Working Conference on Mining Software Repositories (MSR '13). IEEE Press, Piscataway, NJ, USA, 277--286. http://dl.acm.org/citation.cfm?id=2487085.2487139
[6] Abram Hindle, Neil A. Ernst, Michael W. Godfrey, and John Mylopoulos. 2013. Automated Topic Naming. Empirical Softw. Engg. 18, 6 (Dec. 2013), 1125--1155.
[7] Daniel Pletea, Bogdan Vasilescu, and Alexander Serebrenik. 2014. Security and Emotion: Sentiment Analysis of Security Discussions on GitHub. In Proceedings of the 11th Working Conference on Mining Software Repositories (MSR 2014). ACM, 348--351.
[8] Baishakhi Ray, Vincent Hellendoorn, Saheel Godhane, Zhaopeng Tu, Alberto Bacchelli, and Premkumar Devanbu. 2016. On the "Naturalness" of Buggy Code. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16). ACM, New York, NY, USA, 428--439.
[9] Baishakhi Ray, Daryl Posnett, Vladimir Filkov, and Premkumar Devanbu. 2014. A Large Scale Study of Programming Languages and Code Quality in GitHub. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014). ACM, 155--165.
[10] Riccardo Scandariato, James Walden, Aram Hovsepyan, and Wouter Joosen. 2014. Predicting Vulnerable Software Components via Text Mining. IEEE Transactions on Software Engineering 40, 10 (2014), 993--1006.
[11] Gerald Schermann, Martin Brandtner, Sebastiano Panichella, Philipp Leitner, and Harald Gall. 2015. Discovering Loners and Phantoms in Commit and Issue Data. In Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension (ICPC '15). IEEE Press, Piscataway, NJ, USA, 4--14. http://dl.acm.org/citation.cfm?id=2820282.2820287

Published In

ICSE '18: Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings
May 2018
231 pages
ISBN:9781450356633
DOI:10.1145/3183440
  • Conference Chair: Michel Chaudron
  • General Chair: Ivica Crnkovic
  • Program Chairs: Marsha Chechik, Mark Harman
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CVE
  2. classification model
  3. prediction
  4. security
  5. vocabulary

Qualifiers

  • Poster
