
Proceedings of Static Analysis Summit II

2008

Paul E. Black (workshop chair)
Elizabeth Fong (editor)
Information Technology Laboratory
U.S. National Institute of Standards and Technology (NIST)
Gaithersburg, MD 20899

Static Analysis Summit II (http://samate.nist.gov/SASII) was held 8 and 9 November 2007 in Fairfax, Virginia, and was co-located with SIGAda 2007. A total of 61 people registered, coming from government, universities, tool vendors and service providers, and research companies. The workshop had a keynote address by Professor William Pugh, paper presentations, discussion sessions, a panel on "Obfuscation Versus Analysis - Who Will Win?", and a new technology demonstration fair. The workshop is one of a series by NIST's Software Assurance Measurement and Tool Evaluation (SAMATE) project, which is partially funded by DHS to help identify and enhance software security assurance tools.

The Call for Papers pointed out that "black-box" testing cannot realistically find maliciously implanted Trojan horses or subtle errors with many preconditions. For maximum assurance, static analysis must be applied to all levels of software artifacts, from models to source code to binaries. Static analyzers are quite capable and are developing quickly, yet developers, auditors, and examiners could use far more capabilities. The goal of this summit was to convene researchers, developers, and government and industrial users to define the obstacles to such urgently needed capabilities and to identify feasible approaches, either engineering or research, to overcome them. The Call for Papers solicited contributions describing basic research, applications, experience, or proposals relevant to static analysis tools, techniques, and their evaluation.

These proceedings include the agenda, some discussion notes, and reviewed papers. We are especially grateful to Prof. William Pugh for the enlightening keynote address. I thank those who worked to organize this workshop, particularly Wendy Havens, who handled much of the correspondence. We appreciate the program committee for their efforts reviewing the papers. We are grateful to NIST, especially the Software Diagnostics and Conformance Testing Division, which is in the Information Technology Laboratory, for providing the organizers' time. On behalf of the program committee and the SAMATE team, thanks to everyone for taking their time and resources to join us.

Dr. Paul E. Black
February 2008

Disclaimer: Any commercial product mentioned is for information only; it does not imply recommendation or endorsement by NIST, nor does it imply that the products mentioned are necessarily the best available for the purpose.

Static Analysis Summit II Agenda

Thursday, 8 November 2007

12:45 : Static Analysis for Improving Secure Software Development at Motorola - R Krishnan (Motorola), Margaret Nadworny (Motorola), and Nishil Bharill (Motorola)
1:10 : Discussion: most urgently needed capabilities in static analysis
1:40 : Evaluation of Static Source Code Analyzers for Real-Time Embedded Software Development - Redge Bartholomew (Rockwell Collins)
2:05 : Discussion: greatest obstacles in static analysis
2:50 : Common Weakness Enumeration (CWE) Status Update - Robert Martin (MITRE) and Sean Barnum (Cigital)
3:15 : Discussion: possible approaches to overcome obstacles
3:45 : Panel: Obfuscation vs. Analysis - Who Will Win? - David J. Chaboya (AFRL) and Stacy Prowell (CERT)
4:30 : New Technology Demonstration Fair: FindBugs, FX, Static Analysis of x86 Executables

Friday, 9 November 2007

8:30 AM : Discussion: static analysis at other levels
9:00 : Keynote: Judging the Value of Static Analysis - Bill Pugh (UMD). The slides for the keynote address are on-line at http://www.cs.umd.edu/~pugh/JudgingStaticAnalysis.pdf
10:15 : A Practical Approach to Formal Software Verification by Static Analysis - Arnaud Venet (Kestrel Technology)
10:40 : Discussion: inter-tool information sharing
11:10 : Logical Foundation for Static Analysis: Application to Binary Static Analysis for Security - Hassen Saidi (SRI)
11:35 : Wrap-up discussion: needs, obstacles, and approaches

Discussion and Panel Notes

To catalyze discussion, we presented six questions or topics. The discussions were not meant to reach a consensus or express a majority, and seldom did they. Workshop participants presented ideas, questions, recommendations, cautions, and everything in between. Although we try here to note what was said, it is by no means a complete record of what was discussed (none of us wrote fast enough). In some cases we combined similar comments across sessions. We hope these notes convey some feel for the discussions and lead to improvement.

1:10 PM Most Urgently Needed Capabilities in Static Analysis
Facilitator: Vadim Okun, NIST

Be able to analyze:
- Concurrency and race conditions
- Timing
- Runtime dispatch, deep hierarchies, highly polymorphic systems
- Function pointers
- Integer overflow
- Numeric computations accurately (the comment was: inaccurate numeric analysis)
- Inline assembly and multiple languages across multiple boundaries

General capabilities needed or missing:
- Lack of reasoning for reported findings
- Reduced false positives
- Reporting the probability that a weakness is exploitable
- Tools reporting their coverage, that is, what they look for
- Scalability (e.g., 100 million lines)
- Whole-application analysis
- A standard way to express the environment, so one can state what is known and the next tool does not have to redo the same work
- A guarantee that if we run this tool, we will not have these exploits

2:05 PM Greatest Obstacles in Static Analysis
Facilitator: Paul E. Black, NIST

- We need scientific surveys or studies that show the return on investment of tools.
- It is likely there is at least one tool for each of the things on the wish list; we need a toolbox.
- Tools do not know what the requirements are, that is, what the program is supposed to do. To go beyond buffer overflow, which is (almost) always a violation, tools need a specification, like IFIL. Java has JML, and there is Splint for C, but we can't get people to use annotations! We need an easier way to write annotations.
- Programmers need skill sets for good code creation; they should be reinforced by tools.
- The European safety community has used static analysis for years. They made their case.

3:15 PM Possible Approaches to Overcome Obstacles
Facilitator: Redge Bartholomew, Rockwell Collins

- Vendors should provide information about what exactly their tools find.
- Many people are skeptical about using test sets, particularly fixed sets, to evaluate tools: tools get tailored to test suites.
- You can't include a tool in certification efforts until it is very well qualified. Tests are necessary, but not sufficient.
- There was a discussion about funding better software, both research and development of techniques and paying well for good quality software. Rod Chapman said, "you cannot polish junk."
(That is, software must be built well from the beginning. No amount of tools or techniques can "repair" poor software.) He also said, if all software is junk, we might as well buy cheap junk. (That is, if consumers can't judge the quality of software, it is logical in today's world to assume the worst of software. Therefore people won't pay much for software tools or programs.)

3:45 PM Panel: Obfuscation vs. Analysis - Who Will Win?

Malware writers use obfuscation to disguise their programs and hide their exploits. Good guys need powerful analysis to crack malware quickly. Good guys also use obfuscation to protect intellectual property and, in military applications, to hinder enemies from figuring out weapon systems (remember the Death Star?). They don't want bad guys to crack their techniques. This panel was set up to explore who will win and why. The panelists gave very good presentations, but instead of entertaining controversy, they agreed that analysts ultimately win.

8:30 AM Static Analysis at Other Levels
Facilitator: Michael Kass, NIST

Here are other static analysis applications or targets, in addition to the "default" source code analysis for bugs:
- Requirements analysis (lots of resistance from implementers because it is yet another language to learn, but probably will happen)
- Architectural design review
- Compilers/decompilers
- Code metrics generation, e.g., measuring code
- Program understanding
- Reverse engineering (e.g., byte code to UML design)
- Re-engineering (e.g., refactoring)
- Program/property verifiers
- Binary analysis (binary analyzers are as good as source analyzers if you have the symbol table)

The audience suggested more static analysis tools:
- Source-to-source transformers
- Same-language translators (e.g., debuggers, disassemblers, emulators)
- Threat modeling tools
- Impact analysis/slicers
- Model checkers (not useful unless there is some manual checking)
- Combining static and dynamic analysis (e.g., static analysis plus testing, static analysis and program verification)

10:40 AM Inter-tool Information Sharing
Facilitator: Paul E. Black, NIST

The most important requirement for inter-operation of tools is a common reporting format. Many companies have more than one type of tool, and to facilitate integration among these different tools in a user-friendly environment, it is useful to have one tool's output become another tool's input. To promote progress, here are some use cases for information sharing:
- A generic format to explain the "reasoning" behind a bug report
- SA tool -> infeasible paths -> testing (see the sketch at the end of these notes)
- SA tool -> no alias in blocks, etc. -> compiler optimization

11:35 AM Wrap-Up Discussion: Needs, Obstacles, and Approaches
Facilitator: Paul E. Black, NIST

The biggest need is for people to agree on what content to share and on common report formats. One needs to get information from runs into static analysis as the basis of hints, hypotheses, or invariants. We need to identify use cases for information sharing. The biggest challenge today is for a tool to explain (to another tool or to a human) the following:
- The complicated path and the "reasoning" or evidence behind a bug report
- The information provided to an assurance case (e.g., guaranteed no SQL injection)
- What areas (either code blocks or types of problems) are NOT analyzed

Such work needs to address the different needs of auditors, assessors, and developers. The recommendation is for NIST to conduct a tool exposition: tool vendors would sign up to run their tools on NIST's selected test source programs. A burning issue is finding an effective way to get feedback from users about tools.
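As a minimal, hypothetical illustration of the "infeasible paths" information-sharing use case noted above (not drawn from any tool or presentation at the workshop), a static analyzer that proves a branch unreachable could pass that fact to a test-generation tool so that no effort is spent trying to cover it:

    /* Hypothetical example: the inner branch can never execute, and a
       static analyzer that proves this could report the path as infeasible
       so a downstream test-generation tool skips it. */
    int classify(int x)
    {
        if (x >= 0) {
            if (x < 0) {
                return -1;   /* infeasible: x cannot be both >= 0 and < 0 */
            }
            return 1;        /* x is zero or positive */
        }
        return 0;            /* x is negative */
    }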
Static Analysis Tools for Security Checking in Code at Motorola

R Krishnan, Margaret Nadworny, Nishil Bharill
Motorola Software Group
<krishnanr@motorola.com; Margaret.Nadworny@motorola.com; Nishil.Bharill@motorola.com>

Abstract

As part of an overall initiative to improve the security aspects in the software used in Motorola's products, training and secure coding standards were developed. The goal is to decrease the number of security vulnerabilities introduced during the coding phase of the software development process. This paper describes the creation of the secure coding standards and the efforts to automate as many of the standards as possible. Originally, the efforts focused on the Inforce tool from Klocwork, as many Motorola business units already used the tool for quality but without the security flags activated. This paper describes the efforts to evaluate, extend, and create the coverage for the secure coding standards with Klocwork. More recently, an opportunity arose which allowed a team to evaluate other static analysis tools as well. This paper also describes the findings from that evaluation.

The coding phase is recognized as a key phase, where vulnerabilities which put the system at risk from attack are introduced by developers into the code. The vulnerabilities targeted include buffer overflow and format string vulnerabilities. This area was viewed as requiring relatively low effort in terms of changing processes but having high impact in improving the security of the products. As a result, the coding phase was the first area of security change within the organization.

Keywords: static analysis, security, Klocwork

Introduction

Security is one of the key product quality attributes for products and solutions in telecommunications. A denial of service attack on a telecom network could mean a huge loss in revenue to the operator. With mobile phones used increasingly for a wide range of services beyond telephony, from messaging to online shopping, security has become an important aspect of the software on these devices. Factors such as the increased connectivity of devices and the use of open source software increase the security risks. As a result, Motorola has increased the priority and attention given to the security related aspects of its products.

Motorola Software Group is a software development organization within the Corporate Technology Office providing software resources and services to all of the Motorola businesses. The approach within Motorola Software is to build security into the products throughout the development lifecycle. Changes in software development are institutionalized when they become part of the process, with appropriate tool support and with the engineering community trained on the required tools and process changes. This approach is depicted in Figure-1. Specifically, to instill a security focus in the coding phase, the coding standards are enhanced with security rules, training is required on the basic concepts relating to secure programming, and a static analysis tool is used to automate the identification of any violation of the security rules.

Security Focus in the Coding Phase

Recognized security experts from FSC, now Assurent, a subsidiary of TELUS, were engaged to assist with the development of Secure Programming Training to educate the engineers on the need, importance, and details of secure programming.
In addition, the Assurent staff assisted in the enhancement of the coding standards for C, C++, and Java with security rules. Previously, the coding standards for quality focused on the readability and maintenance aspects of the code. The security rules introduced significant content, addressing what was and was not recommended from the security perspective. The content is segregated into rules, which are mandatory, and guidelines, which are recommended and optional.

For the C coding standards, twenty-three rules and twenty-one guidelines were introduced and adopted. The rules include the following aspects:
- Buffer overflows
- Memory allocation and deallocation
- Handling of resources such as filenames and directories
- Use of library functions
- Overflows in computation
- Avoiding format string vulnerabilities
- Input validation
- Handling of sensitive data
and others.

For the C++ coding standards, thirty-two rules and three guidelines were introduced and adopted. The rules cover the following aspects:
- Memory allocation
- Avoiding C-style strings
- Initialization
- Pointer casting
- Use of vectors instead of arrays
- Orthogonal security requirements
- Exceptions
- Use of the STL (Standard Template Library)
and others.

For the Java coding standards, sixteen rules and three guidelines were incorporated. The rules include the following aspects:
- Use of secure class loaders
- Object initialization
- Securing of packages, classes, methods, and variables
- Handling of sensitive data
- Random number generation
- Comparison of classes
and others.

Static analysis tools help in automatically detecting violations of the security rules. Originally, the efforts focused on the Inforce tool from Klocwork, as many Motorola business units already used the tool for quality but without activated security flags. More information about Klocwork is available on their web site at www.klocwork.com. No other tools were seriously considered initially: supporting two different static analysis tools, one for quality and another for security, was not practical for three reasons: licensing costs, productivity inefficiencies, and vendor management.

Figure-1: Instill Change in the Implementation Phase

Figure-2: Use of Klocwork Inforce with Security Flags

Supporting the Security Rules in Klocwork

The following process was used to collaborate with Klocwork to support the Motorola coding standards. The security rules in the coding standards were analyzed and the opportunities for automatic detection of violations of the coding standard were identified. Some rules, by the nature of their content, cannot be verified through static analysis. An example of such a rule is: "Resource paths shall be resolved before performing access validation". This particular rule must be verified through usual inspection practices. This overall analysis of which rules could be automated was a collaborative effort with the Klocwork technical team.

Test cases were created for each of the rules verifiable through static analysis. Negative testcases create instances of violation of each security rule; the tool is intended to catch and flag these violations, so these negative test cases check for false negatives. Positive testcases are instances that are in conformance with the rule; the tool is not expected to flag errors on these testcases, so they check for false positives.
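As a hedged illustration of the negative/positive test case idea (these snippets are hypothetical and are not taken from the Motorola test suite), a pair of test cases for a rule such as "externally provided strings shall not be used as format strings" might look like the following, where the checker should flag the first function and stay silent on the second:

    #include <stdio.h>

    /* Negative test case: violates the format string rule, so the
       checker is expected to flag it; silence would be a false negative. */
    void log_message_negative(const char *user_input)
    {
        printf(user_input);            /* user data used as the format string */
    }

    /* Positive test case: conforms to the rule, so the checker should
       not report it; a warning here would be a false positive. */
    void log_message_positive(const char *user_input)
    {
        printf("%s", user_input);      /* user data passed only as an argument */
    }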
The existing checkers available in Klocwork were analyzed, and gaps between the available checkers and the verifiable security rules were identified. Klocwork has a feature whereby additional checkers can be created without waiting for the next release of the tool. These extensibility checkers were used to address the identified gaps. The new extensibility checkers were developed and delivered by Klocwork in a phased manner for C and Java. The success criteria for these extensibility checkers were to detect the violations in the negative testcases and to pass the positive testcases.

The C extensibility checkers have not been incorporated into Klocwork's general releases due to legal intellectual property related issues, which is also why they are not described in further detail; activities have been initiated to revisit this situation. These extensibility checkers were delivered to Motorola and confirmed. The situation for Java was handled similarly: the checkers were written by Klocwork but were integrated into Release 7.5, as Klocwork determined that the security rules could be identified from published material. For the C++ security rules, Motorola Software Group engineers in Bangalore were trained by Klocwork to write extensibility checkers, and the Bangalore team developed the checkers in-house.

Table-1 shows the progress made in the Klocwork Inforce tool with this activity. The first column gives the programming language. The second column gives the total number of security rules, including subrules, for that language. The third column gives the number of security rules and subrules which could be automated. The fourth column indicates the number of rules successfully supported in earlier versions of Klocwork, specifically version 6.1 for C and C++ and 7.1XG for Java; this represents the initial results from this activity. The fifth column gives the number of rules successfully supported since Klocwork 7.5. For C, Klocwork 6.1 supported eight rules, which was extended to twenty-two rules in Klocwork 7.5. For C++, Klocwork 6.1 supported two rules, which was extended to nineteen rules in Klocwork 7.5. For Java, Klocwork 7.1XG supported two rules, which was extended to thirteen rules in Klocwork 7.5. The improvement has been impressive but is by no means complete.

Table-1: Security Rules Support in Klocwork

Language   Number of Security Rules   Number of Automated Rules   Support in Klocwork 6.1   Support in Klocwork 7.5
C          39                         25                          8                         22
C++        34                         25                          2                         19
Java       16                         5                           9                         13

Klocwork Benchmarking Activity

Buffer overflow is one of the most dangerous coding vulnerabilities in software and continues to be exploited. It has obtained the attention of researchers as well, including the Software Assurance Metrics and Tool Evaluation (SAMATE) [4] project, sponsored by the U.S. Department of Homeland Security (DHS) National Cyber Security Division and NIST. The required scripts were developed to utilize the test cases and code snippets offered by the MIT test suite [1], which were passed through Klocwork with all the errors enabled. Overall, five defects were identified in the Klocwork tool itself. Change requests have been submitted to Klocwork and the errors will be addressed in the upcoming 8.0 release of Klocwork.
The defects identified include:
- Violation on access to shared memory
- A function call used as an array index, with the return value exceeding the array bounds
- An array element value used as an index into another array, exceeding the array bounds
- Use of a function call in strncpy for the value of n, exceeding the array bounds
- Accessing beyond bounds after assigning the array start address to a pointer variable

SAMATE [4] provides a set of testcases for different languages such as C, C++, Java, and PHP. A study was performed to understand the coverage of the security rules in the Motorola coding standards in this reference data set. There are 1677 testcases for C and 88 testcases for C++ in the SAMATE reference dataset. These testcases cover aspects such as memory leaks, double-free memory errors, input validation, buffer overflow of both stack and heap, null dereference, race conditions, variable initialization, command injection, cross-site scripting, format string vulnerabilities, and so on. There was considerable overlap between the Motorola test suite and the SAMATE test suite: thirteen of the security rules in the Motorola Software C coding standard are covered in the SAMATE test set, and four of the security rules in the Motorola Software C++ coding standard are covered in the SAMATE test set. There are thirty-three testcases for Java in the SAMATE reference dataset. These testcases cover aspects such as tainted input, arbitrary file access, tainted output, cross-site scripting, memory resource leaks, and return of private data addresses from public methods. There are no overlaps with the security rules in the Motorola Software Java coding standard. This is summarized in Table-2.

Table-2: SAMATE Testcases and Motorola Coding Standard Rules

Language   Number of SAMATE Tests   Number of Motorola Rules Covered
C          1677                     13
C++        88                       4
Java       33                       0

One of the major shortcomings identified with Klocwork was its inability to identify the use of uninitialized array variables. The Klocwork team has analyzed and identified particular aspects of this general problem:
- Uninitialized use of array elements of simple variable type
- Uninitialized use of array elements of complex variable type, such as arrays of structures or pointers
- Uninitialized use of global arrays
- Partial initialization determination: being able to identify that some elements are initialized and some are not
- Interprocedural initialization, with initialization occurring in a different function

Factors like complex data types, global array variables, partial initialization, and the need for interprocedural analysis make detection of uninitialized use of array elements technically difficult for static analysis tools. Klocwork promises to provide a phased solution over the next couple of releases.
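To make the categories above concrete, the following hypothetical C sketch (not one of the Motorola or Klocwork test cases) shows partial initialization, a complex element type, and initialization occurring in a different function, the patterns identified as technically difficult for static analysis:

    #include <stdio.h>

    struct record { int id; const char *name; };

    /* Initialization happens here, so recognizing whether the caller's array
       is fully initialized requires interprocedural analysis. */
    static void fill_ids(struct record *recs, int n)
    {
        for (int i = 0; i < n; i++) {
            recs[i].id = i;            /* note: the name field is never set */
        }
    }

    int main(void)
    {
        int counts[4];
        struct record recs[4];

        counts[0] = 1;                 /* partial initialization: elements 1..3 unset */
        fill_ids(recs, 4);

        printf("%d\n", counts[3]);     /* read of an uninitialized simple element */
        printf("%s\n", recs[2].name);  /* read of an uninitialized complex-type field */
        return 0;
    }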
The inability to identify uninitialized array elements was the root cause of issues identified in field testing of some Motorola products. The initial response from Klocwork after this problem was reported triggered an assessment of other popular security static analysis tools alongside Klocwork. As the intention of this paper is to encourage improvement in all security static analysis tools, the other tools are referenced here as Tool X, Tool Y, and Tool Z.

Table-3 shows the comparison of these other tools, with the Motorola-developed testcases as the basis for comparison. The first column is the programming language evaluated. The second column gives the total number of security rules, including subrules. The third column gives the Klocwork results for release 7.5. The last three columns show the results from three well-known security static analysis tools.

Table-3: Support for Motorola Security Rules in Static Analysis Tools

Language   Number of Security Rules   Klocwork 7.5   Tool X   Tool Y   Tool Z
C          39                         22             7        7        5
C++        34                         19             0        1        1
Java       16                         13             2        1        0

Detailed results for this benchmarking activity can be found in Appendix A for C, Appendix B for Java, and Appendix C for C++. Because none of the positive test cases produced false positives in this activity, this paper does not elaborate on them further.

Observations:
- Klocwork is significantly better at supporting the security rules in the Motorola coding standards due to the collaboration.
- Our partnership with Klocwork has been a major factor in the support for these security rules in their tool suite.
- Support for detecting uninitialized use of array elements is weak in the major security static analysis tools. Tool X and Klocwork could handle detection of uninitialized use of array elements of simple data types; these tools, however, assume that if a single element of the array is initialized, then the entire array is initialized. Obviously, there is room for improvement.
- None of the tools address detection of complex, global, or interprocedural uninitialized array variables.
- All the tools detected basic buffer overflow and format string vulnerabilities.

Please note that, combining the information in Table-1 and Table-3, Klocwork identified more of the security rules than the other tools even prior to Motorola's engagement with them. The significance of this activity is that one must be aware of the relevant security rules applying to one's domain before engaging any static analysis tool: simple tool usage is not sufficient, and the engagement must be significantly extended for product security. Most static analysis tools support extension capabilities, and Tools X, Y, and Z also support extensions. Since Klocwork fared better in comparison with the other tools, the extension capability of the other tools was not studied in depth.

Opportunities

In Motorola's experience, use of static analysis tools has helped identify and correct a significant number of vulnerabilities in the coding phase. However, beyond the coverage of test cases indicated in this paper, there remain some opportunities for improvement in static analysis tools. The first opportunity is the considerable analysis required to prune false positives from the outputs; while all of the tools allow some means to minimize this effect, more improvement is required. Secondly, the implementation of the checkers is typically example driven. As a result, a checker implementation can be only as complete as its set of examples, which creates the potential for false negatives. Finally, even though a relatively large range of memory related errors, including memory leaks, are reported by static analysis tools, there is still a need to run dynamic analysis tools for things like memory leak detection [3]. It would be a great benefit if there were improved techniques for memory leak detection and other memory related errors in static analysis tools. This type of capability could save a lot of time, effort, and cost for software development organizations. Even the creation of an exhaustive test suite of memory related errors, with a comparison of the popular static and dynamic analysis tools' ability to detect all the different types of memory errors, would be a big step forward.
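As a hedged sketch of why leak detection is hard to fully automate statically (a hypothetical example, not drawn from the Motorola code base), consider a leak that occurs only on one error path after ownership of a buffer has crossed a function boundary; a dynamic tool observes the unreleased allocation at runtime, while a static tool needs whole-program path and ownership reasoning to report it:

    #include <stdlib.h>
    #include <string.h>

    /* The caller owns the returned buffer and must free it. */
    static char *duplicate(const char *src)
    {
        char *copy = malloc(strlen(src) + 1);
        if (copy != NULL) {
            strcpy(copy, src);
        }
        return copy;
    }

    int process(const char *input, int validate)
    {
        char *work = duplicate(input);
        if (work == NULL) {
            return -1;
        }
        if (validate && work[0] == '\0') {
            return -2;                 /* early return leaks 'work' on this path only */
        }
        /* ... use 'work' ... */
        free(work);
        return 0;
    }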
In one open source implementation of the https protocol, three high-severity errors were identified in the original code and 17 high-severity errors were identified in internally modified code. These security related errors were detected by running the code through the Klocwork Inforce tool with the security options enabled. This example demonstrates the need to use such a tool for third party software as well as for internally developed software.

Conclusion

In this paper, the Motorola experience and approach in bringing a security focus to the coding phase has been shared, especially the use of static analysis tools for security. External experts in the security field were engaged for training and process enrichment. In particular, the coding standards were enhanced with security rules. After implementing the security enhanced coding standards, supporting these new standards in a static analysis tool became a major focus area. A majority of project teams in Motorola were already using the Klocwork tools for quality, which was a major factor in our use of this particular tool. However, a good percentage of these security rules were not detected by the tool. The vendor agreed to work with Motorola to improve the detection of violations of the security rules in code, and the related work has been described in this paper. The results from a couple of benchmarking exercises are also presented. A test suite published by MIT on buffer overflow was used to identify and close gaps in the Klocwork Inforce tool. In another study reported in this paper, popular static analysis tools were evaluated based on their support for the security rules in the Motorola coding standards. Based on this experience with static analysis tools, several opportunities for improvement in the use of this technology were identified.

Recommendation

The paper thus far may read as a white paper for Klocwork. The value of this paper is in the approach used to make the software developed within our organizations better from both a quality and a security perspective. First of all, it is important for an organization to take responsibility for the security of its software instead of relying on external security mechanisms. Secondly, in response to a significant number of security vulnerabilities introduced in the coding phase, it is highly recommended to identify coding standards for the organization to follow. These coding standards can be reinforced through training and inspection. However, to optimize the return from these coding standards, they should be automated where possible. Our experience demonstrates that the commercial static analysis tools are lacking in a number of important security areas. It is absolutely necessary for people to own the security requirements for their static analysis tools and to work with the vendor to enhance the capabilities; the tools lag the known concepts behind secure practices. Finally, one has to combine automated methods with manual methods such as inspection to capture as many of the errors as possible.

References
[1] Kratkiewicz, K. J. (May 2005). Diagnostic Test Suite for Evaluating Buffer Overflow Detection Tools - the companion test suite for "Evaluating Static Analysis Tools for Detecting Buffer Overflows in C Code". Retrieved September 11, 2007 from http://www.ll.mit.edu/IST/pubs/KratkiewiczThesis.pdf
[2] Howard, M., and LeBlanc, D. (2003). Writing Secure Code. Redmond, Washington: Microsoft Press.
[3] Wikipedia (n.d.). Definition of Memory Leak. Retrieved September 11, 2007 from http://en.wikipedia.org/wiki/Memory_leak
[4] Software Diagnostics and Conformance Testing Division (July 2005). SAMATE - Software Assurance Metrics and Tool Evaluation. Retrieved September 11, 2007 from http://samate.nist.gov/index.php/Main_Page

Appendix A

C Secure Coding Standard: coverage of each rule, by test case number, in Klocwork 6.1, Klocwork 7.5, and Tools X, Y, and Z. An X indicates the coding standard is covered in the native code; EC indicates that the coding standard is covered partially or completely by an extensibility checker. The rules tested include: memory allocation check (1), "large" flagged (2), arrays (4), check return codes of library functions (5), reference to uninitialized variable (6), unsafe library functions (7), integer overflow (for addition and multiplication, the result should not exceed the operands) (9), buffer overflow (10), element references within array bounds (11), null termination of string buffers and string manipulation arrays (12), externally provided strings not to be used as format strings (13), %s for printf (14), %n substitution (15), special characters in filenames or variables (18), check variable lengths in unsafe functions (19), expressions as function call parameters (20), safe directories for file access, check permissions prior to accessing filenames, access control for sensitive variables, and user rights checked for file access.
Appendix B

Java Secure Coding Standard: coverage of each rule, by test case number, in Klocwork 6.1, Klocwork 7.5, and the other evaluated tools, using the same X/EC notation as Appendix A. The rules tested include: secure class loader, restrictive security policy, object initialization, private classes, methods, and variables (4), finalized classes and methods (5), class cloning (7), serialization (8), undeserializeable classes (9), static variables (10), inner classes and sensitive data (11), arrays and strings with sensitive data (13), random number generator (14), and class comparison by name (15).

Appendix C

C++ Secure Coding Standard: coverage of each rule, by test case number, in Klocwork 6.1, Klocwork 7.5, and the other evaluated tools, using the same X/EC notation as Appendix A. The rules tested include: object memory allocation check (1), I/O streams in C-style strings (2), C-style strings (3), conversion to C-style strings (4), throw/"new" operator (5), initialize types (6), array initialization (7), delete[] for arrays (8), deleting with void pointers and objects with children (9), pointer casting, C-style casting, and static_cast (10), non-primitive array manipulation and object slicing (12a,b), initialization of object fields, use of vectors for arrays (14), safe accessor methods (15), virtual destructors of base and polymorphic classes (17), public methods checking arguments (21), static member variables (22), pointers to temporary objects (23), exception handling (24), exceptions throwing objects and not pointers (26), catching exceptions (27), unhandled exceptions (28), and auto pointers (31).

Evaluation of Static Source Code Analyzers for Avionics Software Development

Redge Bartholomew
Rockwell Collins
rgbartho@rockwellcollins.com

Abstract

This paper describes an evaluation of static source code analyzers. The purpose of the evaluation was to determine their adequacy for use in developing real-time embedded software for aviation electronics, where the use of development tools and methods is controlled by a federal regulatory agency. It describes the motivation for the evaluation, the results, and conclusions.

1. Introduction

Business issues motivate avionics developers to accelerate software development using whatever tools and methods will reduce cycle time. At the same time, the FAA requires that all software development tools and methods comply with RTCA DO-178B, which requires disciplined and rigorous processes that can also be time consuming and expensive. Source code reviews and structural coverage testing, for example, are both required, and both typically involve considerable manual effort. An obvious solution within the faster-cheaper-better spiral of continuous development improvement is automation: perform the required analysis, testing, and documentation using tools that replace inconsistent and expensive human actions with consistent and (comparatively) cheap machine actions. A static source code analyzer is an example. Potentially, it could replace manual source code reviews and some of the structural coverage testing, enforce compliance with a project coding standard, and produce some of the required documentation. In addition, by eliminating a large number of latent errors earlier in the development cycle, it could significantly reduce downstream activities like unit, integration, and system test. The number of errors a static analyzer has found in open source software provides anecdotal support for this last possibility [8]. However, the use of a static analyzer for avionics development encounters an issue that standard desktop development typically does not.
The development process, tools, and qualification plans as described in the Plan for Software Aspects of Certification (RTCA DO-178B, paragraph 11.1), or its referenced plans, must be approved by the FAA's Designated Engineering Representative. To be cost effective, static analysis tools might have to be accurate enough, and cover a broad enough spectrum of errors, that the FAA would allow their use to replace manual analysis: if full manual analysis is still required, the amount of effort a tool eliminates may not be large enough to justify its acquisition and usage costs. In addition, some tools appeared to scale poorly, some appeared to have high false positive rates, some seemed to have ambiguous error annunciation markers, and some appeared to integrate poorly into common development environments. These issues could significantly reduce any benefit resulting from use.

This paper describes an internal evaluation of static source code analyzers. It had three objectives: to determine whether static source code analyzers have become cost effective for avionics software development; to determine whether they can be qualified to reduce software developers' verification workload; and to determine the conditions under which avionics software developers might use them. There was no effort to determine down-select candidates or a source-selection candidate, nor was there any effort to achieve statistically significant results. The evaluation provided input to a decision gate in advance of a proposed pilot project. In addition, it relied on software subject to publication restriction and on information subject to proprietary information exchange agreements. As a result, neither product names nor vendor names are identified, and information that could imply a product or vendor has been withheld.

2. Background and Scope

2.1 Effectiveness

One concern over the use of a static analyzer was whether it could reduce downstream development costs by reducing rework: whether tools can detect kinds and quantities of errors that manual code reviews typically do not, and that typically escape into downstream development and maintenance phases. Another concern was whether they could reduce the cost of compliance with verification standards by automating manual source code reviews, or parts of them. If a static analyzer could be qualified against specific classes of errors (e.g., stack usage, arithmetic overflow, resource contention, and so on) and shown accurate above some minimum threshold, then it might be possible to eliminate those error classes from the manual reviews. Instead, an artifact could be submitted demonstrating that the check for those error classes was performed automatically by a qualified tool. Error classes not included in the tool qualification would still be subject to manual review. A final concern was whether the use of an unqualified static analyzer as a supplement to a manual review would increase the number of downstream errors. If a static analyzer is effective against some kinds of errors but not others, e.g., catches pointer errors but not arithmetic overflows, this could be the case: code reviewers could assume static analyzers are equally effective against all classes of errors and minimize the effort they put into the manual analysis.

2.2 Scope

Time and resources did not allow for a determination of the average number of errors detected by manual source code review versus those detected by a static analyzer.
There is enough anecdotal evidence of errors escaping manual review into the test phase to suggest that a static analyzer will typically detect quantities of errors within a targeted error category that manual review will not. A comparison of effectiveness between manual and automated analysis, limited to the error categories within which a given tool was designed to operate, is a logical next step.

Currently, vendors advertise software security and software assurance capabilities. Security in the static analysis context appears to signify resistance to cyber attack and focuses on detection and elimination of errors that have frequently been exploited to deny or corrupt service (e.g., buffer overflow). Assurance appears to signify detection of the broad band of unintended latent errors whose detection in the real-time embedded context is the subject of source code walkthroughs and structural analysis testing. There appeared to be no error set that could distinguish assurance from security, nor did there appear to be separate static analysis products exclusive to assurance versus security. The focus of this evaluation was software assurance.

3. Method

The evaluation included 18 different static source code analyzers. Because of resource constraints, only 6 were evaluated in house. There are, however, published comparative evaluations, and these were used as a supplement. The criteria chosen for the internal evaluations are common to most of the published evaluations. To account for differences in standards across the different reviewers, results were normalized. Tools evaluated by more than a single source provided scale calibration points across the external evaluation sources [1-5], using the internal evaluations as the benchmark. Some vendors do not provide evaluation licenses. As a consequence, some of the in-house evaluations were the result of vendor-performed demonstrations, and some were the result of web-based demonstrations hosted on the vendor's web site. In these cases vendors provided some of the performance data, but some of it resulted from interpolating available results (e.g., if the tool provided buffer overflow detection for a standard data type, then, with confirmation from the vendor, it was assumed the tool provided buffer overflow detection for all standard data types).

4. Evaluation Criteria and Methodology

Based on input from developers, engineering managers, and program managers, the criteria used for the evaluation were analysis accuracy, remediation advice, false positive suppression, rule extension, user interface, and ease of integration with an Integrated Development Environment (IDE, e.g., Eclipse). Analysis accuracy consisted of the detection rate for true positives (correctly detected errors), true negatives (correctly detected absence of errors), false positives (incorrectly detected errors), and false negatives (incorrectly detected absence of errors). Remediation advice is the information the tool provides when detecting an error: information that allows the developer to better understand the error and how to eliminate it. False positive suppression is a measure of how easy it is to suppress redundant error, warning, and information messages, or the extent to which a tool allows suppression. Rule extension is a measure of the extent to which the tool allows the addition of error or conformance checks and how easily this is done.
The user interface is a measure of how easy it was to learn to use the tool and how easy it was to perform common tasks. Price was not included, primarily because vendor prices vary greatly depending on quantity, other purchased products, and marketing strategy.

The evaluation criteria were weighted. Potential users and their management felt analysis accuracy was the most important tool attribute, so its weight was arbitrarily set at twice that of both remediation advice and false positive suppression, and three times that of IDE integration and user interface. Based on developer input, the importance of rule extensibility was arbitrarily set at half that of IDE integration and user interface. Internal evaluations used a 3-point scale, where 3 was best and 0 indicated the absence of a capability. The published comparative evaluations each used a different scoring mechanism, so their results were normalized.

To the extent vendors provided evaluation licenses, analysis accuracy was measured using a small subset of the NIST SAMATE Reference Dataset (SRD) test cases [10]. Additional test cases were created to fill gaps (e.g., tests for infinite loops). Evaluations were limited to 6 general error categories: looping, numeric, pointer, range, resource management (e.g., initialization), and type. In all, 90 test cases were used. Both the downloaded test cases and those written for this evaluation were written in C to run on a Wintel platform against the MinGW 3.1.0 gcc compiler. Some vendors provided in-house demonstrations but were reluctant to analyze small code segments, preferring to demonstrate effectiveness against large systems or subsystems (i.e., > 500 KSLOC). In those cases the evaluation team interviewed technical staff from the tool supplier to determine capability against specific error classes. The group at MIT and MIT's Lincoln Laboratory evaluated 5 tools against a set of test cases that were subsequently submitted to NIST for inclusion in the SRD. The DRDC evaluated tools against test cases that also were submitted to NIST for inclusion in the SRD. In the case of the other published evaluations, the basis for the accuracy evaluation is unknown.

5. Results

The results of the evaluation are these: for C and C++, some static analyzers are cost effective; none of those evaluated could be qualified as a replacement for manual activities like source code reviews (and may even be detrimental as a supplement to them) or for structural coverage testing; but if used prudently, some can reduce the cost of implementation (code and test). In general, all evaluated tools displayed significant deficiencies in detecting source code errors in some of the error categories. Determining false negative thresholds of acceptability for the different error categories, and then determining each tool's areas of acceptable strength and unacceptable weakness, is a logical next step, but was outside the scope of this effort.

On the basis of performance against the criteria, tools fell into two tiers. Only two of the evaluated tools had good scores for analysis accuracy, user interface, remediation advice, and false positive suppression. In both cases rule extension/addition required separate products. One performed poorly against arithmetic, type transformation, and loop errors. Both scaled well, from very small segments of code to very large systems. All things considered (e.g., installation, learning curve), both are most effectively used for system or subsystem error checking within the context of a daily/nightly automatic build process, as opposed to evaluating small daily code increments in isolation. Both tightly coupled error detection with change tracking.
The change tracking feature could have a significant near-term impact on productivity (a nontrivial learning curve) if integrated into an existing formal development process.

In the second tier, several tools had scores that ranged from very good to poor for analysis accuracy, remediation advice, rule extension, and false positive suppression. Several in this tier had good scores for error detection but poor scores for false positive rate and a cumbersome false positive suppression capability. Many in this tier did not scale well (up or down), e.g., some of the evaluated versions crashed while analyzing large systems. Some had adequate accuracy and remediation advice once the large number of false positive messages was suppressed. Error analysis coverage is narrow compared to the two first-tier tools; e.g., they will not detect such C errors as:

    char a[15];
    strcpy(a, "0123456789abcdef");

or

    int i = 2147483647;
    i = i * 65536;

or

    int i = 0;
    while (i < 10) {
        i = i - 1;
    }

Nearly all vendors of error detection tools also provide conformance checking capabilities, usually via separate licensing. Those that provided licenses for this evaluation also provided rule-set extension capabilities. Although conformance checking comparisons were not part of this evaluation, in a brief review most tools had adequate capability. Some had significant advantages over others, e.g., an out-of-the-box rule set (MISRA C rule set already installed) and ease of extension and modification.

Finally, static analyzers that perform interprocedural analysis provide capability not addressed by manual code reviews, which typically only address individual code units (e.g., single compilation units). The ability to detect errors resulting from the impact of cascading function and procedure calls is not realistically available to manual analysis but is clearly advantageous, identifying errors that are usually detected during system test or operational test. It was convenient to run this kind of static analyzer as part of the automatic system build.

6. Conclusions

Static analysis can cost-effectively reduce rework (detecting defects before they escape into downstream development phases or into delivered products) but currently cannot replace manual source code reviews. In general, the tools need better error detection accuracy and broader coverage across error classes.

6.1 Cost-Effective Reduction of Rework

Some static analyzers, those with broad coverage and high accuracy, are simple enough to use and accurate enough that downstream cost avoidance exceeds the cost of use (license cost, cost of false positive resolution and suppression, etc.). Tools in this category detect some source code errors faster and more effectively than manual reviews. Development teams can use them informally, on a daily/nightly basis, throughout the implementation cycle (code and development test) when integrated into an automated build process, reducing cost by reducing the quantity of errors that escape into development testing and by reducing the number of iterations through each test phase (e.g., unit, integration, functional, performance). For static analyzers with more limited coverage and lower accuracy, internal demonstration of cost effectiveness is difficult.
No single static analyzer was effective against very many of the error classes identified by the Common Weakness Enumeration [7]. In addition, within some error classes where detection capability existed, many tools demonstrated a high false negative rate against the limited number of test cases, and many with a low false negative rate had a high false positive rate. Distinguishing between true and false detections and suppressing the false positives was a significant effort. It was not clear that the less than optimal reduction in debugging and rework was enough to offset the increase in effort from false-positive analysis and suppression.

6.2 Reducing Formal Compliance Cost - Automated vs. Manual Analysis

Of the 18 static analysis tools evaluated, none was designed to detect all the kinds of errors manual analysis detects. Therefore, none could replace manual analysis in a development environment regulated by an industry standard like DO-178B. Automation of a manual process like the code review would require qualification of the static analyzer: documented demonstration of compliance with requirements within the target operating environment, to confirm the tool is at least as effective as the manual process it would replace (e.g., RTCA DO-178B, paragraph 12.2.2). Currently, this would be difficult given the evolving status of the existing government and industry resources and the absence of performance requirements. If an industry standard defined error categories (e.g., the Common Weakness Enumeration), defined tests by which static analyzers could be evaluated against those categories (e.g., the SAMATE Reference Dataset), and defined performance requirements against those tests (e.g., less than a 1% false negative rate), compliant static analyzers might be able to eliminate manual review within the targeted categories. Manual reviews would still be required, but detection of qualified error categories could be eliminated from them.

It is unclear whether the current safety-critical market is large enough to motivate tool developers to invest in qualification. Over time, however, resolving the cost of tool qualification could follow the same path as code coverage tools, where the tool developer now sells the deliverable qualification package or sells a qualification kit from which the user produces the qualification package. It is also possible, if individual tools achieve broad enough error coverage and high enough accuracy, that a user (or possibly a user consortium) may be motivated to qualify one. Qualification by a government-authorized lab could also become cost effective for either the tool developer or the tool user.

6.3 Potential for Increasing Rework Cost

There is resistance to using existing static analyzers as a supplement to manual source code reviews [6]. The National Academy of Sciences established the Committee on Certifiably Dependable Software Systems to determine the current state of certification in the dependable systems domain, with the goal of recommending areas for improvement. Its interim report contained a caution against tools that automate software development in the safety-critical context: "…processes such as those recommended in DO-178B have collateral impact, because even though they fail to address many important aspects of a critical development, they force attention to detail and self-reflection on the part of engineers, which results in the discovery and elimination of flaws beyond the purview of the process itself.
Increasing automation and over-reliance on tools may actually jeopardize such collateral impact." [6] Given that all evaluated tools exhibited major failures against some error categories, reducing the effort that goes into the manual review would lead to an increase in errors found during the test phase and an increase in errors found in delivered products. For that reason, many of the FAA's Designated Engineering Representatives are reluctant to approve the use of a static analyzer even as a supplement to manual analysis. Until tools exhibit broader coverage and greater accuracy, their use for any aspect of the formal source code review process (RTCA DO-178B, paragraph 6.3.4.f) is probably premature.

6.4 Automating Conformance Checking

If conformance checking is the primary concern, and error detection a secondary issue, any of the evaluated tools would perform adequately without additional functional or performance capability. Some are easier to use out of the box than others, but all significantly reduce the effort of achieving conformance with a coding standard.

6.5 Automating Non-Critical Error Detection

In environments where cost is the driving factor and there are no tool qualification issues (e.g., there is no false negative rate requirement), any of the low-end tools could be used without additional functional or performance capability. They can be simple to acquire and simple to install, with an intuitive interface. If a tool is open source or inexpensive (e.g., less than a $500 acquisition fee per user with a 20% per year recurring fee) and easy to use, demonstrating that it catches some of the common implementation errors that typically escape into the integration and test phases (e.g., uninitialized stack variables, buffer overflows) may be enough to justify its usage cost (false positive suppression, learning curve, modification of the standard development process, etc.). In a development environment where there is no previous experience with static analysis, using a low-end tool to demonstrate the cost effectiveness of the technology could be a means of subsequently justifying the upgrade to a more expensive and more capable high-end tool.

7. References

[1] Zitser, Lippmann, Leek, "Testing Static Analysis Tools Using Exploitable Buffer Overflows From Open Source Code", ACM Foundations of Software Engineering 12, 2004, available at http://www.ll.mit.edu/IST/pubs/04_TestingStatic_Zitser.pdf
[2] Kratkiewicz, Lippmann, "A Taxonomy of Buffer Overflows for Evaluating Static and Dynamic Software Testing Tools", Proceedings of Workshop on Software Security Assurance Tools, Techniques, and Metrics, National Institute of Standards and Technology, February 2006, pp. 44-51
[3] Michaud, et al., "Verification Tools for Software Security Bugs", Proceedings of the Static Analysis Summit, National Institute of Standards and Technology, July 2006, available at http://samate.nist.gov/docs/
[4] Newsham, Chess, "ABM: A Prototype for Benchmarking Source Code Analyzers", Proceedings of Workshop on Software Security Assurance Tools, Techniques, and Metrics, National Institute of Standards and Technology, February 2006, pp. 52-59
[5] Forristal, "Review: Source-Code Assessment Tools Kill Bugs Dead", Secure Enterprise, December 1, 2005, http://www.ouncelabs.com/secure_enterprise.html
[6] Committee on Certifiably Dependable Software Systems, Software Certification and Dependability, The National Academies Press, 2004, pp. 11-12
7. References

[1] Zitser, Lippmann, Leek, "Testing Static Analysis Tools Using Exploitable Buffer Overflows From Open Source Code", ACM Foundations of Software Engineering 12, 2004, available at http://www.ll.mit.edu/IST/pubs/04_TestingStatic_Zitser.pdf
[2] Kratkiewicz, Lippmann, "A Taxonomy of Buffer Overflows for Evaluating Static and Dynamic Software Testing Tools", Proceedings of the Workshop on Software Security Assurance Tools, Techniques, and Metrics, National Institute of Standards and Technology, February 2006, pp. 44-51
[3] Michaud et al., "Verification Tools for Software Security Bugs", Proceedings of the Static Analysis Summit, National Institute of Standards and Technology, July 2006, available at http://samate.nist.gov/docs/
[4] Newsham, Chess, "ABM: A Prototype for Benchmarking Source Code Analyzers", Proceedings of the Workshop on Software Security Assurance Tools, Techniques, and Metrics, National Institute of Standards and Technology, February 2006, pp. 52-59
[5] Forristal, "Review: Source-Code Assessment Tools Kill Bugs Dead", Secure Enterprise, December 1, 2005, http://www.ouncelabs.com/secure_enterprise.html
[6] Committee on Certifiably Dependable Software Systems, Software Certification and Dependability, The National Academies Press, 2004, pp. 11-12
[7] Common Weakness Enumeration, http://cve.mitre.org/cwe/index.html#graphical
[8] Chelf, Measuring Software Quality: A Study of Open Source Software, posted March 2006 at http://www.coverity.com/library/pdf/open_source_quality_report.pdf
[9] Software Considerations in Airborne Systems and Equipment Certification, RTCA DO-178B, December 1, 1992
[10] SAMATE Reference Dataset, National Institute of Standards and Technology, http://samate.nist.gov/SRD/

Common Weakness Enumeration (CWE) Status Update

Robert A. Martin, MITRE Corporation, 202 Burlington Road, Bedford, MA 01730, 1-781-271-3001, ramartin@mitre.org
Sean Barnum, Cigital, Inc., 21351 Ridgetop Circle, Suite 400, Sterling, VA 20166, 1-703-404-5762, sbarnum@cigital.com

ABSTRACT
This paper is a status update on the Common Weakness Enumeration (CWE) initiative [1], one of the efforts focused on improving the utility and effectiveness of code-based security assessment technology. As hoped, the CWE initiative has helped to dramatically accelerate the use of tool-based assurance arguments in reviewing software systems for security issues and has invigorated the investigation of code implementation, design, and architecture issues with automation.

1. INTRODUCTION
As the threat from attacks against organizations shifts from the network, the operating system, and large institutional applications to individual applications of all types, the need for assurance that each of the software products we acquire or develop is free of known types of security weaknesses has increased. High-quality tools and services for finding security weaknesses in code are maturing but still address only a portion of the suspect areas. The question of which tool or service is appropriate or better for a particular job is hard to answer given the lack of structure and definition in the software product assessment industry. As reported last year [2], several ongoing efforts are working to resolve some of these shortcomings, including the Department of Homeland Security (DHS) National Cyber Security Division (NCSD) sponsored Software Assurance Metrics and Tool Evaluation (SAMATE) project [3], led by the National Institute of Standards and Technology (NIST), and the Object Management Group (OMG) Software Assurance (SwA) Special Interest Group (SIG) [4]. Since that time, related work has been started by the Other Working Group on Vulnerabilities (OWG-V) within the ISO/IEC Joint Technical Committee on Information Technology (JTC1) Subcommittee on Programming Languages (SC22) [5], along with new efforts at the SANS Institute to develop a national Secure Programming Skills Assessment (SPSA) examination [6] to help identify programmers knowledgeable in avoiding and correcting common software programming weaknesses, among others. While all of these efforts continue to proceed within their stated goals and envisioned contributions, they all depend on the existence of a common description of the underlying security weaknesses that can lead to exploitable vulnerabilities in software. Without such a common description, these efforts cannot move forward in a meaningful fashion or be aligned and integrated with each other to provide strategic value.
As stated last year, MITRE, with support from Cigital, Inc., is leading a large community of partners from industry, academia, and government to develop, review, use, and support a common weakness dictionary/encyclopedia that can be used by those looking for weaknesses in code, design, or architecture; by those trying to develop secure applications; and by those teaching and training software developers about the code, design, or architecture weaknesses they should avoid because of the security problems such weaknesses can cause for applications, systems, and networks. This paper outlines the various accomplishments, avenues of investigation, and new activities being pursued within the CWE initiative.

2. COMMUNITY
Over the last year, six additional organizations have agreed to contribute their intellectual property to the CWE initiative. Under non-disclosure agreements with MITRE that allow the merged collection of their individual contributions to be publicly shared in the CWE List, AppSIC, Grammatech, Palamida, Security Innovation, SofCheck, and SureLogic have joined the other 13 organizations that have formally agreed to contribute. In addition to these sources, the CWE Community [7], now numbering 46 organizations, is also able to leverage the work, ideas, and contributions of researchers at Apple, Aspect Security, Booz Allen Hamilton, CERIAS/Purdue University, Codescan Labs, James Madison University, McAfee/Foundstone, Object Management Group, PolySpace Technologies, SANS Institute, and Semantic Designs, as well as any other interested parties that wish to come forward and contribute. Over the next year we anticipate the formation of a formal CWE Editorial Board to help manage the evolution of the CWE content.

3. UPDATES
Four drafts of CWE were posted over the last year. With Drafts 4 and 5, CWE reached 550 and 599 items, respectively. Draft 4 introduced the CWE ID field, and Draft 5 introduced predictable addresses for each CWE based on its CWE ID. During this timeframe the CWE web site expanded to include a "News" section, an "Upcoming Events" section, and a "Status Report" section. Draft 5 also added details on node relations and alternate terms, and with Draft 5 the CWE List was provided in several formats on the web site. Eventually this will be expanded to provide style-sheet-driven views of the same underlying CWE XML content.

Draft 6 of CWE introduced a new "Deprecated" category so that duplicate CWEs can be removed by reassigning them to it; the category has a CWE ID but, like the items it holds, is not counted in CWE totals. Draft 6 therefore has 627 CWE IDs assigned, but with two older CWEs moved to the deprecated category and the category itself excluded from the totals, it contains 624 unique weakness concepts, including structuring concepts. The first formal draft of a schema for the core information that each CWE will carry was also finalized with Draft 6, covering the five groupings of information about each CWE: "Identification", "Descriptive", "Scoping & Delimiting", "Prescriptive", and "Enhancing" (a rough sketch of one possible in-memory arrangement of these groupings follows at the end of this section).

Draft 7 of CWE is the first recipient of material from the CWE Scrub, described in section 5 of this paper. The main changes to the size of CWE were the insertion of 7 new nodes to support grouping portions of CWE into additional Views; in addition, one CWE was deprecated. Further details of the changes in Draft 7 are included in the Scrubbing section of this paper.
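To make the five schema groupings easier to picture, the following C sketch shows one possible way a tool that consumes the CWE List might arrange them internally. Only the grouping names come from the schema description above; every field inside the groupings, and the example entry, is a hypothetical placeholder invented for illustration and is not part of the published CWE schema.

    /* Hypothetical in-memory arrangement of a CWE entry, organized by the
     * five schema groupings named in the text.  All field names are invented. */
    struct cwe_identification { int cwe_id; const char *name; };
    struct cwe_descriptive    { const char *summary; const char *extended_text; };
    struct cwe_scoping        { const char *applicable_languages; const char *likelihood; };
    struct cwe_prescriptive   { const char *detection_notes; const char *mitigation_notes; };
    struct cwe_enhancing      { const char *references; const char *taxonomy_mappings; };

    struct cwe_entry {
        struct cwe_identification identification;  /* "Identification"       */
        struct cwe_descriptive    descriptive;     /* "Descriptive"          */
        struct cwe_scoping        scoping;         /* "Scoping & Delimiting" */
        struct cwe_prescriptive   prescriptive;    /* "Prescriptive"         */
        struct cwe_enhancing      enhancing;       /* "Enhancing"            */
    };

    /* Example instance; the content strings are purely illustrative. */
    static const struct cwe_entry example = {
        { 121, "Stack-based Buffer Overflow" },
        { "Example summary text", "Example extended text" },
        { "C, C++", "High" },
        { "Example detection notes", "Example mitigation notes" },
        { "Example reference list", "Example taxonomy mappings" }
    };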
4. VULNERABILITY THEORY
In parallel with the creation of CWE content, and as part of the Scrub activities, there has been considerable progress in documenting the mechanics of vulnerabilities and how weaknesses, attacks, and environmental conditions combine to create exploitable vulnerabilities in software systems. Evolving the initial work on this topic, covered in the Preliminary List of Vulnerability Examples for Researchers (PLOVER) effort [8] in 2005, this work now includes the results of working with the great variety of issues covered in CWE. In July 2007, the "Introduction to Vulnerability Theory" [9] was published along with a companion document, "Structured CWE Descriptions" [9]. The latter, using termi