CompCert: Practical Experience on Integrating and Qualifying a Formally Verified Optimizing Compiler

Daniel Kästner3, Jörg Barrho4, Ulrich Wünsche4, Marc Schlickling4, Bernhard Schommer5, Michael Schmidt3, Christian Ferdinand3, Xavier Leroy1, Sandrine Blazy2

1: Inria Paris, 2 rue Simone Iff, 75589 Paris, France
2: University of Rennes 1 - IRISA, campus de Beaulieu, 35041 Rennes, France
3: AbsInt Angewandte Informatik GmbH, Science Park 1, D-66123 Saarbrücken, Germany
4: MTU Friedrichshafen GmbH, Maybachplatz 1, D-88048 Friedrichshafen, Germany
5: Saarland University, Saarland Informatics Campus, Saarbrücken, Germany

Abstract

CompCert is the first commercially available optimizing compiler that is formally verified, using machine-assisted mathematical proofs, to be exempt from miscompilation. The executable code it produces is proved to behave exactly as specified by the semantics of the source C program. This article gives an overview of the use of CompCert to gain certification credits for a highly safety-critical industry application, certified according to IEC 60880 [7]. We briefly introduce the target application, illustrate the process of changing the existing compiler infrastructure to CompCert, and discuss performance characteristics. The main part focuses on the tool qualification strategy, in particular on how to take advantage of the formal correctness proof in the certification process.

1 Introduction

A compiler translates the source code written in a given programming language into executable object code of the target processor. Due to the complexity of the code generation and optimization process, compilers may contain bugs. In fact, studies like [23, 5] and [25] have found numerous bugs in all investigated open-source and commercial compilers, including compiler crashes and miscompilation issues. Miscompilation means that the compiler silently generates incorrect machine code from a correct source program.

In safety-critical systems miscompilation is a serious problem, since it can cause erroneous or erratic behavior, including memory corruption and program crashes, which may manifest sporadically and is often hard to identify and track down. Furthermore, many verification activities are performed at the architecture, model, or source code level, but the properties demonstrated there may no longer be satisfied at the executable code level when miscompilation happens. This is true not only for source code review but also for formal, tool-assisted verification methods such as static analyzers, deductive verifiers, and model checkers. In consequence, many safety standards require additional, difficult and costly verification activities to show that the requirements already established at higher levels are also satisfied at the executable object code level.

Since 2015 the CompCert compiler has been commercially available. CompCert is formally verified, using machine-assisted mathematical proofs, to be exempt from miscompilation issues. In other words, the executable code it produces is proved to behave exactly as specified by the semantics of the source C program. CompCert is the first formally verified compiler on the market; it provides an unprecedented level of confidence in the correctness of the compilation process.

In general, usage of CompCert offers multiple benefits. First, the cost of finding and fixing compiler bugs and shipping the patch to customers can be avoided.
Second, the testing effort required to ascertain software properties at the binary executable level can be reduced, since the correctness proof of CompCert C guarantees that all safety properties verified on the source code automatically hold for the generated executable as well. Whereas in the past compiler optimizations were often completely switched off for highly critical applications (e.g., according to DO-178B Level A), using optimized code now becomes feasible.

In [19] we have given an overview of the design and the proof concept of CompCert and have presented an evaluation of its performance on the well-known SPEC benchmarks. In this article we report on practical experience with replacing a legacy compiler by CompCert for a highly critical control system from MTU in the nuclear power domain.
The article is structured as follows: in Sec. 2 we give an overview of the MTU application for which CompCert is used; Sec. 3 describes the relevant considerations for applying a traditional non-verified compiler. In Sec. 4 we briefly summarize the CompCert design and its proof concept. Sec. 5 describes the integration of CompCert into the development process and the performance gains observed. The tool qualification strategy is detailed in Sec. 6; Sec. 7 concludes.

2 The Application

MTU develops diesel engines that are deployed in civil nuclear power plants as drivers for emergency generators producing electrical power. Such engines are available on the market as either common-rail or fuel-rack controlled engines, with capabilities to produce up to 7 MW of electrical power per unit. In case of failures in the electrical grid of a nuclear power plant, one or more of these units are requested to provide power to support the capability to control the nuclear plant core and cooling systems. Obviously, this functional contribution may be mission-critical to the overall plant.

The engines are controlled by an MTU-developed digital engine control unit (ECU). This ECU performs only safety functions and in particular maintains the safe state requested by the plant operator: it ensures that the engine stands still if required, and that the engine is controlled to maintain the demanded engine speed if required.

Software decomposition
The software of the ECU runs on top of a handwritten runtime environment, written in assembler, specific to the controller in use. The application consists of handwritten C code and generated C code derived from SCADE models.

The handwritten C code implements a scheduler, a hardware abstraction layer, and self-supervision capabilities. The hardware abstraction layer polls physical sensor inputs, controls hardware actuators, and provides hardware-related self-supervision mechanisms which must not interfere with the two former objectives. It operates in fixed timing intervals, which must be small enough to acquire all relevant events and to satisfy the sampling theorems of sensor acquisition.

The scheduler provides safe data and control flow interfacing between the concurrent hardware access thread and the main control loop. Such interfacing limits the amount of required race-condition considerations and allows the timing constraints of the threads to be maintained safely. Based on safe over-approximations of timing envelopes it is possible to prove that all scheduling constraints are always maintained.

The SCADE model provides the engine controller algorithms. The monolithic model strictly follows the synchronous paradigm by separating input acquisition, processing, and output generation. The entire model execution is provided in SCADE, which is a prerequisite to making further statements on model integrity.

Development constraints
Software and development process comply with the international standards IEC 60880 [7] and IEC 61508:2010, part 3 (SCL3 for software) [8]. C was chosen as the programming language because of the abundant availability of translators for the targeted PowerPC architecture. Code generators from model-driven approaches to C are well established, and the SCADE generator is validated to translate correctly to a defined language subset.

C subset
All C code is produced in a subset of ISO/IEC 9899:1999 [9], whose expressiveness is sufficient for all of the outlined application requirements.
This version of the standard is considered so widely used that the standard and its deficiencies are well understood and compilers are more likely to fully comply.

Emphasis is put on the objectives to enhance robustness, to provide exactly one method to solve a problem, and to avoid potentially error-prone constructs. The MISRA C:2004 [22] standard is a good starting point for choosing such a language subset. In addition, continuous research on actual and potential coding defects has been taken into account. Lastly, the subset is shaped in a structured process by the evolving development culture among users and testers of the application in question.

With each programming project an assessment of perceived risks regarding frequency, potential consequences, and chances of detection is carried out. During development, continuous discussions are encouraged to support risk consciousness. All of these risk considerations are condensed into a set of in-house coding guidelines which also reflect the current project team's language proficiency.

Data types are basically restricted to integer arithmetic with as few type conversions as possible. Thus compiler behavior is as explicit as possible, mending some of the inherent type unsafety of C. Enums, unions, and bit fields are not part of the language subset.

The language subset is also designed to be well covered by automatic checking tools. The sound static runtime error analyzer Astrée [19, 16] includes a coding guideline checker, called RuleChecker, which is suitable for the chosen subset. For the defined rule set it provides a coding guideline coverage of more than 85%. The remaining 15% are inevitably attributed to objectives requiring human judgment, such as avoiding tricky programming, choosing understandable identifier names, and providing helpful comments.
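To make the flavor of such a subset concrete, the following fragment is an illustrative example written for this article, not code from the MTU application: fixed-width integer types, explicit conversions, and integer constants instead of enums reflect the restrictions described above.

    #include <stdint.h>

    /* States encoded as integer constants rather than an enum,
       since enums are excluded from the subset. */
    #define ENGINE_STOPPED 0u
    #define ENGINE_RUNNING 1u

    /* Saturating conversion of a demanded speed: integer
       arithmetic only, with every narrowing conversion explicit. */
    static uint16_t clamp_speed(uint32_t requested, uint16_t max_rpm)
    {
        uint16_t result;
        if (requested > (uint32_t)max_rpm) {
            result = max_rpm;
        } else {
            result = (uint16_t)requested;
        }
        return result;
    }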
3 The Past: Using a Non-Verified Compiler

Compiling source code which becomes part of safety software in production use is inherently flagged as a critical task. For such critical tasks a tool must be qualified as suitable by fulfilling a number of criteria defined by the user. MTU only uses critical tools in safety applications if the tool has been developed within a structured process, provides sufficient evidence for reliable operation, and has a positive record of user experience. MTU's tool qualification strategy is depicted in Fig. 2.

[Figure 2: MTU tool qualification strategy. Tool qualification rests on three pillars: structured development, user experience, and validation/verification.]

Historically MTU has used a traditional, commercially available C compiler well proven in use. Use of this compiler requires some maintenance effort due to the sporadic appearance of new bugs: each of these bugs requires evaluation and eventually code changes, as well as changes to the code review checklists for fully standard-compliant source code.

When such a proven-in-use compiler is removed from standard supplier support there are two options. The supplier may offer the service of checking whether bugs found in later compiler versions already existed in the version used; however, the supplier may charge substantial fees for such a service. Alternatively, the user may decide to qualify a newer compiler version with a commercial validation suite, which also induces substantial effort and external costs. Neither of these alternatives is satisfactory.
4 The CompCert Compiler

In the following we give a brief overview of the design and proof concept of CompCert; more details can be found in [19]. Fig. 1 shows the CompCert-based workflow.

[Figure 1: CompCert Workflow]

The input to the compilation process is a set of C source and header files. CompCert itself focuses on the task of compilation and includes neither preprocessor, assembler, nor linker; it therefore has to be used in combination with a legacy compiler tool chain. Since preprocessing, assembling and linking are well-established stages, there are no particular requirements on that tool chain. While early versions of CompCert were limited to single-file inputs, CompCert now also supports separate compilation [14]. It reads the set of preprocessed C files emitted by the legacy preprocessor, performs a series of code generation and optimization steps, and emits a set of assembly files enhanced by debug information.

CompCert generates DWARF2 debugging information for functions and variables, including information about their type, size, alignment and location. This also covers local variables, so that the values of all variables can be inspected in a debugger during program execution. To this end CompCert introduces a dedicated pass which computes the live ranges of local variables and their locations throughout the live range.

The generated assembly code can contain formal CompCert annotations, which are inserted at the C code level and carried throughout the code generation process. This way, traceability information or semantic information to be passed to other tools can be transported to the machine code level. Since the annotations are fully covered by the CompCert proof, the information is reliable and provides proven links between the machine code level and the source code level.

After assembling and linking by the legacy tool chain the final executable code is produced. To increase confidence in the assembling and linking stages, CompCert provides a tool for translation validation, called Valex, which performs equivalence checks between assembly and executable code (cf. Sec. 4.4).
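As an illustration of the annotation mechanism mentioned above, the sketch below assumes CompCert's __builtin_annot intrinsic, which attaches a free-form text, possibly referring to C expressions via %1, %2, etc., to a program point; the annotation text itself is a made-up example.

    /* Minimal sketch of a source-level CompCert annotation
       (assuming the __builtin_annot intrinsic). The text and the
       reference to x are carried through all compilation passes
       into the generated assembly, covered by the proof. */
    int filter_step(int x)
    {
        __builtin_annot("entering filter_step, x = %1", x);
        return x / 2;
    }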
4.1 Design Overview

CompCert is structured as a pipeline of 20 compilation passes that bridge the gap between C source files and object code, going through 11 intermediate languages. The passes can be grouped into four successive phases:

Parsing
Phase 1 performs preprocessing (using an off-the-shelf preprocessor such as that of GCC), tokenization, and parsing into an ambiguous abstract syntax tree (AST), followed by type-checking and scope resolution, obtaining a precise, unambiguous AST and producing error and warning messages as appropriate. The LR(1) parser is automatically generated from the grammar of the C language by the Menhir parser generator, along with a Coq proof of correctness of the parser [11].

C front-end compiler
The second phase first re-checks the types inferred for expressions, then determines an evaluation order among the several permitted by the C standard. Implicit type conversions, operator overloading, address computations, and other type-dependent behaviors are made explicit; loops are simplified. The front-end phase outputs Cminor code. Cminor is a simple, untyped intermediate language featuring both structured (if/else, loops) and unstructured control (goto).

Back-end compiler
This third phase comprises 12 of the passes of CompCert, including all optimizations and most dependencies on the target architecture. The most important optimization performed is register allocation, which uses the sophisticated Iterated Register Coalescing algorithm [6]. Other optimizations include function inlining, instruction selection, constant propagation, common subexpression elimination (CSE), and redundancy elimination. These optimizations implement several strategies to eliminate computations that are useless or redundant, or to turn them into equivalent but cheaper instruction sequences. Loop optimizations and instruction scheduling are not implemented yet.

Assembling
The final phase of CompCert takes the AST for assembly language produced by the back-end, prints it in concrete assembly syntax, adds DWARF debugging information coming from the parser, and calls into an off-the-shelf assembler and linker to produce object files and executable files. To improve confidence, CompCert provides an independent tool, called Valex (cf. Sec. 6), that re-checks the ELF executable file produced by the linker against the assembly language AST produced by the back-end.

4.2 The CompCert Proof

The CompCert front-end and back-end compilation passes are all formally proved to be free of miscompilation errors; as a consequence, so is their composition. The property that is formally verified is semantic preservation between the input code and output code of every pass. To state this property with mathematical precision, we give formal semantics for every source, intermediate and target language, from C to assembly. These semantics associate to each program the set of all its possible behaviors. Behaviors indicate whether the program terminates (normally by exiting, or abnormally by causing a runtime error such as dereferencing the null pointer) or runs forever. Behaviors also contain a trace of all observable input/output actions performed by the program, such as system calls and accesses to "volatile" memory areas that could correspond to a memory-mapped I/O device.

To a first approximation, a compiler preserves semantics if the generated code has exactly the same set of observable behaviors as the source code (same termination properties, same I/O actions). This first approximation fails to account for two important degrees of freedom left to the compiler. First, the source program can have several possible behaviors: this is the case for C, which permits several evaluation orders for expressions. A compiler is allowed to reduce this non-determinism by picking one specific evaluation order. Second, a C compiler can "optimize away" runtime errors present in the source code, replacing them by any behavior of its choice. (This is the essence of the notion of "undefined behavior" in the ISO C standards.) As an example, consider an out-of-bounds array access:

    int main(void)
    {
        int t[2];
        t[2] = 1; // out of bounds
        return 0;
    }

This is undefined behavior according to ISO C, and a runtime error according to the formal semantics of CompCert C. The generated assembly code does not check array bounds and therefore writes 1 into a stack location. This location can be padding, in which case the compiled program terminates normally, or it can contain the return address of "main", smashing the stack and causing execution to continue at PC 1, with unpredictable effects.
Finally, an optimizing compiler like CompCert can notice that the assignment to t[2] is useless (the array t is not used afterwards) and remove it from the generated code, causing the compiled program to terminate normally.

To address the two degrees of flexibility mentioned above, CompCert's formal verification uses the following definition of semantic preservation, viewed as a refinement over observable behaviors:

Definition 1 (Semantic preservation). If the compiler produces compiled code C from source code S, without reporting compile-time errors, then every observable behavior of C is either identical to an allowed behavior of S, or improves over such an allowed behavior of S by replacing undefined behaviors with more defined behaviors.

The semantic preservation property is a corollary of a stronger property, called a simulation diagram, that relates the transitions that C can make with those that S can make. First, the simulation diagrams are proved independently, one for each pass of the front-end and back-end compilers. Then, the diagrams are composed together, establishing semantic preservation for the whole compiler. The proofs are very large, owing to the many passes and the many cases to be considered, too large to be carried out using pencil and paper. We therefore use machine assistance in the form of the Coq proof assistant. Coq gives us the means to write precise, unambiguous specifications; to conduct proofs in interaction with the tool; and to automatically re-check the proofs for soundness and completeness. We thereby achieve very high levels of confidence in the proof. At 100,000 lines of Coq and 6 person-years of effort, CompCert's proof is among the largest ever performed with a proof assistant.
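Definition 1 can be rendered schematically as a refinement over behaviors. Writing S ⇓ B for "program S admits observable behavior B", a sketch of the property reads as follows; the notation is ours, and the actual Coq statement in the CompCert sources is more detailed:

    \forall B.\; C \Downarrow B \;\Longrightarrow\;
      \exists B'.\; S \Downarrow B' \;\land\;
      \bigl( B = B' \;\lor\; B \text{ improves } B' \bigr)

Here "B improves B'" means that B replaces undefined behaviors of B' with more defined behaviors, as in Definition 1.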
4.3 Proving the Absence of Runtime Errors

In safety-critical systems, the use of dynamic memory allocation and recursion is typically forbidden or permitted only in limited ways. This simplifies the task of static analysis to the point that, for safety-critical embedded systems, it is possible to formally prove the absence of runtime errors, or to report all potential runtime errors which still exist in the program. Such analyzers are based on the theory of abstract interpretation [4], a mathematically rigorous formalism providing a semantics-based methodology for static program analysis. Abstract interpretation supports formal correctness proofs: it can be proved that an analysis will terminate and that it is sound, i.e., that it computes an over-approximation of the concrete semantics. If no potential error is signaled, definitely no runtime error can occur: there are no false negatives. If a potential error is reported, the analyzer cannot exclude that there is a concrete program execution triggering the error. If there is no such execution, this is a false alarm (false positive). This imprecision is on the safe side: it can never happen that there is a runtime error which is not reported.

One example of a sound static runtime error analyzer is the Astrée analyzer [20, 15]. It reports program defects caused by unspecified and undefined behaviors according to the C norm (ISO/IEC 9899:1999 (E)) [9], defects caused by invalid concurrent behavior, and violations of user-specified programming guidelines, and it computes program properties relevant for functional safety. Users are notified about: integer/floating-point division by zero, out-of-bounds array indexing, erroneous pointer manipulation and dereferencing (buffer overflows, null pointer dereferencing, dangling pointers, etc.), data races, lock/unlock problems, deadlocks, integer and floating-point arithmetic overflows, read accesses to uninitialized variables, unreachable code, non-terminating loops, and violations of optional user-defined static assertions. Astrée also provides a module for checking coding rules, called RuleChecker, which supports various coding guidelines (MISRA C:2004 [22], MISRA C:2012 [21], ISO/IEC TS 17961 [10], SEI CERT C [2, 3], CWE [24]), computes code metrics, and checks code metric thresholds. RuleChecker is also available as a standalone product, but when used in combination with Astrée it can access the results of the sound static runtime analysis and can hence achieve zero false negatives even on semantic rules.
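For illustration, the fragment below, constructed for this article, combines two of the defect classes listed above: a read of an uninitialized variable and a potential division by zero. A sound analyzer must report both unless it can prove them unreachable in all calling contexts.

    /* Illustrative defects of the kind a sound runtime error
       analyzer reports; not code from the MTU application. */
    int average_ratio(const int *samples, int n)
    {
        int sum;                   /* never initialized */
        for (int i = 0; i < n; i++) {
            sum += samples[i];     /* read of uninitialized sum */
        }
        return sum / n;            /* division by zero if n == 0 */
    }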
4.4 Translation Validation

Currently the verified part of the compilation tool chain ends at the generated assembly code. In order to bridge this gap we have developed a tool for automatic translation validation, called Valex, which validates the assembling and linking stages a posteriori.

[Figure 3: Translation Validation with Valex]

Valex checks the correctness of the assembling and linking of a statically and fully linked executable file PE against the internal abstract assembly representation PA produced by CompCert from the source C program PS. The internal abstract assembly as well as the linked executable are passed as arguments to the Valex tool. The main goal is to verify that every function defined in a C source file compiled by CompCert, and not optimized away by it, can be found in the linked executable, and that its disassembled machine instructions match the abstract assembly code. To that end, after parsing the abstract assembly code, Valex extracts the symbol table and all sections from the linked executable. Then the functions contained in the abstract assembly code are disassembled. Extraction and disassembling are done by two invocations of exec2crl, the executable reader of aiT and StackAnalyzer [1].

Apart from matching the instructions in the abstract assembly code against the instructions contained in the linked executable, Valex also checks whether symbols are used consistently, whether variable size and initialization data correspond, and whether variables are placed in the right sections of the executable. Currently Valex can check linked PowerPC executables that have been produced from C source code by the CompCert C compiler using the Diab assembler and linker from Wind River Systems, or the GCC tool chain (version 4.8, together with GNU binutils 2.24).

5 Integration and Performance

Integration
The ECU control software uses a limited set of timing interrupts, which does not impair worst-case execution time estimation. The traditional compiler accepts pragma indications to flag C functions so they can be called immediately from an interrupt vector. The compiler then adds code for saving the system state and more registers than are used in a standard PowerPC EABI function call.

CompCert accepts neither this compiler-dependent pragma nor inline assembly, so the user must hand-code the mechanism outlined in the previous paragraph in assembler language in separate assembly files. Such assembler code can be placed in the runtime environment module. Some system state recovery contained in a fallback exception handler was also transferred to the runtime environment.
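For illustration, with the traditional tool chain an interrupt entry could be flagged directly in C roughly as sketched below; the pragma spelling is compiler-specific and an assumption here. CompCert rejects such pragmas, so the equivalent register-saving prologue had to be hand-written in a separate assembly file instead.

    /* Traditional compiler only; NOT accepted by CompCert.
       The pragma (spelling varies between compilers) makes the
       compiler emit a prologue/epilogue saving and restoring the
       full system state beyond the PowerPC EABI register set. */
    #pragma interrupt speed_irq_handler
    void speed_irq_handler(void)
    {
        /* acquire sensor value, update shared state, ... */
    }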
The strategy of using a minimal sufficient subset, as discussed in Sec. 2 above, is fully confirmed, since only this one related change to the source code was necessary. For more than five years CompCert has fully covered the chosen range of constructs, even during earlier phases of its development.

Behaviors undefined according to the C semantics are not covered by the formal correctness proof of CompCert. Only code that exhibits no numeric overflows, division by zero, invalid memory accesses, or any other undefined behavior can possibly be functionally correct. The sound abstract-interpretation-based analyzer Astrée can prove the absence of runtime errors, including any undefined behaviors [18, 19]. Therefore we use Astrée to complement the formal correctness argument of CompCert.

Further minor modifications were necessary to adapt the build process to the CompCert compiler options. Also the linker control file required some changes, since CompCert allocates memory segments differently from some traditional popular compilers. In the final step an MTU-specific flashing tool assigns code, constant data, as well as initialized and non-initialized data, as required by the C runtime environment specific to the target architecture. All building processes were completed successfully; all functional tests passed. Thus these tests, on an admittedly minimized and robust language subset, exposed no indication of compiler flaws.

Testability
Testing functional behaviour on the target platform can be tedious. Potentially concurrent software interacts with hardware which does not necessarily behave according to the synchronous paradigm. The hardware in turn interacts with the noise-charged physical environment. In addition, some of that interaction only works properly under hard real-time restrictions. Thus typical module or software tests in the target environment suffer from the necessity to impose severe restrictions on the behaviors expected in reality.

It is thus desirable to test software components reaching a maximum coverage of real-world interaction noise. If such components are specified to expose defined, complete and non-contradicting behaviour on their boundaries, and are written as generically as possible, abstract testing comes into reach. Generic behaviour does not depend on underlying processor properties such as endianness and hardware register allocation; on the compiler side it does not depend on compiler-specific or undefined behaviour. Coding guidelines and architectural constraints may ensure compliance with such rules. If software artifacts comply with these constraints they may be tested independently of the hardware and of the specific compilation tool chain. CompCert is available for ARM, x86 and PowerPC architectures, so properties acquired on one platform hold on the others.

Code Performance
The code generated by CompCert was subjected to the Valex tool and showed no indications of non-compliance. The generated code was integrated into the target hardware and extensively tested in a simulated synthetic environment, which is a precondition to using the integrated system on a real engine. If simulator test and engine test are passed, they jointly provide behavioral validation coverage of every aspect of the functional system requirements.

To assess the performance of the CompCert compiler further, we have investigated the size and the worst-case execution time of the generated code. To determine the memory consumption by code and data segments we have analyzed the generated binary file. Compared to the conventional compiler, the code segment in the executable generated by CompCert is slightly smaller; the size of the data segment is almost identical in both cases. These observations are consistent with our expectations, since with CompCert we have used more aggressive optimization settings: the traditional compiler was configured not to use any optimization, to ensure traceability and to reduce functional risks introduced by the compiler itself during the optimization stage. With the verified compiler CompCert at hand, the design decision was made to lift this restriction. CompCert performs register allocation to access data from registers and minimizes memory accesses. In addition, as opposed to the traditional compiler, it accesses memory using small data areas. That mechanism lets two registers constantly reference base memory addresses, so that address references require two PowerPC assembler instructions instead of three as before.

The maximum execution time for one computational cycle is assessed with the static WCET (worst-case execution time) analysis tool aiT [17]. When configured correctly this tool delivers safe upper execution time bounds. All concurrent threads are mapped into one computation cycle under worst-case conditions; the precise mapping definition is part of the architectural software design on the bare processor. Analyses are performed on a normal COTS PC; each entry (synchronous function, interrupt) has been analyzed separately. Analysis of the timing interrupt is split into several modes, and finally the WCRT (worst-case response time) for one computational cycle is calculated. The results for the MTU application are shown in Fig. 4. The computed WCET bounds lead to a total processor load which is about 28% smaller with the CompCert-generated code than with the code generated by the conventional compiler. The main reason for this behaviour is the improved memory performance. The result is consistent with our expectations and with previously published CompCert research papers.

[Figure 4: WCET estimates for MTU application. aiT-computed WCET bounds (in µs) for the CompCert-generated vs. the conventionally compiled code, per entry (synchronous function, interrupt modes) and for the overall WCRT; reductions per entry range from about 19% to 41%, with 28% for the total.]

We have also determined a safe upper bound of the total stack usage in both scenarios, using the static analyzer StackAnalyzer [13]. The results are shown in Fig. 5. When providing suitable behavioral assumptions about the software to the analyzer, the overall stack usage is around 40% smaller with the CompCert-generated code than with the code generated by the conventional compiler.

[Figure 5: Worst-case stack usage for MTU application. StackAnalyzer bounds in bytes per entry; reductions range from about 18% to 50%.]

6 Tool Qualification

MTU's qualification strategy is built on three pillars, namely providing evidence of a structured tool development, sufficient user experience, and confirmation of reliable operation via validation (cf. Sec. 3 and Fig. 2). This strategy has also been applied to qualify CompCert for use within a highly safety-critical application.

Compilation
As described in Sec. 4, all of CompCert's front-end and back-end compilation passes are formally proved to be free of miscompilation errors. These formal proofs bring strong confidence in the correctness of the front-end and back-end parts of CompCert. These parts include all optimizations, which are particularly difficult to qualify by traditional methods, and most code generation algorithms. The formal proof does not cover some elements of the parsing phase, nor the preprocessing, assembling and linking (cf. [19]), for which external tools are used. Therefore we complement the formal proof by applying a publicly available validation suite.

The overall qualification strategy for CompCert is depicted in Fig. 6. In contrast to validating the correlation of source files and the resulting fully linked executable file, qualification of the compiler toolchain is split into three phases: traditional testsuite validation, formal verification, and translation validation.

Preprocessor
Source-code preprocessing is delegated to a well-used version of gcc.
The selected version is validated using a preprocessor testsuite, for which the correlation to the used language subset is manually proven. MTU uses strict coding rules limiting the use of C-language constructs to basic constructs known to be widely in use; usage of C preprocessing macros is likewise limited by these rules to very basic constructs. The testsuite is tailored to fully cover these demands.

It must be ensured that source files and included header files only use a subset of the features which are validated by the above procedure. This may be accomplished by establishing a suitable checklist and manually applying it to each and every source file. The effort may however be reduced, and the reliability of that process vastly improved, if a coding guideline checker is used. That tool must again be validated to provide alarms for every violation of any required rule. As described above, Astrée includes a code checker, called RuleChecker, which analyzes each source file for compliance with a predefined set of rules, including MISRA C:2004 [22]. It also provides a Qualification Support Kit and Qualification Software Life Cycle Data reports which facilitate the tool qualification process.

Assembling and Linking
Cross-assembling and cross-linking are also done by gcc. To complement the proven-in-use argument and the implicit coverage by the validation suite, we use the translation validation tool Valex shipped with CompCert, which provides additional confidence in the correctness of assembler and linker. Each source file is compiled with CompCert using a dedicated option, such that CompCert is instructed to serialize its internal abstract assembly representation in JSON format [12].

[Figure 6: CompCert qualification. Workflow: .c/.h files are preprocessed by gcc into .i files, compiled by CompCert into .s and .json files, and assembled/linked by gcc into the .elf file. The preprocessor is covered by testsuite validation, the compilation by formal verification, and assembling/linking by translation validation with Valex; RuleChecker verifies compliance with the MTU coding rules, and Astrée performs the runtime error analysis.]
The generated .json files as well as the fully linked executable are then passed to the Valex tool. As described in Sec. 4.4, Valex checks the correctness of the assembling and linking of the executable file against the internal abstract assembly representation produced by CompCert.

The tools used in the process of qualifying CompCert, namely Astrée and Valex, are themselves qualified using the qualification strategy described above. By dividing the qualification of CompCert into steps and applying strict coding rules throughout the development, the complexity of compiler qualification decreases tremendously, making the use of CompCert feasible also within a highly safety-critical industrial application.

7 Conclusion

CompCert is a formally verified optimizing C compiler: the executable code it produces is proved to behave exactly as specified by the semantics of the source C program. This article reports on practical experience obtained at MTU with replacing a non-verified legacy compiler by CompCert for highly critical control software of an emergency power generator. We have described the necessary steps to integrate CompCert in the development process, and outlined our tool qualification strategy. The main benefits are higher confidence in the correctness of the generated code and significantly improved system performance.

References

[1] AbsInt GmbH, Saarbrücken, Germany. AbsInt Advanced Analyzer for PowerPC, April 2016. User Documentation.
[2] CERT – Software Engineering Institute. SEI CERT C Coding Standard – Rules for Developing Safe, Reliable, and Secure Systems. Carnegie Mellon University, 2016.
[3] CERT – Software Engineering Institute, Carnegie Mellon University. SEI CERT Coding Standards Website.
[4] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In 4th POPL, pages 238–252, Los Angeles, CA, 1977. ACM Press.
[5] E. Eide and J. Regehr. Volatiles are miscompiled, and what to do about it. In EMSOFT '08, pages 255–264. ACM, 2008.
[6] L. George and A. W. Appel. Iterated register coalescing. ACM Trans. Prog. Lang. Syst., 18(3):300–324, 1996.
[7] IEC 60880. Nuclear power plants – Instrumentation and control systems important to safety – Software aspects for computer-based systems performing category A functions, 2006.
[8] IEC 61508. Functional safety of electrical/electronic/programmable electronic safety-related systems, 2010.
[9] ISO. International standard ISO/IEC 9899:1999, Programming languages – C, 1999.
[10] ISO/IEC. Information Technology – Programming Languages, Their Environments and System Software Interfaces – Secure Coding Rules (ISO/IEC TS 17961), Nov 2013.
[11] J.-H. Jourdan, F. Pottier, and X. Leroy. Validating LR(1) parsers. In ESOP 2012: 21st European Symposium on Programming, volume 7211 of LNCS, pages 397–416. Springer, 2012.
[12] The JSON Data Interchange Format. Technical Report Standard ECMA-404, 1st Edition, ECMA, Oct. 2013.
[13] D. Kästner and C. Ferdinand. Proving the Absence of Stack Overflows. In SAFECOMP '14: Proceedings of the 33rd International Conference on Computer Safety, Reliability and Security, volume 8666 of LNCS, pages 202–213. Springer, September 2014.
[14] D. Kästner, X. Leroy, S. Blazy, B. Schommer, M. Schmidt, and C. Ferdinand. Closing the gap – the formally verified optimizing compiler CompCert. In SSS'17: Developments in System Safety Engineering: Proceedings of the Twenty-fifth Safety-critical Systems Symposium, pages 163–180. CreateSpace, 2017.
[15] D. Kästner, A. Miné, L. Mauborgne, X. Rival, J. Feret, P. Cousot, A. Schmidt, H. Hille, S. Wilhelm, and C. Ferdinand. Finding All Potential Runtime Errors and Data Races in Automotive Software. In SAE World Congress 2017. SAE International, 2017.
[16] D. Kästner, A. Miné, A. Schmidt, H. Hille, L. Mauborgne, S. Wilhelm, X. Rival, J. Feret, P. Cousot, and C. Ferdinand. Finding All Potential Run-Time Errors and Data Races in Automotive Software. In Proceedings of the SAE World Congress 2017 (SAE Technical Paper). SAE International, 2017.
[17] D. Kästner, M. Pister, G. Gebhard, M. Schlickling, and C. Ferdinand. Confidence in Timing. Safecomp 2013 Workshop: Next Generation of System Assurance Approaches for Safety-Critical Systems (SASSUR), September 2013.
[18] D. Kästner, S. Wilhelm, S. Nenova, P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Miné, and X. Rival. Astrée: Proving the Absence of Runtime Errors. Embedded Real Time Software and Systems Congress ERTS2, 2010.
[19] X. Leroy, S. Blazy, D. Kästner, B. Schommer, M. Pister, and C. Ferdinand. CompCert – A Formally Verified Optimizing Compiler. In ERTS 2016: Embedded Real Time Software and Systems, 8th European Congress, Toulouse, France, Jan. 2016. SEE.
[20] A. Miné, L. Mauborgne, X. Rival, J. Feret, P. Cousot, D. Kästner, S. Wilhelm, and C. Ferdinand. Taking Static Analysis to the Next Level: Proving the Absence of Run-Time Errors and Data Races with Astrée. Embedded Real Time Software and Systems Congress ERTS2, 2016.
[21] MISRA Working Group. MISRA C:2012 – Guidelines for the use of the C language in critical systems. MISRA Limited, Mar. 2013.
[22] Motor Industry Software Reliability Association. MISRA C:2004 – Guidelines for the use of the C language in critical systems, 2004.
[23] NULLSTONE Corporation. NULLSTONE for C. http://www.nullstone.com/htmls/ns-c.htm, 2007.
[24] The MITRE Corporation. CWE – Common Weakness Enumeration.
[25] X. Yang, Y. Chen, E. Eide, and J. Regehr. Finding and understanding bugs in C compilers. In PLDI '11, pages 283–294. ACM, 2011.