Closing the Gap – The Formally Verified
Optimizing Compiler CompCert
Daniel Kästner1, Xavier Leroy2, Sandrine Blazy3, Bernhard Schommer4,
Michael Schmidt1, Christian Ferdinand1
1: AbsInt GmbH, Saarbrücken, Germany
2: Inria Paris-Rocquencourt, Le Chesnay, France
3: University of Rennes 1 - IRISA, Rennes, France
4: Saarland University, Saarbrücken, Germany
Abstract CompCert is the first commercially available optimizing compiler that
is formally verified, using machine-assisted mathematical proofs, to be free from
miscompilation. The executable code it produces is proved to behave exactly as
specified by the semantics of the source C program. CompCert's intended use is the
compilation of safety-critical and mission-critical software meeting high levels of
assurance. This article gives an overview of the design of CompCert and its proof
concept, summarizes the resulting confidence argument, and gives an overview of
relevant tool qualification strategies. We briefly summarize practical experience
and give an overview of recent CompCert developments.
1 Introduction
Modern compilers are highly complex software systems which contain many highly
tuned and sophisticated algorithms; these algorithms can contain bugs. Studies like
(NULLSTONE Corporation 2007, Eide and Regehr 2008) and (Yang et al. 2011)
have found numerous bugs in all investigated open-source and commercial compilers, including compiler crashes and miscompilation[1] issues. Although such wrong-code errors could be detected during the normal software testing stage, testing does not typically
include systematic checks for them. When they occur in the field, they can be hard
to isolate and to fix.
Whereas in non-critical software functional bugs tend to have a bigger
impact than miscompilation errors, the importance of the latter increases dramatically in safety-critical systems. Contemporary safety standards such as DO-178B/C, ISO-26262, or IEC-61508 require identification of potential hazards and

[1] Miscompilation means that the compiler silently generates incorrect machine code from a correct source program.
© D. Kästner, X. Leroy, S. Blazy, B. Schommer, M. Schmidt, C. Ferdinand.
Published by the Safety-Critical Systems Club. All Rights Reserved
to demonstrate that the software does not violate the relevant safety goals. Many
verification activities are performed at the architecture, model, or source code level,
but the properties demonstrated there may no longer hold at the executable code
level when miscompilation happens. This is true not only for source code review
but also for formal, tool-assisted verification methods such as static analysers, deductive verifiers, and model checkers. Moreover, properties asserted by the operating system may be violated when its binary code contains wrong-code errors induced when compiling the OS. In consequence, miscompilation is a non-negligible
risk that must be addressed by additional, difficult and costly verification activities
such as more testing and more code reviews at the generated assembly code level.
The first attempts to formally prove the correctness of a compiler date back to
the 1960s (McCarthy and Painter 1967). Since 2015, with the CompCert compiler,
the first formally verified optimizing C compiler has been commercially available. What
sets CompCert apart from any other production compiler is that it is formally verified, using machine-assisted mathematical proofs, to be exempt from miscompilation issues. In other words, the executable code it produces is proved to behave
exactly as specified by the semantics of the source C program. This level of confidence in the correctness of the compilation process is unprecedented. In particular,
using the CompCert C compiler is a natural complement to applying formal verification techniques (static analysis, program proof, model checking) at the source
code level: the correctness proof of CompCert C guarantees that all safety properties
verified on the source code automatically hold as well for the generated executable.
Usage of CompCert offers multiple benefits. First, the cost of finding and fixing
compiler bugs and shipping the patch to customers can be avoided. The testing effort required to ascertain software properties at the binary executable level can be
reduced. Whereas in the past compiler optimizations were often completely switched off
for highly critical applications, using optimized code now becomes feasible.
The paper is structured as follows: in section 2 we give a top-level overview of
the CompCert compiler and its tool flow. Section 3 summarizes the main code generation and optimization stages of CompCert and its annotation concept. The formal
CompCert proof is outlined in section 4. Section 5 presents the Valex tool for a
posteriori validation of assembly and linking phases. Section 6 describes the reference interpreter provided by CompCert for testing and semantic validation purposes. Section 7 summarizes the confidence argument for CompCert and adequate
tool qualification strategies. Section 8 summarizes experimental results and practical experience obtained with the CompCert compiler.
2 CompCert Overview
An overview of the CompCert-based workflow is given in Fig. 1. The input to the
compilation process is a set of C source and header files. CompCert itself focuses
on the task of compilation and includes neither preprocessor, assembler, nor linker.
Therefore it has to be used in combination with a legacy compiler tool chain. Since
preprocessing, assembling and linking are well-established stages there are no particular tool chain requirements; as an example CompCert has successfully been used
with the GCC and Diab compilers.
Fig. 1. CompCert Workflow
While early versions of CompCert were limited to single-file inputs, CompCert
now also supports separate compilation (cf. Sec. 4.3). It reads the set of preprocessed C files produced by the legacy preprocessor, performs a series of code generation and optimization steps (cf. Sec. 3.1) and produces a set of assembly files
enhanced by debug information.
CompCert generates DWARF2[2] debugging information for functions and variables, including information about their type, size, alignment and location. This also
includes local variables so that the values of all variables can be inspected during
program execution in a debugger. To this end CompCert introduces a dedicated pass
which computes the live ranges of local variables and their locations throughout the
live range.
The generated assembly code can contain formal CompCert annotations which
can be inserted at the C code level and are carried throughout the code generation
process. This way, traceability information, or semantic information to be passed to
other tools, can be transported to the machine code level. Since the annotations are fully covered by the CompCert proof, the information they carry is reliable and provides proven links
between the machine code and the source code level (cf. Sec. 3.2).
After assembling and linking by the legacy tool chain the final executable code
is produced. To increase confidence in the assembling and linking stages CompCert
provides a tool for translation validation, called Valex, which performs equivalence
checks between assembly and executable code (cf. Sec. 5).
[2] cf. DWARF Debugging Standard Website (http://dwarfstd.org).
2.1 Availability
The CompCert sources can be downloaded from Inria[3] free of charge; the current
state of the development can be viewed on GitHub[4]. In addition, a released distribution with long-term support is available from AbsInt, either as a source code download or as a pre-compiled binary for Windows and Linux. The package also contains
pre-configured setup files for the compiler driver to control the cooperation between
the CompCert executable and the external cross compiler required for preprocessing, assembling and linking.
3 CompCert Design
Like other compilers, CompCert is structured as a pipeline of compilation passes,
depicted in Fig. 2 along with the intermediate languages involved. The 20 passes
bridge the gap between C source files and object code, going through 11 intermediate languages. The passes can be grouped in four successive phases, described in
the following sections.
3.1 CompCert Phases
Parsing Phase 1 performs preprocessing (using an off-the-shelf preprocessor such
as that of GCC), tokenization and parsing into an ambiguous abstract syntax tree
(AST), and type-checking and scope resolution, obtaining a precise, unambiguous
AST and producing error and warning messages as appropriate. The parser is automatically generated from the grammar of the C language by the Menhir parser generator, along with a Coq[5] proof of correctness of the parser (Jourdan et al. 2012).
Optionally, some features of C that are not handled by the verified front-end are
implemented by source-to-source rewriting over the AST. For example, bit-fields
in structures are transformed into regular fields plus bit shifting and masking. The
subset of the C language handled here is very large, including all of MISRA-C 2004
(Motor Industry Software Reliability Association 2004) and almost all of ISO C99
(ISO 1999), with the exception of variable-length arrays and unstructured, non-MISRA switch statements (e.g. Duff's device[6]).
[3] http://compcert.inria.fr/download.html
[4] https://github.com/AbsInt/CompCert
[5] Coq is a formal proof management system. It provides a formal language to write mathematical definitions, executable algorithms and theorems together with an environment for semi-interactive development of machine-checked proofs (http://coq.inria.fr).
[6] http://en.wikipedia.org/wiki/Duff's_device
Fig. 2. CompCert Phases
C front-end compiler The second phase first re-checks the types inferred for expressions, then determines an evaluation order among the several permitted by the
C standard. This is achieved by pulling side effects (assignments, function calls)
outside of expressions, turning them into independent statements. Then, local variables of scalar types whose addresses are never taken (using the & operator) are
identified and turned into temporary variables; all other local variables are allocated
in the stack frame. Finally, all type-dependent behaviours of C (overloaded arithmetic operators, implicit conversions, layout of data structures) are made explicit
through the insertion of explicit conversions and address computations. The front-end phase outputs Cminor[7] code.
Back-end compiler The third phase comprises 12 of the passes of CompCert, including all optimizations and most dependencies on the target architecture. It
bridges the gap between the output of the front-end and the assembly code by progressively refining control (from structured control to control-flow graphs to labels
and jumps) and function-local data (from temporary variables to hardware registers
and stack-frame slots). The most important optimization performed is register allocation, which uses the sophisticated Iterated Register Coalescing algorithm (George
and Appel 1996). Other optimizations include function inlining, instruction selection, constant propagation, common subexpression elimination (CSE), and redundancy elimination. These optimizations implement several strategies to eliminate
computations that are useless or redundant, or to turn them into equivalent but
cheaper instruction sequences. Loop optimizations and instruction scheduling optimizations are not implemented yet.

[7] Cminor is a simple, untyped intermediate language featuring both structured (if/else, loops) and unstructured control (goto).
Assembling The final phase of CompCert takes the AST for assembly language
produced by the back-end, prints it in concrete assembly syntax, adds DWARF debugging information coming from the parser, and calls an off-the-shelf assembler
and linker to produce object files and executable files. To improve confidence, the
translation validation tool Valex re-checks the executable file produced by the
linker against the assembly language AST produced by the back-end.
3.2 CompCert Annotations
CompCert provides a general mechanism to attach free-form annotations with formal semantics (plain text possibly mentioning the values of variables) to C program
points. The annotations are transported throughout compilation, all the way to the
generated assembly code, where variable names are expressed in terms of machine
code addresses and machine registers. A simple example is the annotation:
    __builtin_annot("x is %1 and y is %2", x, y);
The formal semantics of such an annotation is that of a pro forma “print” statement:
when executed, an observable event is added to the trace of I/O operations which
records the text of the annotation and the values of the argument variables x and y.
In the generated machine code, annotations produce no instructions, just an assembler comment or debugging information consisting of the text of the annotations
where the escapes (%1 and %2) are replaced by the actual locations (in registers or
in memory) where the argument variables x and y were placed by the compiler.
Hence we obtain:
    # annotation: x is r7 and y is mem(word,r1+16)
if x was allocated to register r7 and y was allocated to a stack location at offset 16
from the stack pointer r1.
A first advantage of this mechanism is that it provides proven traceability: the
link between machine-level storage cells and source-level variables is covered by
the proof. Another typical use of annotations is to track pieces of code such as library function symbols. We can put annotations at the beginning and the end of
every library function symbol, recording the values of its arguments and result variables. The semantic preservation proof therefore guarantees that symbols are entered and exited in the same order and with the same arguments and results, both
in the source and in the generated code. This ensures in particular that the compiler did
not reorder or otherwise alter the sequence of symbol invocations present in the
source program, a guarantee that cannot be obtained by observing system calls
and volatile memory accesses only.
A third application of the annotation mechanism is to enable WCET tools to
compute more precise worst-case execution time (WCET) bounds. Indeed, WCET
tools like aiT (Ferdinand and Heckmann 2004) operate directly on the executable
code, but they sometimes require programmers to provide additional information
(e.g. the bound of a while loop) that cannot easily be reconstructed from the machine
code alone. When using CompCert, such information can be safely extracted from
annotations inserted at the source code level. A tool automating this task was developed by Airbus: it generates a machine-level annotation file usable by the aiT
WCET Analyser. Compiling a whole flight control software from Airbus (about 4
MB of assembly code) with CompCert resulted in significantly improved performance in terms of WCET bounds and code size (Bedin Franca et al. 2012).
4 The CompCert Proof
The CompCert front-end and back-end compilation passes are all formally proved
to be free of miscompilation errors; as a consequence, so is their composition. The
property that is formally verified is semantic preservation between the input code
and output code of every pass.
4.1 Operational Semantics
To state the semantic preservation property with mathematical precision, we give
formal semantics for every source, intermediate and target language, from C to assembly. These semantics associate to each program the set of all its possible behaviours. These behaviours indicate whether the program terminates (normally by exiting or abnormally by causing a run-time error such as dereferencing the null
pointer) or runs forever. Behaviours also contain a trace of all observable input/output actions performed by the program, such as system calls, annotations as described in Sec. 3.2, and accesses to "volatile" memory areas that could correspond
to a memory-mapped I/O device.
Technically, the semantics of the various languages are specified in small-step
operational style as labelled transition systems (LTS). An LTS is a mathematical relation

    state --trace--> nextstate

that describes one step of execution of the program and its effect on the program state. For assembly languages, program states
are just the contents of processor registers and memory locations. For higher-level
languages such as C, program states have a richer structure, including memory contents, an abstract program point designating the statement or expression to execute
next, environments mapping variables to memory locations, as well as an abstraction of the stack of function calls.
A generic construction defines the observable behaviours from these transition
systems, by iterating transitions from an initial state (the initial call to the main
function):

    S0 --t1--> S1 --t2--> S2 --> ...

Such sequences of transitions can go on infinitely, denoting a program that runs forever, or stop on a state Sn from which no
transition is possible, denoting a terminating execution. The concatenation of the
traces t1.t2... describes the I/O actions performed. Several behaviours are possible
for the same program if non-determinism is involved. This can be internal non-determinism (e.g. multiple possible evaluation orders in C) or external non-determinism (e.g. reading from a memory-mapped device can produce multiple results depending on I/O behaviours).
4.2 Semantic Preservation
To a first approximation, a compiler preserves semantics if the generated code has
exactly the same set of observable behaviours as the source code (same termination
properties, same I/O actions). This first approximation fails to account for two important degrees of freedom left to the compiler. First, the source program can have
several possible behaviours: this is the case for C, which permits several evaluation
orders for expressions. A compiler is allowed to reduce this non-determinism by
picking one specific evaluation order. Second, a C compiler can “optimize away”
run-time errors present in the source code, replacing them by any behaviour of its
choice. (This is the essence of the notion of “undefined behaviour” in the ISO C
standards.) As an example, consider an out-of-bounds array access:

    int main(void)
    {
      int t[2];   // feasible indices are 0 and 1
      t[2] = 1;   // out of bounds
      return 0;
    }
This is undefined behaviour according to ISO C, and a run-time error according
to the formal semantics of CompCert C. The generated assembly code does not
check array bounds and therefore writes 1 in a stack location. This location can be
padding, in which case the compiled program terminates normally, or can contain
the return address for “main”, smashing the stack and causing execution to continue
at address 1, with unpredictable effects. Finally, an optimizing compiler like CompCert can notice that the assignment to t[2] is useless (the t array is not used afterwards) and remove it from the generated code, causing the compiled program to
terminate normally.
To address the two degrees of flexibility mentioned above, CompCert's formal
verification uses the following definition of semantic preservation, viewed as a refinement over observable behaviours:
Definition 1 (Semantic Preservation): If the compiler produces compiled code C
from source code S, without reporting compile-time errors, then every observable
behaviour of C is either identical to an allowed behaviour of S, or improves over
such an allowed behaviour of S by replacing undefined behaviours with more defined behaviours.
The semantic preservation property is a corollary of a stronger property, called a
simulation diagram that relates the transitions that C can make with those that S can
make. First, 15 such simulation diagrams are proved independently, one for each
pass of the front-end and back-end compilers. Then, the diagrams are composed
together, establishing semantic preservation for the whole compiler.
The proofs are very large, owing to the many passes and the many cases to be
considered – too large to be carried out using pencil and paper. We therefore use machine assistance in the form of the Coq proof assistant. Coq gives us the means to write
precise, unambiguous specifications; conduct proofs in interaction with the tool;
and automatically re-check the proofs for soundness and completeness. We therefore achieve very high levels of confidence in the proof. At 100,000 lines of Coq
and 6 person-years of effort, CompCert's proof is among the largest ever performed
with a proof assistant.
4.3 Separate Compilation and Linking
In Definition 1, semantic preservation is stated in terms of whole programs: the
source program S is compiled in one run of the compiler to an executable program
C, whose semantics is then related to that of S. This is not how compilers are used
in practice: the source program is composed of several compilation units residing
in different files; each unit is separately compiled to an object file; finally, the executable is obtained by linking together the object files.
The implementation of CompCert supports this familiar separate compilation
scenario (the -c command-line option). However, until release 2.7, the proof of
semantic preservation did not cover this scenario, leaving open the possibility that
CompCert could miscompile when used for separate compilation. Kang et al. (Kang
et al. 2016) found an example of this problem in CompCert 2.5. Consider the declaration:
int * const p;
If this is the only declaration of p in the program, it gets initialized to the default
value for a pointer, namely the null pointer. Since p is const, it keeps this value
through the execution of the program. The alias analysis of CompCert 2.5 built on those observations to conclude that p has an empty points-to set. Memory
accesses were then optimized under this assumption. All this is correct in a whole-program scenario, where the compiler sees that the declaration above is the only
declaration of p. However, after separate compilation of the unit containing the declaration above, the unit can be linked with another unit that declares p with a non-null initialization:
int x; int * const p = &x;
In the resulting executable program, p is not the null pointer, and the optimizations performed by CompCert 2.5 can be wrong.
This particular issue was fixed in CompCert 2.6 by making the alias analysis
more conservative. However, we still missed formal evidence that all CompCert
optimizations are correct in the presence of separate compilation. To this end, and
following the approach invented by Kang et al. (Kang et al. 2016), CompCert 2.7
strengthens the statement and proof of semantic preservation to take separate compilation and linking into account:
Definition 2 (Semantic preservation with separate compilation): Consider n
source compilation units S1, …, Sn that compile separately to compiled units C1, …,
Cn without reporting compile-time errors. Assume that the source units link together
without error to a whole source program S = S1 ∘ … ∘ Sn. Then, the compiled units
link without errors to a whole compiled program C = C1 ∘ … ∘ Cn. Moreover,
every observable behaviour of C is either identical to or improves upon an allowed
behaviour of S, as in Definition 1.

This approach of Kang et al. relies on a notion of syntactic linking between two
or more compilation units, written ∘ in the definition above, that extends the operations traditionally performed over object files by linkers to all the source, intermediate, and target languages of CompCert. For instance, in the case of the source
CompCert C language, syntactic linking is defined by considering all declarations
of identically-named global variables and functions. If the declarations are compatible, as in extern int x and int x = 1, the most precise declaration is retained
(int x = 1). If two declarations are incompatible, such as int x = 1 and int
x = 2, syntactic linking fails.
A limitation of this approach is that it describes only linking between compilation units written in the same language, but not linking between, say, a C source file
and a hand-written assembly file. Formalizing and reasoning upon such cross-language linking and interoperability is a difficult, active research problem (Ahmed
2015, Neis et al. 2015, Stewart et al. 2015).
5 Translation Validation
Currently the verified part of the compilation tool chain ends at the generated assembly code. In order to bridge this gap we have developed a tool for automatic
translation validation, called Valex, which validates the assembling and linking
stages a posteriori.
Fig. 3. Translation Validation with Valex
Valex checks the correctness of the assembling and linking of a statically and
fully linked executable file PE against the internal abstract assembly representation
PA produced by CompCert from the source C program PS. The internal abstract assembly as well as the linked executable are passed as arguments to the Valex tool.
The main goal is to verify that every function defined in a C source file compiled
by CompCert and not optimized away by it can be found in the linked executable
and that its disassembled machine instructions match the abstract assembly code.
To that end, after parsing the abstract assembly code Valex extracts the symbol table
and all sections from the linked executable. Then the functions contained in the
abstract assembly code are disassembled. Extraction and disassembling are done by
two invocations of exec2crl, the executable reader of aiT and StackAnalyzer (AbsInt 2016). Apart from matching the instructions in the abstract assembly code against
the instructions contained in the linked executable Valex also checks whether symbols are used consistently, whether variable size and initialization data correspond
and whether variables are placed in the right sections in the executable.
Currently Valex can check linked PowerPC executables that have been produced
from C source code by the CompCert C compiler using the Diab assembler and
linker from Wind River Systems, or the GCC tool chain (version 4.8, together with
GNU binutils 2.24).
6 The Reference Interpreter
The CompCert compiler also provides an interpreter that can execute simple C programs without compilation. More precisely, preprocessing, parsing and initial elaboration are performed; the resulting CompCert C abstract syntax is then executed
by interpretation.
This is a reference interpreter, meaning that it implements exactly the formal
semantics of CompCert C against which CompCert is proved correct. In particular,
all behaviours that are undefined in the formal semantics are reported as such by the
interpreter. In contrast, compiling a program that invokes undefined behaviour often
causes this behaviour to become defined or be optimized away, making it impossible to observe by running the compiled executable. Likewise, the reference interpreter can explore all evaluation orders allowed by the CompCert C formal semantics, while CompCert, as a compiler, implements only one of the possible evaluation
orders. This makes the reference interpreter very useful to explore the CompCert C
semantics and test C code fragments for undefined behaviours.
Here is an example of such an exploration. Consider the program:
    #include <stdio.h>
    int x[2] = { 12, 34 };
    int main(void)
    {
      int i = 65536 * 65536 + 2; // will overflow
      printf("i = %d\n", i);
      printf("x[i] = %d\n", x[i]);
    }
Running it with the -interp -quiet options through CompCert, we obtain:
    i = 2
    Stuck state: in function main, expression
      printf(<ptr __stringlit_2>, <loc x+8>)
    Stuck subexpression: <loc x+8>
    ERROR: Undefined behaviour
The first line (i = 2) is the output of the printf statement. It shows that the
arithmetic overflow in the computation of i is not undefined behaviour in CompCert C but is defined modulo 2^32. The following lines diagnose an undefined behaviour, namely accessing the array x outside of its bounds. (Here, <loc x+8>
means dereferencing the memory location 8 bytes past the beginning of x.)

A -trace option is available which provides a full trace of interpretation, showing
every execution step taken and every intermediate state.
Technically, the reference interpreter is obtained and proved correct as follows.
In Coq, a computable function step from execution states to sets of (observable
event, execution state) pairs is defined, then proved sound and complete with respect to the transition relation of the CompCert C operational semantics:

    S --t--> S'  if and only if  (t, S') ∈ step(S)
The step function is then extracted to OCaml[8] code and linked with handwritten
code that iterates step to form transition sequences. By default, only one successor
state in step(S) is deterministically chosen, but the -random option causes this
choice to be made randomly between all possible successors, and the -all option
triggers an exhaustive breadth-first exploration instead.
There are some limitations with using CompCert in reference interpreter mode.
First, the only standard C library functions supported are printf, malloc and
free. Hence, the only programs that can currently be interpreted are self-contained
tests with fixed inputs. Second, interpretation is 10^5 to 10^6 times slower than execution of compiled code, unless exhaustive exploration is requested, in which case
interpretation is exponentially slower.
Despite these limitations, we found the reference interpreter of CompCert useful:
first, to animate the formal semantics of CompCert C, helping build confidence in
it; second, to test code fragments for undefined behaviours.
7 The Confidence Argument
As described in Sec. 4 all of CompCert's front-end and back-end compilation passes
are formally proved to be free of miscompilation errors. These formal proofs bring
strong confidence in the correctness of the front-end and back-end parts of CompCert. These parts include all optimizations – which are particularly difficult to qualify by traditional methods – and most code generation algorithms. As described in
Sec. 2.1, the source code and the corresponding proofs are freely available, as is the
proof assistant Coq. Hence the source code is amenable to manual review, and the
proof is reproducible by everybody and can be manually reviewed as well.
The formal proofs do not cover the following aspects:
1. the preprocessing phase;
2. the correctness of the specifications used for the formal proof, i.e. the formal semantics of C and assembly;
3. elements of the parsing phase, mostly lexing, type checking and elaboration;
4. the assembly and linking phase.
Those aspects can be handled well by traditional qualification methods, i.e. via
a validation suite, to complement the formal proofs. A validation suite for CompCert is currently in development and will be available from AbsInt.
The parsing phase (cf. item 3) in particular can be seen as a straightforward code
generation pass which does not include any optimizations and only performs local
transformations. Since the internal complexity of this stage is low, systematic testing provides good confidence. CompCert can print the result of parsing in concrete C
syntax, facilitating comparison with the C source.
However, additional confidence beyond what the validation suite establishes can be provided, in particular for items 1 and 4. The CompCert reference interpreter described in Sec. 6 can be used to systematically test the C semantics on which the compiler operates. Likewise, the Valex validator described in Sec. 5 provides confidence in the correctness of the assembling and linking phase: it performs translation validation of the generated code, a widely accepted validation method (Pnueli et al. 1998).
At the highest assurance levels, qualification arguments may have to be provided for the tools that produce the executable CompCert compiler from its verified sources, namely the “extraction” mechanism of Coq, which produces OCaml (https://ocaml.org) code from the Coq development, combined with the OCaml compiler. We are currently experimenting with an alternative execution path for CompCert that relies on Coq's built-in program execution facilities, bypassing extraction and OCaml compilation.
This alternate path runs CompCert much more slowly than the normal path, but fast
enough so that it can be used as a validator for selected runs of normal CompCert
executions.
In summary, CompCert provides unprecedented confidence in the correctness of the compilation phase: the 'normal' level of confidence is reached by providing a validation suite, which is currently accepted best practice; the formal proofs provide a much higher level of confidence in the correctness of optimizations and code generation strategies; finally, the Valex translation validator provides additional confidence in the correctness of the assembling and linking stages.
Fig. 4. Execution Time Comparison for SPEC Benchmarks on PowerPC
8 Practical Experience
CompCert targets the following three architectures: 32-bit PowerPC, ARMv6 and
above, and IA32 (i.e. Intel/AMD x86 in 32-bit mode with SSE2 extension). The results of the SPEC CPU2006 (http://www.spec.org/cpu2006) benchmarks measured on a PowerPC G5 are illustrated in Fig. 4 and Fig. 5, where Fig. 4 shows the execution time of the generated code and Fig. 5 its size. The experiments show that the code generated by CompCert runs about 40% faster than the code generated by GCC without optimizations, approximately 12% slower than GCC 4 at optimization level 1, and 20% slower than GCC 4 at optimization level 2.
Fig. 5. Code Size Comparison for SPEC Benchmarks on PowerPC.
Regarding code size, the code generated by CompCert in mode -Os is about 1% smaller than in mode -O; it is about 40% smaller than the code generated by GCC -O0, similar in size to the code of GCC -O1 (less than 1% difference), 5% smaller than GCC -O2, and about 20% larger than the code of GCC -Os.
Since SPEC is a general-purpose compiler benchmark, we also considered another benchmark suite that is more oriented towards embedded computing. This suite comprises computational kernels from various application areas: signal processing, physical simulation, 3D graphics, text compression, and cryptography.
The results are similar to those with the SPEC benchmarks: relative to GCC -O0, the code generated by CompCert -O reduces execution time to 48%, while GCC -O1 achieves 45% and GCC -O2 42%. Hence the code generated by CompCert runs about 52% faster than the code generated by GCC without optimizations, approximately 11% slower than GCC 4 at optimization level 1, and 23% slower than GCC 4 at optimization level 2.
Regarding code size, the code generated by CompCert in mode -Os is here less than 1% smaller than in mode -O; it is about 17% smaller than the code generated by GCC -O0, 4% larger than the code of GCC -O1, similar to GCC -O2 (difference smaller than 1%), and about 5% larger than the code of GCC -Os.
In general, due to the lack of aggressive loop optimizations, performance is lower on HPC (High-Performance Computing) codes involving dense matrix computations. This is also the main reason
for the difference in execution time between CompCert and GCC with high optimization levels.
The performance of CompCert on ARM is similar to that on the PowerPC architecture. On IA32, due to its paucity of registers and its specific calling conventions, CompCert is approximately 20% slower than GCC 4 at optimization level 1 on the benchmark suite.
9 Conclusion
CompCert is a formally verified optimizing C compiler: the executable code it produces is proved to behave exactly as specified by the semantics of the source C program. Experimental studies and practical experience demonstrate that it generates efficient and compact code. Further requirements for industrial application, notably the availability of debug information and support for Linux and Windows platforms, have been established. Explicit traceability mechanisms enable a seamless mapping from source code properties to properties of the executable object code. We have summarized the confidence argument for CompCert, which makes it uniquely well-suited for highly critical applications.
References
[Abs 2016] AbsInt GmbH, Saarbrücken, Germany. AbsInt Advanced Analyzer for PowerPC, April
2016. User Documentation.
[Ahmed 2015] Amal Ahmed. Verified compilers for a multi-language world. In SNAPL 2015: 1st
Summit on Advances in Programming Languages, volume 32 of LIPIcs, pages 15–31. Schloss
Dagstuhl - Leibniz-Zentrum fuer Informatik, 2015.
[Bedin Franca et al. 2012] Ricardo Bedin Franca, Sandrine Blazy, Denis Favre-Felix, Xavier
Leroy, Marc Pantel, and Jean Souyris. Formally verified optimizing compilation in ACG-based
flight control software. In ERTS 2012: Embedded Real Time Software and Systems, 2012.
[Eide and Regehr 2008] Eric Eide and John Regehr. Volatiles are miscompiled, and what to do
about it. In EMSOFT ’08, pages 255–264. ACM, 2008.
[Ferdinand and Heckmann 2004] Christian Ferdinand and Reinhold Heckmann. aiT: Worst-Case
Execution Time Prediction by Static Program Analysis. In René Jacquart, editor, Building
the Information Society. IFIP 18th World Computer Congress, pages 377–384. Kluwer, 2004.
[George and Appel 1996] Lal George and Andrew W. Appel. Iterated register coalescing. ACM
Trans. Prog. Lang. Syst., 18(3):300–324, 1996.
[ISO 1999] ISO. International standard ISO/IEC 9899:1999, Programming languages – C, 1999.
[Jourdan et al. 2012] Jacques-Henri Jourdan, François Pottier, and Xavier Leroy. Validating
LR(1) parsers. In ESOP 2012: 21st European Symposium on Programming, volume 7211 of
LNCS, pages 397–416. Springer, 2012.
[Kang et al. 2016] Jeehoon Kang, Yoonseung Kim, Chung-Kil Hur, Derek Dreyer, and Viktor
Vafeiadis. Lightweight verification of separate compilation. In POPL 2016: 43rd symposium
on Principles of Programming Languages, pages 178–190. ACM, 2016.
[McCarthy and Painter 1967] John McCarthy and James Painter. Correctness of a compiler for
arithmetic expressions. In Mathematical Aspects of Computer Science, volume 19, pages 33–
41, 1967.
[Motor Industry Software Reliability Association 2004] Motor Industry Software Reliability Association. MISRA-C: 2004 – Guidelines for the use of the C language in critical systems, 2004.
[Neis et al. 2015] Georg Neis, Chung-Kil Hur, Jan-Oliver Kaiser, Craig McLaughlin, Derek
Dreyer, and Viktor Vafeiadis. Pilsner: a compositionally verified compiler for a higher-order
imperative language. In ICFP 2015: 20th International Conference on Functional Programming, pages 166–178. ACM, 2015.
[NULLSTONE Corporation 2007] NULLSTONE Corporation. NULLSTONE for C. http://www.nullstone.com/htmls/ns-c.htm, 2007.
[Pnueli et al. 1998] Amir Pnueli, Michael Siegel, and Eli Singerman. Translation validation. In
TACAS’98: Tools and Algorithms for Construction and Analysis of Systems, volume 1384 of
LNCS, pages 151–166. Springer, 1998.
[Stewart et al. 2015] Gordon Stewart, Lennart Beringer, Santiago Cuellar, and Andrew W. Appel.
Compositional CompCert. In POPL 2015: 42nd symposium on Principles of Programming
Languages, pages 275–287. ACM, 2015.
[Yang et al. 2011] Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. Finding and understanding bugs in C compilers. In PLDI ’11, pages 283–294. ACM, 2011.