Xavier Leroy is professor of Software Sciences at Collège de France and a member of the Cambium research team of Inria Paris.
The correct compilation of block diagram languages like Lustre, Scade, and a discrete subset of Simulink is important since they are used to program critical embedded control software. We describe the specification and verification in an Interactive Theorem Prover of a compilation chain that treats the key aspects of Lustre: sampling, nodes, and delays. Building on CompCert, we show that repeated execution of the generated assembly code faithfully implements the dataflow semantics of source programs. We resolve two key technical challenges. The first is the change from a synchronous dataflow semantics, where programs manipulate streams of values, to an imperative one, where computations manipulate memory sequentially. The second is the verified compilation of an imperative language with encapsulated state to C code where the state is realized by nested records. We also treat a standard control optimization that eliminates unnecessary conditional statements.
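A rough OCaml sketch (not taken from the paper) of the first challenge, the move from stream semantics to sequential imperative code: a dataflow node with a delay becomes a step function acting on an explicit state record.

    (* A counter node with a delayed value, compiled by hand into a step function
       over a mutable state record; names and encoding are illustrative only. *)
    type counter_state = { mutable pre_n : int }

    let counter_step (st : counter_state) (reset : bool) : int =
      let n = if reset then 0 else st.pre_n + 1 in
      st.pre_n <- n;   (* store the value that the delay will read at the next cycle *)
      n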
The paper addresses theoretical and practical aspects of implementing multi-stage languages using abstract syntax trees (ASTs), gensym, and reflection. We present an operational account of the correctness of this approach, and report on our experience with a bytecode compiler called MetaOCaml that is based on this strategy. Current performance measurements reveal interesting characteristics of the underlying OCaml compiler, and illustrate why this strategy can be particularly useful for implementing domain-specific languages in a typed, functional setting.
Preliminary version of \cite{Pessaux-Leroy-exn}.
This paper presents a new approach to the polymorphic typing of data accepting in-place modification in ML-like languages. This approach is based on restrictions over type generalization, and a refined typing of functions. The type system given here leads to a better integration of imperative programming style with the purely applicative kernel of ML. In particular, generic functions that allocate mutable data can safely be given fully polymorphic types. We show the soundness of this type system, and give a type reconstruction algorithm.
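A minimal OCaml illustration of the tension the type system resolves (not the paper's own system): a mutable cell cannot be naively generalized, yet a function that only uses mutable state internally should keep a fully polymorphic type.

    (* Naive generalization of [r] to ['a list ref] would be unsound, so it must
       stay monomorphic here: *)
    let r : int list ref = ref []
    let () = r := [1]

    (* A generic function that allocates mutable data internally can still be
       given the fully polymorphic type ['a -> 'a]: *)
    let id_via_ref : 'a -> 'a = fun x -> let cell = ref x in !cell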
We compare the efficiency of type-based unboxing strategies with that of simpler, untyped unboxing optimizations, building on our practical experience with the Gallium and Objective Caml compilers. We find the untyped optimizations to perform as well in the best case and significantly better in the worst case.
This paper presents a program transformation that allows languages with polymorphic typing (e.g. ML) to be implemented with unboxed, multi-word data representations, more efficient than the conventional boxed representations. The transformation introduces coercions between various representations, based on a typing derivation. A prototype ML compiler utilizing this transformation demonstrates important speedups.
Preliminary version of \cite{Leroy-Rouaix-99}.
Short version of \cite{Leroy-repres}.
This article investigates an ML-like language with by-name semantics for polymorphism: polymorphic objects are not evaluated once for all at generalization time, but re-evaluated at each specialization. Unlike the standard ML semantics, the by-name semantics works well with polymorphic references and polymorphic continuations: the naive typing rules for references and for continuations are sound with respect to this semantics. Polymorphism by name leads to a better integration of these imperative features into the ML type discipline. Practical experience shows that it retains most of the efficiency and predictability of polymorphism by value.
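A small OCaml sketch of the by-name idea, simulated with a thunk (an illustration under an assumed encoding, not the article's semantics itself): re-evaluating the polymorphic object at each specialization makes polymorphic references unproblematic.

    (* By value, [ref []] is evaluated once and cannot be fully polymorphic.
       Re-evaluating it at each specialization, made explicit here by a thunk,
       yields a fresh, independently typed reference every time. *)
    let make_empty : unit -> 'a list ref = fun () -> ref []
    let ints    : int list ref    = make_empty ()
    let strings : string list ref = make_empty ()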
Preliminary version of \cite{Leroy-Mauny-dynamics}.
The ML module system provides powerful parameterization facilities, but lacks the ability to split mutually recursive definitions across modules, and does not provide enough facilities for incremental programming. A promising approach to solving these issues is Ancona and Zucca's mixin modules calculus CMS. However, the straightforward way to adapt it to ML fails, because it allows arbitrary recursive definitions to appear at any time, which ML does not support. In this paper, we enrich CMS with a refined type system that controls recursive definitions through the use of dependency graphs. We then develop a separate compilation scheme, directed by dependency graphs, that translates mixin modules down to a CBV lambda-calculus extended with a non-standard let rec construct.
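For illustration, the restriction in question as it surfaces in OCaml today (a sketch, not the paper's calculus): recursive definitions of functions are accepted, but a definition whose value is needed during its own evaluation is rejected. The paper's dependency graphs track exactly this distinction across module boundaries.

    (* Accepted: the right-hand sides are functions, so no value is forced early. *)
    let rec even n = n = 0 || odd (n - 1)
    and odd n = n <> 0 && even (n - 1)

    (* Rejected by ML-style languages: the value of [x] is needed to compute [x].
       let rec x = x + 1 *)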
Preliminary version of \cite{Leroy-oncard-verifier-journal}.
Preliminary version of \cite{Leroy-bytecode-verification-03}.
This paper presents a variant of the SML module system that introduces a strict distinction between abstract types and manifest types (types whose definitions are part of the module specification), while retaining most of the expressive power of the SML module system.  The resulting module system provides much better support for separate compilation.
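A small OCaml signature illustrating the distinction (hypothetical names, not from the paper):

    module type S = sig
      type abstract                  (* abstract: its definition is hidden from clients *)
      type manifest = int * string   (* manifest: its definition is part of the specification *)
      val make : int -> string -> manifest
      val hide : manifest -> abstract
    end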
A short survey of the uses of types in compilation.
We present a variant of the Standard ML module system where parameterized abstract types (i.e. functors returning generative types) map provably equal arguments to compatible abstract types, instead of generating distinct types at each application as in Standard ML. This extension solves the full transparency problem (how to give syntactic signatures for higher-order functors that express exactly their propagation of type equations), and also provides better support for non-closed code fragments.
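OCaml's functors later adopted this applicative behaviour; as a present-day illustration (not the paper's formal system), two applications of Set.Make to the same argument yield compatible abstract types:

    module S1 = Set.Make (String)
    module S2 = Set.Make (String)
    let empty : S1.t = S2.empty   (* accepted: equal arguments give compatible abstract types *)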
Motivated by applications to proof assistants based on dependent types, we develop and prove correct a strong reducer and $\beta$-equivalence checker for the $\lambda$-calculus with products, sums, and guarded fixpoints.  Our approach is based on compilation to the bytecode of an abstract machine performing weak reductions on non-closed terms, derived with minimal modifications from the ZAM machine used in the Objective Caml bytecode interpreter, and complemented by a recursive ``read back'' procedure.  An implementation in the Coq proof assistant demonstrates important speed-ups compared with the original interpreter-based implementation of strong reduction in Coq.
This paper presents the design and implementation of a ``quasi real-time'' garbage collector for Concurrent Caml Light, an implementation of ML with threads. This two-generation system combines a fast, asynchronous copying collector on the young generation with a non-disruptive concurrent marking collector on the old generation.  This design crucially relies on the ML compile-time distinction between mutable and immutable objects.
This paper reports on skeleton-based parallel programming in the context of  the Caml functional language.  An experimental implementation, based on TCP sockets and marshaling of function closures, is described and assessed.
We investigate the use of the dot notation in the context of abstract types.  The dot notation -- that is, a.f referring to the operation f provided by the abstraction a -- is used by programming languages such as Modula-2 and CLU. We compare this notation with the Mitchell-Plotkin approach, which draws a parallel between type abstraction and (weak) existential quantification in constructive logic. The basic operations on existentials coming from logic give new insights about the meaning of type abstraction, but differ completely from the more familiar dot notation. In this paper, we formalize simple calculi equipped with the dot notation, and relate them to a more classical calculus a la Mitchell and Plotkin. This work provides some theoretical foundations for the dot notation, and suggests some useful extensions.
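A minimal OCaml rendering of the dot notation over an abstract type (an illustration, not one of the paper's calculi):

    module Counter : sig
      type t
      val zero : t
      val incr : t -> t
      val to_int : t -> int
    end = struct
      type t = int
      let zero = 0
      let incr n = n + 1
      let to_int n = n
    end

    (* a.f: operations are reached through the abstraction that provides them *)
    let two = Counter.to_int (Counter.incr (Counter.incr Counter.zero))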
In this article we study the relationship between remote procedure call (RPC) and languages with static typing and type abstraction. In particular, we show how to exploit type information in order to reduce the time needed to transmit data across the network. To this end, we develop a simple formalization that describes the automatic generation of efficient communication interfaces. We conclude our study with a correctness proof showing the equivalence between local and distributed evaluation of any program. (In Spanish.)
A short survey on language-based computer security. Extended abstract of an invited lecture.
This paper formalizes and proves correct a compilation scheme for mutually-recursive definitions in call-by-value functional languages. This scheme supports a wider range of recursive definitions than standard call-by-value recursive definitions. We formalize our technique as a translation scheme to a lambda-calculus featuring in-place update of memory blocks, and prove the translation to be faithful.
Mixin modules are a framework for modular programming that supports code parameterization, incremental programming via late binding and redefinitions, and cross-module recursion.  In this paper, we develop a language of mixin modules that supports call-by-value evaluation, and formalize a reduction semantics and a sound type system for this language.
This paper reports on the correctness proof of compiler optimizations based on data-flow analysis.  We formulate the optimizations and analyses as instances of a general framework for data-flow analyses and transformations, and prove that the optimizations preserve the behavior of the compiled programs.  This development is a part of a larger effort of certifying an optimizing compiler by proving semantic equivalence between source and compiled code.
The open-source software community now comprises a very large and growing number of contributors and users. The GNU/Linux operating system, for instance, has an estimated 18 million users worldwide, and its contributing developers number in the thousands. This critical mass of contributors taking part in various open-source projects has helped ensure high quality for open-source software. However, despite the achievements of the open-source software industry, there are issues in the production of large-scale open-source software (OSS) such as the GNU/Linux operating system that have to be addressed as the numbers of users, contributors, and available applications grow. EDOS is a European project supported by IST, started in October 2004 and ending in 2007, whose objective is to provide a new generation of methodologies, theoretical models, technical tools and quality models specifically tailored to OSS engineering and to software distribution over the Internet.
This paper presents a formal verification with the Coq proof assistant of a memory model for C-like imperative languages. This model defines the memory layout and the operations that manage the memory. The model has been specified at two levels of abstraction and implemented as part of an ongoing certification in Coq of a moderately-optimising C compiler. Many properties of the memory have been verified in the specification. They facilitate the definition of precise formal semantics of C pointers. A certified OCaml code implementing the memory model has been automatically extracted from the specifications.
This paper reports on the development and formal certification (proof of semantic preservation) of a compiler from Cminor (a C-like imperative language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness.  Such a certified compiler is useful in the context of formal methods applied to the certification of critical software: the certification of the compiler guarantees that the safety properties proved on the source code hold for the executable compiled code as well.
This paper illustrates the use of co-inductive definitions and proofs in big-step operational semantics, enabling the latter to describe diverging evaluations in addition to terminating evaluations. We show applications to proofs of type soundness and to proofs of semantic preservation for compilers.  (See http://gallium.inria.fr/~xleroy/publi/coindsem/ for the Coq on-machine formalization of these results.)
In the mainstream adoption of free and open source software (FOSS), distribution editors play a crucial role: they package, integrate and distribute a wide variety of software, written in a variety of languages, for a variety of purposes of unprecedented breadth. Ensuring the quality of a FOSS distribution is a technical and engineering challenge, owing to the size and complexity of these distributions (tens of thousands of software packages). A number of original topics for research arise from this challenge. This paper is a gentle introduction to this new research area, and strives to clearly and formally identify many of the desirable properties that must be enjoyed by these distributions to ensure an acceptable quality level.
Short version of \cite{Appel-Leroy-listmachine-tr}.
This paper presents the formal verification of a compiler front-end that translates a subset of the C language into the Cminor intermediate language.  The semantics of the source and target languages as well as the translation between them have been written in the specification language of the Coq proof assistant. The proof of observational semantic equivalence between the source and generated code has been machine-checked using Coq.  An executable compiler was obtained by automatic extraction of executable Caml code from the Coq specification of the translator, combined with a certified compiler back-end generating PowerPC assembly code from Cminor, described in previous work.
The widespread adoption of Free and Open Source Software (FOSS) in many strategic contexts of the information technology society has drawn attention to the issue of how to handle the complexity of assembling and managing a huge number of (packaged) components in a consistent and effective way. FOSS distributions (and in particular GNU/Linux-based ones) have always provided tools for installing, removing and upgrading the (packaged) components they are made of. While these tools provide a (not always effective) way to handle these tasks on the client side, there is still a lack of tools that could help distribution editors maintain, on the server side, large and high-quality distributions. In this paper we present research whose main goal is to fill this gap: we describe our approach, the tools we have developed, and their application, with experimental results. Our contribution provides an effective and automatic way to support distribution editors in handling issues that were, until now, mostly addressed with ad-hoc tools and manual techniques.
Transformation to continuation-passing style (CPS) is often performed by optimizing compilers for functional programming languages.  As part of the development and proof of correctness of a compiler for the mini-ML functional language, we have mechanically verified the correctness of two CPS transformations for a call-by-value $\lambda$-calculus with $n$-ary functions, recursive functions, data types and pattern-matching.  The transformations generalize Plotkin's original call-by-value transformation and Danvy and Nielsen's optimized transformation, respectively.  We used the Coq proof assistant to formalize the transformations and conduct and check the proofs.  Originalities of this work include the use of big-step operational semantics to avoid difficulties with administrative redexes, and of two-sorted de Bruijn indices to avoid difficulties with $\alpha$-conversion.
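As a reminder of what the verified transformation does, here is a hand-written call-by-value CPS example in OCaml (illustrative only; the paper works on mini-ML with a mechanized translation):

    let rec fact n = if n = 0 then 1 else n * fact (n - 1)

    (* After CPS transformation, every function takes an extra continuation [k]: *)
    let rec fact_cps n k =
      if n = 0 then k 1
      else fact_cps (n - 1) (fun r -> k (n * r))

    let _ = fact_cps 5 (fun r -> r)   (* 120 *)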
Translation validation consists of transforming a program and a posteriori validating it in order to detect a modification of its semantics. This approach can be used in a verified compiler, provided that validation is formally proved to be correct. We present two such validators and their Coq proofs of correctness.  The validators are designed for two instruction scheduling optimizations: list scheduling and trace scheduling.
Translation validation establishes a posteriori the correctness of a run of a compilation pass or other program transformation. In this paper, we develop an efficient translation validation algorithm for the Lazy Code Motion (LCM) optimization. LCM is an interesting challenge for validation because it is a global optimization that moves code across loops. Consequently, care must be taken not to move computations that may fail before loops that may not terminate. Our validator includes a specific check for anticipability to rule out such incorrect moves. We present a mechanically-checked proof of correctness of the validation algorithm, using the Coq proof assistant. Combining our validator with an unverified implementation of LCM, we obtain an LCM pass that is provably semantics-preserving and was integrated in the CompCert formally verified compiler.
Software pipelining is a loop optimization that overlaps the execution of several iterations of a loop to expose more instruction-level parallelism. It can deliver first-class performance characteristics, but at the cost of significant obfuscation of the code, making this optimization difficult to test and debug. In this paper, we present a translation validation algorithm that uses symbolic evaluation to detect semantic discrepancies between a loop and its pipelined version. Our algorithm can be implemented simply and efficiently, is provably sound, and appears to be complete with respect to most modulo scheduling algorithms. A conclusion of this case study is that it is possible and effective to use symbolic evaluation to reason about loop transformations.
Following the translation validation approach to high-assurance compilation, we describe a new algorithm for validating {\em a posteriori} the results of a run of register allocation.  The algorithm is based on backward dataflow inference of equations between variables, registers and stack locations, and can cope with sophisticated forms of spilling and live range splitting, as well as many forms of architectural irregularities such as overlapping registers.  The soundness of the algorithm was mechanically proved using the Coq proof assistant.
Object layout --- the concrete in-memory representation of objects --- raises many delicate issues in the case of the C++ language, owing in particular to multiple inheritance, C compatibility and separate compilation. This paper formalizes a family of C++ object layout schemes and mechanically proves their correctness against the operational semantics for multiple inheritance of Wasserrab {\em et al}. This formalization is flexible enough to account for space-saving techniques such as empty base class optimization and tail-padding optimization. As an application, we obtain the first formal correctness proofs for realistic, optimized object layout algorithms, including one based on the popular GNU C++ application binary interface. This work provides semantic foundations to discover and justify new layout optimizations; it is also a first step towards the verification of a C++ compiler front-end.
The formal verification of programs has progressed tremendously in the last decade. Principled but once academic approaches such as Hoare logic and abstract interpretation finally gave birth to quality verification tools, operating over source code (and not just idealized models thereof) and able to verify complex real-world applications. In this talk, I review some of the obstacles that remain to be lifted before source-level verification tools can be taken really seriously in the critical software industry: not just as sophisticated bug-finders, but as elements of absolute confidence in the correctness of a critical application.
This work presents a preliminary evaluation of the use of the CompCert formally specified and verified  optimizing compiler for the development of level A critical flight control software. First, the motivation for choosing CompCert is presented, as well as the requirements and constraints for safety-critical avionics software. The main point is to allow optimized code generation by relying on the formal proof of correctness instead of the current un-optimized generation required to produce assembly code structurally similar to the algorithmic language (and even the initial models) source code. The evaluation of its performance (measured using WCET) is presented and the results are compared to those obtained with the currently used compiler. Finally, the paper discusses verification and certification issues that are raised when one seeks to use CompCert for the development of such critical software.
We present a formal operational semantics and its Coq mechanization for the C++ object model, featuring object construction and destruction, shared and repeated multiple inheritance, and virtual function call dispatch.  These are key C++ language features for high-level system programming, in particular for predictable and reliable resource management.  This paper is the first to present a formal mechanized account of the metatheory of construction and destruction in C++, and applications to popular programming techniques such as ``resource acquisition is initialization.''  We also report on irregularities and apparent contradictions in the ISO C++03 and C++11 standards.
This work presents an evaluation of the CompCert formally specified and verified optimizing compiler for the development of DO-178 level A flight control software. First, some fundamental characteristics of flight control software are presented and the case study program is described. Then, the use of CompCert is justified: its main point is to allow optimized code generation by relying on the formal proof of correctness and additional compilation information instead of the current un-optimized generation required to produce predictable assembly code patterns. The evaluation of its performance (measured using WCET and code size) is presented and the results are compared to those obtained with the currently used compiler.
An LR(1) parser is a finite-state automaton, equipped with a stack, which uses a combination of its current state and one lookahead symbol in order to determine which action to perform next.  We present a validator which, when applied to a context-free grammar G and an automaton A, checks that A and G agree.  This validation of the parser provides the correctness guarantees required by verified compilers and other high-assurance software that involves parsing.  The validation process is independent of which technique was used to construct A. The validator is implemented and proved correct using the Coq proof assistant. As an application, we build a formally-verified parser for the C99 language.
This paper reports on the formalization and proof of soundness, using the Coq proof assistant, of an alias analysis: a static analysis that approximates the flow of pointer values.  The alias analysis considered is of the points-to kind and is intraprocedural, flow-sensitive, field-sensitive, and untyped.  Its soundness proof follows the general style of abstract interpretation. The analysis is designed to fit in the CompCert C verified compiler, supporting future aggressive optimizations over memory accesses.
The formal verification of compilers and related programming tools depends crucially on the availability of appropriate mechanized semantics for the source, intermediate and target languages.  In this invited talk, I review various forms of operational semantics and their mechanization, based on my experience with the formal verification of the CompCert~C compiler.
Floating-point arithmetic is known to be tricky: roundings, formats, exceptional values. The IEEE-754 standard was a push towards straightening the field and made formal reasoning about floating-point computations easier and flourishing. Unfortunately, this is not sufficient to guarantee the final result of a program, as several other actors are involved: programming language, compiler, architecture.  The CompCert formally-verified compiler provides a solution to this problem: this compiler comes with a mathematical specification of the semantics of its source language (a large subset of ISO C90) and target platforms (ARM, PowerPC, x86-SSE2), and with a proof that compilation preserves semantics.  In this paper, we report on our recent success in formally specifying and proving correct CompCert's compilation of floating-point arithmetic.  Since CompCert is verified using the Coq proof assistant, this effort required a suitable Coq formalization of the IEEE-754 standard; we extended the Flocq library for this purpose.  As a result, we obtain the first formally verified compiler that provably preserves the semantics of floating-point programs.
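A two-line OCaml reminder of why floating-point behaviour must be part of the verified semantics (a generic IEEE-754 fact, not an example from the paper):

    let r = 0.1 +. 0.2              (* 0.30000000000000004 in IEEE-754 binary64 *)
    let observable = (r = 0.3)      (* false: rounding is observable, so the compiler must preserve it *)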
We discuss the difference between a formal semantics of the C standard, and a formal semantics of an implementation of C that satisfies the C standard. In this context we extend the CompCert semantics with end-of-array pointers and the possibility to byte-wise copy objects. This is a first and necessary step towards proving that the CompCert semantics refines the formal version of the C standard that is being developed in the Formalin project in Nijmegen.
This paper reports on the design and soundness proof, using the Coq proof assistant, of Verasco, a static analyzer based on abstract interpretation for most of the ISO~C~1999 language (excluding recursion and dynamic allocation).  Verasco establishes the absence of run-time errors in the analyzed programs.  It enjoys a modular architecture that supports the extensible combination of multiple abstract domains, both relational and non-relational.  Verasco integrates with the CompCert formally-verified C~compiler so that not only the soundness of the analysis results is guaranteed with mathematical certitude, but also the fact that these guarantees carry over to the compiled code.


The polyhedral model is a high-level intermediate representation for loop nests that supports elegantly a great many loop optimizations. In a compiler, after polyhedral loop optimizations have been performed, it is necessary and difficult to regenerate sequential or parallel loop nests before continuing compilation. This paper reports on the formalization and proof of semantic preservation of such a code generator that produces sequential code from a polyhedral representation. The formalization and proofs are mechanized using the Coq proof assistant.
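As a concrete illustration of the kind of loop-nest transformation the polyhedral model expresses (loop interchange, shown here directly on OCaml loop nests; not code from the paper):

    let original a n m =
      for i = 0 to n - 1 do
        for j = 0 to m - 1 do
          a.(i).(j) <- a.(i).(j) +. 1.0
        done
      done

    (* The interchanged nest visits the same iteration space in a different order,
       which can improve memory locality: *)
    let interchanged a n m =
      for j = 0 to m - 1 do
        for i = 0 to n - 1 do
          a.(i).(j) <- a.(i).(j) +. 1.0
        done
      done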
Over 25 implementations of different functional languages are benchmarked using the same program, a floating-point intensive application taken from molecular biology. The principal aspects studied are compile time and execution time for the various implementations that were benchmarked. An important consideration is how the program can be modified and tuned to obtain maximal performance on each language implementation.
This paper presents a program analysis to estimate uncaught exceptions in ML programs.  This analysis relies on unification-based type inference in a non-standard type system, using rows to approximate both the flow of escaping exceptions (a la effect systems) and the flow of result values (a la control-flow analyses).  The resulting analysis is efficient and precise; in particular, arguments carried by exceptions are accurately handled.
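The kind of program the analysis targets, in OCaml (illustrative; the analysis itself is type-based and works on ML source):

    exception Not_positive of int

    let check x = if x > 0 then x else raise (Not_positive x)

    (* The analysis would report that [sum_checked] may let Not_positive escape,
       and track the integer argument that the exception carries. *)
    let sum_checked l = List.fold_left (fun acc x -> acc + check x) 0 l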
Objects with dynamic types allow the integration of operations that essentially require run-time type-checking into statically-typed languages. This article presents two extensions of the ML language with dynamics, based on our work on the CAML implementation of ML, and discusses their usefulness. The main novelty of this work is the combination of dynamics with polymorphism.
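A hand-rolled OCaml approximation of the idea (the article's dynamics are a built-in language feature with genuine run-time type representations; the two-constructor type below is only a sketch):

    type dyn = Dyn_int of int | Dyn_string of string   (* a value paired with its type *)

    let coerce_int = function
      | Dyn_int n -> n
      | Dyn_string _ -> failwith "run-time type mismatch"

    let stored = [ Dyn_int 1; Dyn_string "one" ]        (* a heterogeneous list of dynamics *)
    let first = coerce_int (List.hd stored)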
This article presents a novel approach to the problem of bytecode verification for Java Card applets.  By relying on prior off-card bytecode transformations, we simplify the bytecode verifier and reduce its memory requirements to the point where it can be embedded on a smart card, thus increasing significantly the security of post-issuance downloading of applets on Java Cards.  This article describes the on-card verification algorithm and the off-card code transformations, and evaluates experimentally their impact on applet code size.
A simple implementation of a SML-like module system is presented as a module parameterized by a base language and its type-checker. This demonstrates constructively the applicability of that module system to a wide range of programming languages. Full source code available in the Web appendix \url{http://gallium.inria.fr/~xleroy/publi/modular-modules-appendix/}.
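The shape of the idea in OCaml (identifiers are hypothetical, not the paper's actual interface): the module-system implementation is itself a functor over the base language and its type-checker.

    module type CORE_LANGUAGE = sig
      type term
      type typ
      val typecheck : term -> typ
    end

    module MakeModules (Core : CORE_LANGUAGE) = struct
      type definition = { name : string; body : Core.term }
      let check (d : definition) : string * Core.typ = (d.name, Core.typecheck d.body)
    end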
This paper presents a purely syntactic account of type generativity and sharing -- two key mechanisms in the Standard ML module system -- and shows its equivalence with the traditional stamp-based description of these mechanisms. This syntactic description recasts the Standard ML module system in a more abstract, type-theoretic framework.
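For readers more familiar with OCaml than Standard ML, generativity can be observed with a generative functor (a present-day illustration, not the paper's formalism): each application mints a fresh abstract type.

    module MakeToken () : sig type t val fresh : unit -> t end = struct
      type t = int
      let counter = ref 0
      let fresh () = incr counter; !counter
    end

    module Token1 = MakeToken ()
    module Token2 = MakeToken ()   (* Token1.t and Token2.t are incompatible types *)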
Bytecode verification is a crucial security component for Java applets, on the Web and on embedded devices such as smart cards. This paper reviews the various bytecode verification algorithms that have been proposed, recasts them in a common framework of dataflow analysis, and  surveys the use of proof assistants to specify bytecode verification and prove its correctness.  (Extended and revised version of \cite{Leroy-survey-verif}.)
The ML module system provides powerful parameterization facilities, but lacks the ability to split mutually recursive definitions across modules, and does not provide enough facilities for incremental programming. A promising approach to solving these issues is Ancona and Zucca's mixin modules calculus CMS. However, the straightforward way to adapt it to ML fails, because it allows arbitrary recursive definitions to appear at any time, which ML does not support. In this paper, we enrich CMS with a refined type system that controls recursive definitions through the use of dependency graphs. We then develop a separate compilation scheme, directed by dependency graphs, that translates mixin modules down to a CBV lambda-calculus extended with a non-standard let rec construct.
Using a call-by-value functional language as an example, this article illustrates the use of coinductive definitions and proofs in big-step operational semantics, enabling it to describe diverging evaluations in addition to terminating evaluations.  We formalize the connections between the coinductive big-step semantics and the standard small-step semantics, proving that both semantics are equivalent.  We then study the use of coinductive big-step semantics in proofs of type soundness and proofs of semantic preservation for compilers. A methodological originality of this paper is that all results have been proved using the Coq proof assistant.  We explain the proof-theoretic presentation of coinductive definitions and proofs offered by Coq, and show that it facilitates the discovery and the presentation of the results. (See \verb|http://gallium.inria.fr/~xleroy/coindsem/| for the Coq on-machine formalization of these results.)
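The flavour of the coinductive divergence judgement, shown for function application (notation approximate; the rules are read coinductively, so an infinite derivation is itself a proof of divergence):

    \frac{a_1 \Uparrow}{a_1\,a_2 \Uparrow}
    \qquad
    \frac{a_1 \Downarrow v_1 \quad a_2 \Uparrow}{a_1\,a_2 \Uparrow}
    \qquad
    \frac{a_1 \Downarrow \lambda x.\,b \quad a_2 \Downarrow v_2 \quad b[x \leftarrow v_2] \Uparrow}{a_1\,a_2 \Uparrow}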
This article describes the formal verification of a compilation algorithm that transforms parallel moves (parallel assignments between variables) into a semantically-equivalent sequence of elementary moves. Two different specifications of the algorithm are given: an inductive specification and a functional one, each with its correctness proofs. A functional program can then be extracted and integrated in the Compcert verified compiler.
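The standard difficulty, in a few lines of OCaml (illustrative): a cyclic parallel move such as (a, b) := (b, a) cannot be sequentialized naively as a := b; b := a; a temporary breaks the cycle, and handling all such cases uniformly is what the verified algorithm does.

    let swap (a : int ref) (b : int ref) =
      let tmp = !a in   (* temporary introduced to break the cycle *)
      a := !b;
      b := tmp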
This article presents the formal verification, using the Coq proof assistant, of a memory model for low-level imperative languages such as C and compiler intermediate languages.  Beyond giving semantics to pointer-based programs, this model supports reasoning over transformations of such programs.  We show how the properties of the memory model are used to prove semantic preservation for three passes of the Compcert verified compiler.
This article describes the development and formal verification (proof of semantic preservation) of a compiler back-end from Cminor (a simple imperative intermediate language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness.  Such a verified compiler is useful in the context of formal methods applied to the certification of critical software: the verification of the compiler guarantees that the safety properties proved on the source code hold for the executable compiled code as well.  (Much extended and revised version of \cite{Leroy-compcert-06}.)
This article presents the formal semantics of a large subset of the C language called Clight. Clight includes pointer arithmetic, struct and union types, C loops and structured switch statements. Clight is the source language of the CompCert verified compiler. The formal semantics of Clight is a big-step semantics equipped with traces of input/output events that observes both terminating and diverging executions.  The formal semantics of Clight is mechanized using the Coq proof assistant. In addition to the semantics of Clight, this article describes its integration in the CompCert verified compiler and several ways by which the semantics was validated.
This paper formalizes and proves correct a compilation scheme for mutually-recursive definitions in call-by-value functional languages.  This scheme supports a wider range of recursive definitions than previous methods.  We formalize our technique as a translation scheme to a lambda-calculus featuring in-place update of memory blocks, and prove the translation to be correct.
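A sketch of the in-place-update scheme, simulated in OCaml with option refs standing for the pre-allocated blocks (illustrative encoding, not the paper's target calculus): allocate dummies, evaluate the definitions, then patch.

    let even_slot = ref None
    let odd_slot = ref None

    let even_def n = n = 0 || (match !odd_slot with Some f -> f (n - 1) | None -> assert false)
    let odd_def n = n <> 0 && (match !even_slot with Some f -> f (n - 1) | None -> assert false)

    (* Backpatching: the dummy blocks are updated in place with the real closures. *)
    let () = even_slot := Some even_def; odd_slot := Some odd_def
    let _ = even_def 10   (* true, computed through the patched slots *)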
Function uncurrying is an important optimization for the efficient execution of functional programming languages. This optimization replaces curried functions by uncurried, multiple-argument functions, while preserving the ability to evaluate partial applications. First-order uncurrying (where curried functions are optimized only in the static scopes of their definitions) is well understood and implemented by many compilers, but its extension to higher-order functions (where uncurrying can also be performed on parameters and results of higher-order functions) is challenging. This article develops a generic framework that expresses higher-order uncurrying optimizations as type-directed insertion of coercions, and proves its correctness. The proof uses step-indexed logical relations and was entirely mechanized using the Coq proof assistant.
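A hand-written OCaml illustration of the worker/wrapper shape that uncurrying produces (names hypothetical; the article's transformation is type-directed and inserts such coercions automatically):

    let add x y = x + y                                 (* curried source function *)
    let add_worker (x, y) = x + y                       (* uncurried, multiple-argument worker *)
    let add_wrapper x = fun y -> add_worker (x, y)      (* coercion preserving partial application *)
    let inc = add_wrapper 1                             (* partial applications still work *)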
This paper reports on the development and formal verification (proof of semantic preservation) of CompCert, a compiler from Clight (a large subset of the C programming language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness.  Such a verified compiler is useful in the context of critical software and its formal verification: the verification of the compiler guarantees that the safety properties proved on the source code hold for the executable compiled code as well.
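Roughly, the semantic-preservation theorem has the following shape, where B ranges over observable behaviours (terminating, diverging, or going wrong); this is a paraphrase, not the exact Coq statement:

    \forall S\,C\,B,\;\;
      \mathit{compile}(S) = \mathsf{OK}(C) \;\wedge\; S \Downarrow B \;\wedge\; B \notin \mathsf{Wrong}
      \;\Longrightarrow\; C \Downarrow B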
We propose a benchmark to compare theorem-proving systems on their ability to express proofs of compiler correctness. In contrast to the first POPLmark, we emphasize the connection of proofs to compiler implementations, and we point out that much can be done without binders or alpha-conversion. We propose specific criteria for evaluating the utility of mechanized metatheory systems; we have constructed solutions in both Coq and Twelf metatheory, and we draw conclusions about those two systems in particular.
(In French.)  A short introduction to compiler verification, published in the French popular science magazine La Recherche.
Floating-point arithmetic is known to be tricky: roundings, formats, exceptional values. The IEEE-754 standard was a push towards straightening the field and made formal reasoning about floating-point computations easier and flourishing. Unfortunately, this is not sufficient to guarantee the final result of a program, as several other actors are involved: programming language, compiler, and architecture.  The CompCert formally-verified compiler provides a solution to this problem: this compiler comes with a mathematical specification of the semantics of its source language (a large subset of ISO C99) and target platforms (ARM, PowerPC, x86-SSE2), and with a proof that compilation preserves semantics.  In this paper, we report on our recent success in formally specifying and proving correct CompCert's compilation of floating-point arithmetic.  Since CompCert is verified using the Coq proof assistant, this effort required a suitable Coq formalization of the IEEE-754 standard; we extended the Flocq library for this purpose.  As a result, we obtain the first formally verified compiler that provably preserves the semantics of floating-point programs.
This paper formalizes the folklore result that strongly-typed applets are more secure than untyped ones.  We formulate and prove several security properties that all well-typed applets possess, and identify sufficient conditions for the applet execution environment to be safe, such as procedural encapsulation, type abstraction, and systematic type-based placement of run-time checks. These results are a first step towards formal techniques for developing and validating safe execution environments for applets.
The goal of this lecture is to show how modern theorem provers---in this case, the Coq proof assistant---can be used to mechanize the specification of programming languages and their semantics, and to reason over individual programs and over generic program transformations, as typically found in compilers.  The topics covered include: operational semantics (small-step, big-step, definitional interpreters); a simple form of denotational semantics; axiomatic semantics and Hoare logic; generation of verification conditions, with application to program proof; compilation to virtual machine code and its proof of correctness; an example of an optimizing program transformation (dead code elimination) and its proof of correctness.
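A taste of the first topic, in OCaml rather than Coq (illustrative): a definitional interpreter for a toy arithmetic language.

    type expr = Const of int | Add of expr * expr | Mul of expr * expr

    let rec eval = function
      | Const n -> n
      | Add (a, b) -> eval a + eval b
      | Mul (a, b) -> eval a * eval b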
This report details the design and implementation of the ZINC system. This is an experimental implementation of the ML language, which later evolved into the Caml Light system. This system is strongly oriented toward separate compilation and the production of small, standalone programs; type safety is ensured by a Modula-2-like module system. ZINC uses simple, portable techniques, such as bytecode interpretation; a sophisticated execution model helps counterbalance the interpretation overhead.
This report is an introductory course on Unix systems programming, with an emphasis on communication between processes. The main novelty of this work is the use of the Caml Light language, a dialect of ML, in place of the C language that is usually associated with systems programming. This gives new perspectives both on systems programming and on the ML language. (In French.)
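A first example in the spirit of the course, written in present-day OCaml (requires the unix library; not taken from the report itself):

    let () =
      match Unix.fork () with
      | 0 -> Unix.execvp "echo" [| "echo"; "hello from the child process" |]
      | pid -> ignore (Unix.waitpid [] pid)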
The polymorphic type discipline, as in the ML language, fits well within purely applicative languages, but does not extend naturally to the main feature of algorithmic languages: in-place update of data structures. Similar typing difficulties arise with other extensions of applicative languages: logical variables, communication channels, continuation handling. This work studies (in the setting of relational semantics) two new approaches to the polymorphic typing of these non-applicative features. The first one relies on a restriction of generalization over types (the notion of dangerous variables), and on a refined typing of functional values (closure typing). The resulting type system is compatible with the ML core language, and is the most expressive type system for ML with imperative features proposed so far. The second approach relies on switching to ``by-name'' semantics for the constructs of polymorphism, instead of the usual ``by-value'' semantics. The resulting language differs from ML, but lends itself easily to polymorphic typing. Both approaches smoothly integrate non-applicative features and polymorphic typing. (English translation of \cite{Leroy-these}.)
Languages with polymorphic types (e.g. ML) have traditionally been implemented using Lisp-like data representations---everything has to fit in one word, if necessary by being heap-allocated and handled through a pointer. The reason is that, in contrast with conventional statically-typed languages such as Pascal, it is not possible to assign one unique type to each expression at compile-time, an absolute requirement for using more efficient representations (e.g. unallocated multi-word values). In this paper, we show how to take advantage of static polymorphic typing to correctly mix two styles of data representation in the implementation of a polymorphic language: specialized, efficient representations are used when types are fully known at compile-time; uniform, Lisp-like representations are used otherwise.
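A present-day echo of this idea can be observed in OCaml, where a monomorphic float array is stored flat (unboxed floats) while polymorphic code goes through the uniform, pointer-sized representation; the dot and max_poly functions below are only illustrative:

    (* Monomorphic float code: the compiler works on a flat float array. *)
    let dot (a : float array) (b : float array) : float =
      let s = ref 0.0 in
      for i = 0 to Array.length a - 1 do
        s := !s +. a.(i) *. b.(i)
      done;
      !s

    (* Polymorphic code sees every value through the uniform representation. *)
    let max_poly x y = if x > y then x else y

    let () =
      assert (dot [| 1.0; 2.0 |] [| 3.0; 4.0 |] = 11.0);
      assert (max_poly 3 7 = 7)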
Extended version of \cite{Hirschowitz-Leroy-Wells-rec}.
Module systems are important for software engineering: they facilitate code reuse without compromising the correctness of programs. However, they still lack some flexibility: first, they do not allow mutually recursive definitions to span module boundaries; second, definitions inside modules are bound early and cannot be overridden later, in contrast with inheritance and overriding in class-based object-oriented languages, which follow a late-binding semantics. This paper examines an alternative, hybrid modularization concept called mixin modules. We develop a language of call-by-value mixin modules with a reduction semantics, and a sound type system for it, guaranteeing that programs will run correctly.
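The following OCaml fragment is not the mixin-module calculus of the paper; it merely illustrates, using OCaml's later recursive modules, the kind of cross-module mutual recursion that ordinary module systems disallow:

    (* Two modules whose definitions refer to each other.  Recursive
       modules require explicit signatures; the Even/Odd names are
       invented for the example. *)

    module rec Even : sig val check : int -> bool end = struct
      let check n = n = 0 || Odd.check (n - 1)
    end
    and Odd : sig val check : int -> bool end = struct
      let check n = n <> 0 && Even.check (n - 1)
    end

    let () = assert (Even.check 10 && Odd.check 7)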
This report gives an overview of the Caml Special Light system, an experimental implementation of the Caml language offering two major extensions: first, a module calculus (including functors and multiple views of the same module) in the style of Standard ML's, but building on recent advances in the theory of module typing and preserving compatibility with separate compilation; second, a dual compiler, producing both efficient native code, for compute-intensive Caml applications, and interpreted abstract code, for fast compilation and convenient debugging.
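For readers unfamiliar with these module-system features, here is a minimal illustration, in today's OCaml syntax, of functors and of multiple views of one module obtained through signature constraints; the ORDERED, MakeMax and IntOrd names are invented for the example:

    module type ORDERED = sig
      type t
      val compare : t -> t -> int
    end

    (* A functor: a module parameterized by another module. *)
    module MakeMax (O : ORDERED) = struct
      let max a b = if O.compare a b >= 0 then a else b
    end

    module IntOrd = struct
      type t = int
      let compare = Stdlib.compare
    end

    module IntMax = MakeMax (IntOrd)

    (* A second, more abstract view of IntOrd: the same module seen
       through a signature that hides the equality t = int. *)
    module AbstractIntOrd : ORDERED = IntOrd

    let () = assert (IntMax.max 3 7 = 7)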
Same technical contents as the conference paper \cite{Cardelli-Leroy-dot}, plus a drawing by Luca Cardelli.
We propose a benchmark to compare theorem-proving systems on their ability to express proofs of compiler correctness. In contrast to the first POPLmark, we emphasize the connection of proofs to compiler implementations, and we point out that much can be done without binders or alpha-conversion. We propose specific criteria for evaluating the utility of mechanized metatheory systems; we have constructed solutions in both Coq and Twelf metatheory, and we draw conclusions about those two systems in particular.
The POPLmark challenge is a collective experiment intended to assess the usability of theorem provers and proof assistants in the context of fundamental research on programming languages.  In this report, we present a solution to the challenge, developed with the Coq proof assistant, and using the ``locally nameless'' presentation of terms with binders introduced by McKinna, Pollack, Gordon, and McBride.
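The locally nameless representation itself is easy to transcribe as a datatype; the OCaml sketch below (the report's actual development is in Coq) represents bound variables as de Bruijn indices and free variables as names, together with the standard opening operation:

    (* Locally nameless terms: no alpha-conversion is ever needed. *)
    type term =
      | BVar of int            (* bound variable: de Bruijn index *)
      | FVar of string         (* free variable: a name *)
      | Abs of term            (* abstraction binds index 0 of its body *)
      | App of term * term

    (* Opening: replace bound index [k] with the free name [x]. *)
    let rec open_at k x t =
      match t with
      | BVar i -> if i = k then FVar x else t
      | FVar _ -> t
      | Abs body -> Abs (open_at (k + 1) x body)
      | App (t1, t2) -> App (open_at k x t1, open_at k x t2)

    let open_term x t = open_at 0 x t

    let () =
      (* Opening the body of (\. 0 1) with "y" yields (y 1). *)
      assert (open_term "y" (App (BVar 0, BVar 1)) = App (FVar "y", BVar 1))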
Static typing with polymorphic types, as in the ML language, fits purely applicative languages perfectly, giving them flexibility and expressiveness. But it does not extend naturally to the main feature of algorithmic languages: in-place modification of data structures.  Similar typing difficulties arise with other extensions of applicative languages: logical variables, inter-process communication over channels, and the manipulation of continuations as values. This work studies, in the setting of relational semantics, two new approaches to the polymorphic typing of these non-applicative constructs. The first relies on a restriction of the type generalization operation (the notion of dangerous variables) and on a finer typing of functional values (closure typing). The resulting type system remains compatible with the applicative kernel of ML, and turns out to be the most expressive among the type systems proposed so far for ML with imperative features. The second approach relies on adopting a ``by-name'' semantics for the constructs of polymorphism, instead of the usual ``by-value'' semantics. The resulting language departs from ML, but can be typed very simply with polymorphism. Both approaches allow non-applicative features and polymorphic typing to interact smoothly. (See \cite{Leroy-thesis} for an English translation.)
The same computer hardware can fulfill many different functions simply by changing the software it runs.  This extraordinary plasticity has allowed the computer to leave the computing center and spread everywhere, from everyday objects to the infrastructure of our societies.  What fundamental concepts underlie this technical feat?  How can we master the incredible and often frightening complexity of software?  How can we avoid programming ``bugs'' and resist attacks?  How can we establish that a piece of software is trustworthy?  Mathematical logic offers answers to these questions, making it possible to build a scientifically rigorous approach to software.
This is an English translation of Xavier Leroy's inaugural lecture at Collège de France in 2019.
This book is a simple, concrete introduction to programming in Caml.  A genuine programming course, it progressively introduces the mechanisms of the language and shows them at work on the fundamental problems of programming. Besides numerous introductory examples, the book details the design and implementation of six complete, realistic programs illustrating domains reputed to be difficult: compilation, type inference, automata, etc.
Written by two of the implementers of the Caml Light compiler, this book exhaustively describes all the constructs of the programming language and provides complete documentation of the Caml Light system.