The goal of the project is to implement a small static analyzer by abstract interpretation for a simple language.
The analyzer will be based on the same numeric abstract domains as the ones seen in the course and lab sessions. But, it will compute the abstract semantics using a different iteration method. In the project, the program is first converted into a control-flow graph by a front-end. Then, abstract values corresponding to sets of possible memory environments are attached to graph nodes (program locations) and propagated along the graph arcs (program instructions) until stabilization. This makes it easy to support non-structured control-flow (such as gotos) as well as inter-procedural analysis.
The analyzer comprises three parts:
See also the presentation slides for the project. (available on moodle)
The project must be delivered before Sunday, June 4th, 18h00.
You can work alone or in group of 2. Please, favor the latter.
The project must be delivered by email to the teaching assistant, in the form of a compressed archive (with
This directory must contains the source files of your analyzer. Typing
If you use another programming langage (than OCaml), please write a short
The analyzer itself is expected to be a stand-alone program that takes as argument a source file containing a
Since the soundness and precision of your analyzer will be tested automatically, make sure that possible assertions failures are reported in the following format:
The directory should also contains:
Language and front-end
The language syntax is a simple subset of C, a description of the syntax is available.
The sources of the front-end are available on the moodle.
The front-end works as follows:
Please read the description of the control-flow graph data-structure. The data-structure is defined in the file
You must implement an iterator, able to traverse the control-flow graph and compute an abstract information at each node. Note that you don’t need to support procedure calls as a core feature: this is one of several possible extensions. However, you must support arbitrary gotos, including backward gotos (which can be used to disguise loops).
The iterator should be generic in the choice of the abstract domain: i.e., a functor parameterized by a module obeying the DOMAIN signature discussed below.
Make sure that your iterator always terminates (employing widening if needed).
We suggest employing a classic worklist algorithm, which maintains a map from nodes to abstract values as well as a set of nodes to update. At each step, a node is extracted from the worklist and updated. The update consists in:
Other iteration algorithms exist, in particular those proposed by François Bourdoncle.
In case of loops or backward gotos, the control-flow graph will have cycles, causing the same nodes to be considered many times. In order for the algorithm to finish in finite time and be efficient in practice, you will need to apply widening at selected widening points to enforce convergence. It is sufficient that any cycle in the control-flow graph has at least one node where widening is applied. You can for instance select as widening nodes all loop heads and the destination of backward gotos.
You must implement at least two numeric abstract domains seen in the course:
Calling the analyzer with the
In order to test your iterator before you design your abstract domains, you can start by implementing a concrete domain first as we did in the practical sessions, i.e., a domain manipulating sets of program environments without any abstraction. Note that, in this case, the analyzer may not always terminate. The concrete interpretation is optional.
We suggest that you first program abstract domains able to abstract sets of integers (e.g., a single interval), following the signature
In case of more complex expressions, featuring arithmetic operators, such as
As the guards are more complex, you are only required to support
As hinted above, the implementation of a
You must implement at least one of the following extensions.
This extension should be performed when a
The analysis considered in part 2 is forward: given the memory environment at the beginning of the program, an abstraction of the environments reachable during program execution is computed. The analysis outputs a map from graph nodes to abstract invariants. A backward analysis starts form this map, and considers some target location in the middle of the program and a target abstract environment. It then traces backward the program execution from the target location to refine the result of the forward analysis by only considering executions that will reach the target location with the target environment.
In this extension, we target the environments that do not satisfy an assertion. The analysis will thus help discovering which program executions cause the assertion to fail. You should provide examples illustrating how your backward analysis achieves this.
Building a backward analysis requires two changes with respect to a forward analysis:
This extension consists in implementing the support for function calls.
To simplify, you can assume that there are not recursive calls. Hence, at any point of the execution, there exists at most a single instance of each local variable and formal function argument (as opposed to a stack of such variables, required to implement recursivity). This is compatible with the way programs are translated into graphs in the front-end: all variables and function arguments are considered as global variables. Supporting recursivity is more complex and requires some changes to the front-end.
You should provide a few examples and discuss the results of your analysis on these examples.
We suggest implementing a simple inter-procedural analysis where abstract environments flow from call sites (source nodes of a call instruction arc) to the entry node of the called function, and back from the exit node of the called function to the return site (destination nodes of a call instruction arc).
Calling the analyzer with
This extension requires you to implement an analysis using the polyhedra abstract domain. You will use the Apron library that already implements all
the polyhedral abstract operators, and use to implement a module obeying the
Relational analyses such as polyhedra are especially useful in the presence of loops, where a relational invariant must be found (which is not possible with non-relational domains such as intervals). You should provide a few examples illustrating this point and discuss the results of your analysis on these examples, comparing in particular the polyhedral and interval analyses.
This extension requires you to implement an abstract domain functor able to represent disjunctions of abstract elements of a base domain, for instance, associate several intervals to a variable. This is especially useful to avoid loosing precision at control-flow joins, and to represent non-convex invariants. We will see in the course several ways to design a disjunctive domain, and you can choose whichever way you wish (disjunctive completion, state partitioning, trace partitioning).
After discussing it with the teacher !
Resources and Bibliography
On the HC4-revise algorithm:
On backwards analysis:
On polyhedral analyses and Apron:
On disjunctive analyses: