Projet d'analyseur statique
Plan de la page Introduction Organization Language and front-end Expected work Iterator Abstract domains Extensions Backward analysis Inter-procedural analysis Polyhedral analysis Disjunctive analysis Other Idea Resources and Bibliography Front-end Software: On the HC4-revise algorithm: On backwards analysis: On polyhedral analyses and Apron: On disjunctive analyses:	Introduction The goal of the project is to implement a small static analyzer by abstract interpretation for a simple language. The analyzer will be based on the same numeric abstract domains as the ones seen in the course and lab sessions. But, it will compute the abstract semantics using a different iteration method. In the project, the program is first converted into a control-flow graph by a front-end. Then, abstract values corresponding to sets of possible memory environments are attached to graph nodes (program locations) and propagated along the graph arcs (program instructions) until stabilization. This makes it easy to support non-structured control-flow (such as gotos) as well as inter-procedural analysis. The analyzer comprises three parts: A front-end that parses the language and converts it into a control-flow graph. This part is given to you (see part 1 below). An iterator that propagates abstract elements along the graph arcs. You will have to program at least a simple forward iterator using a worklist algorithm (see parts 2 and 3 below). An abstract domain, the elements of which represent sets of memory environments (mapping program variables to program values, i.e., integers). We will see several kinds of abstract domains in the course, and you will have to program several of them: at least the constant and the interval domains (see below). See also the presentation slides for the project. (available on moodle) Organization The project must be delivered before Sunday, June 4th, 18h00. You can work alone or in group of 2. Please, favor the latter. The project must be delivered by email to the teaching assistant, in the form of a compressed archive (with `tar` or `zip`), called `yournames{.tgz, .zip}` which must contain a directory called `yournames` (example: `dupont-durand` ). This directory must contains the source files of your analyzer. Typing `make` should compile the static analyzer executable called `analyzer`. If you use another programming langage (than OCaml), please write a short `README` explaining which library & version is required and how to install them. Please note that the last point regarding the name of the executable applies! The analyzer itself is expected to be a stand-alone program that takes as argument a source file containing a `main` function, analyzes it (including the initialization code and the body of the main function), and outputs: an abstract invariant for every graph node; a list of assertions which failed, if any. Since the soundness and precision of your analyzer will be tested automatically, make sure that possible assertions failures are reported in the following format: `File "filename.c", line n: Assertion failure ...` The directory should also contains: A short report (~2 pages) discussing your analyzer, your implementation choices, your difficulties, and your experiments A few (5-6) analysis examples demonstrating the features of your analyzer, including the extensions (provide both the source and the analysis result for each example). Language and front-end The language syntax is a simple subset of C, a description of the syntax is available. The sources of the front-end are available on the moodle. The front-end works as follows: The language is parsed using the lexer defined in `lexer.mll` and the parser in `parse.mly`. The parser outputs an abstract syntax tree defined in abstract_syntax_tree.ml. The tree is then transformed into a control-flow graph by `tree_to_cfg.ml`. Control-flow graphs can be printed using cfg_printer.ml. The output is in the dot format that can be exploited with Graphviz (for instance, with the dotty dot graph viewer). The `analyzer.ml` file shows how to parse a file, translate it into a graph, and print the resulting graph. By typing make, you get a program that takes as argument a source file and that outputs information about the graph as well as a cfg.dot graph file. Please read the description of the control-flow graph data-structure. The data-structure is defined in the file `cfg.ml`. In the following, you will only need to manipulate control-flow graphs; hence, most of `abstract_syntax_tree.ml` can be ignored (only the definition of the operators is reused in `cfg.ml`). Expected work Iterator You must implement an iterator, able to traverse the control-flow graph and compute an abstract information at each node. Note that you don’t need to support procedure calls as a core feature: this is one of several possible extensions. However, you must support arbitrary gotos, including backward gotos (which can be used to disguise loops). The iterator should be generic in the choice of the abstract domain: i.e., a functor parameterized by a module obeying the DOMAIN signature discussed below. Make sure that your iterator always terminates (employing widening if needed). Hints We suggest employing a classic worklist algorithm, which maintains a map from nodes to abstract values as well as a set of nodes to update. At each step, a node is extracted from the worklist and updated. The update consists in: computing a new value for the node by applying the abstract instruction for each predecessor arc (i.e., each arc with the selected node as destination) and taking the abstract join of the results; if the node’s abstract value has changed, putting all the node’s successors into the worklist; the algorithm is finished when there is no more node in the worklist. Other iteration algorithms exist, in particular those proposed by François Bourdoncle. In case of loops or backward gotos, the control-flow graph will have cycles, causing the same nodes to be considered many times. In order for the algorithm to finish in finite time and be efficient in practice, you will need to apply widening at selected widening points to enforce convergence. It is sufficient that any cycle in the control-flow graph has at least one node where widening is applied. You can for instance select as widening nodes all loop heads and the destination of backward gotos. Abstract domains You must implement at least two numeric abstract domains seen in the course: The constant domain. The interval domain. Calling the analyzer with the `--domain constant` (resp. `--domain interval`) should perform the analysis on the constants abstract domain (resp. interval domain). Hints In order to test your iterator before you design your abstract domains, you can start by implementing a concrete domain first as we did in the practical sessions, i.e., a domain manipulating sets of program environments without any abstraction. Note that, in this case, the analyzer may not always terminate. The concrete interpretation is optional. The file `domain.ml` proposes a signature `DOMAIN` for abstract (or concrete) domains. In particular: objects of type `t` represent an abstraction of a (possibly empty) set of memory environments. `init` creates a representation for a single environment, mapping each variable to `0` (initial memory state of the program). `bottom` creates a representation for the empty set of environments. `assign` models the `CFG_assign` instruction (assignment of an integer expression into a variable). `guard` models the `CFG_guard` and `CFG_assert` instructions (filter environments by a boolean expression). `join` models the set-union of environments (used when several arcs go into the same node). `widen` is the widening, used to accelerate convergence in case of loops. `subset` is the inclusion checking, useful to know when the analysis is finished. We suggest that you first program abstract domains able to abstract sets of integers (e.g., a single interval), following the signature `VALUE_DOMAIN` in `value_domain.ml`. An abstract environment is then: either the bottom element (empty set of environments); or a map from variables to abstractions of non-empty integer sets (e.g., a map from variables to intervals). The `VALUE_DOMAIN` signature suggests defining operators unary and binary to evaluate, in the abstract, the effect of various numeric operators, such as addition, multiplication, etc. They can be directly used to implement assign required by DOMAIN. This is simply a generalization of interval arithmetic to arbitrary, non-necessarily interval, abstractions of sets of values. You are required to support all 5 operations `+`, `-`, ``, `/`, `%` in assignments in a precise way (i.e., don’t always return the top element). Modeling a `guard` is more difficult: given the expected result of the operator, such as the fact that `x<y` is true, we must use this information to refine the information we have on the variables `x` and `y`. This is the purpose of the `compare` operation in the signature `VALUE_DOMAIN`. More precisely, `compare x y op r` returns a pair `x',y'` of abstract environments that refine the arguments `x` and `y`: `x'` abstracts the subset of integer values `i` from `x` such that there exists a value `j` in `y` satisfying `i op j`; and likewise for `y'`. In case of more complex expressions, featuring arithmetic operators, such as `x+y<z`, once an abstract information on the value of `x+y` is deduced from the fact that it is smaller than `z`, the information must be propagated to derive information on `x` and `y`. This is the role of the `bwd_unary` and `bwd_binary` operators. For instance, similarly to `compare`, `bwd_binary x y op r` returns a pair of abstract environments `x',y'` that refine the argument `x, y`: `x'` abstracts the subset of values `i` from `x` such that, for some value `j` in `y`, `i op j` is in `r`; and likewise for `y'`. The full algorithm on an arbitrary expression works in two phases: it first annotates the expression tree by bottom-up evaluation (from leaves to root) using unary and binary; and then refines the values by top-down propagation (from root to leaves) using `bwd_unary` and `bwd_binary`. The algorithm is actually a standard constraint programming algorithm known as HC4-revise; it is described in this paper (see Algorithms 2, 3 and 4). As the guards are more complex, you are only required to support `+`, `-`, and `` in a precise way. For other operators, you can soundly set x’,y’ to x,y (i.e., no refinement). However, you are required to support all the boolean operators `!`, `&&`, `\|\|` in guards. As hinted above, the implementation of a `DOMAIN` for constants and for intervals can be derived in a generic way from that of a `VALUE_DOMAIN`. We thus suggest to implement a functor taking a `VALUE_DOMAIN` module as argument and returning a `VALUE` module, and program separately an instance of `VALUE_DOMAIN` implementing constants and another one representing intervals. Extensions You must implement at least one of the following extensions. Backward analysis This extension should be performed when a `--backward` flag is added to the command line. The analysis considered in part 2 is forward: given the memory environment at the beginning of the program, an abstraction of the environments reachable during program execution is computed. The analysis outputs a map from graph nodes to abstract invariants. A backward analysis starts form this map, and considers some target location in the middle of the program and a target abstract environment. It then traces backward the program execution from the target location to refine the result of the forward analysis by only considering executions that will reach the target location with the target environment. In this extension, we target the environments that do not satisfy an assertion. The analysis will thus help discovering which program executions cause the assertion to fail. You should provide examples illustrating how your backward analysis achieves this. Hints Building a backward analysis requires two changes with respect to a forward analysis: The iterator must be modified to propagate abstract information from the destination node of each arc to its source node. A backward version of assignments must be added to the `DOMAIN` signature and implemented in each abstract domain. More precisely, given abstract environments `x` before and `r` after the assignment, a variable `var` and an expression `expr`, `bwd_assign x var expr r` returns a refinement `x'` of `x` such that, after the assignment `var = expr`, the result is in `r`. It thus propagates any refinement from the destination node `r` backward to the source node `x`. This function can be implemented using the `bwd_unary` and `bwd_binary` operators from `VALUE_DOMAIN`, which were already used for guards. Note that the backward version of guards is identical to the forward version, so, the assignment is the only instruction that requires implementing a special backward version. Inter-procedural analysis This extension consists in implementing the support for function calls. To simplify, you can assume that there are not recursive calls. Hence, at any point of the execution, there exists at most a single instance of each local variable and formal function argument (as opposed to a stack of such variables, required to implement recursivity). This is compatible with the way programs are translated into graphs in the front-end: all variables and function arguments are considered as global variables. Supporting recursivity is more complex and requires some changes to the front-end. You should provide a few examples and discuss the results of your analysis on these examples. Hints We suggest implementing a simple inter-procedural analysis where abstract environments flow from call sites (source nodes of a call instruction arc) to the entry node of the called function, and back from the exit node of the called function to the return site (destination nodes of a call instruction arc). Polyhedral analysis Calling the analyzer with `--domain poly` should perform the polyhedral analysis. This extension requires you to implement an analysis using the polyhedra abstract domain. You will use the Apron library that already implements all the polyhedral abstract operators, and use to implement a module obeying the `DOMAIN` signature that can be plugged into your iterator (you are not asked to reimplement polyhedral operators yourself). Relational analyses such as polyhedra are especially useful in the presence of loops, where a relational invariant must be found (which is not possible with non-relational domains such as intervals). You should provide a few examples illustrating this point and discuss the results of your analysis on these examples, comparing in particular the polyhedral and interval analyses. Disjunctive analysis This extension requires you to implement an abstract domain functor able to represent disjunctions of abstract elements of a base domain, for instance, associate several intervals to a variable. This is especially useful to avoid loosing precision at control-flow joins, and to represent non-convex invariants. We will see in the course several ways to design a disjunctive domain, and you can choose whichever way you wish (disjunctive completion, state partitioning, trace partitioning). Other Idea After discussing it with the teacher ! Resources and Bibliography Front-end description of the langage syntax. Description of the control-flow graph data-structure. Software: Ocaml language. Extended maps included in the front-end. Apron library and its documentation. Zarith arbitrary precision number library. Graphviz graph visualization. On the HC4-revise algorithm: Frédéric Benhamou, Frédéric Goualard, Laurent Granvilliers. Revising hull and box consistency. On backwards analysis: François Bourdoncle. Abstract debugging of higher-order imperative languages. Xavier Rival. Understanding the origin of alarms in Astrée. On polyhedral analyses and Apron: Bertrand Jeannet and Antoine Miné. APRON: A library of numerical abstract domains for static analysis. On disjunctive analyses: Patrick Cousot and Radhia Cousot. Abstract interpretation frameworks (example 6.7). Xavier Rival and Laurent Mauborgne. The trace partitioning abstract domain. Page generée le 23-01-2025 à 16:48. TOP

Projet d'analyseur statique

Introduction

The goal of the project is to implement a small static analyzer by abstract interpretation for a simple language.

The analyzer will be based on the same numeric abstract domains as the ones seen in the course and lab sessions. But, it will compute the abstract semantics using a different iteration method. In the project, the program is first converted into a control-flow graph by a front-end. Then, abstract values corresponding to sets of possible memory environments are attached to graph nodes (program locations) and propagated along the graph arcs (program instructions) until stabilization. This makes it easy to support non-structured control-flow (such as gotos) as well as inter-procedural analysis.

The analyzer comprises three parts:

A front-end that parses the language and converts it into a control-flow graph. This part is given to you (see part 1 below).
An iterator that propagates abstract elements along the graph arcs. You will have to program at least a simple forward iterator using a worklist algorithm (see parts 2 and 3 below).
An abstract domain, the elements of which represent sets of memory environments (mapping program variables to program values, i.e., integers). We will see several kinds of abstract domains in the course, and you will have to program several of them: at least the constant and the interval domains (see below).

See also the presentation slides for the project. (available on moodle)

Organization

The project must be delivered before Sunday, June 4th, 18h00.

You can work alone or in group of 2. Please, favor the latter.

The project must be delivered by email to the teaching assistant, in the form of a compressed archive (with tar or zip), called yournames{.tgz, .zip} which must contain a directory called yournames (example: dupont-durand ).

This directory must contains the source files of your analyzer. Typing make should compile the static analyzer executable called analyzer.

If you use another programming langage (than OCaml), please write a short README explaining which library & version is required and how to install them. Please note that the last point regarding the name of the executable applies!

The analyzer itself is expected to be a stand-alone program that takes as argument a source file containing a main function, analyzes it (including the initialization code and the body of the main function), and outputs:

an abstract invariant for every graph node;
a list of assertions which failed, if any.

Since the soundness and precision of your analyzer will be tested automatically, make sure that possible assertions failures are reported in the following format:

File "filename.c", line n: Assertion failure ...

The directory should also contains:

A short report (~2 pages) discussing your analyzer, your implementation choices, your difficulties, and your experiments
A few (5-6) analysis examples demonstrating the features of your analyzer, including the extensions (provide both the source and the analysis result for each example).

Language and front-end

The language syntax is a simple subset of C, a description of the syntax is available.

The sources of the front-end are available on the moodle.

The front-end works as follows:

The language is parsed using the lexer defined in lexer.mll and the parser in parse.mly.
The parser outputs an abstract syntax tree defined in abstract_syntax_tree.ml.
The tree is then transformed into a control-flow graph by tree_to_cfg.ml.
Control-flow graphs can be printed using cfg_printer.ml. The output is in the dot format that can be exploited with Graphviz (for instance, with the dotty dot graph viewer).

The analyzer.ml file shows how to parse a file, translate it into a graph, and print the resulting graph. By typing make, you get a program that takes as argument a source file and that outputs information about the graph as well as a cfg.dot graph file.

Please read the description of the control-flow graph data-structure. The data-structure is defined in the file cfg.ml. In the following, you will only need to manipulate control-flow graphs; hence, most of abstract_syntax_tree.ml can be ignored (only the definition of the operators is reused in cfg.ml).

Expected work

Iterator

You must implement an iterator, able to traverse the control-flow graph and compute an abstract information at each node. Note that you don’t need to support procedure calls as a core feature: this is one of several possible extensions. However, you must support arbitrary gotos, including backward gotos (which can be used to disguise loops).

The iterator should be generic in the choice of the abstract domain: i.e., a functor parameterized by a module obeying the DOMAIN signature discussed below.

Make sure that your iterator always terminates (employing widening if needed).

Hints

We suggest employing a classic worklist algorithm, which maintains a map from nodes to abstract values as well as a set of nodes to update. At each step, a node is extracted from the worklist and updated. The update consists in:

computing a new value for the node by applying the abstract instruction for each predecessor arc (i.e., each arc with the selected node as destination) and taking the abstract join of the results;
if the node’s abstract value has changed, putting all the node’s successors into the worklist;
the algorithm is finished when there is no more node in the worklist.

Other iteration algorithms exist, in particular those proposed by François Bourdoncle.

In case of loops or backward gotos, the control-flow graph will have cycles, causing the same nodes to be considered many times. In order for the algorithm to finish in finite time and be efficient in practice, you will need to apply widening at selected widening points to enforce convergence. It is sufficient that any cycle in the control-flow graph has at least one node where widening is applied. You can for instance select as widening nodes all loop heads and the destination of backward gotos.

Abstract domains

You must implement at least two numeric abstract domains seen in the course:

The constant domain.
The interval domain.

Calling the analyzer with the --domain constant (resp. --domain interval) should perform the analysis on the constants abstract domain (resp. interval domain).

Hints

In order to test your iterator before you design your abstract domains, you can start by implementing a concrete domain first as we did in the practical sessions, i.e., a domain manipulating sets of program environments without any abstraction. Note that, in this case, the analyzer may not always terminate. The concrete interpretation is optional.

The file domain.ml proposes a signature DOMAIN for abstract (or concrete) domains. In particular:

objects of type t represent an abstraction of a (possibly empty) set of memory environments.
init creates a representation for a single environment, mapping each variable to 0 (initial memory state of the program).
bottom creates a representation for the empty set of environments.
assign models the CFG_assign instruction (assignment of an integer expression into a variable).
guard models the CFG_guard and CFG_assert instructions (filter environments by a boolean expression).
join models the set-union of environments (used when several arcs go into the same node).
widen is the widening, used to accelerate convergence in case of loops.
subset is the inclusion checking, useful to know when the analysis is finished.

We suggest that you first program abstract domains able to abstract sets of integers (e.g., a single interval), following the signature VALUE_DOMAIN in value_domain.ml. An abstract environment is then:

either the bottom element (empty set of environments);
or a map from variables to abstractions of non-empty integer sets (e.g., a map from variables to intervals).

The VALUE_DOMAIN signature suggests defining operators unary and binary to evaluate, in the abstract, the effect of various numeric operators, such as addition, multiplication, etc. They can be directly used to implement assign required by DOMAIN. This is simply a generalization of interval arithmetic to arbitrary, non-necessarily interval, abstractions of sets of values. You are required to support all 5 operations +, -, *, /, % in assignments in a precise way (i.e., don’t always return the top element).

Modeling a guard is more difficult: given the expected result of the operator, such as the fact that x<y is true, we must use this information to refine the information we have on the variables x and y. This is the purpose of the compare operation in the signature VALUE_DOMAIN. More precisely, compare x y op r returns a pair x',y' of abstract environments that refine the arguments x and y: x' abstracts the subset of integer values i from x such that there exists a value j in y satisfying i op j; and likewise for y'.

In case of more complex expressions, featuring arithmetic operators, such as x+y<z, once an abstract information on the value of x+y is deduced from the fact that it is smaller than z, the information must be propagated to derive information on x and y. This is the role of the bwd_unary and bwd_binary operators. For instance, similarly to compare, bwd_binary x y op r returns a pair of abstract environments x',y' that refine the argument x, y: x' abstracts the subset of values i from x such that, for some value j in y, i op j is in r; and likewise for y'. The full algorithm on an arbitrary expression works in two phases: it first annotates the expression tree by bottom-up evaluation (from leaves to root) using unary and binary; and then refines the values by top-down propagation (from root to leaves) using bwd_unary and bwd_binary. The algorithm is actually a standard constraint programming algorithm known as HC4-revise; it is described in this paper (see Algorithms 2, 3 and 4).

As the guards are more complex, you are only required to support +, -, and * in a precise way. For other operators, you can soundly set x’,y’ to x,y (i.e., no refinement). However, you are required to support all the boolean operators !, &&, || in guards.

As hinted above, the implementation of a DOMAIN for constants and for intervals can be derived in a generic way from that of a VALUE_DOMAIN. We thus suggest to implement a functor taking a VALUE_DOMAIN module as argument and returning a VALUE module, and program separately an instance of VALUE_DOMAIN implementing constants and another one representing intervals.

Extensions

You must implement at least one of the following extensions.

Backward analysis

This extension should be performed when a --backward flag is added to the command line.

The analysis considered in part 2 is forward: given the memory environment at the beginning of the program, an abstraction of the environments reachable during program execution is computed. The analysis outputs a map from graph nodes to abstract invariants. A backward analysis starts form this map, and considers some target location in the middle of the program and a target abstract environment. It then traces backward the program execution from the target location to refine the result of the forward analysis by only considering executions that will reach the target location with the target environment.

In this extension, we target the environments that do not satisfy an assertion. The analysis will thus help discovering which program executions cause the assertion to fail. You should provide examples illustrating how your backward analysis achieves this.

Hints

Building a backward analysis requires two changes with respect to a forward analysis:

The iterator must be modified to propagate abstract information from the destination node of each arc to its source node.
A backward version of assignments must be added to the DOMAIN signature and implemented in each abstract domain. More precisely, given abstract environments x before and r after the assignment, a variable var and an expression expr, bwd_assign x var expr r returns a refinement x' of x such that, after the assignment var = expr, the result is in r. It thus propagates any refinement from the destination node r backward to the source node x. This function can be implemented using the bwd_unary and bwd_binary operators from VALUE_DOMAIN, which were already used for guards. Note that the backward version of guards is identical to the forward version, so, the assignment is the only instruction that requires implementing a special backward version.

Inter-procedural analysis

This extension consists in implementing the support for function calls.

To simplify, you can assume that there are not recursive calls. Hence, at any point of the execution, there exists at most a single instance of each local variable and formal function argument (as opposed to a stack of such variables, required to implement recursivity). This is compatible with the way programs are translated into graphs in the front-end: all variables and function arguments are considered as global variables. Supporting recursivity is more complex and requires some changes to the front-end.

You should provide a few examples and discuss the results of your analysis on these examples.

Hints

We suggest implementing a simple inter-procedural analysis where abstract environments flow from call sites (source nodes of a call instruction arc) to the entry node of the called function, and back from the exit node of the called function to the return site (destination nodes of a call instruction arc).

Polyhedral analysis

Calling the analyzer with --domain poly should perform the polyhedral analysis.

This extension requires you to implement an analysis using the polyhedra abstract domain. You will use the Apron library that already implements all the polyhedral abstract operators, and use to implement a module obeying the DOMAIN signature that can be plugged into your iterator (you are not asked to reimplement polyhedral operators yourself).

Relational analyses such as polyhedra are especially useful in the presence of loops, where a relational invariant must be found (which is not possible with non-relational domains such as intervals). You should provide a few examples illustrating this point and discuss the results of your analysis on these examples, comparing in particular the polyhedral and interval analyses.

Disjunctive analysis

This extension requires you to implement an abstract domain functor able to represent disjunctions of abstract elements of a base domain, for instance, associate several intervals to a variable. This is especially useful to avoid loosing precision at control-flow joins, and to represent non-convex invariants. We will see in the course several ways to design a disjunctive domain, and you can choose whichever way you wish (disjunctive completion, state partitioning, trace partitioning).

Other Idea

After discussing it with the teacher !

Resources and Bibliography

Front-end

Software:

Ocaml language.
Extended maps included in the front-end.
Apron library and its documentation.
Zarith arbitrary precision number library.
Graphviz graph visualization.

On the HC4-revise algorithm:

Frédéric Benhamou, Frédéric Goualard, Laurent Granvilliers. Revising hull and box consistency.

On backwards analysis:

François Bourdoncle. Abstract debugging of higher-order imperative languages.
Xavier Rival. Understanding the origin of alarms in Astrée.

On polyhedral analyses and Apron:

Bertrand Jeannet and Antoine Miné. APRON: A library of numerical abstract domains for static analysis.

On disjunctive analyses:

Patrick Cousot and Radhia Cousot. Abstract interpretation frameworks (example 6.7).
Xavier Rival and Laurent Mauborgne. The trace partitioning abstract domain.

Plan de la page

Introduction

Organization

Language and front-end

Expected work

Iterator

Abstract domains

Extensions

Backward analysis

Inter-procedural analysis

Polyhedral analysis

Disjunctive analysis

Other Idea

Resources and Bibliography

Front-end

Software:

On the HC4-revise algorithm:

On backwards analysis:

On polyhedral analyses and Apron:

On disjunctive analyses: