This introduction to static analysis by abstract interpretation has the
objective of being simple, intuitive and informal. More technical
introductions as well as bibliographic references are provided in
[1,2,3].
A 30mn video (in French) can also be useful.
The concrete semantics of programs formalizes the set of
all possible executions of this program in all possible execution
environments. If an execution is represented by a curve showing the
evolution of the vector x(t) of values of the input,
state and output variables of the program as a function of the time
t, this concrete semantics can be represented by a set of
curves (with continuous time for short):
x(t)
t
The concrete semantics of a program is an "infinite" mathematical object which is not computable: it is not possible to write a program able to represent and to compute all possible executions of any program in all its possible execution environments.
Hence, all non trivial questions about the concrete semantics of a program are undecidable: it is not possible to write a program able to answer any question about the possible executions of any program (since the concrete semantics of this program would have to be computable).
The mathematical analogy is that there is no theorem prover able, for example, to prove any theorem of arithmetics without human assistance.
Safety properties of a program express that no possible execution of the
program when considering all possible execution environments can reach an
erroneous state. Graphically, the set of these erroneous states can be
represented as a forbidden zone:
x(t)
t
The verification of safety properties consists in proving that the intersection of the concrete semantics of the program with the forbidden zone is empty. Since the program concrete semantics is not computable, the verification problem is undecidable. It is not always possible to answer the safety questions completely automatically, with finite computer resources, without any uncertainty about the answer and without any human intervention.
Testing/debugging consists in considering a subset of the possible executions:
x(t)
t
Testing/debugging is not a proof, since some erroneous trajectories might be forgotten. This is the problem of absence of coverage.
Bounded model checking consists in exploring the prefixes of the possible executions:
x(t)
t
Bounded model checking is not a proof, since late errors can be missed. This is the same problem of absence of coverage.
Abstract interpretation consists in considering an abstract semantics,
that is a superset of the concrete program semantics:
x(t)
t
The abstract semantics covers all possible cases. Whence, if the abstract semantics is safe (i.e. does not intersect the forbidden zone) then so is the concrete semantics.
Formal methods are abstract interpretations which differ in the way the abstract semantics is obtained. In all cases, the abstract semantics must be chosen to be computer representable.
In all cases, the abstract semantics should be sound (section 1.8 ), remain sufficiently precise to avoid false alarms (section 1.10 ) while remaining as simple as possible to avoid combinatorial explosion phenomena (section 1.13 ).
In formal methods the abstract semantics must be chosen as a superset
of the concrete semantics since otherwise reasonings in the abstract
might not be correct in the concrete:
x(t)
t
We say that an abstraction is sound (or correct) if the abstract semantics covers all possible cases of the concrete semantics. All formal methods are required to use sound abstractions: if a potential error is not signaled that it should be definitely impossible. Contrary to testing/debugging formal methods provide full coverage.
Yet, some methods, presented as formal, do not explore
all possible trajectories, but only prefixes (such as
the "bounded model-checking") or may not terminate
(such as the "refinement model-checking") and therefore
should be considered as incorrect, since some errors
can be forgotten:
x(t)
t
In absence of alarms, this yields a corectness proof. However, the
consequence of the overapproximation of the possible executions is that
inexisting executions are considered, some of which are erroneous, which
leads to false alarms (also called false positives). A
false alarm corresponds to the case when the abstract semantics
intersects the forbidden zone while the concrete semantics does not
intersect this forbidden zone. So a potential error is signaled which can
never occur in reality:
x(t)
t
Automatic formal methods are all incomplete: there exists an infinity of programs for which potential errors at execution are signaled, even if they are definitely impossible in reality.
Such false alarms are inevitable because all questions that formal methods try to solve are undecidable. Il will always be possible to find a program without any run-time error for which the abstract semantics will be too imprecise to prove the absence of error fully automatically, without any human intervention. The imprecision problem can often be solved by choosing a more refined abstract semantics, which is more precise and so often more complex, which results in larger computation costs.
In summary, with testing/debugging a subset of the possible program executions is considered, so that errors can be forgotten. With abstract interpretation-based formal methods, a superset of the possible program executions is considered, so that, because of the potential imprecision, one can invent errors that do not exist in reality (so called "false alarms").
Hence one would like to have correct formal methods (without erroneous abstractions by absence of coverage) and complete (without false alarms due to over-coverage). Such an abstraction does exists mathematically [4], but it is impossible to compute, and this because if undecidability, which amounts to say that it is impossible to execute the program in all possible execution environment in a finite time with a finite memory.
An invariant is a property which holds for all trajectories.
In the example below, a first invariant would be that all states reachable during the course of the computation can be on any trajectory. This is already an approximation since this leaves the possibility to jump from one trajectory to another, and even to go backwards.
A second invariant, less precise, would be that all trajectories are in
the green zone
. This is a less precise
approximation which leaves the possibility to reach points out of any
trajectory:
x(t)
t
This second invariant imples a third one, even less precise, stating that all
trajectories are outside the forbidden red zone
:
x(t)
t
A program invariant is a program property which holds during the program execution. The ASTRÉE static analyzer computes automatically an abstract semantics which consists of local abstract invariants attached to program points or to program blocks stating properties about part of the program variables which are visible at that program point or withing that block. These local invariants holds of the involved variables whenever control reaches that program point or stays within that block.
An abstract domain of the ASTRÉE static analyzer is a computer representation of a given category of invariants and of the operations involved in the computation of these abstract invariants. ASTRÉE uses a great number of abstract domains which are combined to obtain complex abstract invariants. The term "abstract" makes reference to the fact that the computed invariants are correct approximations of the program concrete semantics.
A classical abstract domain is that of intervals
[2,6,8] which approximate an
ordered set of values by their minimum and their maximum. Applied to our
example, this interval abstraction yields the following invariant:
x(t)
t
which is too
imprecise to prove the specification. To be precise, ASTRÉE
incorporates abstract domains which are expressive enough to take into
account the properties of command programs controlling complex discrete
dynamical systems.