# Representation of Sets of Trees for Abstract Interpretation

PhD thesis of Laurent Mauborgne

Electronic versions:
compressed postscript (1 512 K),
DVI (1 759 K) and PDF (1 226 K).
## Contents

This thesis develops a new representation for possibly infinite sets of
possibly infinite trees. This representation is called **tree
schema**. The main properties of this representation are:
- an expressive power strictly greater than infinite tree automata or
tree automata with equality relations between brothers,
- a concern in efficiency of the representation in the design of tree
schemata. Emptiness testing has a constant time complexity.
- the ability to find some possible approximations during fixpoint iteration
- the possibility of adding alien constraints in the tree schemata easily.

In order to achieve these results, a new representation, which is an extension
of the BDDs of
Bryant, have been devised. This representation allows the
efficient representation of some infinite relations.

New algorithms to represent regular trees have been devised.

#### Abstract Interpretation

Abstract interpretation \cite{cousot78, cousot:cousot77, cousot:cousot92} is a
formalized framework designed to deal with approximations,
specially useful for the static analysis of programs. This framework is
very general and can be applied to a great variety of problems. In fact, a
lot of works have been achieved, either based on abstract interpretation
or trying to improve different parts of the framework. Because abstract
interpretation is so general, there have been theoretical advances and
researches to improve the possible applications of abstract interpretation.
When we try to apply abstract interpretation techniques, we must consider
the objects manipulated during the process. This question is crucial to
the implementation because it determines the accuracy and the complexity
of the computation. But in many cases, the objects that are available from
classical computer science may be inadequate, because they don't take
advantage of the possibility of approximation. Thus some specific objects and
their manipulation have been defined to be used in abstract interpretation.
Usually, they cannot represent any possible object but they come with
approximation mechanisms. If we consider for example sets of numbers or tuples
of numbers, Patrick Cousot and Nicolas Halbwachs presented in
\cite{cousot:halbwachs78} a representation based on convex polyhedra. Philippe
Granger proposed the use of congruences in \cite{granger89}.
#### Representation of Sets of Trees

Finite and infinite trees are present in a way or another in most parts of
computer science. They can be seen as the underlying structure of many
different kind of objects such as words, graphs ---provided we can distinguish
a node---, and of course non linear data structures. In program semantics, they
appear as control flow graphs or traces of execution. Even a set of trees can
be seen as a tree.
Alas, representing sets of trees is difficult.
In the case of finite trees, automata techniques can be implemented but
they lack some natural approximation method. When it comes to infinite trees
---a necessary step to represent graphs, for example--- we can only find
theoretical description of automata, but no realistic implementation is
available. When someone needs sets of trees in an abstract interpretation,
he will have to restrain his analysis to linear structures, such as in
\cite{deutsch92}, or to very crude approximations losing any relation between
different parts of the trees as in \cite{heintze:jaffar90a,heintze92}.
The goal of this thesis is to provide a set of tools to manipulate sets of
possibly infinite trees in abstract interpretation. The user of this set of
tools should only care about basic operations over sets of trees, such as
intersection, projection, etc. The set of tools would then take care of
computing those operations using an efficient internal representation of sets
of trees.
As representing sets of possibly infinite trees is a difficult task, some
operations on sets of trees will require the use of automatic approximation.
In order to make things easier for the user, operations of explicit
approximation will be provided, as well as algorithms to decide whether
there is an approximation that seems to suit the current iteration on the
set of trees. In order to give the possibility to trade between accuracy and
complexity options to force more automatic approximations should be
available.
In order to fulfill our objectives, the representation of sets of trees is
to combine two main qualities: efficiency and expressive power.
#### High Expressive Power for the Representation

All experiments in abstract interpretation indicate that the later we
approximate the more accurate the final result. In order to
approximate as late as possible, automatic approximation should be
reduced to a minimum. It means that for every representation used in
an abstract interpretation, we must try be able to represent the
largest set of object possible. In this thesis, we try in every area of the
representation to go as far as possible in the expressive power without,
hopefully, jeopardizing the efficiency of the representation.
#### Unicity

The efficiency of a representation lies in its compactness and in the
complexity of the algorithms using the objects. This is a matter of balance,
so we cannot use algorithms from data compression for example, because,
although the compactness of compressed objects is very good, their manipulation
would be very difficult. The notion of unicity, on the other hand, seems
to deal with every aspect of the efficiency of the representation.
The representation of an object is said to be unique if no other representation
represents this object in the same frame. A typical example of non-unicity is
the use of variable names: they produce at least as many possible
representation for a given object as the number of possible variable names.
Striving after the unicity of a representation is moving towards a better
compactness of this representation. Actually, if it is not unique, then the
representation contains some redundancy.
The problem with unicity is that it may require a lot of computation to
find a kind of normal form (the minimal state representation when possible).
Our argument is that, in most case, the gain in most algorithms makes up for
this overhead. It is particularly true if we need equality testing or
emptiness testing.
#### Formal Definitions

Because it deals with the essence of approximation from a very theoretical
point of vue, abstract interpretation is a mathematical model. It is very
well formalized, and this formal presentation of the model allows a very
high confidence in its correctness and a genericity of the results. The
drawback ---or the advantage--- is that any extension or application of
abstract interpretation should be very formal too.
In our case, a formal definition is an advantage, because we have to deal with
infinite structures and infinite behavior. Experience shows that in this
case, informal presentation and intuitive proofs can lead to wrong results.