Representation of Sets of Trees for Abstract Interpretation

PhD thesis of Laurent Mauborgne

Electronic versions: compressed postscript (1 512 K), DVI (1 759 K) and PDF (1 226 K).

Contents

Summary of the thesis
The introduction chapter of the thesis
- The motivations of the subject of the thesis
- The goal of the thesis
- Some of the guiding ideas of the thesis

Summary of the thesis

This thesis develops a new representation for possibly infinite sets of possibly infinite trees. This representation is called tree schema. The main properties of this representation are:

an expressive power strictly greater than infinite tree automata or tree automata with equality relations between brothers,
a concern in efficiency of the representation in the design of tree schemata. Emptiness testing has a constant time complexity.
the ability to find some possible approximations during fixpoint iteration
the possibility of adding alien constraints in the tree schemata easily.

In order to achieve these results, a new representation, which is an extension of the BDDs of Bryant, have been devised. This representation allows the efficient representation of some infinite relations.

New algorithms to represent regular trees have been devised.

Introduction of the thesis

Motivations

Abstract Interpretation

Abstract interpretation \cite{cousot78, cousot:cousot77, cousot:cousot92} is a formalized framework designed to deal with approximations, specially useful for the static analysis of programs. This framework is very general and can be applied to a great variety of problems. In fact, a lot of works have been achieved, either based on abstract interpretation or trying to improve different parts of the framework. Because abstract interpretation is so general, there have been theoretical advances and researches to improve the possible applications of abstract interpretation. When we try to apply abstract interpretation techniques, we must consider the objects manipulated during the process. This question is crucial to the implementation because it determines the accuracy and the complexity of the computation. But in many cases, the objects that are available from classical computer science may be inadequate, because they don't take advantage of the possibility of approximation. Thus some specific objects and their manipulation have been defined to be used in abstract interpretation. Usually, they cannot represent any possible object but they come with approximation mechanisms. If we consider for example sets of numbers or tuples of numbers, Patrick Cousot and Nicolas Halbwachs presented in \cite{cousot:halbwachs78} a representation based on convex polyhedra. Philippe Granger proposed the use of congruences in \cite{granger89}.

Representation of Sets of Trees

Finite and infinite trees are present in a way or another in most parts of computer science. They can be seen as the underlying structure of many different kind of objects such as words, graphs ---provided we can distinguish a node---, and of course non linear data structures. In program semantics, they appear as control flow graphs or traces of execution. Even a set of trees can be seen as a tree. Alas, representing sets of trees is difficult. In the case of finite trees, automata techniques can be implemented but they lack some natural approximation method. When it comes to infinite trees ---a necessary step to represent graphs, for example--- we can only find theoretical description of automata, but no realistic implementation is available. When someone needs sets of trees in an abstract interpretation, he will have to restrain his analysis to linear structures, such as in \cite{deutsch92}, or to very crude approximations losing any relation between different parts of the trees as in \cite{heintze:jaffar90a,heintze92}.

Goal of the Thesis

The goal of this thesis is to provide a set of tools to manipulate sets of possibly infinite trees in abstract interpretation. The user of this set of tools should only care about basic operations over sets of trees, such as intersection, projection, etc. The set of tools would then take care of computing those operations using an efficient internal representation of sets of trees. As representing sets of possibly infinite trees is a difficult task, some operations on sets of trees will require the use of automatic approximation. In order to make things easier for the user, operations of explicit approximation will be provided, as well as algorithms to decide whether there is an approximation that seems to suit the current iteration on the set of trees. In order to give the possibility to trade between accuracy and complexity options to force more automatic approximations should be available.

Guiding Ideas

In order to fulfill our objectives, the representation of sets of trees is to combine two main qualities: efficiency and expressive power.

High Expressive Power for the Representation

All experiments in abstract interpretation indicate that the later we approximate the more accurate the final result. In order to approximate as late as possible, automatic approximation should be reduced to a minimum. It means that for every representation used in an abstract interpretation, we must try be able to represent the largest set of object possible. In this thesis, we try in every area of the representation to go as far as possible in the expressive power without, hopefully, jeopardizing the efficiency of the representation.

Unicity

The efficiency of a representation lies in its compactness and in the complexity of the algorithms using the objects. This is a matter of balance, so we cannot use algorithms from data compression for example, because, although the compactness of compressed objects is very good, their manipulation would be very difficult. The notion of unicity, on the other hand, seems to deal with every aspect of the efficiency of the representation. The representation of an object is said to be unique if no other representation represents this object in the same frame. A typical example of non-unicity is the use of variable names: they produce at least as many possible representation for a given object as the number of possible variable names. Striving after the unicity of a representation is moving towards a better compactness of this representation. Actually, if it is not unique, then the representation contains some redundancy. The problem with unicity is that it may require a lot of computation to find a kind of normal form (the minimal state representation when possible). Our argument is that, in most case, the gain in most algorithms makes up for this overhead. It is particularly true if we need equality testing or emptiness testing.

Formal Definitions

Because it deals with the essence of approximation from a very theoretical point of vue, abstract interpretation is a mathematical model. It is very well formalized, and this formal presentation of the model allows a very high confidence in its correctness and a genericity of the results. The drawback ---or the advantage--- is that any extension or application of abstract interpretation should be very formal too. In our case, a formal definition is an advantage, because we have to deal with infinite structures and infinite behavior. Experience shows that in this case, informal presentation and intuitive proofs can lead to wrong results.