TP Overview

Comes mainly from here and here. Thanks Antoine and Marc !

Installation

Install Ocaml

We will use OCaml, of course. Every version between 4.05.0 and 4.10.0 are ok. Probably older one too, I didn’t check. It is already installed in the Labsession computers.

If you want it on your computer, the easiest way to install it is to install opam. Please refer to the official webpage.

Don’t forget to add eval $(opam config env) to your .bashrc.

Once OCaml is installed, you need to install some OCaml libraries:

opam install zarith dune menhir

If you are working on a personnal computer you can use the depext package to install external dependancies.

opam install depext
opam depext -i zarith dune menhir

Other ressources :

  • OCamlPro’s cheatsheets
  • Jean-Christophe Filliatre’s advice to setup an OCaml environment.

Compile the subject

First download the source archive and decompress it using

tar -xzvf tp-semantics.tar.gz

To compile the project run

make

You should see a bunch of warning (you should ignore them…) and two executables analyser.bc and analyzer.exe.

If you have any errors during this process please contact me as soon as possible.

Language

Start by downloading the package. You will work with it during the 5 sessions.

The package contains:

  • a parser for the language, programmed in OCamlLex and Menhir: lexer.mll and parser.mly;

  • the type of abstract syntax trees output by the parser, defined in abstract_syntax_tree.ml;

  • a pretty-printer, to print back an abstract syntax tree into the original language, in abstract_syntax_printer.ml;

  • in analyzer.ml, a simple driver that takes a file name passed as argument, parses it, and prints it back.

Syntax

The language is a very simple “curly brackets” C-like language. A program is composed of a sequence of statements of the form:

  • variable declaration: var var1 [= expr1], var2 [=expr2],...;

  • assignment: var = expr;

  • conditional statement: if (expr) stat_1; or if (expr) stat_1; else stat_2;

  • while loops: while (expr) stat;

  • blocks in curly brackets { stat_1; ... stat_n; }

Non-standard statements include:

  • assertions of boolean expressions: assert (expr);

  • variable printing: print (var_1,...,var_n);

  • failure: halt; which stops the program immediately

Expressions include:

  • integer arithmetic operators: + (unary and binary), - (unary and binary), *, /, % (modulo);

  • boolean operators: && (and, binary), || (or, binary), ! (negation, unary);

  • integer comparisons <, <=, >, >=;

  • equality == and disequality !=, that can be used to compare either two integers or two boolean values;

  • constants, including integers, and the boolean constants true and false;

  • the special expression rand(l,h) that denotes the non-deterministic interval of integers between the constant l and the constant h.

The operators have their usual precedence, and you can group expressions using parentheses.

You can use the standard C-like /* ... */ and // comments.

A single declaration can declare an arbitrary number of variables var1, var2, etc. Note that there is no static type information in declarations: the language is dynamically typed. Each variable name is optionally followed by an initialization of the form = expr, which behaves similarly to an assignment. If there is no initialization expression, the variable is assumed to be non-initialized. Note that using a non-initialized value in an expression results in a value error.

A variable declared in a block starts existing at the point of declaration and stops existing at the end of the block; this is the variable scope. It is important to distinguish variables of the same name, in case several of them exist at the same time (in nested scopes). Scoping is already resolved in the base package.

Semantics

Variables have no type and can hold either an integer or a boolean value, à la Python: upon assignation, a variable takes the type of the value it is assigned. In our simple language, there are only boolean and integer values. There is a type error in case an integer value is used with a boolean operator, or the converse. Subsequently, we do not distinguish statically between boolean and integer expressions: only values have a type. It is an error to use operators with values of the wrong type, such as adding two boolean values. This is detected at run-time, when evaluating the expression in the current environment.

Other run-time errors include:

  • Divisions and modulos by zero;
  • using a non-boolean value in tests, loops and assert conditions;
  • using a variable that has never been assigned to;
  • asserting a condition that is false;
  • executing the halt statement.

ERROR module

The ERROR signature provides some utilities on set of errors. It manipulates two types: ERROR.err that corresponds to an error, ERROR.t, that represents a set of error to collect all errors raised in the evaluation.

  • make_err creates an error
  • the error function creates a singleton (of type ERROR.t)
  • empty, add, union, do what you expect.

To avoid having to handle overflows, our integer values will not be represented using machine integers, but as unbounded mathematical integers. You will use the ZArith library, which provides a simple API (similar to Int32) to manipulate unbounded integers.

Code architecture

The code shall be modular and as reusable as possible. This is story time! At the very top, there is the Interpreter module: its job is to iterate over the AST to run the program/analysis. There are two iterators: a concrete one (that may not terminate) and an abstract one, that always terminates. You will write the first one for TP1 and reuse it for the next session. The other interpreter is to write in TP3 and use it for all the following sessions.

Under it, there is an Environment module. It represents all the information about the memory state. There are several of them.

  • concrete: it stores a set of mappings variable → value, to run a non deterministic program. It needs an underlying module to handle value computations. This is TP1.

  • typing: it stores a mapping from variables to types. This will be TP2.

  • values: it stores a mapping from variables to abstract values. It needs an underlying module to handle abstract values computations. This will be TP3.

  • equality: it only stores equality relation between variables. This will be TP4.

  • product: it has two underlying environments and stores both values and make them cooperate. This will be TP5.

Some of them need an underlying module that handles computation on values: this is the Value module. There is only one. It takes two underlying domains: one for numeric values and one for boolean values. The value of a variable is thus an integer and a boolean (abstract or concrete) value. Both of them are optional. This module is in two parts: since the code is very redundant, the module ValueBasics that handle the orgy of pattern matching, and Value just use it with appropriate functions.

Now, there are integer (resp. boolean) value modules in int_domain (resp. bool_domain).

Under all of that, there is the error module. It is the common types of errors across all the hierarchy of modules.

Now, look at analyzer.ml. There is a pattern matching to instantiate all the functors. You have to implement the corresponding modules (everywhere there are assert false). Implement in the order of instantiation, from the bottom to the top.