Comes mainly from here and here.
Thanks Antoine and Marc !
Installation
Install Ocaml
We will use OCaml, of course. Every version between 4.05.0 and 4.10.0 are ok. Probably older one too, I didn’t check. It is already installed in the Labsession computers.
If you want it on your computer, the easiest way to install it is to install opam. Please refer to the official webpage.
Don’t forget to add eval $(opam config env) to your .bashrc .
Once OCaml is installed, you need to install some OCaml libraries:
opam install zarith dune menhir
If you are working on a personnal computer you can use the depext package to install external dependancies.
opam install depext
opam depext -i zarith dune menhir
Other ressources :
- OCamlPro’s cheatsheets
- Jean-Christophe Filliatre’s advice to setup an OCaml environment.
Compile the subject
First download the source archive and decompress it using
tar -xzvf tp-semantics.tar.gz
To compile the project run
make
You should see a bunch of warning (you should ignore them…) and two executables analyser.bc and analyzer.exe .
If you have any errors during this process please contact me as soon as possible.
Language
Start by downloading the package. You will work with it during the 5 sessions.
The package contains:
a parser for the language, programmed in OCamlLex and Menhir: lexer.mll and parser.mly ;
the type of abstract syntax trees output by the parser, defined in abstract_syntax_tree.ml ;
a pretty-printer, to print back an abstract syntax tree into the original language, in abstract_syntax_printer.ml ;
in analyzer.ml , a simple driver that takes a file name passed as argument, parses it, and prints it back.
Syntax
The language is a very simple “curly brackets” C-like language. A program is composed of a sequence of statements of the form:
variable declaration: var var1 [= expr1], var2 [=expr2],...;
assignment: var = expr;
conditional statement: if (expr) stat_1; or if (expr) stat_1; else stat_2;
while loops: while (expr) stat;
blocks in curly brackets { stat_1; ... stat_n; }
Non-standard statements include:
assertions of boolean expressions: assert (expr);
variable printing: print (var_1,...,var_n);
failure: halt ; which stops the program immediately
Expressions include:
integer arithmetic operators: + (unary and binary), - (unary and binary), * , / , % (modulo);
boolean operators: && (and, binary), || (or, binary), ! (negation, unary);
integer comparisons < , <= , > , >= ;
equality == and disequality != , that can be used to compare either two integers or two boolean values;
constants, including integers, and the boolean constants true and false ;
the special expression rand(l,h) that denotes the non-deterministic interval of integers between the constant l and the constant h .
The operators have their usual precedence, and you can group expressions using parentheses.
You can use the standard C-like /* ... */ and // comments.
A single declaration can declare an arbitrary number of variables var1 , var2 , etc. Note that there is no static type information in declarations: the
language is dynamically typed. Each variable name is optionally followed by an initialization of the form = expr , which behaves similarly to an assignment. If there is no initialization
expression, the variable is assumed to be non-initialized. Note that using a non-initialized value in an expression results in a value error.
A variable declared in a block starts existing at the point of declaration and stops existing at the end of the block; this is the variable scope. It is important to distinguish
variables of the same name, in case several of them exist at the same time (in nested scopes). Scoping is already resolved in the base package.
Semantics
Variables have no type and can hold either an integer or a boolean value, à la Python: upon assignation, a variable takes the type of the value it is assigned. In our simple language,
there are only boolean and integer values. There is a type error in case an integer value is used with a boolean operator, or the converse. Subsequently, we do not distinguish statically between
boolean and integer expressions: only values have a type. It is an error to use operators with values of the wrong type, such as adding two boolean values. This is detected at run-time, when
evaluating the expression in the current environment.
Other run-time errors include:
- Divisions and modulos by zero;
- using a non-boolean value in tests, loops and assert conditions;
- using a variable that has never been assigned to;
- asserting a condition that is false;
- executing the
halt statement.
ERROR module
The ERROR signature provides some utilities on set of errors. It manipulates two types: ERROR.err that corresponds to an error, ERROR.t , that represents a set
of error to collect all errors raised in the evaluation.
make_err creates an error
- the
error function creates a singleton (of type ERROR.t )
empty , add , union , do what you expect.
To avoid having to handle overflows, our integer values will not be represented using machine integers, but as unbounded mathematical integers. You will use the ZArith library, which provides a simple API (similar to Int32) to manipulate unbounded integers.
Code architecture
The code shall be modular and as reusable as possible. This is story time! At the very top, there is the Interpreter module: its job is to iterate over the AST to run
the program/analysis. There are two iterators: a concrete one (that may not terminate) and an abstract one, that always terminates. You will write the first one for TP1 and reuse it for the next
session. The other interpreter is to write in TP3 and use it for all the following sessions.
Under it, there is an Environment module. It represents all the information about the memory state. There are several of them.
concrete: it stores a set of mappings variable → value, to run a non deterministic program. It needs an underlying module to handle value computations. This is TP1.
typing: it stores a mapping from variables to types. This will be TP2.
values: it stores a mapping from variables to abstract values. It needs an underlying module to handle abstract values computations. This will be TP3.
equality: it only stores equality relation between variables. This will be TP4.
product: it has two underlying environments and stores both values and make them cooperate. This will be TP5.
Some of them need an underlying module that handles computation on values: this is the Value module. There is only one. It takes two underlying domains: one for
numeric values and one for boolean values. The value of a variable is thus an integer and a boolean (abstract or concrete) value. Both of them are optional. This module is in two parts: since the code
is very redundant, the module ValueBasics that handle the orgy of pattern matching, and Value just use it with appropriate functions.
Now, there are integer (resp. boolean) value modules in int_domain (resp. bool_domain ).
Under all of that, there is the error module. It is the common types of errors across all the hierarchy of modules.
Now, look at analyzer.ml . There is a pattern matching to instantiate all the functors. You have to implement the corresponding modules (everywhere there are assert
false). Implement in the order of instantiation, from the bottom to the top.
|