Static Analysis of Spreadsheet Applications

Host laboratory: École Normale Supérieure
Graduate school: École Polytechnique
Employer: INRIA
Date: September 23rd, 2015

Popular Science Summary

Spreadsheets are ubiquitous and employed by large user bases worldwide. From small data tables to big applications comprised of formulas and macros, they are used extensively across industries and remain at the heart of modern businesses. It is important to be able to automatically verify and inspect the properties of numerous and complex spreadsheets. However, it is a difficult task, especially for complex spreadsheets, which can be run with different input data by multiple users in varied ways. To address these challenges, this thesis proposes to use and reason about abstractions that over-approximates concrete behaviors of spreadsheet programs under various circumstances. This approach consists in applying abstract interpretation based static analysis to spreadsheets.

First, we start by formalizing spreadsheet languages, which combine formulas and macros, and manipulate data in two-dimensional arrays. Second, we design a tool that permits of reasoning about possibly dynamic spreadsheet zones, instead of exact spreadsheet cells. Third, we propose an analysis to verify the safety of spreadsheet applications by types. For instance, ill-typed operations such as comparisons of integer values with string values are deemed suspicious, and can be clues to spreadsheet defects; thus, the analysis raises alarms if such operations may emerge in a future execution of a spreadsheet. The experimental results of our analyzer show that our tool is able to automatically verify a large number of real-world spreadsheets and discovers bugs that are difficult to detect by code review or by testing. Last, we extend the analysis to automatically identify all the used, defined and available cells that a spreadsheet application can involve. As a result, users have more transparency and assurance when developing their spreadsheets, especially those that contain macros. This work is actually an example of extension of our analysis to answer different needs generated by the development and management of spreadsheet applications.

Jury

Mr Eric GOUBAULT (École Polytechnique), President of the committee
Mr Pierre JOUVELOT (Mines ParisTech), Examiner
Mr Xavier LEROY (INRIA), Examiner
Mr Francesco LOGOZZO (Facebook), Examiner
Mr Xavier RIVAL (École Normale Supérieure & INRIA), Supervisor
Mr Peter SESTOFT (IT University of Copenhagen), Referee
Mr Shriram KRISHNAMURTHI (Brown University), Referee