Section: New Results

The OCaml language and system

The OCaml system

Participants : Damien Doligez, Alain Frisch [Lexifi SAS] , Jacques Garrigue [University of Nagoya] , Fabrice Le Fessant, Xavier Leroy, Luc Maranget, Gabriel Scherer, Mark Shinwell [Jane Street] , Leo White [Jane Street] , Jeremy Yallop [OCaml Labs, Cambridge University] .

This year, we released versions 4.02.2 and 4.02.3 of the OCaml system. These are minor releases that fix about 100 bugs and implement 12 minor new features, including support for nonrecursive type definitions and a higher-level interface with documentation generation tools.

Most of our activity was devoted to preparing the next major release of OCaml, version 4.03.0, which is expected in the first quarter of 2016. The novelties we worked on include:

  • Inline record types as arguments to constructors of sum types, combining the clarity and extensibility brought by named record fields with the compact in-memory representation of unnamed constructor arguments.

  • Improved redundancy and exhaustiveness checks for pattern-matching over generalized algebraic data types (GADTs) [41].

  • Improved unboxing optimizations for numbers, including the ability to mark arguments and results of external C functions as unboxed.

  • The garbage collector was made more incremental, so as to reduce the worst-case GC pause times.

  • The native-code compiler was ported to two new architectures: PowerPC 64 bits (including IBM's new little-endian variant) and IBM zSystems.
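The first and third items can be illustrated with a short sketch, assuming the syntax of OCaml 4.03 and later. The type `shape`, the function `area` and the name `unboxed_sqrt` are made up for this example. The stub names `caml_sqrt_float` and `sqrt` are those of the OCaml runtime and the C math library.

```ocaml
(* Inline records: constructor arguments carry named fields, but are stored
   directly in the constructor's block, with no separate record allocation.
   Type and field names are illustrative. *)
type shape =
  | Circle of { radius : float }
  | Rect of { width : float; height : float }

let area = function
  | Circle { radius } -> 3.14159 *. radius *. radius
  | Rect { width; height } -> width *. height

(* Unboxed external: in native code, the float argument and result are passed
   to the C function without boxing. "caml_sqrt_float" is the bytecode stub,
   "sqrt" the C function called from native code. *)
external unboxed_sqrt : float -> float
  = "caml_sqrt_float" "sqrt" [@@unboxed] [@@noalloc]

let () =
  Printf.printf "%.1f %.1f\n"
    (area (Rect { width = 2.0; height = 3.0 }))
    (unboxed_sqrt 16.0)
```

The fields of an inline record are accessed by pattern-matching on the constructor, as in `area`. The record itself cannot escape as a standalone value, which is what allows the compact representation.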

On the organization side, we switched to GitHub as the central repository for the OCaml development sources. GitHub facilitates collaborative work among the growing community of contributors to the OCaml code base. In 2015, more than 100 contributors proposed small or large improvements to the OCaml compiler distribution.

Memory profiling OCaml applications

Participants : Fabrice Le Fessant, Çagdas Bozman [OCamlPro] , Albin Coquereau [OCamlPro] .

Most modern languages use automatic memory management to relieve the programmer of the burden of explicitly allocating and releasing chunks of memory. As a consequence, when an application exhibits unexpected memory usage, programmers need new tools to understand what is happening and how to solve the issue. In OCaml, the compact representation of values, with almost no runtime type information, makes the design of such tools more complex.

In the past, we have experimented with different tools to profile the memory usage of real OCaml applications, in particular one that saves snapshots of the heap after every garbage collection. Snapshots can then be analysed to display the evolution of memory usage, with detailed information on the types of values, where they were allocated and from where they are still reachable.

This year, we experimented in three new directions, mostly driven by the size of the snapshots to be analysed:

  • We studied several ways of displaying snapshots. Because of the large amount of information contained in a snapshot, it is hard for a typical user to find the relevant parts. We tried multiple filtering methods, based on graph algorithms, to remove the least significant information from the reports given to the user.

  • We experimented with new algorithms to compress and analyse huge memory snapshots, i.e., snapshots that are too big to fit in the computer's memory. Standard analyses on snapshots bigger than the available memory are too slow to run in practice because of random disk accesses. We therefore tried several compression and graph-reduction methods that shrink snapshots until they fit in memory without losing any information, reaching a 50x speedup in complete analysis time.

  • We implemented a new graph algorithm to merge sets of blocks in memory by the sets of roots they are reachable from. Such a computation was previously thought to be intractable in practice, but could actually be performed in our case on huge compressed snapshots in reasonable time.
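To make the last item concrete, here is a naive sketch of what "grouping blocks by the set of roots that reach them" means, on a toy heap of five blocks. It is not the algorithm implemented by the authors: it performs one traversal per root, which is precisely what becomes too costly on real snapshots. The heap layout and all names are invented for the example.

```ocaml
module IntSet = Set.Make (struct
  type t = int
  let compare (a : int) b = compare a b
end)

(* Maps whose keys are sets of roots. *)
module SetMap = Map.Make (IntSet)

(* Toy heap: block i points to the blocks listed in heap.(i). *)
let heap = [| [1; 2]; [3]; [3]; []; [2] |]
let roots = [0; 4]

(* reached_by.(b) is the set of roots from which block b is reachable. *)
let reached_by = Array.make (Array.length heap) IntSet.empty

(* Depth-first traversal that tags every block reachable from [root]. *)
let rec mark root b =
  if not (IntSet.mem root reached_by.(b)) then begin
    reached_by.(b) <- IntSet.add root reached_by.(b);
    List.iter (mark root) heap.(b)
  end

let () = List.iter (fun r -> mark r r) roots

(* Merge blocks that share exactly the same set of roots. *)
let groups =
  let acc = ref SetMap.empty in
  for b = Array.length heap - 1 downto 0 do
    let rs = reached_by.(b) in
    let old = try SetMap.find rs !acc with Not_found -> [] in
    acc := SetMap.add rs (b :: old) !acc
  done;
  !acc

let () =
  SetMap.iter
    (fun rs bs ->
      Printf.printf "roots {%s} -> blocks [%s]\n"
        (String.concat "," (List.map string_of_int (IntSet.elements rs)))
        (String.concat ";" (List.map string_of_int bs)))
    groups
```

On this toy heap, blocks 0 and 1 are reachable only from root 0, block 4 only from root 4, and blocks 2 and 3 from both roots, so three groups are produced.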

Advanced development tools for OCaml

Participants : Fabrice Le Fessant, Pierre Chambart [OCamlPro] , Michael Laporte [OCamlPro] .

In order to promote the use of OCaml in industrial contexts, we have worked on improving the tools that accompany OCaml:

  • We developed the first prototype of a native debugger for OCaml, based on the LLDB debugging framework on top of LLVM. For that, we first generated a full OCaml binding for the LLDB library, by parsing the C++ headers of the libraries and automatically generating OCaml and C++ stubs. We were then able to use the OCaml binding to develop several tools, ranging from a simple tool that displays the internal GC information of a finished OCaml application, to an almost complete debugger, which displays OCaml values using runtime type information added for memory profiling.

  • We also developed a new profiling framework for OCaml, called operf. The framework is composed of two tools: operf-micro can be used to run micro-benchmarks directly from inside modified OCaml compiler sources, while the operf-macro tool can be used to evaluate the impact of a new compiler optimization on a large set of OPAM packages.

  • Finally, we came up with new ideas for ocp-build, a generic building tool with OCaml-specific support, to improve the expressiveness of its package description language and to easily describe cross-compilation of OCaml packages.

Error diagnosis in Menhir parsers

Participant : François Pottier.

LR parsers are powerful and efficient, but traditionally have done a poor job of explaining syntax errors. Although it is easy to report where an error was detected, it seems difficult to explain what has been understood so far and what is expected next. The OCaml and CompCert compilers, until now, have offered little information to the user beyond the traditional “syntax error” message.

In 2003, Jeffery proposed associating a fixed diagnostic message with every state of the LR automaton (therefore ignoring the automaton's stack). This simple approach may seem tempting. However, a typical automaton has hundreds or thousands of states. Not all of them can trigger an error, but it is difficult to tell which can, and which cannot. Furthermore, for certain states, it is difficult (or even impossible) to write an accurate diagnostic message, because some vital contextual information resides in the stack, which Jeffery's method cannot access.
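The idea of a fixed message per state can be sketched as a simple lookup table. The state numbers, messages and function names below are hypothetical and do not come from any real automaton or tool.

```ocaml
(* Hypothetical table: one fixed message per error state. A real automaton
   has hundreds or thousands of states. *)
let message_of_state : int -> string option = function
  | 12 -> Some "After 'if', a parenthesized condition is expected."
  | 37 -> Some "A ';' is expected at the end of this statement."
  | _ -> None

(* The diagnostic depends only on the current state, never on the stack. *)
let report ~state ~line =
  match message_of_state state with
  | Some msg -> Printf.sprintf "Line %d: syntax error. %s" line msg
  | None -> Printf.sprintf "Line %d: syntax error." line

let () = print_endline (report ~state:37 ~line:3)
```

Because `report` receives only the state number, it cannot distinguish two error situations that reach the same state with different stack contents, which is the limitation described above.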

In 2015, François Pottier proposed a reachability algorithm for LR automata, which he implemented in the Menhir parser generator (see section 6.3). This algorithm determines which states can trigger an error and therefore require a diagnostic message. Furthermore, Pottier proposed two mechanisms for influencing where errors are detected. If used appropriately, these mechanisms make it easier (or possible) to write an accurate diagnostic message.

Pottier applied this approach to the C grammar in the front-end of the CompCert compiler, thereby allowing CompCert to produce better diagnostic messages when a C program is syntactically incorrect.

A short paper describing this work will be presented at JFLA 2016 [29]. A longer paper is in submission.

Improvements to Menhir

Participants : Frédéric Bour [independent consultant] , Jacques-Henri Jourdan, François Pottier, Yann Régis-Gianas [team πr2] , Gabriel Scherer.

In 2015, the Menhir parser generator (see section 6.3) was extended with many new features, several of which originated in the Merlin IDE for OCaml and were ported back into Menhir.

  • The parsers generated by Menhir are now incremental: they can be stopped and resumed at any point, at essentially no cost. This is exploited in Merlin, where the text is re-parsed after every keystroke.

  • The state of the parser can be inspected by the user. This allows building custom libraries, outside Menhir, for error diagnosis, error recovery, etc. This is exploited in Merlin, where a valid abstract syntax tree is built (and passed to the OCaml type-checker) even if the text contains syntax errors.

  • A reachability algorithm has been implemented (see section 7.4.4). It determines which states can trigger an error and therefore require a diagnostic message to be written. It is accompanied by several tools that help maintain the database of diagnostic messages as the grammar evolves.

  • Compatibility with ocamlyacc has been improved, in particular insofar as the computation of locations is concerned. This should ease porting the OCaml parser from ocamlyacc to Menhir, a transition that we envision making in the near future and that should in turn improve the quality of OCaml's syntax error messages.
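The notion of a parser that can be stopped and resumed at any point can be illustrated with a toy model, in which the parser's state is an ordinary immutable value. This is only a sketch of the idea and not Menhir's actual API. The grammar (sums of integers) and the names `checkpoint`, `offer` and `start` are assumptions made for the example.

```ocaml
type token = Int of int | Plus | Eof

(* The parser's state is a value: either it awaits one more token, or it has
   finished. Keeping such a value costs nothing and it can be resumed later. *)
type checkpoint =
  | InputNeeded of (token -> checkpoint)
  | Accepted of int
  | Rejected

(* Recognizes INT (PLUS INT)* EOF and computes the sum. *)
let rec expect_int acc =
  InputNeeded (function
    | Int n -> expect_plus_or_eof (acc + n)
    | Plus | Eof -> Rejected)

and expect_plus_or_eof acc =
  InputNeeded (function
    | Plus -> expect_int acc
    | Eof -> Accepted acc
    | Int _ -> Rejected)

let start = expect_int 0

(* Feed one token to a checkpoint and obtain the next checkpoint. *)
let offer checkpoint token =
  match checkpoint with
  | InputNeeded k -> k token
  | Accepted _ | Rejected -> checkpoint

let show = function
  | InputNeeded _ -> "input needed"
  | Accepted n -> Printf.sprintf "accepted %d" n
  | Rejected -> "rejected"

let () =
  (* Parse a prefix once and keep the resulting checkpoint. *)
  let after_prefix = List.fold_left offer start [Int 1; Plus; Int 2] in
  (* Resume the same checkpoint with two different continuations. *)
  let a = List.fold_left offer after_prefix [Eof] in
  let b = List.fold_left offer after_prefix [Plus; Int 10; Eof] in
  print_endline (show a);
  print_endline (show b)
```

An editor that re-parses after every keystroke can benefit from this style: the checkpoint reached just before the modified text is kept, and only the tokens after it are fed again.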