Section: New Results

Shared-memory concurrency

The Linux Kernel Memory Model

Participants : Luc Maranget, Jade Alglave [University College London & ARM Ltd], Paul McKenney [IBM Corporation], Andrea Parri [Sant'Anna School of Advanced Studies, Pisa, Italy], Alan Stern [Harvard University].

Modern multi-core and multi-processor computers do not follow the intuitive “sequential consistency” model, which would define a concurrent execution as an interleaving of the executions of its constituent threads and would require writes to shared memory to take effect instantaneously. This situation is due both to in-core optimisations, such as speculative and out-of-order execution of instructions, and to the presence of sophisticated (and cooperating) caching devices between processors and memory. Luc Maranget is taking part in an international research effort to define the semantics of the computers of the multi-core era, and more generally of shared-memory parallel devices or languages, with a clear initial focus on devices.
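To make the notion of sequential consistency concrete, the following minimal Python sketch enumerates every interleaving of the classic "store buffering" litmus test and collects the final register values. The test itself, the names THREAD0, THREAD1, interleavings and run, and the encoding of instructions as tuples are illustrative assumptions, not material from the work described here.

```python
from itertools import combinations

# Store-buffering litmus test; x and y are initially 0.
#   Thread 0: x = 1; r0 = y        Thread 1: y = 1; r1 = x
THREAD0 = [("write", "x", 1), ("read", "y", "r0")]
THREAD1 = [("write", "y", 1), ("read", "x", "r1")]

def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        it_a, it_b = iter(a), iter(b)
        yield [next(it_a) if i in slots else next(it_b) for i in range(n)]

def run(schedule):
    """Execute one interleaving against a single memory with instantaneous writes."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for kind, location, operand in schedule:
        if kind == "write":
            memory[location] = operand
        else:
            registers[operand] = memory[location]
    return registers["r0"], registers["r1"]

sc_outcomes = {run(s) for s in interleavings(THREAD0, THREAD1)}
print(sorted(sc_outcomes))  # prints [(0, 1), (1, 0), (1, 1)]
```

The outcome r0 = 0 and r1 = 0 never appears in this enumeration, yet it can be observed on common multi-core hardware such as x86 machines. Accounting for such behaviours is precisely what weak memory models are for.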

This year saw a publication on languages in an international conference. A multi-year effort to define a weak memory model for the Linux kernel has yielded a scholarly paper [18] presented at the Architectural Support for Programming Languages and Operating Systems (ASPLOS) conference in March 2018. The article describes a formal model, the Linux Kernel Memory Model (LKMM), which defines how Linux kernel programs are supposed to behave. The model, a cat model, can be simulated using the herd simulator, allowing programmers to experiment and develop intuitions. The model was tested against hardware and refined in consultation with Linux maintainers. Finally, the ASPLOS paper formalizes the fundamental law of the Read-Copy-Update synchronization mechanism and proves that one of its implementations satisfies this law. It is worth noting that the LKMM is now part of the Linux kernel source (in the tools/ section). Luc Maranget and his co-authors are the official maintainers of this document.

The ARMv8 and RISC-V memory models

Participants : Will Deacon [ARM Ltd], Luc Maranget, Jade Alglave [University College London & ARM Ltd].

Jade Alglave and Luc Maranget are working on a mixed-size version of the ARMv8 memory model. This model builds on the aarch64.cat model authored last year by Will Deacon (ARM Ltd). This ongoing work is subject to IP restrictions which we hope to lift next year.

Luc Maranget is an individual member of the memory model group of the RISC-V consortium (https://riscv.org/). Version 2.3 of the User-Level ISA Specification is now complete and should be released soon. This version features the first occurrence of a detailed memory model expressed in English, as well as its transliteration into Cat, authored by Luc Maranget.

Work on diy

Participant : Luc Maranget.

This year, new synchronisation primitives were added to the Linux kernel memory model and ARMv8 atomic instructions are now supported, among other improvements.

A more significant improvement is the introduction of mixed-size accesses. The tools can now handle a new view of memory, where memory is made up of elementary cells (typically bytes) that can be read or written as groups of contiguous cells (typically up to quadwords of 8 bytes). This preliminary work paves the way to the simulation of more elaborate memory models.
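The byte-cell view of memory can be sketched as follows. This is a minimal Python illustration, not the data structure used by the tools: the class name ByteMemory, the little-endian layout and the set of permitted access sizes are all assumptions made for the example.

```python
class ByteMemory:
    """Memory as an array of byte cells, accessed in contiguous groups."""

    SIZES = (1, 2, 4, 8)  # byte, halfword, word, quadword (assumed sizes)

    def __init__(self, size):
        self.cells = bytearray(size)

    def _check(self, address, size):
        if size not in self.SIZES:
            raise ValueError("unsupported access size")
        if address < 0 or address + size > len(self.cells):
            raise IndexError("access out of bounds")

    def write(self, address, value, size):
        """Write value over size contiguous cells (little-endian, an assumption)."""
        self._check(address, size)
        self.cells[address:address + size] = value.to_bytes(size, "little")

    def read(self, address, size):
        """Read size contiguous cells as one unsigned integer."""
        self._check(address, size)
        return int.from_bytes(self.cells[address:address + size], "little")

memory = ByteMemory(16)
memory.write(0, 0x1122334455667788, 8)  # one quadword write
memory.write(3, 0xFF, 1)                # a byte write overlapping it
print(hex(memory.read(0, 8)))           # prints 0x11223344ff667788
print(hex(memory.read(2, 2)))           # prints 0xff66
```

The point of the example is that a single read may combine cells written by several writes of different sizes, which is the situation a mixed-size memory model has to describe.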

Unifying axiomatic and operational weak memory models

Participants : Jean-Marie Madiot, Jade Alglave [University College London & ARM Ltd], Simon Castellan [Imperial College London].

Modern multi-processors optimize the running speed of programs using a variety of techniques, including caching, instruction reordering, and branch speculation. While those techniques are perfectly invisible to sequential programs, such is not the case for concurrent programs that execute several threads and share memory: threads do not share at every point in time a single consistent view of memory. A weak memory model offers only weak consistency guarantees when reasoning about the permitted behaviors of a program. Until now, there have been two kinds of such models, based on different mathematical foundations: axiomatic models and operational models.

Axiomatic models explicitly represent the dependencies between the program and memory actions. These models are convenient for causal reasoning about programs. They are also well-suited to the simulation and testing of hardware microprocessors.
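As an illustration of the axiomatic style, the sketch below represents one candidate execution of the store-buffering test as a graph of events and checks it against an acyclicity axiom. It is a toy rendition in Python under simplifying assumptions: the event names a to d, the helper is_acyclic, and the reduction of a TSO-like model to "drop write-to-read program-order pairs" are illustrative choices, not the actual models discussed here.

```python
def is_acyclic(edges):
    """Return True when the directed graph given as (source, target) pairs has no cycle."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return True
        if node in visiting:
            return False  # back edge: a cycle exists
        visiting.add(node)
        for successor in graph[node]:
            if not visit(successor):
                return False
        visiting.discard(node)
        done.add(node)
        return True

    return all(visit(node) for node in graph)

# Candidate execution in which both reads return the initial value 0.
# Thread 0: a: W x=1 ; b: R y=0      Thread 1: c: W y=1 ; d: R x=0
kind = {"a": "W", "b": "R", "c": "W", "d": "R"}
po = {("a", "b"), ("c", "d")}  # program order
fr = {("b", "c"), ("d", "a")}  # from-read: each read sees the initial value,
                               # so it precedes the write to the same location
# Reads-from and coherence add no edge among these four events here
# (reads take initial values, each location has a single write).

# Sequential consistency (simplified): program order and communication are acyclic.
sc_allowed = is_acyclic(po | fr)

# TSO-like relaxation (simplified): write-to-read program-order pairs are not preserved.
ppo_tso = {(s, t) for (s, t) in po if not (kind[s] == "W" and kind[t] == "R")}
tso_allowed = is_acyclic(ppo_tso | fr)

print(sc_allowed, tso_allowed)  # prints False True
```

The cycle a, b, c, d, a is what forbids this execution under sequential consistency; once the write-to-read edges are removed the cycle disappears and the execution becomes allowed.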

Operational models represent program states directly, and can thus be used to reason about programs: program logics become applicable, and the reasoning behind nondeterministic behavior is much clearer. This makes them preferable for reasoning about software.
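As an illustration of the operational style, the following Python sketch explores all executions of a small abstract machine in which each thread owns a FIFO store buffer, in the spirit of TSO-style architectures. The machine, the instruction encoding and the names PROGRAM, replace and tso_outcomes are assumptions made for the example, not the formalisation developed in this work.

```python
PROGRAM = (
    (("write", "x", 1), ("read", "y", "r0")),  # thread 0
    (("write", "y", 1), ("read", "x", "r1")),  # thread 1
)

def replace(items, index, value):
    """Return a copy of the tuple items with items[index] set to value."""
    return items[:index] + (value,) + items[index + 1:]

def tso_outcomes(program, locations):
    """Return the final register contents of every execution of the buffered machine."""
    initial = ((0,) * len(program),                             # program counters
               ((),) * len(program),                            # store buffers
               tuple((loc, 0) for loc in sorted(locations)),    # shared memory
               ())                                              # registers
    outcomes, seen, todo = set(), {initial}, [initial]
    while todo:
        pcs, buffers, memory, registers = todo.pop()
        successors = []
        for tid, code in enumerate(program):
            if buffers[tid]:
                # Flush the oldest buffered write of this thread to memory.
                location, value = buffers[tid][0]
                new_memory = dict(memory)
                new_memory[location] = value
                successors.append((pcs, replace(buffers, tid, buffers[tid][1:]),
                                   tuple(sorted(new_memory.items())), registers))
            if pcs[tid] < len(code):
                kind, location, operand = code[pcs[tid]]
                new_pcs = replace(pcs, tid, pcs[tid] + 1)
                if kind == "write":
                    # A write goes to the thread's own buffer, not to memory.
                    new_buffer = buffers[tid] + ((location, operand),)
                    successors.append((new_pcs, replace(buffers, tid, new_buffer),
                                       memory, registers))
                else:
                    # A read sees the newest buffered write to the location, else memory.
                    pending = [v for loc, v in buffers[tid] if loc == location]
                    value = pending[-1] if pending else dict(memory)[location]
                    new_registers = dict(registers)
                    new_registers[operand] = value
                    successors.append((new_pcs, buffers, memory,
                                       tuple(sorted(new_registers.items()))))
        if not successors:
            outcomes.add(registers)  # all threads finished, all buffers drained
        for state in successors:
            if state not in seen:
                seen.add(state)
                todo.append(state)
    return outcomes

results = tso_outcomes(PROGRAM, {"x", "y"})
print((("r0", 0), ("r1", 0)) in results)  # prints True
```

Because the machine state is explicit, the reason for the outcome r0 = 0 and r1 = 0 can be read off a trace: both writes are still sitting in their buffers when the reads execute.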

Jean-Marie Madiot has been collaborating with weak memory model expert Jade Alglave and concurrent game semantics researcher Simon Castellan in order to unify these styles, in a way that attempts to combine the best of both approaches. The first results are a formalisation of TSO-style architectures using partial-order techniques similar to the ones used in game semantics, and a proof of a stronger-than-state-of-the-art “data-race freedom” theorem: well-synchronised programs can assume a strong memory model. These results have been submitted for publication.

This is a first step towards tractable verification of concurrent programs, combining software verification using concurrent program logics, in the top layer, and hardware testing using weak memory models, in the bottom layer. Our hope is to leave no unverified gap between software and hardware, even (and especially) in the presence of concurrency.

Granularity control for parallel programs

Participants : Umut Acar, Vitaly Aksenov, Arthur Charguéraud, Adrien Guatto [Université Paris Diderot], Mike Rainey, Filip Sieczkowski [University of Wrocław].

This year, the DeepSea team continued their work on granularity control techniques for parallel programs.

A first line of research is based on the use of programmer-supplied asymptotic complexity functions, combined with runtime measurements. This work first appeared at PPoPP 2018 [16] in the form of a brief announcement, and was subsequently accepted for publication at PPoPP 2019 as a full paper.
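The idea of combining a programmer-supplied cost function with runtime measurements can be sketched as follows. This is a deliberately simplified Python illustration, not the published algorithm: the threshold KAPPA, the Estimator class, the single-constant prediction and the function controlled_sum are hypothetical, and the two recursive calls run one after the other where a real runtime would fork them.

```python
import time

KAPPA = 1e-4  # hypothetical target duration, in seconds, of a sequential chunk

class Estimator:
    """Predicts running time as constant * cost, refining the constant from measurements."""

    def __init__(self):
        self.constant = None  # seconds per unit of cost; unknown before any measurement

    def predicts_small(self, cost):
        return self.constant is not None and self.constant * cost <= KAPPA

    def report(self, cost, elapsed):
        self.constant = elapsed / max(cost, 1)

def controlled_sum(values, lo, hi, estimator):
    """Sum values[lo:hi], running sequentially whenever the work is predicted to be small."""
    cost = hi - lo  # programmer-supplied asymptotic cost: linear in the range size
    if cost <= 1 or estimator.predicts_small(cost):
        start = time.perf_counter()
        total = sum(values[lo:hi])
        estimator.report(cost, time.perf_counter() - start)
        return total
    mid = (lo + hi) // 2
    # A parallel runtime would execute these two calls as a fork-join pair.
    left = controlled_sum(values, lo, mid, estimator)
    right = controlled_sum(values, mid, hi, estimator)
    return left + right

data = list(range(10000))
print(controlled_sum(data, 0, len(data), Estimator()) == sum(data))  # prints True
```

The result does not depend on the measurements; only the size of the sequential chunks does, which is the quantity that granularity control aims to tune.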

A second line of research, known as heartbeat scheduling, is based on instrumenting the runtime system so that parallel function calls are initially executed as normal function calls, by pushing a frame on the stack, and can subsequently be promoted to independent threads. This research was presented at PLDI 2018 [14].
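The promotion mechanism can be illustrated by a small sequential simulation. The Python sketch below is an assumption-laden toy, not the runtime system from the paper: the function heartbeat_sum, the choice to promote the oldest pending frame, and the use of a step counter in place of a timer are all illustrative.

```python
from collections import deque

def heartbeat_sum(values, heartbeat):
    """Sum values by simulated divide-and-conquer; return (total, promotions)."""
    tasks = deque([(0, len(values))])  # independent tasks, each a range [lo, hi)
    total = 0
    promotions = 0
    while tasks:
        stack = [tasks.popleft()]  # frames of the task currently running
        steps = 0
        while stack:
            lo, hi = stack.pop()
            if hi - lo <= 1:
                total += sum(values[lo:hi])
            else:
                mid = (lo + hi) // 2
                stack.append((mid, hi))  # right branch: an ordinary frame, promotable later
                stack.append((lo, mid))  # left branch runs next, like a normal call
            steps += 1
            if steps % heartbeat == 0 and stack:
                # Heartbeat: promote the oldest pending frame to an independent task.
                tasks.append(stack.pop(0))
                promotions += 1
    return total, promotions

data = list(range(1000))
total, promotions = heartbeat_sum(data, heartbeat=50)
print(total == sum(data), promotions <= (2 * len(data) - 1) // 50)  # prints True True
```

Since a promotion happens at most once every heartbeat steps, the number of tasks created is bounded by the total work divided by the heartbeat, which keeps task-creation overhead proportionally small.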

Theory and analysis of concurrent algorithms

Participant : Vitaly Aksenov.

Vitaly Aksenov, in collaboration with Petr Kuznetsov (Télécom ParisTech) and Anatoly Shalyto (ITMO University), proved that no wait-free linearizable implementation of a stack using read, write, compare & swap and fetch & add operations can be help-free. This proof corrects a mistake in an earlier proof by Censor-Hillel et al. The result was published at the International Conference on Networked Systems (NETYS 2018) [17].

Vitaly Aksenov, in collaboration with Dan Alistarh (IST Austria) and Petr Kuznetsov (Télécom ParisTech), worked on performance prediction for coarse-grained locking. They describe a simple model that can be used to predict the throughput of coarse-grained lock-based algorithms. They show that their model works well for CLH locks, and thus can be expected to work for other popular lock designs such as TTAS or MCS. This work appeared as a brief announcement at PODC 2018 [16].
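The text above does not state the model itself. To give a feel for what a throughput prediction of this kind looks like, the Python sketch below computes a generic saturation estimate for threads that repeatedly take a lock, do some critical work, release the lock, and then do some parallel work. The formula, the function name predicted_throughput and its parameters are assumptions chosen for illustration and should not be read as the model of the brief announcement; lock acquisition costs in particular are ignored.

```python
def predicted_throughput(threads, critical, parallel):
    """Estimated operations per time unit for a loop: lock; critical work; unlock; parallel work.

    critical and parallel are the durations of the two phases, in the same time unit.
    """
    if threads < 1 or critical <= 0 or parallel < 0:
        raise ValueError("expected threads >= 1, critical > 0, parallel >= 0")
    unconstrained = threads / (critical + parallel)  # no thread ever waits for the lock
    saturated = 1 / critical                         # the lock is held at all times
    return min(unconstrained, saturated)

for n in (1, 2, 4, 8):
    print(n, predicted_throughput(n, critical=1, parallel=3))
```

With these hypothetical durations the estimate grows linearly up to four threads and stays flat afterwards, since the critical section then becomes the bottleneck.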

The aforementioned results by Vitaly Aksenov are also covered in his Ph.D. manuscript [11].