Section: Highlights of the Year

In 2015, several achievements are worth noting in three areas: computer science, computational structural biology, and software.

Computer Science

Beyond Two-sample-tests: Localizing Data Discrepancies in High-dimensional Spaces

Reference: [17]

In a nutshell: A classical problem in statistics is to decide whether two populations exhibit a statistically significant difference, the so-called two-sample test (TST) problem. If they do, another classical problem is to assess the magnitude of the difference, the so-called effect size calculation. While various effect size calculations were available for univariate data, hardly any existed for multivariate data.

Assessment: In this work, we provide one of the very first (if not the first) effect size calculations for multivariate data. The method combines techniques from machine learning (regression) and computational topology (topological persistence).
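
For context, the univariate versions of the two classical problems named above can be sketched in a few lines. This is only an illustration of the textbook setting, not the multivariate method of [17]: it assumes a permutation test on the difference of means for the two-sample test and Cohen's d for the effect size, and the function names are hypothetical.

```python
import random
import statistics


def cohens_d(a, b):
    """Effect size: difference of means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def permutation_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis the group labels are exchangeable.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)])
                   - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # The +1 terms keep the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Two clearly separated univariate samples (illustrative data only).
sample_a = list(range(10))
sample_b = [x + 100 for x in range(10)]
p_value = permutation_test(sample_a, sample_b)
effect = cohens_d(sample_a, sample_b)
```

The p-value answers whether a difference exists, while the effect size says how large it is; the second quantity is the one that had no widely used multivariate counterpart.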

Computational Structural Biology

High Resolution Crystal Structures Leverage Protein Binding Affinity Predictions

Reference: [20]

In a nutshell: The binding affinity of two proteins forming a complex is a key quantity whose estimation from structural data has remained elusive, owing to the variety of protein binding modes. In this work, we present sparse models using up to five variables describing enthalpic and entropic variations upon binding, and a cross-validation-based model selection procedure identifying the best sparse models built from a subset of these variables.

Assessment: Our estimation method ranks among the top two or three known so far, and is possibly the most accurate when applied to high-resolution crystal structures. One of its key limitations, shared with competing methods, is that it requires the crystal structures of both partners and of the complex. This limitation motivates our work on energy landscapes, see below.
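
The idea of selecting a sparse model by cross-validation can be illustrated with a minimal sketch. It assumes ordinary least-squares linear models and an exhaustive search over small variable subsets; the data are synthetic, the five columns do not correspond to the actual variables of [20], and all function names are hypothetical.

```python
import random
from itertools import combinations


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit(rows, y, cols):
    """Least-squares fit with intercept on the chosen columns (normal equations)."""
    Z = [[1.0] + [row[c] for c in cols] for row in rows]
    m = len(cols) + 1
    A = [[sum(z[i] * z[j] for z in Z) for j in range(m)] for i in range(m)]
    b = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(m)]
    return solve(A, b)


def predict(coef, row, cols):
    return coef[0] + sum(w * row[c] for w, c in zip(coef[1:], cols))


def cv_error(rows, y, cols, folds=5):
    """Mean squared prediction error estimated by k-fold cross-validation."""
    n = len(rows)
    total = 0.0
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        held_out = [i for i in range(n) if i % folds == f]
        coef = fit([rows[i] for i in train], [y[i] for i in train], cols)
        total += sum((predict(coef, rows[i], cols) - y[i]) ** 2 for i in held_out)
    return total / n


def select_sparse_model(rows, y, max_vars):
    """Return the variable subset (at most max_vars columns) with the lowest CV error."""
    n_vars = len(rows[0])
    subsets = [c for k in range(1, max_vars + 1)
               for c in combinations(range(n_vars), k)]
    return min(subsets, key=lambda cols: cv_error(rows, y, cols))


# Synthetic data: five candidate variables, the response depends on columns 0 and 2 only.
rng = random.Random(0)
rows = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(30)]
y = [1.0 + 2.0 * r[0] - 3.0 * r[2] + rng.gauss(0, 0.05) for r in rows]
best_subset = select_sparse_model(rows, y, max_vars=2)
```

Scoring each candidate subset on held-out data rather than on the training fit is what keeps the search from always preferring the largest model.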

Unveiling Contacts within Macro-molecular assemblies by solving Minimum Weight Connectivity Inference Problems

Reference: [14]

In a nutshell: Following the 2002 Nobel Prize in Chemistry awarded to Fenn and Tanaka, and the recent developments led in particular by Carol Robinson (Oxford), native mass spectrometry is about to become a technique of major importance in structural biology, providing information on large assemblies (more than 10 subunits) studied in solution. One key question is how to infer pairwise contacts between subunits from native mass spectrometry data.

Assessment: In this work, we provide a method to predict pairwise contacts between subunits of a large assembly, based on the composition of oligomers. The method is based on a mixed integer linear program, and essentially doubles the prediction performance of the method developed by Robinson et al.
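
To make the inference problem concrete, here is a small sketch of an unweighted variant under a stated assumption: each observed oligomer must induce a connected subgraph of the contact graph, and the smallest set of contacts satisfying this is sought. The sketch uses brute-force enumeration instead of the mixed integer linear program mentioned above, so it is only usable on toy instances; the function names and the example assembly are hypothetical.

```python
from itertools import combinations


def is_connected(nodes, edges):
    """Check that the subgraph induced by `nodes` is connected (non-empty node set)."""
    nodes = set(nodes)
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adjacency[u].add(v)
            adjacency[v].add(u)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def min_connectivity(subunits, oligomers):
    """Smallest edge set such that every oligomer induces a connected subgraph."""
    candidates = list(combinations(sorted(subunits), 2))
    # Try edge sets by increasing size, so the first feasible one is minimum.
    for k in range(len(candidates) + 1):
        for edges in combinations(candidates, k):
            if all(is_connected(o, edges) for o in oligomers):
                return list(edges)
    return None


# Toy assembly with four subunits and three observed oligomers.
oligomers = [{"A", "B"}, {"B", "C"}, {"A", "B", "C", "D"}]
contacts = min_connectivity("ABCD", oligomers)
```

The dimers force the contacts A-B and B-C, while the position of D remains ambiguous with this little data; several minimum solutions can exist, and the enumeration simply returns the first one it finds.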

Hybridizing Rapidly Growing Random Trees and Basin Hopping Yields an Improved Exploration of Energy Landscapes

Reference: [22]

In a nutshell: Energy landscapes of biomolecular systems encode their emergent thermodynamic and kinetic properties, so that their exploration is a question of paramount importance. This task requires in particular finding (metastable) states and their occupancy probabilities. Landscape exploration methods fall into two categories: continuous methods related to molecular dynamics, and discrete methods related to Monte Carlo sampling.

Assessment: In this work, we present a discrete sampling method combining features of robotics inspired methods (rapidly expanding random trees), and of biophysics inspired methods (basin hopping). Our hybrid algorithm outperforms contenders significantly. It is possibly one of the most efficient sampling methods for energy landscapes known to date, but making such a statement will require thorough testing on a variety of systems. The method may have a major impact if we manage to qualify the generated conformational ensembles from a thermodynamic standpoint.
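
As background for one of the two ingredients, plain basin hopping can be sketched as follows. This is not the hybrid algorithm of [22]: the tree-based component is omitted, the landscape is a one-dimensional double-well toy function chosen for illustration, and the local minimizer is a crude derivative-free search. All names and parameter values are assumptions.

```python
import math
import random


def energy(x):
    """Toy double-well landscape: a local minimum near 1.35, the global one near -1.47."""
    return x ** 4 - 4 * x ** 2 + x


def quench(f, x, h=0.1, tol=1e-8):
    """Crude local minimization: step to the better neighbor, halve the step when stuck."""
    while h > tol:
        cand = min((x - h, x + h), key=f)
        if f(cand) < f(x):
            x = cand
        else:
            h /= 2
    return x


def basin_hopping(f, x0, n_steps=200, jump=2.0, temperature=1.0, seed=0):
    """Alternate random jumps and quenches; accept moves with a Metropolis criterion."""
    rng = random.Random(seed)
    current = quench(f, x0)
    best = current
    for _ in range(n_steps):
        # Perturb the current minimum, then relax to the nearest local minimum.
        candidate = quench(f, current + rng.uniform(-jump, jump))
        delta = f(candidate) - f(current)
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
        if f(current) < f(best):
            best = current
    return best


# Start in the shallower right-hand well and let the search find the deeper one.
lowest = basin_hopping(energy, x0=1.5)
```

Because every candidate is quenched before the acceptance test, the walk effectively moves between local minima rather than between arbitrary conformations, which is what makes the method a discrete sampler.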

Conformational Ensembles and Sampled Energy Landscapes: Analysis and Comparison

Reference: [16]

In a nutshell: This paper presents novel methods to analyze conformational ensembles and sampled energy landscapes, using techniques from optimal transportation theory and computational topology.

Assessment: The proposed methods significantly enrich those classically used in biophysics, and the work triggered a collaboration with David Wales (Cambridge), one of the leading scientists on energy landscapes.

The Structural Bioinformatics Library

We released the Structural Bioinformatics Library, a library whose main features are detailed below.