Section: New Software and Platforms


Until October 2014, ABS distributed stand-alone programs solving selected tasks in computational structural biology, including:

  • vorpatch and compatch: Modeling and Comparing Protein Binding Patches,

  • intervor: Modeling Macro-molecular Interfaces,

  • vorlume: Computing Molecular Surfaces and Volumes with Certificates,

  • ESBTL: the Easy Structural Biology Template Library.

This software has been completely repackaged within the Structural Bioinformatics Library (SBL), a C++ library developed in the scope of an Inria-supported ADT. The SBL will be released in early 2015. Below, we briefly review its spirit and contents.

The Structural Bioinformatics Library (SBL): overview. The Structural Bioinformatics Library (SBL) is a generic C++/Python library providing combinatorial, geometric and topological tools to solve problems in computational structural biology (CSB). Its design is meant to accommodate both the variety of models coding the physical and chemical properties of macro-molecular systems, and the variety of operations undertaken on these models. The models supported consist either of unions of balls (van der Waals models, solvent accessible models), or of representations of conformations based on Cartesian or internal coordinates (distances and angles between the atoms). The operations provided revolve around the problem of understanding the relationship between the structure and the function of macro-molecules and their complexes. They deal with complementary aspects: geometric, topological, and combinatorial methods are used to foster our understanding of bio-physical and biological properties. Software development in this context is especially challenging due to the interactions between these complex models and operations.
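To make the distinction between Cartesian and internal coordinates concrete, the following minimal Python sketch derives two internal coordinates (a distance and a valence angle) from Cartesian positions. It does not use the SBL: the function names and the hand-placed atoms are assumptions made for illustration only.

```python
import math


def distance(a, b):
    """Euclidean distance between two atoms given as (x, y, z) tuples."""
    return math.dist(a, b)


def valence_angle(a, b, c):
    """Angle in degrees at atom b, formed by the bonds b-a and b-c."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    cos_theta = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


# Three atoms placed by hand (illustrative coordinates, not real data).
atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# Internal coordinates: two bond lengths and the angle between the bonds.
internal = (
    distance(atoms[0], atoms[1]),
    distance(atoms[1], atoms[2]),
    valence_angle(*atoms),
)
```

A full internal-coordinate representation would also include dihedral angles, which are omitted here for brevity.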

To accommodate this complexity, software components of the SBL are organized into four categories:

  • SBL-APPLICATIONS: end-user applications solving specific applied problems.

  • SBL-CORE: low-level generic C++ classes templated by traits classes specifying C++ concepts. (The design has been guided by that of the Computational Geometry Algorithms Library (CGAL), see http://www.cgal.org.)

  • SBL-MODELS: C++ models matching the C++ concepts required to instantiate classes from SBL-CORE.

  • SBL-MODULES: C++ classes instantiating classes from the SBL-CORE with specific biophysical models from SBL-MODELS. A module may be seen as a black box transforming an input into an output. With modules, an application workflow consists of interconnected modules.

The SBL for end-users. End users will find in the SBL portable applications running on Linux and MacOS. These applications fall into the following categories:

  • Space Filling Models: applications dealing with molecular models defined by unions of balls. Current statistics are:

    • # classes: 151

    • # lines of C++/python: 65,000

    • # pages of documentation (user + reference manuals): 1000

  • Conformational Analysis: applications dealing with molecular flexibility. Current statistics are:

    • # classes: 110

    • # lines of C++/python: 49,000

    • # pages of documentation (user + reference manuals): 800

  • Data Analysis: applications to handle input data and results, using standard tools revolving around the XML file format (in particular the XPath query language). These tools automate data storage, parsing and retrieval, so that once calculations have been run with the applications, statistical analyses and plots are a handful of Python lines away.

  • Large assemblies: applications dealing with macro-molecular assemblies involving from tens to hundreds of macro-molecules.
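As an illustration of the Data Analysis category above, the following Python sketch retrieves values from an XML document with an XPath expression and computes a simple statistic. The XML layout (the results, atom and volume names) is purely hypothetical and does not reflect the actual output of the SBL applications; only the Python standard library is used.

```python
import statistics
import xml.etree.ElementTree as ET

# Hypothetical XML output: element and attribute names are invented
# for illustration and are not the schema produced by the SBL.
XML_OUTPUT = """
<results>
  <atom chain="A" name="CA"><volume>12.5</volume></atom>
  <atom chain="A" name="CB"><volume>20.0</volume></atom>
  <atom chain="B" name="CA"><volume>14.5</volume></atom>
</results>
"""


def volumes_for_chain(root, chain):
    """Select the per-atom volumes of one chain with an XPath expression."""
    path = f".//atom[@chain='{chain}']/volume"
    return [float(node.text) for node in root.findall(path)]


root = ET.fromstring(XML_OUTPUT.strip())

# Mean volume over the atoms of chain A (two atoms in this toy document).
mean_volume_a = statistics.mean(volumes_for_chain(root, "A"))
```

The standard library supports only a subset of XPath; richer queries would require a dedicated XPath engine.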

The SBL for developers. Development with the SBL may occur at two levels.

Low-level developments may use classes from SBL-CORE and SBL-MODELS. Such developments are equivalent to those based upon C++ libraries such as CGAL (http://www.cgal.org/) or the Boost C++ libraries (http://www.boost.org/), on which the SBL heavily relies. The SBL-CORE is organized into four sub-sections:

  • CADS: Combinatorial Algorithms and Data Structures.

  • GT: Computational geometry and computational topology.

  • CSB: Computational Structural Biology.

  • IO: Input / Output.

It should also be stressed that these packages implement algorithms not available elsewhere, or available in a non-generic guise. Due to the modular structure of the library, should valuable implementations be made available outside the SBL (e.g. in CGAL or boost), a substitution may occur.

Intermediate-level developments should be based upon modules, since modules allow applications to be developed without the burden of instantiating low-level classes. Once modules are available, designing an application merely consists of connecting modules.
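The idea of an application as a chain of connected modules can be sketched as follows. This is a minimal Python illustration and not the SBL API: the Module and Workflow classes, the two toy modules and the 1.4 probe radius are all assumptions chosen for the example.

```python
import math
from itertools import combinations


class Module:
    """Hypothetical black box: wraps a function mapping an input to an output."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def run(self, data):
        return self.func(data)


class Workflow:
    """Chain of modules: the output of each module feeds the next one."""

    def __init__(self, *modules):
        self.modules = modules

    def run(self, data):
        for module in self.modules:
            data = module.run(data)
        return data


def inflate(balls, probe_radius=1.4):
    """Expand each ball (x, y, z, r) by a probe radius (solvent accessible model)."""
    return [(x, y, z, r + probe_radius) for x, y, z, r in balls]


def count_intersecting_pairs(balls):
    """Count pairs of balls whose centers are closer than the sum of their radii."""
    return sum(
        1
        for b1, b2 in combinations(balls, 2)
        if math.dist(b1[:3], b2[:3]) < b1[3] + b2[3]
    )


# An application is obtained by connecting modules.
workflow = Workflow(
    Module("solvent-accessible-model", inflate),
    Module("pair-counter", count_intersecting_pairs),
)

# Two toy balls that are disjoint as given, but intersect once inflated.
van_der_waals_balls = [(0.0, 0.0, 0.0, 1.0), (3.0, 0.0, 0.0, 1.0)]
n_pairs = workflow.run(van_der_waals_balls)
```

In this sketch, swapping or inserting a module changes the application without touching the code of the other modules, which is the point of the modular design.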

Interoperability. The SBL is interoperable with existing molecular modeling systems, at several levels:

  • At the library level, our state-of-the-art algorithms (e.g. the computation of molecular surfaces and volumes) can be integrated within existing software (e.g. molecular dynamics software), by instantiating the required classes from SBL-CORE, or by using the adequate modules.

  • At the application level, our applications can easily be integrated within processing pipelines, since the formats used for input and output are standard ones. (For input, the PDB format can always be used. For output, our applications generate XML files.)

  • Finally, for visualization purposes, our applications generate outputs for the two reference molecular modeling environments, namely Visual Molecular Dynamics (http://www.ks.uiuc.edu/Research/vmd/) and PyMOL (http://www.pymol.org/).
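The application-level scheme described above (PDB files as input, XML files as output) can be illustrated by the following Python sketch, which reads ATOM records using the fixed-column layout of the PDB format and writes the selected fields to XML. The output schema (the atoms and atom names) and all function names are hypothetical; the actual XML produced by the SBL applications is not specified here.

```python
import xml.etree.ElementTree as ET


def make_atom_line(serial, name, residue, chain, resseq, x, y, z):
    """Build a PDB ATOM record (used here only to create sample input)."""
    return (
        f"{'ATOM':<6}{serial:>5} {name:<4} {residue:>3} {chain}{resseq:>4}"
        f"{'':4}{x:8.3f}{y:8.3f}{z:8.3f}"
    )


def parse_atom_line(line):
    """Extract a few fields of an ATOM record from their fixed columns."""
    return {
        "name": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21],
        "resseq": line[22:26].strip(),
        "x": f"{float(line[30:38]):.3f}",
        "y": f"{float(line[38:46]):.3f}",
        "z": f"{float(line[46:54]):.3f}",
    }


def pdb_to_xml(lines):
    """Convert the ATOM records of a PDB file to a (hypothetical) XML layout."""
    root = ET.Element("atoms")
    for line in lines:
        # Only ATOM records are handled in this sketch.
        if line.startswith("ATOM"):
            ET.SubElement(root, "atom", parse_atom_line(line))
    return ET.tostring(root, encoding="unicode")


# Sample input with invented coordinates.
pdb_lines = [
    make_atom_line(1, "N", "ALA", "A", 1, 11.104, 6.134, -6.504),
    make_atom_line(2, "CA", "ALA", "A", 1, 11.639, 6.071, -5.147),
    "END",
]
xml_text = pdb_to_xml(pdb_lines)
```

Since both ends of such a step use standard formats, it can be placed between any tools able to write PDB files and read XML files.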

Releases, distribution, and licence. The SBL will be released under a proprietary open source licence. In a nutshell, academic users can use and modify the code at their discretion, for private purposes, but distributing these changes or doing business with the SBL is forbidden. However, novel capabilities matching the design choices of the library will be welcome, and may be integrated.

The source code will be distributed from http://structural-bioinformatics-library.org/, as a tarball and via a git repository. Bugzilla will be used to handle user feedback and bug tracking.

The releases are scheduled as follows:

  • February 2015: applications from the space filling models group, and the accompanying low-level classes.

  • April 2015: applications from the conformational analysis group, and the accompanying low-level classes.

  • July 2015: applications from the large assemblies group, and the accompanying low-level classes.