Section: New Software and Platforms

Platforms and toolboxes

A goal of the team is to facilitate the interplay between tools for biological data analysis and integration. Our tools aim at guiding the user to progressively reduce the space of models (families of sequences of genes or proteins, families of key actors involved in a system response, dynamical models) that are compatible with both knowledge and experimental observations.

Most of our tools are developed in collaboration with the GenOuest resource and data center hosted in the IRISA laboratory, including their computer facilities [more info]. It is worth integrating them into larger dedicated environments to benefit from the expertise of other research groups.

  • The BioShadock repository allows one to share the different docker containers that we are developing [website].

  • The Inria Chile Mobyle portal gathers some of the tools that were developed in collaboration with Dyliss, such as meneco, shogen and lombarde [website].

  • The bioASP portal gathers most of the ASP-based Python packages that we are developing in collaboration with Potsdam University [website].

  • The GenOuest galaxy portal now provides access to most tools for integrative biology and sequence annotation (access on demand).

AuReMe - Traceable reconstruction of metabolic networks

The toolbox AuReMe allows for the Automatic Reconstruction of Metabolic networks based on the combination of multiple heterogeneous data and knowledge sources. Since 2016, the workflow has been made available as a Docker image to facilitate its distribution among the scientific community [web page].

  • The Model-management PADmet module reconciles genomic and metabolic network information used to produce the metabolic model within a local database that traces all steps of the reconstruction process and connects the software in the pipeline. This toolbox was completely redesigned in 2016 [package].

  • The meneco python package fills the gaps of a metabolic network by using a qualitative description of biosynthetic capacities; the problem is viewed as a combinatorial optimization problem encoded as an Answer Set Programming problem [87] [64] [python package].

  • The shogen python package aligns a genome and a metabolic network to identify genome units that contain a large density of genes coding for enzymes that regulate successive reactions of metabolic pathways; the problem is also encoded as an ASP program [62] [python package].

  • The Manual curation assistance PADmet module supports curating the reported metabolic networks and modifying metadata [package].

  • The Wiki-export PADmet module exports the metabolic network and its functional genomic unit as a local wiki platform, allowing user-friendly investigation of the network together with the main steps used to reconstruct it. It was developed in 2016 [package].
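
The qualitative gap-filling idea behind meneco can be illustrated with a minimal sketch. It is written in plain Python rather than Answer Set Programming and does not use the meneco API; the function names `scope` and `minimal_completions`, the reaction encoding and the toy data are assumptions made for illustration only. A metabolite is considered producible when it belongs to the set reachable from the seed compounds, and the sketch enumerates, by brute force, the smallest sets of candidate reactions that make all targets producible.

```python
from itertools import combinations

def scope(reactions, seeds):
    """Return every metabolite producible from the seeds (qualitative scope).

    `reactions` is a list of (reactants, products) pairs of tuples.
    """
    produced = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            # A reaction fires once all of its reactants are available.
            if set(reactants) <= produced and not set(products) <= produced:
                produced |= set(products)
                changed = True
    return produced

def minimal_completions(draft, repair_db, seeds, targets):
    """Enumerate the smallest sets of repair reactions restoring all targets.

    Brute force over subsets by increasing size: fine for a toy example,
    whereas the real problem is combinatorial and calls for a solver.
    """
    targets = set(targets)
    for size in range(len(repair_db) + 1):
        found = [
            set(names)
            for names in combinations(sorted(repair_db), size)
            if targets <= scope(draft + [repair_db[n] for n in names], seeds)
        ]
        if found:
            return found
    return []

# Hypothetical toy network: D cannot be produced from A in the draft.
draft = [(("A",), ("B",)), (("C",), ("D",))]
repair_db = {
    "r1": (("B",), ("C",)),
    "r2": (("A",), ("C",)),
    "r3": (("X",), ("D",)),
}
completions = minimal_completions(draft, repair_db, seeds={"A"}, targets={"D"})
print(completions)
```

In this toy case either `r1` or `r2` alone restores the production of D, while `r3` is useless because its reactant X is never available.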

Filtering interaction networks with graph-based optimization criteria

The goal is to offer a toolbox for the reconstruction of networks from genome, literature and large-scale observation data (expression data, metabolomics...) in order to elucidate the main regulators of an observed phenotype. Most of the optimization issues are addressed with Answer Set Programming.

  • The lombarde package enables the filtering of transcription-factor/binding-site regulatory networks with mutual information reported by the response to environmental perturbations. The many false-positive interactions are filtered out according to graph-based criteria. Knowledge about regulatory modules such as operons or the output of the shogen package can be taken into account [48] [13] [web server].

  • The KeyRegulatorFinder package searches for key regulators of lists of molecules (such as metabolites, enzymes or genes) by taking advantage of knowledge databases in cell metabolism and signaling. The complete information is transcribed into a large-scale interaction graph which is filtered to report the most significant upstream regulators of the considered list of molecules [61] [package].

  • The powerGrasp python package provides an implementation of graph compression methods oriented toward visualization and based on power graph analysis [package].

  • The iggy package enables the repair of an interaction graph with respect to expression data. It proposes a range of different operations for altering experimental data and/or a biological network in order to re-establish their mutual consistency, an indispensable prerequisite for automated prediction. To accomplish repair and prediction, we take advantage of the distinguished modeling and reasoning capacities of Answer Set Programming [5] [93] [Python package] [web server].
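
The notion of consistency between a signed interaction graph and expression data can be sketched as follows. This is a simplified illustration in plain Python, not the iggy API or its ASP encoding: the function name `unexplained_nodes`, the edge encoding and the toy data are assumptions, and only one basic rule is checked, namely that the observed variation of every non-input node is explained by at least one observed regulator.

```python
def unexplained_nodes(edges, observations, inputs=()):
    """Return the observed nodes whose variation no regulator can explain.

    `edges` holds (source, target, sign) triples with sign in {+1, -1};
    `observations` maps a node to +1 (increase) or -1 (decrease).
    Unobserved regulators are ignored here, which is a simplification.
    """
    unexplained = []
    for node, sign in observations.items():
        if node in inputs:
            continue  # input nodes are perturbed externally
        influences = [
            edge_sign * observations[source]
            for source, target, edge_sign in edges
            if target == node and source in observations
        ]
        if sign not in influences:
            unexplained.append(node)
    return sorted(unexplained)

# Hypothetical toy graph: a activates b, a inhibits c, b activates c.
edges = [("a", "b", 1), ("a", "c", -1), ("b", "c", 1)]
consistent = unexplained_nodes(edges, {"a": 1, "b": 1, "c": -1}, inputs={"a"})
conflicting = unexplained_nodes(edges, {"a": 1, "b": -1, "c": -1}, inputs={"a"})
print(consistent, conflicting)
```

A repair step would then look for a small set of changes to the data or to the graph that empties this list of unexplained nodes.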

Caspo - Studying synchronous boolean networks

The caspo pipeline is dedicated to automated reasoning on logical signaling networks. The main underlying issue is that, once inherent experimental noise is taken into account, many different logical networks can be compatible with a set of experimental observations.

The software provides an easy-to-use environment for the study of synchronous logical (Boolean) networks. In 2016, the tool was redesigned to enhance its functionalities and integrated in a docker container to facilitate its use on any platform [86] [28] [python package and docker container].

  • The caspo-learn module performs an automated inference of logical networks from the observed response to different perturbations (phosphoproteomics datasets). It identifies admissible large-scale logic models, saving considerable effort and avoiding a priori bias. It is also included in the cellNopt package (http://www.cellnopt.org/) [7] [94].

  • The caspo-classify, predict and visualize modules allow for classifying a family of boolean networks with respect to their input-output predictions [7].

  • The caspo-design module designs experimental perturbations which would allow for an optimal discrimination of rival models in a family of boolean networks [95].

  • The caspo-control module identifies key players of a family of networks: it computes robust intervention strategies (i.e. inclusion-minimal sets of knock-ins and knock-outs) that force a set of target species or compounds into a desired steady state [73].

  • The caspo-timeseries module was designed by our colleagues from LRI as an extension of the caspo pipeline to take into account time-series observation datasets in the learning procedure [23] [python package and docker container].
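
The classification of a family of Boolean networks by input-output behavior can be sketched in a few lines of Python. This is not the caspo API: each network is reduced here to a hypothetical function from an input assignment to a single readout value, and the names `io_signature` and `classify` are assumptions made for illustration. Networks that produce the same readout for every combination of inputs fall into the same class.

```python
from collections import defaultdict
from itertools import product

def io_signature(network, inputs):
    """Evaluate a network on every Boolean assignment of its inputs."""
    return tuple(
        network(dict(zip(inputs, values)))
        for values in product([False, True], repeat=len(inputs))
    )

def classify(networks, inputs):
    """Group network names that share the same input-output behavior."""
    groups = defaultdict(list)
    for name, network in networks.items():
        groups[io_signature(network, inputs)].append(name)
    return sorted(groups.values())

# Hypothetical family: n1 and n2 differ in structure but not in behavior.
family = {
    "n1": lambda s: s["a"] and s["b"],
    "n2": lambda s: not (not s["a"] or not s["b"]),
    "n3": lambda s: s["a"] or s["b"],
}
classes = classify(family, inputs=["a", "b"])
print(classes)
```

Enumerating all input assignments is exponential in the number of inputs, so this only conveys the principle on small examples.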

cadbiom - Building and analyzing the asynchronous dynamics of enriched logical networks

Based on guarded transition semantics, the cadbiom software provides a formal framework to support the modeling of biological systems such as cell signaling networks. It allows investigating synchronization events in biological networks. In 2016, the tool was integrated in a docker container in order to facilitate its use on any platform [49] [docker container] [web server].

  • The cadbiom graphical interface is useful to build and study moderate-size models. It provides means for model exploration, simulation and checking. For large-scale models, the graphical interface makes it possible to focus on specific nodes of interest.

  • The cadbiom API makes it possible to load a model (including large-scale ones), perform static analysis (exploration, frontier computation, statistics, and dependence graph computation) and check temporal properties on a finite horizon in the future or in the past.

  • Exploring large-scale knowledge repositories: a main feature of cadbiom is that the large-scale PID repository (about 10,000 curated interactions) has been automatically translated into the cadbiom formalism. Therefore, the API allows for computing the upstream regulators of any set of genes based on this large-scale repository.
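
As an illustration of what computing upstream regulators means on an interaction repository, here is a minimal sketch based on plain graph reachability. It does not use the cadbiom API or its guarded transition formalism; the function name `upstream_regulators`, the edge list and the molecule names are hypothetical.

```python
from collections import deque

def upstream_regulators(edges, targets):
    """Return every node from which at least one target can be reached.

    `edges` is an iterable of (source, target) pairs; the search walks
    the graph backwards, breadth first, starting from the targets.
    """
    predecessors = {}
    for source, target in edges:
        predecessors.setdefault(target, set()).add(source)
    seen = set(targets)
    queue = deque(targets)
    while queue:
        node = queue.popleft()
        for parent in predecessors.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen - set(targets)

# Hypothetical interactions, not taken from PID.
interactions = [
    ("kinase", "TF1"),
    ("TF1", "geneA"),
    ("TF2", "geneB"),
    ("geneA", "geneC"),
]
regulators = upstream_regulators(interactions, {"geneA"})
print(sorted(regulators))
```

A dynamical model adds conditions on when each interaction can fire, so the set returned by this sketch is only an over-approximation of the regulators that matter in practice.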

Protomata - Expressive pattern discovery on protein sequences

Protomata is a machine learning suite for the inference of automata characterizing (functional) families of proteins from available sequences. Based on a new kind of alignment, termed partial and local, it learns precise characterizations of the families, beyond the scope of classical sequence patterns such as PSSM, Profile HMM, or Prosite Patterns, allowing the prediction of new family members with a high specificity.

Protomata gives access to the three main modules as stand-alone programs, which are also integrated in a single workflow protomata-learner:

  • Paloma builds partial local multiple alignments;

  • Protobuild infers automata from these alignments;

  • Protomatch and Protoalign scan, parse and align new sequences based on the automata inferred previously. This module was improved in 2016 by embedding new options to score the sequences with respect to all accepting paths (Forward score) in addition to the scoring module based on the best path (Viterbi score). More generally, we have worked on the efficiency of the automata's weighting scheme based on the state-of-the-art schemes used for profile HMMs.
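
The difference between the Viterbi score (best path) and the Forward score (all paths) can be made concrete on a toy hidden Markov model. The sketch below is not Protomata code: the function name `path_score`, the two-state model and its probabilities are hypothetical, and the model is a generic HMM rather than an automaton learned by Protobuild. The same dynamic program yields either score depending on whether path weights are combined with `max` or with `sum`.

```python
def path_score(sequence, start, trans, emit, combine):
    """Score a non-empty sequence against a small HMM.

    combine=max keeps only the best path (Viterbi score);
    combine=sum accumulates all paths (Forward score).
    """
    states = list(start)
    # Weight of being in each state after emitting the first symbol.
    current = {s: start[s] * emit[s].get(sequence[0], 0.0) for s in states}
    for symbol in sequence[1:]:
        current = {
            s: emit[s].get(symbol, 0.0)
            * combine(current[r] * trans[r].get(s, 0.0) for r in states)
            for s in states
        }
    return combine(current.values())

# Hypothetical two-state model (M: match-like state, I: insert-like state).
start = {"M": 0.5, "I": 0.5}
trans = {"M": {"M": 0.5, "I": 0.5}, "I": {"M": 0.5, "I": 0.5}}
emit = {"M": {"A": 0.5, "C": 0.5}, "I": {"A": 0.25, "C": 0.75}}

viterbi = path_score("AC", start, trans, emit, combine=max)
forward = path_score("AC", start, trans, emit, combine=sum)
print(viterbi, forward)
```

The Forward score is never smaller than the Viterbi score, since it adds the weight of every other path to that of the best one. A practical implementation would work in log space to avoid numerical underflow on long sequences.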

The suite is completed by many tools to handle or visualize data and can be used online via a [web interface].

Logol - Complex pattern modelling and matching

The Logol toolbox is a Swiss Army knife for pattern matching on DNA/RNA/Protein sequences, using a high-level grammatical formalism to permit a large expressivity for patterns [54]. A Logol pattern consists of a complex combination of motifs (such as degenerate strings) and structures (such as imperfect stem-loops or repeats). Compared to other specialized pattern matching tools, some of the key features of Logol are the possibilities to divide a pattern description into several sub-patterns, to enable the use of ambiguous models or to permit the inclusion of negative conditions in a pattern definition. Possible fields of application are the detection of mutated binding sites [32] or stem-loop identification (e.g. in CRISPR (http://crispi.genouest.org/) [10]) [web page].

  • The Graphical designer allows a user to iteratively build a complex pattern based on basic graphical patterns. The associated grammar file is an export of the graphical designer. In 2015, the efficiency of the tool was improved by slight evolutions of the underlying grammar.

  • The LogolMatch parser takes as input a biological (nucleic or amino acid) sequence and a grammar file (i.e. a pattern). It combines a grammar analyzer, a sequence analyzer and a Prolog library. It returns a file containing all the occurrences of the pattern in the sequence with their parsing details.

  • Full-genome analysis and connections to biological databases have been made available recently.
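
To give an idea of the kind of structure such a pattern describes, the following sketch searches a DNA sequence for imperfect stem-loops. It is hand-written Python, not Logol grammar syntax and not the LogolMatch engine; the function name `find_stem_loops`, its parameters and the example sequence are assumptions made for illustration. A hit is a stem, followed by a loop of bounded length, followed by the reverse complement of the stem with a bounded number of mismatches.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def find_stem_loops(seq, stem_len, loop_min, loop_max, max_mismatches=0):
    """List (start, loop_length, mismatches) for every stem-loop found."""
    hits = []
    for start in range(len(seq)):
        left = seq[start:start + stem_len]
        if len(left) < stem_len:
            break  # not enough sequence left for a stem
        expected = reverse_complement(left)
        for loop_len in range(loop_min, loop_max + 1):
            right_start = start + stem_len + loop_len
            right = seq[right_start:right_start + stem_len]
            if len(right) < stem_len:
                break  # longer loops would run past the end as well
            mismatches = sum(a != b for a, b in zip(expected, right))
            if mismatches <= max_mismatches:
                hits.append((start, loop_len, mismatches))
    return hits

# Hypothetical sequence: stem GCAC, loop AAA, then GTGC closing the stem.
sequence = "TTGCACAAAGTGCTT"
exact = find_stem_loops(sequence, stem_len=4, loop_min=3, loop_max=5)
relaxed = find_stem_loops(sequence, 4, 3, 5, max_mismatches=1)
print(exact, relaxed)
```

Allowing one mismatch reports an additional, imperfect candidate, which illustrates why ambiguous models increase both sensitivity and the number of occurrences to inspect.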