

Section: New Results

Protein structures

Protein sequence alignment

In comparative protein modeling, the quality of a template-based model depends heavily on the quality of the initial alignment between a protein of unknown structure and the various template proteins whose tertiary structure is available in the Protein Data Bank (PDB). Although pairwise sequence alignment has been solved for more than three decades, there remains a large discrepancy between the accuracy of the best sequence alignment between two amino acid sequences, as produced by the Needleman-Wunsch or Smith-Waterman algorithms, and that of the best structural alignment between two protein X-ray structures, as produced by software such as Dali, Ce, Topofit, etc. To improve the quality of initial alignments in template modeling, one can integrate valuable information from an ensemble of generated suboptimal alignments, that is, alignments whose score is below the best possible score. In collaboration with P. Clote (Boston College/Digiteo) [26], we presented a novel algorithm to produce suboptimal pairwise alignments.
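To make the notion of a suboptimal alignment concrete, the following minimal sketch computes the optimal global alignment score with the standard Needleman-Wunsch recurrence and scores an arbitrary gapped alignment for comparison. It is not the algorithm of [26]. The function names and the scoring parameters (match, mismatch, and linear gap penalty) are assumptions chosen for illustration; real protein alignments would normally use a substitution matrix.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Scoring parameters are illustrative assumptions (linear gap penalty).
    """
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] with the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned with a gap
            left = curr[j - 1] + gap  # b[j-1] aligned with a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def alignment_score(row_a, row_b, match=1, mismatch=-1, gap=-1):
    """Score a given gapped alignment, with '-' marking a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


if __name__ == "__main__":
    best = needleman_wunsch_score("ACGT", "ACT")
    candidate = alignment_score("ACGT", "A-CT")
    # The candidate is suboptimal when its score is below the optimum.
    print(best, candidate, candidate < best)
```

Under this scoring, any alignment whose score falls below the value returned by `needleman_wunsch_score` is suboptimal in the sense used above.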

Protein-protein interaction:

A protein-protein docking procedure traditionally consists of two successive tasks: a search algorithm generates a large number of candidate solutions, and a scoring function then ranks them in order to extract a native-like conformation. We have already demonstrated that an accurate scoring function can be optimized using Voronoi constructions and a defined set of parameters. However, the precision of such a function is still not sufficient for large-scale exploration of the interactome.

Another geometric construction was also tested: the Laguerre tessellation. Like the Voronoi construction, to which it is related, it allows fast computation without losing the intrinsic properties of the biological objects, and it was expected to better represent the physico-chemical properties of the partners. We compared the two constructions.
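The difference between the two constructions can be illustrated with a minimal sketch. A Voronoi cell collects the points closest to a site in Euclidean distance, whereas a Laguerre (power) cell uses the power distance, that is, the squared distance to the site center minus the squared site radius. The function names, the site representation as (center, radius) pairs, and the numeric values are assumptions for illustration only and do not reproduce the implementation used in this work.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def voronoi_owner(point, sites):
    """Index of the site whose center is nearest to the point.

    Each site is a (center, radius) pair; the radius is ignored here.
    """
    return min(range(len(sites)),
               key=lambda i: squared_distance(point, sites[i][0]))


def laguerre_owner(point, sites):
    """Index of the site with the smallest power distance to the point.

    Power distance: |point - center|^2 - radius^2, so larger sites
    claim a larger share of space.
    """
    return min(range(len(sites)),
               key=lambda i: squared_distance(point, sites[i][0])
               - sites[i][1] ** 2)


if __name__ == "__main__":
    # Hypothetical sites: a small one at the origin, a large one at x = 10.
    sites = [((0.0, 0.0, 0.0), 1.0), ((10.0, 0.0, 0.0), 5.0)]
    probe = (4.0, 0.0, 0.0)
    print(voronoi_owner(probe, sites), laguerre_owner(probe, sites))
```

When all radii are equal, the two assignments coincide; they differ only when the sites carry different weights.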

We also introduced a hierarchical analysis, obtained by clustering, of the original three-dimensional structures of complexes used for learning. Using this clustering model, we can optimize the scoring functions and obtain more accurate solutions. The resulting scoring function has been tested on CAPRI scoring ensembles, and a conformation of at least acceptable quality is found among the top 10 ranked solutions in all cases. This work was part of the thesis of Thomas Bourquard, defended in 2009.
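The success criterion used above, a conformation of at least acceptable quality among the top 10 ranked solutions, can be sketched as follows. The quality labels follow the usual CAPRI categories (incorrect, acceptable, medium, high); the function name, the data layout as (score, quality) pairs, and the convention that a higher score is better are assumptions made for this illustration.

```python
# Ordering of the CAPRI quality categories, from worst to best.
CAPRI_RANK = {"incorrect": 0, "acceptable": 1, "medium": 2, "high": 3}


def has_hit_in_top_k(scored_models, k=10, min_quality="acceptable"):
    """Return True if a model of at least min_quality is in the top k.

    scored_models: iterable of (score, quality_label) pairs.
    Assumption: a higher score means a better-ranked model.
    """
    ranked = sorted(scored_models, key=lambda m: m[0], reverse=True)
    threshold = CAPRI_RANK[min_quality]
    return any(CAPRI_RANK[quality] >= threshold
               for _, quality in ranked[:k])


if __name__ == "__main__":
    # Hypothetical scores and labels, for illustration only.
    models = [(0.9, "incorrect"), (0.8, "medium"), (0.1, "high")]
    print(has_hit_in_top_k(models, k=2))
```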

A strong emphasis was recently placed on the design of efficient filters for complexes. To achieve this goal, we focused on collaborative filtering methods, which are state-of-the-art machine learning approaches, combined with our genetic algorithm [9].

We have also proposed an approach that improves the predictions made by Hex, a state-of-the-art docking tool developed by Inria Nancy. We applied Voronoi fingerprints to the conformations output by Hex, learned how to rank them, and tested new ranking strategies. The resulting rankings improve on the initial ranking of Hex [33], [23].

We also decided to extend these techniques to the analysis of protein-nucleic acid complexes. The first preliminary developments and tests were performed by Adrien Guilhot during his two-month M1 internship.

Transmembrane β-barrels:

We have recently proposed an algorithm [31] that classifies Transmembrane β-Barrel (TMB) proteins and predicts their structure. It first uses a simple probabilistic model to filter out the proteins and strands that are not β-barrel. We then build a graph-theoretic model and fold the protein into its super-secondary structure via dynamic programming. This step runs in O(n^3) time for the common up-down topology, and in at most O(n^5) time for Greek key motifs, where n is the number of amino acids. Finally, a predicted three-dimensional structure is built from geometric criteria. If the pseudoenergy is insufficient, the protein is classified as a non-TMB protein. We have tested this approach on TMB and non-TMB proteins for both classification and structure prediction. Classification was tested on a dataset of 14238 proteins, comprising 48 TMB and 14190 non-TMB proteins. Our classification results are very accurate and comparable to those of other algorithms [21], [5]. In particular, our PPV, MCC and F-scores are second only to those of a very recent algorithm by Freeman and Wimley [39], which relies heavily on training data. We also tested structure prediction on 42 TMB proteins and compared the results with those of existing algorithms. The results are comparable, with accuracy ranging from 85% to 93% depending on the parameter used. This is very promising given that other algorithms rely heavily on homology and training datasets and may be overfitting. Our approach can be further improved by refining the energetic model, especially for turns and loops.
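For reference, the classification measures mentioned above can be computed from a confusion matrix as in the following minimal sketch. The formulas for PPV, MCC and the F1 score (one member of the F-score family) are standard; the function name and the convention of returning 0.0 when a denominator vanishes are assumptions, and the counts in the usage example are hypothetical, not results from this work.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Compute PPV, recall, F1 and MCC from confusion-matrix counts.

    Convention (an assumption): a metric is 0.0 when its denominator is 0.
    """
    ppv = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ppv": ppv, "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts on a dataset of 48 positives and 14190 negatives.
    print(classification_metrics(tp=40, fp=10, fn=8, tn=14180))
```

On a dataset as imbalanced as this one (48 positives among 14238 proteins), MCC and PPV are more informative than plain accuracy, which is dominated by the negative class.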

In addition, we have developed consensus methods to combine multiple secondary structures into a single, more reliable solution. Our results show that this technique can combine multiple solutions to produce structures that are more reliable than any of the input structures. These methods are based mainly on social choice theory and on known properties of TMB proteins. We are also working on methods for combining information on super-secondary structures and using it to augment the super-secondary structure provided by our approach.
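As a simple illustration of the consensus idea, the sketch below applies plurality voting, one of the most basic rules from social choice theory, independently at each residue position. It is not the method developed in this work, which also exploits known properties of TMB proteins. The function name and the encoding of predictions as equal-length strings (here 'E' for strand and '-' for other) are assumptions.

```python
from collections import Counter


def plurality_consensus(predictions):
    """Combine equal-length secondary-structure strings by per-position vote.

    Ties are broken in favor of the label seen first at that position,
    following the ordering of Counter.most_common.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*predictions))


if __name__ == "__main__":
    # Hypothetical predictions from three methods.
    print(plurality_consensus(["EEE--", "EE---", "EEEE-"]))
```

A per-position vote ignores dependencies between neighboring residues, so a practical method would add structural constraints on top of it.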