Section: New Results


Distance-Constrained Elementary Path Problem

Participants : Sebastien François, Rumen Andonov.

Given a directed graph $G=(V,E,l)$ with weights $l_e \ge 0$ associated with arcs $e \in E$, and a set of vertex pairs with distances between them (called distance constraints), the problem is to find an elementary path in $G$ that satisfies a maximum number of distance constraints. We call it the Distance-Constrained Elementary Path (DCEP) problem. This problem is motivated by applications in genome assembly. We describe three Mixed Integer Programming (MIP) formulations for this problem and discuss their advantages [25].
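To make the problem statement concrete, the following Python sketch solves a toy DCEP instance by exhaustive enumeration of elementary paths. It is an illustration only and not one of the MIP formulations of [25]. It assumes that a constraint (u, v, d) counts as satisfied when u precedes v on the path and the path length between them equals d up to a tolerance `tol`. The function names, the tolerance parameter and the example graph are hypothetical.

```python
from typing import Dict, List, Tuple

Arc = Tuple[str, str]
Constraint = Tuple[str, str, float]  # (u, v, target distance); assumed format


def count_satisfied(path: List[str], weights: Dict[Arc, float],
                    constraints: List[Constraint], tol: float = 0.0) -> int:
    """Count constraints (u, v, d) met by `path` (assumed satisfaction rule)."""
    pos = {v: i for i, v in enumerate(path)}
    # prefix[i] = length of the path from path[0] to path[i]
    prefix = [0.0]
    for a, b in zip(path, path[1:]):
        prefix.append(prefix[-1] + weights[(a, b)])
    satisfied = 0
    for u, v, d in constraints:
        if u in pos and v in pos and pos[u] < pos[v]:
            if abs(prefix[pos[v]] - prefix[pos[u]] - d) <= tol:
                satisfied += 1
    return satisfied


def best_elementary_path(weights: Dict[Arc, float],
                         constraints: List[Constraint],
                         tol: float = 0.0) -> Tuple[List[str], int]:
    """Enumerate every elementary path by depth-first search (exponential time)."""
    succ: Dict[str, List[str]] = {}
    vertices = set()
    for a, b in weights:
        succ.setdefault(a, []).append(b)
        vertices.update((a, b))

    best_path: List[str] = []
    best_score = -1

    def extend(path: List[str], visited: set) -> None:
        nonlocal best_path, best_score
        score = count_satisfied(path, weights, constraints, tol)
        if score > best_score:
            best_path, best_score = list(path), score
        for nxt in succ.get(path[-1], []):
            if nxt not in visited:  # elementary path: no vertex is repeated
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in sorted(vertices):
        extend([start], {start})
    return best_path, best_score


# Hypothetical toy instance: four vertices, three distance constraints.
weights = {("a", "b"): 2, ("b", "c"): 3, ("a", "c"): 4, ("c", "d"): 1}
constraints = [("a", "c", 5), ("b", "d", 4), ("a", "d", 5)]
path, score = best_elementary_path(weights, constraints)
print(path, score)
```

On this instance no elementary path satisfies all three constraints at once, which is why the objective is to maximize the number of satisfied constraints. The enumeration is exponential in the number of vertices, so it is only meant for tiny examples.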

Complete Assembly of Circular Genomes Based on Global Optimization

Participants : Sebastien François, Rumen Andonov, Dominique Lavenier.

The goal here is to develop a new methodology and tools, based on strong mathematical foundations and novel optimization techniques, for solving the genome assembly problem. During the current year we focused on the last two stages of genome assembly, namely scaffolding and gap-filling, and showed that they can be solved as part of a single optimization problem. We achieved this by modeling genome assembly as the problem of finding a simple path in a specific graph that satisfies as many as possible of the distance constraints encoding the insert-size information. We formulated it as a mixed-integer linear programming problem and applied an optimization solver to find the exact solution on a benchmark of chloroplasts. Our tool is called GAT (Genscale Assembly Tool), and we tested it on a set of 33 chloroplast genome datasets. Comparisons with some of the most popular recent assemblers show that our tool produces assemblies of significantly higher quality than these heuristics [26]. These results fully justify the efforts for designing exact approaches for genome assembly.