Section: New Results

Modelling and analysing a network of individuals, or a network of individuals' networks

On unrooted and root-uncertain variants of several well-known phylogenetic network problems. Genetic hybridisation is the process by which individuals from genetically distinct populations interbreed and thereby produce a hybrid.

The hybridisation number problem consists in finding the minimum number of hybridisation events necessary to explain conflicts among several evolutionary trees. It requires embedding a set of binary rooted phylogenetic trees into a binary rooted phylogenetic network such that the number of nodes with in-degree two is minimised. However, from a biological point of view, accurately inferring the root location in a phylogenetic tree is notoriously difficult, and poor root placement can artificially inflate the hybridisation number. To address this, a number of relaxed variants of the problem were studied in [29]. We started by showing that the fundamental problem of determining whether an unrooted phylogenetic network displays (i.e. embeds) an unrooted phylogenetic tree is NP-hard. On the positive side, we showed that this problem is fixed-parameter tractable (FPT) in the reticulation number. In the rooted case, the corresponding FPT result is trivial, but here a more subtle argument was required. Next, we showed that the hybridisation number problem for unrooted networks (when given two unrooted trees) is equivalent to the problem of computing the tree bisection and reconnection (TBR) distance between the two unrooted trees. We then considered the “root-uncertain” variant of the hybridisation number problem. Here we are free to choose the root location in each of a set of unrooted input trees so that the hybridisation number of the resulting rooted trees is minimised. On the negative side, we showed that this problem is APX-hard. On the positive side, we showed that it is FPT in the hybridisation number, via kernelisation, for any number of input trees.
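The quantity being minimised, the number of nodes with in-degree two, is simple to compute once a candidate rooted network is given. The sketch below is only an illustration of that count: the function name `count_reticulations`, the edge-list encoding and the small example network are assumptions made here, not something taken from [29], and the code does not attempt the hard part of the problem, which is searching for a network that minimises this count.

```python
from collections import Counter


def count_reticulations(edges):
    """Count the nodes with in-degree two in a rooted binary network.

    `edges` is a list of (parent, child) pairs; this encoding is an
    assumption chosen for the illustration.
    """
    indegree = Counter(child for _, child in edges)
    return sum(1 for degree in indegree.values() if degree == 2)


# Hypothetical network on leaves a, b, c in which b descends from a
# single hybrid node h that has two parents, u and v.
example_network = [
    ("root", "u"), ("root", "v"),
    ("u", "a"), ("u", "h"),
    ("v", "h"), ("v", "c"),
    ("h", "b"),
]

# A plain tree has no node with in-degree two.
example_tree = [("root", "u"), ("root", "c"), ("u", "a"), ("u", "b")]

reticulations_in_network = count_reticulations(example_network)
reticulations_in_tree = count_reticulations(example_tree)
```

Under these assumptions the example network contains one hybridisation event and the tree contains none.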

Phylogenetic tree reconciliation. Phylogenetic tree reconciliation consists in mapping one tree (usually the symbiont tree) onto the other (the host tree) using event-based maximum parsimony. Given a cost model for the events, however, many optimal reconciliations are possible. Any further biological interpretation of them must therefore take this into account, which makes the capacity to enumerate all optimal solutions crucial. Indeed, the problem is not just that a single proposed solution has a good chance of missing the “true” answer, but also that we would lose the capacity to verify whether some characteristics are common to enough of the solutions to increase our confidence in the “story” that such reconciliations tell of the past.
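To make concrete why a cost model can admit several optimal reconciliations, here is a minimal sketch of how a parsimony cost is obtained from event counts. The four event names, the cost vector and the two sets of event counts are hypothetical values chosen for illustration only; they are not taken from any dataset or from the software discussed here.

```python
def reconciliation_cost(event_counts, cost_vector):
    """Total parsimony cost: sum over events of (count x cost per event)."""
    return sum(cost_vector[event] * count
               for event, count in event_counts.items())


# Hypothetical cost vector (assumption): one cost per event type.
cost_vector = {
    "cospeciation": 0,
    "duplication": 1,
    "host_switch": 1,
    "loss": 1,
}

# Two made-up reconciliations that explain the data with different events.
reconciliation_a = {"cospeciation": 3, "duplication": 0,
                    "host_switch": 2, "loss": 0}
reconciliation_b = {"cospeciation": 4, "duplication": 1,
                    "host_switch": 0, "loss": 1}

cost_a = reconciliation_cost(reconciliation_a, cost_vector)
cost_b = reconciliation_cost(reconciliation_b, cost_vector)
```

In this made-up case both reconciliations have the same total cost, although one relies on host switches and the other on a duplication and a loss. Reporting only one of them would hide an equally parsimonious and biologically different scenario.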

When the ERABLE team started addressing this issue, only two algorithms existed that attempted such an enumeration; in one case (the software CoRe-Pa) not all possible solutions were produced, while in the other (the software Notung) not all cost vectors were handled. We then introduced a polynomial-delay algorithm, called Eucalypt, for enumerating all optimal reconciliations, and showed that in general many solutions exist (Donati et al., Algorithms for Molecular Biology, 10(1):11, 2015). Some of these solutions may not be time-feasible. However, we further showed that, among the many solutions that are usually found, in the majority of cases at least some are time-feasible, and we provided a polynomial algorithm to test for time-feasibility. We also considered a restricted version of the model in which host switches are allowed only between species that are within some fixed distance of each other along the host tree. This restriction makes it possible to reduce the number of time-feasible solutions while preserving the same optimal cost, and to find time-feasible solutions with a cost close to the optimum in cases where no time-feasible optimal solution is found.
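The distance restriction on host switches can be pictured with a short sketch. It assumes that distance is measured as the number of edges on the path between two nodes of the host tree and that the tree is stored as a child-to-parent map; both choices, as well as the function names, are assumptions made for this illustration and may differ from the definition used in Eucalypt.

```python
def tree_distance(parent, u, v):
    """Number of edges on the path between u and v in a rooted tree.

    `parent` maps every non-root node to its parent (assumed encoding).
    """
    # Record the distance from u to each of its ancestors, u included.
    distance_from_u = {}
    node, steps = u, 0
    while node is not None:
        distance_from_u[node] = steps
        node = parent.get(node)
        steps += 1
    # Climb from v until the first common ancestor is reached.
    node, steps = v, 0
    while node not in distance_from_u:
        node = parent[node]
        steps += 1
    return steps + distance_from_u[node]


def switch_allowed(parent, donor, recipient, max_distance):
    """A host switch is permitted only between sufficiently close hosts."""
    return tree_distance(parent, donor, recipient) <= max_distance


# Hypothetical host tree: root -> (x, y), x -> (A, B), y -> (C, D).
host_parent = {"x": "root", "y": "root",
               "A": "x", "B": "x", "C": "y", "D": "y"}

close_switch = switch_allowed(host_parent, "A", "B", max_distance=2)
distant_switch = switch_allowed(host_parent, "A", "C", max_distance=2)
```

With a bound of two edges, a switch between the sister hosts A and B is permitted, whereas a switch from A to C, which lies four edges away, is not.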

More recently, we defined two equivalence relations that make it possible to represent many reconciliations by a single one, thereby reducing their number. These results were published in a paper which was accepted at CIBB 2017 and will appear in the LNCI-LNCS proceedings of the conference (published after CIBB). Extensive experiments indicated that the number of output solutions generally decreases greatly; by how much depends on the constraints given as input. An extended journal version of this work that includes its theoretical part will be submitted at the beginning of 2018. Other ways of grouping (or clustering) solutions are also being explored that rely instead on defining a distance between two different reconciliations. Two approaches are being investigated, one in collaboration with a researcher in Italy (paper in preparation), and the other with researchers in the UK (one paper submitted and one in preparation).

Improving the biological realism of coevolutionary models. The host-symbiont coevolutionary models developed so far also needed to be improved. The realism we wished to add to such models was, for now, the possibility of handling multiple associations of a symbiont. The few previous works that allowed for this all presented some limitation, either in the model or in the algorithm developed. Handling such multiple associations requires introducing an event that has been little considered, or not formally considered, in the literature. This is the spread event, which corresponds precisely to the invasion of different hosts by the same symbiont. In this case, as when spreads are not considered, the optimal reconciliations obtained depend on the costs chosen for the events. Developing statistical methods to assign the most appropriate costs therefore remained a relevant problem. This is one of the problems we addressed in the PhD of Laura Urbini, which was defended in October 2017. Two types of spread were in fact introduced: vertical and horizontal. The first corresponds to the case where the evolution of the symbiont “freezes” while the symbiont continues to be associated with a host and with the new species that descend from this host. The second includes both an invasion, in which the symbiont remains with the initial host but at the same time becomes associated with (“invades”) another host incomparable with the first, and a double freeze (in relation to the evolution of the host with which it was initially associated and to the evolution of the second host it “invaded”). Two papers addressing distinct aspects of the spread problem with different approaches are in preparation and will be submitted by the end of 2017 or the beginning of 2018.

Estimating the frequency and expansion process of an infection. We addressed the question of how often an infection occurs and of whether its expansion has reached an equilibrium, using Wolbachia as a model. Wolbachia is a bacterial genus that infects about half of all arthropods, with diverse and extreme consequences ranging from sex-ratio distortion and mating incompatibilities to protection against viruses. These phenotypic effects, combined with efficient vertical transmission from mothers to offspring, satisfactorily explain the invasion dynamics of Wolbachia within species. However, beyond the species level, the lack of congruence between the host and symbiont phylogenetic trees indicates that Wolbachia horizontal transfers and extinctions do happen and underlie its global distribution.

In [3], we inferred recent acquisition/loss events from the distribution of Wolbachia lineages across the mitochondrial DNA tree of 3600 arthropod specimens, spanning 1100 species from Tahiti and the surrounding islands. We showed that most events occurred within the last million years, but are likely attributable to individual level variation (e.g., imperfect maternal transmission) rather than to population level variation (e.g., Wolbachia extinction). At the population level, we estimated that mitochondria typically accumulate 4.7% substitutions per site during an infected episode, and 7.1% substitutions per site during the uninfected phase. Using a Bayesian time calibration of the mitochondrial tree, these numbers translate into infected and uninfected phases of approximately 7 and 9 million years. Infected species thus lose Wolbachia slightly more often than uninfected species acquire it, supporting the view that its present incidence, estimated here slightly below 0.5, represents an epidemiological equilibrium.
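The link between the two phase durations and an incidence slightly below 0.5 can be checked with a back-of-the-envelope calculation. The sketch below assumes a simple two-state model (infected or uninfected) with constant rates, in which the loss rate is the inverse of the mean infected duration and the acquisition rate is the inverse of the mean uninfected duration. This simplification and the function name are assumptions made here for illustration; it is not the inference procedure of [3], and it uses the rounded durations quoted above.

```python
def equilibrium_incidence(infected_duration, uninfected_duration):
    """Equilibrium fraction of infected species in a two-state model.

    Durations are mean phase lengths in the same time unit (here
    millions of years); rates are taken as their inverses.
    """
    loss_rate = 1.0 / infected_duration    # infected -> uninfected
    gain_rate = 1.0 / uninfected_duration  # uninfected -> infected
    return gain_rate / (gain_rate + loss_rate)


# Rounded durations quoted in the text: about 7 and 9 million years.
incidence = equilibrium_incidence(7.0, 9.0)
```

Algebraically this is 7 / (7 + 9) = 0.4375, which agrees with an equilibrium incidence slightly below 0.5.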