Section: New Results
Read-against-read comparison for Nanopore data
Two years ago, the team developed seeds with errors, which make it possible
to find all common approximate patterns with a limited number of errors. The idea behind these
seeds is to divide the sequence into blocks so that the distribution
of errors is no longer random. This year, we have used these seeds
in the context of long-read analysis. With these data,
read-against-read comparison suffers from a large loss of sensitivity,
because the error rate of a single read is already high and the errors of both reads
accumulate in the comparison.
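The benefit of splitting into blocks can be illustrated by the classical pigeonhole argument that block-based seeds refine. The sketch below is a simplified illustration, assuming substitution errors only: if an occurrence carries at most e substitutions and the pattern is cut into e + 1 blocks, at least one block matches exactly, so exact lookups of the blocks find every occurrence without loss. The seeds with errors described above go further by also tolerating errors inside some blocks; they are not reproduced here, and all function names are hypothetical.

```python
def split_blocks(pattern: str, n_blocks: int) -> list[tuple[int, str]]:
    """Cut `pattern` into `n_blocks` contiguous blocks of near-equal length,
    returning (offset in pattern, block) pairs."""
    size, extra = divmod(len(pattern), n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        end = start + size + (1 if i < extra else 0)
        blocks.append((start, pattern[start:end]))
        start = end
    return blocks


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def approximate_occurrences(text: str, pattern: str, max_errors: int) -> list[int]:
    """Start positions where `pattern` occurs in `text` with at most
    `max_errors` substitutions (indels are not handled in this sketch)."""
    m = len(pattern)
    candidates = set()
    # Filtering: with max_errors + 1 blocks, at least one block is error-free,
    # so every true occurrence is reached through an exact block match.
    for offset, block in split_blocks(pattern, max_errors + 1):
        hit = text.find(block)
        while hit != -1:
            pos = hit - offset
            if 0 <= pos <= len(text) - m:
                candidates.add(pos)
            hit = text.find(block, hit + 1)
    # Verification: keep only candidates within the error budget.
    return sorted(p for p in candidates if hamming(text[p:p + m], pattern) <= max_errors)
```

For instance, `approximate_occurrences("GGACGTTCGTAA", "ACGTACGT", 1)` returns `[2]`, the single occurrence carrying one substitution. With a high error rate, e grows, the blocks become short and exact block hits become frequent and uninformative, which is the loss of sensitivity and specificity that motivates allowing errors inside blocks.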
Our application case is the detection of adapter sequences
in ONT sequencing. We have shown that using these seeds
instead of exact k-mers allows a more accurate reconstruction of the adapter
sequences.
The method has two steps: first, the k-mers
potentially composing the adapter are identified with a counting approach that
takes errors in the reads into account; then, the complete adapter sequence is reconstructed
with a greedy algorithm. Our results show that the seeds with errors yield
accurate consensus sequences for more than 80% of the samples, compared with 40% for the
usual k-mer approach. This work was done within the ANR ASTER project during the first year of
Quentin Bonenfant's PhD thesis and was presented at the national
workshop Seqbio 2018.
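The two-step structure (counting, then greedy reconstruction) can be sketched as follows. This is a minimal illustration of the usual exact k-mer baseline, not the team's method: the error-tolerant counting with seeds with errors is not reproduced, and the parameters `k`, `window` and `min_count`, as well as the assumption that adapters sit at the 5' end of reads, are hypothetical choices for the example.

```python
from collections import Counter

ALPHABET = "ACGT"


def count_kmers(reads: list[str], k: int, window: int) -> Counter:
    """Step 1 (exact k-mer baseline): for each k-mer, count the number of
    reads whose first `window` bases contain it at least once."""
    counts: Counter = Counter()
    for read in reads:
        prefix = read[:window]  # assumption: adapters sit at the 5' end
        counts.update({prefix[i:i + k] for i in range(len(prefix) - k + 1)})
    return counts


def greedy_assemble(counts: Counter, k: int, min_count: int) -> str:
    """Step 2: start from the most frequent k-mer and extend it greedily,
    one base at a time, with the best-supported overlapping k-mer."""
    if k < 2 or not counts:
        return ""
    seed, support = counts.most_common(1)[0]
    if support < min_count:
        return ""
    contig, used = seed, {seed}
    while True:  # extend to the right
        suffix = contig[-(k - 1):]
        support, kmer = max((counts[suffix + b], suffix + b) for b in ALPHABET)
        if support < min_count or kmer in used:
            break
        used.add(kmer)
        contig += kmer[-1]
    while True:  # extend to the left
        prefix = contig[:k - 1]
        support, kmer = max((counts[b + prefix], b + prefix) for b in ALPHABET)
        if support < min_count or kmer in used:
            break
        used.add(kmer)
        contig = kmer[0] + contig
    return contig
```

The `used` set stops the extension from looping on repeated k-mers. With exact counting, every sequencing error inside the adapter removes support from up to k overlapping k-mers, which is why this baseline degrades on high-error reads and why error-tolerant counting helps.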