Section: New Results
Navigating through unexplored pre-miRNA candidates
The computational search for novel miRNA precursors often involves some form of structural analysis, with the aim of identifying which types of structure are likely to be recognised and processed by the cellular miRNA-maturation machinery. A natural way to tackle this problem is to cluster the candidate structures together with known miRNA precursor structures. Mixed clusters then allow the identification of candidates that are similar to known precursors. Given the large number of candidate pre-miRNAs that can be identified in single-genome approaches, even after applying several filters for robustness and stability, a conventional structural clustering approach is infeasible. We presented a method, MinDist, to represent candidate structures in a feature space which summarises key sequence/structure characteristics of each candidate. We demonstrated that proximity in this feature space is related to sequence/structure similarity, and we selected candidates which have a high similarity to known precursors. Additional filtering steps were then applied to further reduce the number of candidates to those with greater transcriptional potential. Our method was compared to another single-genome method (TripletSVM) on two datasets, showing better performance on one and comparable performance on the other. Additionally, we showed that our approach allows for a better interpretation of the results. The MinDist method is available upon request and will be made available online. This work has been submitted for publication. This work was done in collaboration with A. T. Freitas and R. Backofen.
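The general idea of selecting candidates by proximity to known precursors in a feature space can be sketched as follows. This is only an illustration under stated assumptions, not the MinDist implementation: the three features (GC content, fraction of paired bases, longest unpaired run), the Euclidean distance, the threshold, and the function names `features`, `min_distance` and `select_candidates` are all hypothetical choices made for the example. Structures are assumed to be given in dot-bracket notation.

```python
import math

def features(seq, struct):
    """Map a (sequence, dot-bracket structure) pair to a small numeric vector.

    The features are illustrative assumptions, not the published feature set.
    """
    n = len(seq)
    gc = sum(1 for base in seq if base in "GC") / n
    paired = sum(1 for c in struct if c in "()") / n
    # Longest run of unpaired positions, normalised by sequence length.
    longest = run = 0
    for c in struct:
        run = run + 1 if c == "." else 0
        longest = max(longest, run)
    return (gc, paired, longest / n)

def min_distance(vec, known_vecs):
    """Euclidean distance from vec to its nearest known-precursor vector."""
    return min(math.dist(vec, k) for k in known_vecs)

def select_candidates(candidates, known, threshold):
    """Keep candidates lying within `threshold` of some known precursor.

    candidates: list of (name, sequence, structure) tuples
    known:      list of (sequence, structure) tuples
    Returns (name, distance) pairs sorted by increasing distance.
    """
    known_vecs = [features(s, t) for s, t in known]
    selected = []
    for name, seq, struct in candidates:
        d = min_distance(features(seq, struct), known_vecs)
        if d <= threshold:
            selected.append((name, d))
    return sorted(selected, key=lambda item: item[1])

# Toy data, invented for the example only.
known = [("GCGCAUAUGCGC", "((((....))))")]
candidates = [
    ("c1", "GCGCAUAUGCGC", "((((....))))"),  # identical to the known hairpin
    ("c2", "AAAAAAAAAAAA", "............"),  # unstructured, low GC
]
print(select_candidates(candidates, known, threshold=0.5))
```

Because each candidate is compared only against the known precursors, the cost grows with the product of the two set sizes rather than with the square of the number of candidates, which is what makes this kind of selection practical where all-against-all structural clustering is not.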