DocumentCode :
2737932
Title :
PIntron: A fast method for gene structure prediction via maximal pairings of a pattern and a text
Author :
Bonizzoni, Paola ; Della Vedova, Gianluca ; Pirola, Yuri ; Rizzi, Raffaella
Author_Institution :
Dip. Inf. Sist. e Comun. (DISCo), Univ. Milano-Bicocca, Milan, Italy
fYear :
2011
fDate :
3-5 Feb. 2011
Firstpage :
33
Lastpage :
39
Abstract :
A challenging issue in designing computational methods for predicting the exon-intron structure of a gene from a cluster of transcript (EST, mRNA) sequences is guaranteeing both accuracy and efficiency in time and space when processing large clusters of more than 20,000 ESTs and genes longer than 1 Mb. Traditionally, the problem has been addressed by combining different tools not specifically designed for this task. We propose a fast method based on ad hoc procedures for solving the problem. Our method combines two ideas: a novel algorithm with provably small time complexity for computing spliced alignments of a transcript against a genome, and an efficient algorithm that exploits the inherent redundancy of information in a cluster of transcripts to select, among all possible factorizations of the EST sequences, those that allow inferring splice site junctions largely confirmed by the input data. The EST alignment procedure is based on the construction of maximal embeddings, which are sequences obtained from paths of a graph structure, called the embedding graph, whose vertices are the maximal pairings of a genomic sequence T and an EST P. The procedure runs in time linear in the lengths of P and T and in the size of the output. PIntron, the software tool implementing our methodology, is available at http://www.algolab.eu/PIntron and can process in a few seconds some critical genes that are not manageable by other gene structure prediction tools. At the same time, PIntron exhibits high accuracy (sensitivity and specificity) when compared with ENCODE data.
Keywords :
biology computing; genetics; genomics; graph theory; inference mechanisms; macromolecules; molecular biophysics; molecular configurations; software tools; ENCODE data; EST; PIntron; embedding graph; exons; gene structure prediction; genome; inference; introns; mRNA; maximal pattern-text pairings; software tool; splice site junctions; spliced alignments; transcript sequences; Accuracy; Algorithm design and analysis; Bioinformatics; Genomics; Pipelines; Splicing; alternative splicing; gene structure; maximal pairing; transcript alignment;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Advances in Bio and Medical Sciences (ICCABS), 2011 IEEE 1st International Conference on
Conference_Location :
Orlando, FL
Print_ISBN :
978-1-61284-851-8
Type :
conf
DOI :
10.1109/ICCABS.2011.5729935
Filename :
5729935