Title :
Designing a tighter searching space for pairwise global sequence alignments over multiple scoring systems
Author :
Changjin Hong ; Tewfik, Ahmed H.
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Minnesota, Minneapolis, MN, USA
Abstract :
In genomics and proteomics, a pair of amino acid residue sequences often has to be aligned repeatedly in order to select an appropriate scoring function and to assess the significance of a score. When computing the alignments obtained with a set of typical scoring matrices and their corresponding default gap costs, we observe that many aligned segments are shared with a reference global alignment. We show that parameters extracted from the search for an alignment under a good scoring system can be used to predict the deviation of alignments computed with different scoring matrices. By training on sample pairs of protein sequences from SCOP 1.71 in the ASTRAL database, we build an approximate probability distribution of the distance from a node on a reference path to the alignments based on other scoring schemes, with respect to the proposed parameters. We show that the overall computational cost of performing alignments with three scoring matrices using the proposed method can be reduced to 11% of the cost of a normal Needleman-Wunsch global alignment, with an average accuracy of 92%.
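The idea of restricting later alignments to a neighbourhood of a reference path can be illustrated with a minimal sketch. It is not the authors' method: the paper predicts the deviation at each node from a trained probability distribution, whereas this sketch assumes a single fixed band width and a linear gap cost. The names `global_align`, `band_around_path`, and the toy scoring functions are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Optional, Set, Tuple

Cell = Tuple[int, int]
NEG_INF = float("-inf")


def global_align(a: str, b: str, score: Callable[[str, str], float], gap: float,
                 allowed: Optional[Set[Cell]] = None) -> Tuple[float, List[Cell], int]:
    """Needleman-Wunsch global alignment with a linear gap cost.

    If `allowed` is given, only those DP cells are evaluated (restricted search space).
    Returns (score, path of DP cells from (0, 0) to (len(a), len(b)), cells evaluated).
    """
    n, m = len(a), len(b)
    dp = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Cell]]] = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    evaluated = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if allowed is not None and (i, j) not in allowed:
                continue  # cell lies outside the restricted search space
            evaluated += 1
            options = []
            if i > 0 and j > 0:  # align a[i-1] with b[j-1]
                options.append((dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]), (i - 1, j - 1)))
            if i > 0:            # gap in b
                options.append((dp[i - 1][j] - gap, (i - 1, j)))
            if j > 0:            # gap in a
                options.append((dp[i][j - 1] - gap, (i, j - 1)))
            dp[i][j], back[i][j] = max(options, key=lambda o: o[0])
    if dp[n][m] == NEG_INF:
        raise ValueError("end cell is unreachable inside the allowed region")
    # Trace back from the end cell to the origin.
    path = [(n, m)]
    while path[-1] != (0, 0):
        i, j = path[-1]
        path.append(back[i][j])
    path.reverse()
    return dp[n][m], path, evaluated


def band_around_path(path: List[Cell], width: int, m: int) -> Set[Cell]:
    """Cells within `width` columns of the reference path (fixed width is an assumption)."""
    allowed: Set[Cell] = set()
    for i, j in path:
        for jj in range(max(0, j - width), min(m, j + width) + 1):
            allowed.add((i, jj))
    return allowed


# Toy scoring systems standing in for real substitution matrices.
def reference_scoring(x: str, y: str) -> float:
    return 2.0 if x == y else -1.0


def other_scoring(x: str, y: str) -> float:
    return 1.0 if x == y else -2.0


seq_a, seq_b = "HEAGAWGHEE", "PAWHEAE"
_, ref_path, full_cells = global_align(seq_a, seq_b, reference_scoring, gap=2.0)
band = band_around_path(ref_path, width=2, m=len(seq_b))
band_score, _, band_cells = global_align(seq_a, seq_b, other_scoring, gap=1.0, allowed=band)
print(f"banded score {band_score}, evaluated {band_cells} of {full_cells} cells")
```

Because the band always contains the reference path, the restricted alignment is guaranteed to reach the end cell. Its score can never exceed the unrestricted optimum, and the gap between the two reflects the kind of accuracy loss that the abstract trades against computational cost.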
Keywords :
bioinformatics; genomics; organic compounds; search engines; ASTRAL database SCOP 1.71; alignment deviation prediction; alignment search; amino acid residue sequence; default gap cost; global sequence alignment computation; good scoring system; multiple scoring system; normal Needleman-Wunsch global alignment; pairwise global sequence alignment; parameter extraction; parameter-dependent scoring scheme; probability distribution approximation; protein sequence training; proteomics; reference global alignment-shared segment; reference path node-alignment distance; score significance detection; scoring function selection; scoring matrix-dependent sequence alignment computation; scoring scheme-based distance; searching space design; segment alignment; sequence alignment accuracy; sequence alignment-associated computational cost; Accuracy; Bioinformatics; Computational efficiency; Europe; Probability distribution; Proteins; Signal processing; dynamic programming; global sequence alignment; heuristic searching space; scoring matrices;
Conference_Title :
15th European Signal Processing Conference, 2007
Conference_Location :
Poznan
Print_ISBN :
978-839-2134-04-6