Regional Information Center for Science and Technology - Learning Scoring Schemes for Sequence Alignment from Partial Examples

DocumentCode :

747698

Title :

Learning Scoring Schemes for Sequence Alignment from Partial Examples

Author :

Kim, Eagu ; Kececioglu, John

Author_Institution :

Dept. of Comput. Sci., Univ. of Arizona, Tucson, AZ

Volume :

Issue :

Year :

2008

Firstpage :

546

Lastpage :

556

Abstract :

When aligning biological sequences, the choice of scoring scheme is critical. Even small changes in gap penalties, for example, can yield radically different alignments. A rigorous way to learn parameter values that are appropriate for biological sequences is through inverse parametric sequence alignment. Given a collection of examples of biologically correct reference alignments, this is the problem of finding parameter values that make the scores of the reference alignments be as close as possible to those of optimal alignments of their sequences. We extend prior work on inverse parametric alignment to partial examples, which contain regions where the reference alignment is not specified, and to an improved formulation based on minimizing the average error between the scores of the reference alignments and the scores of optimal alignments. Experiments on benchmark biological alignments show we can learn scoring schemes that generalize across protein families, and that boost the accuracy of multiple sequence alignment by as much as 25 percent.
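The quantity the abstract describes, the gap between the score of a reference alignment and the score of an optimal alignment of the same sequences, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the scoring model is a simple match/mismatch/linear-gap scheme rather than the substitution matrices the paper learns, the examples are complete rather than partial, and the error is a plain averaged score difference that may differ from the paper's exact measure. The sketch only evaluates the error for a candidate parameter setting; it does not perform the linear-programming search for the parameters.

```python
def optimal_score(a, b, match, mismatch, gap):
    """Best global alignment score of a and b (Needleman-Wunsch, linear gaps).

    Scores are maximized; `gap` is the (negative) score of one gap column.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # a[i-1] against a gap
                            curr[j - 1] + gap))  # b[j-1] against a gap
        prev = curr
    return prev[-1]


def alignment_score(row_a, row_b, match, mismatch, gap):
    """Score a given alignment, written as two equal-length rows with '-' gaps."""
    assert len(row_a) == len(row_b)
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


def average_error(examples, match, mismatch, gap):
    """Mean of (optimal score - reference score) over the reference alignments."""
    total = 0.0
    for row_a, row_b in examples:
        a, b = row_a.replace("-", ""), row_b.replace("-", "")
        ref = alignment_score(row_a, row_b, match, mismatch, gap)
        total += optimal_score(a, b, match, mismatch, gap) - ref
    return total / len(examples)


# Toy reference alignment (hypothetical data, not from the paper).
examples = [("AG-T", "A-CT")]
harsh = average_error(examples, match=1, mismatch=-1, gap=-2)
mild = average_error(examples, match=1, mismatch=-1, gap=-0.25)
```

With the harsh gap penalty the reference scores below the optimal alignment, so the error is positive; with the mild one the reference is itself optimal and the error is zero. This mirrors the abstract's point that changing gap penalties alone can change which alignment is optimal.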

Keywords :

biology computing; linear programming; matrix algebra; proteins; biological sequence alignment; inverse parametric sequence alignment; protein; reference alignments; substitution score matrices; Analysis of Algorithms and Problem Complexity; Biology and genetics; Pattern matching; Algorithms; Amino Acid Sequence; Artificial Intelligence; Molecular Sequence Data; Pattern Recognition, Automated; Sequence Alignment; Sequence Analysis, Protein

Language :

English

Journal_Title :

Computational Biology and Bioinformatics, IEEE/ACM Transactions on

Publisher :

IEEE

ISSN :

1545-5963

Type :

jour

DOI :

10.1109/TCBB.2008.57

Filename :

4540089

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=747698