Multiple sequence alignment using Hidden Markov model with augmented set based on BLOSUM 80 and its influence on phylogenetic accuracy

Author

Afiahayati ; Hartati, Sri

Author_Institution

Dept. of Comput. Sci. & Electron., Gadjah Mada Univ., Yogyakarta, Indonesia

fYear

2010

fDate

2-3 Aug. 2010

Firstpage

Lastpage

Abstract

Multiple sequence alignment (MSA) and phylogenetic tree inference are basic tasks in molecular biology data analysis. The quality of a phylogenetic tree depends on the quality of the MSA. The hidden Markov model (HMM) is a good method for producing an MSA, but for sequences with low similarity it produces a less optimal MSA. This research performs multiple alignment of low-similarity protein sequences using an HMM, so that the resulting MSA can be used as input to produce a more accurate phylogenetic tree. The approach is to build an augmented set. Its parameters are the number of child sequences and the percentage of mutation applied to each child sequence. The mutation process is based on the BLOSUM 80 substitution matrix. The augmented set is used as input to the HMM to obtain the MSA. The Baum-Welch learning algorithm is used to estimate the HMM parameters, while the Viterbi algorithm is used to construct the alignment from the unaligned sequences. The prototype tool is built in the Java programming language using the BioJava library. The accuracy of phylogenetic trees built from an MSA with the augmented set is compared with that of trees built from an MSA without it. Two phylogenetic tree inference methods are used. First, neighbour joining is performed with the ClustalX tool. Second, the parsimony method is performed with the Phylip Protpars tool. The data are the amino acid sequences of ribosomes 16S from mitochondria. With the neighbour joining method, the accuracy of the phylogenetic tree increases for datasets in which the number of sequences and of HDS (high diverge sequences) is small enough, and the difference between the maximum and average sequence length is small enough. With the parsimony method, the accuracy of phylogenetic trees using the augmented set can increase or decrease arbitrarily.
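
The augmented-set step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Java/BioJava prototype. The abstract does not say how BLOSUM 80 guides each substitution, so the rule used here (replace a residue with its highest-scoring different residue) is an assumption, as are the function names `mutate` and `build_augmented_set`. The `TOY_SCORES` table holds illustrative placeholder scores over three residues, not the real BLOSUM 80 matrix.

```python
import random

# Placeholder substitution scores for illustration only.
# These are NOT the BLOSUM 80 values; a real run would load the full matrix.
TOY_SCORES = {
    ("A", "S"): 1, ("S", "A"): 1,
    ("A", "G"): 0, ("G", "A"): 0,
    ("S", "G"): -1, ("G", "S"): -1,
}


def mutate(seq, fraction, scores, rng):
    """Return a child of `seq` with about `fraction` of its positions substituted.

    Assumed rule: each chosen residue is replaced by the different residue
    with the highest substitution score (ties broken at random).
    """
    residues = list(seq)
    n_mut = round(len(residues) * fraction)
    for pos in rng.sample(range(len(residues)), n_mut):
        original = residues[pos]
        candidates = [(b, s) for (a, b), s in scores.items()
                      if a == original and b != original]
        if not candidates:
            continue  # residue not covered by the score table: leave it as is
        best = max(s for _, s in candidates)
        residues[pos] = rng.choice(sorted(b for b, s in candidates if s == best))
    return "".join(residues)


def build_augmented_set(parents, n_children, fraction, scores, seed=0):
    """Return each parent sequence followed by `n_children` mutated copies."""
    rng = random.Random(seed)
    augmented = []
    for parent in parents:
        augmented.append(parent)
        augmented.extend(mutate(parent, fraction, scores, rng)
                         for _ in range(n_children))
    return augmented


if __name__ == "__main__":
    for s in build_augmented_set(["AAAA", "SGSG"], n_children=2,
                                 fraction=0.5, scores=TOY_SCORES, seed=1):
        print(s)
```

The two parameters named in the abstract map to `n_children` and `fraction`. In the described workflow the resulting list would then be passed to HMM training (Baum-Welch) and alignment (Viterbi), which this sketch does not cover.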

Keywords

biology computing; data analysis; hidden Markov models; molecular biophysics; BLOSUM 80; Biojava library; ClustalX tool; Java programming language; Phylip Protpars tool; Viterbi algorithm; amino acid sequences; augmented set; hidden Markov model; mitochondria; molecular biology data analysis; multiple sequence alignment; parsimony method; phylogenetic accuracy; phylogenetic tree inference; protein sequences; ribosomes 16S; substitution matrix; Multimedia communication; Augmented Set; HMM; Phylogenetic Trees; Ribosomes 16S Protein Sequence;

fLanguage

English

Publisher

IEEE

Conference_Titel

Distributed Framework and Applications (DFmA), 2010 International Conference on

Conference_Location

Yogyakarta

Print_ISBN

978-1-4244-9335-7

Type

conf

Filename

5952343

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=547184