• DocumentCode
    547184
  • Title
    Multiple sequence alignment using Hidden Markov model with augmented set based on BLOSUM 80 and its influence on phylogenetic accuracy
  • Author
    Afiahayati ; Hartati, Sri
  • Author_Institution
    Dept. of Comput. Sci. & Electron., Gadjah Mada Univ., Yogyakarta, Indonesia
  • fYear
    2010
  • fDate
    2-3 Aug. 2010
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    The basic tasks in molecular biology data analysis include multiple sequence alignment (MSA) and phylogenetic tree inference. The quality of the phylogenetic tree depends on the quality of the MSA. The hidden Markov model (HMM) is a good method for producing an MSA, but for sequences with low similarity it produces a less optimal MSA. This research performs multiple alignment of protein sequences with low similarity using the HMM, so that the resulting MSA can be used as input to produce a more accurate phylogenetic tree. The research is carried out by building an augmented set. Its parameters are the number of child sequences and the percentage of mutation applied to each child sequence. The mutation process is based on the BLOSUM 80 substitution matrix. The augmented set is used as input to the HMM to obtain the MSA. The Baum-Welch learning algorithm is used to estimate the HMM parameters, while the Viterbi algorithm is used to arrange the alignment from the unaligned sequences. The prototype tool is built in the Java programming language using the Biojava library. The accuracy of phylogenetic trees inferred from the MSA with the augmented set is compared with that of trees inferred from the MSA without the augmented set. Two phylogenetic tree inference methods are used. First, neighbour joining is conducted using the ClustalX tool. Second, the parsimony method is conducted using the Phylip Protpars tool. The data are the amino acid sequences of ribosomes 16S from mitochondria. The accuracy of the phylogenetic tree inferred with the neighbour joining method increases for datasets that meet the following criteria: the number of sequences and of HDS (high diverge sequence) is small enough, and the difference between the maximum and average sequence length is small enough. In contrast, the accuracy of phylogenetic trees inferred with the augmented set and the parsimony method can increase or decrease arbitrarily.
  • Keywords
    biology computing; data analysis; hidden Markov models; molecular biophysics; BLOSUM 80; Biojava library; ClustalX tool; Java programming language; Phylip Protpars tool; Viterbi algorithm; amino acid sequences; augmented set; hidden Markov model; mitochondria; molecular biology data analysis; multiple sequence alignment; parsimony method; phylogenetic accuracy; phylogenetic tree inference; protein sequences; ribosomes 16S; substitution matrix; Multimedia communication; Augmented Set; HMM; Phylogenetic Trees; Ribosomes 16S Protein Sequence;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Framework and Applications (DFmA), 2010 International Conference on
  • Conference_Location
    Yogyakarta
  • Print_ISBN
    978-1-4244-9335-7
  • Type
    conf
  • Filename
    5952343
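
The augmented-set construction described in the abstract (each input sequence yields a number of child sequences, each mutated at a given percentage of positions according to a substitution matrix) can be illustrated with a minimal sketch. This is not the authors' Java/Biojava prototype. The function names `mutate` and `build_augmented_set`, the rule of picking the highest-scoring replacement residue, and the `TOY_SCORES` table are all assumptions made for illustration; in particular, `TOY_SCORES` holds placeholder values, not real BLOSUM 80 scores.

```python
import random

# HYPOTHETICAL placeholder scores, NOT real BLOSUM 80 values.
# A real run would load the full 20x20 BLOSUM 80 matrix instead.
TOY_SCORES = {
    "L": {"I": 2, "M": 3},
    "I": {"V": 3, "L": 2},
    "V": {"I": 3, "L": 1},
    "M": {"L": 3, "I": 1},
}


def mutate(sequence, mutation_pct, scores, rng):
    """Return a child sequence with mutation_pct percent of positions substituted."""
    residues = list(sequence)
    n_mut = min(len(residues), int(len(residues) * mutation_pct / 100))
    # Choose distinct positions to mutate, uniformly at random.
    for pos in rng.sample(range(len(residues)), n_mut):
        row = scores.get(residues[pos], {})
        # Assumption: replace with the highest-scoring residue other than itself.
        candidates = {aa: s for aa, s in row.items() if aa != residues[pos]}
        if candidates:
            residues[pos] = max(candidates, key=candidates.get)
    return "".join(residues)


def build_augmented_set(parents, n_children, mutation_pct, scores, seed=0):
    """Return each parent followed by n_children mutated copies of it."""
    rng = random.Random(seed)
    augmented = []
    for parent in parents:
        augmented.append(parent)
        for _ in range(n_children):
            augmented.append(mutate(parent, mutation_pct, scores, rng))
    return augmented


if __name__ == "__main__":
    for seq in build_augmented_set(["LIVM", "MMLL"], 2, 50, TOY_SCORES):
        print(seq)
```

In this sketch the two parameters named in the abstract map to `n_children` and `mutation_pct`; the resulting list would then be passed to the HMM training step in place of the original sequences alone.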