• DocumentCode
    2088951
  • Title
    Pairwise Statistical Significance of Local Sequence Alignment Using Substitution Matrices with Sequence-Pair-Specific Distance
  • Author
    Agrawal, Ankit; Huang, Xiaoqiu
  • Author_Institution
    Dept. of Comput. Sci., Iowa State Univ., Ames, IA, USA
  • fYear
    2008
  • fDate
    17-20 Dec. 2008
  • Firstpage
    94
  • Lastpage
    99
  • Abstract
    Pairwise sequence alignment forms the basis of numerous other applications in bioinformatics. The quality of an alignment is gauged by statistical significance rather than by alignment score alone. Therefore, accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, it was shown that pairwise statistical significance performs better in practice than database statistical significance, and also provides quicker individual pairwise estimates of statistical significance without requiring a time-consuming database search. Under an evolutionary model, a substitution matrix can be derived from a rate matrix and a fixed distance. Although commonly used substitution matrices such as BLOSUM62 were not originally derived from a rate matrix under an evolutionary model, the corresponding rate matrices can be back-calculated. Many researchers have derived different rate matrices using different methods and data. In this paper, we show that pairwise statistical significance using rate matrices with sequence-pair-specific distance performs significantly better than using a fixed distance. Pairwise statistical significance using substitution matrices with sequence-pair-specific distance also outperforms database statistical significance reported by BLAST.
  • Keywords
    bioinformatics; estimation theory; matrix algebra; sequences; statistical databases; bioinformatics; database statistical significance; evolutionary model; fixed distance; pairwise local sequence alignment; pairwise statistical significance estimation; rate matrix; sequence-pair-specific distance; substitution matrix; Application software; Bioinformatics; Computer science; DNA; Information technology; Matrices; Probability; Proteins; Sequences; Spatial databases; Evolutionary distance; Pairwise statistical significance; Rate matrix; Sequence alignment; Substitution matrix;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology, 2008. ICIT '08. International Conference on
  • Conference_Location
    Bhubaneswar
  • Print_ISBN
    978-1-4244-3745-0
  • Type
    conf
  • DOI
    10.1109/ICIT.2008.63
  • Filename
    4731306