• DocumentCode
    3230383
  • Title

    Clustering orthologs based on sequence and domain similarities

  • Author

    Zhang, Fa ; Feng, Shengzhong ; Ozer, Hatice ; Yuan, Bo

  • Author_Institution
    Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing
  • fYear
    2005
  • fDate
    1-1 July 2005
  • Lastpage
    651
  • Abstract
    In this paper, we present a fully automatic computational method to cluster orthologs and in-paralogs from multiple species. We use the program Blastp to generate a pairwise distance matrix, which is then normalized for each homologous group between and within the species included. We also use protein domains and their organization in protein sequences as an additional criterion for filtering false relationships. Ortholog clusters are first seeded with multiple reciprocal best pairwise matches, after which the Markov graph-flow algorithm is applied to include in-paralogs. Classification parameters such as the inflation index are optimized according to the functional consistency within each cluster, which is inferred by comparing the ontological annotations available for the sequences belonging to the same cluster. We apply our program to six completely sequenced eukaryotic genomes and assign confidence values to both orthologs and in-paralogs. Comparing our results with similar efforts at NCBI and TIGR, we note significant improvement in the clustering of orthologs with recent paralogs. This provides an automatic and robust method to cluster orthologous genes of multiple genomes.
  • Keywords
    biology computing; genetics; molecular biophysics; Blastp; Markov graph-flow algorithm; classification parameters; domain similarity; eukaryotic genomes; fully automatic computational method; in-paralogs; inflation index; multiple genomes; orthologous genes; orthologs clustering; pairwise distance matrix; protein domains; protein sequences; sequence similarity; Bioinformatics; Biomedical computing; Biomedical informatics; Computers; Databases; Filtering; Genomics; Phylogeny; Proteins; Systematics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High-Performance Computing in Asia-Pacific Region, 2005. Proceedings. Eighth International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    0-7695-2486-9
  • Type

    conf

  • DOI
    10.1109/HPCASIA.2005.27
  • Filename
    1592336
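
The two clustering steps named in the abstract (seeding with reciprocal best matches, then a Markov graph-flow pass controlled by an inflation index) can be illustrated with a minimal sketch. This is not the authors' program: it assumes the "Markov graph-flow algorithm" refers to Markov clustering (MCL), it omits the Blastp distance normalization, the domain-based filtering, and the parameter optimization described in the abstract, and the names `reciprocal_best_hits`, `mcl`, and the toy scores are hypothetical.

```python
def reciprocal_best_hits(scores):
    """Find cross-species pairs that are each other's best-scoring match.

    scores: dict mapping (query, subject) -> similarity score (higher is
    better), where query and subject are (species, protein_id) tuples.
    This input layout is an assumption made for illustration.
    """
    best = {}  # (query, subject_species) -> (score, subject)
    for (q, s), sc in scores.items():
        if q[0] == s[0]:
            continue  # seeds come only from between-species matches
        key = (q, s[0])
        if key not in best or sc > best[key][0]:
            best[key] = (sc, s)
    pairs = set()
    for (q, _species), (_sc, s) in best.items():
        back = best.get((s, q[0]))
        if back is not None and back[1] == q:
            pairs.add(frozenset((q, s)))
    return pairs


def mcl(matrix, inflation=2.0, iterations=50):
    """Basic Markov clustering on a small dense similarity matrix.

    matrix: square list of lists of non-negative weights, with self-loops.
    Returns a set of frozensets of node indices.
    """
    n = len(matrix)

    def normalize(m):
        # Rescale every column so that it sums to 1 (column-stochastic).
        for j in range(n):
            total = sum(m[i][j] for i in range(n))
            if total > 0:
                for i in range(n):
                    m[i][j] /= total
        return m

    m = normalize([row[:] for row in matrix])
    for _ in range(iterations):
        # Expansion: square the matrix to let flow spread along paths.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, then renormalize, which
        # strengthens strong flows and weakens weak ones.
        m = normalize([[x ** inflation for x in row] for row in m])
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Toy data: human protein A and mouse protein a are mutual best matches;
# human protein B also hits a, but a prefers A.
toy_scores = {
    (("human", "A"), ("mouse", "a")): 200,
    (("mouse", "a"), ("human", "A")): 190,
    (("human", "B"), ("mouse", "a")): 150,
    (("mouse", "a"), ("human", "B")): 140,
}
seeds = reciprocal_best_hits(toy_scores)

# Toy graph with two disconnected groups of two nodes each.
toy_graph = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
groups = mcl(toy_graph, inflation=2.0)
```

In this sketch a larger `inflation` value tends to split the graph into more, smaller clusters, which is why the abstract treats the inflation index as a parameter to be tuned.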