• DocumentCode
    1151602
  • Title
    Ortholog Clustering on a Multipartite Graph
  • Author
    Vashist, Akshay ; Kulikowski, Casimir A. ; Muchnik, Ilya
  • Author_Institution
    Dept. of Comput. Sci., New Jersey State Univ., Piscataway, NJ
  • Volume
    4
  • Issue
    1
  • fYear
    2007
  • Firstpage
    17
  • Lastpage
    27
  • Abstract
    We present a method for automatically extracting groups of orthologous genes from a large set of genomes by a new clustering algorithm on a weighted multipartite graph. The method assigns a score to an arbitrary subset of genes from multiple genomes to assess the orthologous relationships between genes in the subset. This score is computed using sequence similarities between the member genes and the phylogenetic relationship between the corresponding genomes. An ortholog cluster is found as the subset with the highest score, so ortholog clustering is formulated as a combinatorial optimization problem. The algorithm for finding an ortholog cluster runs in time O(|E| + |V| log |V|), where V and E are the sets of vertices and edges, respectively, in the graph. However, if we discretize the similarity scores into a constant number of bins, the runtime improves to O(|E| + |V|). The proposed method was applied to seven complete eukaryote genomes on which the manually curated database of eukaryotic ortholog clusters, KOG, is constructed. A comparison of our results with the manually curated ortholog clusters shows that our clusters are well correlated with the existing clusters.
  • Keywords
    biology computing; cellular biophysics; genetics; graph theory; molecular biophysics; molecular configurations; optimisation; combinatorial optimization problem; eukaryote genomes; eukaryotic ortholog clusters KOG; genomes; ortholog clustering; orthologous genes; phylogenetic relationship; sequence similarities; weighted multipartite graph; Bioinformatics; Clustering algorithms; Databases; Evolution (biology); Genetics; Genomics; Organisms; Phylogeny; Runtime; Sequences; Graph-theoretic methods; biology; clustering algorithms; genetics; Algorithms; Animals; Cluster Analysis; Computational Biology; Databases, Protein; Fungi; Genome; Genomics; Humans; Multigene Family; Plants
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2007.1004
  • Filename
    4104456