• DocumentCode
    2580678
  • Title
    Detecting homogeneity in protein sequence clusters for automatic functional annotation and noise detection
  • Author
    Chen, Chien-Yu
  • Author_Institution
    Graduate Sch. of Biotechnol. & Bioinformatics, Yuan Ze Univ., Chung-Li, Taiwan
  • fYear
    2005
  • fDate
    15-16 Aug. 2005
  • Abstract
    Protein sequence clustering aims to identify sets of homologous proteins in a protein database (Kriventseva et al., 2001). The information derived from protein sequence clustering is widely used for further analysis such as protein family discovery, function prediction, and database compression. For some applications, it is highly desirable to generate a hierarchical structure, also referred to as a dendrogram, that shows how proteins are clustered at various levels. However, it is not easy to decide the boundaries of the natural clusters that correspond to protein families. According to our previous studies, the weighted average precision of the homogeneous clusters in the hierarchy of the Swiss-Prot 41.0 database is 98.5% (Chen et al., 2004). Our experimental results show that 2158 protein families achieve their best matching rate on a homogeneous cluster, among which the largest contains 293 proteins. This result shows that the sequences of many protein families possess the homogeneity property. These 2158 best-matched clusters deliver a weighted average precision of 97.34% and a weighted average recall of 91.41%.
  • Keywords
    biology computing; pattern clustering; proteins; sequences; automatic functional annotation; automatic noise detection; database compression; dendrogram; function prediction; homologous proteins; protein database; protein family discovery; protein sequence clustering; Bioinformatics; Biotechnology; Clustering algorithms; Educational institutions; Gaussian distribution; Information analysis; Protein sequence; Spatial databases; Statistical distributions; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Emerging Information Technology Conference, 2005.
  • Print_ISBN
    0-7803-9328-7
  • Type
    conf
  • DOI
    10.1109/EITC.2005.1544342
  • Filename
    1544342