• DocumentCode
    3542927
  • Title

    Clustering metagenome fragments using growing self organizing map

  • Author

    Overbeek, Marlinda Vasty ; Kusuma, Wisnu A. ; Buono, Andrea

  • fYear
    2013
  • fDate
    28-29 Sept. 2013
  • Firstpage
    285
  • Lastpage
    289
  • Abstract
    Microorganism samples taken directly from the environment are not easy to assemble because they contain mixtures of microorganisms. If sample complexity is very high and the sample comes from a highly diverse environment, assembling DNA sequences becomes more difficult because interspecies chimeras can occur. To avoid this problem, we propose composition-based binning using unsupervised learning. We employed trinucleotide and tetranucleotide frequencies as features and the growing self-organizing map (GSOM) algorithm as the clustering method. GSOM was implemented to map features into a high-dimensional feature space. We tested our method on a small microbial community dataset. Cluster quality was evaluated using the following parameters: topographic error, quantization error, and error percentage. The evaluation results show that the best clustering is obtained using GSOM with tetranucleotide frequency features.
  • Keywords
    DNA; bioinformatics; feature extraction; genomics; microorganisms; pattern clustering; self-organising feature maps; unsupervised learning; DNA sequence assembling; GSOM algorithm; clustering method; composition based binning; error percentage; feature mapping; feature space; growing self organizing map; interspecies chimeras; metagenome fragment clustering; microbial community dataset; microorganism samples; quantization error; tetranucleotide frequency; topographic error; trinucleotide frequency; Biological cells; Clustering algorithms; Quantization (signal); Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Advanced Computer Science and Information Systems (ICACSIS), 2013 International Conference on
  • Conference_Location
    Bali
  • Type

    conf

  • DOI
    10.1109/ICACSIS.2013.6761590
  • Filename
    6761590
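
The abstract names trinucleotide and tetranucleotide frequencies as the clustering features and quantization error as one quality measure. The following is a minimal Python sketch of those two ideas, not the authors' implementation. The function names `kmer_frequencies` and `quantization_error` are hypothetical, and several choices are assumptions the record does not confirm: overlapping sliding windows, skipping windows that contain ambiguous bases, no merging of reverse complements, and Euclidean distance to the nearest prototype (map node weight vector).

```python
from itertools import product
from math import dist

ALPHABET = "ACGT"


def kmer_frequencies(sequence, k):
    """Relative frequency of every k-mer over A/C/G/T, in lexicographic order.

    k=3 gives 64 trinucleotide features; k=4 gives 256 tetranucleotide features.
    Assumption: overlapping windows, no reverse-complement merging.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def quantization_error(vectors, prototypes):
    """Mean Euclidean distance from each input vector to its nearest prototype.

    Assumption: `prototypes` stands in for the weight vectors of the map nodes.
    """
    nearest = [min(dist(v, p) for p in prototypes) for v in vectors]
    return sum(nearest) / len(nearest)
```

A fragment would be converted with `kmer_frequencies(fragment, 4)` before being passed to a clustering algorithm. A lower `quantization_error` means the prototypes sit closer to the data they represent.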