Title :
Clustering metagenome fragments using growing self organizing map
Author :
Overbeek, Marlinda Vasty ; Kusuma, Wisnu A. ; Buono, Andrea
Abstract :
Microorganism samples taken directly from the environment are not easy to assemble because they contain mixtures of microorganisms. When sample complexity is very high and the sample comes from a highly diverse environment, assembling the DNA sequences becomes more difficult because interspecies chimeras can occur. To avoid this problem, we propose composition-based binning using unsupervised learning. We employed trinucleotide and tetranucleotide frequencies as features and the growing self-organizing map (GSOM) algorithm as the clustering method. GSOM was implemented to map the features into a high-dimensional feature space. We tested our method on a small microbial community dataset. Cluster quality was evaluated using the following parameters: topographic error, quantization error, and error percentage. The evaluation results show that the best clustering was obtained using GSOM with tetranucleotide frequency features.
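The composition features named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the frequencies were computed, so the overlapping sliding window, the normalization to relative frequencies, the skipping of windows with ambiguous bases, the function name `kmer_frequencies`, and the example fragment are all assumptions made here for illustration, not the authors' procedure.

```python
from itertools import product


def kmer_frequencies(sequence, k):
    """Return relative k-mer frequencies, ordered lexicographically over A, C, G, T.

    Assumption: overlapping windows, and windows containing any character
    other than A, C, G, T (for example N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        # Fragment shorter than k, or no valid windows at all.
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Hypothetical fragment, used only to show the vector sizes.
fragment = "ATGCGATACGCTTGAGGCTAACGT"
tri = kmer_frequencies(fragment, 3)    # 4**3 = 64 features
tetra = kmer_frequencies(fragment, 4)  # 4**4 = 256 features
```

Under these assumptions each fragment becomes a 64-dimensional (trinucleotide) or 256-dimensional (tetranucleotide) vector, which is the kind of input a clustering method such as GSOM would receive.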
Keywords :
DNA; bioinformatics; feature extraction; genomics; microorganisms; pattern clustering; self-organising feature maps; unsupervised learning; DNA sequence assembling; GSOM algorithm; clustering method; composition based binning; error percentage; feature mapping; feature space; growing self organizing map; interspecies chimeras; metagenome fragment clustering; microbial community dataset; microorganism samples; quantization error; tetranucleotide frequency; topographic error; trinucleotide frequency; Biological cells; Clustering algorithms; Quantization (signal); Vectors
Conference_Title :
Advanced Computer Science and Information Systems (ICACSIS), 2013 International Conference on
Conference_Location :
Bali
DOI :
10.1109/ICACSIS.2013.6761590