DocumentCode
3542927
Title
Clustering metagenome fragments using growing self organizing map
Author
Overbeek, Marlinda Vasty ; Kusuma, Wisnu A. ; Buono, Andrea
fYear
2013
fDate
28-29 Sept. 2013
Firstpage
285
Lastpage
289
Abstract
Microorganism samples taken directly from the environment are not easy to assemble because they contain mixtures of microorganisms. When sample complexity is very high and the sample comes from a highly diverse environment, assembling DNA sequences becomes more difficult because interspecies chimeras can occur. To avoid this problem, we propose composition-based binning using unsupervised learning. We employed trinucleotide and tetranucleotide frequencies as features and the growing self-organizing map (GSOM) algorithm as the clustering method. GSOM was implemented to map the features into a high-dimensional feature space. We tested our method on a small microbial community dataset. Cluster quality was evaluated using the following parameters: topographic error, quantization error, and error percentage. The evaluation results show that the best clusters are obtained using GSOM with tetranucleotide frequency features.
Keywords
DNA; bioinformatics; feature extraction; genomics; microorganisms; pattern clustering; self-organising feature maps; unsupervised learning; DNA sequence assembling; GSOM algorithm; clustering method; composition based binning; error percentage; feature mapping; feature space; growing self organizing map; interspecies chimeras; metagenome fragment clustering; microbial community dataset; microorganism samples; quantization error; tetranucleotide frequency; topographic error; trinucleotide frequency; Bioinformatics; Biological cells; Clustering algorithms; Genomics; Quantization (signal); Vectors
fLanguage
English
Publisher
IEEE
Conference_Titel
Advanced Computer Science and Information Systems (ICACSIS), 2013 International Conference on
Conference_Location
Bali
Type
conf
DOI
10.1109/ICACSIS.2013.6761590
Filename
6761590