DocumentCode :
1312556
Title :
The Impact of Normalization and Phylogenetic Information on Estimating the Distance for Metagenomes
Author :
Chien-Hao Su ; Tse-Yi Wang ; Ming-Tsung Hsu ; Weng, F.C.-H. ; Cheng-Yan Kao ; Daryi Wang ; Huai-Kuang Tsai
Author_Institution :
Inst. of Inf. Sci., Acad. Sinica, Taipei, Taiwan
Volume :
9
Issue :
2
fYear :
2012
Firstpage :
619
Lastpage :
628
Abstract :
Metagenomics enables the direct study of unculturable microorganisms in different environments. Discriminating metagenomes by their compositional differences is an important and challenging problem. Several distance functions have been proposed to estimate these differences based on functional profiles or taxonomic distributions; however, the strengths and limitations of such functions are still unclear. We first analyzed three well-known distance functions and found very little difference between them in the clustering of samples. This motivated us to incorporate suitable normalizations and phylogenetic information into the functions so that we could cluster samples from both real and synthetic data sets. The results indicate that rank-based normalization with phylogenetic information significantly improves sample clustering, regardless of whether the samples are from real or synthetic microbiomes. Furthermore, our findings suggest that considering suitable normalizations and phylogenetic information is essential when designing distance functions for estimating the differences between metagenomes. We conclude that incorporating rank-based normalization with phylogenetic information into the distance functions helps achieve reliable clustering results.
Keywords :
bioinformatics; evolution (biological); genetics; genomics; microorganisms; distance estimation; metagenomes; metagenomics; taxonomic distributions; phylogenetic information; rank-based normalization; sample clustering; synthetic data sets; synthetic microbiomes; unculturable microorganisms; Accuracy; Bioinformatics; Communities; Computational biology; Correlation; Phylogeny; Reliability; Metagenomics; clustering; distance functions; normalization; phylogenetic information; Cluster Analysis; Databases, Genetic; Environmental Microbiology; Metagenome; Metagenomics; Microbiota; Models, Genetic; Phylogeny;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2011.111
Filename :
6007126