Title :
A machine learning framework for trait based genomics
Author :
Zhang, Wei ; Zeng, Erliang ; Liu, Dan ; Jones, Stuart ; Emrich, Scott
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Notre Dame, South Bend, IN, USA
Abstract :
Microbial communities perform many important ecological functions across a wide range of natural and man-made environments. Trait-based approaches have recently been recognized as useful for studying microbial communities, and the increasing availability of whole genome sequences provides an opportunity to explore the genetic basis of a variety of functional traits. In this paper, we propose a machine learning framework that quantitatively links genotype to functional traits. Genes from bacterial genomes belonging to different functional trait groups were grouped into Clusters of Orthologs (COGs), which were used as features. The TF-IDF (term frequency-inverse document frequency) technique from text mining was then applied to transform the data so that it reflects both the abundance and the importance of each COG. After TF-IDF processing, COGs were ranked using feature selection methods to assess their relevance to the functional trait of interest. This paper focuses on a binary functional trait; we plan to extend the approach to continuous functional traits in the future. Experimental results demonstrated that genes related to the functional trait can be detected using our method.
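The pipeline described in the abstract (COG counts per genome, TF-IDF weighting, then ranking COGs against a binary trait) can be illustrated with the minimal sketch below. It is not the authors' implementation: the abstract does not state which TF-IDF variant or which feature selection methods were used, so this sketch assumes the common form tf = count / total genes in the genome and idf = log(N / number of genomes containing the COG), and it substitutes a simple absolute difference in class means as a stand-in ranking score. The function names `tf_idf` and `rank_cogs` and the COG identifiers in the example are hypothetical.

```python
import math


def tf_idf(counts):
    """Weight COG counts by TF-IDF.

    Assumed variant (not confirmed by the paper): tf = count / total genes in
    the genome, idf = log(N / df), where df is the number of genomes that
    contain the COG. `counts` is a list of dicts, one per genome,
    mapping COG id -> gene count.
    """
    n_genomes = len(counts)
    # Document frequency: in how many genomes does each COG occur?
    df = {}
    for genome in counts:
        for cog, c in genome.items():
            if c > 0:
                df[cog] = df.get(cog, 0) + 1
    weighted = []
    for genome in counts:
        total = sum(genome.values())
        row = {}
        if total > 0:  # skip empty genomes to avoid division by zero
            for cog, c in genome.items():
                if c > 0:
                    row[cog] = (c / total) * math.log(n_genomes / df[cog])
        weighted.append(row)
    return weighted


def rank_cogs(weighted, labels):
    """Rank COGs by |mean TF-IDF in trait-positive - mean in trait-negative|.

    This mean-difference score is a hypothetical stand-in for the paper's
    unspecified feature selection methods. `labels` holds 1 (trait present)
    or 0 (trait absent) for each genome.
    """
    cogs = sorted({cog for row in weighted for cog in row})
    pos = [row for row, y in zip(weighted, labels) if y == 1]
    neg = [row for row, y in zip(weighted, labels) if y == 0]

    def mean(rows, cog):
        # COGs absent from a genome contribute a weight of 0.
        return sum(r.get(cog, 0.0) for r in rows) / len(rows)

    scores = {cog: abs(mean(pos, cog) - mean(neg, cog)) for cog in cogs}
    return sorted(cogs, key=lambda cog: scores[cog], reverse=True)


# Toy example with made-up COG ids: two trait-positive, two trait-negative genomes.
genomes = [
    {"COG0001": 3, "COG0002": 1},
    {"COG0001": 2, "COG0003": 2},
    {"COG0002": 2, "COG0003": 2},
    {"COG0002": 1, "COG0003": 3},
]
labels = [1, 1, 0, 0]
print(rank_cogs(tf_idf(genomes), labels))  # ['COG0001', 'COG0003', 'COG0002']
```

In this toy data COG0001 occurs only in trait-positive genomes, so it receives the highest IDF and the largest class difference and ranks first. A COG present in every genome gets idf = log(1) = 0 and therefore cannot be ranked above any other COG, which is the text-mining intuition behind using TF-IDF here.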
Keywords :
bioinformatics; cellular biophysics; data mining; feature extraction; genomics; learning (artificial intelligence); microorganisms; text analysis; COG; Cluster of Orthologs; TF-IDF technique; bacteria genomes; binary functional trait; feature selection method; functional trait groups; functional trait related genes; machine learning framework; microbial community; text mining domain; trait based genomics; Accuracy; Bioinformatics; Communities; Genomics; Microorganisms; Support vector machines; Ecology; Feature Selection; Functional Trait; Machine Learning; Microbial communities; Ortholog; Sequencing;
Conference_Title :
Computational Advances in Bio and Medical Sciences (ICCABS), 2012 IEEE 2nd International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4673-1320-9
Electronic_ISBN :
978-1-4673-1319-3
DOI :
10.1109/ICCABS.2012.6182648