DocumentCode :
2340211
Title :
Automated learning of genome sequences by computational intelligence
Author :
Yang, Mary Qu ; Yang, Jack Y. ; Zuojie Luo ; Ersoy, Okan K.
Author_Institution :
Purdue Electr. & Comput. Eng. Sch., Purdue Univ., West Lafayette, IN
fYear :
2005
fDate :
0-0 0
Abstract :
The advent of high-throughput sequencing technology has led to an explosion of available DNA sequence data. The structures and functions of the proteins encoded by sequenced genomes remain largely unknown. Automated identification of protein functions and interactions has relied largely on known 3D structures or sequence homologues. In particular, intrinsically unstructured or disordered proteins lack specific 3D structures and are not conserved during evolution, but play central roles in diseases characterized by protein misfolding and aggregation. Can we assign protein functions to sequences without relying on 3D structures, to provide useful information for the study of diseases? We developed machine learning techniques to rapidly assess protein functions from sequences. The problem of assigning functional classes to proteins is complicated by the fact that a single protein can participate in several different pathways and thus can have multiple functions (due to complex interactions among proteins). It follows that the instances in the resulting classification problem can carry multiple class labels. We have developed a tree-based classifier that is capable of classifying multiply-labeled data and gained insight into the multi-functional nature of proteins. The algorithm has been used with ensemble methods, in connection with other computational intelligence techniques, to form a committee machine. Results compare favorably with those achieved by algorithms such as decision trees and support vector machines.
Keywords :
DNA; biology computing; genetics; learning (artificial intelligence); pattern classification; proteins; trees (mathematics); DNA sequence data; automated learning; computational intelligence; disordered proteins; genome sequences; intrinsic unstructured proteins; machine learning; protein functions; protein interactions; tree-based classifier; Bioinformatics; Classification tree analysis; Computational intelligence; DNA; Decision trees; Diseases; Genomics; Machine learning; Protein engineering; Protein sequence; Protein function; classification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence Methods and Applications, 2005 ICSC Congress on
Conference_Location :
Istanbul
Print_ISBN :
1-4244-0020-1
Type :
conf
DOI :
10.1109/CIMA.2005.1662321
Filename :
1662321