Title :
Automatic Discovery of Bioluminescent Proteins from Large Protein Databases
Author :
Tao Meng ; Mei-Ling Shyu ; Hua Zhang
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Miami, Coral Gables, FL, USA
Abstract :
Accurate annotation of protein features is becoming increasingly important for enriching gene ontology databases. In this work, we present a framework to predict whether any given protein sequence is bioluminescent. Bioluminescent proteins are produced by living organisms and emit light naturally. Bioluminescence is thought to serve several functions in living organisms, including camouflage, attraction of prey, and communication. Bioluminescent proteins are also widely used in biotechnology as labels in assay development, reporters of gene expression, and imaging agents. Currently, bioluminescent proteins are mainly curated by researchers through experimental analysis, which is a time-consuming process. Data mining based algorithms, in contrast, provide an efficient way to detect candidate bioluminescent proteins and to suggest how the experimental work should be prioritized. While traditional alignment based algorithms (such as BLAST) show promising results in sequence analysis, they suffer from the limitation that the testing sequence must show homology to the sequences in the available training data sets. To overcome this limitation, our proposed framework uses a set of homology-independent features that are extracted directly from the primary sequences to represent the global physicochemical properties as well as the sequence order characteristics of proteins. In addition, a novel subspace-based data filtering algorithm is proposed to eliminate noise from the training data. One existing framework addressing the same problem was implemented and compared with our proposed framework. The experimental results indicate that our proposed framework shows promising performance. Furthermore, the proposed framework is generic and could easily be applied to the annotation of other protein properties.
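As an illustration of what "homology-independent features extracted directly from the primary sequences" can look like, the sketch below computes two common alignment-free descriptors: amino acid composition (a global property) and dipeptide composition (a simple sequence order property). These descriptors, and the function names amino_acid_composition, dipeptide_composition, and extract_features, are assumptions chosen for illustration; the abstract does not state which features the framework actually uses.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order so every sequence
# maps to a feature vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_VALID = set(AMINO_ACIDS)
_DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Fraction of each standard residue (20 values, global property)."""
    counts = Counter(ch for ch in seq.upper() if ch in _VALID)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair (400 values, order property).

    Pairs containing a non-standard symbol (e.g. 'X') are skipped.
    """
    seq = seq.upper()
    pairs = [
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in _VALID and seq[i + 1] in _VALID
    ]
    if not pairs:
        return [0.0] * len(_DIPEPTIDES)
    counts = Counter(pairs)
    return [counts[dp] / len(pairs) for dp in _DIPEPTIDES]


def extract_features(seq):
    """Concatenate both descriptors into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


if __name__ == "__main__":
    # Hypothetical example sequence, not taken from the paper.
    features = extract_features("MKVLAAGIVGLX")
    print(len(features))
```

Because neither descriptor depends on alignment to known sequences, a vector of this kind can be computed for any input and passed to a standard classifier, regardless of whether the protein has homologs in the training data.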
Keywords :
biology computing; bioluminescence; data mining; interference suppression; ontologies (artificial intelligence); proteins; accurate annotation; alignment based algorithms; assay development; automatic discovery; bioluminescence; bioluminescent proteins; biotechnology; camouflage; data mining based algorithms; gene expression; gene ontology databases; global physicochemical properties; homology independent feature extraction; imaging agents; large protein databases; living organisms; noise elimination; prey; protein features; protein sequence; sequence analysis; sequence order characteristics; subspace based data filtering algorithm; testing sequence; time consuming process; training data sets; Amino acids; Feature extraction; Protein sequence; Support vector machines; Testing; Training; Bioluminescence; Classification; Lasso; Subspace-based filtering;
Conference_Title :
Semantic Computing (ICSC), 2013 IEEE Seventh International Conference on
Conference_Location :
Irvine, CA
DOI :
10.1109/ICSC.2013.67