Title :
Classification and function estimation of protein by using data compression and genetic algorithms
Author :
Chiba, Shinji ; Sugawara, Ken ; Watanabe, Toshinori
Author_Institution :
Sendai Nat. Coll. of Technol., Japan
Abstract :
A protein has a complicated spatial structure, and its chemical and physical functions originate from that structure. Predicting the structure and functions of a protein from a DNA sequence or an amino acid sequence is important in biology, medical science, protein engineering, and related fields. At present, however, no method can predict them accurately from the sequence. Instead, some approaches estimate the functions approximately by similarity retrieval of sequences. We propose a new method for similarity retrieval of amino acid sequences based on the concept of homology retrieval using data compression. Dictionary-based compression lets us describe text data as an n-dimensional vector using n dictionaries, which are generated by compressing n typical texts, and it also lets us classify the data by similarity. To classify the data clearly, it is effective to use ideal character strings as dictionaries. In this paper, we introduce a genetic algorithm for dictionary generation and use it to classify amino acid sequences. The effectiveness of our proposal is examined using real genome data.
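The idea of describing a sequence as an n-dimensional vector of compression results can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: zlib's preset-dictionary feature stands in for the authors' dictionary compressor, the compression ratio (compressed size divided by original size) is used as the vector component, and a sequence is assigned to the dictionary that compresses it best. The names `compressed_size`, `compression_vector`, and `nearest_dictionary` are hypothetical, and the two dictionaries are made-up toy strings, not real protein sequences or GA-generated dictionaries.

```python
import zlib


def compressed_size(text: bytes, dictionary: bytes) -> int:
    """Size of `text` after DEFLATE compression primed with `dictionary`.

    Assumption: zlib's preset dictionary (zdict) is a stand-in for the
    dictionary compressor used in the paper.
    """
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return len(comp.compress(text) + comp.flush())


def compression_vector(text: bytes, dictionaries: list[bytes]) -> list[float]:
    """Describe `text` as one compression ratio per dictionary."""
    if not text:
        raise ValueError("text must not be empty")
    return [compressed_size(text, d) / len(text) for d in dictionaries]


def nearest_dictionary(vector: list[float]) -> int:
    """Index of the dictionary that compresses the text best (smallest ratio).

    Assumption: a simple arg-min rule, used here only for illustration.
    """
    return min(range(len(vector)), key=vector.__getitem__)


# Toy dictionaries: made-up strings over the amino acid alphabet.
DICT_A = b"MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVK"
DICT_B = b"GPWCHHNDYYFMWCPGNDHHCWWFMMPYYGCCNDWFHPMGYNCWDHFPGMYWCNHDFPGW"

# A query that shares a long substring with DICT_A.
query = DICT_A[5:45]

vec = compression_vector(query, [DICT_A, DICT_B])
print("compression vector:", vec)
print("closest dictionary index:", nearest_dictionary(vec))
```

In this sketch the dictionaries are fixed by hand. In the approach described above they would instead be generated, and a genetic algorithm would search for dictionary strings that separate the classes more clearly.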
Keywords :
biocomputing; data compression; genetic algorithms; proteins; DNA sequence; amino acid sequence; biology; function classification; function estimation; homology retrieval; ideal character strings; medical science; n-dimensional vector; protein; protein engineering; similarity retrieval; spatial structure; Amino acids; Chemicals; DNA; Dictionaries; Information retrieval; Sequences
Conference_Title :
Evolutionary Computation, 2001. Proceedings of the 2001 Congress on
Conference_Location :
Seoul
Print_ISBN :
0-7803-6657-3
DOI :
10.1109/CEC.2001.934277