DocumentCode
1652280
Title
Estimation of protein function with an evolutionary dictionary
Author
Chiba, Shinji ; Sugawara, Ken
Author_Institution
Dept. of Inf. Eng., Sendai Nat. Coll. of Technol., Kitahara, Japan
Volume
1
fYear
2002
Firstpage
315
Lastpage
320
Abstract
Proteins have complicated spatial structures, and their chemical and physical functions originate from those structures. At present, no method can accurately predict function from a DNA or amino acid sequence. Instead, several approaches estimate function approximately through similarity retrieval of sequences. In this paper, we propose two methods for amino acid sequence retrieval using an evolutionary dictionary. The first is based on homology retrieval. Compression with the evolutionary dictionary technique lets us describe text data as an n-dimensional vector using n dictionaries, which are generated by compressing n typical texts, and it also lets us classify the texts by their sequential similarity. The second is based on motif retrieval. Because functionally similar amino acid sequences share some common arrangements, we can build a "dictionary" that is specific to the group. In this method, we introduce a genetic algorithm to refine the dictionary. The effectiveness of our proposal is examined using real genome data.
Keywords
DNA; data compression; genetic algorithms; proteins; amino acid sequence retrieval; evolutionary dictionary; homology retrieval; motif retrieval; n-dimensional vector; physical functions; protein function estimation; real genome data; similarity retrieval; spatial structure; text data; Amino acids; Bioinformatics; Chemicals; Dictionaries; Genomics; Proposals; Sequences
fLanguage
English
Publisher
IEEE
Conference_Titel
Evolutionary Computation, 2002. CEC '02. Proceedings of the 2002 Congress on
Conference_Location
Honolulu, HI
Print_ISBN
0-7803-7282-4
Type
conf
DOI
10.1109/CEC.2002.1006253
Filename
1006253