• DocumentCode
    1652280
  • Title

    Estimation of protein function with an evolutionary dictionary

  • Author

    Chiba, Shinji ; Sugawara, Ken

  • Author_Institution
    Dept. of Inf. Eng., Sendai Nat. Coll. of Technol., Kitahara, Japan
  • Volume
    1
  • fYear
    2002
  • Firstpage
    315
  • Lastpage
    320
  • Abstract
    Proteins have complicated spatial structures, and their chemical and physical functions originate from those structures. No method is currently available to predict function accurately from the DNA/amino acid sequence. Instead, several approaches estimate function approximately by similarity retrieval of sequences. In this paper, we propose two methods for amino acid sequence retrieval using an evolutionary dictionary. The first is based on homology retrieval. Compression with the evolutionary dictionary technique lets us describe text data as an n-dimensional vector, using n dictionaries generated by compressing n typical texts, and lets us classify the texts by their sequential similarity. The second is based on motif retrieval. Because functionally similar amino acid sequences share some common arrangements, we can build a "dictionary" specific to the group. In this method, we introduce a genetic algorithm to refine the dictionary. The effectiveness of our proposal is examined using real genome data.
  • Keywords
    DNA; data compression; genetic algorithms; proteins; amino acid sequence retrieval; evolutionary dictionary; genetic algorithm; homology retrieval; motif retrieval; n-dimensional vector; physical functions; protein function estimation; real genome data; similarity retrieval; spatial structure; text data; Amino acids; Bioinformatics; Chemicals; Dictionaries; Genomics; Proposals; Sequences
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Evolutionary Computation, 2002. CEC '02. Proceedings of the 2002 Congress on
  • Conference_Location
    Honolulu, HI
  • Print_ISBN
    0-7803-7282-4
  • Type

    conf

  • DOI
    10.1109/CEC.2002.1006253
  • Filename
    1006253
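
The homology-retrieval idea in the abstract (describing a text as an n-dimensional vector by compressing it against n dictionaries built from n typical texts) can be illustrated with a minimal sketch. This is not the authors' evolutionary dictionary: as a stand-in, it assumes zlib's preset-dictionary feature (`zdict`), and uses the raw reference text itself as the dictionary. The function names and the sequences are hypothetical and made up for illustration only.

```python
import zlib


def compressed_size(text: bytes, dictionary: bytes) -> int:
    """Size in bytes of `text` after DEFLATE compression primed with `dictionary`."""
    # zdict is zlib's preset dictionary; it stands in for the paper's
    # evolutionary dictionary, which is NOT reproduced here.
    compressor = zlib.compressobj(level=9, zdict=dictionary)
    return len(compressor.compress(text) + compressor.flush())


def feature_vector(sequence: str, references: list[str]) -> list[int]:
    """One component per reference text: the compressed size of `sequence`
    when that reference is used as the dictionary (smaller = more similar)."""
    data = sequence.encode("ascii")
    return [compressed_size(data, ref.encode("ascii")) for ref in references]


def nearest_reference(sequence: str, references: list[str]) -> int:
    """Index of the reference whose dictionary compresses `sequence` best."""
    vector = feature_vector(sequence, references)
    return min(range(len(vector)), key=vector.__getitem__)


# Made-up amino acid strings, used only to exercise the functions.
REFERENCES = [
    "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV",
    "GSSHHHHHHSSGLVPRGSHMASMTGGQQMGRGSEFELRRQACGRTRAPPPPPLRSGC",
]

# A query that differs from the first reference by at most one residue.
query = REFERENCES[0][:30] + "W" + REFERENCES[0][31:]
print(feature_vector(query, REFERENCES))
print(nearest_reference(query, REFERENCES))
```

A sequence that shares long substrings with a reference compresses to fewer bytes against that reference, so the vector components act as rough dissimilarity scores. Any standard classifier or nearest-neighbour rule could then operate on these vectors; which one the paper uses is not stated in the abstract.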