• DocumentCode
    1013022
  • Title

    Motif discoveries in unaligned molecular sequences using self-organizing neural networks

  • Author

    Derong Liu ; DasGupta, B. ; Huaguang Zhang

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Illinois Univ., Chicago, IL, USA
  • Volume
    17
  • Issue
    4
  • fYear
    2006
  • fDate
    7/1/2006 12:00:00 AM
  • Firstpage
    919
  • Lastpage
    928
  • Abstract
    In this paper, we study the problem of motif discoveries in unaligned DNA and protein sequences. The problem of motif identification in DNA and protein sequences has been studied for many years in the literature. Major hurdles at this point include computational complexity and reliability of the search algorithms. We propose a self-organizing neural network structure for solving the problem of motif identification in DNA and protein sequences. Our network contains several layers, with each layer performing classifications at different levels. The top layer divides the input space into a small number of regions and the bottom layer classifies all input patterns into motifs and nonmotif patterns. Depending on the number of input patterns to be classified, several layers between the top layer and the bottom layer are needed to perform intermediate classifications. We maintain a low computational complexity through the use of the layered structure so that each pattern's classification is performed with respect to a small subspace of the whole input space. Our self-organizing neural network will grow as needed (e.g., when more motif patterns are classified). It will give the same amount of attention to each input pattern and will not omit any potential motif patterns. Finally, simulation results show that our algorithm outperforms existing algorithms in certain aspects. In particular, simulation results show that our algorithm can identify motifs with more mutations than existing algorithms. Our algorithm works well for long DNA sequences as well.
  • Keywords
    DNA; biology computing; computational complexity; molecular biophysics; neural nets; pattern classification; proteins; search problems; input patterns; motif discoveries; motif identification; motif patterns; protein sequence; search algorithms; self-organizing neural networks; unaligned DNA sequence; unaligned molecular sequences; Earth; Genetic mutations; Intelligent networks; Neural networks; Organisms; RNA; Sequences; DNA sequences; motif finding; protein sequences; self-organization; subtle signals
  • fLanguage
    English
  • Journal_Title
    Neural Networks, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9227
  • Type

    jour

  • DOI
    10.1109/TNN.2006.875987
  • Filename
    1650247