• DocumentCode
    431002
  • Title
    A modified Kohonen network for DNA splice junction classification
  • Author
    Naenna, Thanakorn ; Bress, Robert A. ; Embrechts, Mark J.
  • Author_Institution
    Dept. of Ind. Eng., Mahidol Univ., Nakornpathom, Thailand
  • Volume
    B
  • fYear
    2004
  • fDate
    21-24 Nov. 2004
  • Firstpage
    215
  • Abstract
    This paper describes an application of the Kohonen network, or self-organizing map (SOM), to exon/intron classification in DNA using windowed splice junction data. Splice junctions are groups of nucleotides that mark the boundaries between sections of DNA that code for genetic material and sections that do not; genes are often interrupted by such noncoding sequences. The data used in this study are human DNA data taken from the National Center for Biotechnology Information (http://www.ncbi.nih.gov/). The dataset contains 1,424 DNA sequences with 128 descriptors each. SOMs were used to classify each DNA sequence into one of three categories: transitions from gene (exon) to nongene (intron), transitions from nongene (intron) to gene (exon), and no transition, where the two-basepair code for the splice junction was coincidental. The multidimensional sequences are clustered onto a two-dimensional map that is displayed graphically, so that the visual capabilities of SOMs can be used for data exploration and classification. The topographic properties of SOMs keep similar sequences close to each other on the output map. Clusters are determined and labeled according to the classes of the output neurons they contain, and each output neuron is labeled with the class most frequently mapped onto it.
  • Keywords
    DNA; biology computing; biotechnology; genetics; pattern classification; self-organising feature maps; DNA sequences; DNA splice junction classification; Kohonen network; bioinformatics; gene; neurons; self-organizing maps; Bioinformatics; Biological materials; DNA; Frequency; Genetics; Humans; Multidimensional systems; Neurons; Self organizing feature maps; Sequences;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    TENCON 2004. 2004 IEEE Region 10 Conference
  • Print_ISBN
    0-7803-8560-8
  • Type
    conf
  • DOI
    10.1109/TENCON.2004.1414570
  • Filename
    1414570
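
The labeling scheme described in the abstract (map each sequence to its best-matching output neuron, then label each neuron with the most frequent class mapped onto it) can be sketched as follows. This is only a minimal illustration using a plain Kohonen SOM, not the modified network of the paper, whose details the abstract does not give. The toy 8-dimensional data, the class names EI/IE/N, the 4x4 grid, and all training parameters are assumptions chosen for illustration; the actual dataset has 1,424 sequences with 128 descriptors each.

```python
from collections import Counter

import numpy as np


def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a plain Kohonen SOM; returns weights of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    weights = rng.random((rows * cols, dim))
    # Grid coordinates of every output neuron, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # linearly shrinking neighborhood
            # Best-matching unit: neuron whose weight vector is closest to the sample.
            bmu = np.argmin(np.sum((weights - data[i]) ** 2, axis=1))
            # Gaussian neighborhood on the 2-D grid, centered on the BMU.
            dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights


def best_matching_units(weights, data):
    """Index of the closest output neuron for every row of data."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def label_neurons(weights, data, labels):
    """Label each output neuron with the class most frequently mapped onto it."""
    votes = {}
    for unit, label in zip(best_matching_units(weights, data), labels):
        votes.setdefault(int(unit), Counter())[label] += 1
    return {unit: counts.most_common(1)[0][0] for unit, counts in votes.items()}


def classify(weights, neuron_labels, data, default=None):
    """Assign each sample the label of its best-matching neuron."""
    return [neuron_labels.get(int(u), default)
            for u in best_matching_units(weights, data)]


def make_toy_data(n_per_class=30, noise=0.05, seed=1):
    """Hypothetical stand-in data: three noisy prototype vectors, one per class."""
    rng = np.random.default_rng(seed)
    prototypes = np.array([
        [1, 0, 0, 0, 1, 0, 0, 0],
        [0, 1, 0, 0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0, 0, 1, 0],
    ], dtype=float)
    names = ["EI", "IE", "N"]  # assumed names: exon-intron, intron-exon, neither
    data = np.vstack([p + noise * rng.standard_normal((n_per_class, p.size))
                      for p in prototypes])
    labels = [name for name in names for _ in range(n_per_class)]
    return data, labels


if __name__ == "__main__":
    data, labels = make_toy_data()
    weights = train_som(data)
    neuron_labels = label_neurons(weights, data, labels)
    predicted = classify(weights, neuron_labels, data)
    accuracy = sum(p == t for p, t in zip(predicted, labels)) / len(labels)
    print("labeled neurons:", neuron_labels)
    print("training accuracy:", accuracy)
```

The sketch evaluates on the same samples used for labeling, so the printed accuracy is a sanity check rather than an estimate of generalization. Neurons that win no training sample stay unlabeled and fall back to `default`.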