• DocumentCode
    1691955
  • Title
    DNA visual and analytic data mining
  • Author
    Hoffman, Patrick; Grinstein, Georges; Marx, Kenneth; Grosse, Ivo; Stanley, Eugene
  • Author_Institution
    Inst. for Visualization & Perception Res., Massachusetts Univ., Lowell, MA, USA
  • fYear
    1997
  • Firstpage
    437
  • Lastpage
    441
  • Abstract
    This paper describes data exploration techniques designed to classify DNA sequences. Several visualization and data mining techniques were used to validate known methods, and to attempt to discover new ones, for distinguishing coding DNA sequences (exons) from non-coding DNA sequences (introns). The goal of the data mining was to see whether some other, possibly non-linear, combination of the fundamental position-dependent DNA nucleotide frequency values could be a better predictor than the AMI (average mutual information). We tried many different classification techniques, including rule-based classifiers and neural networks. We also visualized both the original data and the results of the data mining to help verify patterns and to understand the distinctions between the different types of data and classifications. In particular, the visualization helped us develop refinements to neural network classifiers, which have accuracies as high as those of any known method. Finally, we discuss the interactions between visualization and data mining and suggest an integrated approach.
  • Keywords
    DNA; biology computing; data visualisation; deductive databases; knowledge acquisition; neural nets; pattern classification; sequences; AMI; DNA sequence classification; accuracy; analytic data mining; average mutual information; coding DNA sequences; data exploration techniques; data visualization; exons; fundamental position-dependent DNA nucleotide frequency values; introns; neural network classifiers; noncoding DNA sequences; nonlinear combination; rule-based classifiers; visual data mining; Biological cells; DNA; Data analysis; Data mining; Data visualization; Frequency; Neural networks; Organisms; Proteins; Sequences;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Visualization '97, Proceedings
  • Conference_Location
    Phoenix, AZ, USA
  • Print_ISBN
    0-8186-8262-0
  • Type
    conf
  • DOI
    10.1109/VISUAL.1997.663916
  • Filename
    663916
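
The abstract names the AMI (average mutual information) as the baseline predictor but does not define it. As a rough illustration only, the sketch below computes the mutual information between nucleotides separated by a distance k from pair frequencies, then averages it over a range of distances. The function names, the averaging range 1..max_k, and the base-2 logarithm are assumptions made for this example and are not taken from the paper, which may define the AMI differently.

```python
from collections import Counter
from math import log2


def mutual_information(seq: str, k: int) -> float:
    """Mutual information (in bits) between symbols k positions apart.

    Hypothetical helper: probabilities are estimated from the observed
    pairs (seq[i], seq[i + k]) in this one sequence.
    """
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)               # counts of (first, second) pairs
    left = Counter(a for a, _ in pairs)  # marginal counts, first position
    right = Counter(b for _, b in pairs) # marginal counts, second position
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with counts to limit rounding error
        mi += p_ab * log2(count * n / (left[a] * right[b]))
    return mi


def average_mutual_information(seq: str, max_k: int) -> float:
    """Mean of mutual_information over distances 1..max_k (an assumed range)."""
    values = [mutual_information(seq, k) for k in range(1, max_k + 1)]
    return sum(values) / len(values)


if __name__ == "__main__":
    example = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"  # made-up sequence
    print(average_mutual_information(example, max_k=6))
```

A classifier of the kind the abstract describes would take a value like this, or the underlying position-dependent frequencies, as input features; the sketch covers only the feature computation.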