  • DocumentCode
    522824
  • Title
    Nearest neighbor training of side effect machines for sequence classification
  • Author
    Ashlock, Daniel; McEachern, Andrew
  • Author_Institution
    Dept. of Math. & Stat., Univ. of Guelph, Guelph, ON, Canada
  • fYear
    2010
  • fDate
    2-5 May 2010
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Side effect machines operate by associating side effects with the states of a finite state machine. The use of side effect machines permits the researcher to leverage information stored in the state transition structure, making machines that might be identical as recognizers behave differently as classifiers. The side effect machines in this study associate a counter with each state so that the number of times each state is visited becomes a numerical feature associated with that state. The key to effective use of these numerical features is to locate side effect machines for which the count vectors are good feature sets. In this study, side effect machines are selected with an evolutionary algorithm. The Rand index of nearest neighbor classification of the count vectors serves as the fitness function for selecting side effect machines. A parameter study is performed on simple synthetic data, and then side effect machines are trained to classify two sets of biological sequences. The first set comprises two categories of HLA sequences from the human major histocompatibility complex. The second set comprises positive and negative examples of human endogenous retroviral sequences taken from the human genome. The retroviral sequences are challenging, but good results are obtained. The HLA data are classified with complete accuracy.
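The fitness evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made for illustration: the machine is stored as a list of per-state transition dictionaries, a state's counter is incremented each time a symbol moves the machine into it (the start state itself is not counted), nearest neighbor classification is done leave-one-out with squared Euclidean distance, and the Rand index compares the true labels with the predicted ones. All function names and the toy machine are hypothetical.

```python
from itertools import combinations


def count_vector(transitions, start, sequence):
    """Run the machine over `sequence` and count entries into each state."""
    counts = [0] * len(transitions)
    state = start
    for symbol in sequence:
        state = transitions[state][symbol]
        counts[state] += 1  # assumption: count the state entered on each symbol
    return counts


def nearest_neighbor_labels(vectors, labels):
    """Leave-one-out 1-NN: each vector takes the label of its closest other vector."""
    predicted = []
    for i, v in enumerate(vectors):
        best = min(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(v, vectors[j])),
        )
        predicted.append(labels[best])
    return predicted


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which the two labelings agree (same vs. different group)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def fitness(transitions, start, sequences, labels):
    """Score a machine by the Rand index of 1-NN classification of its count vectors."""
    vectors = [count_vector(transitions, start, s) for s in sequences]
    return rand_index(labels, nearest_neighbor_labels(vectors, labels))


# Hypothetical two-state machine: state 0 is entered on A/T, state 1 on C/G.
toy_transitions = [
    {"A": 0, "T": 0, "C": 1, "G": 1},
    {"A": 0, "T": 0, "C": 1, "G": 1},
]
toy_sequences = ["AATTA", "ATATT", "GGCCG", "CGCGG"]
toy_labels = [0, 0, 1, 1]
toy_fitness = fitness(toy_transitions, 0, toy_sequences, toy_labels)
```

In an evolutionary algorithm, a function like `fitness` would be evaluated for each candidate transition table, and machines with higher scores would be favored for reproduction. The representation and variation operators used in the paper are not given in this record, so they are left out of the sketch.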
  • Keywords
    biology computing; evolutionary computation; finite state machines; genomics; learning (artificial intelligence); pattern classification; Rand index; biological sequences; count vectors; evolutionary algorithm; finite state machine; fitness function; human endogenous retroviral sequences; human genome; human major histocompatibility complex; nearest neighbor training; sequence classification; side effect machines; state transition structure; synthetic data; Automata; Bioinformatics; Clustering algorithms; DNA; Evolution (biology); Evolutionary computation; Genomics; Humans; Nearest neighbor searches; Sequences;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2010 IEEE Symposium on
  • Conference_Location
    Montreal, QC
  • Print_ISBN
    978-1-4244-6766-2
  • Type
    conf
  • DOI
    10.1109/CIBCB.2010.5510426
  • Filename
    5510426