• DocumentCode
    1797651
  • Title

    Issues on sampling negative examples for predicting prokaryotic promoters

  • Author

    Gusmao, Eduardo G. ; de Souto, Marcilio C. P.

  • Author_Institution
    Med. Sch., Inst. for Biomed. Eng., RWTH Aachen Univ., Aachen, Germany
  • fYear
    2014
  • fDate
    6-11 July 2014
  • Firstpage
    494
  • Lastpage
    501
  • Abstract
    Supervised learning methods have been successfully used to build classifiers for the identification of promoter regions. Such a classifier is typically built from a dataset containing examples of promoter (positive) and non-promoter (negative) regions, so careful selection of the data used to construct and evaluate a promoter-finding algorithm is very important. Experimentally known promoter regions can safely be assumed to be positive training instances. In contrast, because definite knowledge of whether a given region is a non-promoter is generally not available, negative instances are not straightforward to obtain. To complicate matters, there is no unique definition of what a negative instance is in the case of promoters. Consequently, the definition of non-promoter region assumed when building the data could significantly affect the performance of the classifier and/or yield a biased estimate of that performance. We present an empirical study of the effect of this problem on promoter prediction in E. coli. To the best of our knowledge, no such study has so far been carried out for prokaryotic promoter prediction.
  • Keywords
    bioinformatics; learning (artificial intelligence); E. coli; negative example sampling; negative instance; nonpromoter regions; positive training instances; prokaryotic promoter prediction; promoter finding algorithm; supervised learning methods; Context; DNA; Encoding; Feature extraction; Genomics; Support vector machines; Training
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), 2014 International Joint Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4799-6627-1
  • Type

    conf

  • DOI
    10.1109/IJCNN.2014.6889557
  • Filename
    6889557