• DocumentCode
    507363
  • Title

    Distinguishing Coding from Non-coding Sequences in a Prokaryote Complete Genome Based on the Global Descriptor

  • Author

    Han, Guo-Sheng ; Yu, Zu-Guo ; Anh, Vo ; Chan, Raymond H.

  • Author_Institution
    Sch. of Math. & Comput. Sci., Xiangtan Univ., Xiangtan, China
  • Volume
    5
  • fYear
    2009
  • fDate
    14-16 Aug. 2009
  • Firstpage
    42
  • Lastpage
    46
  • Abstract
    Recognition of coding sequences in a complete genome is an important problem in DNA sequence analysis. Their rapid and accurate recognition contributes to a range of related research and applications. In this paper, we aim to distinguish the coding sequences from the non-coding sequences in a prokaryote complete genome. We select a data set of 51 available bacterial genomes. Then, we use the global descriptor method on the coding/non-coding primary sequences and obtain 36 parameters for each coding/non-coding primary sequence. These parameters are used to generate some spaces, whose points represent coding/non-coding sequences in our selected data set. In order to evaluate this method, we apply Fisher's linear discriminant algorithm to it and obtain relatively satisfactory discriminant accuracies. The average accuracies of the global descriptor method (36 parameters) for the training and test sets are 97.81% and 97.49%, respectively. Finally, a comparison with Z curve methods using the same data set is undertaken. When we combine our method with the Z curve method, higher accuracies are obtained. This good performance indicates that the global descriptor method of this paper may complement the existing methods for the gene finding problem.
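The evaluation step named in the abstract, Fisher's linear discriminant on 36-parameter vectors, can be illustrated with a minimal sketch. This is not the authors' code. The feature vectors below are synthetic Gaussian data standing in for the 36 global descriptor parameters, which this record does not define. The function names `fisher_lda_fit` and `fisher_lda_predict`, the midpoint threshold, and the small ridge term are all assumptions made for illustration.

```python
import numpy as np


def fisher_lda_fit(X_pos, X_neg):
    """Fit a two-class Fisher linear discriminant.

    Returns the projection vector w and a threshold c placed midway
    between the projected class means (an assumed, common choice).
    """
    mu_pos = X_pos.mean(axis=0)
    mu_neg = X_neg.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    S_w = (np.cov(X_pos, rowvar=False) * (len(X_pos) - 1)
           + np.cov(X_neg, rowvar=False) * (len(X_neg) - 1))
    # Small ridge term for numerical stability (assumption, not from the paper).
    S_w = S_w + 1e-8 * np.eye(S_w.shape[0])
    # Fisher direction: w is proportional to S_w^{-1} (mu_pos - mu_neg).
    w = np.linalg.solve(S_w, mu_pos - mu_neg)
    c = 0.5 * (w @ mu_pos + w @ mu_neg)
    return w, c


def fisher_lda_predict(X, w, c):
    """Return True where a sample falls on the positive-class side."""
    return X @ w > c


# Synthetic stand-in data: 36 features per sequence, two classes.
rng = np.random.default_rng(0)
X_coding = rng.normal(loc=1.0, scale=1.0, size=(200, 36))
X_noncoding = rng.normal(loc=-1.0, scale=1.0, size=(200, 36))

w, c = fisher_lda_fit(X_coding, X_noncoding)

# Training accuracy on the synthetic data (not comparable to the paper's figures).
correct = (fisher_lda_predict(X_coding, w, c).sum()
           + (~fisher_lda_predict(X_noncoding, w, c)).sum())
accuracy = correct / (len(X_coding) + len(X_noncoding))
print(f"training accuracy on synthetic data: {accuracy:.4f}")
```

In a real evaluation, the rows of `X_coding` and `X_noncoding` would be replaced by the descriptor vectors computed from annotated sequences, and accuracy would be reported separately on held-out test sequences, as the abstract does.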
  • Keywords
    DNA; encoding; image recognition; microorganisms; DNA sequence analysis; Prokaryote complete genome; Z curve methods; accurate recognition contributes; available bacterial genomes; coding primary sequences; distinguishing coding; fishers linear discriminant algorithm; global descriptor method; non coding primary sequence; non coding sequences; noncoding sequences; recognition coding sequences; relative satisfactory discriminant accuracies; relevant research application; selected data set; Archaea; Bioinformatics; Genomics; Linear discriminant analysis; Mathematics; Proteins; Sequences; Testing; coding/noncoding DNA; global descriptor; prokaryote genome
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery, 2009. FSKD '09. Sixth International Conference on
  • Conference_Location
    Tianjin
  • Print_ISBN
    978-0-7695-3735-1
  • Type

    conf

  • DOI
    10.1109/FSKD.2009.248
  • Filename
    5360662