• DocumentCode
    1987897
  • Title

    On gene prediction by cross-species comparative sequence analysis

  • Author

    Chen, Rong ; Ali, Hesham

  • Author_Institution
    Dept. of Comput. Sci., Nebraska Univ., Omaha, NE, USA
  • fYear
    2003
  • fDate
    11-14 Aug. 2003
  • Firstpage
    446
  • Lastpage
    447
  • Abstract
    Sequencing of large fragments of genomic DNA makes it possible to compare genomic sequences in order to identify protein-coding regions. We have conducted a comparative analysis of homologous genomic sequences from organisms at different evolutionary distances and determined the degree of conservation of the noncoding regions between closely related organisms. In contrast, more distant organisms show much less intron similarity but also less conservation of the exon structures. Based on this finding and on training with data sets, we propose a model by which coding sequences can be identified by comparing sequences of multiple species, both close and appropriately distant. The reliability of the proposed method is evaluated in terms of sensitivity and specificity, and the results are compared to those obtained by other popular gene prediction programs. Provided that sequences can be found from other species at appropriate evolutionary distances, this approach could be applied to newly sequenced organisms for which no species-dependent statistical models are available.
  • Keywords
    DNA; cellular biophysics; evolutionary computation; genetics; molecular biophysics; physiological models; proteins; cross-species comparative sequence analysis; degree of conservation; evolutionary distances; exon structure; gene prediction; genomic DNA; homologous genomic sequences; intron; noncoding regions; protein-coding region identification; species-dependent statistical models; Bioinformatics; DNA; Genomics; Humans; Mice; Organisms; Proteins; Sensitivity and specificity; Sequences; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics Conference, 2003. CSB 2003. Proceedings of the 2003 IEEE
  • Print_ISBN
    0-7695-2000-6
  • Type

    conf

  • DOI
    10.1109/CSB.2003.1227366
  • Filename
    1227366