DocumentCode :
1622950
Title :
Training neural networks to identify coding regions in genomic DNA
Author :
Roberts, L. ; Steele, N. ; Reeves, C. ; King, G.J.
Author_Institution :
Coventry Univ., UK
fYear :
1995
Firstpage :
399
Lastpage :
403
Abstract :
The four nitrogenous bases of DNA spell out the recipes from which proteins are made. A gene typically contains five thousand or so bases, but often only a small percentage of these are protein coding. Computer-based prediction systems are increasingly relied upon as submissions to the major genetic databases grow exponentially. Several systems exist to locate coding regions (exons) and noncoding regions (introns) within genomic DNA; the common models used are neural networks and Markov chains (M. Borodovsky and J. McIninch (1993), A. Krogh et al. (1994)). One of the most successful programs is GRAIL. Currently, two versions of GRAIL are available: GRAIL-I (E. Uberbacher and R. Mural (1991)) and GRAIL-II (Y. Xu et al. (1994)). In GRAIL-I, a neural network receives its inputs from seven statistical measures taken on a 99-base window. Performance is improved in GRAIL-II by the addition of variable-length windows, neural nets trained to locate intron/exon boundaries, and a number of steps designed to evaluate candidate exons and eliminate improbable ones. Both versions of GRAIL predict coding regions in human DNA. A simulation of GRAIL-I was carried out with the goal of improving classification performance without resorting to the additional measures used in GRAIL-II. The intention was then to supplement the resulting module with modules based on physiochemical measures of DNA (such as melting profiles, twist and wedge angles) to enable precise exon prediction in plant sequences.
Keywords :
DNA; biology computing; genetics; learning (artificial intelligence); neural nets; proteins; GRAIL; GRAIL-I; GRAIL-II; Markov chains; classification performance; coding region identification; computer based prediction systems; exons; genomic DNA; human DNA; intron/exon boundaries; introns; neural network training; nitrogenous bases; physiochemical measures; plant sequences; precise exon prediction; statistical measures; variable length windows;
fLanguage :
English
Publisher :
IET
Conference_Titel :
Artificial Neural Networks, 1995., Fourth International Conference on
Conference_Location :
Cambridge
Print_ISBN :
0-85296-641-5
Type :
conf
DOI :
10.1049/cp:19950589
Filename :
497852