Title :
Predicting coding region candidates in the DNA sequence based on visualization without training
Author :
Chen, Bo ; Ji, Ping
Author_Institution :
Dept. of Ind. & Syst. Eng., Hong Kong Polytech. Univ., Hong Kong, China
Abstract :
Identifying protein-coding regions in DNA sequences is an active problem in computational biology. Many existing methods predict coding regions with very high accuracy, but only after a training process. This dependence on training may reduce their adaptability, particularly for new sequences from unknown organisms with little or no training data. In this paper, we first present the Self-Adaptive Spectral Rotation (SASR) approach, which was introduced in a previous work published in Nucleic Acids Research. This approach is used to visualize the Triplet Periodicity (TP) property, a simple and universal coding-related property. We then apply a segmentation technique to analyze the visualization computationally and provide a numerical prediction of the coding region candidates in the DNA sequence. Because the approach requires no training, it can be applied before any additional information is available, which is especially helpful when dealing with new sequences from unknown organisms. Hence, it could be an efficient tool for coding region prediction in early-stage studies.
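The triplet periodicity property mentioned in the abstract can be illustrated with a minimal sketch. This is not the SASR approach itself, whose details the abstract does not give; it is the generic sliding-window Fourier measure of period-3 signal, swapped in for illustration. The function name `period3_power` and the `window` and `step` defaults are assumptions chosen for this example.

```python
import cmath


def period3_power(seq, window=351, step=3):
    """Return (start, power) pairs with the spectral power at frequency 1/3.

    Illustrative only: a generic period-3 measure, not the SASR method.
    `window` and `step` are hypothetical defaults.
    """
    seq = seq.upper()
    if not seq:
        return []
    # The complex factors for frequency 1/3 repeat every three positions.
    w = [cmath.exp(-2j * cmath.pi * k / 3) for k in range(3)]
    out = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        power = 0.0
        for base in "ACGT":
            # Project the binary indicator sequence of this base onto
            # the frequency-1/3 component and accumulate its power.
            coeff = sum(w[i % 3] for i, b in enumerate(chunk) if b == base)
            power += abs(coeff) ** 2
        # Normalize by window length so windows of different size compare.
        out.append((start, power / len(chunk)))
    return out


if __name__ == "__main__":
    periodic = period3_power("ATG" * 100, window=300)
    aperiodic = period3_power("ACGT" * 75, window=300)
    print(periodic[0][1] > aperiodic[0][1])
```

Windows with high power are more likely to overlap coding regions; one possible follow-up, again an assumption rather than the paper's segmentation technique, is to threshold the power profile and merge adjacent high-scoring windows into candidates.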
Keywords :
DNA; biology computing; data visualisation; proteins; DNA sequence; computational biology; protein coding region candidate prediction; self-adaptive spectral rotation approach; training dependence; triplet periodicity property visualization; Accuracy; Encoding; Hidden Markov models; Organisms; Predictive models; Training
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2011 IEEE Symposium on
Conference_Location :
Paris
Print_ISBN :
978-1-4244-9896-3
DOI :
10.1109/CIBCB.2011.5948454