Title of article :
Chemometrics for QSAR with low sequence homology: Mycobacterial promoter sequences recognition with 2D-RNA entropies
Author/Authors :
Elena and González-Díaz, Humberto and Pérez-Bello, Alcides and Cruz-Monteagudo, Maykel and González-Díaz, Yenny and Santana, Lourdes and Uriarte, Eugenio
Issue Information :
Biannual journal with serial issue number, year 2007
Abstract :
Predicting mycobacterial promoter sequences for protein synthesis is important in the study of protein metabolism regulation. This goal is, however, considered a challenging computational biology task because of low inter-sequence homology. Consequently, a previous work based only on DNA sequence had to use a large input parameter set and a multilayered feed-forward ANN architecture trained with the error back-propagation algorithm to reach an overall accuracy of up to 97% [Kalate et al., 2003. Comput. Biol. Chem. 27, 555–564]. One could therefore expect that a notably simpler model may be derived using parameters based on non-linear structural information. In the present work, a method based on molecular folding negentropies (Θk) is introduced to predict, for the first time, mycobacterial promoter sequences (mps) from the corresponding RNA secondary structure. The best QSAR equation found was the classification function mps = 4.921 × 0ΘM − 1.205, which recognised 126/135 mps (93.3%) and 100% of 245 control sequences (cs). The model showed a very high Matthews correlation coefficient, C = 0.949. Both average overall accuracy and predictability were 97.6%. Additionally, several machine learning algorithms were applied to reaffirm the validity of the LDA model from the chemometrics point of view. This linear model, with only one parameter (0ΘM), may be considered by far the simplest reported to date, with no loss of accuracy (97%) relative to the model of Kalate et al.
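The figures quoted in the abstract can be cross-checked from the reported counts alone. The following is a minimal Python sketch, not the authors' code: it rebuilds the confusion matrix from 126/135 recognised mps and 245/245 recognised cs, then recomputes the overall accuracy and the coefficient C. The function names `lda_score` and `is_promoter` are hypothetical, and the decision rule (a score above zero means promoter) is an assumption, since the abstract gives the classification function but not its cutoff.

```python
from math import sqrt

# Counts taken from the abstract: 126 of 135 promoter sequences (mps)
# and all 245 control sequences (cs) were classified correctly.
TP, FN = 126, 135 - 126
TN, FP = 245, 0


def lda_score(theta_0m: float) -> float:
    """Classification function from the abstract: 4.921 * theta - 1.205."""
    return 4.921 * theta_0m - 1.205


def is_promoter(theta_0m: float) -> bool:
    # Assumption: a positive score is read as "promoter"; the abstract
    # does not state the cutoff that was actually used.
    return lda_score(theta_0m) > 0.0


sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)
accuracy = (TP + TN) / (TP + FN + TN + FP)
# Matthews correlation coefficient for a 2x2 confusion matrix.
mcc = (TP * TN - FP * FN) / sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN)
)

print(f"sensitivity = {sensitivity:.3f}")  # 0.933
print(f"specificity = {specificity:.3f}")  # 1.000
print(f"accuracy    = {accuracy:.3f}")     # 0.976
print(f"C (MCC)     = {mcc:.3f}")          # 0.949
```

The recomputed values agree with the 93.3%, 100%, 97.6% and C = 0.949 stated above. Under the assumed zero cutoff, the decision boundary would sit at 0ΘM = 1.205 / 4.921, roughly 0.245.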
Keywords :
Mycobacterial promoter sequences , RNA secondary structure , Markov models , QSAR , Machine learning algorithms , Information theory , entropy
Journal title :
Chemometrics and Intelligent Laboratory Systems