Title :
CM-HMM: Inter-residue contact and HMM-profiles based enzyme subfamily prediction and structure analysis
Author :
Taewijit, Siriwon ; Waiyamai, Kitsana
Author_Institution :
Dept. of Comput. Eng., Kasetsart Univ., Bangkok, Thailand
Abstract :
Enzyme family prediction is widely used to identify new family members, and it is well known that an enzyme's function is strongly related to its structure. In this work, we propose a novel approach, CM-HMM, for predicting enzyme subfamilies that yields high accuracy even when sequence similarity is below 30%. It also provides descriptive information for 3-dimensional analysis. The method uses information from HMM-profiles and from the residue contact map, a matrix that represents contacts between pairs of residues in the 3-dimensional domain and thereby implies contacts between secondary structures. In addition, binding or catalytic site regions are used to improve accuracy. Consequently, only proteins with a PDB structure can be used to derive the representative features of a protein sequence. This made the task more challenging, because only 11.21% of the standard dataset has PDB structures. Empirical results using a support vector machine show that CM-HMM achieves 73.71% accuracy in a jackknife test and 79.24% accuracy on an independent test dataset. Furthermore, the results can reveal an approximate protein structure based on the contact map: each structure located at a given position of the contact map is directly related to the protein's function. This information can help researchers understand the underlying structure of an unknown enzyme. These results suggest that CM-HMM's features, extracted from the contact map and HMM-profiles, are sufficiently informative for enzyme subfamily classification.
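The residue contact map mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which atoms or distance cutoff were used, so the choice of C-alpha coordinates, the 8 angstrom threshold, the function name `contact_map`, and the toy coordinates below are all assumptions made for illustration.

```python
import math

def contact_map(coords, threshold=8.0):
    """Return a symmetric 0/1 matrix for a list of (x, y, z) residue coordinates.

    Entry [i][j] is 1 when residues i and j lie within `threshold` angstroms.
    The 8.0 default is an assumed cutoff, not a value taken from the paper.
    The diagonal is left at 0 (a residue is not counted as contacting itself).
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

# Hypothetical C-alpha coordinates, not taken from any PDB entry.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cm = contact_map(toy)
for row in cm:
    print(row)
```

In a real pipeline the coordinates would be read from a PDB structure, and features derived from a matrix like this would be combined with HMM-profile features before classification.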
Keywords :
biochemistry; bioinformatics; catalysis; enzymes; feature extraction; pattern classification; proteomics; support vector machines; 3-dimensional analysis; HMM-profiles; approximate protein structure; binding site regions; catalytic site regions; enzyme subfamily classification; enzyme subfamily prediction; enzyme subfamily structure analysis; jackknife test; representative matrix; residue contact map; secondary structures; sequence similarity; support vector machine; Accuracy; Databases; Hidden Markov models; Proteins; Support vector machine classification; inter-residue contact; protein contact map
Conference_Title :
Cognitive Informatics (ICCI), 2010 9th IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-8041-8
DOI :
10.1109/COGINF.2010.5599792