DocumentCode :
2357501
Title :
Order-sensitive clustering for remote homologous protein detection
Author :
Chen, Jin ; Hsu, Wynne ; Lee, Mong Li
Author_Institution :
Sch. of Comput., National Univ. of Singapore, Singapore
fYear :
2003
fDate :
3-5 Nov. 2003
Firstpage :
26
Lastpage :
30
Abstract :
Traditional sequence alignment methods are effective in identifying homologous proteins that are highly similar. However, these approaches do not perform well for remote homologous proteins, that is, proteins whose 3D structures are similar but whose sequences are not. Recent biological research reveals that protein sequences contain residues that determine the 3D structure of proteins. In this work, we investigate incorporating this information to aid in the clustering of protein databases. We capture these residues as patterns with a fixed order among them. First, significant patterns are extracted from the protein sequences. Based on the extracted patterns, we perform sequence mining to derive the order among them. Finally, we adopt a partition-based method to cluster protein sequences using the pattern and order features. Experiments on the COG and SCOP40 datasets show that our approach generates high-quality clusters that are similar to those determined manually by biologists.
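The three-step pipeline in the abstract (extract patterns, derive their order, cluster with a partition-based method) can be illustrated with a minimal sketch. This is not the authors' algorithm, and several substitutions are assumed: frequent fixed-length substrings (k-mers) stand in for the "significant patterns", a "pattern p occurs before pattern q" feature based on first occurrences stands in for the sequence-mining step, and k-medoids on Jaccard distance stands in for the partition-based clustering. All function names and parameters (`k`, `min_support`) are hypothetical.

```python
from itertools import combinations


def frequent_patterns(seqs, k=3, min_support=2):
    """Hypothetical stand-in for pattern extraction: k-mers found in >= min_support sequences."""
    counts = {}
    for s in seqs:
        # Count each k-mer at most once per sequence (support, not frequency).
        for kmer in {s[i:i + k] for i in range(len(s) - k + 1)}:
            counts[kmer] = counts.get(kmer, 0) + 1
    return sorted(p for p, c in counts.items() if c >= min_support)


def features(seq, patterns):
    """Binary features: which patterns occur, plus which ordered pairs (p before q) occur."""
    first = {p: seq.find(p) for p in patterns if p in seq}
    feats = {("has", p) for p in first}
    for p, q in combinations(sorted(first), 2):
        # Assumed order feature: compare first-occurrence positions.
        if first[p] < first[q]:
            feats.add(("order", p, q))
        elif first[q] < first[p]:
            feats.add(("order", q, p))
    return frozenset(feats)


def jaccard_distance(a, b):
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def k_medoids(items, k, max_iter=20):
    """Simple partition-based clustering: k-medoids with farthest-first initialization."""
    medoids = [0]
    while len(medoids) < k:
        far = max(range(len(items)),
                  key=lambda i: min(jaccard_distance(items[i], items[m]) for m in medoids))
        medoids.append(far)
    labels = []
    for _ in range(max_iter):
        # Assign each item to its nearest medoid.
        labels = [min(range(k), key=lambda c: jaccard_distance(x, items[medoids[c]]))
                  for x in items]
        new = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # New medoid: the member with the smallest total distance to its cluster.
            new.append(min(members, key=lambda i: sum(jaccard_distance(items[i], items[j])
                                                      for j in members))
                       if members else medoids[c])
        if new == medoids:
            break
        medoids = new
    return labels


seqs = ["MKVLAAGIW", "MKVLAAGIF", "GGHHWWPPT", "GGHHWWPPS"]
pats = frequent_patterns(seqs)
labels = k_medoids([features(s, pats) for s in seqs], k=2)
print(labels)  # [0, 0, 1, 1]
```

The toy sequences are invented for illustration; the two sequences sharing the `MKVLAAGI` prefix fall into one cluster and the two sharing `GGHHWWPP` into the other. Including order pairs alongside pattern presence means two sequences containing the same patterns in a different arrangement are pushed apart, which is the intuition behind "order-sensitive" clustering.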
Keywords :
biology computing; data mining; pattern clustering; proteins; scientific information systems; sequences; 3D protein structure; COG datasets; SCOP40 datasets; homologous protein identification; order-sensitive clustering; partition-based method; protein database; protein residues; protein sequence clustering; protein sequence pattern extraction; protein sequences; remote homologous protein detection; sequence alignment; sequence mining; Artificial intelligence; Cells (biology); Clustering algorithms; DNA; Data mining; Protein sequence; Spatial databases;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Tools with Artificial Intelligence, 2003. Proceedings. 15th IEEE International Conference on
ISSN :
1082-3409
Print_ISBN :
0-7695-2038-3
Type :
conf
DOI :
10.1109/TAI.2003.1250166
Filename :
1250166