Title :
An online cluster analysis method for large-scale protein sequences
Author :
Tang, DongMing ; Zhu, Qingxin ; Zhang, YueFei ; Zhang, Jiang
Author_Institution :
Sch. of Comput. Sci. & Eng., Univ. of Electron. Sci. & Technol. of China, Chengdu, China
Abstract :
As modern high-throughput sequencing technologies continue to improve, biomedical databases hold an overwhelming number of un-annotated protein sequences. Clustering protein sequences into homologous groups can help annotate uncharacterized protein sequences. In this paper, we introduce OnlineCAPS, an online cluster analysis method for large-scale protein sequences that combines online clustering algorithms with an alignment-free similarity measure. OnlineCAPS has several advantages: its memory requirements and computation cost are very low; it is fast and can extract clusters from a large-scale set of protein sequences; and it can be deployed on a web server, where it performs clustering while the sequence dataset is being uploaded. The experimental results illustrate the efficiency of the proposed method.
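The abstract does not specify which online clustering algorithm or which alignment-free measure OnlineCAPS uses. As a rough illustration of the general idea only, the sketch below assumes a single-pass "leader" clustering scheme and a Jaccard similarity over k-mer sets. The function names, the k-mer size, and the threshold are all hypothetical choices and are not taken from the paper.

```python
from typing import Iterable, List, Set


def kmer_set(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence (alignment-free profile)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets; an assumed stand-in measure."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def online_cluster(
    seqs: Iterable[str], threshold: float = 0.5, k: int = 3
) -> List[List[str]]:
    """Single-pass leader clustering: each sequence is seen once, as it arrives.

    Only one k-mer profile per cluster (its first member) is kept for
    comparison, so the comparison state grows with the number of clusters.
    """
    representatives: List[Set[str]] = []
    clusters: List[List[str]] = []
    for seq in seqs:
        profile = kmer_set(seq, k)
        best_idx, best_sim = -1, 0.0
        for idx, rep in enumerate(representatives):
            sim = jaccard(profile, rep)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            # Similar enough to an existing cluster: join it.
            clusters[best_idx].append(seq)
        else:
            # Otherwise the sequence founds a new cluster.
            representatives.append(profile)
            clusters.append([seq])
    return clusters


if __name__ == "__main__":
    # Toy sequences, made up for illustration.
    for group in online_cluster(["MKVLAAGIVG", "MKVLAAGIVA", "PPQWERTYHH"]):
        print(group)
```

Because `online_cluster` accepts any iterable, sequences can be fed in as they arrive (for example, while a file is still being read), which is the property the abstract highlights for web deployment.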
Keywords :
database management systems; medical computing; proteins; OnlineCAPS; biomedical databases; large-scale protein sequences; online cluster analysis method; web server; Algorithm design and analysis; Biomedical engineering; Biomedical measurements; Clustering algorithms; Computer science; Databases; Large-scale systems; Pattern analysis; Proteins; Sequences; Clustering; Online clustering; Pattern recognition; Protein sequences; Sequences analysis;
Conference_Title :
BioMedical Information Engineering, 2009. FBIE 2009. International Conference on Future
Conference_Location :
Sanya
Print_ISBN :
978-1-4244-4690-2
Electronic_ISBN :
978-1-4244-4692-6
DOI :
10.1109/FBIE.2009.5405808