Title :
Mining Clusters of Sequences Using Extended Sequence Element-Based Similarity Measure
Author_Institution :
Kyonggi Inst. of Technol., Siheung
Abstract :
Computing technologies have enabled the collection of large amounts of complex data in many fields, and the volume of commercial and scientific data has grown enormously. Such datasets consist of data that are inherently sequential. In this paper, we study how to cluster these sequence datasets. We propose an extended sequence element-based measure of similarity and an effective hierarchical clustering algorithm. Experiments on a splice dataset show that our approach produces higher-quality clusters than traditional clustering algorithms.
Keywords :
data mining; database management systems; complex data; extended sequence element-based similarity measure; hierarchical clustering algorithm; mining clusters; splice dataset; Bioinformatics; Clustering algorithms; Clustering methods; Computer industry; Mining industry; Proteins; Technology management; Transaction databases; Web mining
Conference_Title :
Innovative Computing, Information and Control, 2007. ICICIC '07. Second International Conference on
Conference_Location :
Kumamoto
Print_ISBN :
0-7695-2882-1
DOI :
10.1109/ICICIC.2007.387