Title :
Centre-based clustering for Y-Short Tandem Repeats (Y-STR) as numerical and categorical data
Author :
Seman, Ali ; Bakar, Z.A. ; Sapawi, Azizian Mohd.
Author_Institution :
Centre for Comput. Sci. Studies, Univ. Teknol. MARA (UiTM), Shah Alam, Malaysia
Abstract :
Centre-based clustering is one of the most widely applicable methods for partitioning objects into homogeneous groups. This paper applies two centre-based clustering algorithms, K-Means and K-Modes, to Y-STR data and evaluates the clustering results. The main goal is to compare clustering accuracy when the Y-STR data are treated as two different data types: numerical and categorical. The results show that Y-STR data are better suited to categorical treatment. The accuracy for Y-STR data treated as categorical is 49%, whereas treating the data as numerical gives only a 26% chance of producing a good clustering result. However, clustering the numerical data takes much less time than clustering the categorical data.
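The difference between the two treatments can be illustrated with a minimal sketch. It assumes Euclidean distance with mean centres for K-Means and simple-matching dissimilarity with mode centres for K-Modes, which are the usual choices for these algorithms; the abstract does not state which measures the authors used. The haplotype values and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical Y-STR haplotypes: each value is an allele repeat count at one
# marker. These numbers are made up for illustration, not taken from the paper.
haplotypes = [
    (14, 13, 30, 24),
    (14, 12, 28, 24),
    (15, 13, 30, 23),
]


def euclidean(a, b):
    """Numerical view (K-Means style): the size of each difference matters."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def simple_matching(a, b):
    """Categorical view (K-Modes style): count the markers that differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_centre(rows):
    """Centre as the per-marker mean; may not be a valid repeat count."""
    return tuple(sum(col) / len(col) for col in zip(*rows))


def mode_centre(rows):
    """Centre as the per-marker most frequent value; always an observed allele."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


if __name__ == "__main__":
    a, b = haplotypes[0], haplotypes[1]
    print("Euclidean distance:", euclidean(a, b))
    print("Simple-matching dissimilarity:", simple_matching(a, b))
    print("Mean centre:", mean_centre(haplotypes))
    print("Mode centre:", mode_centre(haplotypes))
```

In this sketch the mean centre contains fractional repeat counts, while the mode centre is built only from observed allele values. That is one plausible reason a categorical treatment could suit Y-STR data, though the abstract itself reports only the accuracy and timing comparison.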
Keywords :
pattern clustering; Y-short tandem repeats; categorical data; centre-based clustering; k-means clustering; k-modes clustering; numerical data; Bioinformatics; Clustering algorithms; Clustering methods; Computer science; DNA; Helium; Partitioning algorithms; Sequences; Y-STR
Conference_Title :
Information Retrieval & Knowledge Management, (CAMP), 2010 International Conference on
Conference_Location :
Shah Alam, Selangor
Print_ISBN :
978-1-4244-5650-5
DOI :
10.1109/INFRKM.2010.5466953