Gene Expression Analysis Using Clustering

Author

Dhiraj, Kumar; Rath, Santanu Kumar; Pandey, Abhishek

Author_Institution

Dept. of Comput. Sci. & Eng., Nat. Inst. of Technol. Rourkela, Rourkela, India

fYear

2009

fDate

11-13 June 2009

Firstpage

1

Lastpage

4

Abstract

Data mining has become an important topic in the effective analysis of gene expression data because of its wide application in the biomedical industry. In this paper, the k-means clustering algorithm is studied extensively for gene expression analysis. To demonstrate the effectiveness of the k-means algorithm on a wide variety of data sets, we have chosen two pattern recognition data sets and thirteen microarray data sets with both overlapping and non-overlapping cluster boundaries, where the number of features/genes ranges from 4 to 7129 and the number of samples ranges from 32 to 683. The number of clusters ranges from two to eleven. We use the clustering error rate (or, equivalently, the clustering accuracy) as the evaluation metric to measure the performance of the k-means algorithm.
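
As an illustration of the evaluation described in the abstract, the following is a minimal sketch of k-means followed by a clustering error rate. It is not the authors' implementation: the function names (kmeans, clustering_error_rate), the random initialisation from data points, and the majority-vote mapping of clusters to known classes are all assumptions made here for illustration, and the abstract does not say how the paper matches clusters to classes.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (illustrative sketch); returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialisation: pick k distinct samples as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return labels, centroids


def clustering_error_rate(true_labels, cluster_labels):
    """Error rate under an assumed majority-vote mapping:
    each cluster is credited with its most frequent true class."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    correct = 0
    for c in np.unique(cluster_labels):
        members = true_labels[cluster_labels == c]
        _, counts = np.unique(members, return_counts=True)
        correct += counts.max()
    return 1.0 - correct / len(true_labels)


# Toy data (hypothetical, not from the paper): two well-separated 1-D groups.
X_toy = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
labels_toy, _ = kmeans(X_toy, k=2)
print("error rate:", clustering_error_rate(y_toy, labels_toy))
print("accuracy:", 1.0 - clustering_error_rate(y_toy, labels_toy))
```

Clustering accuracy is simply one minus the error rate, which is why the two can be reported interchangeably. A majority-vote mapping can let several clusters claim the same class, so a one-to-one assignment between clusters and classes may be preferable when the number of clusters equals the number of classes.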

Keywords

data mining; genetics; lab-on-a-chip; medical computing; pattern clustering; biomedical industry; gene expression analysis; k-means clustering algorithm; microarray data sets; nonoverlapping cluster boundaries; overlapping cluster boundaries; pattern recognition; Breast; Cancer; Clustering algorithms; Clustering methods; Fungi; Gene expression; Iris; Lungs; Partitioning algorithms; Pattern recognition

fLanguage

English

Publisher

IEEE

Conference_Titel

Bioinformatics and Biomedical Engineering, 2009. ICBBE 2009. 3rd International Conference on

Conference_Location

Beijing

Print_ISBN

978-1-4244-2901-1

Electronic_ISBN

978-1-4244-2902-8

Type

conf

DOI

10.1109/ICBBE.2009.5162877

Filename

5162877