Title :
Gene-expression data semi-supervised clustering in Multi-Objective optimization framework
Author :
Alok, Abhay Kumar ; Saha, Simanto ; Ekbal, Asif
Author_Institution :
Comput. Sci. Eng., Indian Inst. of Technol., Patna, Patna, India
Abstract :
Studying the patterns hidden in gene expression data helps in understanding the functionality of genes. However, because of the large number of genes and the complexity of biological networks, the resulting data, which often contain millions of measurements, are hard to study. Clustering techniques are generally used as a first step in studying gene expression data, to determine natural structures and capture interesting patterns in the given data. In this paper, the problem of gene expression data clustering is formulated as a semi-supervised classification problem, and semi-supervised clustering is modelled as a multiobjective optimization problem. Five objective functions are optimized simultaneously by AMOSA. The first four quantify unsupervised properties of the clusters, such as total symmetry, compactness, and separability, and the last one captures the supervised information. To generate the supervised information, the Fuzzy C-means algorithm is invoked on the data sets, and label information is extracted based on the highest membership values of the data points with respect to the different clusters. In each case, the class labels of only 10% of the data points, selected at random, act as the supervised information for semi-supervised clustering. The effectiveness of the proposed semi-supervised clustering technique is demonstrated on three publicly available benchmark gene expression data sets. Results are compared with existing techniques for gene expression data clustering.
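The label-generation step described in the abstract (run Fuzzy C-means, assign each point to its highest-membership cluster, then keep a random 10% of those labels) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `fuzzy_c_means` and `sample_supervision`, the fuzzifier `m = 2.0`, the iteration limits, the random seeds, and the synthetic two-cluster data are all assumptions, since the abstract does not state these settings.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy C-means (hypothetical helper).

    Returns (centers, U), where U has shape (n_points, n_clusters)
    and each row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized per point.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def sample_supervision(U, fraction=0.10, seed=0):
    """Label each point by its highest membership, then keep a random fraction."""
    rng = np.random.default_rng(seed)
    labels = U.argmax(axis=1)
    n_labeled = max(1, int(round(fraction * len(labels))))
    idx = rng.choice(len(labels), size=n_labeled, replace=False)
    return idx, labels[idx]


# Synthetic stand-in for an expression matrix (assumption): 100 points, 2 groups.
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.normal(0.0, 0.5, size=(50, 4)),
    data_rng.normal(5.0, 0.5, size=(50, 4)),
])
centers, U = fuzzy_c_means(X, n_clusters=2)
labeled_idx, labeled_y = sample_supervision(U, fraction=0.10)
```

Here `labeled_idx` and `labeled_y` would play the role of the supervised information; how the fifth objective function consumes them is not specified in the abstract and is not modelled in this sketch.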
Keywords :
bioinformatics; fuzzy set theory; optimisation; pattern classification; pattern clustering; AMOSA; fuzzy C-means algorithm; gene expression data clustering; multiobjective optimization framework; objective functions; semisupervised classification problem; semisupervised clustering; Clustering algorithms; Equations; Gene expression; Indexes; Linear programming; Mathematical model; Optimization; AMOSA; ARI-index; FCM index; Fuzzy C-means; I-index; Multiobjective optimization; Semi-supervised clustering; Silhouette-index; Sym-index; XB-index;
Conference_Title :
Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on
Conference_Location :
New Delhi
Print_ISBN :
978-1-4799-3078-4
DOI :
10.1109/ICACCI.2014.6968270