DocumentCode :
703719
Title :
Simultaneous feature selection and semi-supervised clustering for gene-expression data
Author :
Alok, Abhay Kumar ; Saha, Sriparna ; Ekbal, Asif ; Kanekar, Neha
Author_Institution :
Comput. Sci. Eng., Indian Inst. of Technol. Patna, Patna, India
fYear :
2015
fDate :
19-21 Feb. 2015
Firstpage :
1
Lastpage :
5
Abstract :
This paper develops a new multiobjective optimization based technique for simultaneous feature selection and semi-supervised clustering, and applies it to the problem of classifying gene expression data. AMOSA, a modern simulated annealing based multiobjective optimization technique, serves as the underlying optimization methodology. Features and cluster centers are encoded together in the form of a string. Given the selected features and the cluster centers, genes are assigned to clusters using the point symmetry distance. AMOSA simultaneously optimizes four objective functions to obtain the appropriate partitioning. The first two are cluster validity indices based on unsupervised properties: the symmetry distance based Sym-index and the Euclidean distance based XB-index. The third is a cluster validity index based on supervised information, the Minkowski index (MS index), and the fourth is a function counting the number of features. To generate the supervised information, the fuzzy C-means clustering technique is first applied to the given gene expression data set. Then, based on the highest membership values of the data points to their respective clusters, 10% of the data points are randomly chosen along with their class labels for measuring the external validity index, the MS index. The proposed technique is applied to several publicly available gene expression data sets, and the results are compared with those of existing gene expression data clustering techniques.
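As an illustration only, three of the four objectives named in the abstract could be computed roughly as follows. This is a minimal sketch and not the authors' implementation: the function names, the boolean feature mask, and the crisp (hard-assignment) form of the XB-index are assumptions, and the Sym-index is omitted because it requires the point symmetry distance, which the abstract does not define.

```python
import numpy as np

def xb_index(X, centers, labels, mask):
    """Crisp Xie-Beni index on the selected features (lower is better).

    `mask` is a hypothetical boolean vector marking the selected features.
    """
    Xs = X[:, mask]
    Cs = centers[:, mask]
    # Compactness: sum of squared distances from each point to its center.
    compactness = np.sum((Xs - Cs[labels]) ** 2)
    # Separation: smallest squared distance between two distinct centers.
    d2 = np.sum((Cs[:, None, :] - Cs[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    return compactness / (len(Xs) * d2.min())

def minkowski_score(true_labels, pred_labels):
    """Minkowski Score sqrt((n01 + n10) / (n11 + n10)); 0 is a perfect match.

    Assumes at least one pair of points shares a true label.
    """
    t = np.asarray(true_labels)
    p = np.asarray(pred_labels)
    iu = np.triu_indices(len(t), k=1)          # each unordered pair once
    same_t = (t[:, None] == t[None, :])[iu]
    same_p = (p[:, None] == p[None, :])[iu]
    n11 = np.sum(same_t & same_p)              # together in both
    n10 = np.sum(same_t & ~same_p)             # together in truth only
    n01 = np.sum(~same_t & same_p)             # together in prediction only
    return np.sqrt((n01 + n10) / (n11 + n10))

def feature_count(mask):
    """Number of selected features."""
    return int(np.sum(mask))
```

In a semi-supervised setting such as the one described, `minkowski_score` would be evaluated only on the labeled subset of points (the 10% sample), while `xb_index` uses all points.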
Keywords :
bioinformatics; feature selection; genetics; learning (artificial intelligence); pattern classification; pattern clustering; simulated annealing; AMOSA; Euclidean distance; MS Index; Minkowski index; Sym-index; XB-index; background optimization methodology; cluster validity indices; fuzzy C-mean clustering; gene-expression data; objective functions; point symmetry distance; semisupervised clustering; simulated annealing; simultaneous feature selection; supervised information; symmetry distance; Clustering algorithms; Euclidean distance; Gene expression; Indexes; Linear programming; Simulated annealing; AMOSA; Multiobjective optimization; feature selection; gene expression data clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Signal Processing, Informatics, Communication and Energy Systems (SPICES), 2015 IEEE International Conference on
Conference_Location :
Kozhikode
Type :
conf
DOI :
10.1109/SPICES.2015.7091467
Filename :
7091467