• DocumentCode
    165952
  • Title

    Gene-expression data semi-supervised clustering in Multi-Objective optimization framework

  • Author

    Alok, Abhay Kumar ; Saha, Simanto ; Ekbal, Asif

  • Author_Institution
    Comput. Sci. Eng., Indian Inst. of Technol., Patna, Patna, India
  • fYear
    2014
  • fDate
    24-27 Sept. 2014
  • Firstpage
    1081
  • Lastpage
    1086
  • Abstract
    Studying the patterns hidden in gene expression data helps in understanding the functionality of genes. However, because of the large number of genes and the complexity of biological networks, the resulting data, which often contain millions of measurements, are hard to study. Clustering techniques are generally used as a first step in studying gene expression data, to determine natural structures and capture interesting patterns in the given data. In this paper, the problem of gene expression data clustering is formulated as a semi-supervised classification problem, and semi-supervised clustering is modelled as a multiobjective optimization problem. Five objective functions are used and simultaneously optimized by AMOSA. The first four objective functions quantify unsupervised properties of the clusters, such as total symmetry, compactness and separability, and the last one captures the supervised information. To generate the supervised information, the Fuzzy C-means algorithm is invoked on the data sets, and labeled information is extracted based on the highest membership values of the data points with respect to the different clusters. In each case, the class labels of only 10% of the data points are randomly selected to act as the supervised information for semi-supervised clustering. The effectiveness of the proposed semi-supervised clustering technique is demonstrated on three publicly available benchmark gene expression data sets. Results are compared with existing techniques for gene expression data clustering.
  • Keywords
    bioinformatics; fuzzy set theory; optimisation; pattern classification; pattern clustering; AMOSA; fuzzy C-means algorithm; gene expression data clustering; multiobjective optimization framework; objective functions; semisupervised classification problem; semisupervised clustering; Clustering algorithms; Equations; Gene expression; Indexes; Linear programming; Mathematical model; Optimization; AMOSA; ARI-index; FCM index; Fuzzy C-means; I-index; Multiobjective optimization; Semi-supervised clustering; Silhouette-index; Sym-index; XB-index;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advances in Computing, Communications and Informatics (ICACCI), 2014 International Conference on
  • Conference_Location
    New Delhi
  • Print_ISBN
    978-1-4799-3078-4
  • Type

    conf

  • DOI
    10.1109/ICACCI.2014.6968270
  • Filename
    6968270
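
The label-generation step described in the abstract (run Fuzzy C-means, assign each point to its highest-membership cluster, then randomly keep 10% of those labels as supervision) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `fuzzy_c_means` and `sample_supervision`, the fuzzifier `m=2.0`, the Euclidean distance, and the random initialisation are all choices made here, and the five-objective AMOSA optimization itself is not shown.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Basic fuzzy C-means (assumed settings); returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so that each row sums to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        U.append([u / total for u in row])
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership**m.
        centers = []
        for j in range(n_clusters):
            w = [U[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update from inverse distances to the centers.
        shift = 0.0
        for i in range(n):
            inv = [
                max(math.dist(points[i], c), 1e-12) ** (-2.0 / (m - 1.0))
                for c in centers
            ]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, U[i])))
            U[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, U


def sample_supervision(U, fraction=0.10, seed=0):
    """Label each point by its highest membership, then keep a random fraction."""
    rng = random.Random(seed)
    labels = [max(range(len(row)), key=row.__getitem__) for row in U]
    k = max(1, round(fraction * len(U)))
    chosen = sorted(rng.sample(range(len(U)), k))
    # Map: index of a selected point -> its FCM-derived cluster label.
    return {i: labels[i] for i in chosen}
```

Calling `sample_supervision(U)` on the membership matrix returned by `fuzzy_c_means` yields a small dictionary of point indices and labels; in a setup like the one the abstract describes, such a dictionary would be the input to the objective that captures the supervised information.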