• DocumentCode
    863552
  • Title
    f-Information Measures for Efficient Selection of Discriminative Genes From Microarray Data
  • Author
    Maji, Pradipta
  • Author_Institution
    Machine Intell. Unit, Indian Stat. Inst., Kolkata
  • Volume
    56
  • Issue
    4
  • fYear
    2009
  • fDate
    4/1/2009 12:00:00 AM
  • Firstpage
    1063
  • Lastpage
    1069
  • Abstract
    Among the large number of genes present in microarray gene expression data, only a small fraction is effective for performing a certain diagnostic test. In this regard, mutual information has been shown to be successful for selecting a set of relevant and nonredundant genes from microarray data. However, information theory offers many more measures, such as the f-information measures, that may be suitable for selecting genes from microarray gene expression data. This paper presents different f-information measures as evaluation criteria for the gene selection problem. To compute the gene-gene redundancy (respectively, gene-class relevance), these information measures calculate the divergence of the joint distribution of two genes' expression values (respectively, the expression values of a gene and the class labels of samples) from the joint distribution obtained when the two genes (respectively, the gene and the class label) are considered completely independent. The performance of the different f-information measures is compared with that of mutual information in terms of the predictive accuracy of the naive Bayes classifier, the K-nearest neighbor rule, and the support vector machine. An important finding is that some f-information measures are shown to be effective for selecting relevant and nonredundant genes from microarray data. The effectiveness of the different f-information measures, along with a comparison with mutual information, is demonstrated on breast cancer, leukemia, and colon cancer datasets. While some f-information measures provide 100% prediction accuracy for all three microarray datasets, mutual information attains this accuracy only for the breast cancer dataset, reaching 98.6% and 93.6% for the leukemia and colon cancer datasets, respectively.
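The abstract describes each measure as the divergence of a joint distribution from the product of its marginals. The Python sketch below illustrates that idea under stated assumptions; it is not the paper's implementation. It assumes expression values have already been discretized into a few integer levels (the discretization scheme is not given in this record), and it shows only two generator functions: f(t) = t log2 t, which yields mutual information, and f(t) = |t - 1|, which yields V-information. The specific f-information measures evaluated in the paper are not listed here. The names `f_information`, `select_genes`, `f_mutual_information` and `f_v_information` are hypothetical, and the greedy "relevance minus mean redundancy" objective is an assumed max-relevance/min-redundancy style criterion that may differ from the paper's.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Assumption: gene expression values are pre-discretized integer levels.
# Generator function f of an f-divergence (f(1) = 0).
FGen = Callable[[float], float]


def f_mutual_information(t: float) -> float:
    """f(t) = t*log2(t): the f-information becomes Shannon mutual information (bits)."""
    return t * math.log2(t) if t > 0 else 0.0  # 0*log 0 is taken as 0


def f_v_information(t: float) -> float:
    """f(t) = |t - 1|: gives V-information, the sum of |p(a,b) - p(a)p(b)|."""
    return abs(t - 1.0)


def f_information(x: Sequence[int], y: Sequence[int], f: FGen) -> float:
    """Divergence of the empirical joint distribution of (x, y) from the
    product of its marginals: sum_{a,b} p(a)p(b) * f(p(a,b) / (p(a)p(b)))."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    total = 0.0
    for a, count_a in px.items():
        for b, count_b in py.items():
            q = (count_a / n) * (count_b / n)  # joint probability if independent
            p = pxy.get((a, b), 0) / n         # observed joint probability
            total += q * f(p / q)
    return total


def select_genes(genes: Dict[str, Sequence[int]], labels: Sequence[int],
                 k: int, f: FGen) -> List[str]:
    """Hypothetical greedy selection: take the most relevant gene first, then
    repeatedly add the gene maximizing relevance minus mean redundancy."""
    # Gene-class relevance of every gene.
    relevance = {g: f_information(v, labels, f) for g, v in genes.items()}
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(genes)):
        candidates = [g for g in genes if g not in selected]

        def score(g: str) -> float:
            # Mean gene-gene redundancy with the genes already selected.
            redundancy = sum(f_information(genes[g], genes[s], f) for s in selected)
            return relevance[g] - redundancy / len(selected)

        selected.append(max(candidates, key=score))
    return selected


labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = {
    "g1": [0, 0, 0, 0, 1, 1, 1, 0],
    "g2": [0, 0, 0, 0, 1, 1, 1, 0],  # exact copy of g1: relevant but redundant
    "g3": [1, 1, 0, 0, 1, 1, 1, 1],
}
print(select_genes(genes, labels, 2, f_mutual_information))  # ['g1', 'g3']
```

Because g2 duplicates g1, its redundancy with g1 outweighs its relevance, so the second pick is the less relevant but less redundant g3. Passing `f_v_information` instead changes only the divergence being measured, not the selection loop.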
  • Keywords
    Bayes methods; bioinformatics; cancer; feature extraction; genetics; information theory; molecular biophysics; support vector machines; K-nearest neighbor rule; breast cancer; colon cancer; f-information measure; gene selection problem; gene-class relevance; gene-gene redundancy; information theory; leukemia; microarray gene expression data; mutual information; naive Bayes classifier; support vector machine; Accuracy; Breast cancer; Colon; Distributed computing; Gene expression; Genetic communication; Information theory; Mutual information; Performance evaluation; Testing; Classification; feature selection; gene selection; microarray analysis; mutual information; Gene Expression Profiling; Models, Genetic; Oligonucleotide Array Sequence Analysis;
  • fLanguage
    English
  • Journal_Title
    Biomedical Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9294
  • Type
    jour
  • DOI
    10.1109/TBME.2008.2004502
  • Filename
    4625953