• DocumentCode
    3061060
  • Title
    Feature Extraction from Microarray Expression Data by Integration of Semantic Knowledge
  • Author
    Cho, Young-Rae ; Xu, Xian ; Hwang, Woochang ; Zhang, Aidong
  • Author_Institution
    State Univ. of New York at Buffalo, Buffalo
  • fYear
    2007
  • fDate
    13-15 Dec. 2007
  • Firstpage
    606
  • Lastpage
    611
  • Abstract
    Microarray techniques give biologists a first look at the molecular states of living tissues. Previous studies have shown that it is feasible to build sample classifiers from gene expression profiles. Building an effective sample classifier requires a dimension reduction step, since classic pattern recognition algorithms do not work well in high-dimensional space. In this paper, we present a novel feature extraction algorithm based on the concept of virtual genes, which integrates microarray expression data sets with domain knowledge embedded in Gene Ontology (GO) annotations. We define a semantic similarity that measures the functional association between two genes using the annotations on each GO term. We then identify groups of genes, called virtual genes, that potentially interact with each other to carry out a biological function. The correlation in the gene expression levels of virtual genes can then be used to build a sample classifier. For a colon cancer data set, integrating microarray expression data with GO annotations improves the accuracy of sample classification by more than 10%.
  • Keywords
    biological tissues; cancer; feature extraction; genetic engineering; medical image processing; ontologies (artificial intelligence); pattern classification; semantic Web; virtual reality; biological function; colon cancer; feature extraction; gene expressional profiles; gene ontology annotations; living tissues; microarray expression data; molecular states; pattern classification; pattern recognition algorithms; semantic knowledge integration; virtual genes; Application software; Computer science; Data analysis; Data engineering; Feature extraction; Gene expression; Knowledge engineering; Machine learning; Ontologies; Spatial databases;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on
  • Conference_Location
    Cincinnati, OH
  • Print_ISBN
    978-0-7695-3069-7
  • Type
    conf
  • DOI
    10.1109/ICMLA.2007.10
  • Filename
    4457296
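
The pipeline outlined in the abstract (semantic similarity from GO annotations, grouping genes into virtual genes, then using expression correlation as classifier input) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' method: Jaccard overlap of GO term sets stands in for the paper's semantic similarity, virtual genes are restricted to gene pairs above a hypothetical threshold, and the per-sample feature is the product of z-scored expression levels (whose mean over samples equals the Pearson correlation of the pair). The gene names, GO term labels, and expression values are made-up toy data.

```python
from itertools import combinations
from statistics import mean, pstdev


def jaccard_similarity(terms_a, terms_b):
    """Stand-in semantic similarity: overlap of two GO term sets (assumption)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def virtual_genes(annotations, threshold):
    """Return gene pairs whose similarity reaches the (hypothetical) threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(annotations), 2)
        if jaccard_similarity(annotations[a], annotations[b]) >= threshold
    ]


def zscores(values):
    """Standardize one gene's expression levels across samples."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def extract_features(expression, pairs):
    """One feature per pair and sample: product of z-scored levels (assumption)."""
    z = {gene: zscores(levels) for gene, levels in expression.items()}
    return {(a, b): [x * y for x, y in zip(z[a], z[b])] for a, b in pairs}


# Toy data: placeholder GO term labels and expression levels over four samples.
annotations = {
    "g1": {"GO:A", "GO:B"},
    "g2": {"GO:A", "GO:B", "GO:C"},
    "g3": {"GO:D"},
}
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [5.0, 1.0, 4.0, 2.0],
}

pairs = virtual_genes(annotations, threshold=0.5)
features = extract_features(expression, pairs)
```

The resulting `features` dictionary maps each gene pair to one value per sample, which could be passed as a reduced feature matrix to any standard classifier. A real implementation would replace the Jaccard stand-in with the similarity measure defined in the paper.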