• DocumentCode
    2463073
  • Title

    Identifying Complex Biological Interactions based on Categorical Gene Expression Data

  • Author

    Goertzel, Ben ; Pennachin, Cassio ; de Souza Coelho, Lucio Souza ; Mudado, Mauricio

  • Author_Institution
    Biomind LLC, Rockville
  • fYear
    2006
  • fDate
    0-0 0
  • Firstpage
    1434
  • Lastpage
    1441
  • Abstract
    A novel method, MUTIC (model utilization-based clustering), is described for identifying complex interactions between genes or gene categories based on gene expression data. The method deals with binary categorical data, which consists of a set of gene expression profiles divided into two biologically meaningful categories. It does not require data from multiple time points. Gene expression profiles are represented by feature vectors whose component features are either gene expression values or averaged expression values corresponding to Gene Ontology or Protein Information Resource categories. A supervised learning algorithm (genetic programming) is used to learn an ensemble of classification models that distinguish the two categories based on the feature vectors of their members. Each feature is associated with a "model utilization vector," which has one entry for each high-quality classification model found, indicating whether or not the feature was used in that model. These utilization vectors are then clustered using a variant of hierarchical clustering called Omniclust. The result is a set of model-utilization-based clusters, in which features are grouped together if they are often used together by classification models, which may be because they are co-expressed or for subtler reasons involving multi-gene interactions. The MUTIC method is illustrated by applying it to a dataset of gene expression in human brains of various ages. Compared to traditional expression-based clustering, MUTIC yields clusters that have higher mathematical quality (in the sense of homogeneity and separation) and that also provide novel insights into the underlying biological processes.
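The model utilization vectors described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the genetic programming step is replaced by a hand-written toy list of models (each given as the set of features it uses), and Omniclust is replaced by plain average-linkage agglomerative clustering over Jaccard distance. The feature names, the `threshold` parameter, and all function names are hypothetical.

```python
from itertools import combinations


def utilization_vectors(models, features):
    """Map each feature to a 0/1 vector with one entry per model."""
    return {f: [1 if f in m else 0 for m in models] for f in features}


def jaccard_distance(u, v):
    """1 - |models using both| / |models using either| (0.0 if neither is used)."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def cluster(vectors, threshold):
    """Average-linkage agglomerative clustering (an assumed stand-in for Omniclust).

    Merging stops once the closest pair of clusters is farther apart
    than `threshold`.
    """
    clusters = [[f] for f in sorted(vectors)]

    def linkage(c1, c2):
        total = sum(jaccard_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Toy ensemble: each "model" is just the set of features it uses (hypothetical data).
models = [
    {"geneA", "geneB"},
    {"geneA", "geneB", "geneC"},
    {"geneC", "geneD"},
    {"geneD"},
]
features = ["geneA", "geneB", "geneC", "geneD"]

vectors = utilization_vectors(models, features)
clusters = cluster(vectors, threshold=0.5)
print(clusters)
```

In this toy ensemble, `geneA` and `geneB` are used by exactly the same models, so they end up in one cluster regardless of whether their expression values are correlated; that is the distinction the abstract draws between utilization-based and expression-based clustering.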
  • Keywords
    biology computing; genetic algorithms; genetics; learning (artificial intelligence); pattern clustering; binary categorical data; biological process; categorical gene expression data; classification model; complex biological interactions; feature vector; gene expression profile; gene ontology; gene-categories; genetic programming; model utilization vector; model utilization-based clustering; protein information resource category; supervised learning algorithm; Biological interactions; Biological processes; Biological system modeling; Gene expression; Genetic programming; Humans; Information resources; Ontologies; Proteins; Supervised learning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Evolutionary Computation, 2006. CEC 2006. IEEE Congress on
  • Conference_Location
    Vancouver, BC
  • Print_ISBN
    0-7803-9487-9
  • Type

    conf

  • DOI
    10.1109/CEC.2006.1688477
  • Filename
    1688477