  • DocumentCode
    1157599
  • Title
    Associative clustering for exploring dependencies between functional genomics data sets
  • Author
    Kaski, Samuel; Nikkilä, Janne; Sinkkonen, Janne; Lahti, Leo; Knuuttila, Juha E. A.; Roos, Christophe
  • Author_Institution
    Dept. of Comput. Sci., Helsinki Univ., Finland
  • Volume
    2
  • Issue
    3
  • fYear
    2005
  • Firstpage
    203
  • Lastpage
    216
  • Abstract
    High-throughput genomic measurements, interpreted as cooccurring data samples from multiple sources, open up a fresh problem for machine learning: What is in common in the different data sets, that is, what kind of statistical dependencies are there between the paired samples from the different sets? We introduce a clustering algorithm for exploring the dependencies. Samples within each data set are grouped such that the dependencies between groups of different sets capture as much of the pairwise dependencies between the samples as possible. We formalize this problem in a novel probabilistic way, as optimization of a Bayes factor. The method is applied to reveal commonalities and exceptions in gene expression between organisms and to suggest regulatory interactions in the form of dependencies between gene expression profiles and regulator binding patterns.
  • Keywords
    Bayes methods; biology computing; genetics; learning (artificial intelligence); molecular biophysics; optimisation; statistical analysis; Bayes factor; associative clustering; clustering algorithm; functional genomics data sets; gene expression; machine learning; optimization; pairwise dependencies; regulator binding patterns; regulatory interactions; statistical dependencies; Bioinformatics; Clustering algorithms; Gene expression; Genetics; Genomics; Machine learning; Machine learning algorithms; Organisms; Regulators; Statistical analysis; Biology and genetics; clustering; contingency table analysis; machine learning; multivariate statistics; Algorithms; Artificial Intelligence; Chromosome Mapping; Cluster Analysis; Computer Simulation; Databases, Genetic; Gene Expression Profiling; Information Storage and Retrieval; Models, Genetic; Multigene Family; Oligonucleotide Array Sequence Analysis; Statistics as Topic;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2005.32
  • Filename
    1504685
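The abstract describes scoring a pair of clusterings by a Bayes factor computed over the contingency table of cluster co-occurrences for paired samples. The sketch below illustrates only that scoring step, under stated assumptions. It uses the standard Dirichlet-multinomial Bayes factor for a two-way table, comparing a model with freely dependent cells against a model with independent row and column margins, and it places symmetric priors on both. The function names and the prior parameters alpha and beta are hypothetical. The paper's exact prior choices and its optimization of the cluster boundaries are not reproduced here.

```python
import math
from collections import Counter

# ASSUMPTION: a generic Dirichlet-multinomial Bayes factor for a two-way
# contingency table, used for illustration; not the paper's exact objective.

def contingency_table(labels_x, labels_y, n_rows, n_cols):
    """Cross-tabulate paired cluster labels: cell (i, j) counts the sample
    pairs whose X-member is in cluster i and whose Y-member is in cluster j."""
    if len(labels_x) != len(labels_y):
        raise ValueError("paired samples need label vectors of equal length")
    counts = Counter(zip(labels_x, labels_y))
    return [[counts[(i, j)] for j in range(n_cols)] for i in range(n_rows)]

def log_dirichlet_multinomial(counts, prior):
    """Log marginal likelihood of counts under a multinomial whose
    probabilities have a symmetric Dirichlet(prior). The multinomial
    coefficient is omitted: it is the same under both hypotheses below
    and cancels in the Bayes factor."""
    k = len(counts)
    n = sum(counts)
    return (math.lgamma(k * prior) - math.lgamma(n + k * prior)
            + sum(math.lgamma(c + prior) - math.lgamma(prior) for c in counts))

def log_bayes_factor(table, alpha=1.0, beta=1.0):
    """log p(table | dependent cells) - log p(table | independent margins).
    Positive values favour dependence between the two clusterings."""
    cells = [c for row in table for c in row]
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    log_dep = log_dirichlet_multinomial(cells, alpha)
    log_ind = (log_dirichlet_multinomial(row_sums, beta)
               + log_dirichlet_multinomial(col_sums, beta))
    return log_dep - log_ind

# Usage: six paired samples whose cluster labels co-occur perfectly.
labels_x = [0, 0, 1, 1, 0, 1]
labels_y = [1, 1, 0, 0, 1, 0]
table = contingency_table(labels_x, labels_y, 2, 2)
print(table)  # [[0, 3], [3, 0]]
print(log_bayes_factor(table) > 0)  # True: evidence for dependence
```

A near-uniform table such as [[5, 5], [5, 5]] yields a negative log Bayes factor, because the extra parameters of the dependent model are not justified by the data. In this framing, an associative clustering would move cluster boundaries in each data space so as to increase this score.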