• DocumentCode
    2736877
  • Title

    Keynote: High-resolution sequence and chromatin signatures predict transcription factor binding in the human genome

  • Author

    Leslie, Christina

  • Author_Institution
    Comput. Biol. Program, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
  • fYear
    2011
  • fDate
    3-5 Feb. 2011
  • Firstpage
    2
  • Lastpage
    2
  • Abstract
    Accurately modeling the DNA sequence preferences of transcription factors and predicting their genomic binding sites are key problems in regulatory genomics. These efforts have long been frustrated by the limited availability and accuracy of TF binding site motifs. Today, protein binding microarray (PBM) experiments and chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments are generating unprecedented high-resolution data on in vitro and in vivo TF binding. This paper will present a flexible new discriminative framework for representing and learning TF binding preferences using these massive data sets. Support vector regression models were trained with a novel string kernel on PBM data to learn the mapping from probe sequences to binding intensities. Results confirm that the discriminative sequence models presented here significantly outperform existing motif discovery algorithms, and it is found that ChIP-trained models greatly improved TF occupancy prediction over PBM-trained models, suggesting distinct in vivo sequence information. Finally, discriminative chromatin models using histone modification ChIP-seq data were trained, and results show that models combining sequence and chromatin signatures strongly outperformed models using either one alone. This work establishes effective new techniques for analyzing next-generation sequencing data sets to study the interplay of chromatin and sequence in TF binding in the human genome.
  • Keywords
    biology computing; genomics; learning (artificial intelligence); molecular biophysics; molecular configurations; physiological models; proteins; regression analysis; support vector machines; DNA sequence preferences; chromatin immunoprecipitation; chromatin signature; genomic binding sites; high-resolution sequence; histone modification ChIP-seq data; human genome; learning; motif discovery algorithms; protein binding microarray; regulatory genomics; string kernel; support vector regression models; transcription factor binding; Bioinformatics; Data models; Genomics; In vitro; In vivo; Predictive models; Support vector machines;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Advances in Bio and Medical Sciences (ICCABS), 2011 IEEE 1st International Conference on
  • Conference_Location
    Orlando, FL
  • Print_ISBN
    978-1-61284-851-8
  • Type

    conf

  • DOI
    10.1109/ICCABS.2011.5729880
  • Filename
    5729880