• DocumentCode
    1784765
  • Title
    Discovering protein-DNA binding cores by aligned pattern clustering
  • Author
    Lee, En-Shiun Annie ; Ho-Yin Sze-To ; Man-Hon Wong ; Kwong-Sak Leung ; Lau, Terrence Chi-Kong ; Wong, Andrew K. C.
  • Author_Institution
    Syst. Design Eng., Univ. of Waterloo, Waterloo, ON, Canada
  • fYear
    2014
  • fDate
    2-5 Nov. 2014
  • Firstpage
    125
  • Lastpage
    130
  • Abstract
    Understanding binding cores is of fundamental importance in deciphering protein-DNA (TF-TFBS) binding and gene regulation. Variations (or mutations) in binding cores are ubiquitous and affect binding specificity to different degrees. To reduce the need for expensive experiments, we have developed a new method that discovers binding cores directly from sequence data and studies the effects of their variations. Although existing computational methods have produced satisfactory TF-TFBS binding cores, they yield only one-to-one mappings with no site-specific information on residue/nucleotide variations, and their results largely overlap. In this study, we propose a new representation for modeling TF-TFBS binding with variants, known as TF-TFBS Co-Supportive Aligned Pattern Clusters (APCs), which is more compact, gives more detail on site-specific variants, and is biologically more intuitive for analysis. We have also developed an algorithm to discover TF-TFBS Co-Supportive APCs that captures binding cores at higher precision and with a much faster runtime (≥1600X) than other methods. The variants in TF-TFBS Co-Supportive APCs are also statistically analyzed, and we demonstrate that they can assist homology modeling to synthesize new biological knowledge.
  • Keywords
    DNA; bioinformatics; data mining; genetics; genomics; molecular clusters; pattern clustering; proteins; statistical analysis; TF-TFBS binding cores; TF-TFBS co-supportive aligned pattern clusters; aligned pattern clustering; binding specificity; biological knowledge; deciphering protein-DNA binding; gene regulation; homology modeling; mutations; one-to-one mappings; protein-DNA binding cores; residue-nucleotide variations; sequence data binding cores; site-specific variants; Amino acids; Educational institutions; Pattern matching; Proteins; Three-dimensional displays; Aligned Pattern Cluster; Association Rule Mining; Binding Cores; Protein-DNA Binding
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
  • Conference_Location
    Belfast
  • Type
    conf
  • DOI
    10.1109/BIBM.2014.6999140
  • Filename
    6999140