• DocumentCode
    3728173
  • Title
    Data Analytics for Protein-DNA Binding Interactions
  • Author
    Ka-Chun Wong
  • Author_Institution
    Dept. of Comput. Sci., City Univ. of Hong Kong, Kowloon Tong, China
  • fYear
    2015
  • Firstpage
    1573
  • Lastpage
    1578
  • Abstract
    Determining protein-DNA binding specificity is an important step in understanding genetic codes. With a large number of protein-DNA complexes, mature statistical and data mining techniques, and efficient computational power available, a fundamental and comprehensive protein-DNA binding sequence analysis is conducted and described in this work. In particular, two types of analysis are proposed and described. First, statistical analysis is conducted to give holistic insights into the protein-DNA binding sequences. Second, data mining techniques are applied to extract interesting sequence patterns that take both the protein side and the DNA side into account. The results demonstrate that there are statistically enriched sequence patterns among the protein-DNA binding sequences. Nonetheless, from a big data analytics perspective, they also confirm that no general principle governs protein-DNA binding. To address that, contemporary data mining methods are introduced to discover advanced sequence patterns. The patterns are validated with an external database, revealing biological insights into protein-DNA binding interactions.
  • Keywords
    "Proteins","DNA","Data mining","Amino acids","Statistical analysis","Color"
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/SMC.2015.278
  • Filename
    7379410