• DocumentCode
    35393
  • Title

    Discovery of Spatially Cohesive Itemsets in Three-Dimensional Protein Structures

  • Author

    Zhou, Cheng; Meysman, Pieter; Cule, Boris; Laukens, Kris; Goethals, Bart

  • Author_Institution
    Dept. of Math. & Comput. Sci., Univ. of Antwerp, Antwerp, Belgium
  • Volume
    11
  • Issue
    5
  • fYear
    2014
  • fDate
    Sept.-Oct. 2014
  • Firstpage
    814
  • Lastpage
    825
  • Abstract
    In this paper, we present a cohesive structural itemset miner that aims to discover interesting patterns in a set of data objects within a multidimensional spatial structure by combining the cohesion and the support of each pattern. We propose two ways to build the itemset miner, VertexOne and VertexAll, in an attempt to balance accuracy against run time. The experiments show that VertexOne performs better: it finds almost the same itemsets as VertexAll in a much shorter time. The usefulness of the method is demonstrated by applying it to find interesting patterns of amino acids in spatial proximity within a set of proteins, based on their atomic coordinates in the protein molecular structure. Several patterns found by the cohesive structural itemset miner contain amino acids that frequently co-occur in the spatial structure, even though they are distant in the primary protein sequence and are only brought together by protein folding. Furthermore, there are various indications that some of the discovered patterns represent common underlying support structures within the proteins.
  • Keywords
    data mining; molecular biophysics; molecular configurations; proteins; VertexAll; VertexOne; amino acids; atomic coordinates; cohesive structural itemset mining; data objects; multidimensional spatial structure; primary protein sequence; protein folding; protein molecular structure; spatial cohesive itemsets; spatial proximity; three-dimensional protein structures; Bioinformatics; Itemsets; Protein engineering; Itemset mining; cohesion; multidimensional data; protein structure
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2014.2311795
  • Filename
    6767049