• DocumentCode
    174831
  • Title
    Protein Data Modelling for Concurrent Sequential Patterns
  • Author
    Jing Lu; Malcolm Keech; Cuiqing Wang
  • Author_Institution
    Technol. Sch., Southampton Solent Univ., Southampton, UK
  • fYear
    2014
  • fDate
    1-5 Sept. 2014
  • Firstpage
    5
  • Lastpage
    9
  • Abstract
    Protein sequences from the same family typically share common patterns which imply their structural function and biological relationship. The challenge of identifying protein motifs is often addressed through mining frequent item sets and sequential patterns, where post-processing is a useful technique. Earlier work has shown that Concurrent Sequential Patterns mining can be applied in bioinformatics, e.g. to detect frequently occurring concurrent protein sub-sequences. This paper presents a companion approach to data modelling and visualisation, applying it to real-world protein datasets from the PROSITE and NCBI databases. The results show the potential for graph-based modelling in representing the integration of higher level patterns common to all or nearly all of the protein sequences.
  • Keywords
    bioinformatics; data mining; data visualisation; graph theory; proteins; NCBI database; PROSITE database; concurrent sequential pattern mining; graph-based modelling; protein data modelling; Amino acids; Biological system modeling; Data models; Databases; ConSP modelling; biological databases; concurrent sequential patterns (ConSP); knowledge representation; protein sequences; visualisation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Database and Expert Systems Applications (DEXA), 2014 25th International Workshop on
  • Conference_Location
    Munich
  • ISSN
    1529-4188
  • Print_ISBN
    978-1-4799-5721-7
  • Type
    conf
  • DOI
    10.1109/DEXA.2014.19
  • Filename
    6974818