DocumentCode
174831
Title
Protein Data Modelling for Concurrent Sequential Patterns
Author
Jing Lu; Malcolm Keech; Cuiqing Wang
Author_Institution
Technology School, Southampton Solent University, Southampton, UK
fYear
2014
fDate
1-5 Sept. 2014
Firstpage
5
Lastpage
9
Abstract
Protein sequences from the same family typically share common patterns that imply their structural function and biological relationship. The challenge of identifying protein motifs is often addressed by mining frequent itemsets and sequential patterns, where post-processing is a useful technique. Earlier work has shown that Concurrent Sequential Patterns (ConSP) mining can be applied in bioinformatics, e.g. to detect frequently occurring concurrent protein sub-sequences. This paper presents a companion approach to data modelling and visualisation, applying it to real-world protein datasets from the PROSITE and NCBI databases. The results show the potential of graph-based modelling for representing the integration of higher-level patterns common to all or nearly all of the protein sequences.
Keywords
bioinformatics; data mining; data visualisation; graph theory; proteins; NCBI database; PROSITE database; concurrent sequential pattern mining; graph-based modelling; protein data modelling; Amino acids; Biological system modeling; Data models; Databases; ConSP modelling; biological databases; concurrent sequential patterns (ConSP); knowledge representation; protein sequences; visualisation
fLanguage
English
Publisher
IEEE
Conference_Titel
Database and Expert Systems Applications (DEXA), 2014 25th International Workshop on
Conference_Location
Munich
ISSN
1529-4188
Print_ISBN
978-1-4799-5721-7
Type
conf
DOI
10.1109/DEXA.2014.19
Filename
6974818