Computation for Genomics Knowledge Discovery

Author

Butler, Greg

Author_Institution

Dept. of Comput. Sci. & Software Eng., Concordia Univ., Montreal, QC, Canada

fYear

2015

fDate

18-18 May 2015

Firstpage

46

Lastpage

50

Abstract

Knowledge discovery in genomics involves large-scale graph processing and inference, which is different from high-performance computing in genomics for sequence analysis. Genomics datasets are becoming increasingly large and varied due to advances in biotechnology. Traditional sequence analysis is therefore computation-intensive for tasks such as assembly of reads, mapping reads to genomes, variation analysis across genomes, sequence similarity, sequence clustering, phylogenetics, and sequence motif and pattern finding. Beyond these data analysis steps come annotation steps to determine genes and their roles. This is knowledge discovery by inference from experimentally characterized genes, with provenance tracking the evidence for and against the annotation, post-processing by rules to catch systematic errors in annotation, gap-filling in systems biology network models, and propagation of changes in our knowledge of experimentally characterized genes. How can we engineer software for these kinds of systems that require high-performance computing?

Keywords

data mining; genomics; sequences; software engineering; biotechnology; genomics knowledge discovery; high performance computing; large scale graph processing; sequence analysis; Bioinformatics; Databases; Ontologies; Organisms; Proteins; change propagation; graph processing; provenance

fLanguage

English

Publisher

IEEE

Conference_Titel

Software Engineering for High Performance Computing in Science (SE4HPCS), 2015 IEEE/ACM 1st International Workshop on

Conference_Location

Florence

Type

conf

DOI

10.1109/SE4HPCS.2015.14

Filename

7173510