Title :
Computation for Genomics Knowledge Discovery
Author_Institution :
Dept. of Comput. Sci. & Software Eng., Concordia Univ., Montreal, QC, Canada
Abstract :
Knowledge discovery in genomics involves large-scale graph processing and inference, which differs from the high-performance computing used in genomics for sequence analysis. Genomics datasets are becoming increasingly large and varied due to advances in biotechnology. Traditional sequence analysis is therefore computation-intensive for tasks such as assembly of reads, mapping reads to genomes, variation analysis across genomes, sequence similarity, sequence clustering, phylogenetics, and sequence motif and pattern finding. Beyond these data analysis steps come annotation steps to determine genes and their roles. This is knowledge discovery by inference from experimentally characterized genes. It involves provenance that tracks the evidence for and against each annotation, rule-based post-processing to catch systematic errors in annotation, gap-filling in systems biology network models, and propagation of changes in our knowledge of experimentally characterized genes. How can we engineer software for these kinds of systems that require high-performance computing?
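The change-propagation step mentioned in the abstract can be illustrated with a small graph traversal. This is a minimal sketch, not the system described in the paper: it assumes provenance is stored as a directed graph whose edges point from a piece of evidence (for example, an experimentally characterized gene) to each annotation inferred from it, and the function name `affected_annotations` and the node labels are hypothetical. A breadth-first traversal from the changed node collects every annotation that transitively depends on it and would need to be re-evaluated.

```python
from collections import deque


def affected_annotations(derived_from: dict[str, list[str]], changed: str) -> list[str]:
    """Return every annotation that transitively depends on `changed`.

    `derived_from` is a hypothetical provenance graph: each key is an
    evidence source, and its list holds the annotations inferred from it.
    Results are returned in breadth-first order, each node at most once.
    """
    seen = {changed}
    order: list[str] = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in derived_from.get(node, []):
            if dependent not in seen:  # a node reachable via several paths is visited once
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


# Hypothetical example: geneD's annotation was inferred from two annotations
# that both rest on the same experimental characterization of geneA.
provenance = {
    "geneA_characterized": ["geneB_annotation", "geneC_annotation"],
    "geneB_annotation": ["geneD_annotation"],
    "geneC_annotation": ["geneD_annotation"],
}
print(affected_annotations(provenance, "geneA_characterized"))
# ['geneB_annotation', 'geneC_annotation', 'geneD_annotation']
```

Tracking a `seen` set keeps the traversal linear in the size of the graph even when one annotation is supported by several evidence paths, which matters once the provenance graph reaches genome scale.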
Keywords :
data mining; genomics; sequences; software engineering; biotechnology; genomics knowledge discovery; high performance computing; large scale graph processing; sequence analysis; Bioinformatics; Databases; Genomics; Ontologies; Organisms; Proteins; bioinformatics; change propagation; graph processing; provenance;
Conference_Title :
Software Engineering for High Performance Computing in Science (SE4HPCS), 2015 IEEE/ACM 1st International Workshop on
Conference_Location :
Florence
DOI :
10.1109/SE4HPCS.2015.14