Author_Institution :
Institute for Quantum Information Science, University of Calgary, Alberta T2N 1N4, Canada
Abstract :
Summary form only given. Recent advances in high-throughput data acquisition, distributed sensors, and networked information systems offer unprecedented opportunities for collaborative, integrative data analysis (e.g., discovering a priori unknown complex relationships and constructing predictive models from data), hypothesis generation, and knowledge creation. However, realizing these opportunities presents several practical challenges: data and knowledge repositories are often autonomous, large, and distributed, and differences in semantics, scope, and intended use, together with privacy considerations, further complicate their effective use. In this talk, I will summarize recent progress on algorithms for constructing predictive models from distributed, semantically disparate data in settings where centralized access to the data is neither feasible nor desirable. I will briefly outline approaches to the selective reuse of knowledge from multiple autonomous knowledge bases and to the automated composition of autonomous software services into complex workflows. I will conclude with some open research challenges in Discovery Informatics that must be addressed to fully realize the promise offered by the exponential growth in the volume, velocity, and variety of data in scientific discovery. Much of this research has been carried out in collaboration with current and former members of the Iowa State University Artificial Intelligence Research Laboratory and has been supported in part by grants from the National Science Foundation.