Title :
Data mining and sharing tool for high content screening large scale biological image data
Author :
Shah, Asmi H. ; Gopalakrishnan, Ganesh ; Rajendran, Adithya ; Liebel, Urban
Author_Institution :
Med. Sch., MGH, Harvard Univ., Boston, MA, USA
Abstract :
The constantly evolving high content and high throughput screening (HCS & HTS) microscopy and next generation sequencing technologies routinely produce experiment datasets in the terabyte (TB) or petabyte (PB) range, resulting in millions of data files that can vary from simple numbers to signals and images. If the collaborators working on the same project are spread over large geographical distances, data sharing, interactive visualization and collaborative annotation techniques become important determinants of the success of a research project. At the same time, there are hundreds of bioinformatic and cheminformatic databases, billions of documents in the available literature, and many image-based biological repositories, which need to be consulted simultaneously to make sense of the acquired data. To draw conclusions from such increasingly complex and large-scale data sources, the scientific community must be provided with simple-to-use methods to retrieve, analyze, visualize, annotate, and crosslink these data sources efficiently on a common platform. However, these modern biological experiments and the subsequent analyses are carried out using an array of different software suites and automated tools, and constant feedback from the experimenter is needed to change experimental paradigms for follow-up experiments. To the best of our knowledge, no software platform yet addresses HCS data in these respects. We have developed a simple-to-use software package called “AskMe” that lets users publish their large-scale biological experiment data onto the web using data mining and visualization concepts. With AskMe, scientists can easily share these HCS datasets with their collaborators or make them publicly accessible to the whole scientific community. From the initial stages of experiments, AskMe can ease the experimental analysis process by mining data and providing useful visualizations.
Moreover, integration of and crosslinks to other databases allow easy evaluation of the generated data. Following these principles, we bring the tools to the data and make data access transparent to users without any capacity tradeoff.
Keywords :
biology computing; data mining; data visualisation; image processing; AskMe; HCS datasets; HCS microscopy; HTS microscopy; TB; World Wide Web; automated tools; bioinformatic databases; biological experiments; cheminformatic databases; collaborative annotation; collaborators; constant feedback; data files; data mining; experimental paradigms; experimenter; high content screening large scale biological image data; high throughput screening microscopy; image based biological repositories; interactive visualization; large scale data sources; next generation sequencing technologies; petabyte; scientific community; sharing tool; software package; software platform; software suites; subsequent analysis; terabyte; visualization concepts; Biology; Chemicals; Communities; Data mining; Data visualization; Three-dimensional displays; Web pages; bigdata; bioimage informatics; data mining; high content screening; high throughput screening; large scale data handling; web;
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/BigData.2014.7004341