Abstract :
Recent years have witnessed extraordinary advancement in the data generation capacity of high throughput sequencing technology. This has led to an enormous increase in the amount of data that needs to be managed, stored, visualised, integrated and shared. Moreover, there is a widely recognised need to integrate high throughput sequencing data with heterogeneous data sources in both clinical and translational research. Additionally, clinical data sets have stringent privacy and security requirements that must be adhered to when deploying data handling systems. Many researchers who need to access these data sets are also non-specialists in the IT domain and so need systems that are easy to use. Herein, we present an overview of a novel cloud-based research management software system, which has simplicity, scalability, security (including traceability), speed, reproducibility and integration at its core. Its goal is to democratise research data management, enabling researchers who do not have specialist IT administration, software coding or data management training to handle large sets of integrated cloud-based software tools that automate many complex research data workflows without the need for extensive customisation or manual intervention. We describe the sequence information management platform (Simplicity™), a workflow-based bioinformatics management tool, which allows life science researchers to rapidly annotate large amounts of DNA and protein sequence data and receive a detailed, editable and customisable generated report. The efficacy of Simplicity™ is demonstrated by showing how the system enables rapid publication of life science discoveries. We present results of a workflow run by Simplicity™ for the Marie Curie funded project ClouDx-i.
Keywords :
DNA; bioinformatics; cloud computing; data integration; data privacy; data visualisation; molecular biophysics; molecular configurations; proteins; security of data; DNA sequence data; IT domain; Simplicity; clinical data sets; clinical research; cloud-based research management software system; complex research data workflows; data amount; data generation capacity; data handling systems deployment; data management; data sharing; data storage; heterogeneous data sources; high throughput sequencing technology; integrated cloud-based software tools; life science researchers; privacy requirements; project ClouDx-i; protein sequence data; research data management; security requirements; sequence information management platform; translational research; workflow based bioinformatics management tool; Genomics; Manuals; Random access memory; Sequential analysis; cloud architecture; data provenance; high throughput sequencing; publishable report; traceability