Reproducibility, Replicability and Trust in Science
FASTGenomics will present its reproducible ecosystem at the Reproducibility, Replicability and Trust in Science conference organized by the Wellcome Genome Campus on Thursday, September 10, 2020. For those who cannot attend, our lightning talk video is publicly available on YouTube.
To enhance open science and the reproducibility of scientific results, it has become best practice to provide data and code alongside a paper's publication. However, this practice alone does not guarantee reproducibility and replicability of results, for the following reasons:
1) data and metadata are often shared in an incomplete, redacted or already preprocessed version;
2) workflows are often incompletely described, e.g. lacking pre-processing steps or containing analytical solutions without computational implementations;
3) code is often shared without explicit and complete versioning of all required software; and
4) parameters for the analysis are often left out or forgotten (e.g., seed values for random numbers).
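Points 3) and 4) can be addressed with very little effort at the start of an analysis. The following is a minimal sketch in Python, not a description of how FASTGenomics does it: the helper name `capture_run_context`, the seed value and the package names are hypothetical and chosen only for illustration.

```python
import json
import platform
import random
import sys
from importlib import metadata


def capture_run_context(seed, packages=()):
    """Seed the random number generator and describe the software environment.

    Hypothetical helper for illustration; `packages` should list whatever
    the analysis actually imports.
    """
    random.seed(seed)  # makes subsequent random.* calls repeatable
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "seed": seed,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Example package names; replace with the ones your analysis uses.
    context = capture_run_context(seed=20200910, packages=["numpy", "pandas"])
    print(json.dumps(context, indent=2))
```

Storing the resulting record next to the results documents the seed and the exact versions used. Note that `random.seed` only covers Python's built-in generator; libraries with their own random number generators need to be seeded separately.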
In conclusion, we argue that it is not enough to publish data and code (see also Chen et al., 2019, Nature Physics) – what is needed is a self-contained and executable instance of the complete analysis including data and code.
FASTGenomics enables users to explore data, share data and analyses, and publish interactive results. Users can thus conduct their complete research in a reproducible environment. The platform offers an intuitive interface for experts who do not code and contains multiple software environments with state-of-the-art analysis tools, which makes it easy to reuse existing results and algorithms. In addition, users can choose from several tutorial, example and best-practice notebooks to get started.
Technically, FASTGenomics runs on a load-balanced Kubernetes cluster, which makes it flexible and scalable. Software environments are packaged as Docker containers and connected to user-selected datasets that can be explored, e.g. with Jupyter notebooks or interactive dashboard visualizations. Because its modular design allows datasets and analyses to be exchanged, FASTGenomics offers both reproducibility and replicability (Claerbout & Karrenbach, 1992, SEG Expanded Abstracts) of scientific results.
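The distinction between the two terms can be illustrated with a toy example. The sketch below assumes one common reading (reproducibility: same data and same code give the same result; replicability: the same analysis on an exchanged dataset leads to a comparable conclusion). The `analysis` function and both datasets are made up for illustration and are unrelated to the platform's code.

```python
import random
import statistics


def analysis(values, seed):
    """Toy analysis: bootstrap estimate of the mean of `values`."""
    rng = random.Random(seed)  # local generator, independent of global state
    estimates = []
    for _ in range(200):
        resample = [rng.choice(values) for _ in values]
        estimates.append(statistics.fmean(resample))
    return statistics.fmean(estimates)


original_data = [4.1, 3.9, 4.3, 4.0, 4.2]
independent_data = [4.0, 4.4, 3.8, 4.1, 4.2]  # made-up second dataset

# Reproducibility: same data, same code, same seed give the identical number.
first_run = analysis(original_data, seed=7)
second_run = analysis(original_data, seed=7)

# Replicability: same code on an exchanged dataset gives a comparable estimate.
replication = analysis(independent_data, seed=7)

if __name__ == "__main__":
    print("reproduced exactly:", first_run == second_run)
    print("difference on new data:", abs(replication - first_run))
```

Keeping the analysis a function of the dataset and the seed only is what makes both checks possible: nothing else can change between runs.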
The platform is generic and agnostic to data formats by design, which makes it applicable for broad use in academic and industrial sectors. It can also be deployed as an on-premises solution for institutions and organizations that handle highly sensitive data.
In our presentation, we also discuss the latest COVID-19 study by Schulte-Schrepping et al. (Cell, 2020). The German COVID-19 Omics Initiative (deCOI) used FASTGenomics to provide open, reproducible and interactive access to the results of this study. Researchers worldwide can now access the project to find all processed data and the full code that generates the study results, and to run the complete analysis in a containerized, cloud-based environment (no study-specific infrastructure of their own required). This example shows how reproducible science benefits the community and helps fight the pandemic.
Many other tools and incentives that foster open and reproducible science will be presented at the conference, so make sure to have a look at the conference page.