Reproducibility in the era of big genomic data and precision medicine requires developing and implementing quality control measures and standards for both experimental protocols and data analysis methods. The Massive Analysis and Quality Control (MAQC) Society, in partnership with the Food and Drug Administration (FDA) and its National Center for Toxicological Research (NCTR), brought together academic and industry leaders at the FDA/NCTR-MAQC 2022 Conference at FDA Headquarters (White Oak Campus, Silver Spring, MD) on September 26-27, 2022 to discuss the challenges of, and potential solutions to, the current reproducibility crisis in life science research.

Magna Labs founder Gwenn Berry was invited to present on the role of automated testing and benchmarking in improving the accuracy, reproducibility and scalability of bioinformatic software and pipelines.

The talk focused on quality assessment of bioinformatic tools as a complement to existing community efforts, such as the precisionFDA, DREAM and CAMI challenges, that bring benchmarking and evaluation standards to biomedical research (Figure 1). Ms Berry first highlighted the concept of “shifting left” (Figure 2) and continuous testing: evaluating bioinformatic tools more frequently and earlier in the development process so that errors are detected and fixed more quickly. This minimizes the propagation and accumulation of unexpected software bugs or changes in performance that could affect downstream data analysis and the interpretation of results in scientific studies. She then introduced our bioinformatic software test automation platform, Miqa, as a “shift left” technology that can improve data reproducibility by easing and streamlining quality control of bioinformatic tools from the earliest stages of the tool development life cycle.

Figure 1. Quality assessment in bioinformatic tool development

Figure 2. Shift left quality model in software development. Source: van der Cruijsen 2017.
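As a rough illustration of the continuous-testing idea described above, the sketch below shows the kind of check that could run on every code change: it compares a new version's variant calls with those of the previous release and fails if their concordance drops below a threshold. The variant tuple representation, the function names and the 0.99 threshold are hypothetical choices made for illustration; they are not part of Miqa or any specific tool.

```python
from typing import Set, Tuple

# Hypothetical representation: a variant keyed as (chromosome, position, ref, alt).
# This is an illustrative assumption, not a format Miqa requires.
Variant = Tuple[str, int, str, str]


def jaccard_concordance(previous: Set[Variant], current: Set[Variant]) -> float:
    """Fraction of variants shared by two call sets (intersection over union)."""
    union = previous | current
    if not union:
        return 1.0  # two empty call sets are trivially concordant
    return len(previous & current) / len(union)


def regression_gate(previous: Set[Variant], current: Set[Variant],
                    min_concordance: float = 0.99) -> bool:
    """Return True if the new version stays concordant with the old one.

    The 0.99 default is a hypothetical threshold chosen for illustration.
    """
    return jaccard_concordance(previous, current) >= min_concordance


if __name__ == "__main__":
    old_calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
                 ("chr2", 42, "G", "A"), ("chr3", 7, "T", "C")}
    new_calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
                 ("chr2", 42, "G", "A"), ("chr3", 8, "T", "C")}
    score = jaccard_concordance(old_calls, new_calls)
    print(f"concordance = {score:.2f}")  # 3 shared / 5 in union = 0.60
    print("pass" if regression_gate(old_calls, new_calls) else "fail")
```

Here a single shifted position drops concordance to 0.60, so the gate fails and the change would be flagged before it reaches downstream analyses. Running such a check automatically on every commit, rather than only before a release, is what moves testing "left".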

Quality control in bioinformatic tool development

Rapid advances in DNA sequencing technologies have expanded biomedical research in genomics, transcriptomics, proteomics and other omics disciplines. This progress has led both to a massive proliferation of bioinformatic tools for analyzing omics data and to a growing need to standardize and quality-control those tools. While community efforts have been made to streamline existing analysis workflows, validation of bioinformatic tools and pipelines is still largely a manual, ad hoc process with little consistency across projects. Without a solution for ongoing, rigorous assessment using diverse datasets, research teams are forced to “reinvent the wheel”, risking improper or incomplete evaluation and thereby hindering the translation of omics data into reliable research and clinical applications.

Miqa - an automated bioinformatic software evaluation and benchmarking platform

To address this unmet need for a standardized approach to bioinformatic tool assessment, we have developed an automated bioinformatic software testing platform, Miqa, that evaluates performance (e.g., accuracy, concordance, precision) and detects errors in bioinformatic tools with every code change. At the architectural level, it uses cloud computing and software containerization to support continuous testing within any bioinformatic tool development workflow. At the application level, it is customizable to accommodate different omics data types and analytical requirements. Through a user-friendly interface, Miqa enables quick comparison of a tool against previous versions or gold standards, under different parameters or computing environments, or with different test datasets.
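Miqa's own interfaces are not described in this post, so the following is only a minimal, hypothetical sketch of one comparison mentioned above: scoring two versions of a tool against a gold-standard truth set and flagging any metric that got worse. Calls are modeled as plain sets of identifiers, and the names `score_against_truth`, `compare_versions` and the `tolerance` parameter are illustrative inventions.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Set


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    f1: float


def score_against_truth(calls: Set[Hashable], truth: Set[Hashable]) -> Metrics:
    """Score a call set against a gold-standard truth set."""
    tp = len(calls & truth)  # calls confirmed by the truth set
    fp = len(calls - truth)  # calls absent from the truth set
    fn = len(truth - calls)  # truth entries the tool missed
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return Metrics(precision, recall, f1)


def compare_versions(previous: Set[Hashable], current: Set[Hashable],
                     truth: Set[Hashable], tolerance: float = 0.0) -> Dict[str, dict]:
    """Report each metric for both versions and flag drops beyond `tolerance`."""
    old = score_against_truth(previous, truth)
    new = score_against_truth(current, truth)
    report = {}
    for name in ("precision", "recall", "f1"):
        before, after = getattr(old, name), getattr(new, name)
        report[name] = {"old": before, "new": after,
                        "regressed": after - before < -tolerance}
    return report


if __name__ == "__main__":
    truth = {"v1", "v2", "v3", "v4"}
    previous = {"v1", "v2", "v3", "v9"}
    current = {"v1", "v2"}
    for metric, row in compare_versions(previous, current, truth).items():
        print(metric, row)
```

In this toy example the current version gains precision (0.75 to 1.0) but loses recall (0.75 to 0.5), so recall and F1 are flagged as regressions. Reporting several metrics side by side, rather than a single score, makes this kind of trade-off visible when a code change is reviewed.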

Miqa is designed to streamline continual evaluation of bioinformatic tools by both users and developers, minimizing undetected bugs and improving data reproducibility. As clinical, translational and quality-critical research applications of omics technologies continue to expand rapidly, scalable, automated testing can help ensure that software engineering best practices are applied throughout the ongoing development of increasingly sophisticated bioinformatic tools.