The increasing adoption of sequencing and omics technologies has driven a massive expansion of bioinformatic software in the biomedical field. The first draft of the human genome was announced in 2000. Since then, the number of published studies in bioinformatics, computational biology and precision medicine has grown steadily. Large-scale sequencing data can now be quickly turned into new scientific knowledge and data-driven decisions in areas ranging from drug discovery and personalized medicine to clinical diagnostics and infectious disease surveillance.
High-quality genomic research depends on high-quality bioinformatic software, but such software cannot be taken for granted. Many bioinformatic tools lack the rigorous testing and routine maintenance that are standard for non-scientific software. As a result, bugs that affect the accuracy of results sometimes go undetected during development. A small error or an unvalidated code change can have serious consequences for scientific research and medical care, such as paper retractions, delayed drug development and clinical misdiagnoses.
There is currently an unmet need for robust and reliable software in the scientific community. Experimental design and interpretation of results are often prioritized over validation testing of the underlying software, even though such testing is just as important to a sound research study.
A paradigm shift is necessary to integrate proper software development and validation processes into scientific research.
Rather than reinventing the wheel, bioinformatic software development can readily adopt established standards and best practices from software engineering. A major challenge in computational biology research is reproducibility. Different software versions, operating systems or computing architectures may produce unexpectedly different results from the same input data. When software is no longer maintained or becomes obsolete, comparing analysis results between previous and current datasets becomes even more difficult. Instead of testing manually and only when needed, developers can set up automated continuous integration and testing that evaluates result accuracy and software performance with every code change. This can streamline software development and routine updates, reducing the time spent detecting and fixing bugs and freeing more time for scientific innovation.