2013 IEEE 6th International Conference on Cloud Computing (CLOUD)

Abstract

Cancer affects millions of people worldwide. With the advent of novel DNA sequencing technologies, whole genome sequencing (WGS) is becoming an integral part of cancer diagnostics that can potentially enable tailored treatments of individual patients. However, detecting structural variations (SVs) in cancer genomes remains challenging, and workflows that combine multiple SV callers are not necessarily portable across computing systems. To alleviate these problems, we developed a user-friendly, portable and extendable SV calling workflow, sv-callers, that includes four state-of-the-art tools to detect SVs in cancer genomes using on-premises HPC systems. The workflow's parallel execution environment enables scaling from a single computer to high-performance compute clusters with minimal effort. The workflow supports SV analysis in either germline or somatic mode, and requires a list of (paired) WGS samples and a reference genome as input. Users may change the workflow parameters and/or software versions using the YAML configuration files. We performed SV analyses on single and paired (tumor/normal) WGS samples, and report on the results obtained using different academic HPC systems.

I.   Introduction

Cancer affects millions of people worldwide. With the advent of novel DNA sequencing technologies, whole genome sequencing (WGS) is becoming an integral part of cancer diagnostics that can potentially enable tailored treatments of individual patients. Despite the advances in large-scale cancer genomics projects (such as TCGA and ICGC's PCAWG), systematic and comprehensive analysis of massive genomic data, in particular the detection and interpretation of structural variations (SVs) in the genomes, remains challenging due to computational and algorithmic limitations [1], [2]. A range of methods is available to detect SVs in short-read sequencing data, each producing different results. Therefore, comprehensive SV detection requires the use of multiple methods or tools (callers). In fact, most SV callers implement more than one approach, combining evidence from split reads, discordantly aligned read pairs, read depth and short-read assembly to improve sensitivity and/or specificity [3]–[6]. Alternatively, multiple tools, often written in different languages, can be readily combined using a workflow management system. However, a workflow developed on one computing system is not necessarily portable to or reusable on another system due to the complexity of the software environments involved, system usage policies or the use of different batch schedulers by HPC clusters (e.g. Grid Engine, Slurm or Torque).

II.   Workflow Implementation

To alleviate these problems, we developed a user-friendly, portable and extendable SV calling workflow, sv-callers [7], that includes four state-of-the-art tools to detect SVs in cancer genomes using on-premises HPC systems (Fig. 1). In particular, the workflow supports automated deployment of the required software, easy configuration and addition of new analysis tools. Moreover, the workflow's parallel execution environment enables scaling from a single computer to high-performance compute clusters with minimal effort. For this, we used the actively maintained Snakemake workflow system [8], the Conda package manager and the newly developed Xenon software suite, which provides unified access to different compute and storage resources [9], [10]. The workflow supports SV analysis in either germline or somatic mode, and requires a list of (paired) WGS samples and a reference genome as input. Users may change the workflow parameters and/or software versions using the YAML configuration files. We performed SV analyses on single and paired (tumor/normal) WGS samples, and report on the results obtained using different academic HPC systems.
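As an illustration of how such a configuration could drive the analysis, the following minimal Python sketch expands a sample list into one job per caller and sample. It is not taken from the sv-callers code base: the dictionary stands in for the YAML configuration files, all key names and file names are hypothetical, the caller list is assumed to correspond to the tools cited in [3]–[6], and the handling of germline mode (using only the first sample of each pair) is an assumption.

```python
from itertools import product

# Hypothetical stand-in for the YAML configuration files; all key names,
# file names and the caller list are assumptions made for illustration.
CONFIG = {
    "mode": "somatic",  # "germline" or "somatic"
    "reference": "reference.fasta",
    "callers": ["delly", "lumpy", "manta", "gridss"],
    # Each entry is (tumor BAM, matched normal BAM).
    "samples": [("T1.bam", "N1.bam"), ("T2.bam", "N2.bam")],
}


def plan_jobs(config):
    """Return one job description per (caller, sample) combination."""
    mode = config["mode"]
    if mode not in ("germline", "somatic"):
        raise ValueError(f"unknown mode: {mode!r}")
    jobs = []
    for caller, (tumor, normal) in product(config["callers"], config["samples"]):
        # Somatic mode analyses the tumor/normal pair; germline mode is
        # assumed here to analyse the first sample of each entry on its own.
        inputs = [tumor, normal] if mode == "somatic" else [tumor]
        jobs.append({
            "caller": caller,
            "mode": mode,
            "reference": config["reference"],
            "inputs": inputs,
        })
    return jobs


if __name__ == "__main__":
    for job in plan_jobs(CONFIG):
        print(job["caller"], job["mode"], *job["inputs"])
```

Because each job depends only on its own inputs and the reference genome, the resulting list can be executed sequentially on a single computer or distributed over cluster nodes without changing the job descriptions.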

Fig. 1: The sv-callers workflow includes four tools to detect SVs in (paired) WGS samples given a reference genome (a). SV calling jobs are distributed over HPC nodes and executed in parallel using the Xenon software suite (b): specific batch scheduler and/or file transfer commands are executed using the xenon-cli command-line tool, whereas the Xenon library translates each command to the one used by the target cluster.
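The translation step in panel (b) can be pictured with a small Python sketch that maps one generic job request onto the submission command of each batch scheduler named in the Introduction. This is not the Xenon API or the xenon-cli syntax; the function name, its parameters and the Grid Engine parallel environment name `smp` are hypothetical, and only commonly documented `sbatch` and `qsub` options are used.

```python
def submit_command(scheduler, job_name, script, threads=1, parallel_env="smp"):
    """Build a scheduler-specific submission command for one job.

    Illustrative only: this mimics the idea of translating a generic
    request into the command of the target cluster, not Xenon itself.
    """
    if scheduler == "slurm":
        options = [f"--job-name={job_name}", f"--cpus-per-task={threads}"]
        return ["sbatch", *options, script]
    if scheduler == "gridengine":
        # The parallel environment name is site-specific; "smp" is assumed.
        return ["qsub", "-N", job_name, "-pe", parallel_env, str(threads), script]
    if scheduler == "torque":
        return ["qsub", "-N", job_name, "-l", f"nodes=1:ppn={threads}", script]
    raise ValueError(f"unsupported scheduler: {scheduler!r}")


if __name__ == "__main__":
    for name in ("slurm", "gridengine", "torque"):
        print(" ".join(submit_command(name, "manta_T1", "run_manta.sh", threads=4)))
```

Keeping the scheduler name as the only varying argument reflects the portability goal of the workflow: the job description stays the same, and only the generated command differs per cluster.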

III.   Conclusion

The sv-callers workflow is easy to use, portable and scalable, and supports comprehensive detection of SVs in WGS data, thereby aiding future genome-first-based clinical decision-making for cancer patients.

References


  • [1] C. Alkan, B.P. Coe, and E.E. Eichler, Genome structural variation discovery and genotyping, Nature reviews. Genetics, vol. 12, no. 5, pp. 363–376, May 2011.
  • [2] C.K. Yung, Large-Scale Uniform Analysis of Cancer Whole Genomes in Multiple Computing Environments, bioRxiv preprint, DOI: 10.1101/161638, 2017.
  • [3] T. Rausch, T. Zichner, A. Schlattl, A.M. Stütz, V. Benes, and J.O. Korbel, DELLY: structural variant discovery by integrated paired-end and split-read analysis, Bioinformatics (Oxford, England), vol. 28, no. 18, pp. i333–i339, Sep. 2012.
  • [4] R.M. Layer, C. Chiang, A.R. Quinlan, and I.M. Hall, LUMPY: a probabilistic framework for structural variant discovery, Genome biology, vol. 15, no. 6, p. R84, Jun. 2014.
  • [5] X. Chen, Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications, Bioinformatics (Oxford, England), vol. 32, no. 8, pp. 1220–1222, Apr. 2016.
  • [6] D.L. Cameron, GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly, Genome research, vol. 27, no. 12, pp. 2050–2060, Dec. 2017.
  • [7] A. Kuzniar, sv-callers: a portable workflow for structural variant calling (version 1.0.0), Zenodo, DOI: 10.5281/zenodo.1217112, Apr. 2018.
  • [8] J. Köster and S. Rahmann, Snakemake - a scalable bioinformatics workflow engine, Bioinformatics, May 2018.
  • [9] J. Maassen, Xenon: a middleware abstraction library that provides a simple programming interface to various compute and storage resources (version 2.6.0), Zenodo, DOI: 10.5281/zenodo.1194353, Mar. 2018.
  • [10] S. Verhoeven and J.H. Spaaks, Xenon-cli: a command line interface to perform job and file operations (version 2.4.0), Zenodo, DOI: 10.5281/zenodo.597603, Mar. 2018.
