Docker Containers and Reusable Workflows for Genomic Analysis
Omar Sobh & Umberto Ravaioli (KnowEnG Center), Ravi Madduri & Ian Foster (U Chicago) and Matthew Turk (NCSA, UI)
As part of the NIH BD2K Commons pilot project, the BDDS Center at U Chicago, the KnowEnG Center, and the National Center for Supercomputing Applications (NCSA) at the UI have collaborated to package genomic analytical methods developed by the centers into standardized software containers, enabling efficient scheduling and execution of those containers on local and cloud computing resources. In this collaboration, we have developed a common pattern for genomic analysis that uses the BagIt format for digital preservation, together with Apache Mesos and Docker containers for large-scale, reusable genomic workflows. The pattern has three steps:
(1) Fetch data bags in the BagIt format, containing gene variant and expression data, from the BDDS Center.
(2) Send the data bags to the KnowEnG ETL pipeline, which uses Mesos and Docker containers for processing, analytics, and visualization.
(3) Store the KnowEnG results in the BagIt format and send them to the Globus Genomics portal for archiving and reuse.
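
To make the role of the BagIt format in steps (1) and (3) concrete, the following is a minimal sketch of how a data bag might be written and checked, using only the Python standard library. It is an illustration under stated assumptions, not the code used by BDDS, KnowEnG, or Globus Genomics: the function names (make_bag, validate_bag) and any file names passed in are hypothetical, and only the core BagIt layout is covered (a bagit.txt declaration, a data/ payload directory, and a manifest-sha256.txt of checksums).

```python
import hashlib
from pathlib import Path
from typing import Mapping


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_bag(bag_dir: Path, payload: Mapping[str, bytes]) -> Path:
    """Write a minimal bag (hypothetical helper, flat payload names only)."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        target = data_dir / name
        target.write_bytes(content)
        # Each manifest line is "<checksum> <path relative to the bag root>".
        manifest_lines.append(f"{sha256_of(target)}  data/{name}")
    # Bag declaration required by the BagIt layout.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def validate_bag(bag_dir: Path) -> bool:
    """Check that the payload matches the manifest (hypothetical helper)."""
    manifest = bag_dir / "manifest-sha256.txt"
    if not (bag_dir / "bagit.txt").is_file() or not manifest.is_file():
        return False
    listed = set()
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            return False
        listed.add(rel_path)
    # Every payload file must also appear in the manifest.
    data_dir = bag_dir / "data"
    present = {
        p.relative_to(bag_dir).as_posix()
        for p in data_dir.rglob("*")
        if p.is_file()
    }
    return present == listed
```

In a workflow like the one above, a consumer such as the ETL pipeline in step (2) could call validate_bag on a received bag before processing it, and a producer in step (3) could call make_bag on the result files before handing them off. The checksum manifest is what lets each stage confirm that the data arrived unchanged.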