One of our main design goals for the new DNAnexus platform was to make it possible for expert bioinformaticians to run all of their genome analyses in the cloud — eliminating the need to copy enormous data sets between multiple compute and storage silos. Many features of the platform were built with this in mind: the extensive API and SDK, the scriptable command-line interface, the built-in genome browser, and the integration with R and other high-level environments, to name a few.
In this blog post, we’ll look at what may be the most important DNAnexus feature enabling bioinformaticians to move their work into the cloud: not only can you select from a library of popular analysis tools (and publish new ones), but you can also upload your own private programs to run on the platform. We call these two types of executables apps and applets, respectively, and conceptually they reflect the mix of off-the-shelf and custom code involved in any end-to-end bioinformatics analysis. You can experiment with apps and applets in DNAnexus firsthand by signing up for a trial account, complete with $125 in compute and storage credits.
Apps are general-purpose tools expected to interest a wide audience on the platform, and they aim for compatibility, ease of use, and robustness. They’re published in a dedicated section of the website and typically include extensive metadata and documentation. Let’s take a look at the app for the Velvet de novo assembler.
The app page includes details about the tool, links to source repositories, citation links generated from publication DOIs, and detailed documentation of the inputs and outputs. You’ll also notice the page has a separate “versions” tab. When a new version of an app is published, the old version is automatically archived and remains available.
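The post doesn’t describe how archived versions are stored. As a hypothetical illustration of the behavior described above (publishing a new version archives the old one but keeps it available), here is a minimal in-memory Python sketch. The class, its methods, and the placeholder executable IDs are invented for this example and are not the DNAnexus API; it assumes the xx.yy.zz version format mentioned later in this post.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, int, int]:
    """Split an 'xx.yy.zz' string into integers so versions compare numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


class AppVersionHistory:
    """Hypothetical model of one app's published versions (not the DNAnexus API)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._executables: dict[str, str] = {}  # version string -> executable ID

    def publish(self, version: str, executable_id: str) -> None:
        # Require strictly increasing versions so "current" is always the newest.
        if self._executables and parse_version(version) <= parse_version(self.current()):
            raise ValueError(f"{self.name} {version} is not newer than {self.current()}")
        # Older versions are kept (archived), never deleted.
        self._executables[version] = executable_id

    def current(self) -> str:
        if not self._executables:
            raise LookupError(f"{self.name} has no published versions")
        return max(self._executables, key=parse_version)

    def is_archived(self, version: str) -> bool:
        # Any published version other than the newest counts as archived.
        return version in self._executables and version != self.current()

    def executable_for(self, version: str) -> str:
        # Archived versions remain retrievable by their exact version number.
        return self._executables[version]
```

Comparing parsed integer tuples rather than raw strings matters here: as plain strings, "1.2.9" sorts after "1.2.10", so a string comparison would pick the wrong current version.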
Overall, an app like this should be easy for any bioinformatician on the platform to use, even one not already familiar with Velvet. The documentation and input/output specification are presented in a standardized format (try dx run velvet -h), and there’s no chain of compilation dependencies to deal with. Of course, on the development side, polishing an app to this point takes significant time and effort.
Applets are lighter-weight executables that can serve as scripts for project-specific analyses or ad hoc data manipulations, as proprietary analysis pipelines, or as development and testing versions of apps. Unlike apps, they reside inside your projects alongside data objects, and they remain completely private unless you choose to share a project with others. At the same time, they have full access to the platform’s API and parallelization infrastructure; an applet could very well use 1,000 cores!
Suppose, for example, we wrote a script to perform some custom quality filtering on sequencing reads before passing them to the de novo assembler. Using the dx-app-wizard utility in the SDK, we could quickly package this as an applet and upload it to the platform. There’s no need to prepare documentation and metadata, nor to wait for any kind of approval or review by DNAnexus. The result is an applet in our project alongside our data.
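The post doesn’t show the filtering script itself. As a hypothetical example of the kind of code one might package as an applet, here is a minimal Python sketch that drops FASTQ reads whose length or mean base quality falls below a threshold. It assumes four-line FASTQ records and Phred+33 quality encoding; the function names and default thresholds are invented for illustration, and the applet wrapper that dx-app-wizard would generate is not shown.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def parse_fastq(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (header, sequence, quality) tuples from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\n")
            next(it)  # skip the '+' separator line
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header}") from None
        yield header, seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (assumes Phred+33 by default)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def filter_reads(
    lines: Iterable[str],
    min_mean_quality: float = 20.0,  # hypothetical default threshold
    min_length: int = 30,            # hypothetical default threshold
) -> Iterator[str]:
    """Yield FASTQ records (as text) that pass the length and quality filters."""
    for header, seq, qual in parse_fastq(lines):
        if len(seq) >= min_length and mean_quality(qual) >= min_mean_quality:
            yield f"{header}\n{seq}\n+\n{qual}\n"
```

Because filter_reads is a generator over lines, it can stream a large FASTQ file without loading it into memory, for example with sys.stdout.writelines(filter_reads(sys.stdin)).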
It’s easy to run apps and applets, and to mix them together in workflows. Here’s what we see when we click “Run” in the project:
Applets in the project are listed at the top, followed by the platform-wide apps. Similarly, the ‘dx run’ command in the command-line interface can launch both apps and applets.
Apps and applets both benefit from the platform’s built-in reproducibility infrastructure. Every time an executable, whether app or applet, is uploaded into the system, it receives a permanent, unique ID. This ID is recorded in every job launched with the executable and, by extension, in every file and data object produced by any such job. Thus, any data produced on the platform can be traced to the exact executable, inputs, and parameter settings used to create it. While apps and applets both have this permanent ID, apps also have a semantic version number in the form xx.yy.zz, making it easy to match them with the version numbers of upstream open-source projects.
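The post doesn’t describe how this lineage is stored internally. The sketch below is a hypothetical in-memory model, not the DNAnexus API: every executable, job, and data object gets a unique ID (minted here with uuid4, an assumption), each job records its executable ID, input IDs, and parameters, and each output records the job that produced it. Walking those links backward recovers the executables, inputs, and parameter settings behind any result.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


def new_id(prefix: str) -> str:
    """Mint a permanent, unique ID (hypothetical format, not DNAnexus's)."""
    return f"{prefix}-{uuid.uuid4().hex}"


@dataclass(frozen=True)
class Executable:
    name: str
    id: str = field(default_factory=lambda: new_id("exec"))


@dataclass(frozen=True)
class Job:
    executable_id: str           # which exact executable ran
    inputs: tuple[str, ...]      # IDs of the input data objects
    params: dict[str, str]       # parameter settings used for this run
    id: str = field(default_factory=lambda: new_id("job"))


@dataclass(frozen=True)
class DataObject:
    name: str
    produced_by: str | None      # job ID, or None for uploaded data
    id: str = field(default_factory=lambda: new_id("file"))


class ProvenanceStore:
    """Hypothetical registry linking data objects to the jobs that made them."""

    def __init__(self) -> None:
        self.jobs: dict[str, Job] = {}
        self.objects: dict[str, DataObject] = {}

    def upload(self, name: str) -> DataObject:
        obj = DataObject(name, produced_by=None)
        self.objects[obj.id] = obj
        return obj

    def run(self, executable: Executable, inputs: list[DataObject],
            params: dict[str, str], output_names: list[str]) -> list[DataObject]:
        # Record the job first, then stamp its ID on every output it produces.
        job = Job(executable.id, tuple(o.id for o in inputs), dict(params))
        self.jobs[job.id] = job
        outputs = [DataObject(n, produced_by=job.id) for n in output_names]
        for obj in outputs:
            self.objects[obj.id] = obj
        return outputs

    def trace(self, object_id: str) -> list[Job]:
        """Return every job in an object's lineage, starting with its producer."""
        lineage: list[Job] = []
        pending = [object_id]
        seen: set[str] = set()
        while pending:
            obj = self.objects[pending.pop()]
            if obj.produced_by is None or obj.produced_by in seen:
                continue  # uploaded data, or a job already visited
            seen.add(obj.produced_by)
            job = self.jobs[obj.produced_by]
            lineage.append(job)
            pending.extend(job.inputs)
        return lineage
```

For instance, tracing an assembly produced by a quality-filter job followed by an assembler job returns the assembler job first and then the filter job, each carrying the executable ID and parameters it ran with. The parameter names in such a run would be whatever the executable defines; none are implied here.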
One final point of interest: the ability of any user to upload arbitrary executables into our platform obviously raises intriguing security considerations. You can be sure, however, that the platform is built with the security and access control infrastructure necessary to make this feasible. We’ll delve into that topic in a future blog post.