Apps vs. Applets in DNAnexus

One of our main design goals for the new DNAnexus platform was to make it possible for expert bioinformaticians to run all of their genome analyses in the cloud — eliminating the need to copy enormous data sets between multiple compute and storage silos. Many features of the platform were built with this in mind: the extensive API and SDK, the scriptable command-line interface, the built-in genome browser, and the integration with R and other high-level environments, to name a few.

In this blog post, we’ll look at what may be the most important DNAnexus feature enabling bioinformaticians to move their work into the cloud: not only can you select from a library of popular analysis tools (and publish new ones), but you can also upload your own private programs to run on the platform. We call these two types of executables apps and applets, respectively; conceptually, they reflect the mix of off-the-shelf and custom code involved in any end-to-end bioinformatics analysis. You can experiment with apps and applets in DNAnexus firsthand by signing up for a trial account, complete with $125 in compute and storage credits.

Apps represent general-purpose tools expected to be of interest to a wide audience on the platform, striving for compatibility, ease of use, and robustness. They’re published in a dedicated section of the website, and typically include extensive metadata and documentation. Let’s take a look at the app representing the Velvet de novo assembler:

[Screenshot: the Velvet app page]

The app page includes details about the tool, links to source repositories, citation links generated from publication DOIs, and detailed documentation of the inputs and outputs. You’ll also notice the page has a separate “versions” tab. When a new version of an app is published, the old version is automatically archived and remains available.

Overall, an app like this should be easily usable by any bioinformatician on the platform, even if they’re not already familiar with Velvet. The documentation and input/output specification are presented in a standardized format (try dx run velvet -h), and there’s no chain of compilation dependencies to deal with. Of course, on the development side, polishing an app to this point takes significant time and effort.

Applets are lighter-weight executables that can be used as scripts for project-specific analyses or ad hoc data manipulations, proprietary analysis pipelines, or development/testing versions of apps. Unlike apps, they reside inside your projects alongside data objects — completely private unless you choose to share a project with others. At the same time, they have full access to the platform’s API and parallelization infrastructure; an applet could very well use 1,000 cores!

Suppose, for example, we wrote a script to perform some custom quality filtration on sequencing reads before providing them to the de novo assembler. Using the dx-app-wizard utility in the SDK, we could quickly package this as an applet and upload it to the platform. There’s no need to prepare documentation and metadata, nor to wait for any kind of approval or review by DNAnexus. The result is an applet in our project along with our data:

[Screenshot: the applet in the project, alongside the data]
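To make the example concrete, here is a minimal sketch of the kind of quality-filtration script one might package as an applet. It is purely illustrative: the function names, the Phred+33 encoding, and the mean-quality threshold of 20 are all assumptions, and the platform-specific packaging that dx-app-wizard generates is not shown.

```python
import io
from typing import Iterator, TextIO, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the "+" separator line
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def mean_quality(quality: str, offset: int = 33) -> float:
    """Mean Phred score, assuming Phred+33 encoding (an assumption)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(handle: TextIO, min_mean_quality: float = 20.0) -> Iterator[Read]:
    """Keep only reads whose mean quality meets the (hypothetical) threshold."""
    for header, sequence, quality in parse_fastq(handle):
        if quality and mean_quality(quality) >= min_mean_quality:
            yield header, sequence, quality


# Two toy reads: 'I' encodes Phred 40, '!' encodes Phred 0.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\n!!!!\n"
)
kept = [header for header, _, _ in filter_reads(example)]
print(kept)  # ['@read1']
```

A real filter would likely trim low-quality ends or handle paired reads as well; the point is only that an ordinary script like this is the sort of custom code an applet wraps.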

It’s easy to run apps and applets, and to mix them together in workflows. Here’s what we see when we click “Run” in the project:

[Screenshot: the Run dialog]

Applets in the project are shown at the top, followed by a list of platform-wide apps. Similarly, the command-line interface ‘dx run’ can launch both apps and applets.
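As a rough sketch of what this looks like from a script, the helper below builds ‘dx run’ command lines for an app and for an applet without executing them. The applet name quality_filter, the input name reads, and the data object names are hypothetical placeholders; the real input names for any executable are listed by its -h output.

```python
import shlex
from typing import Dict, List


def dx_run_command(executable: str, inputs: Dict[str, str]) -> List[str]:
    """Build (but do not execute) an argument list for `dx run`."""
    argv = ["dx", "run", executable]
    for name, value in inputs.items():
        # Inputs are passed as -i<name>=<value>
        argv.append(f"-i{name}={value}")
    return argv


# Hypothetical executables and inputs, for illustration only.
applet_cmd = dx_run_command("quality_filter", {"reads": "raw_reads"})
app_cmd = dx_run_command("velvet", {"reads": "filtered_reads"})

for cmd in (applet_cmd, app_cmd):
    print(shlex.join(cmd))
```

The same call shape serves both kinds of executable; only the name passed to ‘dx run’ differs.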

Apps and applets both benefit from the platform’s built-in reproducibility infrastructure. Every time an executable — app or applet — is uploaded into the system, it receives a permanent, unique ID. This ID is recorded in every job launched with the executable and by extension every file and data object produced by any job. Thus, any data produced on the platform can be traced to the exact executable, inputs, and parameter settings used to create it. While apps and applets both have this permanent ID, apps also have a semantic version number in the form xx.yy.zz, to ease association with version numbers of upstream open-source projects.
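The tracing idea can be illustrated with a toy model. The sketch below is not the platform’s API: the Job class, the two lookup tables, and every ID in it are made up for illustration. It only shows how recording an executable ID in each job, and a job ID on each output, lets any data object be walked back to the executables and inputs that produced it.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Job:
    job_id: str
    executable_id: str      # permanent ID of the app or applet that ran
    inputs: Dict[str, str]  # input name -> ID of the data object consumed


def trace(object_id: str,
          produced_by: Dict[str, str],
          jobs: Dict[str, Job]) -> List[Tuple[str, str, str]]:
    """Return (object, job, executable) records leading to object_id."""
    job_id = produced_by.get(object_id)
    if job_id is None:
        return []  # uploaded data: not produced by any job
    job = jobs[job_id]
    steps = [(object_id, job.job_id, job.executable_id)]
    for input_id in job.inputs.values():
        steps.extend(trace(input_id, produced_by, jobs))
    return steps


# Hypothetical history: raw reads -> filter applet -> assembler app.
jobs = {
    "job-1": Job("job-1", "applet-qfilter", {"reads": "file-raw"}),
    "job-2": Job("job-2", "app-assembler", {"reads": "file-filtered"}),
}
produced_by = {"file-filtered": "job-1", "file-contigs": "job-2"}

for step in trace("file-contigs", produced_by, jobs):
    print(step)
```

Walking back from the hypothetical file-contigs object yields the assembler job first and the filtering job second, stopping at the uploaded raw reads.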

One final point of interest: the ability of any user to upload arbitrary executables into our platform obviously raises intriguing security considerations. You can be sure, however, that the platform is built with the security and access control infrastructure necessary to make this feasible. We’ll delve into that topic in a future blog post.

 

Developer Spotlight: A De Novo Assembler Named Ray

We recently launched the DNAnexus developer program, and to our delight one user was able to contribute a valuable new app in less than a day. Sébastien Boisvert, a doctoral student at the Université Laval in Québec, Canada, converted a software application he had previously written for short-read de novo assembly to an app for the DNAnexus community.

Boisvert is the mind behind Ray, a scalable genome assembler built specifically for next-gen, short-read sequence data and related applications, such as metagenomics. Ray was first reported in 2010 in the Journal of Computational Biology. Written in C++, it is an MPI-based parallel tool that uses a single executable, eliminating the need to write Perl scripts. Ray is sequencing platform-agnostic, so it can be used with data from any short-read sequencer.

Today, Ray is primarily used by bioinformaticians who have ongoing access to a supercomputer. The software’s peer-to-peer design makes it ideal to run on systems with hundreds or thousands of nodes — which also makes it just right for a cloud computing environment. When Boisvert heard that DNAnexus was opening its doors to developer-contributed apps, he immediately looked into how to submit Ray so even more users could have access to the tool. From his perspective, cloud computing offers a more instantaneous experience with massively parallel computing to people who don’t readily have supercomputer access, and also provides the type of infrastructure management that allows users to focus on what they want to compute, rather than how to manage queries and coding.

Boisvert remarked that the DNAnexus documentation for contributing an app was straightforward and that the interface in particular was easy to use. Writing the wrapper to convert the software code into an app took less than a day. He worked with the Developer Program support team at DNAnexus to make sure everything was working properly, and now Ray is available for any DNAnexus user to add to an analysis pipeline — and it’s free. (Check out Boisvert’s own blog about cloud computing options, where he notes that it’s fun to start an app in DNAnexus!)

As our developer program continues to grow, we look forward to working with more contributors to get their great apps into our platform so they can be broadly available to our growing community of users. If you’re interested in learning more about our Developer Program, please visit https://dnanexus.com/developers.

Just Launched: The DNAnexus Developer Program

Join a dynamic genomics app incubator community!

Calling all bioinformaticians, computational scientists and hackers! DNAnexus, a company leveraging cloud computing to facilitate the analysis of extremely large biological data sets, has kicked off an app developer program and is looking to add novel genomics tools for users of our new platform.

Genomic data is the next frontier in truly challenging Big Data problems. Our platform is designed to help scientists collaborate and analyze DNA data within a secure, web-based environment. Users will be able to upload or build workflows and project pipelines, choosing from their own tools, DNAnexus-provided apps, and now apps contributed by external developers like you.

Why should you care?

Uploading your app to the DNAnexus platform offers lots of advantages:

  • The DNAnexus platform is the most flexible and configurable API-based infrastructure for enabling genomic data analysis and data sharing.
  • The DNAnexus platform accepts DNA data from any sequencing instrument, so you can write for multiple sequencers and gain users among a much broader audience than a vendor-specific environment.
  • Join early and incur no out-of-pocket expenses for developing and testing your app. Receive a $1,000 credit toward cloud storage and compute resources.
  • Get recognition! We’ll be profiling the best contributed apps and the genius developers behind them as we roll out the platform.
  • Easily showcase your app and its functionality on behemoth data sets.
  • Working with DNAnexus is easy and we are more than happy to provide free technical support while you are developing your app.
  • DNAnexus is building in monetization opportunities, so as the platform comes out of beta your app can create a flow of income.

Join Today!

Interested in learning more? Email developers@dnanexus.com with questions. Send the following information to join the program:

1. Your name and institution
2. A brief explanation of the problem you aim to solve
3. A description of the genomics tool you plan to build