Keeping the Genome Browser Responsive

The new DNAnexus genome browser is very powerful, and lets users visualize a wide variety of data — and with the power of apps and applets, users can even customize what gets displayed in the browser by generating new, personalized spans tracks.

As experienced web developers know, this kind of flexibility in a JavaScript-based environment can take a toll on responsiveness. We explored many different approaches to this challenge, and the new DNAnexus genome browser demonstrates that it is indeed possible to run a resource-hungry application in any popular web browser with very little perceptible cost to the user.

In this blog post, we’ll go through techniques we considered and how we ultimately solved the problem.


The Problem

First, let’s go through a little background to get everyone on the same page. The tracks in the genome browser are all generated using JavaScript, a versatile language that most web browsers use to grant interactivity to web pages. This allows us to take advantage of the user’s computer to render and interact with the genome browser tracks, rather than doing computation and rendering on the servers and presenting a static image to the user.

However, browser-based JavaScript has a number of problems in this context. First among these is that in most cases it’s still slow compared to many other languages. Monumental and very successful efforts have been made in recent years to make it more efficient, but it is still rare that a piece of code written in JavaScript will perform at the same level as a similar algorithm written in other languages such as C or even Perl.

Also, in most cases, JavaScript runs on only one thread. This means that, with a handful of exceptions, your browser executes the scripts given to it in order, and each script delays every other script until it has finished. In an age when even home computers have multiple cores across which processing could be divided, this leaves much of the machine's capacity unused and makes applications feel slower than they need to.

Finally, and most notably, in many web browsers the thread that executes JavaScript and the thread responsible for updating the user interface and responding to user events (such as clicking, typing, and mouse movement) are the same thread. This means that while JavaScript is executing, the user can’t interact with the web page in any way (including closing the browser tab in many cases).
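To make the single-thread problem concrete, here is a small illustration of our own (not code from the genome browser): a timeout requested for 0 ms still has to wait until a long synchronous loop finishes. The function name demonstrateBlocking is hypothetical, and the busy loop merely stands in for rendering a large batch of track features; in a browser, clicks and repaints would be held up in the same way.

```javascript
// Illustration only (hypothetical helper): a long synchronous loop holds the
// single JavaScript thread, so even a 0 ms timeout cannot fire until it ends.
function demonstrateBlocking(busyMs, report) {
  var scheduledAt = Date.now();

  setTimeout(function () {
    // Runs only after the busy loop below has returned control.
    report(Date.now() - scheduledAt);
  }, 0);

  // Stand-in for synchronously rendering a large batch of features.
  var start = Date.now();
  while (Date.now() - start < busyMs) {
    // spin
  }
}

demonstrateBlocking(200, function (delayMs) {
  console.log('0 ms timeout actually waited ' + delayMs + ' ms');
});
```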

Since the genome browser may need to process tens of thousands of elements, the user could wait tens of seconds, unable to do anything, while the browser renders the flood of incoming data. Left unaddressed, this would make the genome browser anything from merely aggravating to downright unusable.

One solution: web workers

One of the most anticipated features of the nascent HTML5 specification is web workers, a mechanism for offloading JavaScript processing to a background thread. This solves most of the problems above quite handily: it allows a semblance of multi-threading, and it doesn’t interfere with the user interface. However, browser support is uneven, with some fairly recent browsers still lacking it. Additionally, code running in a worker has only a limited execution environment, with no access to the document object model (DOM) or certain other resources.

If you have the luxury of only supporting browsers that have web workers enabled, this is probably still the best option. However, the new DNAnexus genome browser needs to run on IE 9, which rules out this option for us.
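A page that wants to use web workers where they exist and fall back to another approach elsewhere can start with simple feature detection. This is a minimal sketch: chooseStrategy and the two strategy labels are hypothetical names, and it only checks whether a global Worker constructor is present, which IE 9 does not provide.

```javascript
// Minimal feature-detection sketch; chooseStrategy and the strategy labels
// are hypothetical names, not part of any library.
function chooseStrategy(globalScope) {
  // Browsers that support web workers expose a global Worker constructor.
  // IE 9 does not, so it falls through to the timeout-based approach.
  if (typeof globalScope.Worker !== 'undefined') {
    return 'web-worker';
  }
  return 'timeout-slices';
}

// In a browser, inspect `window`; elsewhere, assume no web workers.
var strategy = chooseStrategy(typeof window !== 'undefined' ? window : {});
```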

Another solution: self-monitoring timeouts

A slightly more complicated solution is to split the processing into chunks and schedule each chunk with a timeout (setTimeout), JavaScript’s way of delaying code execution until after a set period of time. This lets other scripts run in the meantime and, more crucially, lets the UI update itself on a fairly regular basis. It is also compatible with almost all browsers and gives full access to the execution environment and the document.

The key to this approach is giving any potentially long-running algorithm a sense of how much of the user’s time it has taken up and forcing it to yield execution at set intervals. If a function detects that it has been running for too long, it pauses for a short time so that other pieces of code get their turn.
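Here is a minimal sketch of the self-monitoring approach, written in ES5-style JavaScript (no promises or arrow functions) in keeping with the IE 9 requirement. The helper name processInSlices and its options are hypothetical, not the genome browser’s actual code; the 85 ms and 10 ms defaults simply mirror the values discussed in this post.

```javascript
// Hypothetical helper: process `items` in time-boxed slices, yielding to the
// browser between slices. ES5 only: no promises, arrow functions, or let/const.
function processInSlices(items, processItem, onDone, options) {
  options = options || {};
  var sliceMs = options.sliceMs || 85; // run for at most ~85 ms before yielding
  var yieldMs = options.yieldMs || 10; // then request a ~10 ms pause
  var index = 0;
  var slices = 0;

  function runSlice() {
    var start = Date.now();
    slices += 1;
    // Work until every item is done or this slice's time budget is used up.
    // The clock is checked after each item, so one slow item can overshoot.
    while (index < items.length) {
      processItem(items[index], index);
      index += 1;
      if (Date.now() - start >= sliceMs) {
        break;
      }
    }
    if (index < items.length) {
      // Yield so the browser can repaint and handle user input, then resume.
      setTimeout(runSlice, yieldMs);
    } else {
      onDone(slices);
    }
  }

  // Start asynchronously so the caller's own code finishes first.
  setTimeout(runSlice, 0);
}
```

Because the clock is checked only after each item, a slice can overrun sliceMs by up to one item’s processing time, which is one reason to keep the budget comfortably below any hard limit.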

In practice, this requires some fine-tuning. The two crucial variables are how long to let a function execute before yielding, and how long to yield for. The latter is relatively simple: a short wait is better, since we’re really concerned with allowing the UI to update. Most browsers safely go as low as 10ms for a timeout interval, and we recommend staying near that value. Note that setting a timeout for 10ms does not guarantee that execution resumes exactly 10ms later; it could be 12, 15, or even 50 milliseconds before it does. This is one potential pitfall of this solution.
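A quick way to see this jitter in a given browser is to request the same short delay repeatedly and record how late each callback fires. This is a minimal sketch: measureTimeoutLateness is a hypothetical helper, and the numbers it reports will vary with the browser, system load, and power-saving behavior such as timer throttling in background tabs.

```javascript
// Hypothetical helper: request `requestedMs` timeouts back to back and record
// how far each callback's actual firing time was from the requested delay.
function measureTimeoutLateness(requestedMs, samples, onDone) {
  var lateness = [];

  function takeSample() {
    var requestedAt = Date.now();
    setTimeout(function () {
      // Positive values mean the callback fired later than requested.
      lateness.push(Date.now() - requestedAt - requestedMs);
      if (lateness.length < samples) {
        takeSample();
      } else {
        onDone(lateness);
      }
    }, requestedMs);
  }

  takeSample();
}

measureTimeoutLateness(10, 20, function (lateness) {
  var worst = Math.max.apply(null, lateness);
  console.log('worst lateness over 20 samples: ' + worst + ' ms');
});
```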

The other variable, time before yielding, is much trickier and will probably end up being tuned to your application. Two concerns pull the value in opposite directions: on one hand, you want your function to run for as long as possible between pauses so that it stays efficient. On the other hand, you want to preserve as much interactivity as possible from the user’s point of view, which favors short bursts of processing broken up by frequent pauses.

In his book Designing with the Mind in Mind, Jeff Johnson compiles several studies into a convenient chart that lists crucial time intervals for human perception. While 5ms is listed as the shortest perceptible display time for visual stimuli and would be ideal from a UX point of view, that’s obviously a little too aggressive for our scripts — it would triple the execution time if a script ran for 5ms and then delayed for 10ms! The next important increment is 100ms, which is the maximum amount of time for continuity — that is, if a user presses a button and the computer waits more than 100ms to render that button as being pressed, the impression that the user caused that button to be pressed is broken. This seems like a good upper bound for execution time, but keep in mind that you may have to adjust this timing downward depending on how often you check the function run time. If each item takes several milliseconds to process, you may have to yield every 95ms instead of 100ms. (We ultimately decided to be conservative and go with an 85ms execution interval.)
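The trade-off above comes down to simple arithmetic. This sketch uses our own simplified cost model, assuming each slice of work is followed by exactly one pause of the requested length (real pauses can run longer, as noted earlier): wall-clock time then grows by a factor of (slice + yield) / slice. The second helper, canStartItem, is also hypothetical and captures the advice about expensive items: don’t start an item that is expected to push the slice past the limit.

```javascript
// Simplified cost model (an assumption): each slice of `sliceMs` work is
// followed by one `yieldMs` pause, so total wall time scales by this factor.
function slowdownFactor(sliceMs, yieldMs) {
  return (sliceMs + yieldMs) / sliceMs;
}

// Hypothetical budget check: only start another item if its estimated cost
// still fits inside the slice's time limit.
function canStartItem(elapsedMs, estimatedItemMs, limitMs) {
  return elapsedMs + estimatedItemMs <= limitMs;
}

console.log(slowdownFactor(5, 10));     // 3, the "triple the execution time" case
console.log(slowdownFactor(85, 10));    // about 1.118 with 85 ms slices
console.log(canStartItem(96, 5, 100));  // false: this item would overrun 100 ms
```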

The end result

Either of these solutions will produce a responsive interface that still allows large amounts of data to be processed. We chose the second solution because of browser-support constraints, tuning it around key perceptual thresholds and UI best practices. While several promising technologies are on the horizon, such as web workers in the near term and WebCL in the long term, JavaScript in browser environments in 2013 still has some weaknesses. These are not insurmountable, however, and the new DNAnexus genome browser demonstrates that it is possible to build a usable, computation-heavy application that runs in the web browser today.

Apps vs. Applets in DNAnexus

One of our main design goals for the new DNAnexus platform was to make it possible for expert bioinformaticians to run all of their genome analyses in the cloud — eliminating the need to copy enormous data sets between multiple compute and storage silos. Many features of the platform were built with this in mind: the extensive API and SDK, the scriptable command-line interface, the built-in genome browser, and the integration with R and other high-level environments, to name a few.

In this blog post, we’ll look at what may be the most important DNAnexus feature enabling bioinformaticians to move their work into the cloud: not only can you select from a library of popular analysis tools (and publish new ones), but you can also upload your own private programs to run on the platform. We call these two types of executables apps and applets, respectively, and conceptually they reflect the mix of off-the-shelf and custom code involved in any end-to-end bioinformatics analysis. You can experiment with apps and applets in DNAnexus firsthand by signing up for a trial account, complete with $125 in compute and storage credits.

Apps represent general-purpose tools expected to be of interest to a wide audience on the platform, striving for compatibility, ease of use, and robustness. They’re published in a dedicated section of the website, and typically include extensive metadata and documentation. Let’s take a look at the app representing the Velvet de novo assembler:


The app page includes details about the tool, links to source repositories, citation links generated from publication DOIs, and detailed documentation of the inputs and outputs. You’ll also notice the page has a separate “versions” tab. When a new version of an app is published, the old version is automatically archived and remains available.

Overall, an app like this should be easily usable by any bioinformatician on the platform, even one not already familiar with Velvet. The documentation and input/output specification are presented in a standardized format (try dx run velvet -h), and there’s no chain of compilation dependencies to deal with. Of course, on the development side, polishing a platform app to this point takes significant time and effort.

Applets are lighter-weight executables that can be used as scripts for project-specific analyses or ad hoc data manipulations, proprietary analysis pipelines, or development/testing versions of apps. Unlike apps, they reside inside your projects alongside data objects — completely private unless you choose to share a project with others. At the same time, they have full access to the platform’s API and parallelization infrastructure; an applet could very well use 1,000 cores!

Suppose, for example, we wrote a script to perform some custom quality filtration on sequencing reads before providing them to the de novo assembler. Using the dx-app-wizard utility in the SDK, we could quickly package this as an applet and upload it to the platform. There’s no need to prepare documentation and metadata, nor to wait for any kind of approval or review by DNAnexus. The result is an applet in our project along with our data:


It’s easy to run apps and applets, and to mix them together in workflows. Here’s what we see when we click “Run” in the project:


Applets in the project are shown at the top, followed by a list of platform-wide apps. Similarly, the ‘dx run’ command in the command-line interface can launch both apps and applets.

Apps and applets both benefit from the platform’s built-in reproducibility infrastructure. Every time an executable (app or applet) is uploaded into the system, it receives a permanent, unique ID. This ID is recorded in every job launched with the executable and, by extension, in every file and data object those jobs produce. Thus, any data produced on the platform can be traced to the exact executable, inputs, and parameter settings used to create it. While apps and applets both have this permanent ID, apps also have a semantic version number in the form xx.yy.zz, to ease association with version numbers of upstream open-source projects.
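The provenance chain described above can be pictured as a few linked records. This is a conceptual sketch only: the record shapes, the traceOrigin function, the quality-filter applet, and every ID in it are hypothetical and do not reflect the DNAnexus API or its real ID format.

```javascript
// Conceptual sketch only: hypothetical records and IDs, not the DNAnexus API.
// Executables get a permanent ID when uploaded; apps also carry a version.
var executables = {
  'applet-0001': { name: 'read-quality-filter', version: null },
  'app-0001': { name: 'example-assembler', version: '1.0.0' }
};

// Each job records the executable ID, inputs, and parameters it ran with.
var jobs = {
  'job-0001': {
    executableId: 'applet-0001',
    inputs: { reads: 'file-0001' },
    params: { minQuality: 20 }
  }
};

// Each output object records the job that produced it.
var dataObjects = {
  'file-0002': { producedByJob: 'job-0001' }
};

// Follow an output back to the exact executable, inputs, and parameters.
function traceOrigin(objectId) {
  var record = dataObjects[objectId];
  if (!record) {
    return null; // not produced by a job in this sketch (e.g. an uploaded file)
  }
  var job = jobs[record.producedByJob];
  var executable = executables[job.executableId];
  return {
    jobId: record.producedByJob,
    executableId: job.executableId,
    executableName: executable.name,
    executableVersion: executable.version,
    inputs: job.inputs,
    params: job.params
  };
}

console.log(traceOrigin('file-0002').executableId); // applet-0001
```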

One final point of interest: the ability of any user to upload arbitrary executables into our platform obviously raises intriguing security considerations. You can be sure, however, that the platform is built with the security and access control infrastructure necessary to make this feasible. We’ll delve into that topic in a future blog post.


Developer Spotlight: A De Novo Assembler Named Ray

We recently launched the DNAnexus developer program, and to our delight one user was able to contribute a valuable new app in less than a day. Sébastien Boisvert, a doctoral student at Université Laval in Québec, Canada, converted a short-read de novo assembler he had previously written into an app for the DNAnexus community.

Boisvert is the mind behind Ray, a scalable genome assembler built specifically for next-gen, short-read sequence data and related applications such as metagenomics. Ray was first reported in 2010 in the Journal of Computational Biology. Written in C++, it is an MPI-based parallel tool that runs as a single executable, eliminating the need to write Perl scripts. Ray is sequencing-platform agnostic, so it can be used with data from any short-read sequencer.

Today, Ray is primarily used by bioinformaticians who have ongoing access to a supercomputer. The software’s peer-to-peer design makes it ideal for systems with hundreds or thousands of nodes, which also makes it a natural fit for a cloud computing environment. When Boisvert heard that DNAnexus was opening its doors to developer-contributed apps, he immediately looked into how to submit Ray so even more users could have access to the tool. From his perspective, cloud computing gives people who lack ready access to a supercomputer an immediate way to use massively parallel computing, and it also provides the kind of infrastructure management that lets users focus on what they want to compute rather than on how to manage queries and coding.

Boisvert remarked that the DNAnexus documentation for contributing an app was straightforward and that the interface in particular was easy to use. Writing the wrapper to convert the software code into an app took less than a day. He worked with the Developer Program support team at DNAnexus to make sure everything was working properly, and now Ray is available for any DNAnexus user to add to an analysis pipeline — and it’s free. (Check out Boisvert’s own blog about cloud computing options, where he notes that it’s fun to start an app in DNAnexus!)

As our developer program continues to grow, we look forward to working with more contributors to get their great apps into our platform so they can be broadly available to our growing community of users. If you’re interested in learning more about our Developer Program, please visit https://dnanexus.com/developers.