Interactive jobs: connecting to DNAnexus with SSH

To make app development on the DNAnexus platform easier and more interactive, we are excited to announce a new feature today: SSH connections to compute jobs. Bioinformaticians and Linux developers are familiar with the SSH command, used to connect to remote computers over the network. Jobs running on the DNAnexus platform can now be optionally configured to allow SSH connections to their execution environment. This makes it easier to monitor DNAnexus jobs, debug them if something goes wrong, or use DNAnexus workers as powerful interactive workstations in the cloud.

By default, DNAnexus jobs have always been firewalled from the Internet, and only have network access to the DNAnexus API. This default will remain, but now three new command-line options are available when launching your job:

  • Running dx run <executable> --allow-ssh will configure your job to open the SSH port for network connections from IP ranges that you specify.
  • Running dx run <executable> --ssh will do the same as above, but also immediately connect to the job as soon as it starts running.
  • Running dx run <executable> --debug-on <error-type> will configure your job’s execution environment to set a breakpoint, so that if the job encounters an error, you can connect to it over SSH and examine what went wrong.

As before, outbound access by jobs can be configured using network access permissions.  Inbound access is restricted to SSH connectivity only, and must be enabled explicitly by the user at run time using the options above.

One-time setup of your user account is required to allow use of SSH connections. Use dx ssh_config to perform this setup. This will generate a new SSH key pair, which you can protect with a password, and configure your account with the public key. Only you and the job you’re connecting to can see the public key; the private key remains on the computer that you ran dx ssh_config on.

When you log in, the system will automatically start the byobu window manager running the tmux terminal multiplexer, so that you can use multiple terminals to monitor the job and do other tasks, and can resume where you left off if you get disconnected. Further information on the state of the job and on how to use the terminal is presented in a banner when you log in.

We have been using this feature for the past few weeks, and it has already proven to be a great tool for debugging and understanding the performance of our genomics tools in the cloud. Detailed documentation is available on the DNAnexus wiki. We have also provided a tutorial on deploying an interactive “Cloud Workstation” to explore and manipulate data stored on DNAnexus as you would on a local Linux machine.

If you have any questions or comments, don’t hesitate to ask them on DNAnexus Answers!

 

JavaScript Libraries for Integrating with DNAnexus

Today we are pleased to announce the release of our first publicly available JavaScript toolkit, which simplifies integrating your web application with DNAnexus.

Using our API library, it’s easy to make calls to the DNAnexus API from your JavaScript application. The toolkit takes care of many of the details of constructing the AJAX request for you, including the configuration of the authentication headers, construction of the URL, serialization of the input, and backoff/retry logic for failed requests. All you need is an authentication token from DNAnexus and you’ll be on your way to making API calls.

Here is example code:

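// Create an API client with your DNAnexus authentication token.
var api = new DX.Api("AUTH_TOKEN_GOES_HERE");
// Describe user-bob; the callback runs once the request succeeds.
api.call("user-bob", "describe").done(function(resp) {
  alert("user-bob's full name is " + [resp.first, resp.last].join(" "));
});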

For additional DX.Api documentation please visit: https://github.com/dnanexus/dx-JavaScript-toolkit/blob/master/docs/api.md
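
To give a feel for the backoff/retry logic mentioned above, here is a minimal sketch of retrying a failed request with exponentially growing delays. It is not the toolkit's actual implementation: the names backoffDelay and callWithRetry, the default delays, and the retry count are all hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: delay before retry number `attempt` (0-based).
// The delay doubles on every attempt and is capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical helper: call `request` (any function returning a promise)
// until it resolves or `maxRetries` retries have been used up.
// `sleep` is injectable so the waiting strategy can be swapped out.
async function callWithRetry(
  request,
  maxRetries = 3,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // out of retries: give up
      await sleep(backoffDelay(attempt));
    }
  }
}

// Usage: a stand-in request that fails twice before succeeding.
// A no-op sleep keeps the demonstration instantaneous.
let calls = 0;
const flaky = () =>
  ++calls < 3 ? Promise.reject(new Error("transient")) : Promise.resolve("ok");

callWithRetry(flaky, 3, () => Promise.resolve()).then((result) => {
  console.log(result, "after", calls, "calls");
});
```

With DX.Api you do not need to write any of this yourself; the sketch only shows the kind of work the library saves you.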

In addition to API call management, we are also releasing our upload libraries, which use multiple connections and check file integrity. Traditional file uploads use a single connection, do not provide integrity checking, and are not resumable. To address these shortcomings, the DNAnexus upload libraries upload file chunks independently and in parallel, then merge them back together, all with only a few lines of code. The uploading library uses web workers for efficient parallel computation: it slices each file into chunks, computes a checksum for each chunk, uploads the chunks in parallel, and then closes the file.

Here is example code:

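// authToken is your DNAnexus authentication token; fileList and options
// are described in the DX.Upload documentation linked below.
var upload = new DX.Upload(authToken, fileList, options);
upload.start();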

For additional DX.Upload documentation please visit: https://github.com/dnanexus/dx-JavaScript-toolkit/blob/master/docs/upload.md
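
The slicing and checksumming steps described above can be sketched in a few lines. This is an illustration only, not the DX.Upload internals: the function names are hypothetical, the sketch works on an in-memory byte array rather than a browser File inside a web worker, and Adler-32 is used purely to keep the example dependency-free (this post does not say which checksum the library computes).

```javascript
// Adler-32 checksum of a byte array (stand-in for whatever checksum
// the real library uses; chosen here because it needs no imports).
function adler32(bytes) {
  let a = 1;
  let b = 0;
  for (const byte of bytes) {
    a = (a + byte) % 65521;
    b = (b + a) % 65521;
  }
  return ((b << 16) | a) >>> 0; // force an unsigned 32-bit result
}

// Hypothetical helper: split `bytes` into chunks of at most `chunkSize`
// bytes and attach a checksum to each, so every chunk could be uploaded
// and verified independently.
function sliceIntoChunks(bytes, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks = [];
  for (let offset = 0, index = 1; offset < bytes.length; offset += chunkSize, index++) {
    const data = bytes.subarray(offset, offset + chunkSize);
    chunks.push({ index, offset, data, checksum: adler32(data) });
  }
  return chunks;
}

// Usage: slice a 9-byte payload into chunks of up to 4 bytes.
const payload = new TextEncoder().encode("Wikipedia");
const chunks = sliceIntoChunks(payload, 4);
for (const chunk of chunks) {
  console.log(`chunk ${chunk.index}: ${chunk.data.length} bytes, checksum ${chunk.checksum}`);
}
```

Because each chunk carries its own offset and checksum, a failed chunk can be retried on its own without restarting the whole file, which is what makes chunked uploads resumable.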

The DX.Upload library makes it easy to create rich upload experiences, such as the DNAnexus Web Uploader (pictured below).

[Image: DNAnexus Web Uploader, “Add data to project”]

We’ve only scratched the surface of what the DNAnexus JavaScript libraries can do, and we’ll be releasing additional features in the coming months. Also on the horizon are complete UI components for common operations inside the DNAnexus platform. Check back with our devblog for new updates, and if you have any questions or feature requests, please email evan@dnanexus.com.

 

DNAnexus Introduces Faster Cloud Options

Spring has arrived at DNAnexus, ushering in important updates! We are excited to announce that, starting May 1, 2014, your analyses on DNAnexus will be faster, thanks to new instance types.

What does that really mean? Here’s an example before we dive into all the details…  A specific exome pipeline (e.g., BWA-MEM, GATK-Lite) now runs in less than 4 hours! Previously, the run would have taken nearly 6 hours.

New instance types

We believe, and hope you do too, that DNAnexus is the best choice for expanding your genomic analysis infrastructure. Unlike local equipment, which starts collecting dust in your server room from day one while technological advances pile up, the cloud stays at the forefront of computing technology as newer, faster hardware becomes available.

These new hardware options come in the form of new instance types (virtual computer configurations) on which your cloud analyses can run. Thanks to the flexibility and reproducibility of the DNAnexus platform, you can start using these new instance types right away: simply launch your existing analyses on one of them (e.g., using the “--instance-type <…>” option of our “dx run” command-line tool) and enjoy a completely effortless hardware upgrade!

The new instance types are built on high-frequency Intel® processors of the Sandy Bridge and Ivy Bridge microarchitectures, support the Intel® Advanced Vector Extensions (Intel® AVX), and have solid-state drive (SSD) local storage technology for fast I/O performance.

The following table summarizes these new instance types. For a given column (which represents a certain number of cores and local storage capacity), there are up to three different instance types to choose from (with different amounts of memory). Overall these new instance types span a large spectrum, starting at 2 cores, 32 GB SSD, and 3.8 GB RAM, all the way to 32 cores, 640 GB SSD, and 244 GB RAM:

[Table: summary of new instance types]

In an effort to be more informative and transparent, we have also come up with a new, easy-to-remember, and consistent naming scheme:

  • The prefix (mem1, mem2, or mem3) denotes the memory capacity per core;
  • the infix (ssd1) denotes that these instances have solid-state drive technology;
  • the suffix (x2 through x32) denotes the number of cores.
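
As a rough illustration of the naming scheme, the sketch below splits an instance type name into its three parts. It assumes the parts are joined by underscores (for example mem2_ssd1_x8), which is not spelled out in this post, and it also accepts the hdd2 infix used for existing instance types, described later in this post. The function parseInstanceType is hypothetical and not part of any DNAnexus library.

```javascript
// Hypothetical helper: parse a name such as "mem2_ssd1_x8" into its
// prefix (memory per core), infix (storage technology), and suffix (cores).
// Assumes underscore separators; returns null for names that do not match.
function parseInstanceType(name) {
  const match = /^(mem[1-3])_(ssd1|hdd2)_x(\d+)$/.exec(name);
  if (!match) return null;
  const [, memoryClass, storageClass, cores] = match;
  return {
    memoryClass, // mem1, mem2, or mem3
    storageClass, // ssd1 (new types) or hdd2 (existing types)
    solidState: storageClass.startsWith("ssd"),
    cores: Number(cores),
  };
}

// Usage
console.log(parseInstanceType("mem2_ssd1_x8"));
console.log(parseInstanceType("not-an-instance-type"));
```

The regular expression only checks the shape of the name; it does not verify that a particular combination of memory, storage, and core count is actually offered.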


New names for existing instance types

We liked the convenient new naming scheme so much that we have applied it to existing instance types as well, as shown in the following table.

Compared to the new instance types mentioned earlier, the existing instance types are distinguished by a different storage infix (hdd2), given their regular hard disk drive technology. More information is available on our wiki page, which explains the new naming conventions and includes a detailed list of all instance types.

[Table: new names for existing instance types]

To ease the transition, existing instances can currently be called by either their original name or the new name; the DNAnexus system understands both. However, we encourage you to adopt the new names in a timely manner to avoid any future interruption.

We are very excited to announce these important updates, and we cannot wait to hear your success stories. Drop us a note at support@dnanexus.com if you’d like to get in touch with us.