Cache Money: Keeping Your Product Fresh With Minimal Customer Pain


The DNAnexus platform’s front end is built on a technology stack called “Membrane”. Membrane is a front-end view framework that lets internal developers build isolated components, which can then be reused throughout the website to build complex interactions. Two examples are the Data Tree and the Data List, described below.

Data Tree – Used to show the folder hierarchy in a project, allowing users to expand/collapse folders, select a folder, etc.


Data List – Used to show a list of objects/folders, with support for sorting, etc. Typically used to show the items in a folder, or search results.


At the time of this writing, Membrane has 111 components, each having a JavaScript, HTML, and CSS file. That’s a total of 333 files for the components, in addition to bootstrapping resources and third-party libraries.

The problem / more background

The Membrane team wants to push out new features quickly and often, with minimal customer pain. In most releases only a few components have been updated, so we want users to download just those components and keep their existing copies of the ones that haven’t changed. This is where browser cache management comes in. Web browsers have a web cache where they store web content such as HTML, JavaScript, CSS, images, etc. When you visit the website again, this content may be served from the cache, which removes the need to fetch it over the network. The conditions under which content is served from the cache depend on a good deal of configuration, such as client-side cache settings and server-side response headers such as “Expires” and “Cache-Control” (including its “max-age” directive), among others.

Using the web cache can greatly speed up the loading of your product, so we want to use the cache as much as possible. Telling clients that these assets never expire is an easy way to ensure the cache is used, but it comes at the expense of clients seeing a stale product for as long as those entries exist in their cache. What we want is a solution that uses a browser cache entry for as long as possible, while quickly invalidating that entry when there is a newer version of the asset.

The solution

First, let’s come up with a mechanism for versioning our assets. Instead of adding explicit versions, we compute an MD5 checksum of each asset’s content. During each release, if the MD5 checksum of an asset is the same as it was in the prior release, we know that the asset did not change. For example, the MD5 checksum for the Data List JavaScript is currently c2a92513c4604b92255f620950ecb93c.
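As a rough illustration of the versioning idea, here is a minimal Python sketch that checksums an asset's bytes and compares releases. The function name and the sample file contents are hypothetical; the post does not show the actual build tooling.

```python
import hashlib


def md5_checksum(content: bytes) -> str:
    """Return the hex MD5 digest of an asset's bytes (32 hex characters)."""
    return hashlib.md5(content).hexdigest()


# Hypothetical asset contents, standing in for a component's JavaScript file.
prior_release = b"function render() { return 'list'; }"
same_content = b"function render() { return 'list'; }"
edited_content = b"function render() { return 'sorted list'; }"

# Identical bytes give an identical checksum, so the asset is unchanged.
unchanged = md5_checksum(prior_release) == md5_checksum(same_content)

# Any edit to the bytes gives a different checksum, so the asset is "new".
changed = md5_checksum(prior_release) != md5_checksum(edited_content)
```

The checksum here acts purely as a content fingerprint for change detection, not as a security measure.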

We now define a new release asset called the manifest. The manifest maps each asset path, such as /data/list/view.js, to the MD5 checksum for that asset (c2a92513c4604b92255f620950ecb93c). The manifest is loaded during bootstrapping and is always consulted when fetching an asset, so that we request the latest checksum for that asset. To keep the browser from using a stale copy of the manifest itself, we serve it with the HTTP header “Cache-Control: no-cache”, which makes the browser check with the server before reusing any cached copy. Every time a user loads the page, we fetch the manifest from the server. We also periodically ping the server during the user session to detect manifest changes and notify users that updates are available.
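The manifest can be pictured as a plain mapping from asset path to checksum. The sketch below is an assumption about its shape, not the actual Membrane format: the helper names, the /data/tree/view.js path, and the file contents are all hypothetical. It also shows how comparing two manifests reveals which assets a release changed, which is the kind of check the periodic ping could rely on.

```python
import hashlib
from typing import Dict, List


def build_manifest(assets: Dict[str, bytes]) -> Dict[str, str]:
    """Map each asset path to the MD5 checksum of its content."""
    return {path: hashlib.md5(body).hexdigest() for path, body in assets.items()}


def changed_assets(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return paths that are new or whose checksum differs from the old manifest."""
    return sorted(path for path, digest in new.items() if old.get(path) != digest)


# Hypothetical contents for two releases; only the Data List script changes.
release_1 = build_manifest({
    "/data/list/view.js": b"list v1",
    "/data/tree/view.js": b"tree v1",
})
release_2 = build_manifest({
    "/data/list/view.js": b"list v2",
    "/data/tree/view.js": b"tree v1",
})

updated = changed_assets(release_1, release_2)
```

In this example only the Data List path is reported as updated, so a client holding the release 1 assets would need to download just that one file.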

The next part of the solution involves how the manifest information is used to manage the browser cache. Earlier I mentioned that the browser may use the cache to fulfil requests for previously accessed content. The browser determines whether or not it has already seen an asset by comparing the URL of the asset with entries in the cache. Therefore, when the page requests an asset, the browser first checks whether it has a cache entry with that URL. Knowing how the browser keys cache entries gives us the ability to force a client update simply by changing the URL of a particular asset.

We’ve already come up with a way to create a unique version of each asset, which is the MD5 checksum. We will now use this in the URL to address not only a particular asset but also the version of that asset. Our path for the Data List JavaScript will now be /asset/c2a92513c4604b92255f620950ecb93c/data/list/view.js. Now that our asset paths have a version in them, we can update our nginx server to include headers that tell the browser to cache these files for a very long period of time (e.g. 2 years). If the Data List JavaScript is updated in a subsequent release, its MD5 checksum changes, and so does the resulting path to that asset, which removes the chance of the client using a stale cache entry.
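To make the effect of checksum-bearing URLs concrete, here is a small Python sketch. The versioned_url helper follows the /asset/<checksum>/<path> layout described above; the ToyBrowserCache class is a deliberately simplified stand-in for a real browser cache, and the second checksum is a made-up placeholder rather than a real release value.

```python
from typing import Dict


def versioned_url(manifest: Dict[str, str], asset_path: str) -> str:
    """Embed the asset's checksum in its URL: /asset/<md5><asset_path>."""
    return "/asset/" + manifest[asset_path] + asset_path


class ToyBrowserCache:
    """Simplified model of a browser cache: entries are keyed by URL only."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}
        self.network_fetches = 0

    def get(self, url: str, server_body: str) -> str:
        # On a miss, "download" the body the server would return and store it.
        if url not in self.entries:
            self.network_fetches += 1
            self.entries[url] = server_body
        return self.entries[url]


path = "/data/list/view.js"
release_1 = {path: "c2a92513c4604b92255f620950ecb93c"}
release_2 = {path: "0123456789abcdef0123456789abcdef"}  # hypothetical new checksum

cache = ToyBrowserCache()
cache.get(versioned_url(release_1, path), "v1 code")          # miss: first visit
cache.get(versioned_url(release_1, path), "v1 code")          # hit: same URL
fresh = cache.get(versioned_url(release_2, path), "v2 code")  # miss: new URL
```

Because the second release produces a different URL, the old entry is simply never asked for again, which is why the long cache lifetime is safe.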

The final piece of our solution is to make deployments simple. We don’t want to keep these MD5 checksums on our server’s file system, so we use an nginx rewrite rule to strip the MD5 checksum from the path when resolving the asset on disk. In other words, the assets on disk don’t have MD5 checksums in their paths; the checksum appears only in the URL used to fetch the asset, as a mechanism for managing the user’s browser cache.
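The actual nginx rewrite rule is not shown here, so the following Python sketch only emulates the path transformation such a rule would perform. The regular expression and function name are assumptions made for illustration, based on the /asset/<checksum>/<path> layout described earlier.

```python
import re

# Assumed URL layout: /asset/<32 lowercase hex characters>/<path on disk>.
VERSIONED_PATH = re.compile(r"^/asset/[0-9a-f]{32}(/.*)$")


def strip_checksum(url_path: str) -> str:
    """Drop the /asset/<md5> prefix; leave non-matching paths untouched."""
    match = VERSIONED_PATH.match(url_path)
    return match.group(1) if match else url_path


on_disk = strip_checksum("/asset/c2a92513c4604b92255f620950ecb93c/data/list/view.js")
untouched = strip_checksum("/data/list/view.js")
```

Under these assumptions, every version of the Data List URL resolves to the same /data/list/view.js file on disk, so a deployment only has to replace files in place.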

Closing thoughts

Cache management is a tricky problem, and if you search the web you will find a variety of solutions, each with its own set of pros and cons. Cache validation is also a complex topic that involves more mechanisms than I’ve gone into here, such as ETags.

After evaluating existing solutions both internally and externally, we’ve come up with a fairly simple solution that allows us to update quickly and often, while leveraging the user’s cache as much as possible to provide fast load times and remove the potential for users seeing a stale version of a component.

For additional information or if you have any questions, please feel free to email