Appcache, not so much a douchebag as a complete pain in the #$%^

A little while back, Jake Archibald wrote infamously (and anthropomorphically) that the HTML5 ApplicationCache is a “douchebag”[1]. I’m mindful that this is a word freighted with troubling significance, but it is the term he used, so I’ll go with it.

The Urban Dictionary says the word douchebag

generally refers to a male with a certain combination of obnoxious characteristics related to attitude, social ineptitude, public behavior, or outward presentation.

Though the common douchebag thinks he is accepted by the people around him, most of his peers dislike him. He has an inflated sense of self-worth, compounded by a lack of social grace and self-awareness. He behaves inappropriately in public, yet is completely ignorant to how pathetic he appears to others.

I think this is a bit harsh for poor AppCache. To stay with the metaphor, AppCache is doing his best; it’s just that he does exactly what you ask him to, even when no one could possibly mean what you said! In this way, AppCache reminds me of one of my favourite ever comics, Mr Logic, from Viz Magazine (probably NSFW due to its extreme puerility).

Mr. Logic’s defining characteristic is that he takes everything literally. And as a consequence, he is “a complete pain in the #$%^”. He doesn’t mean to annoy you, he just doesn’t understand nuance, and has no “common sense” (which as my nanna always loved to say is sadly far from common).

Mr Logic

This is AppCache to a tee. Ask him to cache the manifest file for your site, so that your site is preserved in digital amber, never to be updated again? No problem. Why would anyone ever want to do this? Who knows, but he’ll do it for you.
Removed the manifest attribute from a cached HTML document? Well, AppCache doesn’t check changed documents until you change the manifest file, so ’til then the old cached version of the HTML file, with its link to the manifest still in place, will be used. All very logical. But in many ways counter-intuitive.

I guess what I’m saying is that the fault, dear reader, lies not in AppCache, but in ourselves. Actually, the fault really lies in the rules that have been taught to AppCache. Some of these are just downright infuriating. And perhaps the most infuriating of these is the following.

It starts with the following entirely logical, but deeply unintuitive way in which caching works.

  • A user visits, say, webdirections.org, and the browser builds an application cache using the cache manifest, caching the index.html file, images, CSS files, and JavaScript files.
  • Subsequently, we change some of the HTML and CSS at webdirections.org, and update the manifest file so that it has changed too.
  • The user returns to webdirections.org, and their browser immediately uses the cached resources from the previous visit to display the page.
  • The browser only then checks the manifest file to see if it has changed, and as it has, the browser then downloads the changed resources.
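
To make the ordering concrete, here is a minimal sketch of that sequence as a toy in-memory model. Nothing here is a real browser API: the visit function and the shapes of the appCache and server objects are assumptions made up purely for illustration.

```javascript
// Hypothetical in-memory model of the sequence above; not a real browser API.
function visit(appCache, server) {
  // 1. If a cached page exists, it is shown immediately, before any network check.
  const shown = appCache.manifest === null ? server.page : appCache.page;

  // 2. Only afterwards is the cached manifest compared with the server's copy.
  if (appCache.manifest !== server.manifest) {
    // 3. A changed manifest triggers a re-download, which benefits the *next* visit.
    appCache.manifest = server.manifest;
    appCache.page = server.page;
  }
  return shown;
}

const server = { manifest: "v1", page: "old front page" };
const appCache = { manifest: null, page: null };

const firstVisit = visit(appCache, server);  // the cache is built

// We change the page and the manifest on the server.
server.manifest = "v2";
server.page = "new front page";

const secondVisit = visit(appCache, server); // stale page shown, cache refreshed
const thirdVisit = visit(appCache, server);  // the new page finally appears
```

In this model the second visit still returns the old front page, and only the third visit sees the new one, which is exactly the stale page load described above.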

It makes perfect sense! We get the cached version immediately, leaving out any network traffic. Super fast page load FTW. Stale page load not so good.

But, you say, why don’t we leave the HTML file out of the cache, but cache all the rest?
Well. AppCache has a concept of “master entries”. A master entry is an HTML file that includes a manifest attribute in the html element that points to a manifest file (which is the only way to create an HTML5 appcache BTW). Any such HTML file is automatically added to the cache. This makes sense a lot of the time, but not always. In particular, when an HTML document changes frequently, we won’t want it cached (as a stale version of the page will most likely be served to the user as we just saw).

Is there no way to override this? Well, AppCache has the idea of a NETWORK whitelist, which instructs the appcache to always use the online version of a file. What if we add HTML files we don’t want cached to this? Sorry, no dice. HTML files that are master entries stay cached, even when included in the NETWORK whitelist. See what I mean? Poor AppCache didn’t make these rules. He’s just following them literally. He’s not a douchebag, he’s a pain in the %^&*, a total “jobsworth”.
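
As a rough illustration of that rule, here is a toy model of which URLs end up stored. The entriesKeptOffline function, the manifest object shape and the file paths are all hypothetical, invented for this sketch; a real manifest is a text file, not a JavaScript object.

```javascript
// Hypothetical model of the rule described above; not a browser API.
function entriesKeptOffline(manifest, masterEntry) {
  // Everything listed under CACHE is stored.
  const kept = new Set(manifest.cache);
  // The page carrying the manifest attribute is added implicitly,
  // and listing it under NETWORK does not take it back out.
  kept.add(masterEntry);
  return [...kept].sort();
}

const manifest = {
  cache: ["/css/site.css", "/js/site.js"],
  network: ["/index.html"], // our attempt to keep the page online-only
};

const kept = entriesKeptOffline(manifest, "/index.html");
```

Note that the model never consults manifest.network when deciding about the master entry, which is the whole problem: /index.html ends up in the stored set regardless.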

So we are stuck. We seem to be able to

  • either add a manifest attribute to the html element of the document, and have the page cached too
  • or have no appcaching at all for that page.

Where the front page of a site changes frequently, this leaves two options. Either we cache the page, and gain the ability to cache images, CSS, JavaScript and other resources which don’t change frequently, but the page will likely be out of date for users who return (because the most recently cached version will always be used). Or we can’t cache those resources at all.

Updated

The following needs updating, because it is in fact sadly wrong. While no-store does have an influence on caching, it doesn’t work the way I describe for master entries: rather than the master entry simply not being cached while all the other resources in the manifest are, we get an error and no resources are cached at all. So, I’m going to turn my technique into a proposal for how AppCache could be made a little less painful. What I suggest below is, I think, what should happen when the AppCache encounters a master entry served with Cache-control: no-store.

Original

But, there is a (little known) solution to this. It’s in the HTML5 specification, but currently, it’s only supported in Internet Explorer (10+, the first version to support AppCache) and Firefox. Hopefully other browsers and devices will start supporting it, because it’s a game changer when it comes to AppCache I think.

You probably know that browsers have long used HTTP response headers to decide how to cache content (though you may not know the details of how). The server can also send instructions about whether a resource is cacheable or not.

In a nutshell, when a browser requests a resource, the server sends both the content of the resource (for example, an HTML document), and a response header. One of the fields of a response header is Cache-control, which can contain a number of directives, including no-cache, and no-store.

  • no-cache doesn’t in fact instruct the browser not to cache the resource, it instructs the browser to always check with the server before using a cached version of the resource (see, it’s not just AppCache who can be a pain)
  • no-store means don’t cache the resource, and always use the online version.
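
The difference between the two directives can be sketched in a few lines. This is only an illustration of the distinction drawn above, not how any browser implements HTTP caching; parseCacheControl, cachingPolicy and the property names on the returned object are names I have made up.

```javascript
// Split a Cache-Control header value into lower-cased directive names.
function parseCacheControl(headerValue) {
  return headerValue
    .split(",")
    .map((part) => part.trim().split("=")[0].toLowerCase())
    .filter((name) => name.length > 0);
}

// Summarise what the two directives discussed above ask of a cache.
function cachingPolicy(headerValue) {
  const directives = parseCacheControl(headerValue);
  return {
    // no-store: do not keep a copy at all.
    mayStore: !directives.includes("no-store"),
    // no-cache: a copy may be kept, but check with the server before reusing it.
    mustCheckServerFirst: directives.includes("no-cache"),
  };
}

const noCache = cachingPolicy("no-cache");
const noStore = cachingPolicy("no-store");
const plain = cachingPolicy("max-age=3600");
```

Here noCache.mayStore is still true, which is the counter-intuitive part: only no-store actually forbids keeping a copy.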

If you’re familiar with AppCache, no-store is the equivalent of the NETWORK section of a cache manifest. Now, how do HTTP headers and AppCache work together? What the HTML5 AppCache spec says about HTTP headers is that they should be ignored for the purposes of AppCache, except no-store. Which means (in theory), we can send an HTML file with the directive Cache-control: no-store, and it won’t be cached in the AppCache! Could it be we have a solution to what has been one of AppCache’s most infuriating “features”?

With bated breath I created a test case. On Safari and Chrome, no luck. The HTML file served with no-store is still added to the AppCache. But, with Firefox, and IE10, like Daft Punk, we got lucky. These browsers honour no-store. If I change the HTML document, the next time the page loads, all the cached resources are used from the cache, but the new HTML page displays.

So close, and yet so far, I hear you thinking. Because it’s not supported across all browsers yet, what good does it do me? Here we’ll have to dive a little bit more into AppCache. In browsers that support AppCache, there’s a new property, applicationCache, of the window object. This receives various events, including updateready when a new version of the cache has been downloaded and is ready to be used. So, we can update a cached master entry as follows.

  1. add an event listener for updateready
  2. this calls applicationCache.swapCache(), which swaps the now stale cache for the fresh one
  3. our event handler then calls window.location.reload(true) to force a refresh of the page
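
A minimal sketch of those three steps might look like this. The refreshOnUpdate helper is a name of my own, and it takes the cache object and the reload function as parameters (rather than touching window directly) so that it can be exercised outside a browser; fakeCache is a stand-in for window.applicationCache and exists only for illustration.

```javascript
// `cache` is expected to behave like window.applicationCache, and `reload`
// like () => window.location.reload(). Both are passed in so the sketch
// can run outside a browser.
function refreshOnUpdate(cache, reload) {
  cache.addEventListener("updateready", () => {
    cache.swapCache(); // switch to the newly downloaded cache
    reload();          // then reload so the page is rendered from it
  });
}

// Stand-in for window.applicationCache (hypothetical, for illustration only).
const calls = [];
const fakeCache = {
  listeners: {},
  addEventListener(type, handler) { this.listeners[type] = handler; },
  swapCache() { calls.push("swapCache"); },
};

refreshOnUpdate(fakeCache, () => calls.push("reload"));
fakeCache.listeners.updateready(); // simulate the browser firing the event
```

In a page, assuming a browser that exposes window.applicationCache, the call would be something like refreshOnUpdate(window.applicationCache, function () { window.location.reload(); }).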

What’s great about our no-store trick is, the cache doesn’t need updating, so in browsers which support it, updateready doesn’t fire! So we have a bullet-proof way of making sure frequently changing HTML pages aren’t added as master entries to the appcache in browsers which honour no-store, as well as auto-refreshing these pages to ensure the browser uses the most up-to-date version in browsers which don’t (yet) honour the no-store directive.

Which hopefully makes the AppCache just a little bit less of a pain in the #$%^ to deal with!

Again note, sadly this is not what actually happens.

Here’s what does happen when a master entry is served as no-store:

  • Chrome and Safari ignore no-store, and build the cache including the master entry
  • Firefox from what I can tell silently fails, and doesn’t build a cache at all
  • Internet Explorer (10) fires an error, and doesn’t build a cache.

Moral of the story: don’t serve HTML with no-store if you want appcache to work!

My upcoming book on HTML5 Offline

I discovered all this, and much more, while researching my upcoming book on HTML5 offline capabilities, not just appcache, but localStorage, the File API, offline events and even HTTP Caching. It’s coming soon, so why not sign up to our newsletter to be the first to hear about it, or follow me on twitter (or better still both!).

Want to learn more about AppCache in the meantime? Here’s an article I wrote a couple of years ago, and this presentation I did at Web Directions Code last year.

References

4 responses to “Appcache, not so much a douchebag as a complete pain in the #$%^”:

    • By: Facundo
    • July 22nd, 2013

    I didn’t start with appcache yet (only read for now), but after read the post I don’t understand:

    Why should I want to use the online version of my index.html?

    If I want to reinforce the offline usage of my app, making the index.html online required will prevent complete the offline usage.

    Am I losing something?

    And the last one (maybe you could change your test code quicker than me :)) if I hook for example the page unload event, could be possible to eventually prevent the caching?

    (My experiment in the past: http://facundocabrera.calepin.co/page-cache-and-the-unload-event.html)

    BTW, excellent post!

  1. Hah, really appreciate the Viz reference! (btw, their spoof of Take a Break magazines was one of my favourite things they ever did http://www.takeaweirdbreak.com/wp-content/uploads/takeaviz.jpg warning: it’s a bit sweary)

    However, I’m not sure I agree with the Mr Logic comparison. My beef with appcache is it feels like it does more than I tell it to do, and small changes have unexpected side effects. The worst of these is the one you mention, the master entry thing, I didn’t tell it to cache the page, but it did. Also, it decides that anything not in the manifest gets blocked until I add NETWORK: * etc etc.

    But yeah, I agree that a lot of the initially unexpected behaviour (like using the cache even when online) actually makes perfect sense. It just isn’t very intuitive.

    A few of us are currently working on a new proposal, https://github.com/slightlyoff/NavigationController/, which is much more worthy of the Mr Logic comparison.

    The idea is you register a JavaScript file for a domain or subset of domain. Once installed it runs in a worker and fires events whenever a page on that domain (or subset) is requested, or anything is requested from a page on that domain. You add listeners for these events and get to preventDefault() and satisfy the request another way, say from a cache, or by getting some data from idb and putting it into a js templating library to make an html string, or try fetching from the network & do something else on failure.

    It’s more typing than appcache, but it can do a lot more, and you’re in full control. It’s deliberately low level because we don’t really know how the web will use offline yet. The idea is higher-level apis will be added as usage patterns become clear. This is where appcache went wrong, a very high-level api was created without really knowing what developers wanted.

    • By: Viktor Zu
    • July 30th, 2013

    Well, the workaround for cached HTML is to show a loader on the ‘downloading’ event. The loader ‘hides’ the site until the ‘updateready’ event is fired, and then we can reload the page through window.location.reload();

    As the browser checks for updates to the cache manifest on every page load, and tries to do it as quickly as possible, the situation where the user sees the old HTML page will be very rare (or it will happen if the user/site is offline and an offline fallback is not specified)

    • By: Viktor Zu
    • July 30th, 2013

    But on the other hand — it is not a good idea to reload all the files every time we change the article content. And to have cache manifest separately for each article is also not very good idea.