Get off(line)
Taking your web sites and apps offline with the HTML5 appcache
There’s a general (and understandable) belief, even among many developers, that web sites and web applications can only be used when the browser has a web connection. Indeed, this is routinely cited as one of the real advantages of “native” apps over web apps. But as unintuitive as it sounds, in almost every modern browser and device (the exception for now being Internet Explorer, including the IE10 developer previews, but here’s hoping that changes), that’s not the case, provided the developer does a little extra work to make their app or site persist when a browser is offline. (Of course, the user must have visited your site while their browser did have a connection.)
In this article, I hope to clear this whole area up once and for all, show you how to do it, and point to some great resources out there for learning more about creating offline versions of your web sites and apps. Perhaps most importantly, I’ll introduce a simple new tool I’ve built to do the heavy lifting for you, ManifestR.
Even if you develop web sites, rather than applications, you can benefit from the techniques outlined here, because caching resources can seriously decrease the load time for your site, particularly on a visitor’s subsequent site visits.
Making a cache
As I’m sure you know, browsers cache HTML, CSS, JavaScript files, images and other resources of the sites you visit, to speed up the subsequent loading of pages. However, you never know when the browser might discard cached files, and so this is not a reliable way for sites to work offline. But what if we could tell the browser what to cache? Well, with HTML5 application caches (also known as “appcaches”) we can do just that. Let’s look at how.
Making it manifest
The heart of the technique is to create an appcache manifest, a simple text file, which tells the browser what to cache (and also what not to). The resources are then cached in an “application cache”, or “appcache”, which is distinct from the cache a browser uses for its own purposes. The anatomy of an appcache manifest is straightforward, but there are a few subtleties.
An appcache manifest
- begins with the string “CACHE MANIFEST” (this is required)
- has a section, introduced by the string “CACHE:” which specifies the URLs of resources (either absolute, or relative to the location where the manifest file will be located on the server) to be cached.
- We can also optionally specify which resources should not be cached, in a section of the manifest file introduced by the string “NETWORK:”. These resources aren’t just not cached, but further, won’t be used when the user is offline, even if the browser has cached them in its own caches.
- We can also optionally specify fallback resources to be used when the user is not connected, in a section of the file called “FALLBACK:”
- You can add comments to the file simply by beginning a line with “#”
It’s recommended that the extension for a manifest file is .appcache (previously, .manifest was the recommended extension).
Here is a very straightforward example
CACHE MANIFEST
CACHE:
#images
/images/image1.png
/images/image2.png
#pages
/pages/page1.html
/pages/page2.html
#CSS
/style/style.css
#scripts
/js/script.js
FALLBACK:
/ /offline.html
NETWORK:
signup.html
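To make the anatomy of the example manifest above a little more concrete, here is a rough sketch of how its sections could be read programmatically. The function name parseManifest is my own invention for illustration, and the parsing is deliberately simplified (it is not how any browser actually implements the specification).

```javascript
// Minimal, illustrative parser for an appcache manifest.
// parseManifest is a hypothetical helper, not part of any browser API.
function parseManifest(text) {
  const lines = text.split(/\r?\n/).map((line) => line.trim());
  if (lines[0] !== "CACHE MANIFEST") {
    throw new Error('A manifest must begin with "CACHE MANIFEST"');
  }
  const sections = { CACHE: [], NETWORK: [], FALLBACK: [] };
  let current = "CACHE"; // entries before any header are treated as CACHE entries
  for (const line of lines.slice(1)) {
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const header = /^(CACHE|NETWORK|FALLBACK):$/.exec(line);
    if (header) {
      current = header[1];
    } else if (current === "FALLBACK") {
      // a fallback entry is a pair: the pattern, then the replacement resource
      const [pattern, fallback] = line.split(/\s+/);
      sections.FALLBACK.push({ pattern, fallback });
    } else {
      sections[current].push(line);
    }
  }
  return sections;
}

const exampleManifest = [
  "CACHE MANIFEST",
  "CACHE:",
  "#images",
  "/images/image1.png",
  "/images/image2.png",
  "#pages",
  "/pages/page1.html",
  "/pages/page2.html",
  "#CSS",
  "/style/style.css",
  "#scripts",
  "/js/script.js",
  "FALLBACK:",
  "/ /offline.html",
  "NETWORK:",
  "signup.html",
].join("\n");

const parsedExample = parseManifest(exampleManifest);
console.log(parsedExample.CACHE.length); // 6 explicitly cached resources
console.log(parsedExample.NETWORK);      // [ 'signup.html' ]
```

Because entries that appear before any section header are treated as CACHE entries, a manifest that omits the CACHE: header is handled the same way.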
The CACHE section
In the CACHE section we list the resources we want cached. We can use either a URL relative to the .appcache file, or an absolute URL. We can cache resources both in the same domain as the manifest, as well as (in most cases) other domains (we’ll cover this in more detail in a moment).
Often, the only section of an appcache manifest is this section, in which case, the CACHE: header may be omitted.
Be careful with what you cache. Once a resource, for example an HTML document is cached, the browser will continue to use this cached version, effectively forever, even if you change the file on the server. To ensure the browser updates the cache, you need to change the .appcache file. This can play havoc while you are developing a site, and we’ll cover some techniques for managing this in a moment.
One suggestion is to add a version and/or a date stamp to the manifest as a comment. This way, you can quickly change the date or version number, and then browsers will refresh the appcache. Browsers do this intelligently, checking to see which resources might have changed since they were last cached, and only re-caching those which have.
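As a sketch of the version stamp idea, a build script could generate the manifest text with the stamp as a comment. The helper name buildManifest, the resource list and the version strings below are all assumptions made up for this example.

```javascript
// Illustrative only: builds manifest text with a version comment near the top.
// buildManifest and the version strings are hypothetical.
function buildManifest(resources, version) {
  return [
    "CACHE MANIFEST",
    `# version ${version}`, // changing this line changes the manifest file
    "CACHE:",
    ...resources,
  ].join("\n") + "\n";
}

const cachedResources = ["/style/style.css", "/js/script.js"];
const manifestV1 = buildManifest(cachedResources, "2011-07-01 v1");
const manifestV2 = buildManifest(cachedResources, "2011-07-02 v2");

// The resource list is identical, but the manifest text differs,
// which is what prompts the browser to refresh the appcache.
console.log(manifestV1 !== manifestV2); // true
```

The point of the stamp is simply that the file changes even when the list of resources does not.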
The NETWORK section
Probably the most subtle aspect of app caching is the NETWORK section. Because a great many web sites, and particularly applications, have dynamically generated content, pulled in from APIs, CGIs and so on, we may want to ensure certain resources aren’t cached, and are always directly loaded when the browser is online.
That’s where the NETWORK section of the manifest comes in. Here, we list the resources we never want to be cached, which is referred to as an “online whitelist”. So, in our example above, we are specifying that the page signup.html (located at the root of our site – remember the entries in our manifest are either absolute or relative URLs) is never cached. When online, any request for this page will always cause the page to be loaded from the server (even if the browser might have previously cached it itself). When the user is offline, any request for this resource results in an error (again, even if the browser might have cached the resource in its own caches).
We can also specify a group of resources located within a site in the NETWORK section using a partial URL (technically a “prefix match pattern”) (note that we can’t use this technique in the CACHE section, where all resources must be explicitly listed to be cached in the appcache, with one exception we’ll get to shortly). Any resources which have URLs beginning with this pattern are included in the online whitelist, and never cached. As developers we then need to handle the cases where these resources aren’t available because the user is offline.
There’s also a special wildcard, *. The asterisk specifies that any resources that aren’t explicitly cached in the appcache manifest should not be cached.
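Here is a small sketch of how prefix matching and the wildcard in the NETWORK section behave. The function isWhitelisted and the /api/ path are hypothetical, and real browsers first resolve entries to absolute URLs, which this simplified version skips.

```javascript
// Illustrative check of a URL against NETWORK entries (the online whitelist).
// isWhitelisted is a hypothetical helper; "/api/" is an assumed example path.
function isWhitelisted(url, networkEntries) {
  return networkEntries.some(
    // "*" matches everything; any other entry is treated as a URL prefix
    (entry) => entry === "*" || url.startsWith(entry)
  );
}

console.log(isWhitelisted("/api/users", ["/api/"]));        // true
console.log(isWhitelisted("/pages/page1.html", ["/api/"])); // false
console.log(isWhitelisted("/pages/page1.html", ["*"]));     // true
```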
Fallbacks
App caching also allows us to specify fallback resources. The form of an entry in the FALLBACK section is two resource identification patterns. The first (in the case above simply “/”, which matches any resource in the site), specifies resources to be replaced with a fallback when the user is offline. The second specifies the resource to replace any resources matching the pattern. So, in this case, when any resource in the site has not been cached, and the user is offline, the page offline.html will be used instead. We can also specify resources to be replaced more specifically. For example, we could specify an offline image for any images that haven’t been loaded like so
/images/ /images/missing.png
Here we’re specifying that any resources located in the directory images at the top level of our site that have not been cached, be replaced with the image called missing.png found in that same directory, when we are offline.
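When more than one FALLBACK pattern matches a URL, my understanding is that the most specific (longest) matching pattern wins. Under that assumption, resolving a fallback might be sketched like this; resolveFallback is a made-up name, and the two entries are the ones used in this article.

```javascript
// Illustrative fallback lookup, assuming the longest matching prefix wins.
// resolveFallback is a hypothetical helper, not a browser API.
function resolveFallback(url, fallbackEntries) {
  let best = null;
  for (const entry of fallbackEntries) {
    const matches = url.startsWith(entry.pattern);
    if (matches && (best === null || entry.pattern.length > best.pattern.length)) {
      best = entry; // keep the most specific match seen so far
    }
  }
  return best ? best.fallback : null;
}

const fallbackEntries = [
  { pattern: "/", fallback: "/offline.html" },
  { pattern: "/images/", fallback: "/images/missing.png" },
];

console.log(resolveFallback("/images/photo.png", fallbackEntries)); // "/images/missing.png"
console.log(resolveFallback("/pages/page3.html", fallbackEntries)); // "/offline.html"
```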
Using the appcache manifest
So now we’ve created our appcache manifest, we need to associate it with our HTML documents. We do this by adding the manifest attribute to the html element of a document, where the value of this attribute is the URL of the appcache file.
The current recommendation is that the appcache file have the extension .appcache. So, if our manifest is located at the root of our site, we’d link to it like so
<html manifest='manifest.appcache'>
There are also suggestions that using the HTML5 doctype may be required for some browsers to use app cache, so, make sure you use the doctype
<!DOCTYPE html>
And we’re all set. Well, almost. In order for the browser to recognize the appcache file, it needs to be served with the mimetype text/cache-manifest. How you set this up depends on your site’s server. At the time of writing, it’s likely that this is a step you’ll need to take, so if caching isn’t working, that’s very likely why. One of the most common servers is Apache. One way to set up Apache to serve .appcache files as type text/cache-manifest is to add a file with the name .htaccess at the root directory of your site, with the entry AddType text/cache-manifest .appcache (if there’s already a .htaccess file, just add this line to it).
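If your site is served by something other than Apache, the same idea applies: map the .appcache extension to text/cache-manifest wherever your server decides on the Content-Type header. As a rough sketch, assuming a hand-rolled JavaScript server with its own lookup table (the names MIME_TYPES and contentTypeFor are hypothetical), that mapping might look like this.

```javascript
// Illustrative extension-to-mimetype lookup for a custom server.
// MIME_TYPES and contentTypeFor are hypothetical names for this sketch.
const MIME_TYPES = {
  ".appcache": "text/cache-manifest", // the one that matters for appcaching
  ".html": "text/html",
  ".css": "text/css",
  ".js": "application/javascript",
  ".png": "image/png",
};

function contentTypeFor(fileName) {
  const dot = fileName.lastIndexOf(".");
  const extension = dot === -1 ? "" : fileName.slice(dot).toLowerCase();
  // fall back to a generic binary type for anything we do not recognise
  return MIME_TYPES[extension] || "application/octet-stream";
}

console.log(contentTypeFor("manifest.appcache")); // "text/cache-manifest"
```

A request handler would then send the value returned by contentTypeFor as the Content-Type header of the response.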
Gotchas
App caching can be very powerful, allowing apps to work while the user is offline, and can increase site performance, but there are some definite gotchas it pays to be aware of. Here are a few well worth knowing about.
Resources in style sheets
You’d be forgiven for thinking that any images in a style sheet that has been cached will be included in the appcache, but that’s not so. Images your style sheet refers to must be explicitly referenced in the CACHE section of the manifest as well.
Similarly, style sheets that are imported using @import, and resources included via JavaScript must also be explicitly cached.
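To get a feel for what needs listing, here is a rough sketch that pulls url() and @import references out of a style sheet’s text so they can be added to the CACHE section. The function findCssReferences and the sample CSS are invented for this example, and the regular expressions are a simplification rather than a full CSS parser.

```javascript
// Illustrative extraction of resources referenced from a style sheet.
// findCssReferences is a hypothetical helper; the regexes are simplified.
function findCssReferences(cssText) {
  const found = new Set();
  // url(...) with optional quotes: background images and @import url(...)
  const urlPattern = /url\(\s*(['"]?)([^'")]+)\1\s*\)/g;
  // @import "file.css" written without url()
  const importPattern = /@import\s+(['"])([^'"]+)\1/g;
  for (const match of cssText.matchAll(urlPattern)) found.add(match[2].trim());
  for (const match of cssText.matchAll(importPattern)) found.add(match[2]);
  // inline data: URIs are part of the style sheet itself, so skip them
  return [...found].filter((ref) => !ref.startsWith("data:"));
}

const sampleCss = `
@import "reset.css";
@import url("typography.css");
body { background: url(/images/bg.png); }
.logo { background-image: url('../images/logo.png'); }
`;

console.log(findCssReferences(sampleCss));
```

Each reference found this way is relative to the style sheet, so it would still need resolving against the style sheet’s own URL before going into the manifest.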
To help build an appcache manifest, I’ve developed ManifestR, which we’ll look at in detail in a moment. It will generate an appcache manifest for you, which includes all the scripts, style sheets (including those @imported), images, linked pages at the same site, and any images linked to in style sheets.
There’s one exception to the rule that only explicitly listed resources are cached, and it is important to understand. Any HTML document that has a manifest attribute will be cached, even if it is not listed in the manifest. This can cause all kinds of headaches while developing, which we cover shortly.
Caching Cross Domain Resources
While there is some confusion on the issue, you can in general cache content from across different domains in an app cache. In fact, without this ability, the real world value of app caching would be limited, as content distributed via a CDN (content distribution networks like Akamai) could not be cached (even content served from a differently named server within the same domain couldn’t be cached). The exception to this is that when content is served over secure http (https), then the specification says all resources must come from the same origin. In an exception to this exception, Chrome in fact does not adhere to this part of the specification, and it has been argued that the single origin policy for https is too restrictive in the real world for app caching to be of genuine value.
Refreshing the cache
In effect, unlike most caching of web resources, appcaches do not expire. So, once the browser has cached a particular resource, it will continue to use that cached version, even if you change the resource on the server (for example by editing the contents of an HTML document). The exception to this is when a manifest file is edited. When the manifest is changed, the browser will recache all the resources listed in the manifest.
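The browser fetches the updated resources in the background, so a page that is already open keeps using the old versions until it is reloaded. Browsers that support appcaching expose a window.applicationCache object with an updateready event and a swapCache() method, which a page can use to react when a fresh cache is ready. The sketch below assumes that API; the function watchForUpdates and the stand-in fakeCache object (used so the snippet runs outside a browser) are my own inventions.

```javascript
// Illustrative handler for a freshly downloaded appcache.
// watchForUpdates is a hypothetical helper; appCache is expected to behave
// like window.applicationCache in a supporting browser.
function watchForUpdates(appCache, onReady) {
  appCache.addEventListener("updateready", function () {
    // UPDATEREADY means a new version of the cache has been downloaded
    if (appCache.status === appCache.UPDATEREADY) {
      appCache.swapCache(); // switch over to the new cache
      onReady();
    }
  });
}

// In a browser this might be wired up as:
//   watchForUpdates(window.applicationCache, () => window.location.reload());

// A stand-in object so the sketch can be run outside a browser.
const fakeCache = {
  UPDATEREADY: 4,
  status: 4,
  listeners: {},
  swapped: false,
  addEventListener(type, listener) { this.listeners[type] = listener; },
  swapCache() { this.swapped = true; },
};

let reloadRequested = false;
watchForUpdates(fakeCache, () => { reloadRequested = true; });
fakeCache.listeners.updateready(); // simulate the browser firing the event
console.log(fakeCache.swapped, reloadRequested); // true true
```

Whether to reload automatically or prompt the user first is a design decision; an automatic reload can be jarring if the user is partway through a task.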
Caching can cause real headaches while developing a site or application, so it is recommended that during development, you avoid appcaching. One way of achieving this is to serve .appcache files with the wrong mimetype. This way, you can include the manifest attribute in your HTML elements, and serve the appcache file, just as you would in production, but not have the effects of appcaching. Moving from development to production is as simple then as associating .appcache files with the right mimetype.
Resource hogging and lazy loading
To improve the performance of a site, you might be tempted to preload the entire site, by adding all the pages, images etc in it to the appcache manifest. And, in certain circumstances this might be desirable. It will however place considerable demands on your server, and use more bandwidth, as the first time a person visits your site, they will download more resources than they otherwise might have. Luckily, appcaches have an additional feature that can help here.
You might recall earlier that even when an HTML document with a link to an appcache manifest isn’t included in the manifest file, it will still be cached. The benefit of this is that rather than explicitly listing all the pages at your site in a manifest for them to be cached, each time someone visits a page that links to a manifest, it will then be cached.
If the primary motivation for using an appcache is to ensure your site or more likely app works offline, you’ll likely want to explicitly list the pages of the site, so that they’ll be available offline even if the user hasn’t visited them. If your primary motivation is increased performance, then let pages lazily cache when the user visits them, but cache scripts, CSS, and perhaps commonly used images.
Cache failure
An important, but subtle gotcha with appcaching is that if even one of the resources you include in your cache manifest is not available, then no resources will be cached. So, it is really important to ensure that any resource listed in your appcache manifest is available online. There’s a tool we discuss in a moment, the Cache Manifest Validator, to help ensure all those resources are online.
Size limits
While the specification places no limits on the size an appcache can be, different browsers and different devices have different limits. Grinning Gecko reports that:
- The Safari desktop browser (Mac and Windows) has no limit
- Mobile Safari has a 10MB limit
- Chrome has a 5MB limit
- Android browser has no limit to appcache size
- Firefox desktop has unlimited appcache size
- Opera’s appcache limit can be managed by the user, but has a default size of 50MB
User Permission
In Firefox, when the user first visits an appcached site, the browser asks the user’s permission (as it and other browsers do for location with the geo-location API). However, unlike with geo, other browsers don’t ask the user’s permission. Just something to be aware of, as there’ll be no guarantee with Firefox that appcaching is being used, even when supported.
Flakiness and browser support
It must also be noted that the general consensus is that appcaching is currently far from perfect across all browsers which support it. The specification is still in draft, but it should also be noted that most browsers have supported at least some appcaching for quite some time.
According to an amalgam of When can I use, Dive into HTML5 and other online resources:
- Safari has supported offline web apps since version 4
- Chrome has supported the feature since version 5
- Mobile Safari has supported offline apps since iOS 2.1
- Firefox has supported it since version 3.5
- Opera has supported appcache since version 11
- Internet Explorer as yet does not support offline web apps, including in IE10 developer previews
- Android has supported appcache since version 2.1
Introducing ManifestR
We’ve already mentioned that a particular challenge in creating an appcache is identifying all the resources you need to add to the manifest. To help you with this, I’ve developed ManifestR, an online tool to help you create an appcache manifest for any page. I don’t recommend you use it without at least a little additional fine tuning, as what it attempts to do is locate any resources referenced from a given page. As discussed above, depending on the purpose of your appcache, this is likely to be overkill.
When you use ManifestR on a page, here’s what it looks for
- images both in the same and other domains referenced in the src attribute of any img element in the page.
- links to pages in the same domain. This can improve the performance of your site for visitors viewing other pages, and is vital if you want the entire site/app to work offline, but means potentially considerable additional load on your server the first time someone visits the site. Whether you choose to keep this list, or remove some or all of the links is an important decision to make.
- style sheets, linked, or included via @import statements, located both in your domain, or other domains
- images linked to in any style sheet, both those in the same domain, or other domains
- JavaScript files, both those in the same domain, and served from other domains. Here too, you’ll need to consider carefully which to include and which you want to add to the online whitelist via the NETWORK section of the manifest.
It then puts them all together in a manifest, ready for you to cut and paste, tweak, save and upload.
I hope you find it useful in building appcache manifests (and make sure you let me know via twitter what you think, and how we can improve it).
More reading
There’s quite a bit available online about app caching, though keep in mind the specification and implementations are still somewhat in a state of flux. Here are some articles and other online resources I have found very helpful:
Overviews and specifications
- Not for the faint-hearted, here’s the latest draft specification on offline browsing, and specifically appcache manifests. These specifications are written for browser developers, and you should hopefully not need to delve into them.
- AppCache Facts features facts and details of the workings of the cache, with details on gotchas, and best practices
- The Offline section of the fantastic HTML5 Rocks site has details on appcaching.
- Safari, and the webkit browser engine have supported offline appcaching for some time, and Apple has details here.
- The detailed online book on all things HTML5, Dive into HTML5, has an in-depth chapter on offline apps, and appcaching.
- Mozilla has supported offline apps since version 3 of Firefox, with fuller support since 3.5. The Mozilla Developer Network has details here.
- A detailed look at caching, including appcaching from Platformability
Tutorials and how-tos
- The ever excellent HTML5 Doctors’ tutorial
- A from the basics tutorial on appcache from HTML5 Rocks
- Another excellent introductory tutorial from SitePoint
- Standardists Estelle Weyl’s take on offline web apps
- More from the folks at Mozilla on building offline web apps
- Opera Developers Network on building offline web apps.
Critiques and gotchas
As we know, all is not yet perfect in the world of the offline web. Here are a couple of critiques of the current state of appcaching, and collections of gotchas discovered by offline pioneers.
- HTTP guru Mark Nottingham on what’s right, and wrong with “one confused puppy”
- Tips and gotchas from app developer Bunny Hero
- A few things to know, love and hate about applicationCache
- How offline web apps should work from Mike Kelly
- Mark Christian, one of the authors of AppCache Facts outlines some things he sees could be improved with HTML5 AppCaching
- The limits various browsers and devices place on appcache size, from Grinning Gecko.
Tools
We’ve already mentioned ManifestR, but you should find the Cache Manifest Validator another really useful tool. Remember, for appcaching to work, every resource you list in your manifest must be available, or nothing will be cached. The Cache Manifest Validator can make sure all your resources are available.
Compatibility
Probably the best place to keep up to date with the ever changing field of HTML5, CSS3 and other new web technology support in all modern browsers is When Can I Use?. You can find a snapshot of current browser support above.
This is one of the few good articles on HTML5 that I have read. Thanks.
Thanks Lucien,
there are some good ones out there.
Check out
http://html5doctor.com/
http://www.html5rocks.com
and many others
Definitely one of the best articles I’ve read on the subject!
Thank you very much. This is the first really understandable appcache/manifest explanation I’ve ever seen. All the others I’ve found were too technical and not friendly enough to understand as easily as yours.
Thanks again
Thanks Raphael!
john
Thanks GRGUR!
Thank you man.. I am just going to try it with my website.
Thanks a lot.
I am using
Header set Cache-Control "max-age=290304000, public"
in my .htaccess
Can they affect each other?
Great article, John. I learned a lot! Some questions/comments:
I didn’t see a mention of one of the biggest gotchas when working with appcache: a user must reload the web app TWICE to see any updates. For example, if you update a script and tweak the manifest file and reload the web app, you will NOT see the change. After you reload the web app the browser will check the manifest, see the tweak causing it to recheck the resources, and download the new script. Now if you reload the web app A SECOND TIME you’ll see the script change.
Will resources listed in the CACHE section be cached regardless of their response headers? Ie, what if one of those resources had “Cache-control: no-cache”?
Seems like the NETWORK section is not needed since a resource should only be cached if it’s in the CACHE section. Is the only purpose for NETWORK to list HTML docs that should not be cached?
Are FALLBACK resources downloaded even when a user is online, so that they’ll be available when the user goes offline? Are these loaded last, so they don’t compete for TCP connections with visible resources?
What I’d really like to see is more app “templates” for using app cache. For example, dealing with login is a hard problem with appcache – how to avoid caching an HTML doc with a login form vs the logged in version, and what to do with user validation if the user is offline. This is just one example of many scenarios that are hard to think through in this new offline paradigm.
Motyar,
the current draft specification says
Thanks Steve,
Wow, I’ve not seen reference to this. Why wouldn’t the browser see the updated manifest the first time, and reload all resources as it is supposed to? Is this a bug in the spec, or implementations?
My reading of the spec (see the reply to Motyar right above who asks the same thing) is that appcaching completely trumps server side caching. So, my understanding is it would be cached.
My understanding is that the NETWORK section also impacts the browser’s caching other than the appcache – it says “never cache this, in the appcache or elsewhere”?
That’s correct
My q&d test in Safari 5 suggests yes. I even made the FALLBACK section the first section in the manifest, and it still downloaded the fallback content last. I can’t find anything explicit in the spec on this subject, though it may well be there.
appcaches patterns – a great idea.
This is due to the browser standard/internal cache (and it varies depending on the browser, OS and web server configuration). It’s not a bug, it’s a feature of modern browsers to prevent resource re-fetching for a snappier web.
Apart from that, I’d like to have a way to cache resources without caching the originating html file.
PS: good post, John. Thanks!
I did my High Performance HTML5 talk. The best slide is #25 that shows how this reload-twice-to-see-new-resources works.
http://www.slideshare.net/souders/high-performance-html5-sf-html5-ug
http://stevesouders.com/docs/html5-user-group-20110728.pptx
Thanks Matteo,
is it possible to add the original file to the NETWORK section?
gotta try, but looking at the specifications it shouldn’t work.
I ran tests concerning appCache behavior and limitations on mobile. http://www.winktoolkit.org/blog/235/
Very nice article on using application cache. The first thing I started thinking about as reading was whether there was a nice abstraction comparable to what YUI Storage Utility did (does?) for local and session storage with it’s various fallbacks to Gears or Flash (if browser doesn’t support storage api). Maybe I didn’t “google hard enough”, but I didn’t find this. Does it exist?
I have found that Typekit can be extremely tricky when using it with an App Cache -strictly for performance purposes. Typekit CSS which is Base64 encrypted must be referenced properly in the cache manifest or else entire site wide fonts will fail -especially when online.
I’ve noticed that if your application uses mod_rewrite for the webpage, the app cache will not work, firefox won’t tell you why, safari does, it says:
Application Cache update failed, because http://mywebsite.com/guestlist/mobile/v1/5701 was redirected.
“guestlist/mobile/v1/5701” is an imaginary url handled by mod_rewrite and turned into a physical file with get parameters, this might be a really hard limitation to avoid if you’re in the same position as me, so this system only really works if you can isolate that part of your website in a completely different system where you can remove all the mod_rewriting.
This seems to be the solution for project I am working at but still I will have to confirm after I get the answer to this question. Users have to take devices to the fields where there is no internet and not just devices but the data should be already preloaded in the devices, that data includes individual information like their names, DOB, marital status, location status etc. Therefore device user will only be going to the fields to update the individual information and come back to the server to upload new status, is this possible? in other words Is it possible to run the whole application in the tablet device without internet?