HTML5 selectors API - It's like a Swiss Army Knife for the DOM

John Allsopp 2nd September, 2011 @johnallsopp

In the infancy of JavaScript, there was little if any concept of an HTML document object model (DOM). Even though JavaScript was invented to enable web developers to manipulate parts of a web page, in the original implementation, in Netscape 2.0, developers could only access the form elements, links, and images in a page. That was useful for form validation, and was first widely used for image rollover techniques (think :hover, before CSS), but it was far from the general purpose tool for creating modern web applications that we now know (and love/hate).

Newer iterations of the DOM provided developers with access to far more than just that original limited range of elements, as well as the ability to insert, modify and delete elements in an HTML document. But cross-browser implementations very often differed, and full support for the W3C’s DOM standards has arguably been treated as far more optional than CSS or HTML support.

One of the many reasons for the success of JavaScript libraries like jQuery and Prototype, on top of their easing the pain of cross-browser development, was how they made working with the DOM far less painful than it had previously been, and indeed than it was with the standard DOM. Being able to use arbitrary CSS selector notation to get matching elements from a document made the standard DOM methods seem antiquated, or at the very least, far too much like hard work.

Luckily, the standards and browser developers took notice. The W3C developed the Selectors API, a way of easily accessing elements in the DOM using standard CSS selector concepts, and browser developers have baked these into all modern browsers, way back to IE8.

In this short (by my standards) article, we’ll look at the Selectors API, how you use it, browser support, and some little things you might like to keep in mind while using it. Rest assured, it’s now widely supported, so in many cases, you can safely use it, potentially with a fallback for older browsers (IE7 and older specifically) via libraries like jQuery (or more lightweight selector engines like Sizzle, which provides this functionality for jQuery, and other libraries).
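
A fallback of the kind just described might be wired up roughly like this. It is only a minimal sketch: supportsSelectorsApi, selectAll, and the fallbackEngine parameter are hypothetical names, and the fallback is assumed to be any function that accepts a selector string and a context node.

```javascript
// Hypothetical helper: true when `doc` exposes both Selectors API methods.
// A truthiness check is used rather than typeof, because older IE versions
// report host methods as "object" rather than "function".
function supportsSelectorsApi(doc) {
  return !!(doc && doc.querySelector && doc.querySelectorAll);
}

// Hypothetical wrapper: use the native method when present, otherwise hand
// the query to a fallback engine supplied by the caller.
function selectAll(doc, selector, fallbackEngine) {
  if (supportsSelectorsApi(doc)) {
    return doc.querySelectorAll(selector);
  }
  return fallbackEngine(selector, doc);
}
```

In a browser, the first argument would normally be the global document, and the third whatever selector function your chosen library provides.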

The Selectors API

The Selectors API, which many would consider to be part of HTML5, is in fact a separate, small specification from the W3C. It provides only two new methods, querySelector, and querySelectorAll, for the Document, Element, and DocumentFragment objects (typically, you’ll use these methods on the document or element objects.) But do these methods make life easier for developers?

Before the Selectors API, to access an object in the DOM we could use these methods:

getElementById, from DOM Level 2 Core, available on the document object
getElementsByClassName, standardized in HTML5 after long non-standard browser support, available on the document and element objects
getElementsByTagName, from DOM Level 2 Core, available on the document and element objects

And there are some legacy ways of accessing elements on a page, which date from the earliest days of JavaScript:

links is a property of the document object which contains all anchor (a) and area elements with an href attribute
anchors is a property of the document object which contains all a elements with a name attribute
forms is a property of the document object which contains all form elements

We can also “traverse” the DOM, using:

childNodes, a property of the document and node objects
nextSibling, a property of a node, which contains the node (not necessarily an element) directly following it in the same parent
parentElement, a property of a node, which contains its parent element.

and related DOM traversal properties and methods.

But, what developers really often want to be able to do (as the success of jQuery and other libraries has shown) is simply say “give me all the elements which match this selector”, or “give me the first element which matches this selector”. And that’s precisely what the simple, powerful Selectors API does. It doesn’t completely do away with the need for DOM traversal, and legacy methods and properties, but it goes a long, long way.
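
To make the comparison concrete, here is a rough side-by-side sketch of the older lookups and their Selectors API counterparts. The function names and the content, nav, and p targets are illustrative assumptions; doc stands for a Document, such as the browser's global document.

```javascript
// Lookups using the older DOM methods.
function lookupWithDomMethods(doc) {
  return {
    content: doc.getElementById('content'),       // one element, or null
    navItems: doc.getElementsByClassName('nav'),  // live collection
    paragraphs: doc.getElementsByTagName('p')     // live collection
  };
}

// The same lookups expressed as CSS selectors.
function lookupWithSelectors(doc) {
  return {
    content: doc.querySelector('#content'),       // first match, or null
    navItems: doc.querySelectorAll('.nav'),       // static NodeList
    paragraphs: doc.querySelectorAll('p')         // static NodeList
  };
}
```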

querySelector

querySelector is a method of the document or any element, which returns the first descendant element that would be selected by its one argument, a CSS selector string. We can use it in place of document.getElementById('content') like so: document.querySelector('#content') (like me, you’ll probably find yourself forgetting to add the # from time to time in querySelector, something which doesn’t throw an error, so can be frustrating to track down).

And we can do things like find the first header element in an HTML5 document, with querySelector('header'). So far so good. But where querySelector really shines is that we can use any selector (attribute, structural, dynamic, UI, and even selector groups) with it. In most cases, this makes traversing the DOM and locating a specific element far simpler, and most likely far quicker: we won’t be looping in JavaScript and accessing all kinds of DOM properties, because the query takes place inside the browser’s far faster native DOM engine.
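
As a hedged illustration of that difference, the sketch below finds the first checked radio button inside a form twice: once by looping over elements by hand, and once with a single selector. The function names and the choice of radio buttons are assumptions made for the example; form can be any element.

```javascript
// Manual traversal: loop over every input and test its properties in script.
function firstCheckedRadioByLoop(form) {
  var inputs = form.getElementsByTagName('input');
  for (var i = 0; i < inputs.length; i++) {
    if (inputs[i].type === 'radio' && inputs[i].checked) {
      return inputs[i];
    }
  }
  return null;
}

// Selectors API: an attribute selector plus a UI pseudo-class does the same job.
function firstCheckedRadioBySelector(form) {
  return form.querySelector('input[type="radio"]:checked');
}
```

The second version relies on the CSS3 :checked pseudo-class, so it depends on the browser supporting that selector.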

querySelectorAll

Often, when working with the DOM, we want to manipulate several elements at once. For example, we might want to unobtrusively attach an event listener to all the links with a given class value. Here, querySelectorAll is your friend. Just like querySelector, it takes a single string as an argument, which is a CSS selector. Instead of returning a single element, it returns a NodeList (an array-like collection, though not a true JavaScript array) of matching elements. We can then iterate through this list, and manipulate these objects.

For example, we could use it to replace document.links like so:

document.querySelectorAll('area[href], a[href]')

This finds all area elements with the href attribute set, as well as all a elements with this attribute set (notice how we’ve used a selector group, which is quite acceptable with the Selectors API).

Matching elements are returned in the order they appear in the DOM parse tree.
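
Putting this together, attaching a listener to every link with a given class might look like the following sketch. The external class name, the function name, and the handler are assumptions for illustration, and addEventListener is assumed to be available (IE8 would need attachEvent instead).

```javascript
// Attach `handler` to every a.external element under `root`.
// Returns the number of links that received the listener.
function attachToExternalLinks(root, handler) {
  var links = root.querySelectorAll('a.external');
  // A NodeList has a length and numeric indexes, so a plain for loop works.
  for (var i = 0; i < links.length; i++) {
    links[i].addEventListener('click', handler, false);
  }
  return links.length;
}
```

In a page, root would typically be document, or a container element if only part of the page is of interest.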

Document or Element?

I mentioned that both the document, and element objects implement these two methods – what’s the difference? Well, as you might have guessed, these methods find elements that are descendants of the object you query on. So, if you use the method on a paragraph element, it will only find the descendant elements of that paragraph which match the selector. Other elements in the document which might match it won’t be returned. But, if you use the methods on the document, then any matching element in the document can be found.
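
The difference in scope can be sketched as follows. The function names are hypothetical, and the example assumes we are interested in links inside the first paragraph of a page.

```javascript
// Count the links found under a given scope (a Document or an Element).
function countLinks(scope) {
  return scope.querySelectorAll('a[href]').length;
}

// Compare a document-wide query with one limited to the first paragraph.
function compareScopes(doc) {
  var firstParagraph = doc.querySelector('p');
  return {
    inDocument: countLinks(doc),
    inFirstParagraph: firstParagraph ? countLinks(firstParagraph) : 0
  };
}
```

Calling compareScopes(document) in a browser would report every link in the page under inDocument, but only those nested inside the first p element under inFirstParagraph.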

Gotchas

If you’ve really got your hands dirty with the DOM, you’ll know that when DOM methods return a NodeList, it is live—that is, the members of the list change, depending on the state of the document.

Let’s say we get all the elements with a class of “nav” using document.getElementsByClassName('nav'), and it returns 5 elements, which we keep in a variable.

Now, if we add a new element with class nav, or remove one of the existing elements with a class of nav, the NodeList in our variable will be updated to reflect these changes (that’s why it is called a live NodeList).

But querySelectorAll is different. While it returns a NodeList, that list is static. So, if we similarly get all elements with a class of nav using document.querySelectorAll('.nav'), then regardless of what we subsequently do to the DOM, the length and contents of the NodeList won’t change. This means it’s best to query the DOM just before you need the elements, rather than holding on to a list if your DOM is going to change.
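
A small sketch of the live versus static behaviour, assuming a document that already contains some elements with a class of nav. The function name and the div that gets appended are illustrative only.

```javascript
// Take a live list and a static list of .nav elements, add one more
// .nav element to the page, then report the length of each list.
function compareLiveAndStatic(doc) {
  var live = doc.getElementsByClassName('nav');   // live: tracks the document
  var snapshot = doc.querySelectorAll('.nav');    // static: fixed at query time
  var before = { live: live.length, snapshot: snapshot.length };

  var extra = doc.createElement('div');
  extra.className = 'nav';
  doc.body.appendChild(extra);

  return {
    before: before,
    after: { live: live.length, snapshot: snapshot.length }
  };
}
```

After the new element is appended, the live list should report one more item than before, while the snapshot keeps its original length.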

There’s also a performance consideration. Tests of various browsers indicate that querySelectorAll is slower than getElementsByTagName (though not, it would appear, in Opera). But it’s also possible that, once available, manipulating the static NodeList may be higher performance than manipulating a dynamic NodeList. And this issue will likely only have an impact in extreme cases. I’d certainly not recommend prematurely optimising by using getElementsByTagName, getElementsByClassName, getElementById and so on in place of querySelectorAll, but it is worth noting you might be able to squeeze a little more performance out by doing so if you really need to.

And it is worth noting too that querySelector and querySelectorAll don’t work with every kind of selector. While pseudo-class selectors (like :first-child) work with these methods, pseudo-element selectors, like :first-letter, :first-line, :before and :after, although permissible as arguments, will return null in the case of querySelector, and an empty NodeList in the case of querySelectorAll.
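
One way to see this for yourself is a tiny helper that reports what both methods return for a given selector. This is a sketch, and describeMatches is a hypothetical name.

```javascript
// Report what the two methods return for `selector` under `root`.
function describeMatches(root, selector) {
  return {
    first: root.querySelector(selector),           // null when nothing matches
    count: root.querySelectorAll(selector).length  // 0 when nothing matches
  };
}
```

In a browser, describeMatches(document, 'p:first-line') would be expected to report a first of null and a count of 0, even on a page full of paragraphs, whereas describeMatches(document, 'p:first-child') can match real elements.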

A little gotcha this aging developer has found: I’m so used to getElementById and getElementsByClassName that I find myself forgetting the # or . required in the selector string in querySelector and querySelectorAll. As I mentioned a moment ago, it can be frustrating, as this won’t throw an error, but simply return null or an empty NodeList.
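
While debugging, a small wrapper can make these silent misses easier to spot. The following is a minimal sketch under the assumption that a console is available; findOrWarn is a hypothetical helper, not part of the Selectors API.

```javascript
// Look up the first match for `selector` under `root`, and log a warning
// when nothing matches so that a forgotten # or . is noticed sooner.
function findOrWarn(root, selector) {
  var element = root.querySelector(selector);
  if (element === null && typeof console !== 'undefined') {
    console.warn('No element matched "' + selector +
      '". Did you forget a # or . prefix?');
  }
  return element;
}
```

With this in place, findOrWarn(document, 'content') still returns null when there is no content element, but the missing # is flagged in the console rather than passing unnoticed.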

Support

All modern browsers, including IE8 and up, support both querySelector and querySelectorAll. It is however worth noting that the results returned are dependent on what selectors the browser supports. IE8 supports CSS2.1 selectors, though not CSS3 selectors. IE9 supports many CSS3 selectors, but not a number of the UI related pseudo-classes, such as :required and :invalid. IE CSS support for versions 5 through 9 is

Selectors API specification from the W3C

Selectors API at Opera Developers Center, by Lachlan Hunt, one of the authors of the spec.

Selectors API at Mozilla Hacks

Thoughts on performance from JS performance guru Nicholas Zakas (the comments are well worth reading)
