Year round learning for product, design and engineering professionals

More HTML5 syntax and parser quirks you may not have known

Last week we looked at one of HTML5’s syntax quirks: the fact that you don’t need to quote attribute values (unless the values contain a space or, as is less well known, one of a number of other characters). This time, a look at some of the subtle side effects of HTML5’s laxer syntax rules.

Let’s start with a quiz. Are both of these valid HTML5 documents (that’s right, complete documents, not just fragments)? Neither? Only one? Which one? Why?

<!doctype html>
<title></title>

<!doctype html>
<html>
<head></head>
<body></body>
</html>

And the answer is that the first is valid, the second not. Which is very counter-intuitive. Unless you know the rules of HTML5 inside out, it’s something you’re unlikely to guess. In HTML5 there is precisely one required element. It’s not html or head or body. It’s title.
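As a minimal sketch of what that means for the second document: adding a title inside the head is the only change needed to satisfy the required-element rule. The placeholder text "Untitled" is my own assumption, not something from the quiz; I've used it because some validators also expect the title to contain text.

```html
<!doctype html>
<html>
<head><title>Untitled</title></head>
<body></body>
</html>
```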

Now, does anyone ever actually do this? Try this page out for size.

<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 404 (Not Found)!!1</title>
  <style>…</style>
  <a href=//www.google.com/><img src=//www.google.com/images/errors/logo_sm.gif alt=Google></a>
  <p><b>404.</b> <ins>That’s an error.</ins>
  <p>The requested URL <code>/saef</code> was not found on this server.  <ins>That’s all we know.</ins>

That’s Google’s 404 page. No head or body tags (and no pesky closing tags on those p elements either). I guess if you’re serving this millions of times a day, a few bytes count (though I wonder why you’d then use up 22 precious bytes on those two ins elements).

But in and of itself this isn’t all that interesting. What is interesting is this question. If in the style element we had

body p{ 
  color: red
}

Would the paragraphs have red text? The (again counter-intuitive) answer is yes! How so? How can the paragraphs be descended from a body element if we didn’t include a body in our markup? Well, that’s where HTML5’s storied parsing algorithm comes in. Where the html, head or body tags are missing, the parser still creates those elements in the DOM. Let’s take a look at this in action in the Firefox Developer Tools:

The parser inserts missing elements into the DOM
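If you’d rather check this yourself than take a screenshot’s word for it, here is a small sketch you can save as a file and open in any browser. The page and its title are hypothetical (it is not Google’s markup), and the script simply logs what the parser built to the console.

```html
<!doctype html>
<title>Implied elements demo</title>
<style>
  body p { color: red }
</style>
<p>This paragraph has no explicit body around it.
<script>
  // The source above never writes <html>, <head> or <body>,
  // but the parser creates all three while building the DOM.
  console.log(document.documentElement.tagName); // "HTML"
  console.log(document.head.tagName);            // "HEAD"
  console.log(document.body.tagName);            // "BODY"

  // The paragraph's parent is the implied body element,
  // which is why the "body p" selector matches it.
  console.log(document.querySelector('p').parentNode === document.body); // true
</script>
```

The paragraph renders in red, and the console shows the three elements that never appeared in the source.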

Yes, HTML5’s parser can make sense of almost any way we as web developers can mangle our markup. But unless you really know what you are doing, and really, really need to save a few bytes, why not stick to the slightly more verbose but far better understood approach?
