A Brief (and Delightfully Snarky) History of HTML5

Official HTML5 Logo
The Official HTML5 Logo

You've probably been told - or perhaps you have arrived at the conclusion yourself - that HTML5 is this great new (new? OK I'm an old man everything is new to me these days) revision of the immortal HyperText Markup Language that has come in like a wrecking ball to elevate mobile design, gloriously unite browser rendering engines, modernize the ECMAScript (that's JavaScript to you, kiddo - but don't you ever confuse it with JScript!) API and above all else: clear out the mind-boggling clutter of mix-and-match legacy DTDs including some arcane, experimental footnote that uber-dweebs pushed to feel smarter than everyone else called 'XHTML' that hated puppies and had self-closing tags. Well, they called them that but if you didn't close them yourself they'd snap at your fingers or something. Anyway HTML5 crammed all that rambunctious, frothing mess into the basement and slammed the door shut on it forever. Click! Phew. Now we have one big, forgiving standard with all the goodies and a thoughtfully selected menu of pre-defined tags to pick and choose from that make sense for describing information and designing visual layouts and tactile layouts and auditory layouts... all sorts of layouts! And best of all you can even make your own tags named more-or-less whatever you want that start off as simple replicas of <div> so you can easily extend the language itself in ways that are simpler and sensible-er and smart-er-er than even the W3C could have ever envisioned. The end! Right?

Not so fast. HTML5 is a language. But it's a language in sort of the same way design languages are languages (...what a terribly apt analogy, no?) It's not exactly a metaphor... it's better to think of it as somewhat of an abstraction: it's really a conceptualization of objects and abilities or things you can have and things you can do with them. It's decoupled from the implementation, that is to say how it is written. The W3C draws this distinction by saying HTML5 is a vocabulary that can be implemented, in one of two syntaxes: the HTML Syntax or the XML Syntax. The most important distinction for you to understand as a web developer is that the XML Syntax is valid XML and valid XML is well-formed. Or it's not XML. (It's crap. (And it's going in the bin.))

You may be an outlayer but I am willing to bet Euros to Yen that my experience here is virtually universal: I have, to this day, never encountered HTML5 implemented in XML Syntax in the wild. It's a simple matter of using the right tools for a given job: XML is generally used for exchanging information between machines (think AJAX, .docx documents, XMPP (Jabber, Skype), SVG graphics...). And it makes sense: there shouldn't be any blemishes in these formats - if there's an error something is very wrong and it is correct that an entire document or message or packet should be discarded. It is not meant to be pretty, nor made so - it is meant to accurately and precisely describe information itself. Those of us who tried to switch from HTML to strict XHTML in the late 90s and early 00s out of a idealistic and good-intentioned but naive belief that well-formed, machine-readable and human-friendly regime would dominate the future understand only too well today the very good reasons why a forgiving, error-tolerant human-oriented markup language makes sense for creating and delivering documents intended for consumption by humans.

You can't blame us for trying; the problems with HTML 2.0 through 4.01 (don't ask where 1.0 went, we don't like to talk about it) were more glaring than obvious. Less the fault of iterations of the language itself than browser support, dozens of browsers and a multitude of rendering engines upon which those browsers had been created were developed in this time and as one might imagine not only different concepts, tags and attributes were supported, ignored or different from browser to browser but the way they were displayed to the user and functioned was increasingly divergent. More daunting still was the art of cross-platform JavaScript engineering the nascent paradigm of webapps demanded serious programmers pull off, birthing a slew of bulky and inefficient compatibility libraries that could themselves just as easily break as raw code while each distinct dialect evolved and shifted beneath them. Many developers made the logical but limiting decision to design for a certain browser to the exclusion of others, meanwhile an elite culture of deadly warrior-poets emerged (yours truly among them) that dared to wrangle the quirks of as many market-share claiming beasts as possible in works of ugly to awe-inspiring hackery that can strike terror into the hearts of polite programmers even today. If this is what it was like for people to converse in the language of the web I'm sure you can imagine what that meant for machines; spiders, indexers, gateways, proxies, aggregators, caches, load balancers, analysis, analytics, application-level firewalls, content delivery, content management, intrusion prevention, intrusion detection, systems, SYSTEMS, SYSTEMS, SYSTEMS! All these systems could do so much more if only half the stuff passing through them didn't taste like junk.

All of that is to say nothing of the main ideological argument that pervaded until only rather recently. Virtually from the beginning, HTML was criticized for having its feet in two ponds: it looks like a markup language and markup languages are meant for describing data. But here we're talking about data that is intended, especially in the early days and still very certainly nominally, to be rendered in a web browser and presented to humans. If it was meant for machines to process and then do god only knows what, possibly including presenting it to humans also, it would probably be better accomplished in a different format. Like XML - which is exactly what XML is for. But we started off with tags clearly only relevant to displaying documents to humans, like >center>. Which, come to think of it, is pretty limited. Would the next iteration include tags for right-aligning content? What about justified text? With word-wrapping? Without?

Wouldn't it make so much more sense if a so-called Printer Friendly version was already a part of the document, after all we're just changing a couple of visual features - often the same ones thousands of times per website. And then there's the problem of just how visual those few formatting controls available were. People with disabilities use the web too - in fact if we put their needs and use-cases at the centre of our focus for a moment it rapidly becomes intoxicating to contemplate how the web could be a revolutionary tool to reach them - for them to reach you - and connect us all together in ways never even possible to conceive of before. Finally, the stresses gave way and continents shifted, into the ever-transitional, never-quite-done ecosystem Cascading StyleSheets' first wails into the dark night were heard and we could finally cleanly separate form from function, appearance from structure and get to the Good Work™ of making a markup language that describes how data should be organized, inter- and intra-related for presentation to a human being, complimented by a styling language that defines how that data should look, feel, sound, print... (..taste... ..smell... ..?)

...OR the W3C could dick around for a decade stubbornly buggering with XHTML 2.0 instead, which it would later abandon anyway. In so doing they issued a summary slap in the c*ck to every man woman and child whose life or livelihood is impacted by the unhindered evolution of a vibrant and economically crucial world wide web (YUP THAT'S YOU! :D)...). As they slept peacefully atop their wheel (Netscape Navigator?) the real world was passing them by, ever-more desperate for the self-declared arbiters of the most fundamental technology underpinning the web: namely the standardized language its participants must speak to be heard (and speak well to not waste everybody's goddamned time and money). Verily, by virtue of all that makes the Internet good and holy Web Hypertext Application Technology Working Group (WHATWG) self-assembled to meet the needs of the time:

From What is the WHATWG? (https://whatwg.org/faq):

The Web Hypertext Application Technology Working Group (WHATWG) is a community of people interested in evolving the web through standards and tests.

The WHATWG was founded by individuals of Apple, the Mozilla Foundation, and Opera Software in 2004, after a W3C workshop. Apple, Mozilla and Opera were becoming increasingly concerned about the W3C’s direction with XHTML, lack of interest in HTML, and apparent disregard for the needs of real-world web developers. So, in response, these organisations set out with a mission to address these concerns and the Web Hypertext Application Technology Working Group was born.

In 2017, Apple, Google, Microsoft, and Mozilla helped develop an IPR policy and governance structure for the WHATWG, together forming a Steering Group to oversee relevant policies.

Clicking navigates to a visual History of HTML 5 Timeline by Darryll Alcorn

Hell yeah! That's how it's done Internet Style! Imagine: the people behind the development of the major competing browser engines - the engineers that actually have to design the components that communicate, transact and render at the most fundamental level came together in opposition of indifference, stagnation and wanton disregard for the needs of the public writ large! Even today they still coordinate the implementation and future of the language of the Web itself. This was utterly unimaginable in the era of the Microsoft Antitrust Case. The benefits extend far beyond the introduction of HTML5 and every hobbyist and professional web developer active in both eras feels them whenever their work loads consistently in a second browser. That's not to say W3C's work on XHTML 2.0 and its later pivot to "modularization" was entirely for naught: fruits included SVG, MathML and Xforms which would come to influence the inception of HTML5.

By 2007 The W3C saw the writing on the wall: get with the program or resign to irrelevance and eventual disbandment. A new working group was created to onboard the WHATWG specification and a tentative cooperation was established. It didn't take long for the W3C to get ahead of its britches. In 2011, deciding that HTML5 needed a finalization in the same stuffy tradition as the broken, inflexible and inorganic versions prior. In lock-step with this action a new working group was chartered to begin work on a so-called HTML 6 specification, leaving some wondering if the whole affair weren't a cynical, surprise ploy by the W3C to gain the upper hand and secure the future against WHATWG by simply painting a number on it that nobody asked for or needed. The WHATWG disagreed, being headed by pragmatic industry players with skin in the game and pieces on the board a Living Standard that could respond to new challenges and ideas was preferred. An understandable decision given the active cooperation and participation of the major browser vendors fostered a venue that was uniquely capable of ensuring interoperability, consistency and the ability to both deploy and actually realize changes to the specification in time frames that would otherwise be impossible. Adhering to the old convention of monolithic versioning was not only outdated, unnecessary and - looking back it could be argued - demonstrably broken, it would effectively handicap an industry that fancies itself, not without merit, to be a bedfellow of innovation.

Finally, in May 2019, the two camps were sleeping together once more and all was well in the world. WHATWG, obviously, being the boy of the relationship and the W3C being the girl. Today both groups perform important functions in maintaining the HTML5 specification and its siblings and the W3C has come to accept the sensibility of the Living Standard. Indeed, one could say the Internet has been graced by a new era aligned under a Live and let live Standard.


There are no comments for this item.