Use of HTML5

Posted: Thu Nov 17, 2011 3:31 am
by MikeGale
I've been wondering how many people are using HTML5 (or claiming to!).

Today I found an analysis for Quarter 3 of 2011 which suggests that about a third (34%) of 100 popular sites claim to be using HTML5. They have variable numbers of errors so aren't really doing what they claim, but they are trying.

That surprised me.

See http://bit.ly/sIF4t4 for a PDF report with details.

(NOTE: If you check these yourself I recommend switching off link checking and probably switching off JavaScript checking before you start.)

Re: Use of HTML5

Posted: Tue Nov 22, 2011 4:42 pm
by Albert Wiersch
Thanks Mike. That is interesting. My initial surprise was how many sites use XHTML... and you'd think the XHTML sites would be more error free but it doesn't seem like it from the table.

But I can't say that I'm surprised that only about 4 sites out of 100 didn't have any errors... but congrats to those 4 sites. :D

It's nice to see msn.com use XHTML 1.0 Strict and not have any errors though. It really irritated me when you had to use IE to access a Microsoft site (or any site for that matter). I'm glad that seems to be a thing of the past.

Re: Use of HTML5

Posted: Tue Nov 22, 2011 9:16 pm
by MikeGale
I have the impression that significant pieces of the web are becoming unusable to some browsers. I think there's a wave of web developers who only develop for their preferred browser AND have convinced big companies to use their talents. As you say there used to be IE only sites, now there are Firefox (mostly) sites that fail in IE, so any serious web user needs more than one browser (unless he's prepared not to get support for his new camera or whatever!).

The statistics say that in some cases IE is still the majority browser, so it is surprising that companies throw away more than half their audience. For others Firefox dominates. My conclusion is that management doesn't know what a mess they're making!

Then we have mobile browsers which are becoming common. A whole different ball game!

Re: Use of HTML5

Posted: Wed Nov 23, 2011 10:40 am
by Lowiepete
I found that list really quite shocking! To call yourself a professional web developer
and not be able to write valid code seems completely perverse! It's not as if it's
a hard thing to do.

I think Mike is correct about company managements. They are being bamboozled
by B.S. wrapped-up in palatable doses. My preferred browser is Opera, mostly
because I like the way the keyboard short-cuts are built in and not like Firefox,
dependent upon a 3rd party add-on (which in its present incarnation doesn't work!).
Problem is, the Opera browser is rarely among those preferred by big
companies, especially banks and other large financial institutions.

Indeed, most of them totally ignore almost any form of web building with
disabled users in mind. I'm presently having a battle royal with my building
society, who think it's very clever to have a pop-up window appear when you
click on a link instructing a particular message not to appear again. Of
course, it does this without any prior warning and no keyboard short-cut will
close it. The meaningless, and frankly demeaning, platitudes you get when
you complain need to be seen to be believed.

Seems the semantic web has some way to go - and HTML5 seems to be a
tad unlikely to be the vehicle that'll provide it. We can always live in hope.

Re: Use of HTML5

Posted: Thu Feb 02, 2012 10:02 am
by ormaaj
MikeGale wrote:I have the impression that significant pieces of the web are becoming unusable to some browsers. I think there's a wave of web developers who only develop for their preferred browser AND have convinced big companies to use their talents.
HTML5 does more than just extend older versions with new features. To put it simply, the design of HTML5 attempts to identify previously unspecified behaviors shared amongst major web browsers and specify them formally so that current and future web browsers will always agree, without breaking (and in many cases improving upon) interoperability with older browsers. The idea is that developers should only have to write one way for all browsers. The reality of the current situation is that modern browsers are significantly more compatible with both one another and historic browsers than they were just a few years ago thanks in large part to HTML5.

As a result of that design, from the point of view of modern browsers, ALL HTML pages are HTML5. Browsers don't distinguish between the various versions of HTML. The various legacy rendering modes for pages with certain old doctypes or no doctype still exist, but there is only one parser - the HTML5 parser, for which there is a reference implementation (html5lib plus a few others). Remember, this is backwards compatible with both old HTML and old "pseudo-XHTML-ish XHTML served as HTML" wherever possible. In a nutshell, if it worked before, it will continue working.

HTML5 is in no way intended to "loosen" the language or justify sloppy coding. All it does is document how browsers work. It also doesn't get in the way of the current or future state of XHTML. It simplifies matters by making both XHTML and HTML effectively two serializations of the same data structure (except where extra XML capabilities are a superset). If you're a validation and XML purist, you should be satisfied that HTML5 validators such as the one at validator.nu use real XML schemas (RELAX NG) behind the scenes to specify the language, and you can even supply your own. These are used for both the XML and HTML encoding. If you want to go crazy and write a heavily XHTML-dependent website with complex multi-namespace pages which also depend on HTML5-specific features, you can, and you can validate it too.

Lastly, it will always be the case that new features aren't supported by older browsers. That's where the concept of graceful degradation comes in, and is no different now than it was before HTML5. Except it's now easier with the browser landscape being less vendor locked, and with all the great Javascript libraries like YepNope and Modernizr. Discouraging the development of well-reasoned new HTML5 features can't be a good thing.

Re: Use of HTML5

Posted: Thu Feb 02, 2012 3:48 pm
by MikeGale
I applaud what HTML is trying to do. (Though I would not have gone about things the same way myself.)

EDIT: The post to which I'm replying seems to have changed a bit, since I originally posted this. It originally said that I had HTML5 backward.

I'm not talking about some theoretical analysis of sites not working. I'm talking about observation in real life.

I use a variety of browsers. (In reality I'd really like my own browser that actually does what I want, but my current view is that the development effort is too heavy.) I'm basing my observations on using those browsers. (IE 8 and 9, FF (often deliberately behind the curve as I distrust the rapid versioning), Chrome (but it can't do some things), Safari and Opera. Aurora is used a little but not systematically. Others not used in any regular way. I sometimes use a third party service to check visual appearance (only) on a wider range of browsers for both web markup and email markup.)

I increasingly find sites that simply don't work in one of the browsers. This is not some trivial eye candy effect, it's so bad that I need to fire up another browser and paste in the URI to get anything usable.

Years ago I used to dig into the details and enter into a dialog with developers about such issues. That is unfortunately no longer worth doing. (The developers are often paint by numbers guys, but even to get to them I usually have to get through a help desk layer of know nothings.)

These fairly complete failures are what I was referring to.

Re: Use of HTML5

Posted: Thu Feb 02, 2012 4:04 pm
by MikeGale
On a related side issue I recently ran some analyses of sites claiming to be HTML5.

It was interesting that most of them were just a badge engineering exercise. They were not real HTML5.

My time is limited so I didn't go any further. I think that was about 100 popular sites.

I've recently become aware of a publicly available crawl of the web. This is stored in Amazon and can be accessed via their Hadoop implementation. (To run a swarm of machines over a large data set.) I believe that substantial analysis can be run for a few dollars. The samples I've heard about are Java.

See https://www.commoncrawl.org/data/accessing-the-data/ to find out more. They had certificate errors last time I checked, and may prohibit commercial use, which certainly reduces the usefulness of what they have.

Has anybody here thought of running CSE over a body of data like that? If you've done it I'd appreciate some feedback. If you're thinking of doing it (probably using batch reports) give me a shout if you'd like somebody to talk it over with.
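
As a rough illustration of what the cheapest possible pass over a large body of pages might look like, here is a minimal sketch that only tallies the declared doctype of each page. It is not CSE and it knows nothing about the Common Crawl storage format; the function names and the four category labels are assumptions made for the example, and it assumes the page markup has already been fetched as strings.

```python
import re
from collections import Counter

# Matches the first doctype declaration, case-insensitively.
DOCTYPE_RE = re.compile(r"<!doctype\s+([^>]*)>", re.IGNORECASE)


def classify_doctype(markup):
    """Return a coarse label for the doctype a page declares."""
    # A doctype is only meaningful at the top, so just look at the start.
    match = DOCTYPE_RE.search(markup[:2048])
    if match is None:
        return "none"
    # Collapse whitespace and ignore case before comparing.
    body = " ".join(match.group(1).split()).lower()
    if body == "html":
        return "html5"
    if "xhtml" in body:
        return "xhtml"
    if "html 4" in body:
        return "html4"
    return "other"


def tally_doctypes(pages):
    """Count doctype labels over an iterable of markup strings."""
    return Counter(classify_doctype(page) for page in pages)
```

A tally like this only measures what pages claim to be. Whether they live up to the claim still needs a real validation run over each page.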

Re: Use of HTML5

Posted: Thu Feb 02, 2012 6:46 pm
by ormaaj
MikeGale wrote:EDIT: The post to which I'm replying seems to have changed a bit, since I originally posted this. It originally said that I had HTML5 backward.
Yeah sorry I was attempting to make the rant as short as possible.
MikeGale wrote:I increasingly find sites that simply don't work in one of the browsers. This is not some trivial eye candy effect, it's so bad that I need to fire up another browser and paste in the URI to get anything usable.
I test in all those browsers and regularly build Chromium from git. Some examples? Do you have any idea what specifically about these websites tends to be to blame? Like I said, as of late, all browsers are using literally the very same HTML parsing, even mostly share the same implementation, so there's no chance at least on the markup side of things that the interpretation of even old broken code could result in building a different DOM on most current browsers. If markup is to blame then it should always be consistently broken in the same way.

Of course, Javascript is another story entirely and still varies wildly especially when you get into the bleeding edge of advanced things, but you're talking about mostly either legacy or uninformed developers who don't maintain their sites properly I presume. I can't think of a reason things should be getting worse on that front. If the code is that bad it's a wonder it "works" in any browser. It can only really be nasty javascript hacks or obscure proprietary markup, which should be decreasing in abundance.
MikeGale wrote:It was interesting that most of them were just a badge engineering exercise.
Not surprising. It's quite the trendy marketing gimmick indeed. But as I said, how do you determine what's "real" HTML5? There is no automated method. About all you can do is look for the presence of HTML5 specific features which are hopefully not buried in obfuscated javascript only to become apparent after building a DOM. And even if there aren't any, that doesn't mean it isn't HTML5. An "HTML5 doctype" is by no means a dead giveaway either, nor is the presence of an older XHTML doctype.
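
To make the "look for the presence of HTML5 specific features" idea concrete, here is a minimal sketch using only Python's standard html.parser module. The class name, the scan function and the short element list are assumptions made for illustration, and the element list is deliberately partial. It only sees static markup, so anything created by script after the DOM is built is missed, which is exactly the limitation described above.

```python
from html.parser import HTMLParser

# Elements introduced by HTML5 (a partial, illustrative list).
HTML5_ELEMENTS = {
    "article", "aside", "audio", "canvas", "figure", "figcaption",
    "footer", "header", "nav", "section", "video",
}


class Html5FeatureScanner(HTMLParser):
    """Records the doctype and any HTML5-only elements in static markup."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.html5_elements = set()

    def handle_decl(self, decl):
        # Called with the text inside <! ... >, e.g. "DOCTYPE html".
        if decl.lower().startswith("doctype") and self.doctype is None:
            self.doctype = " ".join(decl.split())

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names before calling this.
        if tag in HTML5_ELEMENTS:
            self.html5_elements.add(tag)


def scan(markup):
    """Return (declares_html5_doctype, sorted HTML5 elements found)."""
    scanner = Html5FeatureScanner()
    scanner.feed(markup)
    scanner.close()
    declares_html5 = (scanner.doctype or "").lower() == "doctype html"
    return declares_html5, sorted(scanner.html5_elements)
```

The two results are reported separately on purpose: a page can declare the short doctype and use no new elements, or carry an old XHTML doctype and still use video or nav, so neither signal settles the question on its own.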
MikeGale wrote:Has anybody here thought of running CSE over a body of data like that? If you've done it I'd appreciate some feedback. If you're thinking of doing it (probably using batch reports) give me a shout if you'd like somebody to talk it over with.
I don't think any automated tool could give meaningful numbers as to the overall "correctness" or "brokenness" of a website beyond a very simple validity check. Even that isn't going to work if the page uses a lot of inline javascript.

CSE probably wouldn't be the right tool for the job IMO. Hell, CSE gives me extra errors the more "correct" I'm being, as there's no way it could understand why things are being done a certain way against its overgeneralized generic advice that mostly isn't applicable. That isn't necessarily a bad thing. I can see the value in helping guide beginners, except perhaps for giving an overly simplified view of the nuances of various behaviors. Hopefully that's not misinterpreted, I just don't think its output would say much of anything about code quality in a large scale analysis.

I would imagine CSE throws a hissy fit when it sees code like this: http://www.google.com/404
Google's 404 page is the way it is for a reason and not incorrect (though whether it is good practice may be a bit controversial).

Re: Use of HTML5

Posted: Thu Feb 02, 2012 7:38 pm
by Albert Wiersch
ormaaj wrote:Hell, CSE gives me extra errors the more "correct" I'm being, as there's no way it could understand why things are being done a certain way against its overgeneralized generic advice that mostly isn't applicable. That isn't necessarily a bad thing. I can see the value in helping guide beginners, except perhaps for giving an overly simplified view of the nuances of various behaviors.
On that note, if there is anything specific you want improved in CSE HTML Validator, then please post it to another topic in the appropriate CSE HTML Validator forum with as much detail as you can provide and I would be happy to see if the output or checking could be improved. :D
ormaaj wrote:I would imagine CSE throws a hissy fit when it sees code like this: http://www.google.com/404
Google's 404 page is the way it is for a reason and not incorrect (though whether it is good practice may be a bit controversial).
Yes, CSE HTML Validator tries to enforce better structure, though I'm thinking I may have CSE HTML Validator try to assume the missing tags (head, body, etc) with an optional warning message in the next major release. Not sure on that yet, just a thought.
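
For anyone wondering what "assume the missing tags" could mean in practice, here is a minimal sketch that reports which of the html, head and body start tags a page leaves out. HTML5 lets authors omit these in many situations and has the parser imply them. This is not how CSE HTML Validator works internally; the class and function names are hypothetical, and a real checker would follow the full tag-omission rules rather than just looking for the start tags.

```python
from html.parser import HTMLParser

# Start tags that HTML5 allows authors to leave out; the parser implies them.
OPTIONAL_STRUCTURE_TAGS = ("html", "head", "body")


class StructureTagCollector(HTMLParser):
    """Remembers which of the optional structure tags appear explicitly."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        if tag in OPTIONAL_STRUCTURE_TAGS:
            self.seen.add(tag)


def omitted_structure_tags(markup):
    """Return the optional structure tags that the markup never opens."""
    collector = StructureTagCollector()
    collector.feed(markup)
    collector.close()
    return [tag for tag in OPTIONAL_STRUCTURE_TAGS if tag not in collector.seen]
```

A list like this could back an optional warning ("html, head and body were implied") instead of a hard error, which is roughly the behaviour being floated above.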

Re: Use of HTML5

Posted: Thu Feb 02, 2012 9:52 pm
by MikeGale
ormaaj wrote:I test in all those browsers and regularly build Chromium from git. Some examples?
As I said I'm not examining it these days. (It actually goes a bit further than getting hold of somebody. When I have made contact with the right people I tend to find: 1) It sometimes takes a long time to explain the issue 2) They say nothing is wrong then quietly go and fix it (a win of a sort) 3) Or, all too often, acknowledge that there is a mess but the budget has run out and it's not worth approaching the decision maker.) All in all there are better ways to spend my time.

I know that some of the cases are JavaScript issues, now that you mention it. (Since about HTML 2 I've been astounded how many programmers haven't bothered to get to grips with the basics of markup. Problem continues.) Don't know what proportion.
ormaaj wrote:It's quite the trendy marketing gimmick indeed.
It's nothing new. I've seen a lot of XHTML Strict that's the same. Now if browsers had an indicator that the markup is broken we might have a way to shame web authors. It might even lead to a web where so many CPU cycles aren't wasted fixing tag soup.

It would be really cool if the people who are fooled into believing they have HTML 5 checked that!
ormaaj wrote:I don't think any automated tool could give meaningful numbers
Disagree. I've run some analyses, errors only, on pages purporting to be HTML5. It isn't perfect but the results, suitably worked up, are good enough for some analyses (even though not originally intended for that).

They also open the door to further work. They're a great starting point for a person to render their own analysis. The algorithmic data makes the human analysis more productive.

The benefit I'd like to see is web makers testing their product and fixing it before it goes live. That would help us move on from where many are now, to worthwhile active content.

You should try using CSE with just errors switched on.

Re: Use of HTML5

Posted: Fri Feb 03, 2012 4:28 am
by ormaaj
Albert Wiersch wrote:On that note, if there is anything specific you want improved in CSE HTML Validator, then please post it to another topic in the appropriate CSE HTML Validator forum with as much detail as you can provide and I would be happy to see if the output or checking could be improved. :D
Thanks! :mrgreen: will do. Things can usually be configured nicely especially in the case of analyzing one's own pages. I'm not the best resource, my version isn't current (for a class I took a while back in which it was required).
Albert Wiersch wrote:Yes, CSE HTML Validator tries to enforce better structure, though I'm thinking I may have CSE HTML Validator try to assume the missing tags (head, body, etc) with an optional warning message in the next major release. Not sure on that yet, just a thought.
My tastes do favor being explicit over the sparse implied tag style, but it might be valuable to some for the CSE validator to have the capacity to build pages following the precise HTML5 rules and then base its analysis on the resulting structure only. I know for those only interested in validation the tools for that have been around, but combined with the added "linting" abilities it could be cool.

Re: Use of HTML5

Posted: Mon Feb 06, 2012 3:46 pm
by MikeGale
For one look into the dark side of geo-location see

http://www.geekosystem.com/iphone-tracking/

(Note every web page you access gives away some potential geo-location information. This has been the case since the web started.)

Re: Use of HTML5

Posted: Wed Mar 07, 2012 3:40 am
by jamesace
As a result of that design, from the point of view of modern browsers, ALL HTML pages are HTML5. Browsers don't distinguish between the various versions of HTML. The various legacy rendering modes for pages with certain old doctypes or no doctype still exist, but there is only one parser - the HTML5 parser, for which there is a reference implementation (html5lib plus a few others). Remember, this is backwards compatible with both old HTML and old "pseudo-XHTML-ish XHTML served as HTML" wherever possible. In a nutshell, if it worked before, it will continue working.