On 22 February 2002, Marko Karppinen performed a validation survey of the websites of W3C members. As he said:
It would be understandable to assume that the W3C member organizations are on the forefront of web standardization efforts.
But sadly it turned out that only 18 sites out of 506 were using valid HTML.
Obviously I grabbed the opportunity to repeat his experiment, and performed a validation of all the websites of W3C members. (Which was an easy task, as I had my mass-validation scripts left over from previous investigations.)
Today W3C has 401 members (interestingly, that is considerably fewer than the roughly 500 members it had in 2002). Of those, 397 had a website (at least one mentioned in the W3C members list). The script was unable to validate 45 of those sites for various reasons (the main one being a `<meta http-equiv="refresh">` redirect used in the HTML; a damn stupid script, I know). So, 352 sites remained.
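The meta-refresh problem could be worked around by spotting the redirect target before validating. A minimal sketch of that detection (the helper name and regex are my illustration, not the actual script):

```python
import re

# Matches e.g. <meta http-equiv="refresh" content="0; url=http://example.com/">
_META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*'
    r'content=["\'][^;"\']*;\s*url=([^"\'>]+)',
    re.IGNORECASE)

def meta_refresh_target(html):
    """Return the URL a meta refresh points at, or None if there is none."""
    m = _META_REFRESH.search(html)
    return m.group(1).strip() if m else None
```

A validator could then fetch the returned URL instead of giving up on the page.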
And here are the results of validating those, compared with the results of Marko Karppinen in 2002:
Result | 2002: nr of pages | 2002: percentage | 2006: nr of pages | 2006: percentage |
---|---|---|---|---|
Not valid | 488 | 96% | 286 | 81% |
Valid | 18 | 4% | 61 | 17% |
Tentatively valid | 0 | 0% | 5 | 2% |
As seen from above, the W3C members are clearly getting better at producing valid websites. There are 13 percentage points more valid websites than there were in 2002. And it's not only percentages: in 2002 there were 43 fewer valid sites than there are today.
On the sad side: over 80% of the W3C members still have an invalid website. And these are the members of an organization whose mission is:
To lead the World Wide Web to its full potential by developing protocols and guidelines that ensure long-term growth for the Web.
Here's a list of all those members whose website was found to be using valid HTML. Note that only three of those had a valid site back in 2002.
I also gathered some information about doctypes and encodings used on the websites of W3C members.
When we look at the set of invalid pages (because all valid pages must have a doctype to be valid in the first place), only 48% included a document type declaration. This is definitely more than my February 2005 survey found on Estonian pages (35%), but definitely less than you would expect from W3C members.
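Detecting the doctype itself is a matter of looking for the public identifier at the top of the document. A rough sketch (a hypothetical helper; it ignores doctypes that carry no public identifier):

```python
import re

def declared_doctype(html):
    """Return the public identifier from a <!DOCTYPE ...> declaration,
    or None if the page declares no doctype with a public identifier."""
    head = html[:1024]  # the doctype must precede all other markup
    m = re.search(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"', head, re.IGNORECASE)
    return m.group(1) if m else None
```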
DOCTYPE | nr of pages |
---|---|
HTML 4.01 Transitional | 100 |
XHTML 1.0 Transitional | 80 |
HTML 4.0 Transitional | 39 |
XHTML 1.0 Strict | 19 |
XHTML 1.1 | 4 |
HTML 4.01 Strict | 3 |
HTML 4.01 Frameset | 2 |
HTML 4.0 Strict | 1 |
-//W3C//Dtd HTML 4.01 Transitional//EN | 1 |
-//W3C//DTD HTML 3.2//EN | 1 |
HTML 3.2 | 1 |
-//W3C//DTD HTML 4.1 Transitional//EN | 1 |
As seen from the table above, the transitional doctypes are the most popular. Interestingly, XHTML 1.0 Transitional is more popular among the W3C members than on Estonian sites, where it came third in my survey. Also, in fourth place here is XHTML 1.0 Strict, whereas on Estonian sites it was HTML 3.2.
DOCTYPE | nr of pages |
---|---|
XHTML 1.0 Transitional | 32 |
XHTML 1.0 Strict | 12 |
HTML 4.01 Transitional | 9 |
XHTML 1.1 | 4 |
HTML 4.0 Transitional | 2 |
HTML 4.01 Strict | 1 |
HTML 4.01 Frameset | 1 |
Doctype usage on the valid pages makes it even clearer that W3C members strive towards XHTML. And a strict doctype in second place is definitely a good sign (although first place would be even better; on the valid websites of Estonia, a strict doctype comes fourth).
But what about XHTML 1.1? W3C says it should be served as `application/xhtml+xml` and definitely not as `text/html`. Are the members following this advice?
Sadly, they're not. Only one of the four members using XHTML 1.1, Progeny Systems, serves it with the correct MIME type. Actually, it looks like it's the only W3C member serving its website as XML. But even they aren't perfect: their markup is missing the XML declaration, and they're using tables for layout.
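Checking how a site is served comes down to inspecting the Content-Type response header. A minimal sketch of that check (the function name is mine; it deliberately ignores content negotiation, where a server may send `application/xhtml+xml` only to clients that advertise it in their Accept header):

```python
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}

def served_as_xml(content_type_header):
    """Given a raw Content-Type header value, tell whether the page
    is served with an XML MIME type (parameters like charset are stripped)."""
    mime = content_type_header.split(";", 1)[0].strip().lower()
    return mime in XML_TYPES
```

For example, `served_as_xml("text/html; charset=utf-8")` is false, while `served_as_xml("application/xhtml+xml")` is true.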
111 pages had no encoding specified in the HTML. This is of course not a requirement, but it is good style. The validator had problems parsing 31 pages because either a wrong character encoding was specified or the page did not conform to UTF-8 (the default character set).
This aside, the most popular encodings were ISO 8859-1 and UTF-8.
Encoding | nr of pages |
---|---|
iso-8859-1 | 151 |
utf-8 | 51 |
shift_jis | 9 |
windows-1252 | 5 |
gb2312 | 5 |
euc-kr | 4 |
windows-1251 | 2 |
us-ascii | 2 |
big5 | 2 |
iso-8859-7 | 2 |
iso-8859-2 | 1 |
utf-16 | 1 |
x-user-defined | 1 |
iso-8859-15 | 1 |
iso-2022-jp | 1 |
windows-1253 | 1 |
ecu-kr | 1 |
windows-1255 | 1 |
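The encoding counts above come from the charset declared in the pages' meta tags. A rough sketch of that extraction (a hypothetical helper; it does not consult the HTTP headers or an XML declaration, which may also carry the encoding):

```python
import re

def declared_encoding(html):
    """Return the charset named in a <meta http-equiv="Content-Type">
    tag, lowercased, or None if no charset is declared in the markup."""
    m = re.search(
        r'<meta[^>]+content=["\'][^"\']*charset=([\w-]+)',
        html, re.IGNORECASE)
    return m.group(1).lower() if m else None
```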
So, this is how good the W3C members are at web standards in the year 2006, four years after the original survey. The sites have improved, but they could be a lot better.
Maybe in 2008 we will have 50% of the W3C member sites valid. Maybe...
But what about ordinary sites? Will we have 17% of the World Wide Web validating by 2008? 2009? 2010?
Anyway, we don't even know how many valid websites we have today. My own research has suggested it could be as high as 2%, but it had some serious flaws which made the investigated set of sites not very representative.
Further studies will hopefully bring some clarity to the subject; until then, we may just happily conclude that, in regards to standards, at least the W3C members are doing better.
Written on 4 March 2005.