Web Standards in Estonia vol 3

This survey continues the work done in the previous surveys of February 2005 and August 2005. The recent work on validating sites of W3C members (which uses the same methodology) might also be of interest to you.

All the figures on this web page are also represented as tables. If you want to see the tables, turn off the display of images in your browser or view the page with a text-only browser. If you only see the tables, or see neither tables nor figures, it probably means your browser does not correctly support the object element.

The selection of pages

I acquired the set of Estonian web pages the same way as in the previous surveys: they are all taken from the Neti.ee list of Estonian servers.

Compared to the previous survey (done in August 2005), there were 4476 more pages this time.

Survey          Nr of pages   Growth
February 2005   21,905
August 2005     22,735        830
March 2006      27,211        4476

Table 1. The number of pages included in each of the Estonian web page surveys.

Table 1 shows significant growth in the number of pages used in the survey. From February to August 2005 (6 months) 830 pages were added to the list, but from August 2005 to March 2006 (7 months) 4476 pages were added. That’s quite a difference! It is probably because winter is the active time for doing business (and also, it seems, for buying domain names).
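The growth figures can be cross-checked with a little arithmetic. The following is only a rough sketch: the function and variable names are my own, and the page counts and month spans are copied from table 1 and the paragraph above.

```python
# Page counts per survey, copied from Table 1.
surveys = [
    ("February 2005", 21905),
    ("August 2005", 22735),
    ("March 2006", 27211),
]

# Months between consecutive surveys, as stated in the text.
months_between = [6, 7]

def growth(counts):
    """Return the increase in pages between each pair of consecutive surveys."""
    totals = [total for _, total in counts]
    return [later - earlier for earlier, later in zip(totals, totals[1:])]

def monthly_rate(increases, months):
    """Return the average number of pages added per month, rounded to whole pages."""
    return [round(increase / span) for increase, span in zip(increases, months)]

increases = growth(surveys)
print(increases)                                # [830, 4476]
print(monthly_rate(increases, months_between))  # [138, 639]
```

Per month, the list grew more than four times faster in the second period than in the first.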

From the full set of 27,211 pages, 3186 were excluded for various reasons (redirect to another page, no HTML content, error page etc.), which left us with 24,025 pages to perform validation and other kinds of statistics on.

Results

The results of our previous survey were a bit disappointing (to say the least). The number of valid pages had only risen by four and the overall percentage of valid pages had fallen from 2.22% to 2.17%.

Therefore I was quite surprised when I first saw the results of this survey. It looked like a miracle.

I checked everything twice, but it seems to be correct: the share of valid pages in Estonia has risen from 2.17% to 3.02%! That’s a growth of almost one percentage point, or a relative increase of almost 40% in the share of valid pages! Table 2 provides the hard numbers.

Result of validation   Nr of pages   Percentage
Not valid              23,176        96.47%
Valid                  725           3.02%
Tentatively valid      124           0.52%

Table 2. The results of validating all the pages.
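The percentages in table 2, and the difference between a change in percentage points and a relative change, can be reproduced as follows. This is a minimal sketch with names of my own choosing; the counts come from table 2 and the previous share of 2.17% from the text above.

```python
# Validation results, copied from Table 2.
results = {"Not valid": 23176, "Valid": 725, "Tentatively valid": 124}

def shares(counts):
    """Return each count as a percentage of the total, rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(100 * count / total, 2) for name, count in counts.items()}

previous_share = 2.17  # valid pages in August 2005, in percent
current_share = shares(results)["Valid"]

# Absolute change in percentage points versus relative change in percent.
point_change = round(current_share - previous_share, 2)
relative_change = round(100 * (current_share - previous_share) / previous_share, 1)

print(shares(results))  # {'Not valid': 96.47, 'Valid': 3.02, 'Tentatively valid': 0.52}
print(point_change)     # 0.85
print(relative_change)  # 39.2
```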

Doctypes

Previously 34.98% of invalid pages specified a doctype; now the figure has risen to 35.54%. A minor change, but a change nonetheless.

The distribution of document types used in invalid and valid pages can be seen in figures 1 and 2 below.

Invalid pages
Document type            Number of pages   Change
HTML 4.01 Transitional   4056              +865
HTML 4.0 Transitional    1691              +206
XHTML 1.0 Transitional   1364              +662
HTML 3.2                 405               +193
HTML 4.01 Frameset       245               +23
XHTML 1.0 Strict         146               +63
XHTML 1.1                85                +30

Figure 1. Changes in the use of doctypes on invalid pages. An increase is marked with green and a decrease with red (brighter than the green). All the other figures on this page use the same colors with the same meaning.

Valid pages
Document type            Number of pages   Change
HTML 4.01 Transitional   336               +104
XHTML 1.0 Transitional   232               +122
XHTML 1.0 Strict         57                +26
XHTML 1.1                36                +19
HTML 4.0 Transitional    32                +8
HTML 4.01 Frameset       11                +3

Figure 2. Changes in the use of doctypes on valid pages.

The most significant change to note from both figures above: the usage of XHTML 1.0 Transitional has almost doubled. XHTML 1.0 Strict and XHTML 1.1 have also grown substantially, by roughly 70 to 80%. This is certainly a sign that XHTML is on its way to becoming more popular than HTML (especially when creating new web pages).
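How large the XHTML growth really is can be worked out from the figures, since the previous count is simply the current count minus the change. The sketch below is an illustration only; the names are my own and the numbers are copied from figures 1 and 2, with invalid and valid pages added together.

```python
# (current count, change since August 2005) on invalid and valid pages,
# copied from figures 1 and 2.
xhtml_counts = {
    "XHTML 1.0 Transitional": [(1364, 662), (232, 122)],
    "XHTML 1.0 Strict": [(146, 63), (57, 26)],
    "XHTML 1.1": [(85, 30), (36, 19)],
}

def growth_factor(rows):
    """Return current total divided by previous total, rounded to two decimals."""
    current = sum(count for count, _ in rows)
    previous = sum(count - change for count, change in rows)
    return round(current / previous, 2)

for doctype, rows in xhtml_counts.items():
    print(doctype, growth_factor(rows))
# XHTML 1.0 Transitional 1.97
# XHTML 1.0 Strict 1.78
# XHTML 1.1 1.68
```

A factor of 2.0 would mean an exact doubling.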

On the downside, XHTML 1.1 (according to W3C) should not be served as text/html, yet none of the valid pages that used XHTML 1.1 served it as an XHTML or XML application.
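Checking how a page is served comes down to looking at the media type in its Content-Type header. The following is a rough sketch, not the code used in the survey; the function names are hypothetical, and I assume that application/xhtml+xml, application/xml and text/xml are the media types that count as serving a page as an XHTML or XML application.

```python
# Media types that count as serving a page as XHTML/XML (my assumption).
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}

def media_type(content_type_header):
    """Extract the bare media type from a Content-Type header value."""
    return content_type_header.split(";", 1)[0].strip().lower()

def served_as_xml(content_type_header):
    """Return True if the header declares one of the XML media types."""
    return media_type(content_type_header) in XML_TYPES

print(served_as_xml("text/html; charset=utf-8"))  # False
print(served_as_xml("application/xhtml+xml"))     # True
```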

However, the sharp rise of HTML 3.2 is probably the result of some mistake in my data-gathering process: last time HTML 3.2 fell sharply, and now it has somehow risen again. Even so, there is still less HTML 3.2 now than there was in the first survey.

Buggy doctypes

The survey discovered a lot of mistakes in the spelling of the doctypes’ Formal Public Identifiers (FPI). This suggests that many developers don’t really know what a doctype is. Here are the most common mistakes.

First of all, an FPI is case-sensitive. A lot of developers don’t seem to know that. Many pages had doctypes written entirely or partly in lowercase (or partly in uppercase where mixed case is required). This is wrong. The bad examples follow:

-//w3c//dtd html 4.0 transitional//en 22
-//W3C//Dtd XHTML 1.1//EN
-//W3C//DTD HTML 3.2 FINAL//EN
-//W3C//DTD html 4.01 Transitional//EN
-//w3c//dtd html 4.0//en
-//W3C//DTD XHTML 1.0 strict//EN
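A case mistake of this kind can be detected by comparing the FPI with a list of correct identifiers, first exactly and then ignoring case. This is a minimal sketch: the function name is hypothetical and the list is only a partial selection of W3C identifiers.

```python
# A partial list of correctly spelled W3C FPIs.
KNOWN_FPIS = [
    "-//W3C//DTD HTML 3.2 Final//EN",
    "-//W3C//DTD HTML 4.0//EN",
    "-//W3C//DTD HTML 4.0 Transitional//EN",
    "-//W3C//DTD HTML 4.01//EN",
    "-//W3C//DTD HTML 4.01 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.1//EN",
]

def case_error(fpi):
    """Return the correctly cased FPI if `fpi` differs from it only by case, else None."""
    for known in KNOWN_FPIS:
        if fpi != known and fpi.lower() == known.lower():
            return known
    return None

print(case_error("-//w3c//dtd html 4.0 transitional//en"))
# -//W3C//DTD HTML 4.0 Transitional//EN
print(case_error("-//W3C//DTD HTML 4.0 Transitional//EN"))
# None
```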

Secondly, the language identifier at the end of an FPI refers to the language the Document Type Definition itself is written in. All the W3C DTDs are written in English, which is why there is "EN" at the end. But a lot of developers thought this might be the place to specify the language of the HTML document. Wrong again. Bad examples follow:

-//W3C//DTD HTML 4.0 Transitional//ET
-//W3C//DTD HTML 4.01 Transitional//ET
-//W3C//DTD XHTML 1.0 Transitional//ET
-//W3C//DTD HTML 4.0 Transitional//EE
-//W3C//DTD XHTML 1.1//EE
-//W3C//DTD HTML 4.0 Transitional//RU
-//W3C//DTD XHTML 1.0 Transitional//RU
-//W3C//DTD HTML 4.0 Transitional//LT
-//W3C//DTD HTML 4.0 Transitional//LV
-//W3C//DTD XHTML 1.0 Strict//NO
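The language mistake is easy to spot once the FPI is split into its fields, which are separated by a double slash. The sketch below is an illustration only, with hypothetical function names; it assumes, as stated above, that every W3C DTD uses the language identifier "EN".

```python
def parse_fpi(fpi):
    """Split an FPI into registration indicator, owner, description and language."""
    parts = fpi.split("//")
    if len(parts) != 4:
        raise ValueError("not a four-field FPI: %r" % fpi)
    registration, owner, description, language = parts
    return {
        "registration": registration,
        "owner": owner,
        "description": description,
        "language": language,
    }

def language_error(fpi):
    """Return True if a W3C FPI carries a language identifier other than EN."""
    fields = parse_fpi(fpi)
    return fields["owner"] == "W3C" and fields["language"] != "EN"

print(parse_fpi("-//W3C//DTD XHTML 1.0 Transitional//ET")["language"])  # ET
print(language_error("-//W3C//DTD XHTML 1.0 Transitional//ET"))         # True
print(language_error("-//W3C//DTD XHTML 1.0 Transitional//EN"))         # False
```

The language of the page itself is declared elsewhere, for example with the lang attribute on the html element.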

And lastly there were the misspelled FPIs. It is hard to say why some people write something like "XHTML 1.1 Transitional" without checking whether a doctype like that even exists. More bad examples follow:

-//W3C//DTD HTML 5.0 Transitional//EN
-//W3C//DTD HTML 4.1 Transitional//EN
-//W3C//DTD HTML 4.01 Strict//EN
-//W3C//DTD HTML 3.02 Transitional//EN
-//W3C//DTD XHTML 1.1 Transitional//EN
-//W3C//DTD XHTML 1.0//EN
-//W3C//DTD HTM 4.0 Transitional//EN
-//W3C//DTD W3 HTML//EN
-//W3C//DTD 3.2//EN
-//W3C//DTD Transitional 1.0 Strict//EN
-//WC3//DTD HTML 4.0 Transitional//EN
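For misspellings like these, a tool could suggest the nearest correct identifier. The following is a rough sketch using fuzzy matching from Python's standard difflib module; it is not something the survey itself did, the function name is hypothetical, and the list of correct FPIs is only partial.

```python
import difflib

# A partial list of correctly spelled W3C FPIs.
KNOWN_FPIS = [
    "-//W3C//DTD HTML 3.2 Final//EN",
    "-//W3C//DTD HTML 4.0//EN",
    "-//W3C//DTD HTML 4.0 Transitional//EN",
    "-//W3C//DTD HTML 4.01//EN",
    "-//W3C//DTD HTML 4.01 Transitional//EN",
    "-//W3C//DTD HTML 4.01 Frameset//EN",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.1//EN",
]

def suggest_fpi(fpi):
    """Return None for a known FPI, otherwise the closest known FPI (or None if nothing is close)."""
    if fpi in KNOWN_FPIS:
        return None
    matches = difflib.get_close_matches(fpi, KNOWN_FPIS, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(suggest_fpi("-//WC3//DTD HTML 4.0 Transitional//EN"))
# -//W3C//DTD HTML 4.0 Transitional//EN
```

A suggestion is only a guess: for an identifier such as "HTML 4.1 Transitional" it cannot be known whether the author meant 4.0 or 4.01.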

Encodings

The share of pages that specify an encoding has risen from 72% to 73%.

As seen from figure 3, UTF-8 has made the most significant rise and taken second place. ISO 8859-13 has risen from 10th place to 8th.

Encodings
Encoding       Number of pages   Change
iso-8859-1     11019             +1441
utf-8          2124              +739
windows-1257   1867              +268
windows-1251   1030              +159
windows-1252   594               +80
iso-8859-4     264               +46
iso-8859-15    212               +47
iso-8859-13    88                +72
windows-1250   73                +5
iso-8859-2     63                +3
koi8-r         26                +12
us-ascii       18                +3
iso8859-1      13                +3
utf8           11                ?

Figure 3. Changes in the popularity of encodings. As almost half of the pages use iso-8859-1, the number-of-pages axis has been cut.

The misspelled "iso8859-1" still holds 13th place, but with the rise of UTF-8, 14th place has been taken by another misspelling: "utf8".

Other popular misspellings (used more than once):

iso8859-15
unicode
win-1251
_autodetect_all
et-iso-8859-1
iso-8851-15
iso-8859
latin-1
windows 1251
windows1251
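A checker could flag such labels and, where the intent is unambiguous, propose the correct spelling. This is a minimal sketch under my own assumptions: the function name is hypothetical, the set of known labels is limited to the correctly spelled ones from figure 3, and the correction table is hand-made for illustration.

```python
# Correctly spelled labels that appear in figure 3.
KNOWN_LABELS = {
    "iso-8859-1", "iso-8859-2", "iso-8859-4", "iso-8859-13", "iso-8859-15",
    "utf-8", "us-ascii", "koi8-r",
    "windows-1250", "windows-1251", "windows-1252", "windows-1257",
}

# Hypothetical hand-made corrections for misspellings whose intent is unambiguous.
CORRECTIONS = {
    "utf8": "utf-8",
    "iso8859-1": "iso-8859-1",
    "iso8859-15": "iso-8859-15",
    "latin-1": "iso-8859-1",
    "win-1251": "windows-1251",
    "windows 1251": "windows-1251",
    "windows1251": "windows-1251",
}

def check_label(label):
    """Return (is_known, suggestion) for a declared encoding label."""
    normalized = label.strip().lower()  # charset names are case-insensitive
    if normalized in KNOWN_LABELS:
        return True, normalized
    return False, CORRECTIONS.get(normalized)

print(check_label("UTF-8"))    # (True, 'utf-8')
print(check_label("utf8"))     # (False, 'utf-8')
print(check_label("unicode"))  # (False, None)
```

Labels such as "unicode" or "iso-8859" get no suggestion, because it is not clear which encoding the author had in mind.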

Summary

There seems to be only good news this time:

Let’s hope the progress continues...

Written on 11 March 2006.

