This survey continues the work done in the previous surveys of February 2005 and August 2005. The recent survey Validating sites of W3C members (which uses the same methodology) might also be of interest to you.
All the figures on this web page are also represented as tables. If you want to see the tables, turn off the display of images in your browser or view the page with a text-only browser. If you only see the tables, or see neither tables nor figures, then your browser probably does not correctly support the object element.
I acquired the set of Estonian web pages the same way as in the previous surveys: they’re all taken from the Neti.ee list of Estonian servers.
Compared to the previous survey (done in August 2005), there were 1290 more pages this time.
Survey | Nr of pages | Growth |
---|---|---|
February 2005 | 21,905 | |
August 2005 | 22,735 | 830 |
March 2006 | 27,211 | 4476 |
Table 1. The number of pages included in each of the surveys of Estonian web pages.
Looking at table 1, we see significant growth in the number of pages used in the survey. From February to August (6 months) 830 pages were added to the list, but from August to March (7 months) 4476 pages were added. That’s quite a difference! It is probably because winter is the active time for doing business (and also, it seems, for buying domain names).
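To make the comparison fairer, the growth can be averaged per month. A small sketch that uses only the figures from table 1 and the period lengths mentioned above; the variable and function names are my own:

```python
# Page counts from table 1, with the months elapsed since the previous survey.
surveys = [
    ("February 2005", 21905, None),
    ("August 2005", 22735, 6),
    ("March 2006", 27211, 7),
]

def growth_per_month(surveys):
    """Return (survey name, absolute growth, average growth per month)."""
    rows = []
    for (_, prev_pages, _), (name, pages, months) in zip(surveys, surveys[1:]):
        growth = pages - prev_pages
        rows.append((name, growth, growth / months))
    return rows

for name, growth, monthly in growth_per_month(surveys):
    print(f"{name}: +{growth} pages, about {monthly:.0f} per month")
```

That works out to roughly 138 new pages per month in the first period against roughly 639 per month in the second.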
From the full set of 27,211 pages, 3186 were excluded for various reasons (redirect to another page, no HTML content, error page etc.), which left us with 24,025 pages to perform validation and other kinds of statistics on.
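How such exclusions might be decided can be sketched in a few lines. This is an illustration only: the function name, its parameters and the exact checks are assumptions, not the script used for the survey.

```python
from typing import Optional

# Media types treated as HTML content in this sketch (an assumption).
HTML_MEDIA_TYPES = ("text/html", "application/xhtml+xml")

def exclusion_reason(status: int, content_type: Optional[str],
                     requested_url: str, final_url: str) -> Optional[str]:
    """Return why a fetched page would be excluded, or None to keep it."""
    if status >= 400:
        return "error page"
    # Ignore a trailing slash when comparing the two URLs.
    if final_url.rstrip("/") != requested_url.rstrip("/"):
        return "redirect to another page"
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type not in HTML_MEDIA_TYPES:
        return "no HTML content"
    return None
```

A page is kept only when the function returns None, for example for a 200 response of type text/html whose final URL matches the requested one.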
The results of our previous survey were a bit disappointing (to say the least). The number of valid pages had only risen by four and the overall percentage of valid pages had fallen from 2.22% to 2.17%.
Therefore I was quite surprised when I first saw the results of this survey. It looked like a miracle.
I checked everything twice, but it seems to be correct: the share of valid pages in Estonia has risen from 2.17% to 3.02%! That’s almost one percentage point of growth! In relative terms, that’s almost 40% more valid pages! Table 2 below provides the hard numbers.
Result of validation | Nr of pages | Percentage |
---|---|---|
Not valid | 23,176 | 96.47% |
Valid | 725 | 3.02% |
Tentatively valid | 124 | 0.52% |
Table 2. The results of validating all the pages.
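The percentages in table 2, and the relative growth of the valid share, can be rechecked with a few lines of arithmetic. A minimal sketch using only the counts from table 2 and the two percentages quoted above; the names are my own:

```python
# Page counts from table 2.
results = {"Not valid": 23176, "Valid": 725, "Tentatively valid": 124}

total = sum(results.values())

def share(count, total):
    """Percentage of `count` in `total`, rounded to two decimals."""
    return round(100 * count / total, 2)

for label, count in results.items():
    print(f"{label}: {count} pages, {share(count, total)}%")

# Relative growth of the valid share, from 2.17% (August 2005) to 3.02% now.
relative_growth = (3.02 - 2.17) / 2.17 * 100
print(f"Valid share grew by {3.02 - 2.17:.2f} percentage points, "
      f"or about {relative_growth:.0f}% in relative terms")
```

The absolute change is 0.85 percentage points; the relative change of the share is about 39%.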
Previously 34.98% of invalid pages had a doctype specified; now the figure has risen to 35.54%. A minor change, but a change nonetheless.
The distribution of document types used in invalid and valid pages can be seen in figures 1 and 2 below.
Figure 1. Changes in the use of doctypes on invalid pages. An increase is marked with green and a decrease with red (brighter than green). All the other figures on this page use the same colors with exactly the same meaning.
Figure 2. Changes in the use of doctypes on valid pages.
The most significant change to note from both figures above: the usage of XHTML 1.0 Transitional has almost doubled. XHTML 1.0 Strict and XHTML 1.1 have also nearly doubled their usage. This is certainly a sign that XHTML is on its way to becoming more popular than HTML (especially for new web pages).
On the bad side there is XHTML 1.1, which (according to the W3C) should not be served as text/html. Yet none of the valid pages that used XHTML 1.1 served it as an XHTML or XML application.
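A check of this kind only needs the doctype and the Content-Type header of each page. A minimal sketch; the function names and the set of media types counted as "an XHTML or XML application" are my assumptions:

```python
# Media types counted as XHTML/XML applications in this sketch (an assumption).
XML_MEDIA_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}

XHTML11_FPI = "-//W3C//DTD XHTML 1.1//EN"

def media_type(content_type_header):
    """Extract the bare media type from a Content-Type header value."""
    return content_type_header.split(";")[0].strip().lower()

def xhtml11_served_as_xml(doctype_fpi, content_type_header):
    """Return True or False for XHTML 1.1 pages, None for other doctypes."""
    if doctype_fpi != XHTML11_FPI:
        return None
    return media_type(content_type_header) in XML_MEDIA_TYPES

# An XHTML 1.1 page sent with the usual HTML header fails the check.
print(xhtml11_served_as_xml(XHTML11_FPI, "text/html; charset=utf-8"))
```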
However, the big rise of HTML 3.2 is probably due to some kind of mistake in my data-gathering process: last time HTML 3.2 fell sharply, and now it has somehow risen again. Even so, there is still less HTML 3.2 now than there was in the first survey.
The survey discovered a lot of mistakes in the spelling of the doctypes’ Formal Public Identifiers (FPI). Clearly this indicates that most developers don’t really know what a doctype is. Here are the most common mistakes.
First of all, an FPI is case-sensitive. A lot of developers don’t seem to know that: many pages had doctypes written entirely or partly in lowercase (or entirely in uppercase). This is wrong. Bad examples follow:
- `-//w3c//dtd html 4.0 transitional//en`
- `-//W3C//Dtd XHTML 1.1//EN`
- `-//W3C//DTD HTML 3.2 FINAL//EN`
- `-//W3C//DTD html 4.01 Transitional//EN`
- `-//w3c//dtd html 4.0//en`
- `-//W3C//DTD XHTML 1.0 strict//EN`
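Case mistakes like these are easy to detect by comparing each FPI with a list of correct ones, first exactly and then ignoring case. A minimal sketch; the list below is deliberately short and the function name is my own:

```python
# A few correct FPIs from the W3C recommendations (not an exhaustive list).
KNOWN_FPIS = [
    "-//W3C//DTD HTML 3.2 Final//EN",
    "-//W3C//DTD HTML 4.0//EN",
    "-//W3C//DTD HTML 4.0 Transitional//EN",
    "-//W3C//DTD HTML 4.01//EN",
    "-//W3C//DTD HTML 4.01 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.1//EN",
]

def check_case(fpi):
    """Classify an FPI as 'ok', a case mistake, or 'unknown'."""
    if fpi in KNOWN_FPIS:
        return "ok"
    for known in KNOWN_FPIS:
        # Same letters, different case: the typical mistake described above.
        if fpi.lower() == known.lower():
            return f"wrong case, should be {known!r}"
    return "unknown"

print(check_case("-//w3c//dtd html 4.0 transitional//en"))
```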
Secondly, the language identifier at the end of an FPI refers to the language the Document Type Definition itself is written in. All the W3C DTDs are written in English, which is why there’s "EN" at the end. But a lot of developers thought this might be the place to specify the language of the HTML document. Wrong again. Bad examples follow:
- `-//W3C//DTD HTML 4.0 Transitional//ET`
- `-//W3C//DTD HTML 4.01 Transitional//ET`
- `-//W3C//DTD XHTML 1.0 Transitional//ET`
- `-//W3C//DTD HTML 4.0 Transitional//EE`
- `-//W3C//DTD XHTML 1.1//EE`
- `-//W3C//DTD HTML 4.0 Transitional//RU`
- `-//W3C//DTD XHTML 1.0 Transitional//RU`
- `-//W3C//DTD HTML 4.0 Transitional//LT`
- `-//W3C//DTD HTML 4.0 Transitional//LV`
- `-//W3C//DTD XHTML 1.0 Strict//NO`
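The language field is simply the part of the FPI after the last "//", so this mistake can be spotted mechanically. A minimal sketch; the function names are my own:

```python
def fpi_language(fpi):
    """Return the language field of an FPI (the part after the last '//')."""
    return fpi.rsplit("//", 1)[-1]

def has_wrong_language(fpi):
    """W3C DTDs are written in English, so anything but 'EN' is a mistake."""
    return fpi.startswith("-//W3C//DTD ") and fpi_language(fpi) != "EN"

for fpi in ("-//W3C//DTD HTML 4.0 Transitional//ET",
            "-//W3C//DTD HTML 4.0 Transitional//EN"):
    print(fpi, "->", "wrong language" if has_wrong_language(fpi) else "fine")
```

The language of the page itself belongs elsewhere, for example in the lang attribute of the html element.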
And lastly there were the misspelled FPIs. It is hard to say why some people write something like "XHTML 1.1 Transitional" without checking whether a doctype like that even exists. More bad examples follow:
- `-//W3C//DTD HTML 5.0 Transitional//EN`
- `-//W3C//DTD HTML 4.1 Transitional//EN`
- `-//W3C//DTD HTML 4.01 Strict//EN`
- `-//W3C//DTD HTML 3.02 Transitional//EN`
- `-//W3C//DTD XHTML 1.1 Transitional//EN`
- `-//W3C//DTD XHTML 1.0//EN`
- `-//W3C//DTD HTM 4.0 Transitional//EN`
- `-//W3C//DTD W3 HTML//EN`
- `-//W3C//DTD 3.2//EN`
- `-//W3C//DTD Transitional 1.0 Strict//EN`
- `-//WC3//DTD HTML 4.0 Transitional//EN`
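For simple typos, a fuzzy comparison against the correct FPIs can suggest what the author probably meant. A minimal sketch using Python's difflib; the short list of FPIs, the function name and the 0.8 cutoff are my own choices, and such a guess can of course be wrong for doctypes that never existed:

```python
import difflib

# A few correct FPIs from the W3C recommendations (not an exhaustive list).
KNOWN_FPIS = [
    "-//W3C//DTD HTML 3.2 Final//EN",
    "-//W3C//DTD HTML 4.0//EN",
    "-//W3C//DTD HTML 4.0 Transitional//EN",
    "-//W3C//DTD HTML 4.01//EN",
    "-//W3C//DTD HTML 4.01 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.1//EN",
]

def suggest_fpi(fpi):
    """Return the most similar known FPI, or None if nothing is close enough."""
    matches = difflib.get_close_matches(fpi, KNOWN_FPIS, n=1, cutoff=0.8)
    return matches[0] if matches else None

for bad in ("-//WC3//DTD HTML 4.0 Transitional//EN",
            "-//W3C//DTD HTM 4.0 Transitional//EN"):
    print(bad, "->", suggest_fpi(bad))
```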
The share of pages that specify an encoding has risen from 72% to 73%.
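Whether a page specifies an encoding can be decided from its Content-Type header or from a meta tag near the top of the document. A rough sketch of such a check; the function name, the regular expression and the 2048-character limit are assumptions, and for simplicity it accepts any "charset=" in the first part of the page rather than parsing the meta tag properly:

```python
import re

# Matches charset=VALUE, with or without quotes around the value.
CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([^\s\"';>]+)", re.IGNORECASE)

def declared_encoding(content_type_header, html):
    """Return the declared encoding (lowercased), or None if none is given.

    The HTTP header is checked first, then the beginning of the document.
    """
    for source in (content_type_header or "", html[:2048]):
        match = CHARSET_RE.search(source)
        if match:
            return match.group(1).lower()
    return None

page = '<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'
print(declared_encoding("text/html", page))
```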
As seen in figure 3, UTF-8 has made the most significant rise and taken second place. ISO 8859-13 has risen from 10th place to 8th.
Figure 3. Changes in the popularity of encodings. As almost half of the pages use iso-8859-1, the number-of-pages axis has been cut.
The misspelled "iso8859-15" still holds 13th place, but with the rise of UTF-8, 14th place has been taken by another misspelling: "utf8".
Other popular misspellings (used more than once):
- `iso8859-15`
- `unicode`
- `win-1251`
- `_autodetect_all`
- `et-iso-8859-1`
- `iso-8851-15`
- `iso-8859`
- `latin-1`
- `windows 1251`
- `windows1251`
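Many of these misspellings differ from a real encoding name only in punctuation, so they can be mapped back by comparing the labels with everything except letters and digits stripped out. A minimal sketch; the short list of names and the function names are my own, and the real IANA registry is much longer:

```python
import re

# A few registered charset names (not an exhaustive list).
CANONICAL = [
    "iso-8859-1",
    "iso-8859-13",
    "iso-8859-15",
    "utf-8",
    "windows-1251",
    "windows-1257",
]

def squash(label):
    """Lowercase a label and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())

BY_SQUASHED = {squash(name): name for name in CANONICAL}

def probable_encoding(label):
    """Return the name a misspelled label most likely meant, or None."""
    return BY_SQUASHED.get(squash(label))

for label in ("iso8859-15", "utf8", "windows 1251", "iso-8851-15", "unicode"):
    print(repr(label), "->", probable_encoding(label))
```

Labels such as "iso-8851-15" or "unicode" stay unresolved, because no amount of punctuation fixing turns them into a real name.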
There seems to be only good news this time:
Let’s hope the progress continues...
Written on 11 March 2006.