W3C Validation, who cares, Google? Yahoo? MSN?
Experiment started 3/14/2007
Experiment completed 3/16/2007
Much sooner than I expected or could even have hoped, this page has not only been indexed by Google but also shows up prominently in search results for terms and phrases found in various parts of this page.
It should be noted, though, that this does not prove that Google overlooks any and all possible validation errors. Rather, it shows that some errors which some thought fatal for being indexed in Google may not be so fatal after all.
What am I trying to prove, and how?
This page is an experiment to test the various search engines' ability to spider a page with errors and still extract useful information from it, leading not only to the page being indexed but also to it showing up in search results just as it would were it to validate flawlessly.
To that end, this page implements probably the most broken code possible: no doctype, no html, head or body tags, and some meta tags closed using HTML style, some XHTML style and some not "closed" at all!
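For anyone wondering what a forgiving parser makes of markup like that, here is a minimal sketch in Python using the standard library's html.parser module. The sample markup, the PageSummary class and the summarize function are made up for illustration. The sample is not this page's actual source (the keywords and author values in particular are invented, and the three meta closing styles are only my guess at the kind of mix described), and nothing here claims to be how Google, Yahoo or MSN parse pages. It only shows that a lenient parser can still pick out the title, the meta tags and the text when the doctype, html, head and body tags are all missing.

```python
from html.parser import HTMLParser

# Hypothetical markup in the spirit of the page described above: no doctype,
# no html, head or body tags, and meta tags closed in three different ways.
BROKEN_PAGE = """
<title>CASS-Hacks - Does Google care about validation?</title>
<meta name="description" content="This page in effect is a test of whether or not a page with broken HTML/XHTML structure can get indexed.">
<meta name="keywords" content="validation, google" />
<meta name="author" content="example"></meta>
<p>Compelling content is more important than validation.</p>
"""


class PageSummary(HTMLParser):
    """Collects the title, named meta tags and visible text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.text = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes:
                self.meta[attributes["name"]] = attributes.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif data.strip():
            # Keep only non-blank runs of text outside the title.
            self.text.append(data.strip())


def summarize(markup):
    """Parse markup leniently and return the filled-in PageSummary."""
    parser = PageSummary()
    parser.feed(markup)
    parser.close()
    return parser


if __name__ == "__main__":
    summary = summarize(BROKEN_PAGE)
    print("title:", summary.title)
    print("description:", summary.meta.get("description"))
    print("text:", summary.text)
```

The parser never asks where the head ends or the body begins. It simply reports tags and text as it meets them, which is all a title and description lookup needs.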
Although one can also "break" validation by not closing various body element tags (<p>, <div>, <table> etc.), in most cases those errors would be noticeable in one's browser and show up as a visually broken page. Since it is unlikely even a beginner webmaster would allow a page to stand like that, this page breaks the code in ways that still allow the page to look and function as normal.
In fact, the CSS works and, even more so, the javascript used to expand the images works as well. Unless one were to attempt to validate the page using the W3C's validation tool or the HTMLTidy Firefox plugin, or simply view the page source looking for the missing tags/elements, one would have no idea the page was broken in the first place.
And, as it can be expected to take some time, up to a month or even two, for this page to be crawled and actually indexed, the final test will be whether at some point in the future this page actually becomes indexed, as well as the extent to which it shows up in relevant keyword searches.
Who cares about search engines and validation anyway?
Many people, maybe too many, seem convinced that validation of a page or, maybe more accurately, the general well-formedness of a web site's pages is important for search engines to be able to crawl and catalog page content properly.
On the other hand, the fact that Matt Cutts has stated in his Should You Optimize for Search Engines or Users & Code Validation video that
“Validation is not necessary. Compelling content is more important than validation. But having valid code may be another reason some other webmasters may be willing to link at your site.” would seem to suggest that people looking to validation to cure a given page's woes are more likely grasping at straws than operating under a verifiable theory, let alone a fact.
One can somewhat understand why people might think validation is important, the thinking being that crawlers can not know what is what unless pages are constructed correctly. Then again, besides the fact that the big three search engine operators don't seem able, or don't care, to make their own web sites validate, getting a browser to screw up a page due to validation errors alone is difficult.
And, if search engines are concerned that search users see the same thing the search engines themselves see, so that cloaking is less likely to take place, how likely is it that search crawlers will be any less fault tolerant than the browsers in which their results will likely be displayed?
Who cares what some people think?
So what? Some people have an opinion that an expert on the subject would seem to disagree with. Everyone is entitled to their opinion, so no big deal, right?
Were it the case that people kept their opinions to themselves, there wouldn't be a problem. But when you add in the "blog factor", where anyone who can type their name can all of a sudden become an instant expert on anything, there needs to be some balance somewhere. If people won't listen to wisdom from the horse's mouth, so to speak, i.e. Matt Cutts, hopefully this experiment might do what expert opinion and knowledge, as well as logic, have so far failed to do.
That is, if this experiment goes as I expect it to and this page does end up indexed and returned in the search engine results in which it should be expected to appear.
Anyone want to lay odds on this page getting indexed?
Parsing stops on error. Really?
One of the more interesting claims I have heard regarding why validation is important is that search engine spiders supposedly stop parsing a page as soon as they run into an "error".
Exactly what type of error is supposed to cause this major catastrophic meltdown is often explained with much hand waving and hemming and hawing, but the very idea of a web crawler stopping as soon as it hits an error is pretty ridiculous no matter how one looks at it.
If it were true that parsing stops on errors, how is it that the W3C validation tool seems more than happy to catalog numerous errors on a given page long after it has run into what could be considered serious errors?
The W3C validation tool fails to list all the errors on this page, though, as it only notices a missing doctype. But for most pages where it has been claimed that specific validation errors would cause search engines to halt parsing, the W3C validation tool seemed to have no problem continuing to parse and even identifying numerous errors after the "parsing killer" errors had been run into.
Even beyond one rather famous parsing tool being able to continue parsing after running into errors, the very fact that you are not only seeing this page at all but effectively viewing it as if there were no errors shows that browsers seem to have no problem parsing beyond what could or should be considered quite serious errors.
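To make the "parsing carries on after an error" point concrete, here is a small sketch, again in Python with the standard library's html.parser. The faulty fragment, the TextCollector class and the visible_text function are all hypothetical, and this is one lenient parser among many, not a stand-in for any search engine's spider. It just shows that text sitting after a stray closing tag, an unclosed div and mis-nested inline tags is still collected.

```python
from html.parser import HTMLParser

# Hypothetical fragment with several problems: a stray closing tag, an
# unclosed <div>, an unquoted attribute value and mis-nested inline tags.
FAULTY = """
</table>
<div class=box>
<p>Text before the worst of it.
<b><i>Mis-nested</b></i>
<p>Text after the errors.
"""


class TextCollector(HTMLParser):
    """Collects every non-blank run of text, ignoring tag structure."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def visible_text(markup):
    """Return the text runs a lenient parser finds in markup, in order."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    for chunk in visible_text(FAULTY):
        print(chunk)
```

Nothing in the fragment makes the parser give up, so the last paragraph is reported along with the first.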
But, browsers and validation tools are not search engine spiders!
True, but since browsers obviously don't bail out on errors, as this page is partial proof of, wouldn't it be great if search engine crawlers actually did?
One could then forget about having to use cloaking techniques to hide content from the spiders because all one would have to do is put a big fat error on the page and then put all the things one would normally cloak after the error and search engines wouldn't be the wiser.
So, how much sense would it make for search engine spiders to not be able to catalog everything the average browser can see?
An incorrect Doctype is bad!
If so, how about NO doctype? Besides this page, go through any search results from any search engine and see how many pages even have a doctype at all, let alone a valid one.
The Firefox plugin HTMLTidy can tell one what the declared doctype of a given page is and what the page appears to be coded to. So if a little plugin can do that, are we to believe that Google, Yahoo, MSN and all the others are unable to do what a simple little browser plugin can do?
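Reading the declared doctype, at least, takes very little effort. The sketch below uses Python's standard html.parser, whose handle_decl hook receives the contents of a doctype declaration. The DoctypeSniffer class and the declared_doctype function are hypothetical names, and this is not how HTMLTidy or any search engine does it. It also only covers the first half of what the plugin reports: it finds the declared doctype, or notices there is none, and makes no attempt to work out what the page appears to be coded to.

```python
from html.parser import HTMLParser


class DoctypeSniffer(HTMLParser):
    """Remembers the first doctype declaration seen, if any."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl holds the text between "<!" and ">", e.g. "DOCTYPE html".
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def declared_doctype(markup):
    """Return the declared doctype text, or None when the page has none."""
    sniffer = DoctypeSniffer()
    sniffer.feed(markup)
    sniffer.close()
    return sniffer.doctype


if __name__ == "__main__":
    strict = (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd"><p>Declared.</p>'
    )
    print(declared_doctype(strict))
    print(declared_doctype("<p>No doctype at all.</p>"))
```

A page without a doctype simply comes back as None, and the rest of the page is still there to be parsed.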
After one is done finding all the non-doctype declared pages showing up in search engine results, go through and start cataloging all the pages with incorrect and/or incomplete doctypes.
Remember while you are doing this though that the pages you are looking at are actually indexed and more importantly, show up in search results.
Break me baby!!
Although I can't think of a more broken page than one without the most important of tags, if anyone can think of another way to break the page, while still allowing it to render correctly, feel free to use the contact system to the left of this page to suggest it.
Some might suggest that having the missing tags present but broken somehow might be a good test. It would take some convincing that it would be useful, because the most often quoted reason why broken tags are a problem is that without them opening and closing properly, a given parser can not know where one element starts and another element ends. If the tags are missing altogether, the same effect is achieved, since one can't know where a given element starts and ends when those elements don't exist at all.
How to know the results.
By using the search string "site:cass-hacks.com", minus the quotes, in any search engine, one will see a list of all the pages indexed from this site. Both normal and supplemental pages will be shown. If the page shows up in the normal index, the experiment should be considered at least a partial success. ED note: Since Google decided to remove the "Supplemental Result" from index listings, the previous is no longer a valid test, but a search on the phrase "validation important Google" will list this page as one of the first few results returned.
If the title assigned to the page in the SERPs is "CASS-Hacks - Does Google care about validation?", then it is even more of a success as the search engine was able to find the title at least.
If the description assigned to the page is "This page in effect is a test of whether or not a page with broken HTML/XHTML structure can get indexed.", even better as the description could be found.
But most important of all, if the contents of the page itself then show up in search results for various terms or phrases found in the body, it should be considered essentially a total success!
If it is actually a total success, as I predict, will that stop people from making their unsubstantiated claims? Stupid question!
Some Javascript driven DHTML executed by a Javascript file included in the "head"
Below is an exact copy of a demonstration of a DHTML technique documented and described in a Javascript "thumbnail to full size display" article elsewhere on this site that can be accessed from the content navigation menu to the left.
It is included to show that even files normally loaded in the head section of a given page, of which the CSS stylesheets are also an example, function properly even though there is actually no head section nor even a body section.
One might also notice that the site's contact form to the left, which is CSS driven, also seems to have no problems even though its behaviour is defined in a style sheet referenced in the non-existent head section.
What may be more amusing is that virtually all of the scripts used on this page don't execute until the body's onload event occurs, yet there is no body!
Please note that while this page works in Firefox, Epiphany, Konqueror, Lynx, Galeon and even Safari, Internet Explorer, as expected, doesn't do so well parsing and rendering this page. But is it any wonder?








