Validate html

This check validates the html of every page on your site against one of the following standards:


Html validation is important for several reasons:

The html validation engine DeepTrawl uses is provided by the w3c (it's also used to provide validation at and for html 5 validation on the w3c's main validation site). Because it was implemented by experts in the field, it is highly accurate. Unlike other products, DeepTrawl doesn't send your html over the internet for validation, which would create a dependency on 3rd party servers and raise security questions. All validation is done locally.


Because the DeepTrawl html validator is based on industry standards, many of the errors it produces can be searched for online to find a resolution. For example:

For the above error, you can search online for 'html script type attribute' to find out immediately why this page failed validation and how to fix it.

Configuring this check

Select the Settings link next to the check in the Checks tab, or select Settings > Check settings > Validate html.

Html type

This allows you to tell DeepTrawl what type of html you're using. By default, DeepTrawl reads the doctype at the top of each page and uses it to work out which type of html to validate the page as. For example, the doctype...

<!DOCTYPE html>

... will cause the page to be validated as html 5.
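To illustrate the idea of choosing a validation type from the doctype, here is a minimal sketch in Python. It is not DeepTrawl's implementation: the function name `detect_html_type`, the type labels, and the short table of public identifiers are all hypothetical, and only a handful of common doctypes are covered.

```python
import re
from typing import Optional

# Hypothetical table: doctype public identifier -> html type label.
# Only a few well-known W3C doctypes are listed for illustration.
PUBLIC_ID_TYPES = {
    "-//W3C//DTD HTML 4.01//EN": "html 4.01 strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "html 4.01 transitional",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "xhtml 1.0 strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "xhtml 1.0 transitional",
    "-//W3C//DTD XHTML 1.1//EN": "xhtml 1.1",
}

# Matches "<!DOCTYPE html ...>" case-insensitively; group 1 is whatever follows "html".
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+html\b([^>]*)>", re.IGNORECASE)
# Extracts the quoted public identifier after the PUBLIC keyword.
PUBLIC_RE = re.compile(r'PUBLIC\s+"([^"]+)"', re.IGNORECASE)


def detect_html_type(page: str) -> Optional[str]:
    """Return an html type for the page's doctype, or None if it is missing or unrecognized."""
    match = DOCTYPE_RE.search(page)
    if match is None:
        return None  # no doctype at all
    rest = match.group(1).strip()
    if rest == "":
        return "html 5"  # "<!DOCTYPE html>" has no public identifier
    public = PUBLIC_RE.search(rest)
    if public is None:
        return None  # some other doctype form this sketch doesn't handle
    return PUBLIC_ID_TYPES.get(public.group(1))
```

For example, `detect_html_type("<!DOCTYPE html>")` gives `"html 5"`, while a page with no doctype gives `None`.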

Ignore unknown doctype

If selected, doctypes that DeepTrawl doesn't understand will be ignored. By default this option is deselected, and DeepTrawl shows an error for any page that doesn't have a recognized doctype.
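The effect of this option on error reporting can be sketched as follows. This is a hypothetical illustration, not DeepTrawl code: it assumes each page's doctype has already been looked up, with `None` standing for a doctype that wasn't recognized, and the names `unknown_doctype_errors` and `ignore_unknown_doctype` are made up for the example.

```python
from typing import Dict, List, Optional


def unknown_doctype_errors(
    pages: Dict[str, Optional[str]],
    ignore_unknown_doctype: bool = False,
) -> List[str]:
    """Report pages whose doctype wasn't recognized.

    pages maps each page url to its detected html type, or None when the
    doctype is missing or unknown. ignore_unknown_doctype stands in for the
    Ignore unknown doctype option (deselected by default).
    """
    if ignore_unknown_doctype:
        return []  # option selected: no errors for unknown doctypes
    return [f"{url}: unrecognized doctype" for url, html_type in pages.items() if html_type is None]
```

With the default setting, `unknown_doctype_errors({"index.html": "html 5", "old.html": None})` reports only `old.html`. Selecting the option suppresses that error.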

Local html 5 files are html 5 / xhtml 5

In html 5, the doctype doesn't indicate whether a page is xhtml 5 or html 5. When checking pages from the web, DeepTrawl uses the mime type to make this distinction. Local pages have no mime type, so you need to tell DeepTrawl whether you're using vanilla html 5 or the xhtml 5 variant.
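The decision described above can be sketched like this for a page with the `<!DOCTYPE html>` doctype. It is an illustration, not DeepTrawl's code: the function `html5_variant` and its `local_files_are_xhtml5` parameter are hypothetical, and the set of xml mime types reflects the assumption that any xml mime type (not only `application/xhtml+xml`) marks the page as xhtml 5.

```python
from typing import Optional

# Mime types assumed here to mean "parse as xml", i.e. xhtml 5.
XML_MIME_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def html5_variant(content_type: Optional[str], local_files_are_xhtml5: bool = False) -> str:
    """Decide between "html 5" and "xhtml 5" for a page using <!DOCTYPE html>.

    content_type is the page's Content-Type header value, or None for a
    local file, which has no mime type. local_files_are_xhtml5 stands in
    for the user's setting for local files.
    """
    if content_type is None:
        # Local file: no mime type, so fall back to the setting.
        return "xhtml 5" if local_files_are_xhtml5 else "html 5"
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    mime = content_type.split(";", 1)[0].strip().lower()
    return "xhtml 5" if mime in XML_MIME_TYPES else "html 5"
```

A page served as `text/html; charset=utf-8` is treated as html 5, one served as `application/xhtml+xml` as xhtml 5, and a local file follows the setting.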

Ignore mime type schema differences

Many websites give incorrect information in the mime types for their pages. For example, an xhtml 1.0 document should be sent with an http header like this...

content-type: application/xhtml+xml

... but many are sent with text/html instead. Modern web browsers then handle the page as ordinary html for backwards compatibility rather than as xhtml, so the page isn't processed the way its doctype declares.

Historically, html validators have ignored this issue, so DeepTrawl selects the Ignore mime type schema differences option by default for backwards compatibility. For the most thorough validation, select Error on mime type / schema differences instead.
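A mime type / schema check of this kind can be sketched as a comparison between the doctype family and the Content-Type header. This is a hypothetical illustration rather than DeepTrawl's logic: `mime_schema_errors`, its parameters, and the type labels are made up, and it assumes the html type has already been detected from the doctype. The `error_on_mismatch` flag mirrors the choice between the two options, defaulting to ignoring differences.

```python
from typing import List

# Mime types assumed here to be appropriate for xhtml documents.
XML_MIME_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def mime_schema_errors(url: str, html_type: str, content_type: str,
                       error_on_mismatch: bool = False) -> List[str]:
    """Report a page whose doctype family disagrees with its mime type.

    html_type is the type detected from the doctype, e.g. "xhtml 1.0 strict".
    error_on_mismatch=False corresponds to Ignore mime type schema differences;
    True corresponds to Error on mime type / schema differences.
    """
    if not error_on_mismatch:
        return []  # default: differences are ignored
    # Drop parameters such as "; charset=utf-8".
    mime = content_type.split(";", 1)[0].strip().lower()
    doc_is_xml = html_type.startswith("xhtml")
    mime_is_xml = mime in XML_MIME_TYPES
    if doc_is_xml != mime_is_xml:
        return [f"{url}: {html_type} doctype but served as {mime}"]
    return []
```

With the option switched on, an xhtml 1.0 page sent as `text/html` is reported, while the same page sent as `application/xhtml+xml` passes.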