

Check Hyperlinks


This Check finds broken or invalid hyperlinks. These might be:



Types of hyperlinks checked are:



DeepTrawl can check many other types of links too. See Check Dependencies.



Background

Invalid hyperlinks are a major problem for any website. At best they make the site look unprofessional. At worst, they can shake user confidence so much that users leave the site. A recent study by Deep Cognition found that 92% of Fortune 500 sites have broken links; in fact, 13% of all pages on Fortune 500 sites have at least one broken link. This makes broken links the worst quality issue facing site owners and viewers. It's so bad, there's even a TED talk devoted to it!


Troubleshooting


404 errors

Most of the time, a 404 error occurs because the owner of the destination website has removed or moved the target page. Either remove the link from your page or point it to an alternative page.
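If you want to confirm a 404 yourself before editing the link, here is a minimal Python sketch using only the standard library. It is not how DeepTrawl performs the check: the function names `http_status` and `is_broken` are hypothetical, and treating 410 (Gone) as broken alongside 404 is an assumption.

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for url (e.g. 200, 404)."""
    # HEAD avoids downloading the body; some servers reject HEAD, in which
    # case a real checker would retry with GET.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises 4xx/5xx responses as HTTPError; the code is kept on it.
        return err.code


def is_broken(url: str) -> bool:
    """Assumption: 404 (Not Found) and 410 (Gone) both mean the page is gone."""
    return http_status(url) in (404, 410)
```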


Malformed URL errors

Most of the time, malformed URL errors are caused by a typo in your HTML. Try reviewing the link URL to make sure it is valid.
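Some common typos can be caught mechanically. The Python sketch below is an illustration only and does not reproduce DeepTrawl's rules. The `KNOWN_SCHEMES` set and the function name `malformed_reason` are assumptions. Also, a typo such as `http//example.com` (missing colon) slips through, because it parses as a relative link.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumption: an illustrative list of "widely recognized" protocols.
KNOWN_SCHEMES = {"http", "https", "ftp", "mailto", "file"}
NEEDS_HOST = {"http", "https", "ftp"}


def malformed_reason(url: str) -> Optional[str]:
    """Return a short description of what looks wrong with url, or None."""
    if " " in url:
        return "contains spaces"
    parts = urlsplit(url)
    if not parts.scheme:
        return None  # relative link such as "about.html": nothing to check here
    if parts.scheme not in KNOWN_SCHEMES:  # urlsplit lowercases the scheme
        return f"unrecognized scheme {parts.scheme!r}"
    if parts.scheme in NEEDS_HOST and not parts.netloc:
        return "missing host name"
    return None
```

For example, `htp://example.com/` is reported as an unrecognized scheme and `http:/example.com` as a missing host name.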


Non-contactable server errors

A server may be only temporarily non-contactable. If you are concerned that the server is unavailable too often, consider removing this link from your site.
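Because an outage may be temporary, it can help to retry before concluding that a server is unreachable. Below is a minimal Python sketch. The retry count and delay are assumptions, not DeepTrawl defaults, and `is_reachable` is a hypothetical name. An HTTP error response such as 404 still counts as reachable here, since the server did answer.

```python
import time
import urllib.error
import urllib.request


def is_reachable(url: str, attempts: int = 3, delay: float = 5.0,
                 timeout: float = 10.0) -> bool:
    """Return False only if no attempt managed to contact the server."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout):
                return True
        except urllib.error.HTTPError:
            return True  # the server replied, just with an error status
        except OSError:
            # URLError, refused connections and timeouts are all OSError
            # subclasses; wait before retrying in case the outage is brief.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False
```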


Missing bookmark (anchor) errors

These errors occur when the destination page does not contain the bookmark (anchor) named in the link. If you own the page, add the anchor using the following HTML:


<a name="myAnchor"></a>


The following URL points to the specific part of the webpage where the above tag is located:


http://mySite.com/myPage.html#myAnchor


Note: Anchor checking is turned off by default.
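To illustrate what an anchor check involves, here is a minimal Python sketch. It is not DeepTrawl's implementation. It parses a downloaded page with the standard-library `html.parser` module and reports whether the link's `#fragment` is defined. Treating `id` attributes as anchors, as HTML5 browsers do, is an assumption. The names `AnchorCollector` and `anchor_missing` are hypothetical, and percent-encoded fragments are not decoded.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class AnchorCollector(HTMLParser):
    """Collect anchor targets: <a name="..."> plus any element's id="..."."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        for attr, value in attrs:
            if value is None:
                continue
            if attr == "id" or (tag == "a" and attr == "name"):
                self.anchors.add(value)


def anchor_missing(url: str, page_html: str) -> bool:
    """True if url has a #fragment that page_html does not define."""
    fragment = urlsplit(url).fragment
    if not fragment:
        return False  # no bookmark requested, nothing to check
    collector = AnchorCollector()
    collector.feed(page_html)
    return fragment not in collector.anchors
```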


Configuring this check


Select the Settings link next to the check in the Checks tab or select: Settings > Check settings > Check Hyperlinks from the menus.


Note: De-selecting any of the following may hide other errors.


Find malformed URLs

Check for badly structured URLs (for example, full URLs whose protocol is not widely recognized).


Find errors contacting remote servers

Show a problem when a server cannot be reached over the Internet. Switching this off stops DeepTrawl from reporting unreachable servers. This should usually be left on.


Find timeouts

Show an error if the link times out. The timeout value is set on the Connection tab of the advanced settings.


Error if linked page download takes longer than x

Shows an error if downloading a linked page takes longer than the specified time. It's a good idea to set this to a high value: DeepTrawl itself may be using much of your local bandwidth, which makes downloads seem slower.


Note: This applies only to linked HTML pages, not to other linked files.
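The download-time check can be sketched in Python as follows. This illustration rests on several assumptions and is not DeepTrawl's code. `slow_html_download` is a hypothetical name. The page is identified as HTML from its `Content-Type` header. The `timeout` argument plays the role of the separate connection timeout.

```python
import time
import urllib.request


def slow_html_download(url: str, limit_seconds: float,
                       timeout: float = 30.0) -> bool:
    """True if url serves an HTML page whose download exceeds limit_seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        content_type = response.headers.get_content_type()
        response.read()  # include the whole body in the timing
    elapsed = time.monotonic() - start
    # Only HTML pages are flagged; images and other files are ignored.
    return content_type == "text/html" and elapsed > limit_seconds
```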


Follow redirections

Follow HTTP redirections and report any errors which occur with any of the redirections or the resource found at the end of the redirection(s).


All redirections are errors

Do not follow HTTP redirections when downloading links; instead, show an error for every redirection.


Error if too many redirections

Switch this off to suppress errors where DeepTrawl could not reach a linked resource because too many redirections were used. To change the maximum number of redirections, see the Connection tab in the advanced settings.
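A short Python sketch can show how the redirection options interact. It is a simplified model, not DeepTrawl's implementation. `fetch` is a hypothetical function that returns a status code and `Location` header for a URL, which lets you try the logic without network access. Loop detection is an extra assumption.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical fetcher: URL -> (HTTP status code, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_error(url: str, fetch: Fetch, max_redirects: int = 10,
                   redirects_are_errors: bool = False) -> Optional[str]:
    """Follow redirections from url; return an error message or None."""
    seen = {url}
    for _ in range(max_redirects + 1):  # up to max_redirects hops + final page
        status, location = fetch(url)
        if status not in REDIRECT_CODES:
            # End of the chain: report it only if the final resource failed.
            return f"HTTP {status} at {url}" if status >= 400 else None
        if redirects_are_errors:
            return f"redirection ({status}) at {url}"
        if not location:
            return f"redirection without Location header at {url}"
        url = urljoin(url, location)  # Location may be a relative URL
        if url in seen:
            return f"redirect loop at {url}"
        seen.add(url)
    return f"too many redirections (limit {max_redirects})"
```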


Find missing anchors (bookmarks)

Switch this on to check that anchors referenced by links exist in the linked pages. Anchors allow a link to point at a specific section of a webpage, but they require a special tag in the destination page. With some URL schemes this check can produce false positives, so it is disabled by default.


Find links to pages with text:

If a link points to a page containing the specified text, treat it as invalid. Use this option to find error pages with known text which would otherwise be missed because the HTTP code returned with them indicates that they are fine.

This is useful when...



Enter the text using basic Boolean search terms, and indicate whether the match should be case-sensitive.
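DeepTrawl's exact Boolean syntax is not described here, so the Python sketch below assumes a simple, hypothetical grammar. Phrases joined by ` AND ` must all appear, and alternatives are separated by ` OR ` (AND binds tighter). The sketch searches the raw page source and only shows how such a check for disguised error pages might work.

```python
def page_matches(page_text: str, query: str, case_sensitive: bool = False) -> bool:
    """Return True if page_text satisfies the (assumed) AND/OR query."""

    def norm(s: str) -> str:
        return s if case_sensitive else s.casefold()

    text = norm(page_text)
    # Split on the operators before normalizing so "OR"/"AND" keep their case.
    for alternative in query.split(" OR "):
        phrases = [norm(p.strip()) for p in alternative.split(" AND ") if p.strip()]
        if phrases and all(p in text for p in phrases):
            return True
    return False
```

For example, `page_matches(html, "page not found OR no longer available")` would flag a page that returns HTTP 200 but displays either message.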


Error HTTP codes

Switch individual HTTP error codes on or off to control which ones are reported. For instance, you may never want to see errors caused by 401 (Unauthorized) HTTP codes.
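Conceptually, switching codes off acts as a filter on the reported status codes. Here is a minimal Python sketch. It assumes a hypothetical `disabled` set that holds the codes you have switched off (here 401) and uses the standard-library `http.HTTPStatus` enum for the reason phrases.

```python
from http import HTTPStatus
from typing import Iterable, List, Set


def reportable_errors(statuses: Iterable[int], disabled: Set[int]) -> List[str]:
    """Describe each 4xx/5xx status that has not been switched off."""
    messages = []
    for code in statuses:
        if code < 400 or code in disabled:
            continue  # success/redirect codes, or codes the user suppressed
        try:
            phrase = HTTPStatus(code).phrase
        except ValueError:
            phrase = "Unknown"  # codes the standard library does not name
        messages.append(f"{code} {phrase}")
    return messages


print(reportable_errors([200, 401, 404, 500], disabled={401}))
# prints ['404 Not Found', '500 Internal Server Error']
```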