Scope tab

The Scope tab enables you to further customize the scope of your trawls.


Ignore case of URLs when comparing

By default, DeepTrawl ignores case when comparing URLs. This means it does not trawl the same page twice just because different links to it use different capitalization.

Switch off this option if your server treats URLs as case-sensitive, that is, if the capitalization of a URL can determine which page is returned.
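The Python sketch below illustrates the idea behind this option. It is not DeepTrawl's implementation. It assumes that links are de-duplicated by a comparison key, and that turning the option on lowercases the path and query. The names `url_key` and `pages_to_trawl` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def url_key(url: str, ignore_case: bool = True) -> str:
    """Build the (hypothetical) key used to decide whether two links name the same page."""
    parts = urlsplit(url)
    # Scheme and host names are case-insensitive (RFC 3986); for simplicity
    # this sketch lowercases the whole authority part.
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    path, query = parts.path, parts.query
    if ignore_case:
        # Option on: /About.html and /about.html count as the same page.
        path, query = path.lower(), query.lower()
    # The fragment (#...) is dropped because it never selects a different page.
    return urlunsplit((scheme, netloc, path, query, ""))


def pages_to_trawl(links, ignore_case=True):
    """Keep the first link for each distinct key, preserving order."""
    seen = set()
    result = []
    for link in links:
        key = url_key(link, ignore_case)
        if key not in seen:
            seen.add(key)
            result.append(link)
    return result
```

With the option on, `http://Example.com/About.html` and `http://example.com/about.html` collapse into one page. With it off, they stay separate, which is the correct behavior for a case-sensitive server.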

Judge scope by original link URL / Judge scope by redirected URL (if any)

By default, DeepTrawl decides whether to include a page in the scope based on the URL of the link pointing to it.

Alternatively, select Judge scope by redirected URL (if any) to judge inclusion by the URL the page finally redirects to over HTTP, where there is a redirect. This may slow down trawls, but it enforces the scope rules more strictly.
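The sketch below shows how the two settings can reach different decisions for the same link. It assumes a simple scope rule in which a page is in scope when its URL starts with a given prefix. That rule and the names `in_scope` and `include_page` are hypothetical, and DeepTrawl's real scope rules may differ.

```python
from typing import Optional


def in_scope(url: str, scope_prefix: str) -> bool:
    # Hypothetical scope rule: a plain URL-prefix match.
    return url.startswith(scope_prefix)


def include_page(link_url: str, redirected_url: Optional[str],
                 scope_prefix: str, judge_by_redirect: bool = False) -> bool:
    """Decide inclusion from the link URL, or from the redirect target if requested."""
    if judge_by_redirect and redirected_url is not None:
        # Judge scope by redirected URL (if any): use where the page really lives.
        return in_scope(redirected_url, scope_prefix)
    # Default: judge scope by the URL written in the link.
    return in_scope(link_url, scope_prefix)
```

Consider a link to `http://example.com/old` that redirects to `http://other.org/new`. Under the default setting that link is included. When you judge by the redirected URL, it is excluded.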

Limit length of URL / Maximum URL length

By default, DeepTrawl limits the maximum length of a page's URL. This guards against recursion on dynamically generated web sites. On such sites, each generated page links to an identical page with a slightly longer URL, for example one with an extra path segment or query parameter. Without a limit, DeepTrawl would keep trawling identical pages whose URLs grow without end.

If your site uses very long URLs, you may want to raise this limit or switch off this option.
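The following Python sketch simulates that kind of recursion and shows how a length limit stops it. The site behavior in `links_on` is hypothetical: every page links to a copy of itself one `next/` segment deeper. The value 100 is an arbitrary illustrative limit and is not DeepTrawl's default.

```python
from collections import deque
from typing import List, Optional


def links_on(url: str) -> List[str]:
    # Hypothetical dynamic page: it always links to an identical page one
    # segment deeper, e.g. /cal/ -> /cal/next/ -> /cal/next/next/ ...
    return [url + "next/"]


def trawl(start: str, max_url_length: Optional[int] = 100) -> List[str]:
    """Breadth-first trawl that skips links longer than max_url_length."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in links_on(url):
            if max_url_length is not None and len(link) > max_url_length:
                continue  # Too long: not followed, which ends the recursion.
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling `trawl("http://example.com/cal/", 100)` visits 16 pages and then stops. Each new URL is unique, so de-duplication alone never ends the trawl. With `max_url_length=None` this loop would run forever, which is the situation the option prevents. Raising the limit lets longer, legitimate URLs through, at the cost of following more recursive links on sites like this one.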

Check files found on disk appear to be (X)HTML

By default, DeepTrawl checks that files found on disk appear to be HTML or XHTML before loading them. If some of your files are missing from the trawl results, try switching off this option.
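The sketch below shows one simple way such a check could work: looking for HTML markers near the start of the file. DeepTrawl does not document its actual test. The 1024-byte window, the marker strings and the name `looks_like_html` are all assumptions made for illustration.

```python
# Hypothetical: how many leading bytes of the file to inspect.
SNIFF_BYTES = 1024


def looks_like_html(path: str) -> bool:
    """Guess whether a file on disk is HTML/XHTML from its first few bytes."""
    with open(path, "rb") as f:
        head = f.read(SNIFF_BYTES)
    # Decode leniently so binary files don't raise; compare case-insensitively.
    text = head.decode("utf-8", errors="ignore").lower()
    # Searching (rather than testing only the start) lets an XML declaration,
    # comments or a byte order mark come before the markers.
    return "<!doctype html" in text or "<html" in text
```

A heuristic like this rejects files that are genuinely HTML but lack a doctype or `<html>` tag, such as page fragments. That is the kind of case where switching the option off can make missing files appear.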