This is where you set up options that affect the link checker.

Link Checking Options
Non-Checkbox Options
| • | HTTP Timeout - When using the HTTP protocol (such as when link checking), this option specifies how long to wait, in seconds, before failing. For instance, if a link check cannot complete within this time period, the link check automatically aborts and fails with a "time out" error. Valid values are 1 second or more. The default value is 30 seconds. |
| • | Max simultaneous link checks - Specify the number of simultaneous link checks HTML Validator will perform to try to improve performance. Valid values are 1 to 200. Higher values may have side effects and may put a heavy load on a server if too many links on the same server are checked at the same time. A high value could also cause incorrect results and unnecessary timeouts. If you believe that the link check results are incorrect (incorrect results would most likely be in the form of correct links being flagged as bad or broken), then you may want to try lowering this value. The default value is 15. |
| • | Redirected links - Specify whether redirected links should be treated as OK, as warnings, or as errors. A redirected link is one for which the server, when the link is requested, redirects the client to another URL. If set to "Error", then redirected links will be preceded by a red icon in the Links tab of the Results Window. If set to "Warning", then a yellow icon will be used. If set to "OK", then a green icon will be used. The default is to treat redirected links as warnings. |
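The interaction between the HTTP timeout and the number of simultaneous link checks can be illustrated with a short Python sketch. This is not how HTML Validator itself is implemented; the function names `check_one` and `check_all` are hypothetical, and only the defaults (30 seconds, 15 simultaneous checks) and the valid ranges are taken from the option descriptions above.

```python
import socket
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def check_one(url, timeout):
    """Return "ok", "bad", or "time out" for a single HTTP link (hypothetical helper)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return "ok"
    except urllib.error.HTTPError:
        return "bad"  # the server answered with an error status
    except urllib.error.URLError as err:
        # Timeouts while connecting arrive wrapped in URLError.
        return "time out" if isinstance(err.reason, socket.timeout) else "bad"
    except socket.timeout:
        return "time out"  # timeout while reading the response


def check_all(urls, max_workers=15, timeout=30, checker=check_one):
    """Check links in parallel, at most max_workers at a time."""
    if not 1 <= max_workers <= 200:
        raise ValueError("max_workers must be between 1 and 200")
    if timeout < 1:
        raise ValueError("timeout must be at least 1 second")
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        statuses = list(pool.map(lambda url: checker(url, timeout), urls))
    return dict(zip(urls, statuses))
```

In a sketch like this, a larger `max_workers` value sends more requests to the same server at once, which is why a high setting can lead to extra timeouts and good links being reported as bad.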
Checkbox Options
| • | Disable email address syntax checking - Check this box to accept every email address as valid instead of checking it for correct Internet email address syntax. This box should only be checked when you are not using Internet-formatted email addresses. An example of an Internet-formatted email address is username@domain.com. An example of a non-Internet-formatted email address is John Doe. Email address syntax is checked by default, so this option is not enabled by default. |
| • | Case sensitive link checking - Check this box to enable case sensitive link checking for local file and local folder links. For technical reasons, this does not apply to other types of links such as HTTP links. This option is useful if the web document will be hosted on a server where filenames are case sensitive, such as a Unix or Linux based server. Even if the website will be hosted on a Windows based server (which is not case sensitive), it is still a good idea to make sure that the case matches. If a local file or local folder link is valid except that its case does not match, then the link checker will generate an error for the link and the link status will be "Incorrect Case". This option is enabled by default. |
| • | Auto detect text files containing links (one per line) when validating - Check this box to automatically detect, when validating or link checking, text files that contain one link per line. When this type of file is detected, the links are added to the link checker to be checked. Auto detection checks up to the first 10 lines to see if the lines are http or https links unless the first line is exactly "CSEJOBTYPELINKCHECKTEXTLIST". If the first line is exactly "CSEJOBTYPELINKCHECKTEXTLIST" (without the quotes) then the file is auto-detected to be a text file of links without further checking. This feature allows you to check a list of links when they are listed in a text file, one per line. If you don't think you'll use this feature then you may want to disable the auto detect. This option is enabled by default. (New v9.00) |
| • | Recheck bad links at next link check - Check this box to keep rechecking bad links when the next link check is run in case the bad links are fixed. If this box is not checked, then the link cache will cause bad links to keep reporting as bad links (even if fixed) until the link cache is reset and the link is checked again. If you want to limit the rechecking of bad links to increase efficiency or because it is not helpful then disable this option. This option is enabled by default. (New v9.00) |
| • | Enable HTTPS link checking - Check this box to enable link checking of HTTPS links. Setting up HTTPS connections requires more time and resources because a secure connection has to be established. You may want to disable this if HTTPS link checking is not important or if you encounter a problem due to HTTPS link checking. This option is enabled by default. (New v9.00) |
| • | Enable DNS record checking where available - Check this box to enable additional checking of hostnames where available. FOR EMAIL ADDRESSES: Checks hostnames used in email addresses for available DNS records (first checks for an MX (mail exchanger) record, then for an A record if no MX record is found). DNS records are used when sending email, so if no DNS record exists for a hostname that is used in an email address, then there is likely to be a problem with the email address. FOR FTP LINKS: Checks hostnames in FTP links. ADDITIONAL INFO: DNS records are checked by calling the OS function DnsQuery(), so the system DNS must be working and its settings must be correct for this check to work properly. This option is enabled by default. (New v9.00) |
| • | Reset Link Cache - Press this button to clear everything in the link cache. All links and link checking results in the link cache are cleared from memory so that any new links will have to be added to the link cache again and be rechecked. |
| • | Advanced Options - Press this button for information about setting more advanced link checking options. |
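The auto detection of text files containing links, described above, can be sketched in Python as follows. This is only an illustration: the function name `looks_like_link_list` is hypothetical, and the sketch assumes that every non-blank line among the first 10 lines must be an http or https link, which the description does not state explicitly.

```python
MARKER = "CSEJOBTYPELINKCHECKTEXTLIST"


def looks_like_link_list(text, max_lines=10):
    """Guess whether text is a list of links, one per line (hypothetical helper)."""
    lines = text.splitlines()
    if not lines:
        return False
    # An exact marker on the first line short-circuits any further checking.
    if lines[0] == MARKER:
        return True
    # Assumption: every non-blank line in the sample must be an http(s) link.
    sample = [line.strip() for line in lines[:max_lines] if line.strip()]
    return bool(sample) and all(
        line.lower().startswith(("http://", "https://")) for line in sample
    )
```

Starting a file with the marker line is the most reliable way to have it treated as a list of links, since no guessing based on the file's content is then needed.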
Filter Options (New v9.00)
| • | Enable filters - Check this box to enable filtering based on the filters. |
| • | Don't check these links - Specify a list of strings, separated by semicolons. If a link (after being changed to an absolute link) matches one of these strings, then it is not checked by the link checker. This allows you to exclude certain links from being checked. The wildcard character "*" (an asterisk) may be used at the beginning and/or end of each string. For example, to exclude checking links in a certain folder/directory, specify http://www.domain.com/directory/*. This option has no effect if link checking is not enabled. NOTE: Do not surround the semicolons with extra spaces. Extra spaces are considered part of the string. |
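The matching rules for the "Don't check these links" filter can be sketched in Python as shown below. The function names are hypothetical, and the sketch assumes that matching is case sensitive, which the description above does not specify.

```python
def matches_filter(link, pattern):
    """Match one filter string, honoring "*" at the beginning and/or end."""
    if pattern == "":
        return False
    leading = pattern.startswith("*")
    core = pattern[1:] if leading else pattern
    trailing = core.endswith("*")
    if trailing:
        core = core[:-1]
    if leading and trailing:
        return core in link
    if leading:
        return link.endswith(core)
    if trailing:
        return link.startswith(core)
    return link == core


def is_excluded(absolute_link, filter_setting):
    """Return True if the link matches any of the semicolon-separated filters."""
    # Filters are deliberately not stripped: extra spaces are part of the string.
    return any(matches_filter(absolute_link, p) for p in filter_setting.split(";"))
```

With these rules, `is_excluded("http://a.com/x.pdf", "*.zip;*.pdf")` is true, while the same call with `"*.zip; *.pdf"` is false because the space after the semicolon becomes part of the second filter string.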
Advanced Options - Link Checker Config File
By using a link checker configuration file, additional options can be set. For example, path mapping can be used. Path mapping maps "root-relative" paths in local documents to the correct local or server (HTTP) path so the link checker can check the correct link.
For example, let's say there are several websites with different domains and they are organized on the C drive in different folders like C:\websites\domain1\, C:\websites\domain2\, and C:\websites\domain3\. There are index.html documents in each of the domain directories and there are images linked with src="/jpg/image.jpg". By default, if a link check is done on any index.html document, then the link checker will try to check the file C:\jpg\image.jpg but this is not correct.
To solve the above issue, a new "pathmapcfg" tag can be added to the link checker configuration file. By setting basebeginswith="C:\websites\domain1\" and mapto="C:\websites\domain1\", the link checker will be able to compute the correct link to check. Alternatively, if you want to check the link via HTTP, then use mapto="http://www.domain1.com/" instead of mapto="C:\websites\domain1\". This will cause src="/jpg/image.jpg" in C:\websites\domain1\index.html to be checked as http://www.domain1.com/jpg/image.jpg. To do this for all three example domains, three "pathmapcfg" tags with different "basebeginswith" values are needed, one for each domain. Note that "basebeginswith" values are compared without case sensitivity.
NOTE: After editing the link checker configuration file, restart CSE HTML Validator for the changes to take effect.
Example Configuration File
<?xml version="1.0" encoding="UTF-8"?>
<linkcheckercfg>
  <pathmapcfgset>
    <pathmapcfg
      basebeginswith="k:\websites\wwwhtmlvalidator\"
      mapto="http://www.htmlvalidator.com/" />
    <pathmapcfg
      basebeginswith="k:\websites\wwwnbah\"
      mapto="k:\websites\wwwnbah\" />
  </pathmapcfgset>
</linkcheckercfg>
By default, the configuration file that stores the additional link checker options is linkcheckercfg.xml and is located in a folder like C:\Documents and Settings\username\Application Data\AI Internet Solutions\CSE HTML Validator\9.0 (Windows XP) or C:\Users\username\AppData\Roaming\AI Internet Solutions\CSE HTML Validator\9.0 (Vista).
To easily edit this file, choose Options->Validator Engine Options->Edit Configuration Files->Link Checker in the editor.
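To make the effect of path mapping concrete, the following Python sketch reads "pathmapcfg" tags and resolves a root-relative link the way the domain1 example above describes. It is an illustration only, not the link checker's actual code: the function names are hypothetical, the configuration text uses the example folders from this section, and the handling of a local "mapto" value (joining with backslashes) is an assumption.

```python
import xml.etree.ElementTree as ET

# Example configuration using the folders from the text above (illustrative only).
EXAMPLE_CFG = r"""<linkcheckercfg>
  <pathmapcfgset>
    <pathmapcfg basebeginswith="C:\websites\domain1\" mapto="http://www.domain1.com/" />
    <pathmapcfg basebeginswith="C:\websites\domain2\" mapto="C:\websites\domain2\" />
  </pathmapcfgset>
</linkcheckercfg>
"""


def load_mappings(xml_text):
    """Return (basebeginswith, mapto) pairs from the configuration text."""
    root = ET.fromstring(xml_text)
    return [(e.get("basebeginswith"), e.get("mapto")) for e in root.iter("pathmapcfg")]


def map_root_relative(document_path, link, mappings):
    """Resolve a root-relative link such as "/jpg/image.jpg" for a local document."""
    for base, target in mappings:
        # "basebeginswith" values are compared without case sensitivity.
        if document_path.lower().startswith(base.lower()):
            if target.lower().startswith(("http://", "https://")):
                return target.rstrip("/") + link
            # Assumption: local targets are joined using backslashes.
            return target.rstrip("\\") + link.replace("/", "\\")
    return None  # no mapping applies to this document
```

For a document at C:\websites\domain1\index.html, the link /jpg/image.jpg resolves to http://www.domain1.com/jpg/image.jpg under this example configuration, which matches the behavior described above.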
About the Link Cache
CSE HTML Validator keeps a "link cache" in memory for the duration of program execution (the cache is automatically reset when HTML Validator is terminated or reloaded). When a link is checked, the result is stored in the cache so that the link does not have to be checked again, which speeds up link checking. To manually reset the link cache during program execution, use the Reset Link Cache button described above. Resetting the link cache will cause all links to be checked again.
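The behavior of the link cache, together with the "Recheck bad links at next link check" option, can be modeled with a small Python sketch. The class `LinkCache` and its methods are hypothetical and simplified: each call to `check` stands in for a separate link check run, and a link result is reduced to good (True) or bad (False).

```python
class LinkCache:
    """In-memory cache of link results (a simplified, hypothetical model)."""

    def __init__(self, check_link, recheck_bad=True):
        self._check_link = check_link    # function: url -> True (good) / False (bad)
        self._recheck_bad = recheck_bad  # models "Recheck bad links at next link check"
        self._results = {}

    def check(self, url):
        cached = self._results.get(url)
        # Good results are always reused; bad ones only if rechecking is disabled.
        if cached is True or (cached is False and not self._recheck_bad):
            return cached
        result = self._check_link(url)
        self._results[url] = result
        return result

    def reset(self):
        """Models the Reset Link Cache button: forget every stored result."""
        self._results.clear()
```

In this model, a bad link that has since been fixed is only noticed if `recheck_bad` is enabled or `reset` is called before the next check.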