How do I have the HTML Validator ignore URL parameters?

Stefan Muth
Rank 0 - Newcomer
Posts: 2
Joined: Sat Jun 04, 2016 11:45 am

Post by Stefan Muth »

How can I make the Validator ignore URL parameters? For example, the following are all treated as different pages, but they are the same because the parameter does not affect content. Therefore, I want the 'page' parameter to be ignored:

http://www.site.com/
http://www.site.com/?page=AD61
http://www.site.com/?page=S158
... and so on.

All of my pages have a <link rel="canonical" href="..."/> element, so perhaps the HTML Validator could use that for each page instead of the actual URL?
Albert Wiersch
Site Admin
Posts: 3785
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: How do I have the HTML Validator ignore URL parameters?

Post by Albert Wiersch »

Hi Stefan,

The Batch Wizard will treat those pages as different because the URL parameters can cause different HTML to be returned by the server. However, there are at least a couple of ways to ignore this if desired.

The easiest way in your case would be to go to the 'Exclusions' tab in the Batch Wizard and enter something like this in the 'Don't process these targets' field: http://www.site.com/?page=

Any target containing that string will then be skipped instead of checked.
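
As a rough illustration of what that exclusion does to the URLs from the question, here is a minimal Python sketch. It is not the Batch Wizard's code: the function name is_excluded is made up for the example, and plain case-sensitive substring matching is an assumption about how the field behaves.

```python
def is_excluded(target: str, exclusions: list[str]) -> bool:
    """Return True if the target contains any of the exclusion strings.

    Assumption: plain, case-sensitive substring matching.
    """
    return any(fragment in target for fragment in exclusions)


# The string suggested for the 'Don't process these targets' field.
exclusions = ["http://www.site.com/?page="]

targets = [
    "http://www.site.com/",
    "http://www.site.com/?page=AD61",
    "http://www.site.com/?page=S158",
]

# Only targets that match no exclusion string are kept for checking.
kept = [t for t in targets if not is_excluded(t, exclusions)]
print(kept)
```

Only the parameter-free URL survives the filter, so the page is checked once.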

Another option would be to use an onTargetCanAdd() user function. This is a little more complicated but it gives you more control. Here is the documentation page:
https://www.htmlvalidator.com/current/d ... canadd.htm

You could use something like this:

function onTargetCanAdd() {
 if beginsWithNoCase($otca_target,'http://www.site.com/?page=') {
  $otca_add=false;
 }
}
You'll need to put that function in a text file and then specify that file in the 'Target List Options' tab.
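
For readers who do not know the validator's script language, the same decision can be sketched in Python. This is only an analogy under stated assumptions: begins_with_no_case and can_add_target are hypothetical names standing in for beginsWithNoCase and the onTargetCanAdd() callback, and the return value plays the role of $otca_add.

```python
def begins_with_no_case(text: str, prefix: str) -> bool:
    """Case-insensitive 'starts with' test (stand-in for beginsWithNoCase)."""
    return text.casefold().startswith(prefix.casefold())


def can_add_target(target: str) -> bool:
    """Return False for targets that should be kept out of the target list."""
    if begins_with_no_case(target, "http://www.site.com/?page="):
        return False
    return True


for url in (
    "http://www.site.com/",
    "http://www.site.com/?page=AD61",
    "HTTP://WWW.SITE.COM/?PAGE=S158",
):
    print(url, can_add_target(url))
```

Unlike the exclusion string, this tests only the beginning of the target and ignores letter case, which is where the extra control comes from.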

Hope this helps!
Albert Wiersch, CSS HTML Validator Developer

Post by Stefan Muth »

How about validating canonical URLs (maybe it already does) and, in the process, offering an option to use the canonical URL instead of the actual URL for each page when crawling the site? That would potentially solve a lot of problems, including this one. I don't know if it's a good idea -- you're the expert.
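
To make the suggestion concrete, here is a minimal Python sketch of grouping fetched pages by their canonical URL. It only illustrates the idea and is not a feature of the validator; the names CanonicalFinder, canonical_url and dedupe_by_canonical are invented for the example, and the page HTML is hard-coded in place of real fetches.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonical = attr["href"]


def canonical_url(html: str, fallback_url: str) -> str:
    """Return the page's canonical URL, or the fetched URL if none is given."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical or fallback_url


def dedupe_by_canonical(pages: dict) -> dict:
    """Map each canonical URL to the first fetched URL that declared it."""
    seen = {}
    for url, html in pages.items():
        seen.setdefault(canonical_url(html, url), url)
    return seen


page = '<html><head><link rel="canonical" href="http://www.site.com/"/></head></html>'
pages = {
    "http://www.site.com/": page,
    "http://www.site.com/?page=AD61": page,
    "http://www.site.com/?page=S158": page,
}
print(dedupe_by_canonical(pages))
```

All three fetched URLs collapse to a single entry because they declare the same canonical URL; a page without the element falls back to its own URL.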

For example, I note that the HTML Validator treats www.site-name.com/page.html and www.Site-Name.com/page.html as being distinct pages, while (I think) a browser would not. This leads to unnecessary "Duplicate Title" errors and so on.

Post by Albert Wiersch »

Hi Stefan,

I don't think that most sites specify canonical URLs, and using them could create other issues. For example, someone may want to validate the same URL with different parameters.

As for http://www.site-name.com/page.html and http://www.Site-Name.com/page.html being treated as different, that sounds like an issue that I will need to look into for a future release. The domain part of the URL is case-insensitive but the rest of it may or may not be case-insensitive depending on the server, so CSE HTML Validator should always treat the domain part as case-insensitive when comparing. I will definitely look into this and make sure it is comparing properly so you don't get any false duplicate title or meta tag notifications.
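
The comparison rule described here (host compared without regard to case, path left alone) can be sketched in Python with the standard urllib.parse module. This is an illustration of the rule only, not the validator's implementation; normalize_host_case is a hypothetical helper, and it assumes absolute URLs that include a scheme.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_host_case(url: str) -> str:
    """Lower-case the scheme and host of an absolute URL; keep the rest as-is."""
    parts = urlsplit(url)
    # Any "user:password@" prefix is case-sensitive, so lower-case only host:port.
    userinfo, sep, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + sep + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


a = normalize_host_case("http://www.site-name.com/page.html")
b = normalize_host_case("http://www.Site-Name.com/page.html")
c = normalize_host_case("http://www.site-name.com/Page.html")

print(a == b)  # the URLs differ only in host case
print(a == c)  # the URLs differ in path case
```

The first comparison is equal and the second is not, since path case may matter depending on the server.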

By the way, if your site is on a Windows server and is case-insensitive when it comes to URL paths, then you can use the 'Targets matching these strings are case-insensitive' option in the 'Target List Options' tab. That will likely fix the domain case problem as well.

Post by Albert Wiersch »

Sorry, it looks like I forgot to post an update on this.

The issue/bug with treating the same domain as different domains if the case varies (thus resulting in false reports of duplicate page titles and the like) should be resolved in the latest version (16.05).