Is rel="canonical" supported?

For technical support for all editions of CSS HTML Validator. Includes bug reports.
ktp
Rank II - Novice
Posts: 48
Joined: Sat Oct 29, 2016 10:34 am

Is rel="canonical" supported?

Post by ktp »

Hello,

I have two HTML URLs: URL A (https://www.example.com/folderA/sample.html) and URL B (https://www.example.com/folderB/sample.html).
URL A has a rel="canonical" link that points to URL B with href="/folderB/sample.html" (i.e., a relative URL).
However, the Batch Wizard report lists URL A and URL B as errors because they have the same title, the same meta description, etc.,
as if rel="canonical" were not supported. Is there a solution?

Thank you.
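To illustrate how a relative canonical href like the one above is normally interpreted, here is a minimal Python sketch using the standard library's urllib.parse.urljoin. It is only an illustration of standard URL resolution, not how CSS HTML Validator resolves URLs internally. The href is resolved against URL A's own address, the same way a browser resolves any relative link.

```python
from urllib.parse import urljoin

# URL A from the example above and the relative href of its canonical link
page_url = "https://www.example.com/folderA/sample.html"
canonical_href = "/folderB/sample.html"

# A root-relative href keeps the page's scheme and host and replaces the path
canonical_url = urljoin(page_url, canonical_href)
print(canonical_url)  # https://www.example.com/folderB/sample.html
```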
Albert Wiersch
Site Admin
Posts: 3594
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: Is rel="canonical" supported?

Post by Albert Wiersch »

Hello,

Thank you. I have reproduced this issue. The Batch Wizard does not consider rel="canonical" when generating the duplicate meta description and duplicate page title reports.

I think the proper thing to do here would be to disregard pages for the aforementioned reports if they specify another URL as the canonical URL. Do you agree?

Re: Is rel="canonical" supported?

Post by ktp »

In my case I have a lot of such URLs: tens of thousands of them (the sitemap has 250,000+ links),
so it is really difficult to sort out the real errors among them and among the other errors.

I would like the Batch Wizard to get, as soon as possible, an option or a fix so that it does not report such errors
(or, better, lists them separately, noting that they use rel="canonical"). This would add real value to the validator. For reference, one SEO auditing tool I currently use supports rel="canonical" automatically, with statistics etc.

For example, URLs A, C, and E use rel="canonical" pointing to URL B.
URL D does not use rel="canonical".

URLs A, B, C, D, and E all have a duplicated title, meta description, etc.
In that case the Batch Wizard should report URL B and URL D as errors.
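The reporting rule described above can be sketched as follows. This is a hypothetical Python model, not the Batch Wizard's implementation: the pages dictionary, the "Sample" title, and the duplicate_titles helper are made up for illustration. A page is left out of the duplicate-title grouping when its canonical href resolves to a different URL, so only URL B and URL D remain as duplicates.

```python
from collections import defaultdict
from urllib.parse import urljoin

# Hypothetical crawl results: page URL -> (title, canonical href or None).
# URLs A, C and E point their canonical link at URL B; URL D has none.
BASE = "https://www.example.com"
pages = {
    f"{BASE}/folderA/sample.html": ("Sample", "/folderB/sample.html"),
    f"{BASE}/folderB/sample.html": ("Sample", None),
    f"{BASE}/folderC/sample.html": ("Sample", "/folderB/sample.html"),
    f"{BASE}/folderD/sample.html": ("Sample", None),
    f"{BASE}/folderE/other_sample.html": ("Sample", "/folderB/sample.html"),
}

def duplicate_titles(pages):
    """Group pages by title, skipping pages canonicalized to another URL."""
    by_title = defaultdict(list)
    for url, (title, canonical_href) in pages.items():
        if canonical_href is not None and urljoin(url, canonical_href) != url:
            continue  # canonical points elsewhere: leave it out of the report
        by_title[title].append(url)
    # Only titles still shared by two or more pages count as duplicates
    return {t: urls for t, urls in by_title.items() if len(urls) > 1}

print(duplicate_titles(pages))  # reports URL B and URL D under "Sample"
```

URL B stays in the report whether or not it carries a self-referencing canonical link, because a canonical href that resolves to the page itself does not trigger the exclusion.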

Also, in my case, all such URLs have the same "filename", e.g. sample.html (maybe my wording is not correct), although their "path" parts are of course different:

URL A href="/folderA/sample.html"
URL B href="/folderB/sample.html"
URL C href="/folderC/sample.html"
URL D href="/folderD/sample.html"
URL E href="/folderE/other_sample.html"

So, for me, an error should also be reported when the "filename" part is different.
URL E uses rel="canonical" pointing to URL B, but its filename (other_sample.html) is different from URL B's (sample.html).
This is an error for me.

Thank you.

Re: Is rel="canonical" supported?

Post by Albert Wiersch »

Thank you for explaining this in detail.

I am doing some final work and updates for the upcoming major update (CSS HTML Validator 2021/v21) and will try to include a change that addresses this issue in the release version or, if that is not possible, in an update to 2021/v21.

Re: Is rel="canonical" supported?

Post by ktp »

Great, I hope that the feature will be included as soon as possible.

In the meantime, I am scanning smaller branches, so the tool is still efficient and very useful for detecting errors.
I discovered that the tool also crawls the canonical URLs: excellent! Thanks to this feature, I found several errors in my web project.

By the way, after thinking it over, the case of URL E should not be considered an error.
A URL can use a different folder and file name; as long as it has rel="canonical" pointing to URL B, it is correct.

Keep up the good work!

Re: Is rel="canonical" supported?

Post by ktp »

In fact, if a URL has rel="canonical" pointing to a URL other than itself, it should be discarded by the validator, since search engines normally will not index it.
Last edited by ktp on Wed Dec 16, 2020 10:56 pm, edited 1 time in total.

Re: Is rel="canonical" supported?

Post by Albert Wiersch »

Hello,

Today I was able to add functionality that allows a user (via user functions) to exclude specific targets from the Duplicate Meta Descriptions and Duplicate Page Titles reports. For example, if there is a canonical link then you can exclude the target from the reports.

You can also abort the validation if you want (like if you don't care for the validation results for the target).

I can compile a test version for you. It will require a 2021/v21 key which our system can now send you if you are licensed to use the upcoming major release:
https://www.htmlvalidator.com/sendkey.php

Please let me know if you'd like to test this and I will post more information and send you a link to a download.

Re: Is rel="canonical" supported?

Post by ktp »

Great! I really need this function, so please post all information for getting it.
I have never heard of user functions; I hope this will be easy (no programming needed?).

Edit:
> For example, if there is a canonical link then you can exclude the target from the reports.
Just to be sure: only URLs whose canonical link points to a URL other than themselves should be discarded, since a canonical link can also point to the page's own URL.

Thank you.

Re: Is rel="canonical" supported?

Post by Albert Wiersch »

Great!

A little programming is needed but I will provide the small "user function" that is needed for you to use for this customization. Ultimately it may be integrated into the standard behavior but not just yet.

I still need to work out a reliable way to make sure the canonical URL is not itself and then I will provide the download and function that you can try. I'll work on this tomorrow.

Re: Is rel="canonical" supported?

Post by ktp »

Thanks! Also, just to be sure: when checking the canonical link, keep in mind that it can be either a relative or an absolute path.
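Because the canonical href may be relative or absolute, one way to decide whether it points back at the page itself is to resolve it first and then compare normalized URLs. The sketch below is a Python illustration only and assumes a deliberately simple normalization (lowercase scheme and host, empty path treated as "/", fragment dropped); the normalize and points_elsewhere helpers are hypothetical, and a real crawler may also need to handle default ports, trailing slashes, or index pages.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

def normalize(url):
    """Lowercase scheme and host, default an empty path to "/", drop the fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))

def points_elsewhere(page_url, canonical_href):
    """True if the canonical href (relative or absolute) names another URL."""
    target = urljoin(page_url, canonical_href)  # absolute hrefs pass through
    return normalize(target) != normalize(page_url)

page_a = "https://www.example.com/folderA/sample.html"
page_b = "https://www.example.com/folderB/sample.html"

print(points_elsewhere(page_b, "/folderB/sample.html"))                        # False
print(points_elsewhere(page_b, "https://WWW.example.com/folderB/sample.html"))  # False
print(points_elsewhere(page_a, "/folderB/sample.html"))                        # True
```

Resolving before comparing is what makes the relative and absolute forms of the same self-referencing canonical link equivalent.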

Re: Is rel="canonical" supported?

Post by Albert Wiersch »

It looks like this is ready to test!

Put this "user function" in a text file like "userfunctions.txt":

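function onStartTag_link() {
 // Only handle <link rel="canonical"> tags
 if getAttValueEx('rel',12)=='canonical' {
  // Only act when running as part of a Batch Wizard job
  if isBatchWizardJob {
   // $this_href: this page's URL; $can_href: the canonical URL (both via convertStringEx so they can be compared)
   $this_href=convertStringEx(7,'#');
   $can_href=convertStringEx(7,getAttValueEx('href',12));
   // The canonical URL is another page: exclude this page from the duplicate reports
   if !matchCase($this_href,$can_href) {
    Message(1,MSG_WARNING,'Canonical! this: '+$this_href+', canonical: '+$can_href);
//    abortValidation();
    $_BatchWizard.report_dup_page_title=0;
    $_BatchWizard.report_dup_meta_desc=0;
   }
  }
 }
}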
Then specify the file as a 'user functions' file in the 'Config File' page of the Validator Engine Options (Ctrl+F4). Press the 'Reload Config' button and that's it!

This should exclude non-canonical pages from the 'Duplicate Meta Descriptions' and 'Duplicate Page Titles' reports in the Batch Wizard.

If you want to abort the validation for non-canonical pages, uncomment the line with abortValidation(). You'll need to reload the config or restart the validator for changes to the user functions file to take effect.
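The user function above runs for each <link> start tag. To prototype the same detection outside the validator (for example, to pre-check pages from a sitemap), a rough analogue can be written with Python's standard html.parser module. The CanonicalFinder class is hypothetical and does not correspond to any CSS HTML Validator API.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonical_href = None

    def handle_starttag(self, tag, attrs):
        # Called for every start tag; tag and attribute names arrive lowercased
        if tag != "link" or self.canonical_href is not None:
            return
        attr = {name: value or "" for name, value in attrs}
        # rel can hold several space-separated tokens, so check each one
        if "canonical" in attr.get("rel", "").lower().split():
            self.canonical_href = attr.get("href")

finder = CanonicalFinder()
finder.feed('<head><link rel="canonical" href="/folderB/sample.html"></head>')
print(finder.canonical_href)  # /folderB/sample.html
```

Unlike the user function, which compares the rel value to 'canonical' exactly, this sketch also accepts a rel value that lists canonical among other tokens; that is a design choice of the sketch, not validator behavior.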

You'll need v21.0000 or later for this to work (not yet released). I'll send you a private message about this.

Re: Is rel="canonical" supported?

Post by ktp »

> This should exclude non-canonical pages from the 'Duplicate Meta Descriptions' and 'Duplicate Page Titles' reports in the Batch Wizard.
To be sure: URL A has rel="canonical" pointing to URL B, which may or may not have rel="canonical" pointing to itself. Does the script discard only URL A?

I want to discard only URLs whose rel="canonical" does not point to themselves.

Re: Is rel="canonical" supported?

Post by Albert Wiersch »

ktp wrote: Thu Dec 17, 2020 9:20 pm
> > This should exclude non-canonical pages from the 'Duplicate Meta Descriptions' and 'Duplicate Page Titles' reports in the Batch Wizard.
> To be sure, URL A has rel="canonical" to URL B, which has or does not have rel="canonical" to itself. Does the script discard only URL A?
>
> I wish to discard only URLs that have rel="canonical" that does not point to itself.

Yes, it should compare the canonical URL with the page's own URL; if they match, the page will not be excluded from the reports.

Re: Is rel="canonical" supported?

Post by ktp »

Just a first quick report. I started a small run, and at the end there are warning messages about canonical links (one URL canonical to another), but there are no errors for duplicated titles or meta descriptions. So it looks good; I need to test with a larger project to confirm.

Re: Is rel="canonical" supported?

Post by Albert Wiersch »

ktp wrote: Fri Dec 18, 2020 5:55 am
> Just a first quick report. I just started a small run, and at the end there are warning messages about canonical link (one url canonical to other). But there are no errors for duplicated Title or Meta descriptions. So it looks good, I need to test with larger project to confirm.

Sounds good. The warning message may be the one that the user function is generating. You can easily comment that line out if you don't want it. Prepend "//" to comment that line out like this:

// Message(1,MSG_WARNING,'Canonical! this: '+$this_href+', canonical: '+$can_href);