Respect "rel=nofollow"

Landon_Luko
Rank 0 - Newcomer
Posts: 8
Joined: Thu Oct 26, 2017 10:01 am

Respect "rel=nofollow"

Post by Landon_Luko »

Does Batch Wizard have an option to respect "rel=nofollow" when crawling a website? If not, this would be a great feature to add.
Albert Wiersch
Site Admin
Posts: 3785
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: Respect "rel=nofollow"

Post by Albert Wiersch »

Hello,

Not directly but you may be able to write a user function to do what you want.

You can exclude targets from being checked if you know which links or can match them with a regular expression. See this documentation page for onTargetCanAdd():
https://www.htmlvalidator.com/2019/docs ... canadd.htm

You can also abort a validation if CSS HTML Validator sees "rel=nofollow".

Can you give me more details as to what exactly you want to do and what you want CSS HTML Validator to do when it sees a document with rel=nofollow?
Albert Wiersch, CSS HTML Validator Developer • Download CSS HTML Validator FREE Trial
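Editor's note: the idea of excluding targets that match a regular expression can be illustrated outside the validator. The Python sketch below is not CSS HTML Validator code and does not use its API; the function name `can_add_target` and the patterns are hypothetical, chosen only to show the decision such a callback would make for each candidate URL.

```python
import re

# Hypothetical exclusion patterns; a real list depends on the site being crawled.
EXCLUDE_PATTERNS = [
    re.compile(r"/login\b"),    # skip login pages
    re.compile(r"[?&]sort="),   # skip sorted duplicates of listing pages
]


def can_add_target(url, patterns=EXCLUDE_PATTERNS):
    """Return False when the URL matches any exclusion pattern, True otherwise."""
    return not any(pattern.search(url) for pattern in patterns)


if __name__ == "__main__":
    for candidate in (
        "https://example.com/about",
        "https://example.com/login",
        "https://example.com/list?sort=asc",
    ):
        print(candidate, "->", "add" if can_add_target(candidate) else "skip")
```

A crawler would call a check like this once per discovered link, which is conceptually what a user function deciding whether a target may be added does.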
Kilo_SSK
Rank 0 - Newcomer
Posts: 4
Joined: Mon Sep 23, 2019 11:27 am

Re: Respect "rel=nofollow"

Post by Kilo_SSK »

Thanks for the advice, I'll see what I can come up with! I'm not the OP, but that'll help.
ktp
Rank III - Intermediate
Posts: 60
Joined: Sat Oct 29, 2016 10:34 am

Re: Respect "rel=nofollow"

Post by ktp »

It would be really helpful to have a ready-made snippet for discarding URLs with "rel=nofollow".

Currently I don't know how to do this. It seems I need to learn TNPL scripting, how to add the code, where to put it, and so on, while
my priority right now is to validate my URLs. I don't have much time for self-learning.

A repository of useful snippets like this (a pinned topic, for example) would really help people get started quickly.
ktp
Rank III - Intermediate
Posts: 60
Joined: Sat Oct 29, 2016 10:34 am

Re: Respect "rel=nofollow"

Post by ktp »

Hello Admin,

Could you help me with a user function that discards URLs with rel="nofollow"?
I am just starting to learn how to do this, how to debug, etc. A ready-made script would be easier for me right now.

Thank you.
Albert Wiersch
Site Admin
Posts: 3785
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: Respect "rel=nofollow"

Post by Albert Wiersch »

ktp wrote: Thu Dec 17, 2020 9:14 pm
> Hello Admin,
>
> Could you help me with a user function that discards URLs with rel="nofollow"?
> I am just starting to learn how to do this, how to debug, etc. A ready-made script would be easier for me right now.
>
> Thank you.

Hello,

I'm working on a way to stop the Batch Wizard from following/crawling links with rel="nofollow" right now. I think I am close to a solution. I will post back with the results soon.
Albert Wiersch
Site Admin
Posts: 3785
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: Respect "rel=nofollow"

Post by Albert Wiersch »

I've completed initial testing on this new feature which can exclude "a" links with rel="nofollow" from being checked/followed/crawled in the Batch Wizard.

Put this "user function" in a text file like "wizard_userfunctions.txt":

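
function onTargetCanAdd() {
 // The lowest bit (value 1) of $otca_flags appears to mark an "a" link with rel="nofollow"
 if $otca_flags&1 {
  // Do not add this target, so it is not checked/followed/crawled
  $otca_add=false;
 }
}

Then specify the file as a 'user functions' file in the 'Target List Options' tab of the Batch Wizard for the target list(s) that you want it to apply to... and that's it!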

You'll need v21.0000 or later for this to work (not yet released). I'll send you a private message about this.
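Editor's note: for readers unfamiliar with the pattern, here is a minimal Python sketch of what "respecting rel=nofollow" means for a crawler in general. It is not how the Batch Wizard is implemented. The `FLAG_NOFOLLOW` constant only mirrors the `$otca_flags&1` test above, and its meaning is an assumption; `LinkCollector` and `targets_to_crawl` are hypothetical names.

```python
from html.parser import HTMLParser

FLAG_NOFOLLOW = 1  # assumed meaning of the lowest bit, mirroring `$otca_flags&1`


class LinkCollector(HTMLParser):
    """Collects (href, flags) pairs for every <a href="..."> in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # rel is a space-separated token list, e.g. rel="nofollow noopener"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        flags = FLAG_NOFOLLOW if "nofollow" in rel_tokens else 0
        self.links.append((href, flags))


def targets_to_crawl(html):
    """Return the hrefs that a nofollow-respecting crawler would still follow."""
    collector = LinkCollector()
    collector.feed(html)
    return [href for href, flags in collector.links if not flags & FLAG_NOFOLLOW]


if __name__ == "__main__":
    page = '<a href="/a">A</a> <a rel="nofollow" href="/b">B</a>'
    print(targets_to_crawl(page))
```

Keeping the nofollow state as a bit in a flags value, rather than as a separate boolean, leaves room for other per-link properties to be reported through the same variable.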
ktp
Rank III - Intermediate
Posts: 60
Joined: Sat Oct 29, 2016 10:34 am

Re: Respect "rel=nofollow"

Post by ktp »

First quick report with v21 version and the user scripts for rel="nofollow" and rel="canonical" provided by admin.

1) I get several CSS HTML Validator messages at startup:
External CSS file "C:\Users\user\AppData\Roaming\AI Internet Solutions\CSS HTML Validator\21\batchreporttemplate.css" cannot be accessed. Does it exist? Falling back to using the internal default CSS instead.
There is a similar error message for c:\Users\user\AppData\Roaming\AI Internet Solutions\CSS HTML Validator\21\htmlvalV210.cfg. I had to take it from v20 and rename it.

2) The "type" attribute was not used so this script is assumed to be of type "text/javascript" (the default).
This is in contradiction with W3C https://validator.w3.org/
"Warning: The type attribute is unnecessary for JavaScript resources."

3) The link to Facebook is no longer valid:
Consider using Facebook's Open Graph Object Debugger at https://developers.facebook.com/tools/debug/og/object/ to perform further checks on Open Graph tags. This message is displayed only once.

4) Options/Validator engine options/Enable CSS Syntax checking is activated. But why are CSS check messages displayed by the Batch Wizard when the main validator does not display them?
If I check a single URL from the main validator (Ctrl+Shift+O), there are no error or warning messages. With the Batch Wizard, there are a lot of CSS warning messages.

5) I have a project with 25K+ URLs. I added rel="nofollow", so only 1085 URLs remained in the sitemap (this is confirmed by another tool).
But the Batch Wizard still runs as before (13 min, 25,985 documents found).
So for me the rel="nofollow" support is somehow not active. How can I debug it? For information, I concatenated the 2 scripts provided by admin
into a single file userfunctions.txt:


// Then specify the file as a 'user functions' file in the 'Config File' page of the Validator Engine Options (Ctrl+F4). Press the 'Reload Config' button and that's it!

// support for rel="canonical"
function onStartTag_link() {
 if getAttValueEx('rel',12)=='canonical' {
  if isBatchWizardJob {
   $this_href=convertStringEx(7,'#');
   $can_href=convertStringEx(7,getAttValueEx('href',12));
   if !matchCase($this_href,$can_href) {
    Message(1,MSG_WARNING,'Canonical! this: '+$this_href+', canonical: '+$can_href);
    abortValidation();
    $_BatchWizard.report_dup_page_title=0;
    $_BatchWizard.report_dup_meta_desc=0;
   } 
  }
 }
}

// support for rel="nofollow"
userfunctions.txt
function onTargetCanAdd() {
 if $otca_flags&1 {
  $otca_add=false;
 }
}
Albert Wiersch
Site Admin
Posts: 3785
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: Respect "rel=nofollow"

Post by Albert Wiersch »

ktp wrote: Fri Dec 18, 2020 6:36 am
> First quick report with v21 version and the user scripts for rel="nofollow" and rel="canonical" provided by admin.
>
> 1) I get several CSS HTML Validator messages at startup:
> External CSS file "C:\Users\user\AppData\Roaming\AI Internet Solutions\CSS HTML Validator\21\batchreporttemplate.css" cannot be accessed. Does it exist? Falling back to using the internal default CSS instead.
> There is a similar error message for c:\Users\user\AppData\Roaming\AI Internet Solutions\CSS HTML Validator\21\htmlvalV210.cfg. I had to take it from v20 and rename it.

1) Thank you. I'll look into what might have caused this issue.

ktp wrote: Fri Dec 18, 2020 6:36 am
> 2) The "type" attribute was not used so this script is assumed to be of type "text/javascript" (the default).
> This is in contradiction with W3C https://validator.w3.org/
> "Warning: The type attribute is unnecessary for JavaScript resources."

2) This is just an informational message. It's not actually telling you to use the attribute.

ktp wrote: Fri Dec 18, 2020 6:36 am
> 3) The link to Facebook is no longer valid:
> Consider using Facebook's Open Graph Object Debugger at https://developers.facebook.com/tools/debug/og/object/ to perform further checks on Open Graph tags. This message is displayed only once.

3) Thank you. I've updated the link and message to:
Consider using Facebook's Sharing Debugger at https://developers.facebook.com/tools/debug/ to perform further checks on Open Graph markup.

ktp wrote: Fri Dec 18, 2020 6:36 am
> 4) Options/Validator engine options/Enable CSS Syntax checking is activated. But why are CSS check messages displayed by the Batch Wizard when the main validator does not display them?
> If I check a single URL from the main validator (Ctrl+Shift+O), there are no error or warning messages. With the Batch Wizard, there are a lot of CSS warning messages.

4) Are you sure there are no CSS messages when using the integrated editor? Most CSS messages should display in the 'Style' tab in the Results Window so you may need to select that tab to see the CSS messages.

ktp wrote: Fri Dec 18, 2020 6:36 am
> 5) I have a project with 25K+ URLs. I added rel="nofollow", so only 1085 URLs remained in the sitemap (this is confirmed by another tool).
> But the Batch Wizard still runs as before (13 min, 25,985 documents found).
> So for me the rel="nofollow" support is somehow not active. How can I debug it? For information, I concatenated the 2 scripts provided by admin
> into a single file userfunctions.txt:

5) You really shouldn't concatenate those user functions. One is for the validator engine and is specified in the Validator Engine Options; the other is for the Batch Wizard and is specified in the 'Target List Options' tab in the Batch Wizard.

If you put the onTargetCanAdd() function into a separate file (like wiz_userfunctions.txt) and specify it (after loading the target list that you want it to apply to) in the 'Target List Options' tab in the Batch Wizard then I think it will fix this issue (see screenshot).
Attachment: BatchWizUserFunctions.png
ktp
Rank III - Intermediate
Posts: 60
Joined: Sat Oct 29, 2016 10:34 am

Re: Respect "rel=nofollow"

Post by ktp »

> 4) Are you sure there are no CSS messages when using the integrated editor? Most CSS messages should display in the 'Style' tab in the Results Window so you may need to select that tab to see the CSS messages.

Yes, I confirm that there are no messages in the Style tab for the same URL.
So I checked the CSS checker option. It is confusing.

Editor: Options/Validator Engine Options/Enable CSS syntax checking
Batch Wizard: Options/Validator Engine Options/CSS checker => enable CSS style checking

The wording is confusing: one is "CSS syntax check", the other is "CSS style checking", and the two are presented differently.
By toggling some options on and off, I discovered that they are in fact the same options under "Validator Engine Options".
Edit: they are reachable from Editor/Options, from Batch Wizard Options (in the menu), and from Options (the cogwheel icon).

So the CSS syntax check (or CSS style check) is definitely activated, and CSS warning messages appear in the Batch Wizard report, while nothing is shown in the Editor's Style tab. This is a problem for me. With v20, there were no CSS messages in either the Editor or the Batch Wizard with the CSS style check activated. v21 probably has a stronger CSS checker, so new messages appear, but the discrepancy in output (between the Editor's Style tab and the Batch Wizard report) is a problem.


> If you put the onTargetCanAdd() function into a separate file (like wiz_userfunctions.txt) and specify it (after loading the target list that you want it to apply to) in the 'Target List Options' tab in the Batch Wizard then I think it will fix this issue (see screenshot).

It works! With rel="nofollow" support:
1076 documents in 31.89 seconds
instead of
25,985 documents in 13 min 3 seconds, as before.

The number of documents (number of URLs) is in sync with the sitemap produced by another tool.

Thank you admin for your quick support. Keep up the good work!
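Editor's note: the figures reported above can be put in perspective with a little arithmetic. The Python snippet below uses only the numbers quoted in this post; the variable names are illustrative.

```python
def throughput(documents, seconds):
    """Documents processed per second."""
    return documents / seconds


before_docs, before_secs = 25_985, 13 * 60 + 3  # 13 min 3 s = 783 s
after_docs, after_secs = 1_076, 31.89

speedup = before_secs / after_secs       # how many times shorter the run is
share_crawled = after_docs / before_docs  # fraction of documents still crawled

print(f"run time reduced about {speedup:.1f}x")
print(f"documents crawled: {share_crawled:.1%} of the original run")
print(f"throughput before: {throughput(before_docs, before_secs):.1f} docs/s")
print(f"throughput after:  {throughput(after_docs, after_secs):.1f} docs/s")
```

The run is roughly 24 to 25 times shorter, while throughput stays at about 33 documents per second in both runs. In other words, the saving comes from crawling fewer documents and not from faster processing of each one.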
ktp
Rank III - Intermediate
Posts: 60
Joined: Sat Oct 29, 2016 10:34 am

Re: Respect "rel=nofollow"

Post by ktp »

For a script element with src=, I did not put type="text/javascript" and got this from the validator:
The "type" attribute was not used so this script is assumed to be of type "text/javascript" (the default).

So I followed the validator and added type="text/javascript", and now I get:
The "type" attribute should be omitted instead of specifying a value that is the empty string or a JavaScript MIME type (because JavaScript is the default). This message is displayed up to 3 times. Message repeated 1 time for (line:char): 27:9.

Since the W3C recommendation is: "Warning: The type attribute is unnecessary for JavaScript resources.",
I will remove type="text/javascript", so at least no warning is issued when checking with the W3C validator :-).
Albert Wiersch
Site Admin
Posts: 3785
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: Respect "rel=nofollow"

Post by Albert Wiersch »

ktp wrote: Fri Dec 18, 2020 11:03 am
> For a script element with src=, I did not put type="text/javascript" and got this from the validator:
> The "type" attribute was not used so this script is assumed to be of type "text/javascript" (the default).
>
> So I followed the validator and added type="text/javascript", and now I get:
> The "type" attribute should be omitted instead of specifying a value that is the empty string or a JavaScript MIME type (because JavaScript is the default). This message is displayed up to 3 times. Message repeated 1 time for (line:char): 27:9.
>
> Since the W3C recommendation is: "Warning: The type attribute is unnecessary for JavaScript resources.",
> I will remove type="text/javascript", so at least no warning is issued when checking with the W3C validator :-).

This is just an informational message. It's not actually telling you to use the "type" attribute here.

If what is inside the "script" element (or what is specified by the "src" attribute) is JavaScript then you should not use the "type" attribute.

You can right-click on that message when it's displayed in the Results Window and disable it if you want, so it doesn't show up anymore.
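Editor's note: the rule quoted in this exchange (omit "type" when its value is the empty string or a JavaScript MIME type) can be sketched as a small check. This Python snippet is only an illustration and is not the validator's own logic; the function name is hypothetical and the MIME type set is a deliberately incomplete subset.

```python
# Illustrative subset of JavaScript MIME types; not an exhaustive list.
JAVASCRIPT_MIME_TYPES = {
    "text/javascript",
    "application/javascript",
    "application/ecmascript",
    "text/ecmascript",
    "application/x-javascript",
}


def type_attribute_is_redundant(type_value):
    """Return True when a script element's type attribute could simply be omitted.

    type_value is the raw attribute value, or None when the attribute is absent.
    """
    if type_value is None:
        return False  # attribute not present, so there is nothing to omit
    normalized = type_value.strip().lower()
    return normalized == "" or normalized in JAVASCRIPT_MIME_TYPES


if __name__ == "__main__":
    for value in ("text/javascript", "", "module", "application/ld+json", None):
        print(repr(value), "->", type_attribute_is_redundant(value))
```

Values such as "module" or "application/ld+json" change how the element is treated, so they are not redundant and must stay.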
ktp
Rank III - Intermediate
Posts: 60
Joined: Sat Oct 29, 2016 10:34 am

Re: Respect "rel=nofollow"

Post by ktp »

> You can right-click on that messages when it's displayed in the Results Window and disable it if you want, so it doesn't show up anymore.

The problem is that I only get the messages in the Batch Wizard HTML report, and no message in the Results Window (I assume that is in the Editor) for the same URL reported by the Batch Wizard. And I really want to disable some of these messages :-(.
Albert Wiersch
Site Admin
Posts: 3785
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: Respect "rel=nofollow"

Post by Albert Wiersch »

ktp wrote: Fri Dec 18, 2020 12:50 pm
> > You can right-click on that messages when it's displayed in the Results Window and disable it if you want, so it doesn't show up anymore.
>
> The problem is that I only have the messages in the Batch Wizard html report, and no message in Results Window (I assume that is in the Editor) for the same URL reported by Batch Wizard. And I really want to disable some of these messages :-(.

You only need to validate a file in the editor that causes the message to be generated. You can then disable the message and it will be disabled for the Batch Wizard and for all URLs.
Albert Wiersch
Site Admin
Posts: 3785
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: Respect "rel=nofollow"

Post by Albert Wiersch »

ktp wrote: Fri Dec 18, 2020 10:38 am
> > 4) Are you sure there are no CSS messages when using the integrated editor? Most CSS messages should display in the 'Style' tab in the Results Window so you may need to select that tab to see the CSS messages.
>
> Yes, I confirm that there are no messages in the Style tab for the same URL.
> So I checked the CSS checker option. It is confusing.
>
> Editor: Options/Validator Engine Options/Enable CSS syntax checking
> Batch Wizard: Options/Validator Engine Options/CSS checker => enable CSS style checking
>
> The wording is confusing: one is "CSS syntax check", the other is "CSS style checking", and the two are presented differently.
> By toggling some options on and off, I discovered that they are in fact the same options under "Validator Engine Options".
> Edit: they are reachable from Editor/Options, from Batch Wizard Options (in the menu), and from Options (the cogwheel icon).
>
> So the CSS syntax check (or CSS style check) is definitely activated, and CSS warning messages appear in the Batch Wizard report, while nothing is shown in the Editor's Style tab. This is a problem for me. With v20, there were no CSS messages in either the Editor or the Batch Wizard with the CSS style check activated. v21 probably has a stronger CSS checker, so new messages appear, but the discrepancy in output (between the Editor's Style tab and the Batch Wizard report) is a problem.

Thank you. I will check the wording on the options and make sure it is consistent.

As for the discrepancy in the validator/CSS messages, there really shouldn't be any if the validator options are the same. Are you able to provide a sample document or URL with detailed instructions on how to reproduce the discrepancy? Or, if you can send your Batch Wizard report and some screenshots that show the issue then that might help me figure out what's happening as well.
ktp wrote: Fri Dec 18, 2020 10:38 am
> It works! With rel="nofollow" support:
> 1076 documents in 31.89 seconds
> instead of
> 25,985 documents in 13 min 3 seconds, as before.
>
> The number of documents (number of URLs) is in sync with the sitemap produced by another tool.
>
> Thank you admin for your quick support. Keep up the good work!

That's great! I'm glad it is working now.