Bad links reported by Link checker though no error

Albert Wiersch
Site Admin
Posts: 3785
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Re: Bad links reported by Link checker though no error

Post by Albert Wiersch »

Thank you for the screenshots.

Yes, the problem is the "OK" links that CSS HTML Validator thinks are bad. I am not yet sure if they are really OK or they are really bad (based on a bad server response) but there seems to be an issue here. I am not sure if it's on your server/network side or a bug/issue with CSS HTML Validator.

I still do not understand why changing it to 'Errors only' would affect the link checker. If you want to change it back to what it was before and then run the Batch Wizard job again to see if this problem comes back then that would be helpful, but I'll leave that up to you.

In the meantime I'll keep an eye out for this problem. I did make a change for the next version that will (hopefully) provide more information other than just "OK" if this problem happens again.
Albert Wiersch, CSS HTML Validator Developer • Download CSS HTML Validator FREE Trial

Post by Albert Wiersch »

ktp wrote: Mon Dec 21, 2020 6:23 am Another screen capture from the Link report (it said so). It said: 1001 links extracted: 3 bad, 0 warning, 997 good, 1 not checked, 0 excluded, 3 in this link report. Then it shows 3 lines, each line with a link, with status OK for all 3 links. Note: On other pages of the report, if there are 6 links, then it said: 6 bad etc...

So for a plain user, it is contradictory! It is as if the link report, speaking as a person, said to me: "Hey, I checked the links for you. There are 3 bad links, and here is the proof: all 3 links have status 200 OK. Do you agree?" You can guess my answer! Unless there are recent changes in the HTTP protocol that I am unaware of, and 2xx codes have now been swapped with 4xx or 5xx :-). By the way, the server supports HTTP/2.
Yes, that is not right. There is no status code as it only says "OK". Here is what it should look like (if you've enabled the link report to include good links). Notice it is green instead of red and says "200 - OK" instead of just "OK"... so I am wondering if the server didn't return a status code for some reason.

I tried to reproduce my server failing to return a status code but it would not let me test that easily as it generated errors for my test script regarding the lack of a status code. Hopefully the change I made in the next update to include more information in the 'Status' field (other than just "OK") when this happens will help.
Attachments
BatchWizardLinkReport.PNG
ktp
Rank III - Intermediate
Posts: 60
Joined: Sat Oct 29, 2016 10:34 am

Post by ktp »

Albert Wiersch wrote: Mon Dec 21, 2020 11:21 am ...
I still do not understand why changing it to 'Errors only' would affect the link checker. If you want to change it back to what it was before and then run the Batch Wizard job again to see if this problem comes back then that would be helpful, but I'll leave that up to you.
...
There was not only the Validator Engine option (Full / Errors only) but also the Link report option.

My current Batch Wizard (BW) settings, which produce good results for me:
BW Validator: Errors only
Link checking: enabled
Link report: disabled

To solve the mystery, I will do two runs, one for each of the two configurations I have used, both against the same test server. Each run takes about 30-40 minutes.

First run (normally the one with the bad result: false bad links)
BW Validator: Full
Link checking: enabled
Link report: enabled

Second run (normally the one with the good result: true bad links)
BW Validator: Errors only
Link checking: enabled
Link report: disabled

I will report both results. The first run was started a few minutes ago.
Attachments
BatchWizard_Validator_Full_2020-12-21_184209.jpg
BatchWizard_Always_generate_a_link_report_2020-12-21_184140.jpg

Post by Albert Wiersch »

OK, thank you. Please let me know if you continue to get false bad links.

If you want, you can wait until the next update for the change I made that will (hopefully) provide more information in case you get more false bad links. I should be able to send you an update in the next day or 2 if you'd like.

Post by ktp »

There is good news and bad news.

The good news is that, as expected, Run 1 did produce false bad links.
The bad news is that Run 2 gave the same result as Run 1, instead of the good result I expected.

Overall, this is still good news, since I can reproduce the problem.

For the same server, same environment, no changes in any files (html...):

First run settings:
BW Validator: Full
Link checking: enabled
Link report: enabled

First run results:
                         # of docs with   % of docs with   Total number of
Errors                   146              0.5%             943
Warnings                 0                0.0%             0
No errors or warnings    29403            99.5%            -
Bad links                5359             18.1%            15320
Total documents processed 29549 documents in 36 minutes, 47.81 seconds
Report created Monday, December 21, 2020 at 18:43:05


Second run settings:
BW Validator: Errors only
Link checking: enabled
Link report: disabled

Second run results:
                         # of docs with   % of docs with   Total number of
Errors                   146              0.5%             943
Warnings                 0                0.0%             0
No errors or warnings    29403            99.5%            -
Bad links                5359             18.1%            15320
Total documents processed 29549 documents in 23 minutes, 42.08 seconds
Report created Monday, December 21, 2020 at 19:25:45
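
For anyone who wants to sanity-check the two summaries, the percentages and run times can be recomputed from the figures above. This is only an illustrative Python sketch: the names are made up for the example, and the per-second and per-document averages are derived here, not reported by Batch Wizard.

```python
# Figures copied from the two Batch Wizard summaries (identical in both runs).
TOTAL_DOCS = 29549
DOCS_WITH_ERRORS = 146
DOCS_WITH_BAD_LINKS = 5359
TOTAL_BAD_LINKS = 15320


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a 'M minutes, S seconds' duration to seconds."""
    return minutes * 60 + seconds


run1_seconds = to_seconds(36, 47.81)  # Full validation, link report enabled
run2_seconds = to_seconds(23, 42.08)  # Errors only, link report disabled

print(percent(DOCS_WITH_ERRORS, TOTAL_DOCS))     # 0.5
print(percent(DOCS_WITH_BAD_LINKS, TOTAL_DOCS))  # 18.1

# Derived averages (not part of the original report).
print(round(TOTAL_BAD_LINKS / DOCS_WITH_BAD_LINKS, 2), "bad links per affected document")
print(round(TOTAL_DOCS / run1_seconds, 1), "documents per second in run 1")
print(round(TOTAL_DOCS / run2_seconds, 1), "documents per second in run 2")
```

The counts are identical in both runs and only the duration differs.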

Edit (add): by the way, I remember now. The "good" result came from the production server, and the "bad" result came from the test server. This could be a point to take into account.
Attachments
BatchWizard_Run2_2020-12-21_195007.jpg
BatchWizard_Run1_2020-12-21_192014.jpg

Post by Albert Wiersch »

ktp wrote: Mon Dec 21, 2020 1:00 pm Edit (add): by the way, I remember now. The "good" result came from the production server, and the "bad" result came from the test server. This could be a point to take into account.
Thank you for posting the results.

I think the above is important information. This may very well have something to do with the server being used.

If you are able to reproduce it on the production/public server, then please let me know. I would like to try to reproduce it (I haven't been able to so far).

Otherwise please wait until the next update to try again. The next update contains that change I talked about that will (hopefully) provide more information into why this is happening.

Also, you may want to check the configuration of your test server and make sure it is returning the correct HTTP status codes.

Post by Albert Wiersch »

By the way, if you did not exit and restart CSS HTML Validator before run 2, then you may want to try run 2 again because the link cache should have been cleared before running the new test job. I wish I had mentioned that earlier but it did not occur to me at the time. Sorry about that.
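
Why the restart matters can be pictured with a tiny cache sketch. This is an assumption about how a link cache typically behaves, not a description of CSS HTML Validator's internals; `CachingLinkChecker` and its methods are hypothetical names used only for illustration.

```python
from typing import Callable, Dict


class CachingLinkChecker:
    """Remembers the first result for each URL until the cache is cleared."""

    def __init__(self, checker: Callable[[str], str]) -> None:
        self._checker = checker
        self._cache: Dict[str, str] = {}
        self.real_checks = 0  # how many times the real checker was called

    def check(self, url: str) -> str:
        if url not in self._cache:
            self.real_checks += 1
            self._cache[url] = self._checker(url)
        # A cached result is returned even if the server would answer
        # differently now, so a second job can repeat the first job's errors.
        return self._cache[url]

    def clear(self) -> None:
        """Forget all cached results (the effect a restart would have here)."""
        self._cache.clear()
```

With a cache like this, a second job that reuses the same checker never contacts the server again for links it has already seen, which would explain two runs reporting exactly the same bad links.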

Post by ktp »

Indeed, I did not restart the Batch Wizard and editor. So I restarted them and did Run 2 again.
The number of bad links is different now, so it is probably something related to the server,
as you suspected.


My test server is a VMware virtual machine with 1 GB of RAM running CentOS 7 on my laptop (10th-generation i5).
The production server is a Xeon machine with 16 GB of RAM. The two have the same software and settings (Apache/Nginx).
But on the production server, the CPU load during the link check was never higher than 1.0, while on
the VM (test server) it momentarily reached about 10 (high load), as alerted by my monitoring software:
High 5 minute load average alert:

Time: Sat Dec 19 20:07:46 2020 +0100
1 Min Load Avg: 9.81
5 Min Load Avg: 6.49
15 Min Load Avg: 3.64
Running/Total Processes: 10/153

I observed that Link checker issues about 100+ req/s on the production server.

I will wait for your new build to test again.
Attachments
BatchWizard_Run2bis_2020-12-21_214017.jpg

Post by Albert Wiersch »

ktp wrote: Mon Dec 21, 2020 2:48 pm I observed that Link checker issues about 100+ req/s on the production server.
If it's too much this can be adjusted with the settings in the 'Link Checking' page in the Validator Engine Options.

Here is more information:
https://www.htmlvalidator.com/current/d ... g_node.htm
ktp wrote: Mon Dec 21, 2020 2:48 pm I will wait for your new build to test again.
Sounds good. I hope to get that update to you soon (perhaps within the next 24 hours).

Post by ktp »

OK, mystery revealed.
I ran the new build v21.0001 from the admin and got the same bad links reported, but now with more details in the Status column.
Socket error 10053 (Winsock). This explains the difference between the production and test servers: the production server is much more powerful.
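
For reference, 10053 is the Winsock code WSAECONNABORTED ("Software caused connection abort"). The small lookup below is just an illustrative Python sketch covering a few common Winsock codes; the table and the `describe_socket_error` helper are made up for this example and are not part of CSS HTML Validator.

```python
# A few common Winsock error codes and their documented meanings.
WINSOCK_ERRORS = {
    10053: ("WSAECONNABORTED", "Software caused connection abort"),
    10054: ("WSAECONNRESET", "Connection reset by peer"),
    10060: ("WSAETIMEDOUT", "Connection timed out"),
    10061: ("WSAECONNREFUSED", "Connection refused"),
}


def describe_socket_error(code: int) -> str:
    """Return a readable description for a Winsock error code."""
    name, meaning = WINSOCK_ERRORS.get(code, ("UNKNOWN", "not in this table"))
    return f"{code} {name}: {meaning}"


print(describe_socket_error(10053))
# 10053 WSAECONNABORTED: Software caused connection abort
```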

Solutions?
- Reduce the number of requests per second when testing on the test server? But this increases the test duration.
- Increase some network parameters on the test server?
- Disable the firewall on the test server?
- Other?

So CSS HTML Validator was half right when saying bad links :-).

By the way, I did not use the Validator user function for rel="canonical" since it is now supported built-in. But I still use the Batch Wizard user function for rel="nofollow". Why is this script not built into CSS HTML Validator?
Attachments
Validator_Winsock_10053_2020-12-23_054126.jpg
Validator_4515_bad_links_2020-12-23_054152.jpg

Post by Albert Wiersch »

ktp wrote: Tue Dec 22, 2020 10:58 pm OK, mystery revealed.
I ran the new build v21.0001 from the admin and got the same bad links reported, but now with more details in the Status column.
Socket error 10053 (Winsock). This explains the difference between the production and test servers: the production server is much more powerful.
It's great that this shed some light on the issue. It looks like the network system is aborting the connection for some reason.
ktp wrote: Tue Dec 22, 2020 10:58 pm Solutions?
- Reduce the number of requests per second when testing on the test server? But this increases the test duration.
- Increase some network parameters on the test server?
- Disable the firewall on the test server?
- Other?
Yes, those sound like good things to try. If you are OK with increasing the validation time then I would start with reducing the 'Max simultaneous link checks' to 10 (the default should be 15). That option is in the 'Link Checking' page of the Validator Engine Options. You might as well also slightly increase the wait time between checks to 300ms or so. I suspect these things will take a significant load off your test server.
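
To picture what those two settings do, here is a rough Python sketch of a throttled link check. It is not how CSS HTML Validator is implemented; `check_url`, `check_links`, and their parameters are hypothetical names, the use of HEAD requests is an assumption, and treating the wait time as a pause before starting each new check is also an assumption.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def check_url(url: str, timeout: float = 10.0) -> str:
    """Return status text such as '200 - OK', or the error if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"{response.status} - {response.reason}"
    except urllib.error.HTTPError as exc:  # server answered with 4xx/5xx
        return f"{exc.code} - {exc.reason}"
    except OSError as exc:  # URLError, timeouts, and socket-level errors
        return f"error: {exc}"


def check_links(
    urls: Iterable[str],
    max_simultaneous: int = 10,   # like 'Max simultaneous link checks'
    wait_seconds: float = 0.3,    # like a 300 ms wait between checks
    checker: Callable[[str], str] = check_url,
) -> Dict[str, str]:
    """Check every URL with at most `max_simultaneous` requests in flight."""
    futures = {}
    with ThreadPoolExecutor(max_workers=max_simultaneous) as pool:
        for url in urls:
            futures[url] = pool.submit(checker, url)
            time.sleep(wait_seconds)  # pause before starting the next check
    # Leaving the with-block waits for all submitted checks to finish.
    return {url: future.result() for url, future in futures.items()}
```

Fewer workers and a longer pause both lower the request rate the server sees, at the cost of a longer total run.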
ktp wrote: Tue Dec 22, 2020 10:58 pm By the way, I did not use the Validator user function for rel="canonical" since it is now supported built-in. But I still use the Batch Wizard user function for rel="nofollow". Why is this script not built into CSS HTML Validator?
I don't think it makes sense not to check/validate rel="nofollow" links by default, because rel="nofollow" only tells search engines to ignore the link. People can still go to the link and view the page, and if people can go to the link and view it, then most web developers would probably still want to check those pages in case there are errors, spelling, link, and/or accessibility issues that will affect a user's experience.
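
As an illustration of the two behaviours being discussed, the Python sketch below collects links from a page and only skips rel="nofollow" links when explicitly asked to. It uses the standard library's `html.parser`; `LinkCollector`, `collect_links`, and the `skip_nofollow` flag are hypothetical names and do not correspond to any CSS HTML Validator setting or user function.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, optionally skipping nofollow links."""

    def __init__(self, skip_nofollow: bool = False) -> None:
        super().__init__()
        self.skip_nofollow = skip_nofollow
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel is a space-separated list of tokens, compared case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if self.skip_nofollow and "nofollow" in rel_tokens:
            return
        self.links.append(href)


def collect_links(html: str, skip_nofollow: bool = False) -> List[str]:
    collector = LinkCollector(skip_nofollow)
    collector.feed(html)
    collector.close()
    return collector.links


page = '<a href="/a">A</a> <a rel="nofollow" href="/b">B</a>'
print(collect_links(page))                      # ['/a', '/b']
print(collect_links(page, skip_nofollow=True))  # ['/a']
```

The default here follows the reasoning above: nofollow links are still checked unless the user opts out.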

Post by ktp »

Albert Wiersch wrote: Tue Dec 22, 2020 11:44 pm ...
Yes, those sound like good things to try. If you are OK with increasing the validation time then I would start with reducing the 'Max simultaneous link checks' to 10 (the default should be 15). That option is in the 'Link Checking' page of the Validator Engine Options. You might as well also slightly increase the wait time between checks to 300ms or so. I suspect these things will take a significant load off your test server.
...
Theory: run a trial after setting the "Wait time between checks" value, then stop it after a few percent is done. If no bad links are reported, you can decrease the wait time between checks. After several trials you will reach the sweet spot where no bad links are reported (and the test duration is also minimal).
Note: the link checker does not consume as much CPU as the validator.
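
The trial-and-error idea can be written down as a small search. This is only a sketch of the manual procedure: `trial_has_bad_links` is a hypothetical callback standing in for "run a partial Batch Wizard job with this wait time and see whether bad links appear", which in practice is done by hand.

```python
from typing import Callable, Iterable, Optional


def find_smallest_safe_wait(
    candidates_ms: Iterable[int],
    trial_has_bad_links: Callable[[int], bool],
) -> Optional[int]:
    """Try wait times from largest to smallest and return the smallest clean one.

    Returns None if even the largest wait time still produces bad links.
    """
    best = None
    for wait_ms in sorted(candidates_ms, reverse=True):
        if trial_has_bad_links(wait_ms):
            break  # shorter waits are assumed to be at least as bad
        best = wait_ms
    return best
```

The search stops at the first failing value because it assumes that a shorter wait never behaves better than a longer one.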

But in practice, with "Wait time" set to 20 ms, it takes 2 hours 25 minutes to finish (the link check takes 2 hours by itself), and there are still bad links (error 10053). Two hours is too long for me. So I reverted to the default values, and will only use the link check on the production server.

Question: why does one get a 200 OK status with such a socket error 10053? If there is a socket error, one cannot even establish a connection to get status 200 OK, can one?
Attachments
2hours_26min_run_2020-12-23_115702.jpg
Post by ktp »

Albert Wiersch wrote: Tue Dec 22, 2020 11:44 pm ...
I don't think it makes sense not to check/validate rel="nofollow" links by default, because rel="nofollow" only tells search engines to ignore the link. People can still go to the link and view the page, and if people can go to the link and view it, then most web developers would probably still want to check those pages in case there are errors, spelling, link, and/or accessibility issues that will affect a user's experience.
OK. I would suggest giving the user an easier choice, with an option to enable/disable rel="nofollow" handling without needing to know about user functions, scripts, etc. Likewise for rel="canonical": the user should have an easy choice for the behaviour on rel="canonical" (validate or not, report duplicate title/meta or not, etc.), and likewise for the behaviour with rel="nofollow". One tool I use for producing sitemaps has a single crawl filters options panel that is easy to understand. You just toggle whether to obey the "link" tag with rel="canonical" or the "a" tag with rel="nofollow".
Attachments
A1SG_crawl_filters_2020-12-23_090940.jpg

Post by Albert Wiersch »

ktp wrote: Wed Dec 23, 2020 6:53 am Theory: run a trial after setting the "Wait time between checks" value, then stop it after a few percent is done. If no bad links are reported, you can decrease the wait time between checks. After several trials you will reach the sweet spot where no bad links are reported (and the test duration is also minimal).
Note: the link checker does not consume as much CPU as the validator.
That's an interesting approach to dynamically adjust the settings, but the "devil is in the details" and trying to implement and debug this would probably be more complex and time consuming than can be justified.
ktp wrote: Wed Dec 23, 2020 6:53 am But in practice, with "Wait time" set to 20 ms, it takes 2 hours 25 minutes to finish (the link check takes 2 hours by itself), and there are still bad links (error 10053). Two hours is too long for me. So I reverted to the default values, and will only use the link check on the production server.
I understand. That's a long time, especially if it doesn't even solve the problem and you're still getting that error. It could be something else too, like a firewall? But this would take more time to investigate and may not be worth it.
ktp wrote: Wed Dec 23, 2020 6:53 am Question: why does one get a 200 OK status with such a socket error 10053? If there is a socket error, one cannot even establish a connection to get status 200 OK, can one?
I am not sure. This is what the component I am using is reporting. It's possible the connection is being established but it is being aborted before it can complete. I also wonder if it has something to do with the SSL/TLS connection/negotiation. Is your test server using the same level of security and not anything obsolete for the SSL/TLS/HTTPS connection?

I have asked a question regarding this issue in the forum for the component that is used. They should know a lot more about why this may be happening than me.
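
The "established but aborted before it can complete" case is easy to demonstrate in general HTTP terms. The Python sketch below is not the component used by CSS HTML Validator and does not claim to reproduce this exact failure; it starts a throwaway local server (hypothetical helper names) that sends a 200 OK status line, promises 1000 bytes, sends 5, and closes the connection. The client therefore sees a 200 status and a failed transfer for the same request.

```python
import http.client
import socket
import threading
from typing import Tuple


def serve_truncated_response(listener: socket.socket) -> None:
    """Accept one connection, promise 1000 body bytes, send 5, then close."""
    connection, _ = listener.accept()
    with connection:
        connection.recv(65536)  # read (and ignore) the request
        connection.sendall(
            b"HTTP/1.1 200 OK\r\n"
            b"Content-Length: 1000\r\n"
            b"\r\n"
            b"hello"
        )
    # Leaving the with-block closes the socket before the body is complete.


def fetch_status_and_outcome() -> Tuple[str, str]:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_truncated_response, args=(listener,))
    server.start()
    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        response = client.getresponse()  # status line and headers arrive fine
        status = f"{response.status} {response.reason}"
        try:
            response.read()
            outcome = "body complete"
        except http.client.IncompleteRead as exc:
            outcome = f"aborted after {len(exc.partial)} of 1000 bytes"
        except ConnectionError as exc:
            outcome = f"aborted: {exc}"
        return status, outcome
    finally:
        client.close()
        server.join()
        listener.close()


print(fetch_status_and_outcome())
```

A checker that records the status line first and the transfer error second could plausibly end up showing both for one link.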

Post by Albert Wiersch »

ktp wrote: Wed Dec 23, 2020 7:04 am OK. I would suggest giving the user an easier choice, with an option to enable/disable rel="nofollow" handling without needing to know about user functions, scripts, etc. Likewise for rel="canonical": the user should have an easy choice for the behaviour on rel="canonical" (validate or not, report duplicate title/meta or not, etc.), and likewise for the behaviour with rel="nofollow". One tool I use for producing sitemaps has a single crawl filters options panel that is easy to understand. You just toggle whether to obey the "link" tag with rel="canonical" or the "a" tag with rel="nofollow".
Thank you for the suggestion. It should be possible to add some easy options that automatically add/use the necessary "user functions" behind-the-scenes. I'll make a note of this for a possible future update.