Bad links reported by Link checker though no error

For technical support and bug reports for all editions of CSS HTML Validator, including htmlval for Linux and Mac.
ktp
Rank III - Intermediate
Posts: 60
Joined: Sat Oct 29, 2016 10:34 am

Re: Bad links reported by Link checker though no error

Post by ktp »

Albert Wiersch wrote: Wed Dec 23, 2020 2:19 pm
ktp wrote: Wed Dec 23, 2020 6:53 am
Question: why does one get a 200 OK status together with socket error 10053? With a socket error, one should not even be able to establish a connection to get a 200 OK status, right?
I am not sure. This is what the component I am using is reporting. It's possible the connection is being established but it is being aborted before it can complete. I also wonder if it has something to do with the SSL/TLS connection/negotiation. Is your test server using the same level of security and not anything obsolete for the SSL/TLS/HTTPS connection?

I have asked a question regarding this issue in the forum for the component that is used. They should know a lot more about why this may be happening than me.
The test and production servers have the same configuration files (Apache + nginx) and the same SSL/TLS settings (ciphers, etc.). The difference is that the test server uses a self-generated certificate, while the production server has a certificate issued by a Certificate Authority. Most (if not all) browsers do not like self-generated (or bad or obsolete) certificates and issue a warning for such URLs. However, only some links (5000+) got the 10053 error, not all of them (300K+, if I remember the statistics shown during the run correctly), so server load is probably part of the problem rather than the protocol itself.

I don't know whether choosing a different component would make a difference. The sitemap generator tool I use offers many types of them.
Attachments
A1SG_crawler_engine_2020-12-23_213605.jpg
Albert Wiersch
Site Admin
Posts: 3785
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Post by Albert Wiersch »

ktp wrote: Wed Dec 23, 2020 6:53 am
Question: why does one get a 200 OK status together with socket error 10053? With a socket error, one should not even be able to establish a connection to get a 200 OK status, right?
I found this link that explains this error in a more understandable way:
https://www.chilkatsoft.com/p/p_299.asp

So I am assuming the connection was established and response received but the connection was terminated at some point after that for some reason.

1. I could make it so that error 10053 is ignored if there's a 200 status code. What do you think of this idea?

2. Oh, if you're using any type of firewall or proxy then can you try turning that off and see if that has any effect on this issue?
Albert Wiersch, CSS HTML Validator Developer • Download CSS HTML Validator FREE Trial

Post by ktp »

Albert Wiersch wrote: Wed Dec 23, 2020 6:34 pm
2. Oh, if you're using any type of firewall or proxy then can you try turning that off and see if that has any effect on this issue?
I ran a short test without any firewall; error 10053 still occurred.

Albert Wiersch wrote: Wed Dec 23, 2020 6:34 pm
1. I could make it so that error 10053 is ignored if there's a 200 status code. What do you think of this idea?
Completely ignoring 10053 is probably not advisable, since we would lose information (the presence of these cases and the number of occurrences).
My initial concern was to distinguish real bad links among these numerous 10053/200 OK links.
So maybe the best compromise is to list them separately in the report, e.g. marked "Information only". They would still be considered OK.
That way, "Bad links" would correctly report only links whose HTTP status is not OK, and we would not lose the information about 10053 errors (presence and number of occurrences).
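
The split proposed above could be sketched roughly as follows. This is only an illustration, assuming each checked link is available as a record with an HTTP status and an optional socket error code; the `LinkResult` type, its field names, and the bucket names "ok", "info", and "bad" are hypothetical and not part of CSS HTML Validator. It also assumes that only status 200 counts as OK and that any other socket error keeps the link in the bad bucket.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

CONNECTION_ABORTED = 10053  # the socket error code discussed in this thread


@dataclass
class LinkResult:
    # Hypothetical record for one checked link (names are assumptions).
    url: str
    http_status: Optional[int]   # None if no HTTP response was received
    socket_error: Optional[int]  # None if the socket reported no error


def bucket(result: LinkResult) -> str:
    """Return 'ok', 'info', or 'bad' for one checked link."""
    if result.http_status == 200:
        if result.socket_error is None:
            return "ok"
        if result.socket_error == CONNECTION_ABORTED:
            # Response was OK but the connection was aborted: information only.
            return "info"
    # Non-200 status, no response, or any other socket error.
    return "bad"


def build_report(results: Iterable[LinkResult]) -> Dict[str, List[str]]:
    """Group URLs by bucket so 10053/200 cases stay visible and countable."""
    report: Dict[str, List[str]] = {"ok": [], "info": [], "bad": []}
    for result in results:
        report[bucket(result)].append(result.url)
    return report
```

With this layout, `len(report["info"])` gives the number of 10053 occurrences, while `report["bad"]` contains only links that really failed.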

Post by Albert Wiersch »

ktp wrote: Wed Dec 23, 2020 7:04 am
OK, I would suggest giving the user an easier choice: an option to enable/disable rel="nofollow", without requiring any knowledge of user functions, scripts, etc. The same goes for rel="canonical": the user should have an easy choice for the behaviour on rel="canonical" (validate or not, report duplicate title/meta or not, etc.). One tool I use for producing sitemaps has a single crawl filter options panel that is easy to understand: just toggle whether to obey the "link" tag with rel="canonical" or the "a" tag with rel="nofollow".
I've just added a new "Advanced" option in the Batch Wizard that stops the Batch Wizard from crawling and validating links with rel="nofollow". See attached screenshot. This will be in v21.0002 and above.
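
For readers wondering what such an option amounts to, here is a minimal sketch of skipping rel="nofollow" links while collecting links from a page. It uses Python's standard `html.parser` module and is not the Batch Wizard's actual implementation; the `LinkCollector` class and the `skip_nofollow` parameter are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, optionally skipping rel="nofollow"."""

    def __init__(self, skip_nofollow: bool = True) -> None:
        super().__init__()
        self.skip_nofollow = skip_nofollow
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a space-separated token list; compare case-insensitively.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if self.skip_nofollow and "nofollow" in rel_tokens:
            return
        self.links.append(href)


def collect_links(html: str, skip_nofollow: bool = True) -> List[str]:
    parser = LinkCollector(skip_nofollow)
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling `collect_links(page_html)` returns only the links a crawler would follow when the option is on; passing `skip_nofollow=False` returns all of them.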

As for rel="canonical", I think it makes sense to exclude (by default) non-canonical targets from the duplicate page title and duplicate meta description reports so I'm not sure an option for this is warranted (at least at this time).
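
As an illustration of that default, a duplicate page title report that leaves out non-canonical targets might look roughly like the sketch below. It assumes a page counts as non-canonical when its rel="canonical" link points to a URL other than its own, and it compares URLs as plain strings without normalization. The `duplicate_titles` function and its input layout are hypothetical, not taken from CSS HTML Validator.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical input row: (page URL, page title, canonical URL or None)
Page = Tuple[str, str, Optional[str]]


def duplicate_titles(pages: Iterable[Page]) -> Dict[str, List[str]]:
    """Map each title used by more than one canonical page to those pages."""
    by_title: Dict[str, List[str]] = defaultdict(list)
    for url, title, canonical in pages:
        # Skip pages that declare a different URL as their canonical version.
        if canonical is not None and canonical != url:
            continue
        by_title[title].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}
```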
Attachments
DontCrawlNoFollow.png

Post by Albert Wiersch »

ktp wrote: Wed Dec 23, 2020 11:08 pm
I ran a short test without any firewall; error 10053 still occurred.
OK, thanks for checking.
ktp wrote: Wed Dec 23, 2020 11:08 pm
Completely ignoring 10053 is probably not advisable, since we would lose information (the presence of these cases and the number of occurrences).
I agree. I was only considering ignoring 10053 errors in specific cases.
ktp wrote: Wed Dec 23, 2020 11:08 pm
My initial concern was to distinguish real bad links among these numerous 10053/200 OK links.
So maybe the best compromise is to list them separately in the report, e.g. marked "Information only". They would still be considered OK.
That way, "Bad links" would correctly report only links whose HTTP status is not OK, and we would not lose the information about 10053 errors (presence and number of occurrences).
I could make it so that the 10053 error is treated as good only if the status code is 200 while still adding a link comment mentioning the 10053 error... but this comment wouldn't normally be seen because "good" links are usually excluded from the link reports (otherwise they would be too big). But the link comment would be more likely to be seen in the editor in the Results Window in the 'Links' tab.

Post by Albert Wiersch »

Regarding the 10053 error again, when the status code is 200... I'm going to ignore this error in the link checker when the status code is 200. I think the most likely cause is the server responding to the request and then immediately terminating the connection, so if the server responded that the link is OK, then it's best to treat it as such even if there is an issue with the socket connection. I will note the 10053 error in the link comment.
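
In rough terms, the rule described here could be written as the sketch below. It is an illustration only, not the link checker's actual code: the `classify_link` function, its return shape, and the comment wording are hypothetical, and it assumes that only status 200 counts as good and that any other socket error still marks the link as bad.

```python
from typing import Optional, Tuple

CONNECTION_ABORTED = 10053  # socket error code discussed in this thread


def classify_link(http_status: Optional[int],
                  socket_error: Optional[int]) -> Tuple[bool, str]:
    """Return (is_good, link_comment) for one checked link."""
    if socket_error is None:
        # No socket problem: the status code alone decides.
        return http_status == 200, ""
    if socket_error == CONNECTION_ABORTED and http_status == 200:
        # The server answered 200 before the connection was aborted:
        # treat the link as good but record the error in the comment.
        return True, "socket error 10053 occurred after a 200 response (ignored)"
    return False, f"socket error {socket_error}"
```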