False error 404 reported ?
Hello,
I am using version v25.0001 and have run into the following problem several times.
Here is an example:
I start CSS HTML Validator and run the Batch Wizard with the target: https://www.example.com/subdir/
Inside /subdir, there is https://www.example.com/subdir/menu.html
which includes https://www.example.com/subdir/page.html
A few days later, I change /subdir/menu.html so that it now includes https://www.example.com/subdir_new/page.html
(/subdir_new/page.html was moved from /subdir/page.html on my server), then run the same Batch Wizard, which was started a few days earlier.
It reports that https://www.example.com/subdir_new/page.html has error 404!
If I click on the link flagged as error 404 in the report, it is of course valid (status 200).
I looked at my Web server (Apache2) log, and all the requests (HEAD, GET) coming from "CSS HTML Validator Professional (https://www.htmlvalidator.com/)"
during the Batch Wizard run timeframe have status 200 (OK). There is no status 404, and certainly none for https://www.example.com/subdir_new/page.html. Also, of course, there is no trace of a request for https://www.example.com/subdir/page.html, which is good.
So it seems to me that CSS HTML Validator somehow gets confused by the identical html file name, and that the real status returned by the Web server is not taken into account.
The only workaround I found was to restart CSS HTML Validator and run the project again, without changing anything of course.
This time no more false error 404 links were reported!
Note: by contrast, when a true error 404 link is reported by the Batch Wizard, as soon as I fix the problem and run the Batch Wizard again (without needing to restart it), no problem is found and everything is consistent.
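For anyone who wants to repeat the server-log check described above, here is a minimal Python sketch. It assumes the Apache "combined" log format; the three sample log lines are made up for illustration (only the User-Agent string comes from the post), and `statuses_for_agent` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, i.e.
# ... "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def statuses_for_agent(lines, agent_substring):
    """Count (path, status) pairs for requests whose User-Agent contains agent_substring."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and agent_substring in match.group("agent"):
            counts[(match.group("path"), int(match.group("status")))] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [09/Jan/2025:14:02:11 +0100] "HEAD /subdir_new/page.html HTTP/1.1" 200 - "-" '
    '"CSS HTML Validator Professional (https://www.htmlvalidator.com/)"',
    '203.0.113.5 - - [09/Jan/2025:14:02:12 +0100] "GET /subdir/menu.html HTTP/1.1" 200 5120 "-" '
    '"CSS HTML Validator Professional (https://www.htmlvalidator.com/)"',
    '198.51.100.7 - - [09/Jan/2025:14:03:00 +0100] "GET /missing.html HTTP/1.1" 404 196 "-" "Mozilla/5.0"',
]

counts = statuses_for_agent(sample, "CSS HTML Validator")
# Keep only the validator's requests that did not return 200.
non_200 = {key: n for key, n in counts.items() if key[1] != 200}
print(counts)
print(non_200)
```

If `non_200` is empty for the Batch Wizard timeframe, the server never answered the validator with a 404, which points at the client side (for example a cached result) rather than the server.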
- Albert Wiersch
- Site Admin
- Posts: 3834
- Joined: Sat Dec 11, 2004 9:23 am
- Location: Near Dallas, TX
- Contact:
Re: False error 404 reported ?
Do you mean you are getting false good links when you should be getting 404's when you move the location of page?
This would be because of the link cache. It prevents the program from checking the same link over and over again, which is usually (but not always) just a waste of time and resources.
You can reset the link cache to fix this or just restart the program. The fact that you say restarting the program fixes this makes me think it's a link cache issue.
You can use a TNPL user function to always reset the link cache before a Batch Wizard job as well:
Code: Select all
function onBeforeMainStart() {
  ProgressMessage('Attempting to reset the link cache...',0x1000);
  if resetLinkCache(30) {
    ProgressMessage('The link cache was reset successfully.',0x1000);
  }
  else {
    ProgressMessage('The link cache could not be reset.',2);
  }
}
Basically you just put the above function in a text file and then specify that text file as a user functions file in the 'Target List Options' tab of the Batch Wizard for the target list (or lists if more than one) that you want to use the user function(s) with.
For more info please see:
https://www.htmlvalidator.com/current/d ... kcache.htm
If this doesn't solve the problem then are you able to create some simple files and a Batch Wizard target list that I can use to try to reproduce the problem? Perhaps something on your own server? Or it can use local files. Whatever is easier and whatever reproduces the problem.
Albert Wiersch, CSS HTML Validator Developer • Download CSS HTML Validator FREE Trial
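To illustrate why a cached result can outlive a fix on the server, here is a toy model in Python. It is only a sketch of the general idea of a link cache, not how CSS HTML Validator actually implements it; `LinkCache`, `fetch_status`, and the `server` dictionary are hypothetical names invented for this example.

```python
class LinkCache:
    """Toy link cache: remembers the first status seen for each URL."""

    def __init__(self, fetch_status):
        self._fetch_status = fetch_status  # callable: url -> HTTP status code
        self._statuses = {}

    def check(self, url):
        # Only contact the "server" the first time a URL is seen.
        if url not in self._statuses:
            self._statuses[url] = self._fetch_status(url)
        return self._statuses[url]

    def reset(self):
        self._statuses.clear()

    def __len__(self):
        return len(self._statuses)


# Hypothetical server state: the page does not exist yet.
url = "https://www.example.com/subdir_new/page.html"
server = {url: 404}
cache = LinkCache(lambda u: server.get(u, 404))

first = cache.check(url)   # 404, fetched from the "server"
server[url] = 200          # the page is published
stale = cache.check(url)   # still 404, served from the cache
cache.reset()
fresh = cache.check(url)   # 200, fetched again after the reset
print(first, stale, fresh)
```

In this model, restarting the program corresponds to starting with an empty cache, which is why both a restart and a reset make the stale 404 go away.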
Re: False error 404 reported ?
> Do you mean you are getting false good links when you should be getting 404's when you move the location of page?
No, it's the opposite. I got bad link (error 404) reports, while in fact the page had been moved correctly.
Note: I just found that in Validator Engine Options, Link Checking section, there is a "Reset Link Cache" button. Links in cache: 366. After clicking the button, Links in cache: 0.
Your suggestion of a TNPL user function reminded me that I created two txt files in 2020:
- userfunctions.txt
- wiz_userfunctions.txt
Content of wiz_userfunctions.txt:
Code: Select all
// to use in 'Target List Options' tab in the Batch Wizard
// support for rel="nofollow"
function onTargetCanAdd() {
  if $otca_flags&1 {
    $otca_add=false;
  }
}
Content of userfunctions.txt:
Code: Select all
// Then specify the file as a 'user functions' file in the 'Config File' page of the Validator Engine Options (Ctrl+F4). Press the 'Reload Config' button and that's it!
// support for rel="canonical"
function onStartTag_link() {
  if getAttValueEx('rel',12)=='canonical' {
    if isBatchWizardJob {
      $this_href=convertStringEx(7,'#');
      $can_href=convertStringEx(7,getAttValueEx('href',12));
      if !matchCase($this_href,$can_href) {
        //Message(1,MSG_WARNING,'Canonical! this: '+$this_href+', canonical: '+$can_href);
        abortValidation();
        $_BatchWizard.report_dup_page_title=0;
        $_BatchWizard.report_dup_meta_desc=0;
      }
    }
  }
}
Questions:
1) I assume I just have to add your code snippet that resets the link cache to wiz_userfunctions.txt?
2) I no longer remember why my two user functions from 2020 are required for "canonical" and "nofollow" links.
What are their purposes? Are they still needed with version v25.0001?
- Albert Wiersch
Re: False error 404 reported ?
ktp wrote: Thu Jan 09, 2025 2:44 pm
> Questions:
> 1) I assume I just have to add your code snippet that resets the link cache to wiz_userfunctions.txt?
> 2) I no longer remember why my two user functions from 2020 are required for "canonical" and "nofollow" links.
> What are their purposes? Are they still needed with version v25.0001?
1) Yes, you can add the code snippet to reset the link cache to your current Batch Wizard user functions file.
2a) For onTargetCanAdd(), your function appears to cause links with rel="nofollow" to not be crawled/followed for validation. You should be able to remove this function and instead check the 'Don't crawl and validate "a" links with rel="nofollow"' option in the 'Advanced' tab of the Batch Wizard, but it's up to you. You can keep using the function if you want as either method should do the same thing.
2b) The onStartTag_link() function looks like it's excluding certain (non-canonical) targets from the 'Duplicate Page Titles' and 'Duplicate Meta Descriptions' report as well as aborting the validation. Are you only wanting to validate canonical URLs? If you do then you still need that function.
As for the false 404 error, I will try to make a Batch Wizard target list that reproduces this problem. If I can reproduce it then I should be able to figure out what's going on and fix any bugs.
Re: False error 404 reported ?
Thank you for your explanations and support.
I just realized that:
- Bad links appear in the Batch Wizard report only if link checking is enabled.
- A bad link is not equivalent to an error 404 (I just checked by renaming an HTML file: no bad link is reported, and the error 404 is shown only in the "Important Messages/Errors From Job Run" section).
In the case I reported, the error 404 appeared as a red 404 rectangle, and when I clicked on the associated link, it worked; hence the false report.
By the way, thanks to the forum history, I found our old discussions that also relate to false reports and the cache:
Bad links reported by Link checker though no error:
viewtopic.php?t=3160
About link cache and restart Batch Wizard:
viewtopic.php?p=12457#p12457
About "nofollow" and "canonical" link type:
viewtopic.php?p=12466#p12466
- Albert Wiersch
Re: False error 404 reported ?
Thank you for the additional information and links.
So that I fully understand the false 404's happening in the current version, can you send screenshots of exactly what you are seeing (with all the provided details for the false 404's)?
Note that the red rectangle/highlight doesn't necessarily mean that the problem/error is a '404' (not found URL). It could be another issue.
Re: False error 404 reported ?
OK, with your explanation, things are clear and consistent now. There is probably a problem, but the red 404 rectangle is somewhat misleading, since there is no real 404 error: clicking on the link flagged as 404 works (status 200), and the Web server log shows no error 404 for the Batch Wizard's requests for this link.
Unfortunately I no longer have the corresponding Batch Wizard data, since I have run it several times since then.
But I will keep an eye out in case the problem occurs again.
In the meantime I will keep CSS HTML Validator/Batch Wizard running without restarting it (as usual), and will not use a user function that resets the link cache.
- Albert Wiersch
Re: False error 404 reported ?
Sounds good. Please grab a good screenshot if the problem happens again.
Re: False error 404 reported ?
I ran into the problem again today.
I took some screen captures and zipped/backed up the PortableUserData\ directory just in case.
Here is what I did when the problem occurred.
I have a virtual machine with the test root target https://vmdebian.example.com .
When I modify my html files (changes, moves, deletions...), I always check everything
manually, then run a non-regression test on my virtual machine with the Batch Wizard to validate my changes.
When everything is OK, I synchronize the changes to my production server, for example https://www.myrealserver.com .
Note that inside my html files, the "canonical" links and the links in the json-ld data (type BreadcrumbList) must
carry the real server prefix https://www.myrealserver.com ; they cannot be in relative form.
So this time I added some new links (new html files) and checked on my virtual machine with the Batch Wizard.
It pointed out 20 bad links (404 errors). These bad links come from the "canonical" links and the json-ld links
mentioned above: they are real 404 errors, since I had not yet synchronized my changes to the real site/server.
So, after that check, I synchronized my changes to the real server/site, and all 20 bad links
should have been gone. But the problem is that (probably due to the link cache of CSS HTML Validator)
they are still there! Yet if I click on them now, they all work: status 200! The situation does
not change even if I run the Batch Wizard again and again, unless I reset the link cache (assuming that, as a user, I even know
it is a link cache problem) or restart CSS HTML Validator + Batch Wizard (which I did previously as a workaround).
Hence the "false 404 error reported" problem.
Note: after a few hours, without restarting the Batch Wizard or clearing the link cache, I ran the Batch Wizard again, and this time, surprisingly, everything was OK: no more false error 404. So maybe the cache also has a timeout (although it still displays 525621 links)?
- Attachments
- htmlval_false_404_errors_2_2025-01-13.jpg (78.54 KiB)
- htmlval_false_404_errors_1_2025-01-13.jpg (113.22 KiB)
- htmlval_false_404_errors_0_cache_2025-01-13.jpg (267.13 KiB)
- Albert Wiersch
Re: False error 404 reported ?
Thank you for the detailed information and screenshots.
Please give me a few days to review the link checking code and get back to you.
In the meantime, are you sure there is no cache system on any of your servers that might also be causing or contributing to this?
- Albert Wiersch
Re: False error 404 reported ?
Also, in the Link Checking options page of the Validator Engine Options, do you have the 'Recheck bad links at the next link check' option checked? It should be checked by default.
If this option is checked and it's working right, then it should try rechecking those bad links without the need to reset the link cache. Perhaps it is but for some reason you are experiencing an unexpected delay (you said it works after a few hours).
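As a rough illustration of what a "recheck bad links" option and a cache timeout could each do, here is a small Python model. It is a sketch under assumptions, not the program's actual logic: `RecheckingLinkCache`, `recheck_bad`, and `max_age_seconds` are hypothetical names, "bad" is assumed to mean a status of 400 or above, and the cache timeout is only the possibility raised earlier in this thread.

```python
import time


class RecheckingLinkCache:
    """Toy cache that can re-fetch bad links and/or expire old entries."""

    def __init__(self, fetch_status, recheck_bad=True, max_age_seconds=None,
                 clock=time.monotonic):
        self._fetch_status = fetch_status  # callable: url -> HTTP status code
        self._recheck_bad = recheck_bad
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}  # url -> (status, time of last check)

    def _is_stale(self, entry):
        status, checked_at = entry
        if self._recheck_bad and status >= 400:  # assumption: "bad" means >= 400
            return True
        if self._max_age is not None and self._clock() - checked_at > self._max_age:
            return True
        return False

    def check(self, url):
        entry = self._entries.get(url)
        if entry is not None and not self._is_stale(entry):
            return entry[0]
        status = self._fetch_status(url)
        self._entries[url] = (status, self._clock())
        return status


url = "https://www.myrealserver.com/new.html"  # hypothetical page
server = {url: 404}

# Case 1: bad links are rechecked, so the fix is seen on the next run.
rechecking = RecheckingLinkCache(lambda u: server.get(u, 404), recheck_bad=True)
before_fix = rechecking.check(url)
server[url] = 200
after_fix = rechecking.check(url)

# Case 2: no recheck, but entries expire after one hour (fake clock for the demo).
server[url] = 404
now = [0.0]
expiring = RecheckingLinkCache(lambda u: server.get(u, 404), recheck_bad=False,
                               max_age_seconds=3600, clock=lambda: now[0])
cached_bad = expiring.check(url)
server[url] = 200
minutes_later = expiring.check(url)  # entry is still young: cached 404
now[0] += 2 * 3600
hours_later = expiring.check(url)    # entry expired: fetched again
print(before_fix, after_fix, cached_bad, minutes_later, hours_later)
```

The second case matches the reported symptom (wrong within minutes, right after a few hours), but only as a model; it does not show that the program works this way.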
Re: False error 404 reported ?
> Also, in the Link Checking options page of the Validator Engine Options, do you have the 'Recheck bad links at the next link check' option checked? It should be checked by default.
If you look at my screenshot above showing the number of links in the cache (525621), you can see that this option is checked (it is the default).
> If this option is checked and it's working right, then it should try rechecking those bad links without the need to reset the link cache. Perhaps it is but for some reason you are experiencing an unexpected delay (you said it works after a few hours).
When the Batch Wizard finished, I immediately looked at its report and tried the links flagged as 404. At first it was correct (real 404 errors), so I synchronized to the real server, which fixed the problem, then reran the Batch Wizard (a few minutes later). The 404 errors were still there, so I tried running the Batch Wizard several more times, with the same false 404 errors (all within a few minutes). Afterwards I took a nap, and when I woke up I tried again and the problem had disappeared.
Last edited by ktp on Mon Jan 13, 2025 11:34 am, edited 1 time in total.
Re: False error 404 reported ?
I considered the possibility of a user error, namely mistakenly looking at the same old html report from the Batch Wizard, in my case:
file:///D:/private/programs/Portable/PortableCSSHtmlValidator/PortableUserData/batchreport.html
But that is not possible: even if I left old, previous reports open (which is not the case), the latest
report from a Batch Wizard run always pops up in the foreground in a new tab of my Firefox browser
while I am watching the Batch Wizard processing window. And across my many Batch Wizard reruns, I always saw the 404 errors.
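One way to rule out a stale report more firmly is to compare the report file's modification time with the moment the run was started. The following Python sketch shows the idea on a temporary file; `report_is_fresh` is a hypothetical helper, and it assumes the report file is rewritten on every run.

```python
import os
import tempfile
import time


def report_is_fresh(report_path, run_started_at):
    """Return True if the report was modified at or after run_started_at (epoch seconds)."""
    return os.path.getmtime(report_path) >= run_started_at


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for .../PortableUserData/batchreport.html
    report = os.path.join(tmp, "batchreport.html")
    with open(report, "w", encoding="utf-8") as handle:
        handle.write("<html></html>")

    # Pretend the report on disk is one hour old and the run started a minute ago.
    one_hour_ago = time.time() - 3600
    os.utime(report, (one_hour_ago, one_hour_ago))
    run_started_at = time.time() - 60
    stale_result = report_is_fresh(report, run_started_at)

    # Simulate the run rewriting the report just now.
    os.utime(report, None)
    fresh_result = report_is_fresh(report, run_started_at)

print(stale_result, fresh_result)
```

Pointing the function at the real batchreport.html path after a run would show whether the file in the browser tab can be older than the run itself.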
- Albert Wiersch
Re: False error 404 reported ?
Yes, I see that. Sorry I missed it.
Yes, that's a good point... could you be seeing an old report? Perhaps cached?
ktp wrote: Mon Jan 13, 2025 11:01 am
> But that is not possible: even if I left old, previous reports open (which is not the case), the latest
> report from a Batch Wizard run always pops up in the foreground in a new tab of my Firefox browser
> while I am watching the Batch Wizard processing window. And across my many Batch Wizard reruns, I always saw the 404 errors.
OK.
I did look at the code a little so far, and I made a couple of tweaks but I am not sure if the tweaks will help with this issue.
1) Would you like to try a test build to see if there is any difference?
2) If you reset the link cache or restart the program, the false 404 issue is always resolved?
Re: False error 404 reported ?
Albert Wiersch wrote: Mon Jan 13, 2025 11:36 am
> 2) If you reset the link cache or restart the program, the false 404 issue is always resolved?
Yes, I always restarted the program as a workaround (back when I did not know it was related to the link cache). This is why I was interested in the user function that clears the link cache at the start of the Batch Wizard. Note: I found out that closing the Batch Wizard window does not clear the cache. I need to exit the CSS HTML Validator program completely, restart it, then open the Batch Wizard again from the menu Tools > Batch Wizard.
Albert Wiersch wrote: Mon Jan 13, 2025 11:36 am
> 1) Would you like to try a test build to see if there is any difference?
I would prefer to first find a test case with which I can reproduce the problem 100% of the time. That way the cause would be thoroughly understood, and the solution would be completely satisfactory. So I suggest that I keep an eye on this rare problem, which is somehow related to the link cache mechanism.