Batch Wizard - false validation errors

Sares
Rank II - Novice
Posts: 21
Joined: Tue Jun 19, 2007 6:51 am

Post by Sares » Thu May 15, 2008 6:43 am

Hi,

I am using the latest version 9 beta and the Batch Wizard to automatically validate a web project with hundreds of files using the URL and follow-links method. This works really well, but I always get false positives in the validation report.

If I validate the same file within HTML Validator itself, the page is valid.

The report says:

-----------------------------------

2:2 Error The "html" tag must have an end tag (</html>) but the end tag was not found. In XHTML, all elements not declared as empty (like "b", "p", "span", etc.) must have an end tag. Elements declared as empty (like "br", "img", "input", etc.) can have an end tag or can use the empty element shorthand.
<html xmlns="http://www.w3.org/1999/xhtml">

60 Error Reached the end of the document before the ending '>', '-->', '?>', '##>', or '%>'.
</html

Comment CSE HTML Validator Professional 9.00 BETA 1 expires on October 1, 2008, 138.46 days from now. Note that BETA versions always expire, even if registered.
Comment "Errors only" mode enabled. Not a thorough or complete validation.

-----------------------------------

I also found some sections where the report shows the following:


-----------------------------------


78:8 Error Found text after the </html> end tag. Text should not appear here.
</html>s


Comment CSE HTML Validator Professional 9.00 BETA 1 expires on October 1, 2008, 138.46 days from now. Note that BETA versions always expire, even if registered.
Comment "Errors only" mode enabled. Not a thorough or complete validation.

-----------------------------------

But, as already mentioned, there is no text after the closing </html> tag. Maybe the Batch Wizard has problems following dynamic pages (PHP in this case) where the links to follow contain query parameters (for example index.php?id=3, or something like this: http://localhost/index.php?dir=EV%20MO).

But it also happens on pages without any dynamic links to follow.
Possibly important additional information: all files are XHTML 1.0 Transitional and UTF-8 encoded (both the declared content encoding and the file encoding), in case this is useful.

Any help would be really appreciated, as it's a bit annoying to double-check the results and wade through hundreds of false validation messages.

Thank you very much and kind regards
Sares

Albert Wiersch
Site Admin
Posts: 3453
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Post by Albert Wiersch » Thu May 15, 2008 9:17 am

Hello,

I'm thinking that PHP is sending back (for some reason) different pages than what you think it is sending back and CSE HTML Validator is validating that.

Is there any way you could provide a sample target list that I could run to reproduce the problem? If I can reproduce the issue, then I can find out what's happening. The simpler the better as long as the issue is reproduced.

Also, did you have this same issue with v8 or another version other than v9 BETA?
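
One way to check what the server actually sends for a followed link, independent of any validator, is to fetch the page and look at how the raw response ends. Below is a minimal Python sketch of that check. The helper names `fetch` and `tail_report` and the URL in the usage comment are made up for illustration; none of this is part of CSE HTML Validator.

```python
import urllib.request


def fetch(url: str) -> bytes:
    """Return the raw response body exactly as the server sent it."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def tail_report(raw: bytes, width: int = 20) -> dict:
    """Summarize how a document ends, ignoring trailing whitespace."""
    stripped = raw.rstrip()
    return {
        # Total number of bytes received, including trailing whitespace.
        "length": len(raw),
        # True only if the last non-whitespace bytes are a complete </html>.
        "ends_with_close_tag": stripped.lower().endswith(b"</html>"),
        # The last few bytes, for a quick visual check.
        "tail": stripped[-width:],
    }


# Usage (placeholder URL, substitute a page the Batch Wizard reported):
# print(tail_report(fetch("http://localhost/index.php?id=3")))
```

If `ends_with_close_tag` is true for a page that the report flags, the server output is probably fine and the problem is more likely on the validating side.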
Albert Wiersch

Post by Sares » Thu May 15, 2008 10:00 am

Hello Albert,

I've run another validation using the latest 8.0.4 version and the errors disappeared (except for errors caused by unencoded & characters; I was not able to find out how to disable this check, can you give me a hint, please?).

There is no error about text after </html> and no 'The "html" tag must have an end tag' message. I validated the same project using almost the same settings, with the target list URL and the follow-links method.

So something is different between these two versions of the Batch Wizard. Unfortunately, I cannot provide you with a target list, because this app is on a local intranet and it's hard to extract parts of it.

But I can run additional tests if you tell me what I should do.

Thank you very much for your help and for your time!
Sares

Post by Albert Wiersch » Thu May 15, 2008 10:06 am

Sares wrote: Hello Albert,

I've run another validation using the latest 8.0.4 version and the errors disappeared (except for errors caused by unencoded & characters; I was not able to find out how to disable this check, can you give me a hint, please?).
Hello,

You may be able to disable that message from the editor: check a document that generates the message, and when the messages are displayed in the Results Window, right-click the one you don't want and disable it. If that doesn't work, then please copy and paste the exact text of the message so I know exactly which message or check you want to disable.
Sares wrote: There is no error about text after </html> and no 'The "html" tag must have an end tag' message. I validated the same project using almost the same settings, with the target list URL and the follow-links method.

So something is different between these two versions of the Batch Wizard. Unfortunately, I cannot provide you with a target list, because this app is on a local intranet and it's hard to extract parts of it.
Then can you send me the report page from the Batch Wizard that has these false errors? Please send it to support at htmlvalidator dot com. That may help (but may not be enough). I'll let you know.

Post by Sares » Thu May 15, 2008 10:21 am

Hi,

The error message from the Batch Wizard in version 8 is the following:

--

Found an ampersand '&' but could not find a following semicolon for a properly formed character reference. If you want to actually display an ampersand, such as in "Recreation & Sports", then use the character entity reference "&amp;" as in "Recreation &amp; Sports" (see comment).

--

Is there any chance to disable it in the settings somehow? I will also try the method you've mentioned, but I would prefer to know what to change in the configuration.


Albert Wiersch wrote: Then can you send me the report page from the Batch Wizard that has these false errors? Please send to support at htmlvalidator dot com. That may help (but may not be enough). I'll let you know.
Hmm, do you really need the complete report? It's a really huge report and there are no errors other than the two I described, but these two repeat over and over again (errors on more than 300 pages) ...

I ask because, for privacy and security reasons, I would have to anonymize the report and the URLs in it, and that would be crazy :shock:

Sares

Post by Albert Wiersch » Thu May 15, 2008 10:30 am

Sares wrote: Hi,

The error message from the Batch Wizard in version 8 is the following:

--

Found an ampersand '&' but could not find a following semicolon for a properly formed character reference. If you want to actually display an ampersand, such as in "Recreation & Sports", then use the character entity reference "&amp;" as in "Recreation &amp; Sports" (see comment).
It looks like you are using '&' outside of a tag. In that case, the message cannot be disabled. You should use '&amp;' instead.
Sares wrote: Hmm, do you really need the complete report? It's a really huge report and there are no errors other than the two I described, but these two repeat over and over again (errors on more than 300 pages) ...
I don't need the complete report, just a piece of it or a report page that shows the validator messages and any document source related to those messages.
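
For output that is generated dynamically, the ampersand is usually escaped at the point where text is written into the page. Here is a minimal Python sketch of the idea using the standard library's `html.escape`. The site in question is PHP, where `htmlspecialchars` plays the same role; the sample label and query string below are invented for illustration.

```python
from html import escape

# Text content: a bare '&' must be written as '&amp;'.
label = "Recreation & Sports"
escaped_label = escape(label)

# URLs in attributes: the '&' between query parameters is escaped as well.
url = "index.php?id=3&lang=en"  # hypothetical query string
link = '<a href="{}">{}</a>'.format(escape(url, quote=True), escaped_label)

print(escaped_label)
print(link)
```

The browser decodes `&amp;` back to `&` before following the link, so the query string that reaches the server is unchanged.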

Post by Sares » Thu May 15, 2008 10:42 am

Hi,
Albert Wiersch wrote: It looks like you are using '&' outside of a tag. In that case, the message cannot be disabled. You should use '&amp;' instead.
Yes, you are right; & is used a lot in query strings and sometimes in the normally generated HTML code. Changing this would be a real challenge, because a lot of it comes out of databases or is generated dynamically.

But OK, if it's not possible to disable the check for proper & encoding, it's not a big deal, but it would be nice if ...
Albert Wiersch wrote: I don't need the complete report, just a piece of it or a report page that shows the validator messages and any document source related to those messages.
OK, I will check some pages and send you the required information.

Thank you!
Sares

Post by Albert Wiersch » Thu May 15, 2008 11:06 am

Sares wrote: Yes, you are right; & is used a lot in query strings and sometimes in the normally generated HTML code. Changing this would be a real challenge, because a lot of it comes out of databases or is generated dynamically.
Ooops... there are so many options in CSE HTML Validator that I can't remember them all. :D

Try disabling the "Validate references" option in the "Validator 2" tab of the Validator Engine Options.

Post by Sares » Thu May 15, 2008 12:19 pm

Hi,
Albert Wiersch wrote: Ooops... there are so many options in CSE HTML Validator that I can't remember them all. Try disabling the Validate references option in the Validator 2 tab of the Validator Engine Options.
I understand exactly what you mean; HTML Validator is quite rich in features and configuration options :D

OK, I have disabled this option, and in version 8.04 the batch validation ended with 0 errors. Beforehand, I reset to the default configuration and disabled only a few things, such as the & check.

Afterwards I did the same with the version 9 beta. It ended with more than 350 documents with errors, containing roughly 820 errors in total :shock:

Here is the complete error message for one page:

Code: Select all

2:2 Error The "html" tag must have an end tag (</html>) but the end tag was not found. In XHTML, all elements not declared as empty (like "b", "p", "span", etc.) must have an end tag. Elements declared as empty (like "br", "img", "input", etc.) can have an end tag or can use the empty element shorthand.
<html xmlns="http://www.w3.org/1999/xhtml">
 
 
68 Error Reached the end of the document before the ending '>', '-->', '?>', '##>', or '%>'.
</html
 
 
  Comment CSE HTML Validator Professional 9.00 BETA 1 expires on October 1, 2008, 138.22 days from now. Note that BETA versions always expire, even if registered. 
  Comment "Errors only" mode enabled. Not a thorough or complete validation. 
And here is the source code for this file:

Code: Select all

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Client Login</title>
    <link rel="stylesheet" media="screen" href="../../css/styles.css" />
    <script type="text/javascript" language="JavaScript" src="../../js/functions.js"></script>    
    <script language="JavaScript" type="text/javascript">
        function setfocus() { document.login.username.focus(); }
    </script>
</head>

<body onload="setfocus()">
<div id="DMBRI" style="position:absolute;">
<img src="../../dhtml/images/dmb_i.gif" name="DMBImgFiles" width="1" height="1" border="0" alt="" />
<img src="../../dhtml/dmb_m.gif" name="DMBJSCode" width="1" height="1" border="0" alt="" />
</div>
<script language="JavaScript" type="text/javascript" src="../../js/dhtml.js"></script>
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tr>
    <td colspan="3" bgcolor="#CEEFFF">        
        <img src="../../images/spacer.gif" width="16" height="1" alt="" />
        <a href="javascript:;" onmouseover="cFrame.ShowMenu('IR',19, 19, false,'pic1|3',0);" onmouseout="cFrame.tHideAll();"><img src="../../images/home.png" width="64" height="18" alt="" border="0" name="pic1" /></a><a onmouseover="cFrame.ShowMenu('Business', 83, 19, false,'pic2|3',0);" onmouseout="cFrame.tHideAll();" href="javascript:;"><img src="../../images/business.png" width="66" height="18" alt="" border="0" name="pic2" /></a><a onmouseover="cFrame.ShowMenu('Employee', 149, 19, false,'pic3|3',0);" onmouseout="cFrame.tHideAll();" href="javascript:;"><img src="../../images/employee.png" width="111" height="18" alt="" border="0" name="pic3" /></a><a onmouseover="cFrame.ShowMenu('Service', 260, 19, false,'pic4|3',0);" onmouseout="cFrame.tHideAll();" href="javascript:;"><img src="../../images/service.png" width="57" height="18" alt="" border="0" name="pic4" /></a><a onmouseover="cFrame.ShowMenu('HumanRessources', 317, 19, false,'pic5|3',0);" onmouseout="cFrame.tHideAll();" href="javascript:;"><img src="../../images/human-resources.png" width="110" height="18" alt="" border="0" name="pic5" /></a><a onmouseover="cFrame.ShowMenu('Departments', 427, 19, false,'pic6|3',0);" onmouseout="cFrame.tHideAll();" href="javascript:;"><img src="../../images/departments.png" width="80" height="18" alt="" border="0" name="pic6" /></a>
    </td>
</tr>
<tr>
    <td colspan="3" bgcolor="#CEEFFF"><img src="../../images/spacer.gif" width="1" height="35" border="0" alt="" /></td>
</tr>
<tr bgcolor="#CEEFFF">
    <td><img src="../../images/spacer.gif" width="20" height="10" border="0" alt="" /></td>
    <td>&nbsp;</td>
    <td valign="bottom" align="right"><a href="../../home.php"><img src="../../images/small.png" width="311" height="55" alt="" border="0" /></a><img src="../../images/spacer.gif" width="10" height="1" border="0" alt="" /></td>
</tr>
<tr>
    <td colspan="3"><img src="../../images/spacer.gif" width="100" height="1" border="0" alt="" /></td>
</tr>
<tr>
    <td colspan="3" bgcolor="#00457B" align="right"><a href="javascript: feedbackWin()"><img src="../../images/feedback.png" width="115" height="15" alt="Let us know, what you think" border="0" /><img src="../../images/spacer.gif" width="55" height="1" border="0" alt="" /></a></td>
</tr>
</table>
<div class="contentwrapper">

<h3>ADMINISTRATION</h3>
<div class="content">
    <p>To use the areas of administration, you must have a proper username and password.</p> 
    <p>You will get permissons exclusively from the admin team.</p>

    <form name="login" target="content" action="./login.php?goto=1" method="post">
    <table width="400" border="0" cellpadding="0" cellspacing="0">
    <tr> 
        <td>Username:</td>
        <td><input type="text" name="username" size="25" /></td>
    </tr>
    <tr> 
        <td>Password:</td>
        <td><input type="password" name="password" size="25" /></td>
    </tr>
    <tr> 
        <td>&nbsp;</td>
        <td><input type="submit" name="senden" value="LOGIN" size="15" /></td>
    </tr>
    </table>
    </form>
</div>

</div>
</body>
</html>
Maybe this is useful.

Sares

Post by Albert Wiersch » Thu May 15, 2008 5:02 pm

Thanks. I think that may be helpful. I will investigate tomorrow.

I noticed in the quoted text you provided there is "</html" on one of the lines. Is CSE HTML Validator not showing the final '>' character? Perhaps it is being chopped off for some reason and that is causing the problem.

Post by Sares » Thu May 15, 2008 5:13 pm

Hi,

Yes, that's correct. CSE HTML Validator is not showing the final '>' of </html>. I have tried all kinds of settings, disabled a lot of things, and added some options again ... unfortunately without any success.

So maybe you will find a solution for this problem. As mentioned before, in version 8.04 there are no errors at all if I disable the & check.

Thank you so much for your help and kind regards
Sares

Post by Albert Wiersch » Thu May 15, 2008 8:53 pm

Sares wrote: Hi,

Yes, that's correct. CSE HTML Validator is not showing the final '>' of </html>. I have tried all kinds of settings, disabled a lot of things, and added some options again ... unfortunately without any success.
OK, thanks for the confirmation. I suspect this is something to do with UTF-8 and a bug or issue that is causing the last character to be dropped. I will concentrate on looking for anything that might cause that and hopefully a fix will be out with the next BETA that I hope to release very soon.

Post by Sares » Fri May 16, 2008 1:18 am

Thank you very much!

Sares

Post by Albert Wiersch » Fri May 16, 2008 5:13 pm

Sares wrote: Thank you very much!

Sares
You're welcome! Unfortunately some things came up today and I was not able to research this. It is definitely on my "to do" list for next week. I will report back sometime next week after I've had a chance to investigate it.

Post by Sares » Mon May 19, 2008 5:52 am

Hi Albert,

I would be very happy if you find some time to investigate this week. In the meantime, version 8 works fine for me without any problems, so it's not time-critical that you find a solution within, let's say, the next 2 hours :)

Anyway, it would be nice if the next beta fixed this problem, because it's a bit annoying to have hundreds of errors in the validation report :)

Thanks and kind regards
Sares
