all clear

Postby roedygr » Tue Aug 19, 2008 4:08 pm

Beta 5 is behaving perfectly. The only thing it screws up is ignoring <cseignore> surrounding deliberate high-bit chars, which you explained was quite hard to fix since high bits are not checked in the parser.

Re: all clear

Postby roedygr » Tue Aug 19, 2008 7:40 pm

roedygr wrote: The only thing it screws up is ignoring <cseignore> surrounding deliberate high-bit chars, which you explained was quite hard to fix since high bits are not checked in the parser.


You could fix that fairly simply without changing the structure of HTMLValidator. If you find a high ASCII char, scan back looking for the string <cseignore>. If you find one, scan forward from there for a </cseignore>. If you find one and its offset is after the offset of the high char, ignore the error; otherwise display the warning message as usual. It is not completely correct, since it would not handle nested <cseignore>, but it would get it right 99% of the time.

Here is roughly the Java code you would need to implement my suggestion:

Code:
final int highAt = whereBadCharFound();
if ( highAt >= "<cseignore>".length() )
   {
   // last <cseignore> that ends at or before the high char
   final int start = big.lastIndexOf( "<cseignore>", highAt - "<cseignore>".length() );
   if ( start >= 0 )
      {
      // first </cseignore> after that opening tag
      final int end = big.indexOf( "</cseignore>", start + "<cseignore>".length() );
      if ( end > highAt )
         {
         return true; // ignore the high char, it was in a <cseignore> sandwich
         }
      }
   }
return false; // report the high char if configured to do so, it was not in a <cseignore> sandwich
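
For anyone who wants to try the idea outside the validator, the fragment above can be wrapped into a small self-contained class. This is only a sketch under assumptions: the class name CseIgnoreCheck, the method name isInsideCseIgnore, and the sample page are made up for illustration, and nothing here reflects how HTMLValidator actually stores or scans the document. It has the same limitation as the fragment: nested <cseignore> is not handled.

```java
final class CseIgnoreCheck {
    // Tag names taken from the post; everything else here is hypothetical.
    private static final String OPEN = "<cseignore>";
    private static final String CLOSE = "</cseignore>";

    /**
     * Returns true if the char at offset highAt sits between a <cseignore>
     * and the first </cseignore> that follows that opening tag.
     */
    static boolean isInsideCseIgnore(String big, int highAt) {
        if (highAt < OPEN.length()) {
            return false; // no room for an opening tag before the char
        }
        // Last opening tag that ends at or before the high char.
        final int start = big.lastIndexOf(OPEN, highAt - OPEN.length());
        if (start < 0) {
            return false; // no opening tag before the char
        }
        // First closing tag after that opening tag.
        final int end = big.indexOf(CLOSE, start + OPEN.length());
        return end > highAt; // -1 (no closing tag) also yields false
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        final String page = "<p>caf\u00e9</p><cseignore>na\u00efve</cseignore>";
        System.out.println(isInsideCseIgnore(page, page.indexOf('\u00e9')));
        System.out.println(isInsideCseIgnore(page, page.indexOf('\u00ef')));
    }
}
```

In the sample page, the first high-bit char is outside any <cseignore> section and would still be reported, while the second is inside one and would be skipped.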

Postby Albert Wiersch » Wed Aug 20, 2008 8:43 am

I'm glad to hear BETA 5 is working well! It is definitely ready for a release on the 25th. I just need to work on some final things and the documentation. :D

As for the "high ASCII" and "cseignore", I'll have to think about this for a future version. I'm not entirely convinced that this check shouldn't apply in a "cseignore" section since usually character encoding/checking applies to the entire document.
