high ascii bug

For topics about current or future BETA releases, including feature requests.


Post by roedygr » Sun Oct 09, 2011 5:49 pm

When I run the batch validator it says:
0:7
Warning
At least one high ASCII character (#239) was found in this line. You may want to replace high ASCII characters with character entities, but this is not technically necessary.
policy�what an older and less elegant generation called the horse-and-sparrow theory: If you feed the horse enough

However, when I bring that file up in the editor, it says I have no warnings or errors.
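For readers unfamiliar with the check: a "high ASCII" warning amounts to a byte scan that flags any byte above 127. The sketch below is a minimal illustration under that assumption, not CSE HTML Validator's actual code, and the function name `find_high_ascii` is hypothetical. It also shows why the warning is spurious for UTF-8 files: every byte of a multi-byte UTF-8 character is above 127, so a UTF-8 document scanned as raw single bytes will trip the check.

```python
def find_high_ascii(data: bytes):
    """Return (line_number, column, byte_value) for the first byte > 127
    on each line. Line numbers and columns are 1-based."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 127:
                hits.append((line_no, col, byte))
                break  # report only the first offending byte per line
    return hits


if __name__ == "__main__":
    # "policy" followed by an em dash encoded as UTF-8 (bytes E2 80 94)
    sample = "plain line\npolicy\u2014what".encode("utf-8")
    for line_no, col, byte in find_high_ascii(sample):
        print(f"{line_no}:{col} high ASCII character (#{byte})")
```

Skipping this scan once the document is known to be UTF-8 is exactly the behavior the rest of the thread is about.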
roedygr
Rank V - Professional
Posts: 242
Joined: Fri Feb 17, 2006 6:22 am
Location: Victoria BC Canada


Post by Albert Wiersch » Sun Oct 09, 2011 8:40 pm

I suspect it has something to do with the encoding and the HTTP headers. Is the Batch Wizard downloading the page via HTTP? The encoding the web server is sending back may not be correct or may be causing the difference in results.
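To illustrate what the HTTP headers contribute here: when a page arrives over HTTP, the server's `Content-Type` header may or may not carry a `charset` parameter, and that value can differ from the document's own META declaration. A minimal sketch of reading that parameter with Python's standard `email.message` module; the helper name `charset_from_content_type` is hypothetical.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value,
    lowercased, or None if the server did not send one."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() unquotes and lowercases the parameter,
    # and returns None when there is no charset parameter at all.
    return msg.get_content_charset()
```

A result of `None` means the server left the encoding unspecified, so the client has to fall back on something else, such as the META declaration inside the page.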

Can you share the URL of the page?

If it's not getting the page via HTTP/HTTPS, then can you send me a copy of the page to support at htmlvalidator dot com?

Thank you.
Albert Wiersch
Site Admin
Posts: 2361
Joined: Sat Dec 11, 2004 10:23 am
Location: Near Dallas, TX


Post by roedygr » Sun Oct 23, 2011 3:33 am

These are all local pages. My header looks like this:


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Getting Started With Esperanto: Kiel Komenci Esperanton</title>
<meta http-equiv="lang" content="en">
<meta http-equiv="Content-Style-Type" content="text/css">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Author" content="Roedy Green, (250) 361-9093 of Canadian Mind Products. For email see contact page.">
<meta name="Copyright" content="Canadian Mind Products 1996-2011">
<meta name="Description" content="Esperanto, an overview for computer users.">
<meta name="Generator" content="HTMLMacros with the Compactor">
<meta name="Keywords" content="Acquiring Esperanto Fonts, browser, correlatives, Creating Esperanto HTML Documents, Esperanto, Esperanto accented letters, Esperanto correlatives, Esperanto Fonts, Esperanto in Browsers, IE, Internet Explorer, ISO 8859-3, Netscape, Opera, Roedy Green, vortaro, what&rsquo;s wrong with Esperanto">
<link href="../mindprod.css" type="text/css" rel="stylesheet" media="screen">
<link href="../jdisplay.css" type="text/css" rel="stylesheet" media="screen">
<link rel="meta" href="../labels.rdf" type="application/rdf+xml" title="ICRA labels">
<link rel="home" href="../index.html">
<link rel="icon" href="../image/icon16/esper.png">
<link rel="alternate" type="application/rss+xml" title="Canadian Mind Products | Various Ethical Concerns" href="http://mindprod.com/rss/ethics.xml">
</head>
<body>


Post by Albert Wiersch » Mon Oct 24, 2011 10:52 am

Thank you. I tried to reproduce this, but I couldn't get the Batch Wizard to generate the incorrect "high ASCII" message (which it shouldn't, because it should load this document as UTF-8). Could you give me exact steps to reproduce the problem? A full test document that reproduces the issue would also be very useful. Thank you.


Post by roedygr » Mon Oct 31, 2011 6:56 pm

Albert Wiersch wrote: A full test document that reproduces the issue would also be very useful. Thank you.


You can download the document at http://mindprod.com/esperanto/esperanto.html

Batch validate it.

Settings: I will email you various config files as attachments.


Post by Albert Wiersch » Wed Nov 02, 2011 11:19 am

roedygr wrote: You can download the document at http://mindprod.com/esperanto/esperanto.html

Batch validate it.

Settings: I will email you various config files as attachments.


Thanks. Using your document, I was able to find the problem. The web server wasn't specifying a charset, and the comment at the top of your document was confusing the logic used to extract the charset from the "meta" tag. I've improved that logic. Now that the charset is extracted correctly, and it is UTF-8, the "high ASCII" check should be disabled, because the document is Unicode.
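The fix described above has two parts: find the charset in the `meta` tag even when a comment precedes it, and skip the high-ASCII check when that charset is Unicode. The thread doesn't show the validator's real algorithm, so this is only a minimal regex-based sketch of the idea; the names `charset_from_meta`, `should_check_high_ascii`, and `UNICODE_CHARSETS` are hypothetical, and a regex is no substitute for a full HTML parser.

```python
import re
from typing import Optional

# Comments are stripped first, so text inside a leading <!-- ... --> block
# (including a commented-out <meta> tag) cannot confuse the search.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
META_RE = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)",
                        re.IGNORECASE)
UNICODE_CHARSETS = {"utf-8", "utf-16", "utf-16le", "utf-16be"}


def charset_from_meta(html: str) -> Optional[str]:
    """Return the lowercased charset from the first <meta> tag that
    declares one, ignoring comments; None if no tag declares a charset."""
    without_comments = COMMENT_RE.sub("", html)
    for tag in META_RE.findall(without_comments):
        match = CHARSET_RE.search(tag)
        if match:
            return match.group(1).lower()
    return None


def should_check_high_ascii(charset: Optional[str]) -> bool:
    """The high-ASCII warning only makes sense for non-Unicode documents."""
    return charset not in UNICODE_CHARSETS
```

Both the HTML 4.01 form (`<meta http-equiv="Content-Type" content="text/html; charset=utf-8">`, as in Roedy's header) and the shorter `<meta charset="utf-8">` form are matched.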


Post by roedygr » Wed Nov 02, 2011 9:42 pm

Albert Wiersch wrote: Thanks. Using your document I was able to find the problem.


You have to test HTMLValidator both with the file stored locally on the hard disk and with it served remotely from the web.

I presume you get no information beyond the file contents when the file is local, whereas when it comes from the web, the HTTP headers can conflict with the META elements inside the document.
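That presumption matches the priority order in the HTML 4.01 specification (section 5.2.2): a `charset` parameter in the HTTP `Content-Type` header takes precedence over a META declaration, and a local file, having no HTTP headers, has only the META declaration to go on. A minimal sketch of that resolution; the function name `resolve_charset` and the `windows-1252` fallback are assumptions for illustration, not how CSE HTML Validator actually decides.

```python
from typing import Optional, Tuple


def resolve_charset(http_charset: Optional[str],
                    meta_charset: Optional[str],
                    default: str = "windows-1252") -> Tuple[str, str, bool]:
    """Return (charset, source, conflict).

    Priority follows HTML 4.01 section 5.2.2: the HTTP Content-Type charset
    beats a META declaration. For a local file, pass http_charset=None.
    `default` is a hypothetical fallback; real tools choose their own.
    `conflict` is True when both sources are present and disagree.
    """
    http = http_charset.lower() if http_charset else None
    meta = meta_charset.lower() if meta_charset else None
    conflict = http is not None and meta is not None and http != meta
    if http:
        return http, "http", conflict
    if meta:
        return meta, "meta", conflict
    return default, "default", conflict
```

In the case found in this thread, the server sent no charset, so the META value (`utf-8`) should have decided, whether the file was read locally or fetched from the web.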

Are servers supposed to peek inside documents to figure out what encoding to put in the HTTP header?

I find out what MIME types a remote server is sending with http://mindprod.com/applet/mimecheck.html
To see the other fields in the HTTP header, I use Wireshark: http://mindprod.com/jgloss/wireshark.html


Post by Albert Wiersch » Thu Nov 03, 2011 10:20 am

roedygr wrote: You have to test HTMLValidator both with the file stored locally on the hard disk and with it served remotely from the web.


Thanks. I'm glad you mentioned that as I had to go back and make some further changes for local files.

roedygr wrote: Are servers supposed to peek inside documents to figure out what encoding to put in the HTTP header?


Nope. It is usually set by the server configuration, or, if the page is generated by a script, the script can set it dynamically.

roedygr wrote: I find out what MIME types a remote server is sending with http://mindprod.com/applet/mimecheck.html
To see the other fields in the HTTP header, I use Wireshark: http://mindprod.com/jgloss/wireshark.html


Thanks for the links and recommendations. You probably already know this, but if you just want to see the HTTP headers the server returns, you can use File->Open from the Web in CSE HTML Validator and watch the progress. It shows what is sent to the server for the request and the HTTP headers that are sent back. You could also use the online validator at http://www.OnlineWebCheck.com/, which displays the HTTP headers at the bottom of the results.
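For a scripted alternative to those tools, Python's standard `urllib.request` can report the status and headers a server returns. A minimal sketch; `show_response_headers` is a hypothetical helper, and it issues a plain GET without reading the body.

```python
from urllib.request import urlopen


def show_response_headers(url, timeout=10.0):
    """Fetch url with a GET request and return (status, headers, charset).

    headers is a list of (name, value) pairs in the order the server sent
    them; charset is the lowercased Content-Type charset parameter, or None
    if the server did not send one. The response body is never read.
    """
    with urlopen(url, timeout=timeout) as response:
        headers = response.getheaders()
        # response.headers is an http.client.HTTPMessage, a subclass of
        # email.message.Message, so get_content_charset() is available.
        charset = response.headers.get_content_charset()
        return response.status, headers, charset
```

For example, `show_response_headers("http://mindprod.com/esperanto/esperanto.html")` lists the headers for the page discussed in this thread; a `charset` of `None` is the situation found above, where the server sent no charset and the META declaration has to decide. Note that `urlopen` raises `HTTPError` for 4xx and 5xx responses.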


Return to CSE BETA Talk
