
Making CSE catch high ascii characters from MS Word

Posted: Fri Oct 12, 2012 8:44 am
by cdwz
We get a lot of MS Word docs to convert to web pages, and since we moved to the newest version (Office 2010), Word inserts a lot of high-ASCII characters that don't show up until you view the pages in a browser. Hyphens become Euro symbols, some blanks become the "A" with the accent above it, etc.

Is there any way to make CSE check for these?

Re: Making CSE catch high ascii characters from MS Word

Posted: Thu Nov 15, 2012 10:10 am
by cdwz
I still sort of need a solution for this.

Re: Making CSE catch high ascii characters from MS Word

Posted: Thu Nov 15, 2012 10:33 am
by Albert Wiersch
Hello,

I'm sorry for the delay. I must have missed your original message.

It sounds like an encoding issue. Are you saving Word documents as HTML in Word 2010? I would think Office 2010 would save them using UTF-8 so this wouldn't be an issue, but I am not that familiar with how Word works in this regard.
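To illustrate what an encoding mismatch of this kind can look like, here is a minimal Python sketch. It assumes (this is not confirmed for your files) that the page is saved as UTF-8 but read as Windows-1252. The helper name `misread_as_cp1252` is made up for the example and has nothing to do with CSE.

```python
# Two characters Word commonly inserts: an en dash and a non-breaking space.
samples = {"en dash": "\u2013", "non-breaking space": "\u00a0"}

def misread_as_cp1252(text: str) -> str:
    """Encode as UTF-8, then decode those bytes as Windows-1252.

    This reproduces the mismatch: bytes written in one encoding,
    interpreted in another. (Hypothetical helper for illustration.)
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")

for name, ch in samples.items():
    garbled = misread_as_cp1252(ch)
    # Show the code points so invisible characters are easy to spot.
    print(name, "->", [f"U+{ord(c):04X}" for c in garbled])
```

Under that assumption, the en dash comes out as three characters including the Euro sign (U+20AC), and the non-breaking space comes out as an "A" with a circumflex (U+00C2) followed by a space, which would match the symptoms described.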

Can you send a sample document that I can use to reproduce the problem to support at htmlvalidator dot com? Also, if there is a public URL that I can access that also shows the problem, then that would be helpful too.

Re: Making CSE catch high ascii characters from MS Word

Posted: Thu Nov 15, 2012 12:25 pm
by cdwz
Next time we have an affected file, I'll set it aside.

We're basically taking documents that were created in MS Word 2010 and pasting them into the design view of Dreamweaver, then switching to code view to clean up the code. The funny part is that one of my colleagues has an older version of CSE (version 8) and his picks up these characters just fine. My version 10 does not.

I've tried changing some of my settings, but if I make it any more sensitive, I get slammed with warnings about table tags not having a summary attribute. Since we use the caption tag, I think the summary is redundant. I wish I could turn that setting off!

Re: Making CSE catch high ascii characters from MS Word

Posted: Thu Nov 15, 2012 1:10 pm
by Albert Wiersch
cdwz wrote: Next time we have an affected file, I'll set it aside.

OK.

cdwz wrote: The funny part is that one of my colleagues has an older version of CSE (version 8) and his picks up these characters just fine. My version 10 does not.

It's possible that version 10 is turning off the legacy 'high ASCII check' because it is detecting a Unicode document, while version 8 is leaving it on, but I'd have to see the actual document to determine exactly what's happening.
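In the meantime, a standalone scan outside of CSE can flag these characters. The following is only a rough Python sketch, not how CSE's check works internally; the function name `find_non_ascii` and the sample markup are made up for the example.

```python
import unicodedata

def find_non_ascii(text: str):
    """Yield (line, column, character, Unicode name) for each non-ASCII character.

    Line and column numbers are 1-based. (Hypothetical helper for illustration.)
    """
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield line_no, col, ch, unicodedata.name(ch, "UNKNOWN")

# Made-up markup containing an en dash and a non-breaking space.
sample = "<p>Pages 3\u20135</p>\n<td>\u00a0</td>"
for line_no, col, ch, name in find_non_ascii(sample):
    print(f"line {line_no}, col {col}: U+{ord(ch):04X} {name}")
```

Running something like this over the cleaned-up code view before publishing would at least show where Word's characters survived the paste.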

cdwz wrote: I've tried changing some of my settings, but if I make it any more sensitive, I get slammed with warnings about table tags not having a summary attribute. Since we use the caption tag, I think the summary is redundant. I wish I could turn that setting off!

Have you tried disabling the message? Please see:
http://www.htmlvalidator.com/htmlval/v1 ... ssages.htm

You should be able to right-click the message when it's displayed in CSE's editor and disable it from there.