Making CSE catch high ascii characters from MS Word


Post by cdwz » Fri Oct 12, 2012 8:44 am

We get a lot of MS Word documents to convert to web pages, and since we moved to the newest version (Office 2010), Word inserts a lot of high-ASCII characters that don't show up until you view the pages in a browser. Hyphens become Euro symbols, some blanks become an "A" with an accent above it, and so on.

Is there any way to make CSE check for these?
cdwz
Rank II - Novice
Posts: 27
Joined: Tue Sep 02, 2008 10:42 am
Location: Washington DC

Re: Making CSE catch high ascii characters from MS Word

Post by cdwz » Thu Nov 15, 2012 10:10 am

I still sort of need a solution for this.

Re: Making CSE catch high ascii characters from MS Word

Post by Albert Wiersch » Thu Nov 15, 2012 10:33 am

Hello,

I'm sorry for the delay. I must have missed your original message.

It sounds like an encoding issue. Are you saving Word documents as HTML in Word 2010? I would think Office 2010 would save them using UTF-8 so this wouldn't be an issue, but I am not that familiar with how Word works in this regard.
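
As a rough illustration of the kind of encoding mismatch that would produce these symptoms, here is a minimal Python sketch. It is not anything CSE or Word does internally, and the sample characters (an en dash and a non-breaking space) are only an assumption about what Word inserted. If a page's bytes are UTF-8 but are decoded as Windows-1252, the en dash turns into three characters, one of them a Euro sign, and the non-breaking space gains a leading "A" with a circumflex:

```python
import unicodedata

# Assumed examples of characters Word may insert; not taken from a real file.
samples = {
    "en dash": "\u2013",
    "non-breaking space": "\u00a0",
}


def misread_as_cp1252(text: str) -> str:
    """Encode as UTF-8, then decode the same bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def describe(text: str) -> str:
    """Return the Unicode names of the characters in text, comma-separated."""
    return ", ".join(unicodedata.name(ch, f"U+{ord(ch):04X}") for ch in text)


for label, ch in samples.items():
    print(f"{label} is displayed as: {describe(misread_as_cp1252(ch))}")
```

If that is what is happening, the characters in the file are valid and the fix would be to make the declared encoding match the one the file is actually saved in.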

Can you send a sample document that I can use to reproduce the problem to support at htmlvalidator dot com? Also, if there is a public URL that I can access that also shows the problem, then that would be helpful too.
Albert Wiersch
Site Admin
Posts: 2361
Joined: Sat Dec 11, 2004 10:23 am
Location: Near Dallas, TX

Re: Making CSE catch high ascii characters from MS Word

Post by cdwz » Thu Nov 15, 2012 12:25 pm

Next time we have an affected file, I'll set it aside.

We're basically taking documents that were created in MS Word 2010 and pasting them into the design view of Dreamweaver, then switching to code view to clean up the code. The funny part is that one of my colleagues has an older version of CSE (version 8) and his picks up these characters just fine. My version 10 does not.

I've tried changing some of my settings, but if I make it any more sensitive, I get slammed with warnings about table tags not having a summary attribute. Since we use the caption tag, I think the summary is redundant. I wish I could turn that setting off!

Re: Making CSE catch high ascii characters from MS Word

Post by Albert Wiersch » Thu Nov 15, 2012 1:10 pm

cdwz wrote: Next time we have an affected file, I'll set it aside.


OK.

cdwz wrote: The funny part is that one of my colleagues has an older version of CSE (version 8) and his picks up these characters just fine. My version 10 does not.


It's possible that the legacy 'high ASCII check' is being turned off in version 10, but not in version 8, because version 10 is detecting a Unicode document. I'd have to see the actual document to determine exactly what's happening.
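
Until the cause is pinned down, non-ASCII characters can also be flagged outside of CSE with a short script. The following is a minimal Python sketch and not CSE's own 'high ASCII check'. The function name `find_non_ascii` and the sample markup are hypothetical.

```python
def find_non_ascii(text: str):
    """Yield (line_number, column, character) for each non-ASCII character."""
    # Split on "\n" only, so that unusual line-break characters are reported too.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield line_no, col, ch


# Hypothetical markup containing an en dash and a non-breaking space.
sample = "<p>Pages 10\u201320</p>\n<p>Price:\u00a05</p>"

for line_no, col, ch in find_non_ascii(sample):
    print(f"line {line_no}, column {col}: U+{ord(ch):04X}")
```

Reporting code points rather than the characters themselves keeps the output readable even when the character is invisible, as a non-breaking space is.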

cdwz wrote: I've tried changing some of my settings, but if I make it any more sensitive, I get slammed with warnings about table tags not having a summary attribute. Since we use the caption tag, I think the summary is redundant. I wish I could turn that setting off!


Have you tried disabling the message? Please see:
http://www.htmlvalidator.com/htmlval/v1 ... ssages.htm

You should be able to right click on it when it's displayed in CSE's editor and disable the message.

