Data corruption when using Tidy and utf-8

TopStyle has been discontinued and support is no longer available. The forum is read-only. NOTE: TopStyle was not developed by the makers of CSS HTML Validator.
Locked
tranglos
Rank 0 - Newcomer
Posts: 6
Joined: Sat Oct 03, 2009 9:44 am

Data corruption when using Tidy and utf-8

Post by tranglos »

I was trying to use Tidy in TopStyle to convert HTML 4.0 to XHTML. After some trial and error I found out that Tidy does not handle non-ASCII characters in charsets other than Latin1 (basically), so the only option for Central European charsets (Windows-1250 or ISO-8859-2) is to ensure both input and output use UTF-8. Of course, UTF-8 files open correctly in TopStyle.

The problem is that while Tidy performs the task correctly, the data it returns to TopStyle is not interpreted correctly. The result pane displays the raw UTF-8 bytes as individual characters, and they stay that way after clicking "Copy to active editor".
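For readers who have not seen this symptom before, here is a minimal Python sketch of what "raw UTF-8 bytes displayed as text" looks like. It is only an illustration: the sample string, the function names, and the choice of Windows-1250 as the wrongly assumed codepage are assumptions made for the example, not details of how TopStyle or Tidy work internally.

```python
def misread_utf8(text: str, wrong_codec: str = "cp1250") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong codec.

    This mimics a program that receives UTF-8 output but treats every
    byte as one character in a single-byte codepage (assumed here).
    """
    raw = text.encode("utf-8")
    return raw.decode(wrong_codec, errors="replace")


def repair(garbled: str, wrong_codec: str = "cp1250") -> str:
    """Reverse the misreading; works only if no byte was lost on the way."""
    return garbled.encode(wrong_codec).decode("utf-8")


# Hypothetical sample text with Central European (Polish) letters.
sample = "Zażółć gęślą jaźń"
garbled = misread_utf8(sample)

print(garbled)          # each accented letter turns into two odd characters
print(repair(garbled))  # the original text comes back
```

Each accented letter takes two bytes in UTF-8, so the misread text is longer than the original, which matches the kind of output shown in the bottom pane of the screenshot.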

Here is my Tidy configuration. (This is the only set of options that allows Tidy to do the job at all. Any other settings cause Tidy to either replace non-ASCII characters with entities or reduce them to the 0-127 ASCII range, which corrupts the data.)
Tidy Convert to XHTML configuration
01-tidy-config.png
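The two unwanted outcomes mentioned above (entity replacement and reduction to plain ASCII) can be imitated in a few lines of Python. This is a rough sketch under stated assumptions: the helper names are made up for the example, and the code uses Python's standard library rather than Tidy's own conversion logic, so the exact output Tidy produces may differ.

```python
import html
import unicodedata


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_ascii_lossy(text: str) -> str:
    """Strip accents and drop anything that has no ASCII base letter.

    This is an assumed stand-in for "reduce to the 0-127 range";
    it is not Tidy's actual algorithm.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


word = "żółć"  # hypothetical sample word

as_entities = to_entities(word)
as_ascii = to_ascii_lossy(word)

print(as_entities)                 # numeric references, still recoverable
print(html.unescape(as_entities))  # original text restored
print(as_ascii)                    # accents gone, "ł" dropped entirely
```

The entity form is verbose but reversible, while the ASCII reduction loses information for good, which is why it counts as data corruption.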
And this is the result - original text in the top pane, text from Tidy in the bottom pane:
Conversion result: note raw utf-8
02-tidy-result.png
I think I have eliminated the possibility that Tidy is somehow at fault here, since I got the correct result using Tidy with the same set of options in a different commercial HTML editor (a TopStyle competitor, so I won't name it here).

Is this something that can be fixed in TopStyle?
TopStyle Support
Rank VI - Professional
Posts: 506
Joined: Sat Jun 13, 2009 12:38 pm

Re: Data corruption when using Tidy and utf-8

Post by TopStyle Support »

Fixed in 4.0.0.67

Thanks, Stefan.
tranglos
Rank 0 - Newcomer
Posts: 6
Joined: Sat Oct 03, 2009 9:44 am

Re: Data corruption when using Tidy and utf-8

Post by tranglos »

Fantastic, thanks a lot, Stefan!