Data corruption when using Tidy and utf-8

Post anything about TopStyle here. NOTE: TopStyle has been discontinued.

Moderator: TopStyle Support

tranglos
Rank 0 - Newcomer
Posts: 6
Joined: Sat Oct 03, 2009 9:44 am

Data corruption when using Tidy and utf-8

Post by tranglos » Sat Oct 03, 2009 2:10 pm

I was trying to use Tidy in TopStyle to convert HTML 4.0 to XHTML. After some trial and error I found that Tidy does not handle non-ASCII characters in charsets other than (basically) Latin-1, so the only option for Central European charsets (Windows-1250 or ISO-8859-2) is to ensure that both input and output use UTF-8. Of course, UTF-8 files open correctly in TopStyle.

The problem is that while Tidy performs the task correctly, TopStyle does not interpret the data it gets back correctly. The result pane displays the raw UTF-8 bytes instead of the decoded characters, and those bytes are retained after clicking "Copy to active editor".
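For readers unfamiliar with this kind of corruption, the symptom described above can be reproduced in a few lines. This is only a sketch of the general effect, not of TopStyle's internals: the sample string is made up, and the choice of Windows-1250 as the code page used for the wrong decoding is an assumption.

```python
# Sketch: what "raw UTF-8 bytes displayed as text" looks like.
# Assumptions: the sample text is illustrative, and cp1250 stands in for
# whatever single-byte code page the receiving program wrongly applied.

original = "Zażółć gęślą jaźń"  # Polish sample text with non-ASCII letters

# Tidy (configured for UTF-8 output) hands back bytes like these.
utf8_bytes = original.encode("utf-8")

# A receiver that decodes those bytes with a single-byte code page
# turns every multi-byte character into two unrelated characters.
misread = utf8_bytes.decode("cp1250")

# The bytes themselves are intact, so reversing the wrong decoding
# step recovers the original text.
repaired = misread.encode("cp1250").decode("utf-8")

print(original)
print(misread)
print(repaired)
```

The point of the round trip is that nothing is lost on Tidy's side: the corruption comes from decoding correct UTF-8 output with the wrong charset.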

Here is my Tidy configuration. (This is the only set of options that allows Tidy to do the job at all. Any other settings cause Tidy either to replace non-ASCII characters with entities or to reduce them to the 0-127 ASCII range, which corrupts the data.)
[Image: 01-tidy-config.png (Tidy Convert to XHTML configuration)]
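The screenshot's settings are not available as text, so the following is only a guess at an equivalent setup outside TopStyle. It assumes a command-line `tidy` on the PATH and uses the `-asxhtml` flag together with the `input-encoding` and `output-encoding` options set to `utf8`. The helper names `build_tidy_command` and `tidy_to_xhtml` are hypothetical.

```python
# Sketch: running Tidy with UTF-8 on both sides and decoding the result as UTF-8.
# Assumptions: a command-line "tidy" executable is installed; the options below
# are a plausible equivalent of the screenshot, not a copy of it.
import shutil
import subprocess


def build_tidy_command(executable="tidy"):
    """Return the argument list for an HTML-to-XHTML conversion in UTF-8."""
    return [
        executable,
        "-q",                          # suppress non-essential output
        "-asxhtml",                    # convert HTML to XHTML
        "--input-encoding", "utf8",    # read the input as UTF-8
        "--output-encoding", "utf8",   # write the output as UTF-8
    ]


def tidy_to_xhtml(html):
    """Pipe text through Tidy as UTF-8 bytes and decode the output as UTF-8."""
    result = subprocess.run(
        build_tidy_command(),
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Tidy exits with a non-zero status when it only has warnings,
    # so the exit code is deliberately not treated as a failure here.
    return result.stdout.decode("utf-8")


if __name__ == "__main__":
    if shutil.which("tidy"):
        print(tidy_to_xhtml("<html><body><p>Zażółć gęślą jaźń</p></body></html>"))
    else:
        print("tidy is not installed; command would be:", build_tidy_command())
```

The step that matters is the final `decode("utf-8")`: the caller has to read Tidy's output using the same encoding that Tidy was told to write.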
And this is the result - original text in the top pane, text from Tidy in the bottom pane:
[Image: 02-tidy-result.png (Conversion result: note raw utf-8)]
I think I have eliminated the possibility that Tidy is somehow at fault here, since I got the correct result using Tidy with the same set of options in a different commercial HTML editor (a TopStyle competitor, so I won't name it here).

Is this something that can be fixed in TopStyle?

TopStyle Support
Rank VI - Professional
Posts: 507
Joined: Sat Jun 13, 2009 12:38 pm

Re: Data corruption when using Tidy and utf-8

Post by TopStyle Support » Mon Oct 05, 2009 3:33 pm

Fixed in 4.0.0.67

Thanks, Stefan.

tranglos
Rank 0 - Newcomer
Posts: 6
Joined: Sat Oct 03, 2009 9:44 am

Re: Data corruption when using Tidy and utf-8

Post by tranglos » Tue Oct 06, 2009 7:06 am

Fantastic, thanks a lot, Stefan!
