topstyle/ tidy roundtrip editing breaks utf-8 characters

Post anything about TopStyle here. NOTE: TopStyle has been discontinued.

jeroenvg
Rank 0 - Newcomer
Posts: 6
Joined: Wed Oct 11, 2006 11:59 am
Location: netherlands

Post by jeroenvg » Sun Sep 20, 2009 7:04 pm

hello,

while editing html in topstyle, i often validate/ correct/ pretty print my code by running tidy from topstyle, and copying the output back to the editor.

at the moment, utf-8 characters that display correctly in the editor are broken in the output from html-tidy. e.g., an ellipsis (created by typing <alt+0133>) displays as three periods in the editor, but tidy converts it to '…' (letter a with circumflex, euro sign, broken vertical bar) without warning.

i expected tidy to recognize the ellipsis as a utf-8 character and return it as an ellipsis in the output panel, or else complain about the character being wrong in some way.
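a minimal python sketch of how this exact character sequence can arise. it assumes the utf-8 bytes of the ellipsis get read as windows-1252 somewhere along the way; that codepage is a guess based on the three characters seen in the output, not something confirmed inside topstyle or tidy.

```python
import unicodedata

# the ellipsis character, U+2026
ellipsis = "\u2026"

# encoded as utf-8 it becomes three bytes
utf8_bytes = ellipsis.encode("utf-8")

# ASSUMPTION: the bytes are misread as windows-1252 (cp1252),
# so each byte turns into a separate character
mojibake = utf8_bytes.decode("cp1252")

print(utf8_bytes.hex(" "))  # e2 80 a6
for ch in mojibake:
    # prints the code point and official unicode name of each character
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

the three names printed should match the description above: letter a with circumflex, euro sign, broken bar.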

* i created my own tidy configuration in topstyle for this purpose, which i believe to be correct (attached for reference).

* i'm reporting this here and not at tidy.sourceforge.net, because i think topstyle may cause the problem: the output is correct if i run tidy on the same file - with a similar configuration - from the command line [1].

steps to repeat:

1. set topstyle to utf-8 encoding.
2. create a new xhtml strict document.
3. for viewing the file directly in a browser (instead of requesting it from a server), add <meta content="text/html; charset=utf-8" http-equiv="content-type" /> to the head.
4. add a utf-8 character to the body, e.g. <p><alt+0133></p> for an ellipsis.
5. save the document.
6. set the html tidy configuration to "input-encoding utf8" and "output-encoding utf8".
7. run html tidy with this configuration on the document.
8. paste the output from the output pane back to the editor.
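the steps above can also be checked outside the editor. the following is a rough python sketch, not part of topstyle or tidy: `write_test_doc` and `classify` are hypothetical helper names, the document skeleton is only an approximation of what topstyle generates, and the "double-encoded" pattern assumes the utf-8 bytes were misread as windows-1252 and then written out as utf-8 again.

```python
from pathlib import Path

ELLIPSIS = "\u2026"

# ASSUMPTION: the broken output is the ellipsis' utf-8 bytes misread as
# windows-1252 and then re-encoded as utf-8
DOUBLE_ENCODED = ELLIPSIS.encode("utf-8").decode("cp1252").encode("utf-8")

# approximate xhtml strict skeleton; the real topstyle template may differ
TEST_DOC = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="content-type" />
<title>utf-8 test</title>
</head>
<body>
<p>\u2026</p>
</body>
</html>
"""


def write_test_doc(path):
    """Write the test document as utf-8 bytes (steps 1 to 5)."""
    Path(path).write_bytes(TEST_DOC.encode("utf-8"))


def classify(path):
    """Report how the ellipsis appears in a file, based on its raw bytes."""
    data = Path(path).read_bytes()
    if DOUBLE_ENCODED in data:
        return "double-encoded"
    if ELLIPSIS.encode("utf-8") in data:
        return "intact"
    return "missing"
```

for example, `write_test_doc("test.utf-8.html")` creates the input file, and after tidying it, `classify(...)` on each output file shows whether the ellipsis survived. "missing" can also mean the character was written as an entity or in another encoding, so it is not proof of a bug on its own.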

context:
* win xp pro, service pack 3
* topstyle 4.0.0.65 registered
* HTML Tidy for Windows released on 25 March 2009 (packaged with topstyle 4.0.0.65)
* firefox 3.5.3 (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729))

any idea on what is going wrong, or how to set up topstyle/ tidy to make this work?

[1] C:\Program Files\TopStyle 4>tidy --input-encoding utf8 --output-encoding utf8 -output test.utf-8.tidy.html test.utf-8.html
Attachments
default.mine.zip
'default.mine.tdy'; my topstyle/ tidy configuration
test.utf-8.zip
* 'test.utf-8.html'; the html file created in topstyle, untidied
* 'test.utf-8.tidy.html'; the html file created in topstyle, tidied from the command line
* 'test.utf-8.topstyle.tidy.html'; the html file created in topstyle, tidied from topstyle

TopStyle Support
Rank VI - Professional
Posts: 507
Joined: Sat Jun 13, 2009 12:38 pm

Re: topstyle/ tidy roundtrip editing breaks utf-8 characters

Post by TopStyle Support » Mon Oct 05, 2009 3:24 pm

Fixed in 4.0.0.67

Thanks, Stefan.

jeroenvg

Re: topstyle/ tidy roundtrip editing breaks utf-8 characters

Post by jeroenvg » Thu Oct 08, 2009 1:18 pm

ehm, i'm a bit confused ;) there is no 4.0.0.67 to download, but the 4.0.0.66 i can download shows the same behavior i described in my first post.

TopStyle Support

Re: topstyle/ tidy roundtrip editing breaks utf-8 characters

Post by TopStyle Support » Thu Oct 08, 2009 2:50 pm

Correct. 4.0.0.67 hasn't been released (yet). However, if you would like to test-drive it, you can download it here: http://svanas.dynip.com/topstyle/downlo ... 0.0.67.exe

It would be great if you could confirm the fix.

Thanks, Stefan.

jeroenvg

Re: topstyle/ tidy roundtrip editing breaks utf-8 characters

Post by jeroenvg » Thu Oct 08, 2009 9:53 pm

yip, confirmed fixed. thanks!
