topstyle/ tidy roundtrip editing breaks utf-8 characters

TopStyle has been discontinued and support is no longer available. The forum is read-only. NOTE: TopStyle was not developed by the makers of CSS HTML Validator.
jeroenvg
Rank 0 - Newcomer
Posts: 6
Joined: Wed Oct 11, 2006 11:59 am
Location: netherlands

Post by jeroenvg »

hello,

while editing html in topstyle, i often validate/ correct/ pretty print my code by running tidy from topstyle, and copying the output back to the editor.

at the moment, utf-8 characters that display correctly in the editor are broken in the output from html-tidy. e.g., an ellipsis - created by typing <alt+0133> - displays as three periods in the editor, but is converted to 'â€¦' (letter a with circumflex, euro sign, broken vertical bar) by tidy without warning.

i expected tidy to recognize the ellipsis as a utf-8 character and return it as an ellipsis in the output panel, or else complain about the character being wrong in some way.
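
for illustration: the three characters in the broken output are exactly what the utf-8 bytes of an ellipsis (E2 80 A6) look like when they are read as windows-1252. a minimal python sketch of that misreading follows; that this is what actually happens between topstyle and tidy is an assumption, not something the output itself proves:

```python
import unicodedata

# the ellipsis character, U+2026
ellipsis = "\u2026"

# its utf-8 encoding is the three bytes E2 80 A6
utf8_bytes = ellipsis.encode("utf-8")

# assumption for illustration: those bytes get decoded as windows-1252
misread = utf8_bytes.decode("cp1252")

# name each resulting character instead of printing it, so the result
# does not depend on the console encoding
names = [unicodedata.name(ch) for ch in misread]
print(" ".join(f"{b:02X}" for b in utf8_bytes))
print(names)
```

the names it prints are latin small letter a with circumflex, euro sign and broken bar, which matches the description above.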

* i created my own tidy configuration in topstyle for this purpose, which i believe to be correct (attached for reference).

* i'm reporting this here and not at tidy.sourceforge.net, because i think topstyle may cause the problem: the output is correct if i run tidy on the same file - with a similar configuration - from the command line [1].

steps to repeat:

1. set topstyle to utf-8 encoding.
2. create a new xhtml strict document.
3. for viewing the file directly in a browser (instead of requesting it from a server), add

<meta content="text/html; charset=utf-8" http-equiv="content-type" />

to the head.
4. add a utf-8 character to the body, e.g.

<p><alt+0133></p>

for an ellipsis.
5. save the document.
6. set the html tidy configuration to

input-encoding utf8

and

output-encoding utf8
7. run html tidy with this configuration on the document.
8. paste the output from the output pane back to the editor.
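
the steps above can also be scripted, which takes the editor out of the picture when creating the test file. a rough python sketch: the function names and the minimal xhtml skeleton are made up for illustration, `tidy` being on the PATH is an assumption, and the tidy options are the ones from the command line in [1]:

```python
from pathlib import Path

# hypothetical minimal xhtml strict document with one ellipsis (U+2026) in the body
TEST_DOCUMENT = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="content-type" />
<title>test</title>
</head>
<body>
<p>\u2026</p>
</body>
</html>
"""


def write_test_document(path):
    """Write the test document as utf-8 bytes, without a byte order mark."""
    Path(path).write_bytes(TEST_DOCUMENT.encode("utf-8"))


def tidy_command(source, target):
    """Build the argument list for the tidy call from [1]; 'tidy' on PATH is assumed."""
    return [
        "tidy",
        "--input-encoding", "utf8",
        "--output-encoding", "utf8",
        "-output", str(target),
        str(source),
    ]
```

calling write_test_document('test.utf-8.html') and then passing tidy_command('test.utf-8.html', 'test.utf-8.tidy.html') to subprocess.run would repeat steps 2 to 7 outside topstyle, giving a baseline to compare the topstyle output against.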

context:
* win xp pro, service pack 3
* topstyle 4.0.0.65 registered
* HTML Tidy for Windows released on 25 March 2009 (packaged with topstyle 4.0.0.65)
* firefox 3.5.3 (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729))

any idea on what is going wrong, or how to set up topstyle/ tidy to make this work?

[1] C:\Program Files\TopStyle 4>tidy --input-encoding utf8 --output-encoding utf8 -output test.utf-8.tidy.html test.utf-8.html
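
to check a tidied file without eyeballing it, something like the following python sketch could flag the broken characters. it assumes the damage is utf-8 read as windows-1252, as in the ellipsis example; the function names are hypothetical:

```python
import re
from pathlib import Path


def looks_double_encoded(text):
    """Return True if a run of non-ascii characters in text turns into valid
    utf-8 when encoded as windows-1252, which hints at mis-decoded utf-8."""
    for run in re.findall(r"[^\x00-\x7f]+", text):
        try:
            run.encode("cp1252").decode("utf-8")
        except UnicodeError:
            # not representable in windows-1252, or not valid utf-8: looks fine
            continue
        return True
    return False


def file_looks_double_encoded(path):
    """Read a file as utf-8 and apply the check above."""
    return looks_double_encoded(Path(path).read_text(encoding="utf-8"))
```

with the attached files, the expectation would be False for 'test.utf-8.tidy.html' and True for 'test.utf-8.topstyle.tidy.html'. it is only a heuristic: a broken sequence that sits directly next to a legitimate non-ascii character can be missed.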
Attachments
default.mine.zip (709 Bytes)
* 'default.mine.tdy'; my topstyle/ tidy configuration
test.utf-8.zip (1.28 KiB)
* 'test.utf-8.html'; the html file created in topstyle, untidied
* 'test.utf-8.tidy.html'; the html file created in topstyle, tidied from the command line
* 'test.utf-8.topstyle.tidy.html'; the html file created in topstyle, tidied from topstyle
TopStyle Support
Rank VI - Professional
Posts: 506
Joined: Sat Jun 13, 2009 12:38 pm

Re: topstyle/ tidy roundtrip editing breaks utf-8 characters

Post by TopStyle Support »

Fixed in 4.0.0.67

Thanks, Stefan.
jeroenvg

Re: topstyle/ tidy roundtrip editing breaks utf-8 characters

Post by jeroenvg »

ehm, i'm a bit confused ;) there is no 4.0.0.67 to download, but the 4.0.0.66 i can download shows the same behavior i described in my first post.
TopStyle Support

Re: topstyle/ tidy roundtrip editing breaks utf-8 characters

Post by TopStyle Support »

Correct. 4.0.0.67 hasn't been released (yet).

It would be great if you could confirm the fix.

Thanks, Stefan.
jeroenvg

Re: topstyle/ tidy roundtrip editing breaks utf-8 characters

Post by jeroenvg »

yip, confirmed fixed. thanks!