while editing html in topstyle, i often validate, correct and pretty-print my code by running tidy from within topstyle and copying the output back into the editor.
at the moment, utf-8 characters that display correctly in the editor are broken in the output from html tidy. e.g., an ellipsis, typed with <alt+0133>, displays correctly as an ellipsis in the editor, but tidy converts it to 'â€¦' (letter a with circumflex, euro sign, broken vertical bar) without any warning.
i expected tidy to recognize the ellipsis as a utf-8 character and return it unchanged in the output panel, or else to complain that the character is invalid in some way.
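the three garbled characters are exactly what you get when the utf-8 bytes of the ellipsis (e2 80 a6) are read as windows-1252. a plausible explanation, assumed here and not confirmed, is that tidy writes correct utf-8 but its output is read back as ansi (windows-1252) text on the way to the output pane. a minimal python sketch of that misreading and how to reverse it; the helper names are hypothetical and not part of tidy or topstyle:

```python
# <alt+0133> enters byte 0x85 of windows-1252, which is U+2026 HORIZONTAL ELLIPSIS.
ELLIPSIS = bytes([0x85]).decode("cp1252")


def as_misread(text: str) -> str:
    """encode text as utf-8, then wrongly decode those bytes as windows-1252.

    note: python's cp1252 codec raises on the five bytes windows-1252 leaves
    undefined (0x81, 0x8d, 0x8f, 0x90, 0x9d); the ellipsis avoids them.
    """
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """undo one round of the misreading above."""
    return text.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    garbled = as_misread(ELLIPSIS)
    # code points of a-circumflex, euro sign, broken bar
    print([hex(ord(c)) for c in garbled])  # ['0xe2', '0x20ac', '0xa6']
    print(repair(garbled) == ELLIPSIS)     # True
```

if the pasted text really is this misreading, `repair` turns it back into the ellipsis, which is a quick way to check the theory on a copy of the output.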
* i created my own tidy configuration in topstyle for this purpose, which i believe to be correct (attached for reference).
* i'm reporting this here and not at tidy.sourceforge.net because i think topstyle may be causing the problem: the output is correct when i run tidy on the same file, with a similar configuration, from the command line [1].
steps to reproduce:
1. set topstyle to utf-8 encoding.
2. create a new xhtml strict document.
3. so that the file can be viewed directly in a browser (instead of being requested from a server), add
<meta content="text/html; charset=utf-8" http-equiv="content-type" />
4. add a utf-8 character to the body, e.g.
<p><alt+0133></p>
5. save the document.
6. set the html tidy configuration to
input-encoding utf8
output-encoding utf8
7. run html tidy with this configuration on the document.
8. paste the output from the output pane back into the editor.
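steps 2 to 5 can be scripted so the input file is byte-for-byte reproducible, independent of the editor. a python sketch, assuming a minimal xhtml 1.0 strict skeleton (topstyle's own new-document template may differ) and hypothetical helper names; the file name matches the one used in [1]:

```python
from pathlib import Path

# hypothetical stand-in for steps 2-4: xhtml 1.0 strict, the utf-8 meta tag
# from step 3, and one ellipsis (U+2026, what <alt+0133> types) in the body.
TEST_PAGE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    "<head>\n"
    '<meta content="text/html; charset=utf-8" http-equiv="content-type" />\n'
    "<title>utf-8 test</title>\n"
    "</head>\n"
    "<body>\n"
    "<p>\u2026</p>\n"
    "</body>\n"
    "</html>\n"
)


def write_test_page(path: Path) -> bytes:
    """step 5: save the page as utf-8 without a BOM; return the bytes written."""
    data = TEST_PAGE.encode("utf-8")
    path.write_bytes(data)
    return data


def pasted_output_is_broken(text: str) -> bool:
    """step 8: true if the ellipsis came back as its windows-1252 misreading."""
    return "\u00e2\u20ac\u00a6" in text
```

e.g. `write_test_page(Path("test.utf-8.html"))` creates the input file, and `pasted_output_is_broken` can be run on whatever ends up in the editor after step 8.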
context:
* win xp pro, service pack 3
* topstyle 4.0.0.65 registered
* HTML Tidy for Windows released on 25 March 2009 (packaged with topstyle 4.0.0.65)
* firefox 3.5.3 (Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729))
any idea what is going wrong, or how to set up topstyle/tidy to make this work?
[1] C:\Program Files\TopStyle 4>tidy --input-encoding utf8 --output-encoding utf8 -output test.utf-8.tidy.html test.utf-8.html
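the command-line comparison in [1] can also be checked at the byte level, which separates what tidy writes from how it is displayed. a python sketch, assuming a tidy executable on PATH; the helper names and result labels are hypothetical. it runs the same command as [1] and reports whether the output file holds the correct utf-8 bytes for the ellipsis or the garbled sequence seen in topstyle:

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional

ELLIPSIS = "\u2026".encode("utf-8")             # b"\xe2\x80\xa6"
MISREAD = "\u00e2\u20ac\u00a6".encode("utf-8")  # utf-8 of the garbled text


def classify(data: bytes) -> str:
    """label the raw bytes of a tidy output file."""
    if MISREAD in data:
        return "double-encoded"
    if ELLIPSIS in data:
        return "ok"
    if b"&hellip;" in data or b"&#8230;" in data:
        return "ok (entity)"
    return "ellipsis missing"


def run_tidy(src: Path, dst: Path) -> Optional[str]:
    """run the command from [1] and classify the file it writes.

    returns None when no tidy executable is found on PATH.
    """
    exe = shutil.which("tidy")
    if exe is None:
        return None
    dst.unlink(missing_ok=True)  # don't classify a stale file from an earlier run
    # tidy exits with 0 (clean), 1 (warnings) or 2 (errors), so a non-zero
    # exit status is not treated as a failure here.
    subprocess.run(
        [exe, "--input-encoding", "utf8", "--output-encoding", "utf8",
         "-output", str(dst), str(src)],
        capture_output=True,
    )
    if not dst.exists():
        return "no output written"
    return classify(dst.read_bytes())
```

e.g. `run_tidy(Path("test.utf-8.html"), Path("test.utf-8.tidy.html"))`. "ok" here while topstyle still shows the garbled text would point at how topstyle reads tidy's output rather than at tidy itself.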