I have used PHP for a number of years and CSE Validator for even longer, and the issues that arise from special characters in an HTML document normally have more than one solution, but yesterday I got stumped. A new project required extensive work with non-Latin characters, and a lot of them. Normally we can use "entities" like &copy; to produce ©, the copyright symbol, and others, which are found in the CSE editor drop-down lists in several places. Many named entities also have a numeric equivalent: &#169; is the numeric Unicode entity for the same character as &copy;. This is not really news, but many may be unaware that the CSE editor also handles copied and pasted non-Latin characters. For example,
高级妓女 or Ἀσπασία
These are saved properly in an HTML file, even when your operating system language is English. So there was some surprise when this method produced question marks ("??????") in the rendered output of the current PHP project. Needless to say, UTF-8 (Unicode) is implicated. But I've used this copy-and-paste method in a similar manner for years with the CSE editor without a problem.
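For anyone who wants to poke at the mechanics outside of PHP, here is a rough Python sketch of the three things mentioned above: a named and a numeric entity decoding to the same character, pasted non-Latin text surviving a UTF-8 round trip, and one common way literal question marks can appear, namely when text gets pushed through a single-byte charset that cannot represent it. The function names are made up for illustration, and the Latin-1 step is only an assumption about a typical cause, not a diagnosis of this project.

```python
import html

# Sample text taken from the example above; any non-Latin string would do.
SAMPLE = "高级妓女"


def entities_match() -> bool:
    """The named and numeric entities decode to the same character (U+00A9)."""
    return html.unescape("&copy;") == html.unescape("&#169;") == "\u00a9"


def survives_utf8(text: str) -> bool:
    """Encoding to UTF-8 and decoding again returns the text unchanged."""
    return text.encode("utf-8").decode("utf-8") == text


def through_latin1(text: str) -> str:
    """Force the text through ISO-8859-1 (assumed here as the lossy step).

    Characters that Latin-1 cannot represent are replaced with '?'.
    """
    return text.encode("latin-1", errors="replace").decode("latin-1")


if __name__ == "__main__":
    print(entities_match())
    print(survives_utf8(SAMPLE))
    print(through_latin1(SAMPLE))
```

The point of the last function is that the question marks are real "?" characters written at the moment of conversion, so whatever is declared in the page afterwards cannot bring the original characters back.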
So, to simplify only a little, why does
produce (source code)
Before everyone raises their hands, note that there indeed is
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
in the head section.