Spell checking entified words

roedygr
Rank V - Professional
Posts: 367
Joined: Fri Feb 17, 2006 5:22 am
Location: Victoria BC Canada


Post by roedygr

I understand you did not write the spell-checking engine, so you have to live with its limitations.

I think the following strategy may work even though the engine itself cannot handle entities.

You have to modify the parser that picks out words for spell checking so that it treats each entity as if it were a single Unicode character. Currently, it breaks a word containing an entity into two separate words.
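A minimal Java sketch of such a word finder follows, assuming Java 9 or later. The class `EntityWordSplitter`, its methods, and its four-entry entity table are hypothetical illustrations, not the mindprod entities code; a real table would cover every HTML entity. Each word is returned exactly as written and can then be decoded to plain Unicode with `deEntify`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical word finder for spell checking that treats each HTML entity
 * as one letter, so r&eacute;sum&eacute; is a single word, not several.
 */
final class EntityWordSplitter {
    // Tiny illustrative table (entity name -> character), not a complete list.
    static final Map<String, String> NAMED = Map.of(
            "eacute", "\u00e9",
            "egrave", "\u00e8",
            "ntilde", "\u00f1",
            "uuml", "\u00fc");

    // A word is a run of letters and/or entity references.
    private static final Pattern WORD = Pattern.compile(
            "(?:\\p{L}|&(?:[A-Za-z]+|#[0-9]{1,7}|#[xX][0-9A-Fa-f]{1,6});)+");

    // A single entity reference: named, decimal or hexadecimal.
    private static final Pattern ENTITY = Pattern.compile(
            "&(?:([A-Za-z]+)|#([0-9]{1,7})|#[xX]([0-9A-Fa-f]{1,6}));");

    /** Returns the words in text exactly as written, entities included. */
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    /** Replaces each entity with its character; unknown or invalid ones are kept as-is. */
    static String deEntify(String word) {
        Matcher m = ENTITY.matcher(word);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String rep = m.group();
            if (m.group(1) != null) {
                rep = NAMED.getOrDefault(m.group(1), rep);
            } else {
                int cp = m.group(2) != null
                        ? Integer.parseInt(m.group(2))
                        : Integer.parseInt(m.group(3), 16);
                if (Character.isValidCodePoint(cp)) {
                    rep = new String(Character.toChars(cp));
                }
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(rep));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For example, `words("The r&eacute;sum&eacute; is here")` yields `The`, `r&eacute;sum&eacute;`, `is` and `here`, and `deEntify` turns the second word into `résumé`.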

You then feed the resulting Unicode string to the spell-checking engine.

If there were entities in the document, you then convert the corrected string back into entified form before putting it back. The spell checker never sees any entities.
If there were no entities, you insert the raw Unicode back into the document.
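The feed-and-restore steps might look like this, again as a hypothetical Java sketch (Java 9 or later). The engine is stood in for by a `UnaryOperator<String>` that returns the corrected word, since the real engine's API is not described here. The caller is assumed to know whether the document uses entities, and `entify` falls back to numeric `&#NNN;` form for any character missing from its small, illustrative name table.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Hypothetical per-word round trip: the spell checker (modelled as a
 * UnaryOperator returning the corrected word) is only ever handed Unicode.
 */
final class EntityRoundTrip {
    // Tiny illustrative reverse table (code point -> entity name), not a complete list.
    static final Map<Integer, String> NAMES = Map.of(
            0xE9, "eacute",
            0xE8, "egrave",
            0xF1, "ntilde",
            0xFC, "uuml");

    /** Encodes each non-ASCII character as a named entity if known, else as &#NNN;. */
    static String entify(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.appendCodePoint(cp);
            } else {
                String name = NAMES.get(cp);
                sb.append('&').append(name != null ? name : "#" + cp).append(';');
            }
        });
        return sb.toString();
    }

    /**
     * Spell-checks one already de-entified word. The result is re-entified
     * only if the document uses entities; otherwise the raw Unicode is returned.
     */
    static String check(String unicodeWord, boolean documentUsesEntities,
                        UnaryOperator<String> engine) {
        String corrected = engine.apply(unicodeWord);
        return documentUsesEntities ? entify(corrected) : corrected;
    }
}
```

With an engine that corrects `résume` to `résumé`, `check` returns `r&eacute;sum&eacute;` for an entified document and `résumé` for a plain Unicode one.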

For tables and Java code to strip and insert entities, see http://mindprod.com/products1.html#ENTITIES

If the spell checker does its own parsing, perhaps you could de-entify the entire document and then, when the spell checking and manual edits are done, re-entify it just before saving. A user doing manual edits could type either Unicode or entities, but he would see Unicode, not his original entities. You might even advertise that as a feature that helps proofreading.
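Here is a sketch of the whole-document variant under one stated assumption: only numeric entities for non-ASCII characters are decoded, so markup-significant references such as `&amp;` and `&#60;` are left alone and the HTML still parses after re-entifying. Named entities would need a full table like the one an entities library provides. `DocumentRoundTrip` and its methods are hypothetical names; Java 9 or later is assumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical whole-document round trip: decode before the spell checker
 * parses the text, re-encode just before saving. Only numeric entities for
 * non-ASCII characters are decoded, so markup such as &amp; survives.
 */
final class DocumentRoundTrip {
    private static final Pattern NUMERIC =
            Pattern.compile("&#(?:([0-9]{1,7})|[xX]([0-9A-Fa-f]{1,6}));");

    /** Before checking: turns &#233; or &#xE9; into the character itself. */
    static String deEntify(String doc) {
        Matcher m = NUMERIC.matcher(doc);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            int cp = m.group(1) != null
                    ? Integer.parseInt(m.group(1))
                    : Integer.parseInt(m.group(2), 16);
            // ASCII references (e.g. &#60; for '<') stay encoded to protect the markup.
            String rep = (cp >= 128 && Character.isValidCodePoint(cp))
                    ? new String(Character.toChars(cp))
                    : m.group();
            m.appendReplacement(sb, Matcher.quoteReplacement(rep));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /** Before saving: turns every non-ASCII character back into &#NNN;. */
    static String reEntify(String doc) {
        StringBuilder sb = new StringBuilder(doc.length());
        doc.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }
}
```

Because `reEntify` always writes the numeric form, any accented character the user types as raw Unicode while editing is saved as `&#NNN;`; emitting named entities instead would only need a reverse name table.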