First Look at Beta 5

For topics about current BETA or future releases, including feature requests.
MikeGale
Rank VI - Professional
Posts: 710
Joined: Mon Dec 13, 2004 1:50 pm
Location: Tannhauser Gate


Post by MikeGale » Tue Jun 07, 2005 5:16 pm

After a welcome drop in workload two days ago I've now had a chance to look at the Beta. My observations follow.

My Situation

I simplify my markup and retain backward compatibility for sites that are already built. New sites probably get less backward compatibility.

To get this to work well for me I can either ignore some warnings (when I read the output) or customise the messages. My focus with these checks has been to revisit customising the validation. (In the past I abandoned that because I didn't like customising it all over again for each new version. This time I hope that programexport.xml, source control and diff tools can, together, make it practical.)

Things I noticed

1) The export programs feature is a lifesaver. I don't know all variables available at a point in the code, that would be valuable when modifying the programs. (Sometimes message control is limited to a flag that controls multiple messages, in that case programexport is your way to get finer control.)

2) Is there an alternative to string coding like '+"'"+' in the programs? Named entity encoding like &apos; might be a convenient alternative.

3) I need to enhance some tests for messages. Two examples: I use both id and name, and I use both type and language (in script). I want the message test to be "name is present, id is not present" before generating the message. The message currently suggests I might use only id or id + name, which is redundant.

4) I run the manual spell check and get nothing found. When I run validate a misspelling is found. (In this case I was using a UK English spelling, maybe all dictionaries are not in use during the validate version of spell check?) In other words I get misspelling messages and I can't kill them the obvious way.

5) Where are the accessibility messages programmed? I didn't find them in programexport.xml.

6) I'm getting some accessibility messages repeated. (Identical messages.)

7) With utf-8 encoding I get strange characters at the start of the editing panel. Shouldn't these be hidden?

8) I ran the program over an RSS feed. It gave a lot of red ink. It would be cool to check out RSS feeds.
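
The finer-grained test asked for in item 3 can be sketched in plain Python. This is not the validator's own program language, and the function names are hypothetical; the `language`/`type` rule is an assumed analogue of the `name`/`id` rule described above.

```python
def needs_id_message(attributes):
    """Warn only when `name` is present and `id` is not."""
    # Attribute names are compared case-insensitively, as in HTML.
    names = {key.lower() for key in attributes}
    return "name" in names and "id" not in names


def needs_type_message(attributes):
    """Assumed analogue for script: `language` present, `type` missing."""
    names = {key.lower() for key in attributes}
    return "language" in names and "type" not in names


# Both attributes present: nothing to report.
both = {"name": "top", "id": "top"}
# Only the older attribute present: this is the case worth a message.
name_only = {"name": "top"}
```

With this shape, markup that carries both attributes stays quiet, and only the "old attribute without the new one" case produces a message.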

Albert Wiersch
Site Admin
Posts: 3426
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Post by Albert Wiersch » Wed Jun 08, 2005 4:26 pm

Hi Mike,

Thanks for the feedback and taking a look at the soon-to-be-released BETA!
MikeGale wrote:1) The export programs feature is a lifesaver. I don't know all variables available at a point in the code, that would be valuable when modifying the programs. (Sometimes message control is limited to a flag that controls multiple messages, in that case programexport is your way to get finer control.)
1) Can you be more specific as to what you'd like to see? I have tried to document everything at this page:
http://www.htmlvalidator.com/htmlval/v7 ... ditor.html

I know that it can be confusing when getting that involved with the program configuration.
MikeGale wrote:2) Is there an alternative to string coding like '+"'"+' in the programs? Named entity encoding like &apos; might be a convenient alternative.
2) No, there's not a better way right now. I have also thought about this and would like to address it one of these days. :-) There aren't very many people who even look at the tag name programs, so it's pretty low on my list.
MikeGale wrote:3) I need to enhance some tests for messages. Two examples: I use both id and name, and I use both type and language (in script). I want the message test to be "name is present, id is not present" before generating the message. The message currently suggests I might use only id or id + name, which is redundant.
Can you send me a sample HTML document (to support@htmlvalidator.com) and detail exactly what you'd like CSE to do given the sample HTML document?
MikeGale wrote:4) I run the manual spell check and get nothing found. When I run validate a misspelling is found. (In this case I was using a UK English spelling, maybe all dictionaries are not in use during the validate version of spell check?) In other words I get misspelling messages and I can't kill them the obvious way.
The validator spell check works a little differently, checking spelling in comments and certain attribute values that just doing a spell check in the editor might not find. You may want to uncheck the Ignore markup languages option in the Options->Spelling Options and then do a spell check in the editor. That may make it easier to add the misspelled words because you will be prompted to ignore or add them. Otherwise you can add new dictionary words by using Options->Spelling Options, clicking on the Dictionaries button, and then editing the appropriate dictionary by adding ignore words.
MikeGale wrote:5) Where are the accessibility messages programmed? I didn't find them in programexport.xml.
Some are in programexport.xml but some are in the DLL and are called by runProgram() in the tag name programs. See the documentation for runProgram() to see if a certain accessibility message might be generated via a runProgram() call.
MikeGale wrote:6) I'm getting some accessibility messages repeated. (Identical messages.)
Can you send a sample document that reproduces the problem (to support@htmlvalidator.com)? There may be some similar messages, but they should not be identical -- having the same text, same category, and referring to the same problem location in the document.
MikeGale wrote:7) With utf-8 encoding I get strange characters at the start of the editing panel. Shouldn't these be hidden?
Unfortunately, the editor component currently used does not support UTF-8. I plan to switch to a new editor component in a future version. I suspect that's the cause of the strange characters you see.
Albert Wiersch

MikeGale
Rank VI - Professional
Posts: 710
Joined: Mon Dec 13, 2004 1:50 pm
Location: Tannhauser Gate

Post by MikeGale » Wed Jun 08, 2005 6:17 pm

Thanks very much.

1) Programming programexport. I have had another look. With that page and the code I can do what I want. (I was hoping for a lazy man's way to see what variables are in scope at some point in code.)

3) See email.

5) I was looking for some accessibility messages which (I guess) are inside a runProgram.

6) See email.

7) Characters in editor with utf-8 file. Yep that will be the cause. utf-8 slips in 2 bytes (I think) at the start of a file. (The rest of that file might be identical to conventional encoding.)

Albert Wiersch
Site Admin
Posts: 3426
Joined: Sat Dec 11, 2004 9:23 am
Location: Near Dallas, TX

Post by Albert Wiersch » Thu Jun 09, 2005 7:52 am

Hi Mike,

You're welcome!

1) Sorry, there's no way to see what variables are in scope... but I can say that all variables are global variables so "everything" should be in scope.

3 & 6) I received your emails and will respond via email later in the day.
Albert Wiersch

Justine Linkingstone
Rank 0 - Newcomer
Posts: 1
Joined: Thu Nov 10, 2005 7:14 am

Post by Justine Linkingstone » Tue Nov 15, 2005 7:50 pm

MikeGale wrote:7) Characters in editor with utf-8 file. Yep that will be the cause. utf-8 slips in 2 bytes (I think) at the start of a file. (The rest of that file might be identical to conventional encoding.)
Sorry to bump this, just wanted to elaborate on this topic.

The bytes slipped into the document are called the BOM (byte order mark). In UTF-16 and UTF-32 the BOM determines the endianness of the encoding; in UTF-8 it is three bytes (EF BB BF) and only serves as a signature. It is optional in UTF-8, but its absence may cause problems when you cross platforms :P

The rest of the file will be identical only if you use lower ASCII characters (e.g. from the English codepage). If the document contains higher characters such as umlauts or Cyrillic, these characters will appear garbled as well in the current editor, which lacks UTF support.
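
A small Python sketch of the effect, for anyone who wants to see the bytes. It assumes the editor treats the file as a single-byte encoding (Latin-1 is used here as a stand-in), and `strip_utf8_bom` is a hypothetical helper name. In UTF-8 the BOM is the three bytes EF BB BF.

```python
import codecs

text = "<p>caf\u00e9</p>"  # contains one non-ASCII character (e with acute)

# "utf-8-sig" writes the BOM first; plain "utf-8" does not.
with_bom = text.encode("utf-8-sig")
without_bom = text.encode("utf-8")

# The only difference between the two is the leading signature.
assert with_bom[:3] == codecs.BOM_UTF8 == b"\xef\xbb\xbf"
assert with_bom[3:] == without_bom


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Roughly what an editor without UTF-8 support displays:
garbled = with_bom.decode("latin-1")
print(garbled)  # prints ï»¿<p>cafÃ©</p>
```

The three odd characters at the start are the BOM, and the two characters after "caf" are the two UTF-8 bytes of the accented letter shown one at a time.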

MikeGale
Rank VI - Professional
Posts: 710
Joined: Mon Dec 13, 2004 1:50 pm
Location: Tannhauser Gate

Post by MikeGale » Tue Nov 15, 2005 10:09 pm

Yep Justine it's those BOM.

As you say you need to use the common part of the ASCII set. This tends to mean numeric entities are safer in many cases. Passing the material through different editors and CMS packages gives interesting results!!

I put up a little article about some experiences at:
http://www.decisionz.com/document/CMS/C ... nXHTML.htm
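
As a rough illustration of the numeric entity approach (a sketch in Python, with `to_numeric_entities` as a hypothetical helper name, not something taken from the article): every character outside ASCII is replaced by a numeric character reference, so the file itself stays pure ASCII and survives editors that do not understand UTF-8.

```python
import html


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    # "xmlcharrefreplace" is a standard Python encoding error handler.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


encoded = to_numeric_entities("M\u00fcller")
print(encoded)  # prints M&#252;ller

# A browser (or html.unescape) turns the reference back into the character.
assert html.unescape(encoded) == "M\u00fcller"
```

Plain ASCII text passes through unchanged, so this can be run over a whole document without touching the markup.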
