Anybody using HTML5 parser to repair HTML?

Post by MikeGale

It occurred to me that there might be a fairly simple way to repair HTML, one that could overcome some of the limitations I've noticed with HTML Tidy.

The repaired markup would match what current browsers build when they parse the same page, and it would handle embedded SVG and MathML.
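As a rough illustration of the SVG and MathML point, here is a small Python sketch. It assumes the third-party html5lib package (`pip install html5lib`), which is one implementation of the HTML5 parsing rules; the helper name `foreign_elements` is just for illustration. Passing `namespaceHTMLElements=False` leaves ordinary HTML elements without a namespace, so anything the parser placed in the SVG or MathML namespace stands out.

```python
import html5lib


def foreign_elements(markup: str) -> list:
    """Return the tags of all elements the HTML5 parser put outside the HTML namespace."""
    # namespaceHTMLElements=False: HTML elements get plain tag names, while
    # SVG/MathML elements keep ElementTree's "{namespace}name" form.
    root = html5lib.parse(markup, namespaceHTMLElements=False)
    # Comment nodes have a non-string tag, so check the type first.
    return [el.tag for el in root.iter()
            if isinstance(el.tag, str) and el.tag.startswith("{")]


sample = '<p>Circle: <svg><circle r="5"/></svg> and <math><mi>x</mi></math>'
for tag in foreign_elements(sample):
    print(tag)
```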

The idea is to use an HTML5 parser, one that follows the parsing rules laid down in the HTML5 specification, which are the same rules current browsers implement.

Implementations are available in several languages (Java, among others), and the reference version is written in Python.
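A minimal sketch of the repair step itself, assuming the Python html5lib package (the post doesn't name a specific library, and `repair()` is a hypothetical helper): parse the broken markup, then serialise the tree the parser built.

```python
import html5lib


def repair(markup: str) -> str:
    """Parse markup with the HTML5 parsing rules and serialise the resulting tree."""
    # html5lib.parse builds an xml.etree tree by default; the parser itself
    # inserts implied elements and resolves misnested tags.
    tree = html5lib.parse(markup)
    # omit_optional_tags=False writes out <html>, <head>, <body> and implied
    # end tags explicitly instead of dropping them (the serializer's default).
    return html5lib.serialize(tree, omit_optional_tags=False)


broken = "<title>Test</title><p>Hello <b>bold <i>both</b> italic</i><p>Second"
print(repair(broken))
```

The misnested `<b>`/`<i>` pair comes back the way the HTML5 adoption agency algorithm resolves it (the italic run is split into two `<i>` elements), and the missing `</p>`, `<html>`, `<head>` and `<body>` tags are filled in.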

Has anybody here experimented with that idea?

One route would be to read HTML content from a file or over HTTP, parse it, and serialise the resulting tree back out, either to a file or to a stream in memory.
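The file/HTTP in, file/stream out route could look roughly like this. It's a sketch assuming html5lib plus the Python standard library; `load`, `repair_bytes` and `repair_to` are hypothetical names, and error handling (timeouts, HTTP errors) is left out.

```python
import io
import urllib.request

import html5lib


def load(source: str) -> bytes:
    """Return raw bytes from an http(s) URL or a local file path."""
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as resp:
            return resp.read()
    with open(source, "rb") as fh:
        return fh.read()


def repair_bytes(data: bytes) -> bytes:
    """Parse raw HTML bytes with html5lib and return the tree serialised as UTF-8."""
    # Given bytes, html5lib detects the input encoding (BOM, <meta charset>, ...).
    tree = html5lib.parse(data)
    # With an explicit encoding the serializer returns bytes, and its
    # inject_meta_charset option (on by default) updates <meta charset> to match.
    return html5lib.serialize(tree, encoding="utf-8", omit_optional_tags=False)


def repair_to(source: str, out) -> None:
    """Read source (file path or URL) and write the repaired markup to a binary stream."""
    out.write(repair_bytes(load(source)))


if __name__ == "__main__":
    import sys

    buffer = io.BytesIO()  # in-memory stream; a file opened with "wb" works the same way
    repair_to(sys.argv[1], buffer)
    sys.stdout.write(buffer.getvalue().decode("utf-8"))
```

Run it as `python repair.py page.html` (or with an http(s) URL); passing a file opened in `"wb"` mode to `repair_to` instead of the `BytesIO` buffer writes the result to disk.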

Something like that could work in much the same way as HTML Tidy.

If anybody has tried any part of this, I'd appreciate hearing your observations.

It might also be a good way to pre-process content before checking it with CSE HTML Validator.
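For pre-processing before validation, a batch step might write repaired copies into a separate folder and leave the originals untouched, so the validator can then be pointed at that folder. This is a sketch under the same html5lib assumption; `preprocess_dir`, its arguments and the `*.html` pattern are hypothetical choices.

```python
import pathlib

import html5lib


def repair_bytes(data: bytes) -> bytes:
    """Parse raw HTML bytes with the HTML5 rules and serialise the tree as UTF-8."""
    tree = html5lib.parse(data)
    return html5lib.serialize(tree, encoding="utf-8", omit_optional_tags=False)


def preprocess_dir(src_dir, dst_dir, pattern="*.html") -> list:
    """Write a repaired copy of each matching file in src_dir to dst_dir; return the paths written."""
    src, dst = pathlib.Path(src_dir), pathlib.Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob(pattern)):
        target = dst / path.name  # same file name, different folder
        target.write_bytes(repair_bytes(path.read_bytes()))
        written.append(target)
    return written


if __name__ == "__main__":
    import sys

    for p in preprocess_dir(sys.argv[1], sys.argv[2]):
        print("repaired", p)
```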