Anybody using HTML5 parser to repair HTML?

MikeGale
Rank VI - Professional
Posts: 709
Joined: Mon Dec 13, 2004 1:50 pm
Location: Tannhauser Gate

Post by MikeGale » Tue Apr 24, 2012 12:26 am

It occurred to me that there might be a fairly simple way to repair HTML. It might overcome some limits I've noticed with HTMLTidy.

The repair would match what newer browsers are doing and would include SVG and MathML markup.

That would use an HTML5 parser, which follows the parsing rules laid down in the HTML5 specification and implemented in various browsers.

Implementations are available in several languages (Java, among others). The reference version is written in Python.

Has anybody here experimented with that idea?

One route would be to take HTML content, from a file or over HTTP, and output it serialised, either to a file or to a stream in memory.

Something like that could work in a similar way to HTMLTidy.
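
To make that route concrete, here is a rough sketch of what I have in mind. It is untested against real-world pages and rests on assumptions: it uses the third-party Python package html5lib (its `parse`, `getTreeWalker` and `HTMLSerializer` calls), and the helper names `read_source` and `repair_html` are made up for illustration. Treat it as a starting point, not a finished tool.

```python
from urllib.request import urlopen


def read_source(location):
    """Return raw bytes from an http(s) URL or a local file path (hypothetical helper)."""
    if location.startswith(("http://", "https://")):
        with urlopen(location) as response:
            return response.read()
    with open(location, "rb") as handle:
        return handle.read()


def repair_html(markup):
    """Parse markup with the HTML5 algorithm and serialise the resulting tree.

    Hypothetical helper. html5lib is imported here so that read_source
    stays usable even when the package is not installed.
    """
    import html5lib
    from html5lib.serializer import HTMLSerializer

    # The parser applies the HTML5 error-recovery rules, so unclosed or
    # misnested tags end up as a well-formed tree.
    tree = html5lib.parse(markup, treebuilder="etree")
    walker = html5lib.getTreeWalker("etree")
    # Keep optional tags (</p>, <body>, ...) so the repair is visible in the output.
    serializer = HTMLSerializer(omit_optional_tags=False)
    return serializer.render(walker(tree))
```

Usage would be along the lines of `fixed = repair_html(read_source("page.html"))`, after which `fixed` is a string that can be written to a file or kept in memory for the next tool.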

If anybody has tried parts of that I'd appreciate your observations.

It might be a good complement to CSE: a way to pre-process content before working on it there.
