Like you, I have on occasion needed to process other people's markup.
On big jobs I have found Tidy able to do a lot of the work, but it has limits. It used to conflate several different settings into one option, which made it impossible to get it to do exactly what you want. (Last time I checked, it still worked like that.)
After a rough clean-up with Tidy, I have found that a little work with regular expressions across multiple files can often just about finish the job. It pays to test your regular expressions first and save them in a library for re-use.
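A minimal sketch in Python of the "test first, keep a library" idea, assuming the files are UTF-8 HTML. The three rules, their patterns, and the `Fix`, `check_library`, `clean_text` and `clean_files` names are hypothetical illustrations, not part of Tidy or any other tool. Each rule carries its own sample input and expected output, so the whole library can be checked before it touches real files.

```python
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Fix:
    """One reusable search-and-replace rule plus a sample that proves it works."""
    name: str
    pattern: re.Pattern
    replacement: str
    sample_in: str
    sample_out: str

    def apply(self, text: str) -> str:
        return self.pattern.sub(self.replacement, text)


# Hypothetical example library; add your own tested rules over time.
LIBRARY = [
    Fix("strip <font> tags", re.compile(r"</?font\b[^>]*>", re.IGNORECASE), "",
        '<font face="Arial">hi</font>', "hi"),
    Fix("drop empty paragraphs", re.compile(r"<p>\s*(?:&nbsp;)?\s*</p>\s*", re.IGNORECASE), "",
        "a<p>&nbsp;</p>b", "ab"),
    Fix("<b> to <strong>", re.compile(r"<(/?)b>", re.IGNORECASE), r"<\1strong>",
        "<B>x</B>", "<strong>x</strong>"),
]


def check_library(fixes):
    """Return the names of any fixes whose sample does not give the expected output."""
    return [f.name for f in fixes if f.apply(f.sample_in) != f.sample_out]


def clean_text(text, fixes):
    """Apply every fix to the text, in library order."""
    for f in fixes:
        text = f.apply(text)
    return text


def clean_files(paths, fixes):
    """Rewrite each file in place (work on copies!); refuse to run if any fix fails its test."""
    failed = check_library(fixes)
    if failed:
        raise ValueError(f"broken fixes: {failed}")
    changed = []
    for path in paths:
        original = path.read_text(encoding="utf-8")
        cleaned = clean_text(original, fixes)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed.append(path)
    return changed
```

Usage might look like `clean_files(sorted(Path("site").glob("**/*.html")), LIBRARY)`, where `site` is a hypothetical folder of copies. Order matters: rules run in sequence, so a later rule sees the output of earlier ones, which is one more reason to test the library as a whole.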
For a small job, page-by-page editing, perhaps by hand, can be the most productive approach.
For a big job, setting up a CSE batch increases productivity.