Autocomplete HTML

Discussion about HTML and XHTML related issues.

Post by cyberfriend » Mon Jul 14, 2008 2:37 am

I would like to know if there's a way to "autocomplete" the purified HTML. That is, if the provided HTML has no doctype and no structural tags (html, head, body, etc.), can HTML Purifier add them?

Basically, I'm parsing emails. Not all of them have fully standardized HTML parts, and I need them to, so I thought HTML Purifier might do that automagically for me.

Thanks in advance and keep up the good work ;)

PS: I thought of using this, but it didn't do the trick...

$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
$config->set('HTML', 'TidyLevel', 'heavy');

MikeGale
Rank VI - Professional
Posts: 661
Joined: Mon Dec 13, 2004 2:50 pm
Location: Tannhauser Gate

Post by MikeGale » Mon Jul 14, 2008 4:09 am

By "Purifier" I take it you mean the Tools -> Format and Fix HTML action.

I gave it a quick try with my configuration and found that it does what you are asking. No further setup needed; it just works.

See below what was added:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>

<body>

P.S. I didn't understand what
$config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
$config->set('HTML', 'TidyLevel', 'heavy');
means. That looks like setup for something other than CSE HTML Validator!
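
For anyone who only needs the skeleton shown above (doctype, html, head, title, body) and not the markup repair that HTML Tidy performs, here is a minimal sketch in Python. The function name `wrap_fragment` and the constant are hypothetical, not part of CSE HTML Validator, HTML Tidy, or HTML Purifier, and the sketch assumes the input is either a bare fragment or already a complete document.

```python
import html
import re

# Doctype copied from the "Format and Fix HTML" output quoted above.
XHTML_TRANSITIONAL_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
)


def wrap_fragment(fragment: str, title: str = "") -> str:
    """Wrap an HTML fragment in an XHTML 1.0 Transitional skeleton.

    Hypothetical helper: it only adds the missing wrapper and does not
    fix unclosed or invalid tags inside the fragment.
    """
    # Assumption: anything that already contains an <html> tag is a full document.
    if re.search(r"<html[\s>]", fragment, re.IGNORECASE):
        return fragment
    return (
        f"{XHTML_TRANSITIONAL_DOCTYPE}\n\n"
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        "<head>\n"
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n\n"
        "<body>\n"
        f"{fragment}\n"
        "</body>\n"
        "</html>\n"
    )
```

Calling `wrap_fragment("<p>Hello</p>")` returns a document that starts with the doctype and places the paragraph inside body; input that already has an html tag is returned unchanged.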

Albert Wiersch
Site Admin
Posts: 2860
Joined: Sat Dec 11, 2004 10:23 am
Location: Near Dallas, TX

Post by Albert Wiersch » Mon Jul 14, 2008 8:19 am

I am not sure what you mean by "HTML Purifier" but CSE HTML Validator Std/Pro includes a "Format and Fix HTML" tool that uses HTML Tidy to do similar things to what you are asking. Note that this tool is not in the lite edition, but you can download the HTML Tidy program by itself for free:
http://tidy.sourceforge.net/

As Mike said, the $config->set options must be related to something other than CSE HTML Validator.
Albert Wiersch

ezyang
Rank 0 - Newcomer
Posts: 2
Joined: Tue Jul 15, 2008 9:18 am

Post by ezyang » Tue Jul 15, 2008 10:26 am

Hello,

This is the lead developer of HTML Purifier, the application he is referring to in his post. I suspect this is a spam post, as it is an almost exact repost of a legitimate issue someone else previously had on our forums. The signature links are spammy, as well.

P.S. I would normally assume good faith, but a similar "spammy" re-post was made on our forums under the same username.

Post by MikeGale » Tue Jul 15, 2008 3:43 pm

If you want to punish those who pay for spam, put these entries in your hosts file:

127.0.0.1 www DOT worthingtonmillwork DOT com
127.0.0.1 www DOT manhattanflameproofing DOT com

(With the obvious adjustments, this will prevent access to these sites.)

Only use this technique if you know what you're doing!

If you document the entries (recommended), it might look like:

# Punish spammers

# 2008-07-16: spammed CSE web forum under the name cyberfriend; wasted my time.
127.0.0.1 ...
127.0.0.1 ...
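
A documented block like the one above can be generated so the date and reason are never forgotten. This is a minimal sketch in Python; `hosts_block` is a hypothetical helper, the domains in the usage note are reserved example names, and the sketch only builds the text. It does not touch the real hosts file.

```python
from datetime import date


def hosts_block(domains, reason, when, sink="127.0.0.1"):
    """Build a commented hosts-file block that maps each domain to `sink`.

    Hypothetical helper: it returns text only; pasting it into the
    hosts file is left to the reader.
    """
    # Comment line first: ISO date plus the reason for the block.
    lines = [f"# {when.isoformat()} {reason}"]
    # One "address hostname" entry per domain.
    lines += [f"{sink} {domain}" for domain in domains]
    return "\n".join(lines) + "\n"
```

For example, `hosts_block(["www.example.com", "www.example.net"], "spammed forum", date(2008, 7, 16))` yields a comment line followed by one 127.0.0.1 entry per domain.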

Post by ezyang » Tue Jul 15, 2008 3:45 pm

Hmm... I don't really understand what that would do to the spammers. They're usually after increased PageRank in Google and visibility, and blackholing their domains in your own hosts file doesn't really change much...

Post by MikeGale » Tue Jul 15, 2008 3:53 pm

It stops me from ever visiting the sites of those who paid the spammer.

If it gets into one of the hosts block lists, like the MVP list, it does more widespread damage to the reachability of the domain in question.

When the spam sponsors find out, they are likely to stop paying for spam and drop this little cyberfiend, and to tell others about it.

A lot of people have given up seeking retribution against these cursed humans. Some haven't.

Thanks very much for pointing out what's going on here.

