Real data inside web pages

Posted: Thu Mar 29, 2012 6:30 pm
by MikeGale
I'm interested in the use of web pages for more than eyecandy, text and pictures.

Maybe others here are too.

Seems that some Germans have been looking into it.

http://j.mp/Hp8pWq

They've looked at a lot of pages (from Common Crawl) and extracted the structured data where it exists.

Of the dozen or so formats they looked at:
* hCalendar Microformat (details of events) seems to have been most common
* Followed by XFN (XHTML Friends Network), RDFa and increasingly html-microdata.
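For anyone who wants to poke at their own pages, here is a rough sketch of how you might spot these formats in a chunk of HTML. It is not how the study did it. It only looks for a few telltale attributes using Python's standard library, and the names `FormatSniffer`, `detect_formats` and `XFN_RELS` are made up for this illustration. The marker choices (`itemscope`/`itemtype` for microdata, `typeof`/`property` for RDFa, the `vevent` class for hCalendar, a subset of XFN `rel` values) are my assumptions and will miss plenty of real-world cases.

```python
from html.parser import HTMLParser

# Assumption: a subset of the XFN rel values, enough for a quick check.
XFN_RELS = {"friend", "acquaintance", "contact", "met", "co-worker",
            "colleague", "neighbor", "parent", "child", "sibling",
            "spouse", "kin", "me"}


class FormatSniffer(HTMLParser):
    """Collects the names of embedded-data formats hinted at by start tags."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # boolean attributes such as itemscope map to None
        if "itemscope" in attrs or "itemtype" in attrs:
            self.found.add("html-microdata")
        if "typeof" in attrs or "property" in attrs:
            self.found.add("RDFa")
        classes = (attrs.get("class") or "").split()
        if "vevent" in classes:
            self.found.add("hCalendar")
        rels = set((attrs.get("rel") or "").split())
        if tag == "a" and rels & XFN_RELS:
            self.found.add("XFN")


def detect_formats(html):
    """Return the set of format names whose markers appear in the HTML."""
    sniffer = FormatSniffer()
    sniffer.feed(html)
    return sniffer.found


sample = ('<div itemscope itemtype="http://schema.org/Event">'
          '<a rel="friend met" href="#">Jo</a></div>')
print(sorted(detect_formats(sample)))
```

A real extractor would parse the data itself rather than just flag its presence, so treat this as a quick way to see whether a page carries anything worth a closer look.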

The big story is that RDFa is growing robustly, html-microdata is also growing and everything else is either growing slightly or in a decline. (The full detail is more nuanced than that!)

In the batch (/batch set) they processed 2,565,741,671 URLs, of which 251,855,917 had such data. That's 9.8%. One heck of a lot more than I expected.
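A quick check of that percentage, using only the two counts quoted above (the variable names are just mine):

```python
# Counts as quoted in this post.
total_urls = 2_565_741_671
urls_with_data = 251_855_917

# Share of processed URLs that carried embedded structured data.
share = urls_with_data / total_urls
print(f"{share:.1%}")  # prints 9.8%
```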

Apparently the work cost all of something like USD 600. That is astonishing!

If anybody here is using these things, I suggest checking out what RDFa and html-microdata have going for them.

If anybody has insight into these, I'd appreciate a heads up.