Real data inside web pages

Post by MikeGale » Thu Mar 29, 2012 6:30 pm

I'm interested in the use of web pages for more than eye candy, text and pictures.

Maybe others here are too.

It seems that some Germans have been looking into it.

They've looked at a lot of pages (from Common Crawl) and extracted the embedded data where it exists.

Of the dozen or so formats they looked at:
* hCalendar microformat (details of events) seems to have been the most common
* Followed by XFN (XHTML Friends Network), RDFa and, increasingly, html-microdata.
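
For anyone wondering what spotting these formats in a page involves, here is a rough sketch using only the Python standard library. It is not the method the researchers used, and the names `FormatSniffer` and `sniff_formats` are made up for illustration. It only checks for the usual markers: a `vevent` class for hCalendar, XFN values in `rel` on links, `typeof`/`property` attributes for RDFa, and `itemscope`/`itemprop` for html-microdata.

```python
from html.parser import HTMLParser

# rel values defined by XFN (XHTML Friends Network)
XFN_VALUES = {
    "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
}


class FormatSniffer(HTMLParser):
    """Hypothetical helper: records which embedded-data formats a page hints at."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # valueless attributes (e.g. itemscope) map to None
        classes = (attr.get("class") or "").split()
        rels = set((attr.get("rel") or "").split())
        if "vevent" in classes:
            self.found.add("hCalendar")
        if tag == "a" and XFN_VALUES & rels:
            self.found.add("XFN")
        if "typeof" in attr or "property" in attr:
            self.found.add("RDFa")
        if "itemscope" in attr or "itemprop" in attr:
            self.found.add("microdata")


def sniff_formats(html):
    """Return the set of format names whose markers appear in the HTML string."""
    parser = FormatSniffer()
    parser.feed(html)
    return parser.found
```

This only says that a marker is present. It does not pull out the actual event, contact or property values, which is the harder part of a real extraction.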

The big story is that RDFa is growing robustly, html-microdata is also growing, and everything else is either growing slightly or in decline. (The full detail is more nuanced than that!)

In the batch (or batch set) they processed 2,565,741,671 URLs, of which 251,855,917 had such data. That's 9.8%. One heck of a lot more than I expected.
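
The percentage is easy to check from the two counts quoted above. The variable names here are just for illustration:

```python
# Counts as quoted in the post
total_urls = 2_565_741_671
urls_with_data = 251_855_917

# Share of processed URLs that carried embedded data
share = urls_with_data / total_urls
print(f"{share:.1%}")  # prints 9.8%
```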

Apparently the work cost all of something like USD 600. That is astonishing!

If anybody here is using these things, I suggest checking out what RDFa and html-microdata have going for them.

If anybody has insight into these, I would appreciate a heads-up.
