../

Captcha confirmation

Please fill the box with the word in the image.

Refresh Captcha
Confirm

7 Things You Must Know About Microformats Before You Die

Notice the catchy title? It's called copywriting, and I am not very good at it, but I'm trying. In the last posts I was pretty thorough with my writing, but this time I will try to keep it short and sweet, so you still have time to watch Star Wars Kid and Chocolate Rain on YouTube like the rest of us. So here's a pre-washed, pre-cooked, pre-heated, pre-screened, pre-approved, pre-packaged, post-dated, freeze-dried, double-wrapped, vacuum-packed list of the ten things you must know about Microformats:

1 - You Can Talk to Machines With Microformats

Microformats are used to annotate semantics, so that humans, and especially machines, can parse a page's content and know what it's all about. With regular markup, it's possible to interpret content with natural language processing and other artificial intelligence techniques, or simple hardcoded data scraping, but these techniques obviously fall short, as there is no better way to extract semantics than to have them already specified along with the content in the first place, and this is where Microformats come in.

2 - Microformats are Damn Easy

Microformats are not based on some weird esoteric language, they use pre-existing syntax to specify semantic information, with most formats being represented in HTML by using "class", "id", "title", "rel" and "rev" attributes. So in a way, if you know HTML, you already know everything you need to start using Microformats.

3 - HTML5 is a Microformat Killer (slight exageration)

HTML5 takes semantics into account, as it has added a lot of tags that are meant to be used instead of their generic "div" predecessors (example: "header", "section", "article", "footer"). For example, in the past while creating a blog post entry, one would probably use the "div" tag to wrap the post's contents, whilst with HTML5, the kosher thing to do would be to use the "article" tag instead, giving parsers a greater insight as to the enclosed contents.

As the HTML5 specification matures, it may integrate further markup that may turn some Microformats obsolete. However, Microformats should be here to stay, as no organization can be as fast as a single individual at writing a specification, and no one can stop you from making your own Microformat addressing a new semantic description need you have identified.

4 - Microformats are in the Wild

There are already loads of Microformats available out there. I'm going to show you examples of two stable and widely used Microformats, but first, switch to Firefox, and install the Operator extension. This extension will parse pages for a multitude of Microformats, extract their enclosed contents and provide you with operations you can perform on them. It will give you a superficial example on the benefits of annotating data with semantics.

Now that you're running Firefox with Operator, here are some Microformat examples:

hCard

This one is used to represent contact information. Below you can see my business card, annotated with the hCard Microformat:

Tiago Silva Hive Solutions Personal Web Site Hive Solutions Web Site

If you check out the Operator bar, you will notice that it has detected the above contact information. This was a pretty straightforward feat, as it may look like regular text, but if you look under the hood:

<div class="vcard">
<span class="fn">Tiago Silva</span>
<span class="org">Hive Solutions</span>
<a class="url fn n" href="http://www.tiagosilva.me">Personal Web Site</a>
<a class="url" href="http://www.hive.pt">Hive Solutions Web Site</a>
</div>

This markup allows the parser to know the enclosed content is a contact information, since the "div" has the "vcard" class which indicates that hCard is being used in its contents, guiding the parser on how to interpret the data.

You can use this creator to encapsulate your contact information with the hCard Microformat.

hCalendar

This Microformat is used to represent information about an event. Below you can see the event of me writing this post, annotated using the hCalendar Microformat:

June 9, 2010 6 - 6pm at Hive - Hive Solutions Post
Make a post for the Hive Solutions Blog.

Once again, if you check out the Operator bar, you will notice that it has detected the above event, and if you look at the code, you can see more than meets the eye:

<div class="vevent">
<a href="http://blog.hive.pt" class="url">
<abbr title="2010-06-09T18:0000" class="dtstart">June 9, 2010 6</abbr> -
<abbr title="2010-06-09T18:00" class="dtend">6</abbr>pm at
<span class="location">Hive</span> -
<span class="summary">Hive Solutions Post</span>
</a>
<div class="description">Make a post for the Hive Solutions Blog.</div>
</div>

This markup allows the parser to know the enclosed content is an event, because the "div" has the "vevent" class which indicates that hCalendar is being used in its contents, providing the parser with insight on how to interpret the data.

You can use this creator to encapsulate your events with the hCalendar Microformat.

5 - You're Already Using Microformats

Google is starting to pay attention to Microformats in its crawling endeavours. Nowadays, if you search for thai green mango salad recipe you will get the following result:

Google didn't used any cutting-edge algorithm to figure out that the underlying page was talking about a recipe that could be cooked in 20 minutes, and was reviewed 5 times. If you follow the link and analyze the page's source code, you will notice that it uses the hRecipe and hReview Microformats to annotate these details explicitly.

6 - Lorem ipsum ipsum lorem

'Sup dawg, I heard you like 7 bullet points, so I put a 6th bullet point in your bullet points so you can have 7 bullet points.

7 - The Semantic Web is Here

Straight out of Wikipedia, here's the definition of the Semantic Web for you:

The Semantic Web is an evolving development of the World Wide Web in which the meaning (semantics) of information on the web is defined, making it possible for machines to process it. It derives from World Wide Web Consortium director Sir Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange.

When you talk to an academic about the Semantic Web he/she will probably geek out on elaborated specifications like Resource Description Framework (RDF) being what the Semantic Web is all about, and it probably will be, but for now, keeping it real, HTML5 and Microformats are the tools we have to realistically make the web semantic today.

If you wonder where all of this is moving to, and what's the ultimate vision for the Semantic Web, have a look at WolframAlpha, a computational knowledge search engine where you can today make queries as magical as big mac + coke and know everything there is to know about this combination, thanks to the semantic data sources it uses to cross data and extract meaning out of of. Now imagine if WolframAlpha could use the whole web as its data source... now imagine if something much more intelligent and sophisticated could have access to the whole web as its data source...

Post a comment