Them Internets is Booming!!!

Pack your bags, folks! The world is about to end. The Mayans were off by a year or so, and I don't think even they could have foreseen our end as "death by IP address shortage" at the time. Why the media hasn't yet picked up on this one as the soon-to-be apocalypse is beyond me, but by the end of next year we will be running out of IPv4 addresses to assign to stuff, which is kind of an inconvenience since, following the current trend, even my boxer shorts will be pingable any time now.

If we do live to tell the tale, and IPv6 has indeed saved us all, we will have a gazillion IP addresses to use, apparently more than enough for all the sensing and computing devices we are decorating our planet with. If you think we are in the era of information overload, you ain't seen nothin' yet. We are about to be sh*tstormed with a tidal wave of information the likes of which the world has never seen. Twitter has done a marvelous job of flooding our beloved Internets with arguably valuable data, by dumping what's on everyone's mind, however irrelevant, onto the Web. It won't be long before everything is "tweeting" every single piece of relevant and/or irrelevant information to the Internet, from GPS devices, to cell phones, to toasters.

Web 3.0: Now with 532452% more Data (and peanuts, hopefully...)

We will have a lot of information on our hands and very little we can do with it, and whatever we are able to do with it will take great effort. A paradigm shift is in order, and such is the promise of the Semantic Web: switching the Internet from a worldwide file server to a global database.

The problem with the Web nowadays is that it's fine and dandy for retrieving information, but kind of sucks for retrieving knowledge. For example, I like to travel, but the planning and booking I have to do beforehand is always a huge pain (am I not an ungrateful bastard?). What I want is simple: I want to go to one or more locations, in a certain time range, with the cheapest transportation available and the shortest trip time, and be accommodated in easily accessible spots, preferably near the center, with the best rating for the lowest price. I know what I want, but here's what happens when I type that in Google:

What the hell am I supposed to do with these results? Read them, extract the information, cross reference it, apply my constraints mentally and hope to find a match? No friggin' way, I'm too lazy... especially when I know that this is a feasible query, that it's possible to compute my query and get a decent answer, if only data was described in a standard way. I was a Semantic Web atheist, as I always saw the concept as something invented by pot-smoking hippies from academia to keep their paper flow steady... but in May 2009, Wolfram|Alpha opened my eyes.

Couldn't See the Forest for the Trees...

Wolfram|Alpha is scarily awesome. You can give it the most insane questions, and it will give you a decent answer most of the time. Like, hey, I wonder where the International Space Station is right now:

And the list of cool questions you can ask it just goes on and on:

Wolfram|Alpha accomplishes this by working on structured data sources, instead of flat pages of unstructured data. This way, it knows what the data is, what it means and what it's associated with, and can therefore cross different data sources to extract new knowledge. Imagine if a similar knowledge engine had access to every single piece of data on the Internet in a structured way. A lot of incredible things would be possible, and one of them would be to get an answer to my god damn query. What if travel booking was as easy as asking a question like that and clicking the pay button? I stress that this is more than possible, so there is no reason not to dream that high (or that low...).

Heigh-ho, Heigh-ho, Semantify the Web we Go, Tralalalala...

The question is how to go about making the Web semantic. That's where the going gets tough... In order to annotate data with semantics, W3C proposes its Resource Description Framework (RDF) family of specifications. Basically, you're supposed to use these to annotate your data so that it can be understood by computers. For example, if you had a travel agency website, annotating it in such a way would bring my holy grail backpacking online service one step closer to reality, since its travel plans could be used as a data source. The huge problem is that producing this data is humongously painful, and there's little to gain from it for whoever is doing the annotating. Therefore, from my point of view, this bottom-up approach to the Semantic Web is just a fairy tale acid trip.
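To make "annotating data so computers can understand it" a bit less abstract, here is a rough Python sketch of the subject-predicate-object triples that RDF is built around, plus a toy version of my travel query running over them. Everything in it is an assumption for illustration: the trip identifiers, the predicate names and the prices are made up, and real RDF uses URIs and proper vocabularies instead of plain strings.

```python
# Each fact is a (subject, predicate, object) triple, the basic shape of RDF data.
# Every identifier and value below is invented purely for illustration.
TRIPLES = [
    ("trip:1", "destination", "Porto"),
    ("trip:1", "price_eur", 120),
    ("trip:1", "duration_hours", 3),
    ("trip:2", "destination", "Porto"),
    ("trip:2", "price_eur", 90),
    ("trip:2", "duration_hours", 7),
    ("trip:3", "destination", "Lisbon"),
    ("trip:3", "price_eur", 60),
    ("trip:3", "duration_hours", 2),
]


def describe(subject, triples):
    """Collect every predicate/object pair known about one subject."""
    return {p: o for s, p, o in triples if s == subject}


def cheapest_trip(destination, max_hours, triples):
    """Return the cheapest trip to a destination within a duration limit, or None."""
    candidates = []
    for subject in sorted({s for s, _, _ in triples}):
        facts = describe(subject, triples)
        if facts.get("destination") != destination:
            continue
        if facts.get("duration_hours", float("inf")) > max_hours:
            continue
        candidates.append((facts["price_eur"], subject))
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    print(cheapest_trip("Porto", 8, TRIPLES))
```

The point is not the code itself, but that once the facts are described in a uniform way, applying constraints becomes a job for the machine instead of my lazy brain.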

The solution is clearly top-down: this data has to be at least initially produced by machines, and later fine-tuned by humans. This is the approach used by Freebase, an open database of structured information, which was recently acquired by Google. Freebase initially harvested its data from unstructured data sources such as Wikipedia, and now relies on crowdsourcing for collaborative fine-tuning.

So what's my point? I honestly don't know... I just want to shout out to my homies that the Semantic Web is "for reals". It's not a pipe dream, and it will come about faster than you expect. I'm still not sure in what shape or form, but it will definitely be the one that offers the path of least resistance. I may speculate that it will take the form of a killer Semantic Web application that leaves huge amounts of open structured data in its trail, like a semantic "googlish" search engine incredible enough to break the muscle-memory habit of using Google after every question mark that pops into one's head.

Saint John on Rails

Today is Saint John's day here in Porto, where Hive Solutions is headquartered.

Which means that tonight, thousands of people will hit the streets, in what is probably Europe's liveliest street festival, to bang each other's heads with plastic hammers, eat sardines, and watch fireworks. And I will too, just after I finish this post :). So, on a completely unrelated note, I am going to quickly rant on Ruby on Rails.

To Rail or not to Rail

Rails rocks, or so they say. Until now, I knew very little about Ruby on Rails. I knew what it was capable of, what the framework was meant for, who designed it, its history as well as theirs, their success, and all the associated hype that roams the web (and I also knew a little bit of Ruby). Basically, my paradigm was: "For the scenarios it was built for, Rails rocks".

But now that I have actually implemented some new features for another company's product, which was built on Rails, and had to learn it for real, some of the hype has dissipated, and I was left with reality.

1 - Ruby is just too awesome

Ruby is a very liberal language, in the sense that it has some nice features, like everything being an object, which allow people to create delicious syntax sugar at first glance, and cryptic mumbo-jumbo for whoever has to understand the code afterwards. For example, here's a creative way of creating a loop in Ruby:

5.times { print "This is one of the many ways you can loop five times." }

Cool as it may seem (and it is), if possible, I personally don't like having more than one way to do a thing, since I feel that it makes it just too easy for a team to create a messy codebase, in case they haven't agreed on strict code standards beforehand.

2 - Rails is just too damn helpful

Rails shortcuts a lot. In its quest for DRYness (Don't Repeat Yourself 'ness), it provides a lot of default behaviour (example: default template routing). In the beginning, this gives a developer like myself the pleasantly paranoid frustration of having things work without understanding why. That may be interesting when starting from scratch, but it completely sucks when you're building stuff on top of a codebase that isn't yours and that you haven't fully grasped yet, as Rails' smoke and mirrors make that job harder (this is obviously a very personal rant :P).

3 - Rails is stateless

This is one of the reasons I wouldn't trust Rails if I were going to build something big. The need for state would pop up at one time or another. Just off the top of my head, if I wanted to have a scheduled task, I can't see a way around running a cronjob, and getting out of the development stack truly sucks for many reasons, one of them having to do with portability.

4 - Rails is not modular

This is the ultimate reason why I won't do anything big in Rails, I am addicted to modularity, so I would use Colony instead. Taking all the advantages that runtime modularity brings out of the equation, just the paradigm that Colony would impose on my development, by forcing me to separate my application's concerns gracefully, would pay off so damn much in the long run, that I wouldn't consider any other option.

So what?

My opinion may change, but currently, if I were going to build a small project with very specific requirements, very fast, I would probably use Rails, and if I were doing something big, I would use Colony. Anyway, I don't see a reason why Colony won't be able to beat Rails in this regard as well in the near future :).

7 Things You Must Know About Microformats Before You Die

Notice the catchy title? It's called copywriting, and I am not very good at it, but I'm trying. In the last posts I was pretty thorough with my writing, but this time I will try to keep it short and sweet, so you still have time to watch Star Wars Kid and Chocolate Rain on YouTube like the rest of us. So here's a pre-washed, pre-cooked, pre-heated, pre-screened, pre-approved, pre-packaged, post-dated, freeze-dried, double-wrapped, vacuum-packed list of the seven things you must know about Microformats:

1 - You Can Talk to Machines With Microformats

Microformats are used to annotate semantics, so that humans, and especially machines, can parse a page's content and know what it's all about. With regular markup, it's possible to interpret content with natural language processing and other artificial intelligence techniques, or simple hardcoded data scraping, but these techniques obviously fall short, as there is no better way to extract semantics than to have them already specified along with the content in the first place, and this is where Microformats come in.

2 - Microformats are Damn Easy

Microformats are not based on some weird esoteric language. They use pre-existing syntax to specify semantic information, with most formats being represented in HTML by using the "class", "id", "title", "rel" and "rev" attributes. So in a way, if you know HTML, you already know everything you need to start using Microformats.

3 - HTML5 is a Microformat Killer (slight exaggeration)

HTML5 takes semantics into account, as it has added a lot of tags that are meant to be used instead of their generic "div" predecessors (example: "header", "section", "article", "footer"). For example, in the past while creating a blog post entry, one would probably use the "div" tag to wrap the post's contents, whilst with HTML5, the kosher thing to do would be to use the "article" tag instead, giving parsers a greater insight as to the enclosed contents.

As the HTML5 specification matures, it may integrate further markup that may render some Microformats obsolete. However, Microformats should be here to stay, as no organization can be as fast as a single individual at writing a specification, and no one can stop you from making your own Microformat addressing a new semantic description need you have identified.
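To give a feel for what "greater insight" means for a parser, here is a minimal Python sketch that pulls a post's text out of a page just by looking for the "article" tag, using the standard library's html.parser module. The page content, the ArticleText class and the article_text helper are made up for this example, and the sketch does nothing clever about malformed markup.

```python
from html.parser import HTMLParser

# A made-up page: only the <article> element holds the actual post.
PAGE = """
<header>My Blog</header>
<article><h1>Post title</h1><p>Post body.</p></article>
<footer>Copyright notice</footer>
"""


class ArticleText(HTMLParser):
    """Collects the text found inside <article> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <article> elements we are currently inside
        self.chunks = []  # pieces of text seen while inside an article

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def article_text(html):
    """Return the text of every <article> in the page, joined by spaces."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    print(article_text(PAGE))
```

With a generic "div" wrapper, the same parser would have to guess which class name or id each site happened to pick for its posts.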

4 - Microformats are in the Wild

There are already loads of Microformats available out there. I'm going to show you examples of two stable and widely used Microformats, but first, switch to Firefox, and install the Operator extension. This extension will parse pages for a multitude of Microformats, extract their enclosed contents and provide you with operations you can perform on them. It will give you a superficial example of the benefits of annotating data with semantics.

Now that you're running Firefox with Operator, here are some Microformat examples:

hCard

This one is used to represent contact information. Below you can see my business card, annotated with the hCard Microformat:

Tiago Silva Hive Solutions Personal Web Site Hive Solutions Web Site

If you check out the Operator bar, you will notice that it has detected the above contact information. This was a pretty straightforward feat, as it may look like regular text, but if you look under the hood:

<div class="vcard">
  <span class="fn">Tiago Silva</span>
  <span class="org">Hive Solutions</span>
  <a class="url" href="http://www.tiagosilva.me">Personal Web Site</a>
  <a class="url" href="http://www.hive.pt">Hive Solutions Web Site</a>
</div>

This markup allows the parser to know the enclosed content is contact information, since the "div" has the "vcard" class, which indicates that hCard is being used in its contents, guiding the parser on how to interpret the data.
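If you're curious what a parser does with those class names, here is a minimal sketch in Python, using only the standard library's html.parser module. It is not how Operator works internally (I have no idea how that is implemented), and the HCardParser class and parse_hcard helper are names invented for this example. The embedded markup is a lightly simplified copy of the card above in which the links carry only the "url" class, and the sketch assumes a single card with no nested cards.

```python
from html.parser import HTMLParser

# Simplified copy of the business card markup (an assumption of this sketch).
HCARD = """
<div class="vcard">
<span class="fn">Tiago Silva</span>
<span class="org">Hive Solutions</span>
<a class="url" href="http://www.tiagosilva.me">Personal Web Site</a>
<a class="url" href="http://www.hive.pt">Hive Solutions Web Site</a>
</div>
"""


class HCardParser(HTMLParser):
    """Reads the fn, org and url properties found after a "vcard" root element."""

    TEXT_PROPERTIES = ("fn", "org")

    def __init__(self):
        super().__init__()
        self.card = {"fn": None, "org": None, "url": []}
        self.in_vcard = False
        self.current = None  # text property whose value we expect next, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "vcard" in classes:
            self.in_vcard = True
            return
        if not self.in_vcard:
            return
        # For links, the value of "url" is the href, not the visible text.
        if "url" in classes and attrs.get("href"):
            self.card["url"].append(attrs["href"])
        for name in self.TEXT_PROPERTIES:
            if name in classes:
                self.current = name

    def handle_data(self, data):
        # The first non-blank text after a text property's tag is its value.
        if self.current and data.strip():
            self.card[self.current] = data.strip()
            self.current = None


def parse_hcard(html):
    """Return a dict with the fn, org and url values of the first card found."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.card


if __name__ == "__main__":
    print(parse_hcard(HCARD))
```

Notice there is no guessing involved: the class names say what each piece of text is, so the parser just reads them off.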

You can use this creator to encapsulate your contact information with the hCard Microformat.

hCalendar

This Microformat is used to represent information about an event. Below you can see the event of me writing this post, annotated using the hCalendar Microformat:

June 9, 2010 6 - 6pm at Hive - Hive Solutions Post
Make a post for the Hive Solutions Blog.

Once again, if you check out the Operator bar, you will notice that it has detected the above event, and if you look at the code, you can see more than meets the eye:

<div class="vevent">
  <a href="http://blog.hive.pt" class="url">
    <abbr title="2010-06-09T18:00" class="dtstart">June 9, 2010 6</abbr> -
    <abbr title="2010-06-09T18:00" class="dtend">6</abbr>pm at
    <span class="location">Hive</span> -
    <span class="summary">Hive Solutions Post</span>
  </a>
  <div class="description">Make a post for the Hive Solutions Blog.</div>
</div>

This markup allows the parser to know the enclosed content is an event, because the "div" has the "vevent" class which indicates that hCalendar is being used in its contents, providing the parser with insight on how to interpret the data.
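The interesting trick in this one is the "abbr" element: humans read the friendly text, while machines read the date stored in the "title" attribute. Here is a minimal Python sketch of that idea, again with the standard library's html.parser. The EventDates class and event_dates helper are names made up for this example, the embedded markup is a trimmed copy of the event above, and the sketch assumes both "title" values use the plain YYYY-MM-DDTHH:MM form with no timezone.

```python
from datetime import datetime
from html.parser import HTMLParser

# Trimmed copy of the event markup (an assumption of this sketch).
VEVENT = """
<div class="vevent">
<abbr title="2010-06-09T18:00" class="dtstart">June 9, 2010 6</abbr> -
<abbr title="2010-06-09T18:00" class="dtend">6</abbr>pm at
<span class="location">Hive</span> -
<span class="summary">Hive Solutions Post</span>
</div>
"""


class EventDates(HTMLParser):
    """Reads dtstart and dtend from the title attribute of <abbr> elements."""

    def __init__(self):
        super().__init__()
        self.dates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "abbr":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        for name in ("dtstart", "dtend"):
            if name in classes and attrs.get("title"):
                # The machine-readable value lives in title, not in the text.
                self.dates[name] = datetime.strptime(attrs["title"], "%Y-%m-%dT%H:%M")


def event_dates(html):
    """Return a dict mapping "dtstart" and "dtend" to datetime objects."""
    parser = EventDates()
    parser.feed(html)
    parser.close()
    return parser.dates


if __name__ == "__main__":
    print(event_dates(VEVENT))
```

A real hCalendar parser has to cope with timezones and the other date forms the format allows, which this sketch happily ignores.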

You can use this creator to encapsulate your events with the hCalendar Microformat.

5 - You're Already Using Microformats

Google is starting to pay attention to Microformats in its crawling endeavours. Nowadays, if you search for thai green mango salad recipe you will get the following result:

Google didn't use any cutting-edge algorithm to figure out that the underlying page was talking about a recipe that could be cooked in 20 minutes, and was reviewed 5 times. If you follow the link and analyze the page's source code, you will notice that it uses the hRecipe and hReview Microformats to annotate these details explicitly.

6 - Lorem ipsum ipsum lorem

'Sup dawg, I heard you like 7 bullet points, so I put a 6th bullet point in your bullet points so you can have 7 bullet points.

7 - The Semantic Web is Here

Straight out of Wikipedia, here's the definition of the Semantic Web for you:

The Semantic Web is an evolving development of the World Wide Web in which the meaning (semantics) of information on the web is defined, making it possible for machines to process it. It derives from World Wide Web Consortium director Sir Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange.

When you talk to an academic about the Semantic Web, he/she will probably geek out on elaborate specifications like Resource Description Framework (RDF) being what the Semantic Web is all about, and it probably will be, but for now, keeping it real, HTML5 and Microformats are the tools we have to realistically make the web semantic today.

If you wonder where all of this is heading, and what the ultimate vision for the Semantic Web is, have a look at WolframAlpha, a computational knowledge search engine where you can already make queries as magical as big mac + coke and know everything there is to know about this combination, thanks to the semantic data sources it uses to cross data and extract meaning out of them. Now imagine if WolframAlpha could use the whole web as its data source... now imagine if something much more intelligent and sophisticated could have access to the whole web as its data source...
