Hive Solutions - The diary

Pack your bags, folks! The world is about to end. The Mayans were off by a year or so, and I don't think even they could have foreseen "death by IP address shortage" as our end. Why the media hasn't yet picked up on this as the soon-to-be apocalypse is beyond me, but by the end of next year we will run out of IP addresses to assign to stuff. That's kind of an inconvenience, since at the current rate even my boxer shorts will be pingable soon.

If we do live to tell the tale, and IPv6 has indeed saved us all, we will have a gazillion IP addresses to use, apparently more than enough for all the sensing and computing devices we are decorating our planet with. If you think we are in the era of information overload, you ain't seen nothin' yet. We are about to be sh*tstormed with a tidal wave of information the likes of which the world has never seen. Twitter has done a marvelous job of flooding our beloved Internets with arguably valuable data by dumping what's on everyone's mind, however irrelevant, onto the Web. It won't be long before everything, from GPS devices to cell phones to toasters, is "tweeting" every single piece of relevant and/or irrelevant information to the Internet.

Web 3.0: Now with 532452% more Data (and peanuts, hopefully...)

We will have a lot of information on our hands and very little we can do with it, and whatever we can do will take great effort. A paradigm shift is in order, and that is the promise of the Semantic Web: turning the Internet from a worldwide file server into a global database.

The problem with the Web nowadays is that it's fine and dandy for retrieving information, but kind of sucks for retrieving knowledge. For example, I like to travel, but the planning and booking I have to do beforehand is always a huge pain (am I not an ungrateful bastard?). What I want is simple: I want to go to one or more locations within a certain time range, with the cheapest available transportation and the shortest trip time, and be accommodated in easily accessible spots, preferably near the center, with the best rating for the lowest price. I know what I want, but here's what happens when I type that into Google:

What the hell am I supposed to do with these results? Read them, extract the information, cross-reference it, apply my constraints mentally and hope to find a match? No friggin' way, I'm too lazy... especially when I know this is a feasible query: a computer could process it and return a decent answer, if only the data were described in a standard way. I used to be a Semantic Web atheist; I always saw the concept as something invented by pot-smoking hippies from academia to keep their paper flow steady... but in May 2009, Wolfram|Alpha opened my eyes.
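To make the "feasible query" claim concrete, here is a minimal Python sketch of what my travel query looks like once the data is structured. Everything in it is hypothetical: the `Trip` and `Stay` records, their fields, units and rating scale, and the `best_plan` function are made up for illustration. In real life the records would have to come from many different annotated sources.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical structured records; field names, units and scales are assumptions.
@dataclass(frozen=True)
class Trip:
    destination: str
    departure: date
    price: float   # transportation cost in EUR
    hours: float   # total trip time


@dataclass(frozen=True)
class Stay:
    city: str
    name: str
    km_from_center: float
    rating: float  # 0 to 5
    price: float   # EUR per night


def best_plan(trips, stays, destinations, start, end, max_km=2.0):
    """For each destination, pick the best trip and stay that satisfy the constraints.

    Returns a dict mapping each destination to a (Trip, Stay) pair,
    or to None when no combination satisfies the constraints.
    """
    plan = {}
    for city in destinations:
        # Hard constraints: right destination, inside the date range, near the center.
        options = [t for t in trips if t.destination == city and start <= t.departure <= end]
        rooms = [s for s in stays if s.city == city and s.km_from_center <= max_km]
        if not options or not rooms:
            plan[city] = None
            continue
        # Soft preferences: cheapest trip first, shortest trip time breaks ties...
        trip = min(options, key=lambda t: (t.price, t.hours))
        # ...and highest rating first, lowest price breaks ties.
        stay = min(rooms, key=lambda s: (-s.rating, s.price))
        plan[city] = (trip, stay)
    return plan


trips = [
    Trip("Lisbon", date(2011, 5, 3), 89.0, 2.5),
    Trip("Lisbon", date(2011, 5, 4), 89.0, 2.0),
    Trip("Lisbon", date(2011, 6, 1), 40.0, 2.0),  # cheaper, but outside the date range
]
stays = [
    Stay("Lisbon", "Hostel A", 0.5, 4.5, 25.0),
    Stay("Lisbon", "Hotel B", 0.8, 4.5, 60.0),
    Stay("Lisbon", "Hotel C", 6.0, 5.0, 30.0),  # best rated, but too far from the center
]
plan = best_plan(trips, stays, ["Lisbon"], date(2011, 5, 1), date(2011, 5, 31))
trip, stay = plan["Lisbon"]
print(trip.departure, stay.name)  # 2011-05-04 Hostel A
```

I ranked trips by price first and trip time second, and stays by rating first and price second. A real engine would probably let you weigh those trade-offs yourself, but the point stands: once the data has a known shape, my constraints become a few lines of filtering instead of an afternoon of reading.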

Couldn't See the Forest for the Trees...

Wolfram|Alpha is scarily awesome: you can give it the most insane questions, and most of the time it will give you a decent answer. Like, hey, I wonder where the International Space Station is right now:

And the list of cool questions you can ask it just goes on and on:

Wolfram|Alpha accomplishes this by working on structured data sources instead of flat pages of unstructured data. This way, it knows what the data is, what it means and what it's associated with, and can therefore cross-reference different data sources to extract new knowledge. Imagine if a similar knowledge engine had access to every single piece of data on the Internet in a structured way. A lot of incredible things would be possible, and one of them would be getting an answer to my goddamn query. What if travel booking were as easy as asking a question like that and clicking the pay button? I stress that this is entirely possible, so there is no reason not to dream that high (or that low...).

Heigh-ho, Heigh-ho, Semantify the Web we Go, Tralalalala...

The question is how to go about making the Web semantic, and that's where the going gets tough... To annotate data with semantics, the W3C proposes its Resource Description Framework (RDF) family of specifications. Basically, you're supposed to use these to annotate your data so that computers can understand it. For example, if you ran a travel agency website, annotating it this way would bring my holy grail backpacking service one step closer to reality, since its travel plans could be used as a data source. The huge problem is that producing this data is humongously painful, and whoever is annotating the data has little to gain from it. Therefore, as I see it, this bottom-up approach to the Semantic Web is just a fairy tale acid trip.
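For the curious, RDF's core data model boils down to statements made of three parts: subject, predicate and object. The sketch below is a toy, plain-Python model of that idea. It is not RDF syntax and not a real RDF library (rdflib and friends exist for that), and the `http://example.org/travel#` vocabulary and the `match`/`stays_for_trip` helpers are hypothetical. What it shows is why a shared identifier for a city lets a machine join a travel agency's trips with a hotel's listing.

```python
# Toy model of RDF's data model: every statement is a (subject, predicate, object)
# triple. The vocabulary under this namespace is hypothetical.
EX = "http://example.org/travel#"

# Data published by a (hypothetical) travel agency...
agency_triples = {
    (EX + "trip42", EX + "destination", EX + "Lisbon"),
    (EX + "trip42", EX + "priceEUR", 89),
    (EX + "trip42", EX + "departs", "2011-05-04"),
}
# ...and by a (hypothetical) hostel, on a completely different site.
hotel_triples = {
    (EX + "hostelA", EX + "locatedIn", EX + "Lisbon"),
    (EX + "hostelA", EX + "rating", 4.5),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


def stays_for_trip(graph, trip):
    """Join the two sources: find places located in the trip's destination."""
    cities = [o for (_, _, o) in match(graph, s=trip, p=EX + "destination")]
    return [s for city in cities for (s, _, _) in match(graph, p=EX + "locatedIn", o=city)]


# Both sources use the same identifier for Lisbon, so merging them is a set union.
graph = agency_triples | hotel_triples
print(stays_for_trip(graph, EX + "trip42"))  # ['http://example.org/travel#hostelA']
```

The join only works because both publishers agreed on the same name for Lisbon. That agreement is exactly the annotation work I'm complaining about above.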

The solution is clearly top-down: this data has to be produced by machines, at least initially, and later fine-tuned by humans. This is the approach used by Freebase, an open database of structured information that Google recently acquired when it bought Metaweb, the company behind it. Freebase initially harvested its data from unstructured sources such as Wikipedia, and now relies on crowdsourcing for collaborative fine-tuning.
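I won't pretend to know how Freebase actually did its harvesting. The sketch below is only a toy illustration of the top-down idea, assuming Wikipedia-style infobox markup as input. A regular expression pulls `key = value` lines out of a hypothetical infobox fragment and turns them into (subject, predicate, object) triples that humans could later correct.

```python
import re

# A hypothetical fragment of Wikipedia-style infobox markup (wikitext).
WIKITEXT = """{{Infobox settlement
| name       = Lisbon
| country    = Portugal
}}"""

# One infobox field per line: "| key = value" (MULTILINE makes ^ and $ work per line).
FIELD = re.compile(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", re.MULTILINE)


def infobox_to_triples(wikitext, subject):
    """Machine-produced first draft: turn infobox fields into (subject, key, value) triples."""
    return [(subject, key, value) for key, value in FIELD.findall(wikitext)]


triples = infobox_to_triples(WIKITEXT, "Lisbon")
print(triples)  # [('Lisbon', 'name', 'Lisbon'), ('Lisbon', 'country', 'Portugal')]
```

The extraction is crude on purpose. Real markup is far messier, which is why the human fine-tuning step matters.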

So what's my point? I honestly don't know... I just want to give a shout-out to my homies that the Semantic Web is "for reals". It's not a pipe dream, and it will come about faster than you expect. I'm still not sure in what shape or form, but it will definitely be whichever one offers the path of least resistance. My guess is a killer Semantic Web application that leaves huge amounts of open structured data in its trail: a semantic, Google-like search engine incredible enough to break the muscle-memory habit of turning to Google after every question mark that pops into one's head.