Big Data, Big Problems: The Trouble With Storage Overload

We collect an astonishing amount of digital information. But as The Economist recently pointed out, we've long since surpassed our ability to store it all. Big data is here, and it's causing big problems.

Walmart's transaction databases are a whopping 2.5 petabytes. There are more than 40 billion photos hosted by Facebook alone. When there's this much data floating around, it becomes nearly impossible to sort and analyze. And it's only expanding faster: the amount of digital information increases tenfold every five years.
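To put "tenfold every five years" in more familiar terms, here's a quick back-of-the-envelope sketch. It assumes the growth is smooth and compounds annually, which the article doesn't claim; the function names are made up for illustration.

```python
import math

# Figures taken from the article: data grows tenfold every five years.
GROWTH_FACTOR = 10.0
PERIOD_YEARS = 5.0


def annual_growth_rate(factor: float, years: float) -> float:
    """Yearly growth rate implied by `factor` growth over `years` (assumes steady compounding)."""
    return factor ** (1.0 / years) - 1.0


def doubling_time_years(factor: float, years: float) -> float:
    """Years needed for the total to double at that same steady rate."""
    return years * math.log(2) / math.log(factor)


rate = annual_growth_rate(GROWTH_FACTOR, PERIOD_YEARS)
doubling = doubling_time_years(GROWTH_FACTOR, PERIOD_YEARS)

print(f"Implied annual growth: {rate:.1%}")
print(f"Implied doubling time: {doubling:.2f} years")
```

Under that assumption, the pile of data grows by a bit more than half every year and doubles roughly every year and a half.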

We're also running out of space. The Economist reports that the amount of information created will be more than double the available storage by 2011.

And the data we can store becomes more and more difficult to sort for future generations of researchers and businesses.

This may not seem like such a huge deal, but take a recent, practical example. To produce the definitive word on the Lehman Brothers bankruptcy, court-appointed examiner Anton R. Valukas had to sift through 350 billion pages of electronic documents. That's three quadrillion bytes of data. So how'd he look through all that information?

Simple. He didn't. Instead, loose search parameters were used to cut the number of emails and documents roughly in half, then teams of lawyers pared down what was left to a "manageable" 34 million pages. Valukas's final report was an expansive 2,200 pages long, but there's no way he was able to process all of the relevant documents, or tell the whole story.

If there's hope to be found, it's in metadata. Much like library card catalogs kept you from having to read every book, Google organizes your search results and Flickr your photos. Even the tags on Gizmodo make it more manageable to find relevant content. But while metadata gives things searchable labels, the fact that it's often crowd-sourced means that those labels are at best inconsistent and at worst incomprehensible.
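To see why inconsistent crowd-sourced labels hurt, and how a little cleanup helps, here's a minimal sketch. The tags and the alias table are invented for illustration; this is not how Flickr, Google, or Gizmodo actually handle tags.

```python
from collections import Counter

# Hypothetical alias table mapping common variants to one canonical tag.
ALIASES = {
    "bigdata": "big data",
    "big-data": "big data",
    "nyc": "new york",
}


def normalize_tag(tag: str) -> str:
    """Lowercase, trim, and collapse whitespace, then apply any known alias."""
    cleaned = " ".join(tag.strip().lower().split())
    return ALIASES.get(cleaned, cleaned)


def count_tags(raw_tags: list[str]) -> Counter:
    """Count tags after normalization, skipping blank entries."""
    return Counter(normalize_tag(t) for t in raw_tags if t.strip())


# Made-up tags, as different users might type them.
raw = ["Big Data", "bigdata", " big-data ", "Storage", "storage", "NYC"]
counts = count_tags(raw)

print(f"{len(set(raw))} distinct raw tags, {len(counts)} after cleanup")
for tag, n in counts.most_common():
    print(f"{tag}: {n}")
```

Six different-looking labels collapse into three once case, spacing, and a few known variants are reconciled. The catch is that somebody still has to build and maintain the alias table, which is exactly the hard part at scale.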

We have a more thorough record of our lives and the world around us now than we ever have before. We can map the human genome in a week, for goodness' sake. All of that's something to be thankful for. We should be leaving behind as much of a record of our existence as possible. But we should also figure out how to manage it, and present it, before big data balloons totally out of our control. [Economist]