software = science + art + people
2013-04-09
“Big Data” is another one of those buzzwords that seem to be everywhere these days. We hear stories regularly about how fast the world’s data grows and how big it’s going to be by 20xx. Vendors then reason that we should buy their wares to cope. This infographic is typical:
I have several deep professional connections to big data[1], going back decades, so when I say I think a lot of it is manufactured silliness, I’m hoping you’ll pause before laughing me off.
The fact is, most of the “data” that’s exploding is not hard-won intellectual treasure for the ages; it’s marginal stuff like the viewing history on Fred Flintstone’s deleted Netflix account. More than big data, we’re experiencing a “big crud” wave, because we’re pack rats. This comic has it right:
I’m not claiming that all big data is worthless; some amazing things become possible at the scale of billions of records. For Netflix, maybe Fred Flintstone’s viewing history is valuable. Maybe. However, big data is only an asset if we can derive some value from it. And an awful lot of big data doesn’t pass that smell test, either because our tools are inadequate, or because the data becomes stale, or because it wasn’t particularly interesting data to start with.
The value we want to derive is insight.
If you’re willing to be serious about the big data wave, then find the best-of-breed tools that push what’s possible. I recommend capturing value from big data while it’s in flight, and not storing it at all.
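To make "in flight" concrete, here is a minimal sketch of the idea in Python, not a recommendation of any particular tool. It keeps a running count, mean, and variance (Welford's method) as values stream past, and never stores the values themselves. The names `RunningStats` and `summarize`, and the made-up "minutes watched" stream, are my own illustrations.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Summary of a stream, updated one value at a time (Welford's method)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        # Fold one value into the summary; the value itself is not kept.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Population variance of everything seen so far.
        return self.m2 / self.count if self.count else 0.0


def summarize(stream: Iterable[float]) -> RunningStats:
    """Consume a stream once and return only its summary."""
    stats = RunningStats()
    for value in stream:
        stats.update(value)
    return stats


if __name__ == "__main__":
    # Hypothetical stream: a generator, so no record is ever held in a list.
    minutes_watched = (float(n % 90) for n in range(100_000))
    stats = summarize(minutes_watched)
    print(stats.count, round(stats.mean, 2), round(stats.variance, 2))
```

Memory use stays constant no matter how long the stream runs, which is the whole point: the insight (here, an average and a spread) survives, and the raw records never need a home.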
If you don’t want to surf the wave, then I have a relatively easy[2] solution. It’s called the delete button. Go watch an episode of “Hoarders” and tell me I’m wrong. :-)
Comments
Jesse Harris, 2013-04-09:
Data that isn't valuable today could be critical tomorrow, and getting rid of it is irreversible. The very nature of data forces us to become digital packrats, accumulating and maintaining bits (pun intended) of cruft for what seems like an incomprehensible period of time. With storage getting cheaper and cheaper, there's not much disincentive to do so. I was really disappointed that Microsoft backed off of its ambitious WinFS project. It would have helped home users tame some of this ever-increasing data.