Venkatesh Rao on Big Data, Machine Learning, and Blockchains

Venkatesh Rao had a good take on the big data/machine learning/blockchain mania in Breaking Smart a few weeks ago:

Many people, database experts among them, dismiss Big Data as a fad that’s already come and gone, and argue that it was a meaningless term, and that relational databases can do everything NoSQL databases can. That’s not the point!¬†The point of Big Data, pointed out by George Dyson, is that computing undergoes a fundamental phase shift when it crosses the Big Data threshold: when it is cheaper to store data than to decide what to do with it. The point of Big Data technologies is not to perversely use less powerful database paradigms, but to defer decision-making about data — how to model, structure, process, and analyze it — to when (and if) you need to, using the simplest storage technology that will do the job.A organization that chooses to store all its raw data, developing an eidetic corporate historical memory so to speak, creates informational potential and invests in its own future wisdom.

Next, there is machine learning. Here the connection is obvious. The more you have access to massive amounts of stored data, the more you can apply deep learning techniques to it (they really only work at sufficiently massive data scales) to extract more of the possible value represented by the information. I’m not quite sure what a literal Maxwell’s Historian might do with its history of stored molecule velocities, but I can think of plenty of ways to use more practical historical data.

And finally, there are blockchains. Again, database curmudgeons (what is it about these guys??) complain that distributed databases can do everything blockchains can, more cheaply, and that blockchains are just really awful, low-capacity, expensive distributed databases (pro-tip, anytime a curmudgeon makes an “X is just Y” statement, you should assume by default that the(X-Y) differences they are ignoring are the whole point of X). As with Big Data, they are missing the point. The essential feature of blockchains is not that they can poorly and expensively mimic the capabilities of distributed databases, but do so in a near-trustless decentralized¬†way, with strong irreversibility and immutability properties.