Andrew C. Oliver
Contributing Writer

Big data is dead; long live big data

analysis
Apr 9, 2015 | 3 mins
Data Management, Hadoop, Robotics

Soon, we'll see 'prepacked' applications that incorporate the distributed processing, machine learning, and analytics of today's overhyped, custom-made solutions

Credit: Damien du Toit

For the last few years we've talked endlessly about big data, led by Hadoop and now Spark. The next round of hype is all about applying machine learning to big data, but that's merely a way to sell AI and analytics to people without using those dirty words.

In truth, the big data era is rapidly coming to a close. You've probably seen media reports of big data pullbacks, which, I suppose, puts us in the trough of disillusionment in Gartner's famous hype cycle.

Now is the point where big data "ends" and actual application of the technology begins.

For the industry, this means there will be fewer "let's roll out the platform and see what happens" projects. Decision makers are going to take a more rational approach, as they should, and start with a business problem first. This means even the platform companies are talking about "solutions."

Standard solutions for actual problems

The next big step is analyzing problems, finding patterns, and creating packaged solutions to them.

We already see this in the finance industry, where the latest generation of distributed fraud detection packages comes wrapped up and ready to go. Fraud detection software isn't new, but distributing it at Hadoop or cloud scale is pretty fresh. Finance moves faster than ever, and so does fraud. For years there has been a missile gap, and the industry was losing. Now it's fighting back, and Hadoop, Spark, and other modern tools are the firepower behind a new arsenal.
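To make the idea concrete, here is a minimal, single-machine sketch of the kind of anomaly check such packages run at far larger scale: flag any charge that sits well outside a customer's other spending. The transaction schema, the leave-one-out z-score rule, and the threshold are illustrative assumptions, not any vendor's method; in production this logic would be partitioned by customer and run across a Hadoop or Spark cluster.

```python
from statistics import mean, stdev

def flag_anomalies(transactions, threshold=3.0):
    """Flag transactions whose amount is more than `threshold` standard
    deviations above the customer's *other* transactions (leave-one-out,
    so a huge outlier can't inflate its own baseline)."""
    by_customer = {}
    for t in transactions:
        by_customer.setdefault(t["customer"], []).append(t)

    flagged = []
    for txns in by_customer.values():
        for t in txns:
            others = [o["amount"] for o in txns if o is not t]
            if len(others) < 2:
                continue  # not enough history to judge
            mu, sigma = mean(others), stdev(others)
            if sigma > 0 and (t["amount"] - mu) / sigma > threshold:
                flagged.append(t)
    return flagged

txns = [
    {"customer": "a", "amount": 20.0},
    {"customer": "a", "amount": 25.0},
    {"customer": "a", "amount": 22.0},
    {"customer": "a", "amount": 5000.0},  # wildly out of pattern
    {"customer": "b", "amount": 100.0},
    {"customer": "b", "amount": 110.0},
    {"customer": "b", "amount": 90.0},
]
print(flag_anomalies(txns))  # flags only customer a's 5000.0 charge
```

The interesting engineering is not the rule itself but the grouping step: partitioning by customer is exactly the shape of work that MapReduce and Spark distribute naturally.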

Custom-built solutions using the next wave of technology aren't enough. Fraud detection for credit cards isn't that different from fraud detection for invoicing, insurance, or other common business applications. The next big wave isn't writing superspecialized apps for very specific industries, but identifying the "distributed big data patterns" that solve common problems across lines of business.

Sure, custom solutions, with everyone solving similar problems in different ways, will persist for a while. But the future is finding commonality, developing patterns, and spreading them across lines of business: taking this new technology of massive distribution and cost-effective scale and applying it without blinders on. In the end we'll customize it, use the right industry terms, and add the twists, but designing pluggable algorithms in software that doesn't have to be written over and over again is what we're supposed to be good at, right?
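As a sketch of what "pluggable" could mean in practice (the field names and rules below are hypothetical, not drawn from any real product): a generic detection engine stays fixed, and each line of business supplies only its own rule set.

```python
from typing import Callable, Iterable

Record = dict
Rule = Callable[[Record], bool]

def detect(records: Iterable[Record], rules: list[Rule]) -> list[Record]:
    """Generic engine: flag any record that trips at least one rule.
    The engine never changes; lines of business swap in their own rules."""
    return [r for r in records if any(rule(r) for rule in rules)]

# Credit-card line of business plugs in its rules...
credit_card_rules = [
    lambda r: r["amount"] > 10_000,
    lambda r: r["country"] != r["home_country"],
]
# ...and invoicing plugs in different ones, reusing the same engine.
invoice_rules = [
    lambda r: r["amount"] > 10_000,
    lambda r: r.get("duplicate_invoice_no", False),
]

cards = [
    {"amount": 50.0, "country": "US", "home_country": "US"},
    {"amount": 120.0, "country": "BR", "home_country": "US"},
]
print(detect(cards, credit_card_rules))  # flags the out-of-country charge
```

The design choice is the point of the paragraph above: the common pattern (scan records, apply rules, collect hits) is written once, and industry-specific knowledge arrives as plug-ins rather than as yet another bespoke application.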

We've seen this before. Decades ago, accounting software was a hot topic. While you can still occasionally find specialized accounting software for specific businesses, most big companies use a prepackaged solution that's customized to some degree or has a plug-in specific to the industry in question. It seldom occurs to a skilled CIO or CTO to write an accounting package for a line of business, let alone one specific to the company. They buy off the shelf, even though there are no more shelves of software.

The next big leap is going "data driven" and using "machine learning" through a series of software package acquisitions and trivial integration. It might be driven by big data in the back end, but "big data" will be like Ethernet cards: a given, but not a hot topic of conversation.