Andrew C. Oliver
Contributing Writer

Hadoop, in trouble? Only in Gartner-land

Analysis
Sep 17, 2015
Business Intelligence, Hadoop, Technology Industry

A new poll of customers provides a brighter, more detailed picture of Hadoop adoption than Gartner's famously downbeat survey

There are lies, damn lies ... and technology industry studies.

Back in May, Gartner released a survey that crapped all over the Hadoop industry: Of the 284 CIOs polled by Gartner, only 26 percent claimed to be "either deploying, piloting, or experimenting with Hadoop." Even with the small sample size and large margin of error, these were depressing numbers, ones that contradicted the level of adoption people like me are seeing in the real world.

Well, Gartner has been wrong before. Released today, a new survey commissioned by AtScale, which polled more than 2,100 people, offers results a whole lot closer to what those of us in the field encounter. Most dramatically, 76 percent of respondents said they plan to use Hadoop, or are already using it and plan to use it more.

Naturally, the AtScale numbers need to be taken with a grain of salt: AtScale is a Hadoop solutions provider, and the survey was conducted on its behalf. But anecdotally, at least, I can tell you that the results paint a picture closer to reality than Gartner's doom and gloom.

Hadoop's killer app

Some of you might find this shocking, but the AtScale survey indicates that Hadoop's killer app is business intelligence, for 69 percent of those planning to use Hadoop and 65 percent of those already using it. If that's a surprise, then you didn't read even the top item in my article "The 7 most common Hadoop and Spark projects."

Most companies don't have "big data," only a lot of new unstructured or semistructured data sources, and they'd like to gain insight by aggregating them and hooking up a visualization tool. In fact, according to the study, most want to gain insight using Tableau or Excel. If they're already using Hadoop, they are probably also working with Tableau (51 percent). If they aren't, they'd like to use Excel (60 percent).
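
To make that pattern concrete, here's a minimal sketch, assuming a stock Hive setup: raw data lands in Hadoop, SQL-on-Hadoop aggregates it, and a BI tool reads the result over Hive's ODBC/JDBC interface. PyHive stands in for Tableau or Excel below, and the host, database, and table names are all hypothetical.

    # Hypothetical example: the kind of aggregate a Tableau or Excel user
    # would trigger through the Hive ODBC/JDBC driver, run here via PyHive.
    from pyhive import hive  # pip install 'pyhive[hive]'

    conn = hive.Connection(host="hadoop-edge.example.com", port=10000,
                           username="analyst", database="weblogs")
    cursor = conn.cursor()

    # Aggregate a day of semistructured clickstream data by region.
    cursor.execute("""
        SELECT region, COUNT(*) AS visits
        FROM clickstream_raw
        WHERE dt = '2015-09-16'
        GROUP BY region
    """)
    for region, visits in cursor.fetchall():
        print(region, visits)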

This matches exactly what I see in the field. My company's bread and butter is building data lakes (or enterprise data hubs, if you prefer). As the survey confirms, these new Hadoop-based systems generally don't replace Teradata or Netezza. Instead, customers either want to augment their existing MPP to handle new types of data or they don't have MPP in place at all. In my experience, companies find their MPP systems can't scale the way they hoped, then discover they can shove Hadoop onto regular hardware (or onto Amazon) and add more nodes as they grow.

According to the study, the lower cost of Hadoop solutions isn't the main attraction for most companies. But cost and scale are always connected. If you're looking to do BI and analytics today, you probably don't think "let me buy this weird proprietary hardware columnar database thing" as a first step. In fact, if you draw the architecture of a Netezza next to the architecture of Hive, or of HBase plus Phoenix, you'll see a very similar structure.

What does Netezza's owner IBM think? Well, it thinks about Spark.

Self-service is the goal

Most companies want to achieve a level of self-service with Hadoop. According to the study, the companies that have achieved significant business value have already reached some level of self-service.

Self-service has multiple meanings. On the one hand, you need fewer people involved in managing Hadoop. On the other hand, you need enough data in the lake that a new feed isn't required for each new report or dashboard. You also need views and general structure around the data so that a mere mortal can query it with SQL. Yes, the main way people practice self-service is with SQL tools.
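
To illustrate what "views and general structure" can look like in practice, here's a hedged sketch: a Hive view wraps a raw JSON feed in business-friendly column names, so the analyst queries the view and never touches the underlying files. All table, view, and column names are invented for the example.

    # Hypothetical example: hide a raw JSON feed behind a Hive view so
    # self-service users query named columns instead of parsing files.
    from pyhive import hive

    conn = hive.Connection(host="hadoop-edge.example.com", port=10000,
                           username="etl", database="lake")
    cursor = conn.cursor()

    # Each business field is extracted from the raw JSON line once, here,
    # rather than in every analyst's query.
    cursor.execute("""
        CREATE VIEW IF NOT EXISTS orders_v AS
        SELECT get_json_object(line, '$.order_id') AS order_id,
               get_json_object(line, '$.customer') AS customer,
               CAST(get_json_object(line, '$.total') AS DOUBLE) AS total
        FROM orders_raw
    """)

    # What self-service looks like afterward: plain SQL over the view.
    cursor.execute("SELECT customer, SUM(total) FROM orders_v GROUP BY customer")
    print(cursor.fetchall())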

According to the study, most people haven't achieved self-service and thus haven't achieved the tangible value they were looking for.

Anemic 10-node clusters are losers

"Hello world" in Hadoop 2 takes 12 nodes. Anything smaller and you get a really slow version of what you already had in SQL Server. According to the study, the people who have larger clusters have achieved more value.

This isn't shocking. I've mentioned more than a few times that Hive is slow but scales well, and the same can be said of other Hadoop technologies. If you have a 10-node cluster, it's probably barely functional. Of course you didn't achieve value.

Second, if revenue generation (14 percent) or scale-out (37 percent) is your main business driver rather than cost, but you don't actually scale out, then of course you don't achieve value. I'd say this connects to another finding of the survey, one I've covered many times before: If you have an executive mandate, not sponsorship alone, you have a 20 percent better chance of achieving value. In my experience, having an executive mandate usually results in a larger cluster.

Exploring the verticals

I made my bones as an open source developer, but today, I do a lot of what could be called sales or sales engineering. That work involves fun exercises, such as asking: "Which industries do we want to focus on?" I came up with financial services, health care, retail, and manufacturing, mainly a function of what we'd done before, where we are, and who calls us the most. According to the study, retail didn't make the short list of industries using Hadoop, despite producing many early success stories.

Manufacturing, consulting, telecommunications, financial services, and health care all made the list. In my opinion, the opportunities are growing quickest in financial services and health care.

You'd expect financial services companies to be relatively mature users of Hadoop, but that seems to hold true only for the top-tier banks. The next level of clients are barely kicking the tires but want what the big boys have. Meanwhile, the Affordable Care Act has driven the need for "meaningful use" of electronic medical records. This involves data from disparate systems, for infection control, population health management, and so on.

Actually deriving meaningful information means consolidating data from multiple sources. That sounds a lot like data consolidation, data lakes, and BI, doesn't it?

The talent gap

If Gartner had been right about Hadoop stalling, it wouldn't be such a problem to find people with Hadoop skills. I can tell you recruiting experienced Hadoop people is not easy, and once you train them, you have to pay them well and keep them engaged as 1,000 recruiters descend upon them.

According to AtScale's survey, 61 percent of respondents view the talent pool as the biggest challenge to adoption. This doesn't change once customers adopt Hadoop, although they find that management, security, performance, governance, and accessibility are bigger issues than they realized.

A brighter picture

In the AtScale study, 49 percent of respondents have already achieved value and 45 percent are optimistic about achieving it. Only 6 percent are pessimistic, and 3 percent plan to use Hadoop less in the future. Are these numbers too rosy to be believed? Probably.

Nonetheless, I'll bask in the glow of a survey that reflects what I've seen in the field: brisk and constant adoption, mainly for consolidation, and a rapidly maturing technology that people get value from.

AtScale plans to do a follow-up survey. I hope to see a deeper view of companies that never bought Teradata or Netezza, how companies are using Spark, and whether they are buying hardware for Hadoop. Meanwhile, I'd love to hear from others in the Hadoop community about adoption levels, challenges, and opportunities going forward.