A new poll of customers provides a brighter, more detailed picture of Hadoop adoption than Gartner's famously downbeat survey
There are lies, damn lies โฆ and technology industry studies.
Back in May,ย Gartner released a surveyย that crapped all over the Hadoop industry:ย Ofย the 284 CIOs polled by Gartner, only 26 percent claimed to be โeither deploying, piloting, or experimenting with Hadoop.โ Even with the small sample size and large margin of error, these were depressing numbers โ ones that contradicted the level of adoption people like me are seeing in the real world.
Well,ย Gartner has been wrong before. Released today, aย new survey commissioned by AtScaleย that surveyed more than 2,100 people offers results a whole lot closer to what those of us in the field encounter. Most dramatically, 76 percent of respondents said they plan to use Hadoop โ or are already using it and plan to use it more.
Naturally, the AtScale numbers need to be taken with a grain of salt: AtScale is a Hadoop solutions provider and the survey was conducted on its behalf. But anecdotally, at least, I can tell you that the results paint a picture a closer to reality than Gartnerโs doom and gloom.
Hadoopโs killer app
Some of you might find this shocking, but the AtScale survey indicates that Hadoopโs killer app is business intelligence โ for 69 percent of those planning to use Hadoop and 65 percent of those already using it. If thatโs a surprise, then you didnโt read even the top item in my article โThe 7 most common Hadoop and Spark projects.โ
Most companies donโt have โbig dataโ โ only many new unstructured or semistructured data sources โ and theyโd like to gain insight by aggregating them and hooking up a visualization tool. In fact, according to the study, most want to gain insight using Tableau or Excel. If theyโre already using Hadoop, they are probably also working with Tableau (51 percent). If they arenโt, theyโd like to use Excel (60 percent).
This matches exactly what I see in the field. My companyโs bread and butter is building data lakesย (or enterprise data hubs, if you prefer). As the survey confirms, these new Hadoop-based systems generally donโt replace Teradata or Netezza. Instead, customers either want to augment their existing MPP to handle new types of data or they donโt have MPP in place at all. In my experience, companies find their MPP systems canโt scale the way they hoped โ and discover they can shove Hadoop on regular hardware (or on Amazon) and add more nodes as they grow.
According to the study, the lower cost of Hadoop solutions isnโt the main attraction for most companies. But cost and scale are always connected. If youโre looking to do BI and analytics today, you probably donโt think โlet me buy this weird proprietary hardware columnar database thingโ as a first step. In fact, if you draw the architecture of a Netezza next to the architecture of Hive or maybe HBase plus Phoenix, youโll see a very similar structure.
What does Netezzaโs owner IBM think? Well, it thinks about Spark.
Self-service is the goal
Most companies want to achieve a level of self-service from Hadoop. According to the study, the companies that have achieved significant business value have already reached some level of self-service.
Self-service has multiple meanings. On the one hand, you need fewer people involved in managing Hadoop. On the other hand, you need a sufficient amount of data in the lake so that a new feed isnโt needed for each new report or dashboard. You also need views and general structure around it to make sure a mere mortal can query it with SQL. Yes, the main way people practice self-service is with SQL tools.
According to the study, most people havenโt achieved self-service and thus havenโt achieved the tangible value they were looking for.
Anemic 10-node clusters are losers
โHello worldโ in Hadoop 2 takes 12 nodes. Anything smaller and you get a really slow version of what you already had in SQL Server. According to the study, the people who have larger clusters have achieved more value.
This isnโt shocking. Iโve mentioned more than a few times that Hive is slow but scales well, and that can be said of other Hadoop technologies. If you have a 10-node cluster, itโs probably barely functional. Of course you didnโt achieve value.
Secondly, if revenue generation (14 percent) or scale-out (37 percent) are your main business drivers rather than cost, but you donโt actually scale out โ then of course you donโt achieve value. Iโd say this connects to another finding of the survey, which Iโve covered many times before: If you have an executive mandate, not sponsorship alone, you have a 20 percent better chance of achieving value. In my experience, having an executive mandate usually results in a larger cluster.
Exploring the verticals
I made my bones as an open source developer, but today, I do a lot of what could be called sales or sales engineering. Doing this requires fun exercises, such as: โWhich industries do we want to focus on?โ I came up with financial services, health care, retail, and manufacturing. This was mainly a function of what weโd done before, where we are, and who calls us the most. According to the study, retail didnโt make the short list of industries using Hadoop, despite producing many early success stories.
Manufacturing, consulting, telecommunications, financial services and health care all made the list. In my opinion, the opportunities are growing quickest in financial services and health care.
Youโd expect financial services companies to be relatively mature users of Hadoop, but that seems to hold true only for the top-tier banks. The next level of clients are barely kicking the tires, but want what the big boys have. Meanwhile, the Affordable Care Actย has driven the need for โmeaningful useโ of electronic medical records. This involves data from disparate systems โ for infection control, population health management, and so on.
Actually deriving meaningful information means consolidating data from multiple data sources. That sounds a lot like data consolidation, data lakes, and BI, doesnโt it?
The talent gap
If Gartner had been right about Hadoop stalling, it wouldnโt be such a problem to find people with Hadoop skills. I can tell you recruiting experienced Hadoop people is not easy, and once you train them, you have to pay them well and keep them engaged as 1,000 recruiters descend upon them.
According to AtScaleโs survey, 61 percent of respondents view the talent pool as the biggest challenge to adoption. This doesnโt change once customers adopt Hadoop, although they find that management, security, performance, governance, and accessibility are bigger issues than they realized.
A brighter picture
In the AtScale study, 49 percent of respondents have already achieved value and 45 percent are optimistic about achieving it. Only 6 percent are pessimistic, and 3 percent plan to use Hadoop less in the future. Are these numbers too rosy to be believed? Probably.ย
Nonetheless, Iโll bask in the glow of a survey that reflects what Iโve seen in the field: Brisk and constant adoption โ mainly for consolidation โ and a rapidly maturing technology that people get value from.
AtScale plans to do a follow-up survey. I hope to see a deeper view of companies that never bought Teradata and Netezza, how companies are using Spark, and whether they are buying hardware for Hadoop. Meanwhile, Iโd love to hear from others in the Hadoop community about adoption levels, challenges, and opportunities going forward.


