Big data isn't a one-off project: It's a culture of collecting, analyzing, and using data
Hey, it must be hard to be the only person on the planet who doesn't understand big data.
Actually, that's far from true: You're in good company. While Gartner finds that 64 percent of enterprises are investing in big data, a similar chunk (60 percent) don't have a clue as to what to do with their data.
The real problem isn't one of technology, but of process. The key to succeeding with big data, as in all serious IT investments, is iteration. It's not about Hadoop, NoSQL, Splunk, or any particular vendor or technology. It's about iteration.
Big data, big confusion
Though the number of companies embracing big data projects has grown since 2012, from 58 percent of enterprises surveyed to 64 percent, the level of understanding of exactly what to do with that data hasn't kept pace, as the Gartner data suggests.
This isn't all that surprising, given how hard it is to pull money from data. It's easy to say "actionable insights," but far harder to glean them. That's why data scientists currently out-earn most other professions, with an average salary of $123,000, and climbing.
Those who do data science well blend statistical, mathematical, and programming skills with domain knowledge, a tough combination to find in any single person. Of these, I'd argue that domain knowledge matters most as it leads to the process of getting value from data, as Gartner analyst Svetlana Sicular hints:
Organizations already have people who know their own data better than mystical data scientists …. Learning Hadoop is easier than learning the company's business. What is left? To form a strong team of technology and business experts and supportive management who create a safe environment for innovation.
That "safe environment for innovation" is one that affords data practitioners room to iterate.
Innovation is iteration
There are at least two major problems with big data projects. The first is that many companies consider them, well, projects. Big data isn't a one-off project: It's a culture of collecting, analyzing, and using data. As Phil Simon, author of "Too Big to Ignore: The Business Case for Big Data," told me: "Do you think that Amazon, Apple, Facebook, Google, Netflix, and Twitter do? Nope. It's part of their DNA."
The way it becomes DNA, however, is the second detail that trips up companies getting into big data: They think it's a technology issue. While most great big data technology is open source, building out a big data application isn't as simple as downloading Hadoop or the NoSQL database of your choice. As IDC analyst Carl Olofson highlights:
Organizations should not jump too quickly into committing to any big data technology, whether Hadoop or otherwise, as their solution to a given problem, but should consider all the alternatives carefully and develop a strategy for big data technology deployment.
Such careful consideration happens by iterating. Rather than paying a mega-vendor a mega-check to get started (do this, and you are absolutely doing big data wrong), the right approach is to start small and fail fast. As Thomas Edison put it, "I have not failed. I've just found 10,000 ways that won't work."
Big data is all about asking the right questions, hence the importance of domain knowledge. But in reality, you'll probably fail to collect the right data and to ask pertinent questions, over and over again. The key, then, is to use flexible, open data infrastructure that allows you to continually tweak your approach until it bears real fruit.
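To make that concrete, here's a minimal, hypothetical Python sketch of what iterating against flexible infrastructure can look like: raw events are kept as schemaless JSON lines, and each pass simply asks a different question of the same data. The file name and field names are assumptions for illustration, not anything tied to a particular product.

```python
import json
from collections import Counter

def top_values(path, field, n=5):
    """Count the most common values of a field across all events that contain it."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            event = json.loads(line)      # each line is one free-form JSON event
            if field in event:
                counts[event[field]] += 1
    return counts.most_common(n)

# First pass: which pages get the most views?
print(top_values("events.jsonl", "page"))

# Later iteration: that wasn't quite the right question, so ask a new one
# of the same raw data, with no schema migration required.
print(top_values("events.jsonl", "referrer"))
```

The point isn't the code; it's that nothing about the storage layer has to change when the question does.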
It's not only about big data
As mentioned above, this iterative approach isn't solely for big data. Ideally, most of IT should follow this approach. As one executive at a Fortune 50 bank told me, "Product stability comes from releasing code more frequently, not less. You want each release to be a non-event, not a major launch." This, of course, is the main idea behind agile development.
Agile development is aided by the influx of data technologies that easily accommodate dynamic schemas, Hadoop among them, as my colleague Dwight Merriman, founder of DoubleClick and MongoDB, suggests:
[Modern development is] agile development. We are talking about lots of iterations, lots of really small releases. We have a release each day; then, we change it. The product manager says, "No, that is not exactly what I wanted," and we change it yet again. This notion of iteration has interesting implications for the database and data layer. If you had a new schema migration every day, that would be painful. But if we have something fluid in terms of what is being stored, that fits really well with this notion of iteration.
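As one illustration of the "fluid" storage Merriman describes, here is a hedged sketch using MongoDB via PyMongo; the connection string, database, and field names are assumptions made up for the example. Documents from two different iterations of a product can sit in the same collection, and queries simply tolerate the fields that aren't there yet.

```python
from pymongo import MongoClient

# Assumes a local MongoDB instance; all names below are illustrative only.
client = MongoClient("mongodb://localhost:27017")
events = client["analytics"]["events"]

# Day 1 of the release cycle: a minimal event shape.
events.insert_one({"user_id": 1, "action": "view", "page": "/home"})

# A later daily release adds fields the product manager asked for.
# No migration step: old and new documents coexist in the collection.
events.insert_one({
    "user_id": 1,
    "action": "view",
    "page": "/pricing",
    "referrer": "search",
    "session_ms": 5400,
})

# Queries just handle the missing fields gracefully.
for doc in events.find({"action": "view"}):
    print(doc["page"], doc.get("referrer", "n/a"))
```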
Agile iteration, in other words, is the heart of innovation today. While technology facilitates this shift, it's more a cultural shift than a technology shift. To innovate, you and your company need to start thinking of data as an essential ingredient of your day-to-day business, not a point project you code once and then move on from.
So long as you recognize that this culture will take time to build and accommodate plenty of failure along the way, you, too, can make big data into big business like Facebook and Google do.


