Forgoing features for speed has its trade-offs, as these NoSQL data store shortcomings show
The NoSQL buzzword has been metastasizing for several years. The excitement about these fast data stores has been intoxicating, and we're as guilty as anyone of seeing the groundbreaking appeal of NoSQL. Yet the honeymoon is coming to an end, and it's time to start balancing our enthusiasm with some gimlet-eyed hard truths.
Don't get us wrong. We're still running to try the latest experiment in building a simple mechanism for storing data. We still find deep value in MongoDB, CouchDB, Cassandra, Riak, and other NoSQL standouts. We're still planning on tossing some of our most trusted data into these stacks of code because they're growing better and more battle-tested each day.
But we're starting to feel the chafing, as the NoSQL systems are far from a perfect fit and often rub the wrong way. The smartest developers knew this from the beginning. They didn't burn the SQL manuals and send nastygrams to the sales force of their once devoted SQL vendor. No, the smart NoSQL developers simply noted that NoSQL stood for "Not Only SQL." If the masses misinterpreted the acronym, that was their problem.
This list of gripes, big and small, is thus an attempt to document this fact and to clear the air. It's meant to set things straight now so that we can do a better job understanding the trade-offs and the compromises.
NoSQL hard truth No. 1: JOINs mean consistency
One of the first gripes people have about SQL systems is the computational cost of executing a JOIN between two tables. The idea is to store the data in one and only one place. If you're keeping a list of customers, you put their street addresses in one table and use their customer IDs in every other table. When you pull the data, the JOIN connects the IDs with the addresses and everything remains consistent.
The trouble is that JOINs can be expensive, and some DBAs have concocted complex JOIN commands that boggle the mind, turning even the fastest hardware to sludge. It was no surprise that the NoSQL developers turned their lack of JOINs into a feature: Let's just keep the customer's address in the same table as everything else! The NoSQL way is to store key-value pairs for each person. When the time comes, you retrieve them all.
Alas, people who want their tables to be consistent still need JOINs. Once you start storing customers' addresses with everything else about them, you often end up with multiple copies of those addresses in each table. And when you have multiple copies, you need to update them all at the same time. Sometimes that works, but when it doesn't, NoSQL isn't ready to help with transactions.
Wait, you say, why not have a separate table with the customer's information? That way there will only be one record to change. It's a great idea, but now you get to write the JOIN yourself in your own logic.
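To give a rough idea of what writing the JOIN yourself looks like, here is a minimal Python sketch. The `customers` and `orders` collections, their fields, and the function name are all invented for illustration; plain dictionaries and lists stand in for key-value lookups, and no particular NoSQL client API is implied.

```python
# Hypothetical data standing in for two NoSQL collections.
customers = {
    "c1": {"name": "Ada", "address": "12 Oak St"},
    "c2": {"name": "Grace", "address": "9 Elm Ave"},
}
orders = [
    {"order_id": 1, "customer_id": "c1", "total": 30},
    {"order_id": 2, "customer_id": "c2", "total": 45},
    {"order_id": 3, "customer_id": "c1", "total": 12},
]


def join_orders_with_customers(orders, customers):
    """Hand-rolled equivalent of an inner JOIN on customer_id."""
    joined = []
    for order in orders:
        customer = customers.get(order["customer_id"])
        if customer is None:
            continue  # no matching customer: an inner join drops the row
        # Copy the order and attach the single authoritative address.
        joined.append({**order, "address": customer["address"]})
    return joined


rows = join_orders_with_customers(orders, customers)
```

The address lives in exactly one place, so there is only one record to change, but every lookup, missing-key check, and optimization that a SQL engine would handle is now the application's job.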
NoSQL hard truth No. 2: Tricky transactions
Let's say you're OK to live without JOINing tables because you want the speed. It's an acceptable trade-off, and sometimes SQL DBAs denormalize tables for just this reason.
The trouble is that NoSQL makes it hard to keep the various entries consistent. There are often no transactions to make sure that changes to multiple tables are made together. For that, you're on your own, and a crash between two writes could leave the tables inconsistent.
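The failure mode is easy to reproduce. The following Python sketch simulates it under stated assumptions: `profiles` and `invoices` are hypothetical denormalized stores that each hold a copy of the same address, and the `crash_midway` flag is an invented stand-in for a real crash between two writes.

```python
# Hypothetical denormalized stores: the same address is copied into both.
profiles = {"c1": {"address": "12 Oak St"}}
invoices = {"inv1": {"customer_id": "c1", "address": "12 Oak St"}}


def update_address(customer_id, new_address, crash_midway=False):
    """Update every copy of the address, with no transaction around it."""
    profiles[customer_id]["address"] = new_address
    if crash_midway:
        # Simulated failure between the two writes.
        raise RuntimeError("simulated crash")
    for invoice in invoices.values():
        if invoice["customer_id"] == customer_id:
            invoice["address"] = new_address


try:
    update_address("c1", "77 Pine Rd", crash_midway=True)
except RuntimeError:
    pass

# The two copies now disagree, and nothing rolls the first write back.
consistent = profiles["c1"]["address"] == invoices["inv1"]["address"]
```

With a transaction, the first write would be undone when the second one failed. Without one, the application has to detect and repair the mismatch itself.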
The earliest NoSQL implementations thumbed their nose at these transactions. They would offer data listings that were consistent, except when they weren't. In other words, they went after the lowest-value data where errors wouldn't make any material difference.
Now some NoSQL implementations offer something approaching a transaction. Oracle's NoSQL product, for instance, offers transactional control over data written to one node and lets you choose a flexible amount of consistency across multiple nodes. If you want perfect consistency, you have to wait for each write to reach all nodes. Several other NoSQL data stores are experimenting with adding more structure and protection like this.
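The trade-off between speed and consistency across nodes can be modeled in a few lines. This is a toy Python simulation, not Oracle's API or any real product's: the `Replica` class, the `write` function, and the `required_acks` parameter are all hypothetical names chosen for illustration.

```python
class Replica:
    """A toy node holding its own copy of the data."""

    def __init__(self):
        self.data = {}


def write(replicas, key, value, required_acks):
    """Apply the write to `required_acks` replicas before returning.

    The remaining replicas are returned as 'pending', modeling copies
    that a real system would bring up to date later.
    """
    if not 1 <= required_acks <= len(replicas):
        raise ValueError("required_acks must be between 1 and len(replicas)")
    for replica in replicas[:required_acks]:
        replica.data[key] = value
    return replicas[required_acks:]


nodes = [Replica(), Replica(), Replica()]

# Fast write: one acknowledgment, so two replicas are still stale.
pending = write(nodes, "balance", 100, required_acks=1)
stale_reads = [n.data.get("balance") for n in pending]

# Strict write: every replica must have the value before we return.
pending_strict = write(nodes, "balance", 150, required_acks=len(nodes))
all_reads = [n.data.get("balance") for n in nodes]
```

A reader who hits one of the pending replicas after the fast write sees old data. The strict write closes that window, at the cost of waiting on the slowest node.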
NoSQL hard truth No. 3: Databases can be smart
Many NoSQL programmers like to brag about how their lightweight code and simple mechanism work extremely quickly. They're usually right when the tasks are as simple as the insides of NoSQL, but that changes when the problems get harder.
Consider the old challenge of a JOIN. Once NoSQL programmers start generating their own JOIN commands in their own logic, they start to try to do this efficiently. SQL developers have spent decades developing sophisticated engines to handle JOIN commands as efficiently as possible. One SQL developer told me he was trying to synchronize his code with the spinning hard disk so that he would request data only when the head was just above the right spot. This may seem extreme, but SQL developers have been working on similar hacks for decades.
There's no doubt that programmers spend days pulling out their hair trying to structure their SQL queries to take advantage of all of this latent intelligence. It may not be simple to tap, but when the programmer figures it out, the databases can really sing.
A sophisticated query language like SQL always has the potential to outshine an unsophisticated query language like those found in NoSQL. It may not matter with simple results, but when the action becomes complex, the SQL is being executed on the machine right next to the data. It has little overhead fetching the data and doing the work. A NoSQL server usually has to ship the data to where it's going.
NoSQL hard truth No. 4: Too many access models
In theory, SQL is supposed to be a standard language. If you use SQL for one database, you should be able to run the same query in another compliant version. This claim may work with a few simple queries, but every DBA knows that it can take years to learn the idiosyncrasies of SQL for different versions of the same database. Keywords are redefined, and queries that worked on one version won't work with another.
NoSQL is even more arcane. It's like the Tower of Babel. Since the beginning, NoSQL developers have each tried to imagine the best language possible, but they have very different imaginations. This hotbed of experimentation is good, until you try to jump between tools. A query for CouchDB is expressed as a pair of JavaScript functions for mapping and reducing. Early versions of Cassandra used a raw, low-level API called Thrift; newer versions offer CQL, an SQL-like query language that must be parsed and understood by the server. Each one is different in its own way.
Each tool doesn't just have its own idiosyncrasies, it sports an entirely different philosophy and way of expressing it. There are no easy ways to switch between data stores and you're often left writing tons of glue code just to give yourself the option of switching in the future. This may not be too difficult when you're stuffing pairs of keys and values into the system, but it can grow increasingly aggravating the more complexity you introduce.
NoSQL hard truth No. 5: Schema flexibility is trouble waiting to happen
One of the great ideas from the NoSQL model is not requiring a schema. In other words, programmers don't need to decide in advance which columns will be available for each and every row in a table. One entry may have 20 strings attached to it, another may have 12 integers, and another might be completely blank. The programmers can make the decision whenever they need to store something. They don't need to ask permission of the DBA, and they don't need to fill out all the paperwork to add a new column.
All that freedom sounds intoxicating, and in the right hands it can speed development. But is it really a good idea for a database that might live through three teams of developers? Is it even workable for a database that might last beyond six months?
In other words, the developers might want the freedom to toss any old pair into a database, but do you want to be the fifth developer to come along after four have chosen their own keys? It's easy to imagine a variety of representations of "birthday," with each developer choosing his or her own representation as a key when adding a user's birthday to an entry. A team of developers might imagine almost anything: "bday," "b-day," "birthday."
The NoSQL structure offers no support to limit this problem because that would mean reimagining the schema. It doesn't want to harsh on the mellow of the totally cool developers. A schema would get in the way.
The fact is that adding a column to a table isn't a big deal, and the discipline might actually be good for the developer. Just as it helps to force developers to designate variable types, it also helps to force developers to designate the type of data attached to a column. Yes, the DBA may force the developer to fill out a form in triplicate before attaching that column, but it's not as bad as dealing with a half-dozen different keys created on the fly by a programmer.
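Teams that stay schemaless often end up rebuilding this discipline in application code. The Python sketch below is one hedged illustration of what that can look like. The `SCHEMA` and `ALIASES` tables and the `normalize` function are hypothetical, not a feature of any NoSQL product.

```python
from datetime import date

# Hypothetical schema: allowed keys and the type each value must have.
SCHEMA = {"name": str, "birthday": date}

# Hypothetical aliases that earlier developers might have used.
ALIASES = {"bday": "birthday", "b-day": "birthday"}


def normalize(record):
    """Map aliased keys to canonical ones and enforce the schema."""
    clean = {}
    for key, value in record.items():
        canonical = ALIASES.get(key, key)
        if canonical not in SCHEMA:
            raise KeyError(f"unknown key: {key!r}")
        if not isinstance(value, SCHEMA[canonical]):
            raise TypeError(f"{canonical!r} must be {SCHEMA[canonical].__name__}")
        clean[canonical] = value
    return clean


record = normalize({"name": "Ada", "b-day": date(1815, 12, 10)})
```

This is a schema by another name, except that it lives in one application's code, where the next team's tools can't see it.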
NoSQL hard truth No. 6: No extras
Let's say you don't want all of the data in all of the rows, and you want the sum of a single column. SQL users can execute a query with the SUM operation and send one (just one) number back to you.
NoSQL users get all of the data shipped back to them and can then do the addition themselves. The addition isn't the problem because it takes about the same amount of time to add up the numbers on any machine. However, shipping the data around is slow, and the bandwidth required to ship all that data can be expensive.
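The difference shows up even in a small example. The Python sketch below uses the standard-library `sqlite3` module with a hypothetical `orders` table. SQLite runs in-process, so nothing actually crosses a network here; the second query only imitates the "fetch everything, then add" pattern that a plain key-value store would force on a client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")
conn.executemany("INSERT INTO orders (total) VALUES (?)", [(30,), (45,), (12,)])

# SQL style: the engine adds the column and returns a single number.
(server_side_sum,) = conn.execute("SELECT SUM(total) FROM orders").fetchone()

# Key-value style: every row is handed to the client, which adds them up.
all_rows = conn.execute("SELECT total FROM orders").fetchall()
client_side_sum = sum(total for (total,) in all_rows)

conn.close()
```

Both approaches produce the same total. With three rows the difference is invisible, but with three hundred million the first still returns one number while the second ships every row.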
There are few extras in NoSQL databases. If you want to do anything but store and retrieve data, you're probably going to do it yourself. In many cases, you're going to do it on a different machine with a complete copy of the data. The real problem is that it can often be useful to do all of the computation on the machine holding the data because shipping the data takes time. But tough for you.
NoSQL solutions are emerging. The Map and Reduce query structure from MongoDB gives you arbitrary JavaScript structure for boiling down the data. Hadoop is a powerful mechanism for distributing computation throughout the stack of machines that also holds the data. It is a rapidly evolving structure that offers rapidly improving tools for building sophisticated analysis. It's very cool, but still new. And technically Hadoop is an entirely different buzzword than NoSQL, though the distinction between them is fading.
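For readers who haven't met the pattern, here is a minimal sketch of map and reduce in plain Python. It is not MongoDB's or Hadoop's actual API: the documents, the `map_doc` and `reduce_values` functions, and the `map_reduce` driver are invented for illustration, and a real system would run the mapper on the machines that hold the data.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical documents, as a document store might hold them.
docs = [
    {"customer": "c1", "total": 30},
    {"customer": "c2", "total": 45},
    {"customer": "c1", "total": 12},
]


def map_doc(doc):
    """Emit (key, value) pairs for one document."""
    yield doc["customer"], doc["total"]


def reduce_values(values):
    """Collapse all values emitted for one key into a single result."""
    return reduce(lambda a, b: a + b, values)


def map_reduce(docs, mapper, reducer):
    """Group mapper output by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(values) for key, values in grouped.items()}


totals = map_reduce(docs, map_doc, reduce_values)
```

The appeal is that only the small per-key results need to travel, which recovers some of what SUM gives SQL users for free.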
NoSQL hard truth No. 7: Fewer tools
Sure, you can get your NoSQL stack up and running on your server. Sure, you can write your own custom code to push and pull your data from the stack. But what if you want to do more? What if you want to buy one of those fancy reporting packages? Or a graphing package? Or to download some open source tools for creating charts?
Sorry, most of the tools are written for SQL databases. If you want to generate reports, create graphs, or do something with all of the data in your NoSQL stack, you'll need to start coding. The standard tools come ready to snarf data from Oracle, Microsoft SQL Server, MySQL, and Postgres. Your data is in NoSQL? They're working on it.
And they'll be laboring on it for a bit. Even if they jump through all of the hoops to get up and running with one of the NoSQL databases, they'll have to start all over again from the beginning to handle the next system. There are more than 20 different NoSQL choices, all of which sport their own philosophy and their own way of working with the data. It was hard enough for the tool makers to support the idiosyncrasies and inconsistencies in SQL, but it's even more complicated to make the tools work with every NoSQL approach.
This is a problem that will slowly go away. The developers can sense the excitement in NoSQL, and they'll be modifying their tools to work with these systems, but it will take time. Maybe then they'll start on MongoDB, which won't help you because you're running Cassandra. Standards help in situations like this, and NoSQL isn't big on standards.
NoSQL shortcomings in a nutshell
All of these NoSQL shortcomings can be reduced to one simple statement: NoSQL tosses away functionality for speed. If you don't need the functionality, you'll be fine, but if you need it in the future, you'll be sorry.
Revolutions are endemic to tech culture. A new group comes along and wonders why the last generation built something so complex, and they set out to tear down the old institutions. After a bit, they begin to realize why all of the old institutions were so complex, and they start implementing the features once again.
We're seeing this in the NoSQL world, as some of the projects start adding back things that look like transactions, schemas, and standards. This is the nature of progress. We tear things down only to build them back again. NoSQL is finished with the first phase of the revolution and now it's time for the second one. The king is dead. Long live the king.
This story, "7 hard truths about the NoSQL revolution," was originally published at InfoWorld.com.


