Which document-oriented database is right for your app? Follow this guide to the most developer-friendly NoSQL databases
"The right tool for the right job." If such wisdom holds true anywhere, it certainly holds true with the choice of database a developer picks for a given application. Document databases, one member of the family of data products collectively referred to as "NoSQL," are for developers who want to focus on their application rather than the database technology.
With a document database, data is not stored in tables with distinct column types. Instead, it is stored in freeform "documents" with any number of fields and any number of nested structures. Such documents are typically represented as JSON, and updated either by way of APIs or by sending JSON to a REST endpoint. Almost every modern programming language supports JSON and REST, so working with a document database feels more like working natively with those data structures than working with a traditional database.
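As a rough illustration of what such a document looks like, the sketch below builds one in Python and round-trips it through JSON. The field names (`name`, `emails`, `address`) are invented for the example and do not come from any particular database.

```python
import json

# A hypothetical "user" document: fields and nesting are illustrative only.
user_doc = {
    "name": "Ada Lovelace",
    "emails": ["ada@example.com"],                    # a field can hold a list
    "address": {"city": "London", "country": "UK"},   # or a nested structure
}

# Serialize to the JSON text a document database would typically accept,
# then parse it back into native data structures.
payload = json.dumps(user_doc)
restored = json.loads(payload)
```

The point is that the same dictionary the application already works with is, more or less, the thing that gets stored.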
This schemaless design, as it is called, has its limitations. A developer must do more work to ensure that inserted data is consistent, because such consistency isn't always guaranteed by the database itself. SQL, the standard-issue and widely understood language for database work, isn't supported by most document databases, so those with existing database expertise must start from scratch. But the convenience, speed, scalability, and versatility of a document database are hard to beat when you're writing an application that needs a protean, free-form data structure.
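What that extra work might look like, in its simplest form, is an application-side check run before each insert. The following is only a sketch: the required fields (`name`, `age`) and their types are assumed rules for a hypothetical "user" document, not anything a specific database mandates.

```python
def validate_user(doc):
    """Return a list of problems found in a hypothetical 'user' document."""
    problems = []
    # Required fields and the types this application expects (assumed rules).
    required = {"name": str, "age": int}
    for field, expected_type in required.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    return problems
```

An application would call `validate_user(doc)` and refuse to write the document if the returned list is not empty.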
Here we've profiled nine of the best known and most widely used document databases. Four of them (CouchDB, Couchbase Server, MongoDB, and RethinkDB) are open source projects with few or no practical barriers to getting started; Couchbase and MongoDB are also available in supported enterprise editions under commercial licenses. Another four (Amazon DynamoDB, Microsoft Azure Cosmos DB, Google Firebase, and IBM Cloudant) are hosted services from major cloud vendors, where close integration with other services in those clouds is a big draw. The ninth, MarkLogic, is a proprietary commercial product.
See the table below to compare features; scroll right in the table to see all columns, using the scrollbar at bottom. Read on for brief discussions of each database.
Amazon DynamoDB
Amazon's DynamoDB document store began life in 2012 as an extension of Amazon's SimpleDB. Under the hood it is powered by a key-value store, Dynamo. A co-developer of Dynamo drew on many of the same ideas to create Apache Cassandra.
DynamoDB features
Like most of Amazon's other cloud offerings, DynamoDB is a pay-as-you-go-for-what-you-need managed service. Developers set how much storage capacity to provide for keeping either unstructured documents or key-value pairs, and choose a flat hourly rate limit for read and write requests to the database. There is no need to provision servers or configure replication; Amazon handles all of that under the covers, and recently added autoscaling to the mix.
Naturally, DynamoDB offers developers useful integrations with other services in the Amazon cloud. Triggers, for instance, can be set up by way of AWS Lambda functions. Amazon's BI and analysis tools are also nearby. The proximity to these services is convenient, but it also means Amazon can upsell functionality any number of ways. Caching and acceleration a la Redis, for instance, are available by way of the DynamoDB Accelerator, a cost-plus add-on.
DynamoDB Local
You won't find DynamoDB in an open source incarnation. It's available exclusively as a hosted offering on the Amazon cloud.
That said, unlike many other cloud-native databases, DynamoDB is also available in a version that can be downloaded and run locally. DynamoDB Local is not intended for production use, but rather as a way to stage an application in a test environment without requiring connectivity or running up an Amazon bill.
Microsoft Azure Cosmos DB
Cosmos DB is an ambitious project, a database system that encompasses multiple models for storing and retrieving data. Cosmos DB can serve as a document database, a columnar database, a graph database, or a key-value store, allowing users to pick the paradigm that suits them and draw on various APIs for working with those paradigms.
Cosmos DB features
Rather than invent an entirely new API for a document database system, Cosmos DB provides an API compatible with the popular MongoDB (discussed below). Among the benefits is that existing code that uses MongoDB interface libraries or MongoDB's binary wire protocol can work as-is. It amounts to Cosmos DB being able to provide MongoDB as a service. Likewise, Cosmos DB supports the API of Cassandra, the popular column-family database.
Microsoft touts several advantages to Cosmos DB that aren't necessarily exclusive to its document database functionality, but are intended to appeal to those building document database applications. One such offering is tunable consistency levels. If you have some classes of document transactions that require stronger consistency across Azure regions than others, you can manually specify them on a per-transaction basis.
Other features are more specific to document databases. For instance, MongoDB users have to set up indexes on document collections to optimize searches. Cosmos DB users working with the MongoDB APIs don't have to set up indexing for documents, as every property in an inserted document is automatically indexed.
Using Cosmos DB on Microsoft Azure
There's no locally hosted version of Cosmos DB. It's only available as a service in the Microsoft Azure cloud. That said, development APIs for Cosmos DB are available for most popular enterprise languages, including Java, Node.js, .NET, and Python.
Couchbase Server
Couchbase is not so much a sibling to CouchDB as a successor. Couchbase was built on work done in CouchDB and Membase, but is not related to either of those projects. It's a document database and distributed key-value store rolled into one, with advanced features like automated failover and cross-datacenter replication, intended for enterprise use cases.
Couchbase features
One feature that sets Couchbase apart, not just from other NoSQL competition but from its predecessor CouchDB, is its SQL-like query language called N1QL (pronounced "nickel"). N1QL doesn't offer the full range of commands you would expect from an ANSI SQL implementation, but it provides enough useful functions, such as JOIN operations, for someone with SQL experience to get workable results.
The Couchbase query system is not just for developers, but for the DBAs and business analysts who normally deal with conventional databases. Features like the EXPLAIN keyword seem to have been put in specifically to appeal to that crowd.
As a combination document database and key-value store, Couchbase stores documents by using their unique identifiers as the key. Documents can also be assigned time-to-live values, to function like a key-value cache. A true key-value caching system like Redis will be far faster for basic key-value storage, but Couchbase is more flexible, and the two can be combined effectively to speed things up. On that note, Couchbase has native support for the Memcached protocol, so existing applications that use Memcached can plug into Couchbase as a substitute.
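To make the "documents keyed by ID, with optional expiry" idea concrete, here is a toy in-memory model in Python. It is purely conceptual: the class `TTLStore` and its methods are hypothetical and say nothing about how Couchbase implements or exposes time-to-live values.

```python
import time

class TTLStore:
    """Toy in-memory store: documents keyed by ID, with optional expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be exercised in tests
        self._items = {}      # doc_id -> (document, expires_at or None)

    def put(self, doc_id, document, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._items[doc_id] = (document, expires_at)

    def get(self, doc_id):
        entry = self._items.get(doc_id)
        if entry is None:
            return None
        document, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[doc_id]   # expired: behave as if it was never stored
            return None
        return document
```

A document stored without a TTL behaves like an ordinary record, while one stored with a TTL behaves like a cache entry that quietly disappears.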
Couchbase Community vs. Enterprise
Couchbase Server comes in a full-blown for-pay enterprise edition, a free-to-use community edition, and an open source edition, which is the foundation for the others. Binary downloads for the enterprise and community editions are available from Couchbase's site, and the source code is available from Couchbase's developer site. (There is no one GitHub repository for the Couchbase open source project as it is an aggregation of several projects.)
The community edition can be deployed in production, but lacks the more advanced features of the enterprise edition as well as support, so non-buyer beware. Some features in Couchbase, such as its horizontal scaling functionality, have found their way into the CouchDB project, but that is more the exception than the rule.
Couchbase Lite
Another edition of Couchbase worthy of note for app developers is Couchbase Lite, an embeddable version of Couchbase that can synchronize with instances of the full-blown edition. Couchbase Lite is the key component in Couchbase Mobile, an application stack for mobile apps that need a data store that synchronizes automatically with a back end. Couchbase Mobile is available for iOS, Android, Java, .NET, macOS, and tvOS.
CouchDB
The CouchDB project was begun in 2005 by a former IBM developer and moved to the Apache Software Foundation in 2008. It is sometimes assumed that CouchDB is the basis for Couchbase, but CouchDB and Couchbase are parallel projects with different aims.
CouchDB vs. Couchbase
Whereas Couchbase is both a document database and a key-value store, CouchDB is strictly a document database. And while Couchbase has long focused on enterprise features such as fault tolerance and a SQL-like query language, such niceties are only beginning to arrive in CouchDB.
CouchDB features
CouchDB emphasizes simplicity of deployment and ease of use. Retrieving data from the database is as simple as sending JSON-formatted queries to a REST HTTPS endpoint, with the results returned in JSON. Most every modern programming language can do these things, and also perform the mapping and reducing needed to create the views behind CouchDB queries and reports. There is no need for an ODBC driver or a data connector.
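A minimal sketch of that request style, using only Python's standard library, might look like the following. The server address, the database name `books`, and the document ID `dune` are assumptions for illustration; the requests are built but deliberately not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5984"   # assumed address of a local CouchDB
DB = "books"                         # hypothetical database name

def build_put(doc_id, document):
    """Build (but do not send) a request that stores a document under doc_id."""
    body = json.dumps(document).encode("utf-8")
    return Request(
        f"{BASE_URL}/{DB}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

def build_get(doc_id):
    """Build (but do not send) a request that fetches a document by ID."""
    return Request(f"{BASE_URL}/{DB}/{doc_id}", method="GET")

put_req = build_put("dune", {"title": "Dune", "year": 1965})
get_req = build_get("dune")
# Passing either request to urllib.request.urlopen() would send it
# to a running server and return a JSON response.
```

Nothing here is specific to Python; any language with an HTTP client and a JSON library can do the same.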
One of CouchDB's special sauces is its data reconciliation technology. Changes made to one CouchDB peer are automatically reconciled with others, in a manner akin to a version control system. Any conflicts between document versions are retained as if they were previous revisions to that document.
This eventually consistent model is useful for databases that aren't always or consistently connected (such as for intermittently connected mobile applications), or in cases where you don't need the latest-and-greatest version of data in a particular node. But eventual consistency is also one of CouchDB's biggest caveats. If you do need immediate consistency, CouchDB is not the place to find it.
Scalability has long been a weak spot for CouchDB, but it has recently been addressed. Version 2.0 stirred in a new clustering technology, courtesy of bits open sourced by Cloudant/IBM and merged into the project. Finally, for those who are familiar with MongoDB and want to use a similar declarative query syntax, the Mango project, also from Cloudant/IBM, provides that as an external add-on.
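For a sense of what a declarative, MongoDB-like query looks like, the sketch below builds a Mango-style query body and pairs it with a toy matcher. The query fields (`selector`, `fields`, `limit`) and the `$gt` operator follow Mango's conventions; the `matches` function is a hypothetical stand-in that handles only equality and `$gt`, and is not how the database evaluates selectors.

```python
import json

# A Mango-style query body: a declarative selector instead of map/reduce code.
query = {
    "selector": {"year": {"$gt": 1960}},   # match documents with year > 1960
    "fields": ["title", "year"],           # return only these fields
    "limit": 10,
}
body = json.dumps(query)   # the JSON text that would be sent to the server

def matches(doc, selector):
    """Toy matcher supporting only equality and $gt, for illustration."""
    for field, condition in selector.items():
        if isinstance(condition, dict):
            if "$gt" in condition:
                if field not in doc or not doc[field] > condition["$gt"]:
                    return False
        elif doc.get(field) != condition:
            return False
    return True

docs = [{"title": "Dune", "year": 1965}, {"title": "Emma", "year": 1815}]
hits = [d["title"] for d in docs if matches(d, query["selector"])]
```

The appeal is that the query is plain data: it describes which documents are wanted rather than how to find them.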
CouchDB download
CouchDB binaries for all major platforms, and source code, can be downloaded from the official CouchDB site. Source for the project is available on GitHub as well.
Google Firebase Realtime Database
You might think of Google Firebase as Google's answer to DynamoDB: a way to provide fast-syncing data storage between a cloud back-end and local apps on multiple platforms.
The Firebase Realtime Database is just one component in the Firebase stack, intended for building apps heavy on audience engagement and insight. The whole stack includes functions like authentication, performance monitoring, user analytics, and many others, but here we focus on Firebase itself.
Google Firebase features
Google acquired Firebase in 2014. In the years since, it has wired up Firebase to take advantage of many Google Cloud features. Google Cloud Functions for Firebase, for instance, allows you to trigger JavaScript functions in the cloud in response to Firebase events. Google Analytics for Firebase lets you pull mobile app data into BigQuery for deeper analysis.
As gaming is one of Firebase's target applications, the SDKs provided for Firebase include the Unity cross-platform game development framework. Developers working on more conventional enterprise-focused or consumer-facing projects have plenty of other choices: native iOS and Android, C++, generic web/JavaScript, and any other language that supports REST (Java, Python, you name it).
Firebase is designed to work in scenarios where connectivity isn't guaranteed. Like CouchDB, it caches changes locally when offline, and automatically synchronizes with the back end when connectivity returns. Note that Firebase isn't designed to be used as a standalone, entirely offline solution; on Android, for instance, local databases are limited to 10 MB in storage.
Firebase on Google Cloud and GitHub
Firebase isn't available as a standalone product; it is offered only as part of Google's cloud products. The Firebase GitHub repository has source code for the SDKs and for various platform-specific tools.
IBM Cloudant
Cloudant is essentially IBM's hosted edition of CouchDB. Originally, Cloudant was an independent company, offering an edition of CouchDB called "BigCouch" that was hosted on IBM's SoftLayer cloud. In 2014, IBM acquired Cloudant outright as part of IBM's overall push towards analytics and big data.
Cloudant vs. CouchDB
Cloudant is meant to be more than a hosted version of CouchDB. Cloudant provides features not readily available in CouchDB itself, such as natively integrated full-text search. Full-text search in CouchDB typically requires integration with external projects. Data can be replicated in both directions between Cloudant and an instance of CouchDB, so it's relatively easy to move between the two as needed.
Some of Cloudant's improvements to CouchDB have found their way back into the underlying CouchDB project, including CouchDB 2.0's horizontal scaling functionality and the Mango query language interface. But don't take that as proof that Cloudant features will automatically trickle down to CouchDB.
Cloudant on IBM Cloud
Cloudant is primarily a cloud offering on IBM Cloud, where it can be used in conjunction with other IBM Cloud data products such as dashDB, DataWorks, and Watson Analytics.
Cloudant Local
A behind-the-firewall edition of Cloudant, called Cloudant Local, offers all of the same functionality as the cloud-hosted offering. Cloudant Local is available on the Ubuntu and Red Hat flavors of x86 Linux, as well as IBM's own System z running Red Hat or SUSE. Developers can download a free, test-and-dev-only version in a Docker image.
MarkLogic
MarkLogic might be described as the Oracle, DB2, or Microsoft SQL Server of NoSQL. MarkLogic is a proprietary NoSQL document database system, with an emphasis on performance and professional-grade features, and with a few concessions to allow developers to get quickly on board without going broke.
MarkLogic indexing
Like other document databases, MarkLogic can ingest and perform queries on semi-structured data, mainly in JSON and XML format. All document types can be indexed with a wide variety of indexing and search strategies. A "Content Processing Framework" provides automated tools for transforming documents from outside the system (Microsoft Office, Adobe PDF, HTML, and so on) into MarkLogic data.
By default MarkLogic builds "universal indexes," or document indexes that support the most common types of search or aggregation: individual words and phrases, or values of XML elements or JSON properties.
MarkLogic also lets developers specify indexes for more advanced searches. One example is range indexes. Range indexes allow searches for values, like prices or dates, that exist in a range between other values or that can be found in proximity to other values. These in turn allow functionality like performing joins between documents, or running SQL queries on views constructed from document data.
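The general idea behind a range index can be sketched in a few lines of Python: keep the indexed values sorted, then use binary search to find the slice that falls between two bounds. This is a conceptual toy, not MarkLogic's implementation; the class `RangeIndex` and its method names are hypothetical.

```python
import bisect

class RangeIndex:
    """Toy range index: values kept sorted, with document IDs alongside."""

    def __init__(self):
        self._values = []   # sorted indexed values (e.g. prices or dates)
        self._ids = []      # document IDs, in the same order as _values

    def add(self, value, doc_id):
        # Insert while preserving sort order; equal values keep arrival order.
        pos = bisect.bisect_right(self._values, value)
        self._values.insert(pos, value)
        self._ids.insert(pos, doc_id)

    def between(self, low, high):
        """Return IDs of documents whose value satisfies low <= value <= high."""
        start = bisect.bisect_left(self._values, low)
        end = bisect.bisect_right(self._values, high)
        return self._ids[start:end]
```

Because the values are sorted, a range lookup needs only two binary searches rather than a scan of every document.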
MarkLogic storage
Documents are stored directly on disk in MarkLogic, but an admin can use the tiered storage feature to organize where documents live based on their access patterns. Constantly changing documents can be kept on an SSD, while archival data can be kept in a NAS or an Amazon S3 bucket.
Storage tiers are decoupled from MarkLogic's other organizational schemes, so you can create a single database with components that span multiple storage tiers.
MarkLogic XML schemas and XQuery
For XML documents in MarkLogic you can specify a formal schema if you need one, although not for JSON documents. Schemas can be used to perform queries on XML data by way of the W3C standard XQuery language. MarkLogic provides a console environment, the Query Console, to allow interactive execution of XQuery commands on a MarkLogic database.
MarkLogic download
MarkLogic is commercial software that requires a license for production use. A free developer license allows users to work with up to 1TB of data, in both cloud and local instances of the software, for development and testing only.
MongoDB
MongoDB is easily the most widely deployed document database, and the best-known among the developer community. It embodies most of the key concepts found in document databases and NoSQL systems generally: schemaless storage, a scale-out architecture, and a shared-nothing design.
MongoDB Enterprise vs. Community
The open source edition of MongoDB already includes the vast majority of the features needed to gin up a basic production deployment. Commercial licenses add key enterprise features including backup, automation extensions, monitoring, data exploration tools, a BI connector with SQL support, and an in-memory storage engine.
The enterprise features in MongoDB have been tilting towards drawing enterprise developers away from the likes of Oracle, as seen by the recent addition of in-memory processing, a SQL-like interface through third-party data exploration and BI tools like Tableau, and the ability to perform recursive graph queries on document data. Graph queries are useful for exploring open-ended chains of relationships, such as family trees and social networks.
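To show what a recursive graph query computes, here is a plain-Python sketch that follows a chain of references between documents until it runs out. It does not use MongoDB at all (MongoDB exposes this kind of traversal through its `$graphLookup` aggregation stage); the documents, the `reports_to` field, and the function name are invented for the example.

```python
def ancestors(docs_by_name, start, parent_field="reports_to"):
    """Follow parent_field links from start until the chain ends or loops."""
    chain = []
    seen = {start}   # guards against cycles in the data
    current = docs_by_name[start].get(parent_field)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = docs_by_name.get(current, {}).get(parent_field)
    return chain

# Hypothetical documents forming a small reporting hierarchy.
staff = {
    "dev": {"name": "dev", "reports_to": "lead"},
    "lead": {"name": "lead", "reports_to": "cto"},
    "cto": {"name": "cto", "reports_to": None},
}
```

Calling `ancestors(staff, "dev")` walks the chain upward from one document; the length of the chain is not known in advance, which is what makes the query "open-ended."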
MongoDB data loss
MongoDB has also been the target of much criticism. Some of it has stemmed from developers not properly understanding the product's aims and methodologies. But some of it has been due to real problems such as dirty reads and stale reads, lost updates, and the inability to sort by collations, a serious limitation when dealing with Unicode documents. Note that all of these issues were addressed in MongoDB 3.4.
MongoDB security
Another major issue has been security, with misconfigured and publicly exposed MongoDB instances being attacked and held for ransom, but that can be remedied by paying due attention to MongoDB's security settings before putting it into production.
MongoDB downloads
MongoDB's Community Server version, the free and open source edition of MongoDB, is available for download directly from MongoDB's official site. Source code for MongoDB is available on GitHub. The Enterprise Server edition is also available for download directly from MongoDB, and can be used as-is for evaluation and testing free of charge.
RethinkDB
The story behind RethinkDB is as interesting as the project itself. RethinkDB was originally conceived as a commercial product with an open source (AGPL) licensed version, but the company behind the database failed. The Cloud Native Computing Foundation came to the rescue, purchasing the intellectual property for the project and donating it to the Linux Foundation. Now RethinkDB has a second life under a more liberal open source license and the sponsorship of a major player in the open source space.
RethinkDB features
The big innovation behind RethinkDB is a built-in change notification system that streams live updates to applications. In the words of its introductory documentation, "instead of polling for changes, the developer can tell the database to continuously push updated query results to applications in realtime." Thus RethinkDB simplifies the development of real-time applications like multiplayer games, albeit at the cost of overall ACID compliance. Individual documents in the database are handled in a transactional way, but the state of the database as a whole is only eventually consistent.
RethinkDB lacks native support for SQL, but includes a querying system called ReQL, which is implemented by way of the native syntax in Python, JavaScript, Ruby, and Java. ReQL uses chained dot commands to allow developers to construct complex, lazily evaluated expressions in the language of their choice. For example:
r.table("users").pluck("last_name").distinct().run(conn)
When changes to documents land in RethinkDB, they're made available through a "changefeed," a log that can be parsed by applications to derive details about the changes, for instance whether the data in question is new or an altered version of existing data. ReQL expressions are used to create what amount to callback functions for handling changefeed events (essentially, triggers). They can also define relationships between data entities by way of mechanisms that emulate table joins.
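How an application might interpret changefeed messages can be sketched without a running database. RethinkDB changefeed messages carry an `old_val` and a `new_val`; the helper names below (`classify_change`, `handle_feed`) and the sample messages are hypothetical, and the feed is simulated with a plain list.

```python
def classify_change(change):
    """Label a changefeed-style message as an insert, update, or delete."""
    old, new = change.get("old_val"), change.get("new_val")
    if old is None and new is not None:
        return "insert"    # no previous version: the document is new
    if old is not None and new is None:
        return "delete"    # no new version: the document was removed
    return "update"        # both present: an existing document changed

def handle_feed(feed, on_change):
    """Invoke the callback for each message, as an application loop might."""
    for change in feed:
        on_change(classify_change(change), change)

# Simulated feed; with a live connection the iterable would instead come
# from something like r.table("users").changes().run(conn).
sample_feed = [
    {"old_val": None, "new_val": {"id": 1, "last_name": "Lovelace"}},
    {"old_val": {"id": 1, "last_name": "Lovelace"},
     "new_val": {"id": 1, "last_name": "King"}},
    {"old_val": {"id": 1, "last_name": "King"}, "new_val": None},
]
events = []
handle_feed(sample_feed, lambda kind, change: events.append(kind))
```

The application never polls; it simply reacts to whatever the feed delivers next.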
RethinkDB download
Binaries for RethinkDB are available from the official RethinkDB project page. Official Docker images can be pulled from Docker Hub. The project's source is available on GitHub.


