by Simon Bisson

Contributing Writer

How Cosmos DB ensures data consistency in the global cloud

analysis

May 30, 20176 mins

Microsoft’s new database provides three cutting-edge approaches designed for applications that span geographic regions

Cloud computing isn’t like working on-premises. Instead of limiting code to one or maybe two datacenters, we’re designing systems that span not just continents but the entire world.

And that’s where we start to get issues. Even using fiber connections, the latency of crossing the Atlantic Ocean is around 60ms, though in practice delays are around 75ms. The Pacific is wider, so latency through trans-Pacific fiber is around 100ms.

Delays add up, and they make it hard to ensure that distributed databases are in sync. That makes it harder still to be sure that a query in the U.K. will return the same result as one in the U.S. Yes, most replication strategies mean that eventually the two will have the same content, but there’s a big question over just when that will happen. If the connections are busy, or there a lot of database writes, data can easily get delayed.

Microsoft’s recently launched Cosmos DB aims to simplify distributed application development, with more ways to reduce latency and five consistency models. You’ll find much of it familiar if you used the Document DB service, because Cosmos DB takes its capabilities and adds more data models and more query options. Cosmos DB also adds API compatibility with common on-premises and in-cloud NoSQL services, with more on the way.

APIs define how Cosmos DB handles data and how it exposes its contents. Although you choose a default API during set up, you query your data using any supported API. So, you can start with the familiar MongoDB APIs, then switch to Gremlin for graph queries on your data.

How cloud services typically ensure data consistency

Most distributed cloud services take one of two approaches to ensuring data consistency:

One approach, strong consistency, doesn’t allow reads until all associated writes are complete. Any read always returns the latest version of an item. If you use strong consistency for your data, your application is only as fast as the latency among all the regions where you store data. That’s why Cosmos DB limits you to using only a single Azure region. It’s a belt-and-braces way of working with data, and it works well where writes matter—and where your app doesn’t need instant access to data. But that approach makes strong consistency impractical if there’s any requirement for near-real-time data.

The other approach, eventual consistency, is a lazier approach to working with data. It’s more focused on applications that need to read data as soon as it’s written. Data is read at any time, but there’s a risk that it could be changed by another write. That leaves you with a level of indeterminacy in your data: You know at some point it’s going to be accurate, just not when that will be. Data may arrive in any order, so don’t use eventual consistency for series data. There’s even the chance that if you read the same item twice, the second time could return older data than the first.

How Cosmos DB handles data consistency in the real world

But only a small percentage of Cosmos DB users will use either of these approaches. Instead, most will take advantage of three alternative consistency models, based on the work of Turing Award winner Leslie Lamport. That foundation let Microsoft create a database to handle more realistic situations, and deliver distributed applications without the penalties of traditional consistency models.

The first alternative consistency model, bounded staleness, gives you a point at which reads and writes are in sync. Before it, there’s no guarantee, after it, you’re always accessing the latest version (at least until the next time that item updates). You define the boundary as either a number of versions or a time interval. Outside the boundary, everything is consistent, inside it, there’s no guarantee of a read returning the latest data. The result is a store that has an element of strong consistency, while still giving you low latency and the option of global distribution and high reliability. You use this model if you want to be sure that all reads are consistent, wherever they are, and that all writes are fast. You also get data that’s correct if you read it in the region where it’s written.

The second alternative consistency model, session consistency, works well when you drive reads and writes from a client app. The client gets to read its own writes, while data replicates across the rest of the network. This way, you have low latency access to the data you need, along with knowing that you’ll fail over in the event of any downtime—and that your application will run in any Azure region.

Microsoft has added a third alternative consistency model, consistent prefix, in Cosmos DB. Consistent prefix adds predictability to the speed of eventual consistency. You might not see the latest write when you read the data, but your reads will never be out of order. That’s a useful feature, because it’s both fast and predictable. Write A, then B, and then C, and your client will see A, or A and B, but never just A and C. Eventually all the Cosmos DB regions will converge on A, B, and C, giving you speed and reliability.

Cosmos DB is a very different beast from much of its competition. Many NoSQL services offer some limited form of distributed access, but they’re aimed only at offering redundancy and disaster recovery. Others, like Google’s Spanner, offer some similar features, but only across datacenters in a single region. That might be fine if you’re working with a U.S.- or E.U.-only audience, but more and more cloud services have a global reach. Spanner’s low latency with strong consistency is a nice option to have, but it’s less valuable when cross-regional data replication becomes a major bottleneck.

The type of consistency you choose for Cosmos DB depends on your application. Are you focusing on writing data or reading it? How is the data used? Each consistency model has its pros and its cons, and you need to consider them carefully before making your choice. Session consistency is a good place to start for most app-centric data, but it’s worth experimenting with other options, especially if you don’t need instant access to data from all over the world.

by Simon Bisson

Contributing Writer

Author of InfoWorld's Enterprise Microsoft blog, Simon Bisson prefers to think of “career” as a verb rather than a noun, having worked in academic and telecoms research, as well as having been the CTO of a startup, running the technical side of UK Online (the first national ISP with content as well as connections), before moving into consultancy and technology strategy. He’s built plenty of large-scale web applications, designed architectures for multi-terabyte online image stores, implemented B2B information hubs, and come up with next generation mobile network architectures and knowledge management solutions. In between doing all that, he’s been a freelance journalist since the early days of the web and writes about everything from enterprise architecture down to gadgets. He is the author of Azure AI Services at Scale for Cloud, Mobile, and Edge: Building Intelligent Apps with Azure Cognitive Services and Machine Learning.

Show me more

Topics

About

Policies

Our Network

More

How Cosmos DB ensures data consistency in the global cloud

Microsoft’s new database provides three cutting-edge approaches designed for applications that span geographic regions

How cloud services typically ensure data consistency

How Cosmos DB handles data consistency in the real world

More from this author

Vibe coding with GitHub Spark

Up and running with Azure Linux 3.0

Wassette: A bridge between Wasm and MCP

Getting started with A2A in .NET

Managing Azure VMs with Project Flash

Microsoft’s action-focused small language model Mu

Taking .NET Aspire for a spin

NLWeb: Tapping MCP for natural language web search

Show me more

Databricks adds Data Science Agent to automate analytics tasks

Rust Innovation Lab launched, sponsors first project

PostgreSQL 18 to boost OLTP performance, but misses AI readiness

Getting encryption wrong (and getting it right, too)

How to build a native desktop app vs. a web UI app

PyApp: Build click-to-run Python apps with Rust