The research behind the horizontally scalable, SQL-compatible database has spawned imitators, but Google's private network is the real secret sauce
Earlier this year, Google offered a peek at Cloud Spanner, an automanaged database service that melds features from both conventional relational systems and NoSQL technologies.
Today, Google announced Cloud Spanner will be available to the general public later this month. It will compete not only with rival cloud databases, but also with up-and-coming open source projects that address scale and reliability issues by using Google's own ideas.
The best of both worlds
Google presents Cloud Spanner as a happy medium between two common database needs that often prove incompatible. A database can be highly scalable and distributed (the NoSQL approach), or it can be transactionally consistent (the conventional database approach). Cloud Spanner aims to be both.
As laid out in a 2012 research paper, one key to accomplishing this is a time synchronization mechanism for actions that need to be kept consistent between nodes, such as globally consistent read operations, which people expect from a transactional database.
This sync mechanism takes into account the potential differences between timestamps provided by different machines in the cluster and can "wait out" the differences if they are too large. But the system also tries to keep uncertainty to a minimum by drawing on multiple time sources to increase clock accuracy. As a result, it's easier to get operations spread across multiple nodes (for example, MapReduce) to agree on when something was achieved and to deliver consistent results.
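To make the "wait out" idea concrete, here is a minimal Python sketch. It is not Google's implementation: the `UncertainClock` class, the `epsilon_seconds` error bound, and the `commit_with_wait` function are hypothetical names invented for illustration, and the local system clock stands in for the multiple time sources the paper describes. The point it shows is that if a clock reports a range rather than a single instant, a writer can pick a timestamp and then pause until that timestamp is certain to be in the past.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeInterval:
    """A clock reading with explicit uncertainty: true time lies in [earliest, latest]."""
    earliest: float
    latest: float


class UncertainClock:
    """Hypothetical clock that reports an interval instead of a single timestamp."""

    def __init__(self, epsilon_seconds: float):
        # Assumed upper bound on how far the local clock may be from true time.
        self.epsilon = epsilon_seconds

    def now(self) -> TimeInterval:
        t = time.time()
        return TimeInterval(t - self.epsilon, t + self.epsilon)


def commit_with_wait(clock: UncertainClock) -> float:
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    commit_ts = clock.now().latest
    # "Wait out" the uncertainty: block until even the earliest possible
    # current time is later than the chosen timestamp.
    while clock.now().earliest <= commit_ts:
        time.sleep(clock.epsilon / 10)
    return commit_ts


if __name__ == "__main__":
    clock = UncertainClock(epsilon_seconds=0.005)
    first = commit_with_wait(clock)
    second = commit_with_wait(clock)
    print(f"first commit at {first:.6f}, second at {second:.6f}")
```

In this sketch each commit stalls for roughly twice the error bound, which is why shrinking clock uncertainty matters: the tighter the bound, the shorter the pause.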
In a white paper published earlier this year, Google talked about another key element: how Cloud Spanner leverages Google's own network. Of the three characteristics that are most desired from a distributed system (consistency, availability, and tolerance for splits between nodes), Cloud Spanner tries to deliver all three by making slight but often undetectable sacrifices to availability, aided by the fact that the service runs on Google's own highly redundant network.
A little more scale, a little less SQL
The actual database Google has created from this technology strongly resembles other cloud-hosted transactional databases, but with some potentially irksome differences.
First, Cloud Spanner is advertised as having support for ANSI 2011 SQL queries. The documentation shows this is true for SELECT queries, which support all the familiar SQL syntax, including JOIN and GROUP BY. But INSERT and UPDATE commands are not available; according to a blog post at Quizlet, which used Cloud Spanner in beta, you need to use "RPCs for mutating rows given their primary key" instead. Some of this is made easier through Cloud Spanner's language and interface support, as it provides libraries for Go, Java/JDBC, Node.js, and Python, as well as support for REST calls.
Cloud Spanner's other touted advantages are scale and availability. The database autoscales based on demand, with pricing based on the number of nodes in use, storage needed on those nodes, and outbound bandwidth consumed. Right now the size of a database influences the number of nodes required to deploy it; every 2TB of database storage requires at least one node to support it.
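As a rough illustration of that sizing rule, the short Python sketch below computes the smallest node count a given amount of storage would require. The function name `min_nodes_for_storage` is hypothetical, and the floor of one node for any deployed database is an assumption; only the 2TB-per-node figure comes from the rule described above. Actual node counts would also depend on query load, which this ignores.

```python
import math

# From the sizing rule above: every 2TB of storage needs at least one node.
TB_PER_NODE = 2


def min_nodes_for_storage(storage_tb: float) -> int:
    """Return the fewest nodes that can hold storage_tb terabytes of data."""
    if storage_tb < 0:
        raise ValueError("storage_tb must be non-negative")
    # Assumption: any deployed database needs at least one node.
    return max(1, math.ceil(storage_tb / TB_PER_NODE))


if __name__ == "__main__":
    for size_tb in (0.5, 2, 5, 12):
        print(f"{size_tb} TB -> at least {min_nodes_for_storage(size_tb)} node(s)")
```

A 5TB database, for example, would need at least three nodes under this rule, since two nodes cover only 4TB.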
Imitation and flattery
Cloud Spanner's promises are echoes of features in other database products, although Google is clearly hoping to compete broadly by offering a better amalgamation of features in one place.
Take autoscaling, for instance. Ex-Microsoftie Bob Muglia served up Snowflake as a cloud data-warehouse system that didn't need to be tweaked or tuned. There, Google can almost certainly compete on pricing, as it has its own infrastructure, whereas Snowflake is implemented on Amazon.
Speaking of Amazon, it has a few products that could be competition. Aurora, for instance, is Amazon's hosted version of MySQL, and it beats Google's MySQL offering for high-end work. It also has the advantage of being familiar and widely supported; there's barely a database developer who hasn't touched MySQL at some point. But again, Google's hope is that Cloud Spanner will compete by offering better scale across the board, including for write operations and not only reads.
Then there's CockroachDB, which is approaching its first full 1.0 version. This open source database project is an implementation of the ideas in Google's Spanner paper, in much the same way Google's paper on MapReduce inspired Hadoop.
Where Google wants to stand out, though, is in the execution. That explains the white paper professing how it isn't only the time-synchronization functions that make Cloud Spanner special, but also Google's tight control over the networking between nodes. It might be possible for another cloud to implement that through a CockroachDB-based service, but Google's counting on first-mover advantage (and all the major back-end resources it can work with) to make an impression.


