Andrew C. Oliver
Contributing Writer

Lies your database is telling you

analysis
Mar 10, 2016

A wise person once said time is a device invented to keep everything from happening at once. Jonas Boner explains how the database world has abused time from the beginning.


Recently, Typesafe-turned-Lightbend CTO Jonas Boner has been giving a presentation attacking the general model used for most database systems. He takes aim at CRUD, or the ACID model used to achieve CRUD. Such critiques are scarcely unusual, yet Boner’s argument was uncommonly brilliant.

He began with double-entry accounting, which was born as early as the seventh century and still pervades every major business in the world. Simpler alternatives are rarely considered because their disadvantages are too great. The basic premise of double-entry accounting is that you can’t change the past; you can only correct the present.

Database developers, however, thought they knew better, so they created the software equivalent of a time machine: the update statement and its awful cousin, the delete statement. To be fair, when those statements were invented, a 5MB hard drive had to be loaded with a forklift, so other structures weren’t necessarily feasible.

The database developers who had those great notions must have missed some of the best (or worst) episodes of “Star Trek,” in which you find out that time travel is generally a bad idea. With updates come concurrency control, mutexes, transactions, and other constructs that try to mitigate the negative effects of modifying the same state while more than one thing is happening at a time.

Now, there is an alternative: “insert only” structures. The trouble with those, besides generating more instances, rows, attributes, or documents (as double-entry accounting does), is that you never have a “consistent view” of the data. Boner asserts that this is OK because the consistent view is nothing more than a convenient fiction, one you have inconveniently created at the expense of adding latency to your operational system.
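To make “insert only” concrete, here is a minimal Python sketch. The `Entry`, `post`, and `balance` names are hypothetical illustrations, not anything from Boner’s talk or a particular product. Every change is appended as a new fact, and the balance is derived by reading the facts rather than stored and overwritten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One immutable ledger fact: a signed amount posted to an account."""
    account: str
    amount: int  # in cents; positive = credit, negative = debit


def post(log: tuple, entry: Entry) -> tuple:
    """Append-only: return a new log with the entry added; old entries are never modified."""
    return log + (entry,)


def balance(log: tuple, account: str) -> int:
    """Derive the balance by summing every fact for the account; no balance is stored."""
    return sum(e.amount for e in log if e.account == account)


log = ()
log = post(log, Entry("alice", 10_000))
log = post(log, Entry("alice", -2_500))
log = post(log, Entry("bob", 2_500))

print(balance(log, "alice"))  # 7500
print(len(log))               # 3: the debit was added, nothing was overwritten
```

The trade-off is the one described above: there is no stored balance to read, so every query pays to recompute the view from the facts.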

According to Boner, not only is time an illusion, so is the present. That seems absurd, right? Now is the present. However, by the time you reached the end of that sentence and processed what you read, it was no longer true. If you try to mentally hold on to the present in more than a general sense, you find that you can’t, because the present is no more than a pointer that is always moving.

When we get to the level of larger data sets, however, determining totals “right now” is at the very least laborious in an insert-only structure. The “local present” is a set of “facts derived from multiple concurrent pasts.” That is, if you look at all of the states that “were” generated in the system up until “now,” you can arrive at a conclusion as to the state or value of now.
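Here is a sketch of deriving the “local present” from past facts, assuming each fact carries a simple integer logical timestamp (an illustrative assumption; real systems may use wall-clock times, sequence numbers, or vector clocks). The same fold answers “what was the state at time t?” for any t, including now.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    at: int       # logical timestamp (illustrative assumption)
    account: str
    amount: int


FACTS = (
    Fact(1, "alice", 100),
    Fact(2, "alice", -30),
    Fact(3, "alice", 50),
)


def balance_as_of(facts, account: str, t: int) -> int:
    """Reconstruct an account's state at time t from every fact at or before t."""
    return sum(f.amount for f in facts if f.account == account and f.at <= t)


print(balance_as_of(FACTS, "alice", 2))  # 70
print(balance_as_of(FACTS, "alice", 3))  # 120
```

Because this scans every fact, it grows more expensive as the log grows, which is the laboriousness mentioned above. A common mitigation is to cache periodic snapshots and fold only the facts recorded since the last one.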

Meanwhile, when you try to discover this “right now” state, you may find you don’t have all of the information. In fact, Donald Rumsfeld might have insight for you: Not only do you have known unknowns, but you have unknown unknowns. Why? Information has latency. There are facts you don’t have yet. Even when we try to force a consistent view of the world, we just add latency somewhere else, and our operational system becomes less concurrent and scales less well.
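One way to model “facts you don’t have yet” is to record two times per fact: when it happened and when the system learned about it, an approach usually called bitemporal modeling. This Python sketch is hypothetical and uses integer timestamps for brevity. It shows the same question about the same moment getting a different answer depending on when you ask it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    account: str
    amount: int
    occurred_at: int  # when it happened in the world
    recorded_at: int  # when our system learned about it


FACTS = (
    Fact("alice", 100, occurred_at=1, recorded_at=1),
    Fact("alice", -40, occurred_at=2, recorded_at=5),  # a late-arriving fact
)


def balance(facts, account: str, as_of: int, known_at: int) -> int:
    """Balance for events up to `as_of`, using only facts recorded by `known_at`."""
    return sum(
        f.amount
        for f in facts
        if f.account == account and f.occurred_at <= as_of and f.recorded_at <= known_at
    )


# Asked at time 3 about time 3, we don't yet know about the debit.
print(balance(FACTS, "alice", as_of=3, known_at=3))  # 100
# Asked again at time 5 about the same moment, the answer has changed.
print(balance(FACTS, "alice", as_of=3, known_at=5))  # 60
```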

How do we deal with information inconsistency and even information loss in the “real world”? We infer from context (fill in the blanks), and we attempt to confirm, wait, and repeat operations as new facts come in. As with double-entry accounting, we try to take a “compensating action” to account for the times we are wrong.

According to Boner, the path forward is to treat time as a first-class construct instead of an unmodeled, implied one. To do that, you can’t go around “changing things.” You go insert-only, doing away with CRUD in favor of just CR (create and read). In other words, we make records, or “facts,” immutable. This obviously has to hold all the way from the front end of the system to the storage.
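Here is a minimal sketch of the “CR only” idea in Python, assuming a hypothetical in-memory `FactStore`. Records are frozen dataclasses, so assigning to a field raises `dataclasses.FrozenInstanceError`, and a mistake is fixed by creating a new fact that points at the one it corrects. Python can’t stop code from reaching into the store’s internal list, so the absence of update and delete methods is a convention here, not a guarantee.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


@dataclass(frozen=True)
class Fact:
    id: int
    account: str
    amount: int
    corrects: Optional[int] = None  # id of an earlier fact this one compensates for


class FactStore:
    """Supports only Create and Read: there is deliberately no update or delete."""

    def __init__(self) -> None:
        self._facts: list = []

    def create(self, account: str, amount: int, corrects: Optional[int] = None) -> Fact:
        fact = Fact(len(self._facts) + 1, account, amount, corrects)
        self._facts.append(fact)
        return fact

    def read(self, account: str) -> list:
        return [f for f in self._facts if f.account == account]


store = FactStore()
wrong = store.create("alice", 500)               # posted in error
store.create("alice", -500, corrects=wrong.id)   # reverse it with a new fact
store.create("alice", 50)                        # then post the right amount

try:
    wrong.amount = 50  # the past cannot be edited
except FrozenInstanceError:
    print("facts are immutable")

print(sum(f.amount for f in store.read("alice")))  # 50
```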

A popular explanation of transactions uses the bank account. Assuming I have a bank account and you have a bank account and I want to transfer money to you, we open up a nice transaction (which locks both accounts), subtract money from my account, and add money to yours. If I don’t have enough money, the transaction rolls back. If your account can’t receive the money, it rolls back. This allows us to know exactly how much money is in our bank accounts at any given time.
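For contrast, here is the textbook transfer written with Python’s built-in `sqlite3` module. The table and account names are illustrative. Using the connection as a context manager commits on success and rolls back if an exception escapes. A `CHECK` constraint stands in for “not enough money,” and an unknown account stands in for an account that “can’t receive the money.” SQLite locks the whole database rather than individual rows, so treat this as the shape of the idea, not a model of a server database’s locking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("me", 100), ("you", 0)])


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Textbook transfer: both UPDATEs take effect together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            debit = conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            credit = conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            if debit.rowcount != 1 or credit.rowcount != 1:
                raise LookupError("unknown account")  # forces a rollback
        return True
    except (sqlite3.IntegrityError, LookupError):
        # Insufficient funds (CHECK failed) or a missing account: nothing was applied.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


print(transfer(conn, "me", "you", 30))     # True
print(transfer(conn, "me", "you", 500))    # False: would overdraw "me"
print(transfer(conn, "me", "nobody", 10))  # False: no such account
print(balances(conn))                      # {'me': 70, 'you': 30}
```

Note that the balances are updated in place: once the transaction commits, the previous values are gone.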

The only problem with this analogy is that no banking system has ever worked or will ever work this way. What do banks use? Credits, debits, and compensating transactions. There are financial exchanges “in flight” in various states of completion. If something goes wrong, the bank takes compensating action. Even at a bank, the answer to the question “How much money do I have in my account right now?” is a kind of fiction told to make customers feel better.
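Here is a rough sketch of the compensating approach, again in Python with hypothetical names (`Ledger`, `transfer`, and the `closed` set used to simulate a failure). No lock spans both accounts. The debit is posted first and is briefly “in flight,” and if the credit fails, the code does not erase the debit but appends a reversal. This is the general pattern often called a saga; it is not a description of any specific bank’s system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Posting:
    account: str
    amount: int  # negative = debit, positive = credit
    memo: str


@dataclass
class Ledger:
    """Append-only ledger; each post() is a small local step, with no lock spanning accounts."""
    closed: frozenset = frozenset()  # hypothetical: accounts that refuse incoming credits
    postings: list = field(default_factory=list)

    def post(self, account: str, amount: int, memo: str) -> None:
        if amount > 0 and account in self.closed:
            raise ValueError(f"{account} cannot receive funds")
        self.postings.append(Posting(account, amount, memo))

    def balance(self, account: str) -> int:
        return sum(p.amount for p in self.postings if p.account == account)


def transfer(ledger: Ledger, src: str, dst: str, amount: int) -> bool:
    ledger.post(src, -amount, f"transfer to {dst}")  # step 1: debit; money is now "in flight"
    try:
        ledger.post(dst, amount, f"transfer from {src}")  # step 2: credit
        return True
    except ValueError:
        # Step 1 already happened and is never erased; record a compensating credit instead.
        # (For simplicity, assume the source account can always accept its own reversal.)
        ledger.post(src, amount, f"reversal: transfer to {dst} failed")
        return False


ledger = Ledger(closed=frozenset({"closed-acct"}))
ledger.post("me", 100, "opening deposit")
print(transfer(ledger, "me", "you", 30))            # True
print(transfer(ledger, "me", "closed-acct", 20))    # False
print(ledger.balance("me"), ledger.balance("you"))  # 70 30
for p in ledger.postings:
    print(p.account, p.amount, p.memo)
```

After the failed transfer, the balances come out right, and the ledger still records both the attempt and its reversal instead of pretending the attempt never happened.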

Boner’s idea is to define “consistency boundaries” that describe the time, place, and circumstances in which the answer we give is correct. Outside of those boundaries is chaos. This starts to look a lot more like physics than computing, but it’s a more honest approach.
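Here is one possible reading of a “consistency boundary,” sketched in Python under stated assumptions. Each hypothetical `Account` serializes its own changes behind a single lock, and every answer it gives is labeled with the boundary and the version it is valid for. Anything that combines answers from several boundaries inherits their different versions, which is where the chaos outside the boundary starts. This is an illustration, not how Akka or Lightbend implement the idea.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    """A value labeled with where (boundary) and when (version) it is known to be correct."""
    boundary: str
    value: int
    as_of_version: int


class Account:
    """One consistency boundary: every change to it is serialized by a single lock."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._lock = threading.Lock()
        self._facts: list = []  # append-only facts inside this boundary

    def record(self, amount: int) -> int:
        with self._lock:
            self._facts.append(amount)
            return len(self._facts)  # the boundary's version after this fact

    def balance(self) -> Answer:
        with self._lock:
            return Answer(self.name, sum(self._facts), len(self._facts))


alice, bob = Account("alice"), Account("bob")
alice.record(100)
alice.record(-30)
bob.record(30)

for answer in (alice.balance(), bob.balance()):
    print(answer)
```

Each answer is exact within its own boundary at its version. A combined total built from the two answers is a statement about two different versions, and any new fact in either account makes it stale.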

That said, we have a long way to go before business and developers come to terms with how much lying they do to achieve a false sense of simplicity. The idea of “strong consistency” is ingrained in the minds of many. I mean, I recently had a client design an audit log that required updates.

If you haven’t caught Boner’s “Life Beyond the Illusion of Present,” I highly recommend it. I’d take some of the prescription with a grain of salt (obviously this is a long pitch for Lightbend/Akka), but the problem is well stated. As someone who has developed both strongly consistent and highly concurrent systems, I can say Boner’s talk makes me even less enamored of crusty old Oracle DBAs and their pre-seventh-century ways.