Getting your teeth cleaned. A trip to the DMV. Doing your taxes. Data Caching.
Painful, yet unavoidable, these are a few of life’s most frustrating phenomena. They are the kinds of things you dread for weeks, the ones that make your blood pressure skyrocket at the mere thought. While each of these topics really does deserve a blog post in its own right, today I am going to focus on one that is near and dear to my heart: data caching.
The Pain Of Caching: Same As It Ever Was
Much like paying taxes, the pain around data caching is hardly new. In fact, it is one of the oldest problems in computer science. For virtually as long as people have been writing software, data structures designed to facilitate performant data access have been a foundational part of software architecture. Unfortunately, they have also been one of the most painful, inspiring the famous quote from Phil Karlton: “There are only two hard things in Computer Science: cache invalidation and naming things.” (It turns out naming blog posts is not much easier.) For all of the progress we have made with regard to the developer experience and approachability of software development over the past few decades, you would think that something as foundational as data caching would be more or less a solved problem. But you’d be wrong. The pain of caching data today is as real as it ever was.
In fact, it’s gotten worse, driven largely by an explosion of data and ever-increasing demand for performance from users. In an intense battle for user attention, the difference between a “snappy” user experience and a sluggish one can have a real impact on a company's bottom line. This is particularly true in areas like e-commerce, where slow page loads have been shown to have a major effect on the likelihood of purchase. Worse yet, this problem is not confined to the company's home region. An increasingly global user base means developers need to solve it not in just one geography, but everywhere their users are.
Caught Between a Cache and a Hard Place
Getting this right is a pretty gnarly technical challenge, and the common solutions are not pretty. When tasked with improving database read performance, the modern application developer finds themselves caught between two equally painful options:
- Use a caching solution such as Redis or Memcached. These technologies are quite good at solving the performance and scale challenges mentioned above, but they fall short on user experience and cost of maintenance. They force the developer to do a lot of things manually, which ends up requiring a lot of complicated glue code to maintain the relationship between the cache and the underlying data store. This makes adopting a caching layer a heavy lift for an application team, and it can take months to get a project up and running. Even worse is what happens once you do. The maintenance of cache invalidation and translation logic is notoriously painful and error-prone, and often results in heisenbugs that are very difficult to track down after the fact, creating an enormous tax on developer time and overall well-being. It’s just not particularly fun work: no software developer goes to work dreaming of maintaining the translation between a database table and the data formats of Redis, and keeping the cache consistent with the data in the underlying database is a brutal problem.
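The “glue code” burden is easiest to see in a sketch. Below is a minimal, hypothetical cache-aside pattern in Python; plain dicts stand in for Redis and for a database table, but the serialization and invalidation logic is exactly the part a real team has to write and maintain by hand.

```python
import json

cache = {}  # stands in for Redis: string keys -> serialized string values
database = {42: {"id": 42, "name": "Ada", "email": "ada@example.com"}}  # stands in for a users table

def get_user(user_id):
    """Cache-aside read: check the cache first, fall back to the database."""
    key = f"user:{user_id}"
    if key in cache:
        return json.loads(cache[key])   # translate the cached string back into a dict
    row = database.get(user_id)         # the "database query"
    if row is not None:
        cache[key] = json.dumps(row)    # translate the row into the cache's format
    return row

def update_user(user_id, fields):
    """Every single write path must remember to invalidate, or the cache goes stale."""
    database[user_id].update(fields)
    cache.pop(f"user:{user_id}", None)  # forget this line and you have a heisenbug

print(get_user(42)["name"])                    # miss: hits the "database", fills the cache
update_user(42, {"name": "Ada Lovelace"})
print(get_user(42)["name"])                    # correct only because update_user invalidated
```

The hard part is not this happy path; it is making sure *every* write path in a large codebase invalidates correctly, forever.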
- Use database read replicas. Read replicas have the advantage of being “easier” to implement in the short term, which lets you get started a bit quicker than with a dedicated cache. But they can be enormously painful at scale. There are a few reasons for this, and in combination they mean read replicas may be easier up front but more painful in the long run:
- They can be very expensive. In general, a read replica needs to have the same amount of compute and storage resources as the underlying “master database” in order to work effectively. This can get very expensive very fast. In the worst case, it increases your database costs by 5x! This is probably the biggest drawback of read replicas.
- They come with a lot of overhead. Unlike a cache, which circumvents the database more or less entirely (at least in cases where you are caching data that does not change much), a read replica is still a “database hit” and comes with all of the associated overhead and competition for resources. If your database is under a lot of load when you try to read from a replica, things might still be very slow.
- They are prone to replication lag. This is a situation where the data in the read replica substantially “lags” behind the source of truth in the core database, giving the user a view of the world that is not quite accurate. Small lags are generally okay for many read-heavy use cases, but longer lags can be problematic.
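Replication lag is easy to reproduce in miniature. The sketch below is a toy model, not any real replication protocol: the replica replays the primary's write log only when `poll()` is called, so any read that lands between a write on the primary and the next poll sees stale data.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered write log shipped to replicas

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

class Replica:
    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0  # how far into the primary's log we have replayed

    def poll(self):
        """Replay any log entries we haven't seen yet (replication catch-up)."""
        for key, value in self.primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.log)

primary = Primary()
replica = Replica(primary)

primary.write("balance:alice", 100)
replica.poll()                        # replica is caught up
primary.write("balance:alice", 40)    # this write has not replicated yet
print(replica.data["balance:alice"])  # 100: a stale read inside the lag window
replica.poll()
print(replica.data["balance:alice"])  # 40: consistent again after catch-up
```

In a real deployment the “poll” is continuous and usually fast, but under heavy write load the window between the two reads above can stretch from milliseconds to minutes.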
Clearly, neither of these options is ideal. But what if it didn’t have to be this way? What if there were a solution that combined the best of both worlds, merging the ease of use and architectural simplicity of creating a read replica with the blazing performance improvements of a caching layer? What a world that would be.
Around The World In 60 (Milli)Seconds
Luckily for us, there were some PhDs on the case. While at MIT, Jon Gjengset, Alana Marzoev, and their CSAIL lab mates spent their time and immense brain power on this exact problem. The result was Noria. Named after a type of wheel used to lift water from a river into an aqueduct so that it could be distributed across the land, the project aimed to do much the same for data. A simple, streamlined caching layer that hooks directly into your existing RDBMS with no code changes, Noria enables developers to achieve tens of millions of reads per second without the need for complex translation code. It’s all just SQL. This not only removes the painful upfront cost of translation code, it also makes debugging 10x easier. You can debug issues in Noria the same way you would debug any SQL query.
Though Noria was more of a research concept than production software, the response to the approach was overwhelmingly positive. Despite zero marketing and very little maintenance over the past few years, the project has garnered over 4,000 stars and 200 forks. It’s also written in Rust, which I mention here just because that's pretty neat.
Seeing the organic response to Noria, Alana knew this approach needed to be productionized and brought to the masses. And so, in 2020, ReadySet was born to do just that. Built on the foundation that Noria laid down, ReadySet looks to take the core technical concepts and ease-of-use approach the project is known for and bring them to the world of battle-hardened infrastructure software. Globally deployable in just minutes, the process of adding ReadySet to your application is as follows:
- Deploy a ReadySet instance with the click of a button.
- Change out your application’s database connection string to point to your ReadySet instance instead of your underlying database. No code changes required.
- Choose the queries you want to cache.
That’s it. You have now gone from no caching to caching in just minutes. And because ReadySet is both MySQL and Postgres wire-protocol compatible, it should work for more or less any relational database on the market. Cache invalidation is handled for you through Noria’s dataflow: you just write your application as if the cache weren’t even there.
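The middle step above is typically just a configuration change. Here is a hedged sketch of what it might look like in an application that reads its database URL from the environment; the hostnames and the toy `connect()` helper are illustrative assumptions, not ReadySet's documented setup.

```python
import os

# Before: the app talks to Postgres directly (hostname made up for illustration).
# DATABASE_URL = "postgresql://app_user:secret@db.internal:5432/app"

# After: point the same connection string at the ReadySet instance instead.
# Because ReadySet speaks the Postgres wire protocol, nothing else changes.
os.environ["DATABASE_URL"] = "postgresql://app_user:secret@readyset.internal:5432/app"

def connect(url):
    """Stand-in for your driver's connect(); only the host in the URL differs."""
    scheme, rest = url.split("://", 1)
    host = rest.split("@", 1)[1].split(":", 1)[0]
    return {"scheme": scheme, "host": host}

conn = connect(os.environ["DATABASE_URL"])
print(conn["host"])  # readyset.internal
```

The application code issuing queries through this connection is untouched, which is the whole point: the cache lives behind the wire protocol rather than behind a new client library.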
While humanity has yet to find a clean solution to dental hygiene, automobile registration, or government paperwork (at least one that does not involve jail time), thanks to Alana and Jon, we believe we have finally found one for data caching. There are no investments I find more enjoyable than the ones where the company is building something I truly wish I had had during my days as a programmer. ReadySet is one of those companies. We are delighted to play a small part in their journey by leading ReadySet's Series A.
Published — April 5, 2022