---
title: The Battle of the Caches
author: Brett Langdon
date: 2013-08-01
template: article.jade
---
A co-worker and I set out to each build our own HTTP proxy cache.
One of them was written in Go and the other as a C++ plugin for
Kyoto Tycoon.
---
So, I know what most people are thinking: “Not another cache benchmark post,
with skewed or biased results.” Luckily, that is not what this post is about;
there are no opinionated graphs showing that my favorite caching system happens
to be better than all the others. Instead, this post is about why, at work, we
decided to write our own API caching system rather than use <a href="http://www.varnish-cache.org/" target="_blank">Varnish</a>
(a tried-and-true HTTP caching system).
Let us discuss the problem we have to solve. The system we have is a simple
request/response HTTP server that needs very low latency (a few
milliseconds, usually 2-3 on average), and we are adding a third-party HTTP API
call to almost every request that we see. I am sure some people see the issue
right away: any network call is going to add at least half a millisecond to a
full millisecond to your processing time, and that is if the two servers are in
the same datacenter; more if they are not. That is just network traffic. On top
of that we must rely on the performance of the third-party API, hoping that they
are able to maintain a consistent response time under heavy load. If, in total,
this third-party API call adds more than 2 milliseconds of response time to each
request that our system is processing, it greatly reduces the capacity of our
system: at 2-3 milliseconds per request, an extra 2 milliseconds nearly halves
the number of requests each worker can serve per second.
THE SOLUTION! Let’s use Varnish. This is the logical solution: put a caching
system in front of the API. The content we are requesting isn’t changing very often
(every few days, if that), so a cache should hide most of the latency added by the
API call. So, we tried this but had very little luck; no matter what we tried we could
not get Varnish to respond in under 2 milliseconds per request (which was a main
requirement of the solution we were looking for). That meant Varnish was out, and the
next option was to write our own caching system.
Now, before people start flooding the comments calling me a troll or yelling at me
for not trying this or that or some other thing, let me explain why
we decided to write our own cache rather than spend extra days investigating
Varnish or some other known HTTP cache. We have a fairly specific requirement for
our cache: very low and consistent latency. “Consistent” is the key word that really
matters to us. We decided fairly early on that getting no response on a cache miss
is better for our application than blocking and waiting for a response from the
proxy call. This is a very odd requirement, and most HTTP caching systems do not
support it since it almost defeats their purpose (be “slow” 1-2 times so you can be
fast all the other times). HTTP is also not a requirement on our end: the cache must
use HTTP to talk to the API server, but our application does not have to use HTTP to
talk to the cache. Headers add extra bandwidth and processing that our application
does not need.
So we decided that our ideal cache would have 3 main requirements:
1. Must have a consistent response time, returning nothing quickly rather than waiting for a proper response
2. Must support the <a href="https://github.com/memcached/memcached/blob/master/doc/protocol.txt" target="_blank">Memcached Protocol</a>
3. Must support TTLs on the cached data
This behavior works basically like so: call the cache; on a cache miss,
return an empty response right away and queue the request for a background process
to make the call to the API server. Every identical request that comes in before
the proxy call returns a result also receives an empty response, but does not add
the request to the queue again. As soon as the proxy call returns, update the cache,
and every identical call coming in will yield the proper response. After a given TTL,
consider the data in the cache old and re-fetch it.
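As a rough illustration, here is a minimal sketch of that flow in Go. The names
(`Cache`, `entry`, the `fetch` callback) and the plain in-memory map are assumptions
made for this example, not Ferrite’s actual code; both real implementations used
Kyoto Cabinet as the store underneath.

```go
package cache

import (
	"sync"
	"time"
)

// entry holds a cached value and the time it was stored, so
// reads can decide whether it has passed its TTL.
type entry struct {
	value  []byte
	stored time.Time
}

// Cache never blocks a caller on a miss; a miss answers empty
// immediately and triggers a single background fetch.
type Cache struct {
	mu       sync.Mutex
	data     map[string]entry
	inflight map[string]bool
	ttl      time.Duration
	fetch    func(key string) ([]byte, error) // the proxied API call
}

func New(ttl time.Duration, fetch func(string) ([]byte, error)) *Cache {
	return &Cache{
		data:     make(map[string]entry),
		inflight: make(map[string]bool),
		ttl:      ttl,
		fetch:    fetch,
	}
}

// Get returns the cached value, or nil immediately on a miss or
// an expired entry. The first miss for a key queues a background
// fetch; identical requests arriving before it completes also get
// nil, but do not queue another fetch.
func (c *Cache) Get(key string) []byte {
	c.mu.Lock()
	defer c.mu.Unlock()

	if e, ok := c.data[key]; ok && time.Since(e.stored) < c.ttl {
		return e.value // fresh hit: answer from the cache
	}
	if !c.inflight[key] {
		c.inflight[key] = true
		go c.refresh(key) // a goroutine instead of a worker queue
	}
	return nil // miss: respond empty rather than block
}

// refresh performs the slow proxy call off the request path and
// stores the result, at which point identical calls become hits.
func (c *Cache) refresh(key string) {
	value, err := c.fetch(key)
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.inflight, key)
	if err == nil {
		c.data[key] = entry{value: value, stored: time.Now()}
	}
}
```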
This then turned into a challenge between a co-worker,
<a href="http://late.am/" target="_blank">Dan Crosta</a>, and myself to see who
could write the better/faster caching system with these requirements. His solution,
called “CacheOrBust”, was a
<a href="http://fallabs.com/kyototycoon/" target="_blank">Kyoto Tycoon</a> plugin
written in C++ which used a subset of the memcached protocol along with
background workers and a request queue to perform the fetching. My solution,
<a href="https://github.com/brettlangdon/ferrite" target="_blank">Ferrite</a>, is a
custom server written in <a href="http://golang.org/" target="_blank">Go</a>
(originally written in C) with the same functionality (except using
<a href="http://golang.org/doc/effective_go.html#goroutines" target="_blank">goroutines</a>
rather than background workers and a queue). Both servers used
<a href="http://fallabs.com/kyotocabinet/" target="_blank">Kyoto Cabinet</a>
as the underlying caching data structure.
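To make requirement 2 concrete, here is a rough sketch of what a client-side lookup
looks like over the memcached text protocol: a plain TCP `get`, no HTTP headers
involved. The address and key below are placeholders, and the response parsing is
deliberately simplified; on a miss the server answers with just `END`, which is the
“empty response” described above.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

func main() {
	// Placeholder address: point this at the cache server.
	conn, err := net.Dial("tcp", "127.0.0.1:11211")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// memcached text protocol request: "get <key>\r\n".
	fmt.Fprintf(conn, "get %s\r\n", "some-api-request-key")

	reader := bufio.NewReader(conn)
	line, err := reader.ReadString('\n')
	if err != nil {
		panic(err)
	}
	if strings.HasPrefix(line, "END") {
		// Miss: the cache answered immediately with no value;
		// the real fetch is now happening in the background.
		fmt.Println("cache miss")
		return
	}
	// Hit: "VALUE <key> <flags> <bytes>\r\n" followed by the data
	// block. (Simplified: assumes the value contains no newlines.)
	data, _ := reader.ReadString('\n')
	fmt.Print("cache hit: ", data)
}
```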
So… results already! As with most competitions, it is always a
sad day when there is a tie. That’s right: two similar solutions, written in two
different programming languages, yielded similar results (we probably have
Kyoto Cabinet to thank). Both of our caching systems were able to give us the
results we wanted, **consistent** sub-millisecond response times, averaging about
0.5-0.6 milliseconds per response (different physical servers, but the same datacenter),
regardless of whether the response was a cache hit or a cache miss. Usually the
moral of the story is: “do not re-invent the wheel, use something that already
exists that does what you want,” but realistically sometimes this isn’t an option.
Sometimes you have to bend the rules a little to get exactly what your application
needs; especially when dealing with low latency systems, every millisecond counts.
Just be smart about the decisions you make and make sure you have sound
justification for them, especially if you decide to build it yourself.