Consistent Hashing
What is hash? According to vocabulary.com, hash is “chopped meat mixed with potatoes and browned”. In computer world, hashing bears similar meaning, objects come in with different format disparate content and it’s the job of hashing to chop them, mix with necessary other ingredients then finally present them in a uniform way, a.k.a, fixed length value.
One prominent usage of hashing is hashtable. A hash table is as simple as: given an arbitrary key can be uniquely mapped to a value. Fitting a small hashtable in memory on a single machine is trivial, but on a larger scale, when the size of the hashtable stretches beyond the limitation of a physical machine, we need to resort to something called consistent hashing.
Consistent hashing is a way of breaking a giant hashtable into manageable smaller hashatables which fit into single physical machine. Imagine it as a round canal with water moving in one direction. Boats (or hash keys) are tossed randomly into the canal and move along with the current and. Ultimately, the boats need to be loaded with goodies (or hash values), before they can exit the canal. This can only be done when boats arrive at the next closest station (or physical machines). In this sense, consistent hashing solves the following problems:
- Toss a boat and you will get a boatload of goodies
- Stations can be added and removed freely, if too many boats are in some parts of the canal, it might be wise to add more stations; whereas stations can be suspended if boat traffic is light. The following code snippets demonstrate how these two scenarios work:
The above-mentioned functions can be abstracted into the following class interface
1 | class ConsistentHashing: |
To lay the ground for this data structure, we need to first construct a Ring
class that add or override the following functions:
1 | class Ring(dict): |
The full code will look like this, I put comments inline:
1 | from collections import OrderedDict |
How does this work?
- We start with a ring containing 4 nodes (an extra inf node to handle boundary condition)
- Key value pairs are added accordingly, for example, key 1 is tosses onto the ring and it follows the downsteam to node 5
- The result of the ring can be seen below
- A new node 3 enters the ring, so the key 1 and its value gets moved to node 3
- Node 3, 10, 20 leave the ring, the key value pairs are shuffled accordingly.
To demonstrate, we start from the interface, a consistent hashing data structure should be able to do the following