EECS 498: Introduction to Distributed Systems
Fall 2017
Harsha V. Madhyastha

Looking back and ahead

So far:
- Different ways of replicating state
- Tradeoffs between replication and consistency
- Impact of strong vs. weaker consistency models

Rest of the semester:
- Scaling up RSMs
- Case studies: Amazon Dynamo, Azure storage, Google Spanner, Bitcoin, …

What have we overlooked?

Say each server can serve R PUTs/sec
- Rate at which the RSM in Project 2 can serve PUTs?
- What about in Project 3?

As you add more servers:
- Every replica still has to handle every operation, so total throughput stays capped at R
- The added capacity sits unutilized

Horizontal scaling better than vertical scaling
- Adding more commodity servers >> beefing up individual servers

Scaling up

Assumption so far: All replicas have the entire state
- Example: Every replica has a value for every key

What we need instead:
- Partition state
- Map partitions to servers

Partitioning state

Modulo hashing:
- Apply a hash function to the key
- Take the result modulo the number of servers (N)
- Store the (key, value) pair at server hash(key) mod N

Example: Store students’ transcripts across 4 servers
- Hash function = (Year of birth) mod 4: most students are born within a few years of each other, so a few servers get most of the load
- Hash function = (Date of birth) mod 4: spreads load far more evenly

Problem: Skew in load across servers
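A minimal sketch of modulo hashing in Go (the slides don’t fix a hash function; FNV-1a and the serverFor name are assumptions for illustration):

package main

import (
	"fmt"
	"hash/fnv"
)

// serverFor maps a key to one of n servers via hash(key) mod n.
func serverFor(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % n
}

func main() {
	for _, key := range []string{"alice", "bob", "carol"} {
		fmt.Printf("%q -> server %d of 4\n", key, serverFor(key, 4))
	}
}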

Problem for modulo hashing: Changing the number of servers

h(x) = x + 1 (mod 4)
Add one machine: h(x) = x + 1 (mod 5)

[Figure: server assignment (0–4) vs. object serial number; nearly every object maps to a different server after the change]

Keys remapped to new nodes → need to transfer values
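A small sketch (example object count assumed) that measures how many objects change servers when a fifth machine is added under h(x) = x + 1 (mod N):

package main

import "fmt"

func main() {
	moved := 0
	for x := 0; x < 1000; x++ {
		if (x+1)%4 != (x+1)%5 {
			moved++
		}
	}
	fmt.Printf("%d of 1000 objects remapped\n", moved) // prints 800
}

Only objects where (x+1) mod 20 falls in {0, 1, 2, 3} keep their server, so 4/5 of all values must be transferred.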

Consistent Hashing

Represent hash space as a circle
- Partition keys across servers

[Figure: hash ring labeled 0, 4, 8, 12 with servers S1–S4 placed around it; the arc between a server and its predecessor is its shard]

Assign every server a random ID
- Hash the server ID to place it on the circle
- A server is responsible for the keys between its predecessor and itself (its shard)

How to map a key to a server?
- Hash the key and execute the read/write at its successor
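A minimal consistent-hashing sketch in Go; the Ring/Lookup names and FNV-1a are assumptions for illustration, since the slides specify only the ring structure and the successor rule:

package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// Ring keeps server positions sorted around the hash circle.
type Ring struct {
	points  []uint32          // sorted hash positions
	servers map[uint32]string // position -> server ID
}

func NewRing(ids []string) *Ring {
	r := &Ring{servers: map[uint32]string{}}
	for _, id := range ids {
		p := hash32(id)
		r.points = append(r.points, p)
		r.servers[p] = id
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Lookup returns the successor of hash(key): the first server position
// clockwise from the key, wrapping around at the top of the circle.
func (r *Ring) Lookup(key string) string {
	h := hash32(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the circle
	}
	return r.servers[r.points[i]]
}

func main() {
	ring := NewRing([]string{"S1", "S2", "S3", "S4"})
	fmt.Println(ring.Lookup("some-key"))
}

Because Lookup depends only on positions around the circle, adding or removing a server changes the owner of just one arc of keys, as the next slide shows.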

Adding/Removing Nodes

[Figure: three hash rings showing the original servers S1–S4, the ring after S5 is added, and the ring after a server is removed]

Minimizes migration of state upon a change in the set of servers
- Server addition: New server splits its successor’s shard
- Server removal: Successor takes over the departed server’s shard
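A self-contained sketch (server positions assumed for illustration) confirming that adding one server moves only the keys in the shard it splits:

package main

import (
	"fmt"
	"sort"
)

// owner returns the successor position for hash h among sorted points.
func owner(points []uint32, h uint32) uint32 {
	i := sort.Search(len(points), func(i int) bool { return points[i] >= h })
	if i == len(points) {
		i = 0 // wrap around the circle
	}
	return points[i]
}

func main() {
	before := []uint32{100, 200, 300, 400}     // S1..S4 positions
	after := []uint32{100, 200, 250, 300, 400} // S5 joins at 250

	moved := 0
	for h := uint32(0); h < 500; h++ {
		if owner(before, h) != owner(after, h) {
			moved++
		}
	}
	fmt.Printf("%d of 500 key hashes moved\n", moved) // prints 50
}

Only hashes in (200, 250] change owner; every other shard is untouched.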

Virtual nodes

Each server gets multiple (say v) random IDs
- Each ID corresponds to a virtual node

If N servers each have v virtual nodes, each virtual node owns 1/(vN)th of the hash space (in expectation)
- Larger v → better load balancing
- Vary v across servers to account for heterogeneity
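A sketch of virtual nodes, extending the ring above: each server is hashed under v derived IDs, so it owns v scattered shards. The "S1#0"-style suffix scheme is an assumption for illustration:

package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// VRing maps virtual-node positions back to physical servers.
type VRing struct {
	points  []uint32
	servers map[uint32]string // virtual node position -> physical server
}

// NewVRing places v virtual nodes on the ring for each server.
func NewVRing(ids []string, v int) *VRing {
	r := &VRing{servers: map[uint32]string{}}
	for _, id := range ids {
		for i := 0; i < v; i++ {
			p := hash32(fmt.Sprintf("%s#%d", id, i))
			r.points = append(r.points, p)
			r.servers[p] = id
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Lookup finds the successor virtual node and returns its physical server.
func (r *VRing) Lookup(key string) string {
	h := hash32(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.servers[r.points[i]]
}

func main() {
	ring := NewVRing([]string{"S1", "S2", "S3", "S4"}, 8)
	fmt.Println(ring.Lookup("some-key"))
}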

Virtual nodes

[Figure: hash ring with virtual nodes S1.1, S1.2, S1.3, S2.1, S2.2 interleaved around it]

What happens upon server failure?
- The failed server’s v shards are taken over by v successor virtual nodes, likely spread across v different servers
- Each of those servers now stores (v+1)/v × 1/Nth of the hash space: its own v shards plus one inherited shard, each covering ~1/(vN) of the space

Using Consistent Hashing

How does a client map keys to servers?

[Figure: clients send requests to front-ends, which route each request to the appropriate server]

Front-ends must agree on the set of active servers

Distributed Hash Table

Scalable lookup of the node responsible for any key
- Scales to thousands (or even millions) of nodes
- No one node knows all nodes in the system

Example usage: Trackerless BitTorrent
- Key = file content hash
- Value = IP addresses of nodes that have the file content

Successor pointers

[Figure: ring of nodes N10, N32, N60, N90, N105, N120; a lookup for key K80 is forwarded node by node until it reaches N90, the successor of K80]

If you don’t have the value for a key, forward the request to your successor

Downside of this approach? O(N) lookup
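A runnable sketch of successor-only lookup, using the node IDs and key from the figure (the Go structure itself is assumed):

package main

import "fmt"

// Node knows only its own ID and its successor on the ring.
type Node struct {
	id   uint32
	succ *Node
}

// inRange reports whether key lies in the half-open ring interval (a, b].
func inRange(key, a, b uint32) bool {
	if a < b {
		return a < key && key <= b
	}
	return key > a || key <= b // interval wraps past zero
}

// lookup forwards the query around the ring until the key falls between
// a node and its successor; the successor is then the owner.
func (n *Node) lookup(key uint32, hops int) (*Node, int) {
	if inRange(key, n.id, n.succ.id) {
		return n.succ, hops
	}
	return n.succ.lookup(key, hops+1)
}

func main() {
	// The six nodes from the figure, linked into a ring.
	ids := []uint32{10, 32, 60, 90, 105, 120}
	nodes := make([]*Node, len(ids))
	for i, id := range ids {
		nodes[i] = &Node{id: id}
	}
	for i := range nodes {
		nodes[i].succ = nodes[(i+1)%len(nodes)]
	}
	owner, hops := nodes[0].lookup(80, 0) // N10 looks up K80
	fmt.Printf("K80 is owned by N%d, reached after %d hops\n", owner.id, hops)
}

Each node only knows its successor, so a lookup may have to walk most of the ring: O(N) hops in the worst case.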

Efficient lookups

What’s required to enable O(1) lookups?
- Every node must know all other nodes

Instead, convert the linear search into a binary search

Idea: Maintain log(N) pointers to other nodes, called a finger table
- Pointer to the node ½-way across the hash space
- Pointer to the node ¼-way across the hash space
- …

Finger tables

i’th entry at node n points to the successor of hash(n) + 2^i
- # of entries = # of bits in the hash value

[Figure: node N80 with fingers spanning ½, ¼, 1/8, 1/16, 1/32, and 1/64 of the hash space]

Threaded through other nodes’ finger tables
- Forms a binary lookup tree rooted at every node

Finger tables

Node n’s finger table:
- Succ of hash(n)
- Succ of hash(n) + 2
- Succ of hash(n) + 2^2
- …
- Succ of hash(n) + (max hash)/2

How to recursively use finger tables to locate the node responsible for key k?
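A sketch of finger-table construction, assuming a 7-bit hash space and a globally known, sorted list of node IDs (in a real DHT each node discovers its fingers from peers, not from a global list):

package main

import (
	"fmt"
	"sort"
)

const m = 7 // bits in the hash space: IDs live in [0, 128)

// successor returns the first node ID clockwise from point p.
func successor(ids []uint32, p uint32) uint32 {
	i := sort.Search(len(ids), func(i int) bool { return ids[i] >= p })
	if i == len(ids) {
		i = 0 // wrap around the circle
	}
	return ids[i]
}

func main() {
	ids := []uint32{10, 32, 60, 80, 90, 105, 120} // sorted node IDs
	n := uint32(80)                               // build N80's fingers
	for i := 0; i < m; i++ {
		p := (n + uint32(1)<<i) % (1 << m) // hash(n) + 2^i (mod 2^m)
		fmt.Printf("finger[%d] = succ(%d) = N%d\n", i, p, successor(ids, p))
	}
}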

Lookup with finger table

Lookup(key k, node n):
  look in n’s local finger table for the highest f s.t. hash(f) < hash(k)  (using modulo arithmetic)
  if f exists:
    call Lookup(k, f)     // next hop
  else:
    return n’s successor  // done
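The pseudocode above as a runnable Go sketch, under the same simplified setup as the previous block; inBetween implements the "modulo arithmetic" ring comparison:

package main

import (
	"fmt"
	"sort"
)

const m = 7 // bits in the hash space: IDs live in [0, 128)

type Node struct {
	id      uint32
	succ    *Node
	fingers []*Node // fingers[i] = successor of id + 2^i
}

// inBetween reports whether x lies strictly inside the ring interval (a, b).
func inBetween(x, a, b uint32) bool {
	if a < b {
		return a < x && x < b
	}
	return x > a || x < b // interval wraps past zero
}

// Lookup mirrors the slide's pseudocode: take the highest finger that lands
// between this node and the key; if none does, our successor owns the key.
func (n *Node) Lookup(k uint32, hops int) (*Node, int) {
	for i := len(n.fingers) - 1; i >= 0; i-- {
		if f := n.fingers[i]; inBetween(f.id, n.id, k) {
			return f.Lookup(k, hops+1) // next hop
		}
	}
	return n.succ, hops // done
}

func main() {
	ids := []uint32{10, 32, 60, 80, 90, 105, 120}
	nodes := map[uint32]*Node{}
	for _, id := range ids {
		nodes[id] = &Node{id: id}
	}
	succOf := func(p uint32) *Node { // first node clockwise from p
		p %= 1 << m
		i := sort.Search(len(ids), func(i int) bool { return ids[i] >= p })
		if i == len(ids) {
			i = 0
		}
		return nodes[ids[i]]
	}
	for _, id := range ids {
		n := nodes[id]
		n.succ = succOf(id + 1)
		for i := 0; i < m; i++ {
			n.fingers = append(n.fingers, succOf(id+uint32(1)<<i))
		}
	}
	owner, hops := nodes[10].Lookup(80, 0)
	fmt.Printf("key 80 -> N%d after %d hops\n", owner.id, hops)
}

Each hop at least halves the remaining ring distance to the key, which is why lookups take O(log N) hops.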

Is log(N) lookup fast or slow?

For a million nodes, log(N) is 20 hops
- If each hop takes 50 ms, a lookup takes a second
- If each hop has a 10% chance of failure, expect a couple of timeouts per lookup (20 × 0.1 = 2 expected failures)

So log(N) is better than O(N), but not great

Handling churn in nodes

Finger tables must be updated upon addition or removal of nodes
- Hard to preserve consistency in the face of these changes