Slide #1.

EECS 498: Introduction to Distributed Systems
Fall 2017
Harsha V. Madhyastha


Slide #2.

Looking back and ahead

So far:
- Different ways of replicating state
- Tradeoffs between replication and consistency
- Impact of strong vs. weaker consistency models

Rest of the semester:
- Scaling up RSMs
- Case studies: Amazon Dynamo, Azure storage, Google Spanner, Bitcoin, ...


Slide #3.

What have we overlooked?

- Say each server can serve R PUTs/sec
- At what rate can the RSM in Project 2 serve PUTs? What about in Project 3?
- As you add more servers:
  - Every replica has to handle every operation
  - Idle servers go unutilized
- Horizontal scaling is better than vertical scaling
  - Adding more commodity servers >> beefing up individual servers


Slide #4.

Scaling up

Assumption so far: All replicas have the entire state
- Example: Every replica has a value for every key

What we need instead:
- Partition state
- Map partitions to servers


Slide #5.

Partitioning state

Modulo hashing:
- Apply a hash function to the key
- Compute the modulo with the number of servers (N)
- Store the (key, value) pair at server hash(key) mod N

Example: Store students' transcripts across 4 servers
- Hash function = (Year of birth) mod 4
- Hash function = (Date of birth) mod 4

Problem: Skew in load across servers
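As a rough illustration (not code from the lecture), modulo-hashing placement might look like the sketch below; the key name and the choice of MD5 as the hash function are arbitrary.

    import hashlib

    N = 4  # number of servers

    def server_for(key):
        """Store the (key, value) pair at server hash(key) mod N."""
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return digest % N

    print(server_for("student:1234:transcript"))   # some server in 0..3

A hash function that spreads keys evenly avoids the skew above; using a clustered attribute such as year of birth concentrates most of the load on a few servers.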


Slide #6.

Problem for modulo hashing: Changing number of servers

- h(x) = x + 1 (mod 4)
- Add one machine: h(x) = x + 1 (mod 5)
- Keys remapped to new nodes → need to transfer values

[Figure: server assignment (servers 0-4) vs. object serial number, before and after adding a machine; nearly every object moves to a different server]
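A quick check of how disruptive this is, using the slide's h(x) = x + 1 placement (the key range 0..999 is an arbitrary choice for illustration):

    keys = range(1000)
    moved = sum(1 for x in keys if (x + 1) % 4 != (x + 1) % 5)
    print(moved / len(keys))   # ~0.8: roughly 80% of keys change servers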


Slide #7.

Consistent Hashing

- Represent the hash space as a circle
- Partition keys across servers:
  - Assign every server a random ID
  - Hash the server ID to a position on the circle
  - Each server is responsible for the shard of keys between its predecessor and itself
- How to map a key to a server?
  - Hash the key and execute the read/write at its successor

[Figure: hash ring with positions 0, 4, 8, 12 marked and servers S1-S4 placed around it; the arc between a server and its predecessor is its shard]
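A minimal consistent-hashing sketch under these rules (not the lecture's code; the 16-bit ring, MD5, and the server names are illustrative assumptions):

    import bisect
    import hashlib

    RING_BITS = 16

    def h(s):
        """Hash a string to a position on the ring."""
        return int(hashlib.md5(s.encode()).hexdigest(), 16) % (1 << RING_BITS)

    class Ring:
        def __init__(self, servers):
            # Place each server at the hash of its ID, sorted around the circle.
            self.ring = sorted((h(s), s) for s in servers)

        def lookup(self, key):
            """Return the successor of hash(key): the server owning this key."""
            i = bisect.bisect_left(self.ring, (h(key), "")) % len(self.ring)
            return self.ring[i][1]

    ring = Ring(["S1", "S2", "S3", "S4"])
    print(ring.lookup("user:42"))   # the server where this key's reads/writes go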


Slide #8.

Adding/Removing Nodes

[Figure: three hash rings showing servers S1-S4, the ring after adding S5, and the ring after a server is removed]

Minimizes migration of state upon a change in the set of servers:
- Server addition: the new server splits its successor's shard
- Server removal: the successor takes over the departed server's shard
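To see why migration is minimal, the toy example below places servers at assumed positions on a 16-slot ring and checks which keys change owner when a fifth server is added at position 4; only keys carved out of the successor's shard move.

    import bisect

    def owner(ring, key):
        """Successor of key: first server position >= key, wrapping around."""
        return ring[bisect.bisect_left(ring, key) % len(ring)]

    before = sorted([2, 6, 10, 14])      # assumed positions of S1..S4
    after  = sorted(before + [4])        # add S5 at position 4

    moved = [k for k in range(16) if owner(before, k) != owner(after, k)]
    print(moved)   # [3, 4]: only keys split off from the successor's shard move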


Slide #9.

Virtual nodes

- Each server gets multiple (say v) random IDs
  - Each ID corresponds to a virtual node
  - With N servers and v virtual nodes per server, each virtual node owns 1/(vN)th of the hash space
- Larger v → better load balancing
- Vary v across servers to account for heterogeneity
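A sketch of virtual nodes (again illustrative, not the lecture's code: MD5, the "S1#0" suffix scheme, and v = 8 are assumptions):

    import bisect
    import hashlib

    def h(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16) % (1 << 16)

    def build_ring(servers, v=8):
        """Place v virtual nodes per server; each owns ~1/(vN) of the space."""
        return sorted((h(f"{s}#{i}"), s) for s in servers for i in range(v))

    def lookup(ring, key):
        i = bisect.bisect_left(ring, (h(key), "")) % len(ring)
        return ring[i][1]   # the physical server behind the virtual node

    ring = build_ring(["S1", "S2", "S3", "S4"], v=8)
    print(lookup(ring, "user:42"))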


Slide #10.

Virtual nodes

[Figure: hash ring with virtual nodes S1.1, S1.2, S1.3, S2.1, S2.2 interleaved around positions 0-14]

What happens upon server failure?
- The failed server's v shards are taken over by its v successors (typically v different servers)
- Each of those servers now stores (v+1)/v × 1/Nth of the hash space


Slide #11.

Using Consistent Hashing

How does a client map keys to servers?

[Figure: clients send requests to front-ends, which route each key to the responsible server]

Front-ends must agree on the set of active servers


Slide #12.

Consistent Hashing Impact

- Widely used in key-value stores
  - Memcached
  - Cassandra
  - ...
- Limited scalability if strong consistency is desired


Slide #13.

Distributed Hash Table

- Scalable lookup of the node responsible for any key
  - Scale to thousands (or even millions) of nodes
  - No one node knows all nodes in the system
- Example usage: Trackerless BitTorrent
  - Key = file content hash
  - Value = IP addresses of nodes that have the file content
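A rough sketch of the BitTorrent use case (the dht object and its put/get interface are hypothetical placeholders, not the real Mainline DHT protocol):

    import hashlib

    def announce(dht, content, my_ip):
        key = hashlib.sha1(content).hexdigest()   # key = hash of the file content
        dht.put(key, my_ip)                       # add my IP to the set of holders (hypothetical interface)

    def find_peers(dht, content):
        key = hashlib.sha1(content).hexdigest()
        return dht.get(key)                       # list of peer IPs to download from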


Slide #14.

Successor pointers

[Figure: ring of nodes N10, N32, N60, N90, N105, N120; a lookup for K80 is forwarded along successor pointers until it reaches N90, whose shard contains it]

- If you don't have the value for a key, forward the request to your successor
- Downside of this approach? O(N) lookups
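A sketch of successor-only lookup on the ring from the figure (the starting node is an assumption), showing why the hop count grows linearly with the number of nodes:

    nodes = sorted([10, 32, 60, 90, 105, 120])   # node IDs from the figure

    def owns(n, key):
        """n owns keys in (predecessor(n), n], wrapping around the ring."""
        pred = nodes[(nodes.index(n) - 1) % len(nodes)]
        return (pred < key <= n) if pred < n else (key > pred or key <= n)

    def lookup(start, key):
        hops, n = 0, start
        while not owns(n, key):
            n = nodes[(nodes.index(n) + 1) % len(nodes)]   # forward to successor
            hops += 1
        return n, hops

    print(lookup(32, 80))   # (90, 2): K80 is forwarded along successors to N90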


Slide #15.

Efficient lookups

- What's required to enable O(1) lookups?
  - Every node must know all other nodes
- Need to convert linear search into binary search
- Idea: Maintain log(N) pointers to other nodes
  - Called a finger table
  - Pointer to the node ½-way across the hash space
  - Pointer to the node ¼-way across the hash space
  - ...


Slide #16.

Finger tables

- The i'th entry at node n points to the successor of hash(n) + 2^i
  - Number of entries = number of bits in the hash value
- Threaded through other nodes' finger tables
- A binary lookup tree rooted at every node

[Figure: node N80's fingers reach ½, ¼, 1/8, 1/16, 1/32, 1/64 of the way across the hash space]
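For instance, with a 7-bit hash (so the ring has 128 positions, matching the node IDs used on these slides), node N80's finger targets would be:

    M = 7                      # bits in the hash value

    def finger_targets(n):
        """Target of the i'th finger: (n + 2^i) mod 2^M; the finger itself
        points to the successor of this position."""
        return [(n + (1 << i)) % (1 << M) for i in range(M)]

    print(finger_targets(80))   # [81, 82, 84, 88, 96, 112, 16]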


Slide #17.

Finger tables

Node n's finger table entries (one per bit of the hash):
Succ of hash(n)+1, Succ of hash(n)+2, Succ of hash(n)+2^2, ..., Succ of hash(n)+(max hash)/2

How to recursively use finger tables to locate the node responsible for key k?


Slide #18.

Lookup with finger table

    Lookup(key k, node n)
        look in n's local finger table for the
            highest f s.t. hash(f) < hash(k)   // comparisons use modulo arithmetic
        if f exists
            call Lookup(k, f)        // next hop
        else
            return n's successor     // done
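A runnable version of the pseudocode above (a Chord-style sketch; the 7-bit ring and the particular node IDs are assumptions chosen to match the figures, and the "modulo arithmetic" comparison is made explicit with an interval check):

    import bisect

    M = 7
    RING = 1 << M
    nodes = sorted([5, 10, 20, 32, 60, 80, 99, 110])   # assumed node IDs

    def successor(pos):
        """First node at or after pos, wrapping around the ring."""
        return nodes[bisect.bisect_left(nodes, pos) % len(nodes)]

    def fingers(n):
        """i'th finger of n: successor of (n + 2^i) mod 2^M."""
        return [successor((n + (1 << i)) % RING) for i in range(M)]

    def between(x, a, b):
        """True if x lies strictly inside the clockwise interval (a, b)."""
        return (a < x < b) if a < b else (x > a or x < b)

    def lookup(n, k):
        # Forward to the farthest finger that still precedes k on the ring.
        for f in reversed(fingers(n)):
            if between(f, n, k):
                return lookup(f, k)              # next hop
        return successor((n + 1) % RING)         # done: n's successor owns k

    print(lookup(80, 19))   # 20: the node responsible for key 19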


Slide #19.

Lookups take O(log N) hops

[Figure: ring with nodes N5, N10, N20, N32, N60, N80, N99, N110; Lookup(K19) is resolved at N20 after a logarithmic number of finger-table hops]


Slide #20.

Is log(N) lookup fast or slow?

- For a million nodes, it's 20 hops
- If each hop takes 50 ms, lookups take a second
- If each hop has a 10% chance of failure, that's a couple of timeouts per lookup
- So log(N) is better than O(N), but not great
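The arithmetic behind these numbers (the 50 ms hop latency and 10% failure rate are the slide's assumed values):

    import math

    n = 1_000_000
    hops = math.ceil(math.log2(n))     # ~20 hops for a million nodes
    print(hops)                        # 20
    print(hops * 50, "ms")             # 1000 ms: about a second per lookup
    print(hops * 0.10)                 # ~2 expected failed hops (timeouts)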


Slide #21.

Handling churn in nodes

- Need to update finger tables upon the addition or removal of nodes
- Hard to preserve consistency in the face of these changes