Slide #1.

The Design and Implementation of a Log-Structured File System
Mendel Rosenblum and John K. Ousterhout
Presented by Travis Bale


Slide #2.

Overview
• Considerations for File Systems
• Problems with Existing File Systems
• Log-Structured File System
• Sprite LFS Implementation Details
• Sprite LFS Evaluation
• Solid State and Log-Structured File Systems
• Questions / Discussion


Slide #3.

Considerations for File Systems
• Technology
  - Processors
  - Disks
  - Main Memory
• Workloads
  - Office
  - Engineering Environments


Slide #4.

Technology Considerations
• Processors
  - Getting exponentially faster
  - Unbalanced systems
• Disk
  - Components of disk access
    ▪ Transfer bandwidth
    ▪ Access time
• Main Memory
  - Increasing at an exponential rate
  - Caches
    ▪ Reduce read operations to disk
    ▪ Write buffers


Slide #5.

Workload Considerations
• Files tend to be small
• Small random disk I/Os
• Creation and deletion dominated by updates to metadata


Slide #6.

Problems with Existing File Systems
• Information Spreading
• Synchronous Writes


Slide #7.

Information Spreading
• Information is spread around the disk, so small accesses are frequent
• Unix FFS: separates files, file attributes, and directory entries
• Unix FFS: takes five disk I/Os with seeks to create a new file


Slide #8.

Synchronous Writes
• Defeat the use of the cache as a write buffer
• Unix FFS: writes file attributes and metadata structures synchronously
• NFS: synchronous operations improve crash recovery at the cost of write performance


Slide #9.

Log-Structured File Systems
• Goal: improve write performance
  - Buffer file system changes in a cache
  - Write the changes sequentially to disk in a single large operation (sketch below)
• Two issues in achieving this goal
  - Retrieving information from the log
  - Managing free space
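To make the write path concrete, here is a minimal sketch of the buffering idea in C. The structure, sizes, and function names are illustrative assumptions, not Sprite's actual code: dirty blocks accumulate in an in-memory segment and are pushed to disk with one large sequential write.

```c
/* Illustrative sketch only: dirty blocks are gathered into an in-memory
 * segment buffer and flushed with a single large sequential write,
 * instead of many small scattered writes. */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE   4096
#define SEGMENT_SIZE (1024 * 1024)        /* 1 MB segment, as in Sprite LFS */

struct segment_buffer {
    char   data[SEGMENT_SIZE];
    size_t used;                          /* bytes filled so far            */
    off_t  disk_offset;                   /* where this segment will land   */
};

/* Append one dirty block; returns -1 if the segment is full and must be
 * flushed first. */
int seg_append(struct segment_buffer *seg, const void *block)
{
    if (seg->used + BLOCK_SIZE > SEGMENT_SIZE)
        return -1;
    memcpy(seg->data + seg->used, block, BLOCK_SIZE);
    seg->used += BLOCK_SIZE;
    return 0;
}

/* Flush the whole segment in one sequential disk operation. */
int seg_flush(int disk_fd, struct segment_buffer *seg)
{
    if (pwrite(disk_fd, seg->data, seg->used, seg->disk_offset) != (ssize_t)seg->used)
        return -1;
    seg->used = 0;
    return 0;
}
```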


Slide #10.

Sprite LFS Data Structures


Slide #11.

Sprite LFS structure compared to Unix FFS


Slide #12.

Reading a File from Sprite LFS
[Diagram: Checkpoint Region → Inode Map (cached in memory) → Inode → Data Block; lookup sketch below]
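A hedged sketch of the lookup chain the diagram implies. The on-disk layouts and the disk_read helper below are assumptions for illustration, not Sprite's real format: the checkpoint region locates the inode map, the (normally memory-cached) inode map gives the disk address of a file's inode, and the inode gives the address of the data block.

```c
/* Illustrative read path for a log-structured file system; structures and
 * helpers are hypothetical, not Sprite's actual on-disk format. */
#include <stdint.h>

struct checkpoint_region { uint64_t imap_addr; };               /* locates inode map  */
struct imap_entry        { uint64_t inode_addr; uint32_t version; };
struct lfs_inode         { uint64_t block_addr[10]; };          /* direct blocks only */

/* Assumed helper: read the on-disk object at 'addr' into 'buf'. */
extern int disk_read(uint64_t addr, void *buf);

int lfs_read_block(const struct checkpoint_region *cp,
                   uint32_t inum, uint32_t blkno, void *buf)
{
    struct imap_entry imap[1024];   /* inode map chunk, usually cached in RAM */
    struct lfs_inode  ino;

    if (disk_read(cp->imap_addr, imap) < 0)          /* 1. checkpoint -> inode map */
        return -1;
    if (disk_read(imap[inum].inode_addr, &ino) < 0)  /* 2. inode map -> inode      */
        return -1;
    return disk_read(ino.block_addr[blkno], buf);    /* 3. inode -> data block     */
}
```

Once the inode map is cached in memory, a read costs roughly the same number of disk accesses as in Unix FFS.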


Slide #13.

Free Space Management
• Fragmentation results from deleted and overwritten files
• Two approaches to reduce fragmentation
  - Threading
    ▪ Leave live data in place and thread the log through the free extents
    ▪ Reduces the ability to write sequentially
  - Copying
    ▪ Copy live data and append it to the front of the log
    ▪ Leaves larger free extents for use
    ▪ Copying is very expensive for long-lived files
• Sprite LFS uses a combination of both threading and copying


Slide #14.

Threading vs. Copying


Slide #15.

Sprite LFS Segments
• The disk is divided into large fixed-size segments
  - Either 512 kilobytes or 1 megabyte
• Live data in a segment is copied out if the segment needs to be rewritten
• The system collects long-lived data together
  - Those segments can be skipped during the copy procedure
• The log is threaded segment-by-segment


Slide #16.

Sprite LFS Segment Cleaning
• Refers to copying live data out of segments
• Read a number of segments into memory
• Identify the live data (sketch below)
  ▪ Segment summary block
  ▪ Uid/version in the inode map
• Write the live data to a smaller number of clean segments
• No free list is needed
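A small sketch of the liveness check, assuming hypothetical structure layouts: each segment summary entry records which file and block occupy a slot, plus a version/uid; a block whose version no longer matches the inode map's current version belongs to a deleted or rewritten file and can be discarded instead of copied.

```c
/* Illustrative liveness check used during segment cleaning; the layouts are
 * assumptions, not the exact Sprite LFS summary format. */
#include <stdbool.h>
#include <stdint.h>

struct summary_entry { uint32_t inum; uint32_t version; uint32_t blkno; };
struct imap_entry    { uint64_t inode_addr; uint32_t version; };

/* Cheap first check: a stale version means the file was deleted or
 * truncated, so the block is certainly dead. */
bool block_maybe_live(const struct summary_entry *se,
                      const struct imap_entry *imap)
{
    return imap[se->inum].version == se->version;
    /* A full check would also read the inode and confirm it still points
     * at this block before copying the block to a clean segment. */
}
```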


Slide #17.

Segment Cleaning Policy Questions
• When should the segment cleaner execute?
• How many segments should it clean at a time?
• Which segments should be cleaned?
• How should the live blocks be grouped when they are written out?


Slide #18.

Write Cost
• Used to compare cleaning policies
• The average amount of time the disk is busy per byte of new data written, including all cleaning overheads
• 1.0 is perfect; a write cost of N means only 1/N of the disk's bandwidth goes to new data (formula below)
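For reference, the formula from the paper: if the cleaner works on N segments whose average utilization is u, it must read the N segments and write back N·u of live data in order to free N·(1−u) of space for new data, so

```latex
\text{write cost}
  = \frac{\text{total bytes read and written}}{\text{new data written}}
  = \frac{N + Nu + N(1-u)}{N(1-u)}
  = \frac{2}{1-u} \qquad (0 < u < 1)
```

At u = 0.8 the write cost is 10, i.e. only a tenth of the bandwidth reaches new data; a segment that is entirely empty (u = 0) need not be read at all, so its write cost is 1.0.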


Slide #19.

Write Cost of Sprite LFS


Slide #20.

Cleaning Policy Simulation
• Models the file system as a fixed number of 4-kbyte files
• The simulator overwrites files using different access patterns
  - Uniform
  - Hot-and-Cold


Slide #21.

Greedy Cleaning Policy
• The cleaner chooses the least-utilized segments to clean (sketch below)
• For the Hot-and-Cold distribution, the cleaner also sorts the live data by age
• Cold blocks then tended to end up in different segments from hot blocks
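A trivial sketch of the greedy selection step, with hypothetical field names: sort the candidate segments by utilization and clean from the least-utilized end.

```c
/* Illustrative greedy selection: after sorting, the front of the array holds
 * the least-utilized segments, i.e. the cleaning candidates. */
#include <stdlib.h>

struct segment_info { unsigned id; double utilization; /* live fraction, 0..1 */ };

static int by_utilization(const void *a, const void *b)
{
    const struct segment_info *sa = a, *sb = b;
    return (sa->utilization > sb->utilization) - (sa->utilization < sb->utilization);
}

void greedy_sort(struct segment_info *segs, size_t count)
{
    qsort(segs, count, sizeof *segs, by_utilization);
}
```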


Slide #22.

Greedy Cleaning Policy Problems
• Under the Hot-and-Cold distribution, performance was worse than under the uniform random distribution
• Cold segments were not dropping to the cleaning utilization threshold quickly enough


Slide #23.

Cost-Benefit Policy
• The greedy policy data show that hot and cold segments should be treated differently
  - Cold segments should be cleaned at high utilization
  - Hot segments should be cleaned at low utilization
• The cost-benefit policy rates each segment by the benefit of cleaning it versus the cost of cleaning it (formula below)
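The rating the paper uses, where u is the segment's utilization and age is approximated by the most recent modified time of any block in the segment; cleaning a segment costs 1 + u (one read of the whole segment plus a write of the still-live fraction u):

```latex
\frac{\text{benefit}}{\text{cost}}
  = \frac{\text{free space generated} \times \text{age of data}}{\text{cost}}
  = \frac{(1-u)\cdot\mathit{age}}{1+u}
```

Segments with the highest ratio are cleaned first, so cold segments become attractive at a much higher utilization than hot ones.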


Slide #24.

Cost-Benefit Policy Evaluation


Slide #25.

Segment Cleaning Policy
• Segment cleaning kicks in when the number of clean segments drops below a threshold
• Segments are cleaned until the number of clean segments passes an upper threshold
• The threshold values do not seem to affect performance greatly
• The cost-benefit policy is used to choose which segments to clean
• A segment usage table supports the cost-benefit policy (sketch below)
  - For each segment it records the number of live bytes and the most recent modified time of any block
  - This information is used to compute each segment's cost-benefit ratio and decide whether it should be cleaned
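A sketch of how the segment usage table feeds the policy, with hypothetical field and function names; the formula is the paper's cost-benefit ratio from the previous slide.

```c
/* Illustrative segment usage table entry and the cost-benefit score computed
 * from it.  Cold, mostly empty segments score highest. */
#include <stdint.h>
#include <time.h>

#define SEGMENT_SIZE (1024 * 1024)

struct seg_usage {
    uint32_t live_bytes;      /* bytes of live data left in the segment      */
    time_t   youngest_mtime;  /* most recent modified time of any block      */
};

double cost_benefit(const struct seg_usage *su, time_t now)
{
    double u   = (double)su->live_bytes / SEGMENT_SIZE;   /* utilization      */
    double age = difftime(now, su->youngest_mtime);       /* staleness (secs) */
    return ((1.0 - u) * age) / (1.0 + u);
}
```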


Slide #26.

Crash Recovery
• Checkpoint region
  - Contains the addresses of all the blocks in the inode map and segment usage table, plus the current time and a pointer to the last segment written
  - Checkpoints are performed at periodic intervals
  - The system uses the checkpoint region to return the log to this state after a crash
• Roll-forward (outline below)
  - Uses the data written after the checkpoint to recover as many files as possible
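A hedged outline of the recovery sequence; every type and helper below is an assumption for illustration, not the real Sprite interface (Sprite actually keeps two checkpoint regions and alternates between them, which is why the sketch picks the newer complete one).

```c
/* Illustrative crash-recovery outline: restore state from the newer complete
 * checkpoint, then roll forward through the log written after it.  All types
 * and helpers are assumptions, not Sprite's actual interfaces. */
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

struct checkpoint { time_t timestamp; uint64_t last_segment; bool complete; };
struct log_record { uint64_t addr; };            /* stand-in for summary info */

extern void load_checkpoint(int which, struct checkpoint *cp);   /* two copies kept */
extern void restore_maps_from(const struct checkpoint *cp);      /* rebuild inode map
                                                                     and usage table */
extern bool next_log_record(uint64_t *pos, struct log_record *rec);
extern void reapply(const struct log_record *rec);

void lfs_recover(void)
{
    struct checkpoint a, b;
    load_checkpoint(0, &a);
    load_checkpoint(1, &b);

    /* 1. Checkpoint: use the newer region that was written completely. */
    const struct checkpoint *cp =
        (b.complete && (!a.complete || b.timestamp > a.timestamp)) ? &b : &a;
    restore_maps_from(cp);

    /* 2. Roll-forward: scan the segments written after the checkpoint and
     *    reapply whatever data can be recovered. */
    uint64_t pos = cp->last_segment;
    struct log_record rec;
    while (next_log_record(&pos, &rec))
        reapply(&rec);
}
```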


Slide #27.

Micro-benchmarks
• File systems: Sprite LFS and Unix FFS (SunOS)
• Machine: Sun-4/260 with 32 megabytes of memory, a Sun SCSI3 HBA, and a Wren IV disk (1.3 MBytes/sec maximum transfer bandwidth, 17.5 milliseconds average seek time)
• Disk: 300 megabytes of usable storage
• SunOS (Unix FFS) used 8-kilobyte blocks
• Sprite LFS used 4-kilobyte blocks and a 1-megabyte segment size


Slide #28.

Micro-benchmark Results
[Figures: Small File Benchmark and Large File Benchmark]


Slide #29.

Cleaning Overhead
• Measured on 5 different file systems over a 4-month period
• Waited several months to allow the file systems to reach a balanced state
• The measured write cost was smaller than in the simulations
  - This was due to the block sizes used in the simulation


Slide #30.

Segment Utilization for /user6


Slide #31.

Other Overheads


Slide #32.

Crash Recovery Times


Slide #33.

Log-Structured File Systems and SSDs
• Log-structured file systems write files sequentially on disk, a good match for flash
• Segment cleaning (similar in spirit to the garbage collection inside an SSD)


Slide #34.

Questions