So far, we’ve established the fact that the disk is slow, and memory is fast- and that one of the biggest challenges in database implementation is in minimizing the number of times we need to incur I/Os by transferring data in between disk and memory.
Since memory is limited, we need to figure out a clever way to re-use information once it’s read into memory, while still allowing new information to come in when needed. This way, we can minimize the number of times we need to read the disk in order to find something.
The system that does this is the buffer manager. It employs a page replacement policy to decide which pages to evict when the buffer is full and a new page is read from disk.
Relevant Materials #
Page Replacement Policies #
Least Recently Used policy: evict the page that was least recently accessed. This makes intuitive sense because if a page hasn’t been used in a long time, then it’s likely we don’t need it anymore!
Although LRU seems pretty good, it can get quite inefficient since we need to keep track of the latest access time for every page in the buffer, and quickly find the oldest time (probably using some sort of heap).
Thankfully, we can approximate LRU with the Clock policy! Rather than strictly using the latest access time, we’ll instead add a reference bit to each frame.
See Discussion 4 for a walkthrough, or if you prefer to read the algorithm, go to the next section.
Detailed Clock Algorithm #
On initialization: Set the clock hand to point to the 0th entry, and set all reference bits to 0.
When trying to access a page from memory:
- Iterate through the entire cache looking for the page. If found, set the reference bit of the page to 1 and return the page. DO NOT move the clock hand!
- If not found, go back to the clock hand’s location and do the following:
- Skip all pinned pages. (A page is pinned if it’s currently in use, meaning we cannot evict it from the cache.)
- If the current entry has a reference bit of 0, then set the reference bit to 1, evict that entry, replace it with the desired entry from disk, and return it.
- Otherwise, set the reference bit to 0, advance the clock hand, and repeat the previous 2 steps.
Another major issue with LRU (and Clock, to some extent) is that it struggles with repeated patterns that are longer than the number of buffer frames available.
For example, the access policy ABCDEABCDEABCDEABCDE would result in $0$ hits if we had $4$ or fewer buffer frames, since $E$ would always evict $A$, $A$ would then evict $B$ which would evict $C$, and so on. This problem is known as sequential flooding.
The solution to sequential flooding is to use Most Recently Used policy, replacing the page that was used the earliest. If we fed the ABCDE example into a MRU policy, it would result in far more hits (since only two replacements would be needed per cycle, rather than 5).
Exam Tips #
A very common exam question would look something like this:
Given the access pattern ABCDEDEFG and a buffer manager with 4 frames, what is the hit rate of <LRU/MRU/Clock>? Express your answer as “X/Y”, where X is the number of hits and Y is the total number of requests.
In my opinion, the best way to tackle these problems is to draw a grid that looks something like this:
Then, for each access, list which pages are in the buffer at that point in time, marking the hits. Below is an example for LRU, which would have a hit rate of $2/9$:
Clock is a bit harder to do, but still managable. I prefer still using the grid method, rather than drawing out the clock face and having to keep erasing the clock hand to advance it. I usually do this by keeping track of the hand position and reference bits. In the example below, the
+ represents the clock hand, and the reference bit is the number after each page letter: