
This is an almost complete design for approximating CLOCK-Pro replacement in a zoned VM, with a big hash table instead of a non-resident list. Step 2 probably needs some work to filter out correlated references; see Correlated References Filter below. A rough code sketch of the steps follows the list.

  1. new pages get added to the inactive (cold) list
  2. if they are referenced they go to the active (hot) list
  3. if an active page was referenced, it goes to the head of the list
  4. if an active page wasn't referenced, it goes to the head of the inactive list
  5. if an inactive page wasn't referenced, we remember we recently evicted this page
  6. if we recycled an empty slot in the non-resident table, we do nothing
  7. if we recycled an occupied slot in the non-resident table, we reduce the number of inactive pages we keep around
  8. if, at page fault time, we fault in a page that was in the non-resident table (i.e. recently evicted) and its inter-reference distance is smaller than that of the least active page on the active list, we add the page to the active list immediately and increase the number of inactive pages we keep around - see Inter Reference Distance below.

  9. active pages being referenced should also be used as an indication that we are indeed caching the working set, and that the active list should not be shrunk too much - this emulates the way HANDhot clears out Mtest pages in the CLOCK-Pro algorithm, but it should probably not carry as much weight as recycling an occupied slot in the non-resident table

    • [SongJiang: The operation where HANDhot clears the LRU non-resident cold pages out of the clock (freeing an occupied hash table slot) is critical in CLOCK-Pro: it makes sure that the reuse distances of cold pages (including non-resident ones) are always less than those of hot pages. For this reason, we might have to consider merging the two lists into one clock list. - RikvanRiel: could the mechanism described in Inter Reference Distance take care of this?]

  10. if we end up refaulting too much (i.e. too many pages have a reference distance of less than 2 times the size of memory), and the system load is high, we enable the swap token mechanism to prevent thrashing - we can measure this by looking at the inactive target that this CLOCK-Pro approximation ends up setting

    • [SongJiang: The maximum reuse distance test period is 2 times the size of memory. However, the test period can be less than that and is variable. The period is basically determined by HANDtest, though it is also related to HANDhot, because we ensure HANDtest always stays before HANDhot (i.e. HANDhot might clear non-resident cold pages and push HANDtest ahead). Does "system load" mean working set size? It doesn't make sense to me why the "refaulting" could cause thrashing.]

  11. conversely, when the inactive target gets low we know we are caching the working set just fine, and the swap token logic can be switched back off
    • [SongJiang: Being adaptive on hot/cold memory allocation makes things less straightforward. I am going to clarify this tomorrow. I suggest fixing the cold page memory allocation at the first stage of the project in order to better understand the correlation between the algorithm and the measured performance.]
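The sketch below walks through steps 1-8 plus the swap token toggle of steps 10 and 11. All of it is hypothetical: the struct page_frame type, the list helpers, the nonres_* calls and the threshold variables are made-up illustrations, not the actual Linux VM interfaces.

struct list;                               /* opaque LRU list type (assumed) */
struct page_frame {
        unsigned long index;               /* identifies the page */
        int referenced;                    /* software reference bit */
};

extern struct list active, inactive;       /* hot and cold lists */
extern long inactive_target;               /* cold pages to keep (steps 7, 8) */

/* assumed helpers */
extern void list_add_head(struct list *, struct page_frame *);
extern void evict(struct page_frame *);
extern int nonres_remember(unsigned long); /* nonzero: recycled occupied slot */
extern int nonres_forget(unsigned long);   /* nonzero: entry was present */
extern void swap_token_enable(int on);

/* Step 1: a freshly faulted, unknown page starts out cold. */
void page_fault_new(struct page_frame *page)
{
        page->referenced = 0;
        list_add_head(&inactive, page);
}

/* Steps 3 and 4: scanning a page at the tail of the active list. */
void scan_active(struct page_frame *page)
{
        if (page->referenced) {
                page->referenced = 0;
                list_add_head(&active, page);   /* step 3 */
        } else {
                list_add_head(&inactive, page); /* step 4 */
        }
}

/* Steps 2 and 5-7: scanning a page at the tail of the inactive list. */
void scan_inactive(struct page_frame *page)
{
        if (page->referenced) {
                page->referenced = 0;
                list_add_head(&active, page);   /* step 2 */
                return;
        }
        /* steps 5-7: remember the eviction; shrink the inactive
           target if we had to recycle an occupied slot */
        if (nonres_remember(page->index))
                inactive_target--;
        evict(page);
}

/* Step 8: a refault of a recently evicted page goes straight to
   the active list, and the inactive target grows. */
void page_fault_refault(struct page_frame *page)
{
        nonres_forget(page->index);
        list_add_head(&active, page);
        inactive_target++;
}

/* Steps 10 and 11: toggle the swap token when the inactive target
   the algorithm sets drifts too high, or drops back down. */
extern long target_high, target_low;       /* assumed thresholds */
void check_thrashing(void)
{
        if (inactive_target > target_high)
                swap_token_enable(1);
        else if (inactive_target < target_low)
                swap_token_enable(0);
}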

Correlated References Filter

In point 2 above, we really only want to promote pages from the inactive list to the active list when they were referenced again after their initial page fault and reference. In short, we need a way to filter out correlated references. One idea would be to have a new-pages list, maybe 5% or 10% the size of the inactive list. New pages (after a page fault of an unknown page) go onto this list, and the reference bit gets cleared when a page leaves the list to be placed at the head of the inactive list.
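A minimal sketch of that filter, reusing the hypothetical types and helpers from the sketch above; the list bookkeeping and the size variable are assumptions:

extern struct list new_pages;              /* the new-pages list */
extern unsigned long new_count, new_max;   /* size; ~5-10% of inactive size */
extern struct page_frame *list_remove_tail(struct list *); /* assumed */

void add_new_page(struct page_frame *page)
{
        list_add_head(&new_pages, page);
        new_count++;

        /* When a page falls off the end of the new-pages list,
           clear the reference bit so the correlated references
           right after the fault do not count; only a later
           reference can promote it to the active list. */
        while (new_count > new_max) {
                struct page_frame *victim = list_remove_tail(&new_pages);
                victim->referenced = 0;
                list_add_head(&inactive, victim);
                new_count--;
        }
}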

Linux vs CLOCK-Pro translations

Some of the words from the CLOCK-Pro paper have been changed to map more closely to the current terms in the Linux VM. As used on this page, the translations are:

    • hot pages (Mhot): the active list
    • cold pages (Mcold): the inactive list
    • non-resident cold pages in their test period (Mtest): entries in the non-resident (recently evicted) table

Readahead & Inactive Target Adjustments

Readahead and IO clustering provide a bit of a conundrum. On the one hand we are faulting these pages into memory, but on the other hand this isn't done because anybody actually needs the pages right now. My current thinking is to use the "catch correlated references" logic to activate and count these pages. If the page was referenced by the time it reaches the end of the new-pages list, we treat it as if it were a fresh page fault, clearing the page's entry in the non-resident table and adjusting the inactive target as needed.
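Continuing the same hypothetical sketch, graduation from the new-pages list might then look like this for a readahead page (again, all names are assumptions):

/* Called when a readahead page reaches the end of the new-pages list. */
void graduate_new_page(struct page_frame *page)
{
        if (page->referenced) {
                /* Someone actually used the page: treat it as a
                   fresh page fault, clearing its entry in the
                   non-resident table and adjusting the inactive
                   target as in step 8. */
                if (nonres_forget(page->index))
                        inactive_target++;
                page->referenced = 0;
        }
        list_add_head(&inactive, page);
}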

Detecting Readahead Thrashing

When a page is being evicted, we could check to see if its information is still in the recently evicted table. If it is, we know we evicted a page before we got a chance to use it, and could have avoided the IO. If it happens frequently, we should think about reducing the amount of readahead being done, or maybe think about load control...
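A sketch of that check, reusing the helpers from the sketches above, with recently_evicted() as described in the Inter Reference Distance section below; the counter is an assumption:

extern int recently_evicted(unsigned long); /* see sketch below */
extern unsigned long readahead_thrashes;    /* assumed counter */

void evict_page(struct page_frame *page)
{
        /* Still in the recently evicted table: the page came back
           in and is going out again before anyone used it, so the
           readahead IO was wasted. */
        if (recently_evicted(page->index))
                readahead_thrashes++;
        nonres_remember(page->index);
        evict(page);
}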

Inter Reference Distance

The non-resident set of pages consists of hash buckets within which we do FIFO/circular replacement of entries. By restricting the recently_evicted() function to looking at only the last N entries within each circular buffer, we can limit the size of the non-resident list. If we keep Mcold + Mtest <= Mhot, we can be reasonably sure that when we add a faulted-in Mtest page to the active list, that page has a smaller inter-reference distance than the least recently used page on the active list. This assumes that the hash function used spreads the load evenly between the hash buckets, which a cryptographically strong hash function should do. (This is theory by RikvanRiel and could be wrong, please comment)
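A self-contained sketch of such a table; the bucket and window sizes are arbitrary, and the hash is a stand-in for the stronger function the paragraph above calls for:

#define NONRES_BUCKETS  1024
#define BUCKET_SLOTS    32

struct nonres_bucket {
        unsigned long slot[BUCKET_SLOTS];  /* 0 means empty */
        unsigned int head;                 /* next slot to recycle */
};

static struct nonres_bucket nonres[NONRES_BUCKETS];
static unsigned int nonres_window = BUCKET_SLOTS;  /* the "last N" limit */

/* Placeholder hash; a cryptographically strong hash would spread
   pages more evenly across the buckets. */
static unsigned long hash(unsigned long index)
{
        return index * 2654435761UL;       /* Knuth multiplicative hash */
}

/* Remember an evicted page; returns nonzero if we recycled an
   occupied slot (step 7 above). */
int nonres_remember(unsigned long index)
{
        struct nonres_bucket *b = &nonres[hash(index) % NONRES_BUCKETS];
        int occupied = b->slot[b->head] != 0;

        b->slot[b->head] = index;
        b->head = (b->head + 1) % BUCKET_SLOTS;
        return occupied;
}

/* Was this page evicted within the last nonres_window entries of
   its bucket?  Looking at only the newest N entries caps the
   effective size of the non-resident set. */
int recently_evicted(unsigned long index)
{
        struct nonres_bucket *b = &nonres[hash(index) % NONRES_BUCKETS];
        unsigned int i;

        for (i = 1; i <= nonres_window; i++) {
                unsigned int s = (b->head + BUCKET_SLOTS - i) % BUCKET_SLOTS;
                if (b->slot[s] == index)
                        return 1;
        }
        return 0;
}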

This leads to 3 invariants that could be followed:

Mtest <= Mhot + Mcold                            # limit too low for second level caches ?
Shot * (Mhot + Mcold + Mtest) <= Scold * Mhot    # invariant kept by limiting Mtest
Mtest <= ((Scold - Shot) / Shot) * Mhot - Mcold  # simplification of the line above
Shot + Scold <= Mhot + Mcold                     # Shot and Scold divided by two when limit reached
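
A sketch of how those invariants might be enforced. Note that Shot and Scold are not defined on this page; treating them as scan counters for the hot and cold lists is purely an assumption, as are all the names below:

extern unsigned long Mhot, Mcold, Mtest;   /* hot, cold, test page counts */
extern unsigned long Shot, Scold;          /* assumed: hot/cold scan counters */
extern void shrink_nonres_window(unsigned long max); /* assumed helper */

void enforce_invariants(void)
{
        unsigned long mtest_max = Mhot + Mcold;     /* first invariant */

        /* Second invariant, in the simplified form above; it only
           bites once Scold > Shot. */
        if (Scold > Shot) {
                unsigned long m = (Scold - Shot) * Mhot / Shot;
                unsigned long cap = m > Mcold ? m - Mcold : 0;
                if (cap < mtest_max)
                        mtest_max = cap;
        }
        if (Mtest > mtest_max)
                shrink_nonres_window(mtest_max);    /* limit Mtest */

        /* Third invariant: halve both counters when the limit is
           reached. */
        if (Shot + Scold > Mhot + Mcold) {
                Shot /= 2;
                Scold /= 2;
        }
}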

Back to AdvancedPageReplacement
