Clock-Pro
This document descibes an approximation to the CLOCKPro page replace algorithm presented in:
The Algorithm
This algorithm strifes to separate the pages with a small reuse distance from those with a large reuse distance. Pages with a small reuse distance are called hot pages and are not available for reclaim. Cold pages are those that have a large reuse distance. In order to track the reuse distance a test period is started when a reference is detected. When another reference is detected during this test period the page has a small enough reuse distance to be classified as hot.
The test period is terminated when the page would get a larger reuse distance than the current largest hot page. This is directly coupled to the cold page target - the target number of cold pages. More cold pages mean fewer hot pages and hence the test period will be shorter.
The cold page target is adjusted when a test period expires (dec) or when a page is referenced during its test period (inc).
If we faulted in a nonresident page that is still in the test period, the inter-reference distance of that page is by definition smaller than that of the coldest page on the hot list. Meaning the hot list contains pages that are colder than at least one page that got evicted from memory, and the hot list should be smaller - conversely, the cold list should be larger.
Since it is very likely that pages that are about to be evicted are still in their test period, their state has to be kept around until it expires, or the total number of pages tracks is twice the total of resident pages.
The data-structre used is a single CLOCK with three hands: Hcold, Hhot and Htest. The dynamics are: Hcold is rotated to look for unreferenced cold pages - those can be evicted. When Hcold encounters a referenced page it either starts a test period or promotes the page to hot if it already was in its test period. Then if there are less cold pages left than targeted, Hhot is rotated which will demote unreferenced hot pages. Hhot also terminates the test period of all cold pages it encounters. Then if after all this there are more nonresident pages tracked than there are resident pages, Htest will be rotated. Htest terminates all test periods it encounters, thereby removing nonresident pages. (Htest is pushed by Hhot - Hcold moves independently)
res
h/c
tst
ref
Hcold
Hhot
Htest
Flt
1
1
0
1
= 1101
1100
= 1101
1
1
0
0
= 1100
R 1000
= 1100
1
0
1
1
R 1100
1001
1001
1
0
1
0
N 0010
1000
1000
1
0
0
1
1010
= 1001
= 1001
1
0
0
0
X 0000
= 1000
= 1000
0
0
1
1
1100
0
0
1
0
= 0010
X 0000
X 0000
0
0
0
1
1010
The table gives the state transitions for each hand, '=' denotes no change, 'R' denotes a relocation, 'N' denotes becomes nonresident and 'X' denotes removal.
(XXX: mention LIRS hot/cold page swapping which makes for the relocation on promotion/demotion)
The Approximation
h/c |
PageHot() |
tst |
PageTest() |
ref |
page_referenced() |
Because pages can be evicted from one zone and paged back into another, nonresident page tracking needs to be inter-zone whereas resident page tracking is per definition per zone. Hence the resident and nonresident page tracking needs to be separated.
This is accomplished by using two CLOCKs instead of one. One two handed CLOCK for the resident pages, and one single handed CLOCK for the nonresident pages. These CLOCKs are then coupled so that one can be seen as an overlay on the other - thereby approximating the relative order of the pages.
The resident CLOCK has, as mentioned, two hands, one is Hcold (it does not affect nonresident pages) and the other is the resident part of Hhot.
The nonresident CLOCK's single hand will be the nonresident part of Hhot. Htest is replaced by limiting the size of the nonresident CLOCK.
The Hhot parts are coupled so that when all resident Hhot have made a full revolution so will the nonresident Hhot.
Two in One
Instead of a single CLOCK with two hands, two lists are used. When the two lists are laid head to tail two junction points appear, these points are the hand positions.
This approach has the advantage that there is no pointer magic associated with the hands. It is impossible to remove the page a hand is pointing to.
To allow the hands to lap each other the lists are swappable; eg. when the hands point to the same position, one of the lists has to be empty - however it does not matter which list is. Hence we make sure that the hand we are going to work on contains the pages.
Use Once
Use-Once insert; we want to avoid activation on the first reference (which we know will come).
This is accomplished by inserting the page one state lower than usual so the activation that does come ups it to the normal insert state. Also we insert right behind Hhot so 1) Hhot cannot interfere; and 2) we lose the first reference quicker.
Insert (cold,test)/(cold) so the following activation will elevate the state to (hot)/(cold,test). (NOTE: the activation will take care of the cold target increment).
Nonresident NUMAfication
TODO
Benchmark results
As of 30-11-2005:
mdb-2.6.15-rc2-stock |
3:29:27 |
100% |
mdb-2.6.15-rc2-cart |
3:14:15 |
93% |
mdb-2.6.15-rc2-mlock |
3:12:26 |
92% |
mdb-2.6.15-rc2-clockpro |
3:07:55 |
89% |
times listed are wall-time, on a p3-500 with mem=128M and a single 7200RPM ata disk.
Code
The code is available here
TODO
- redo the non-resident code; currently I hacked up the CART non-resident code, which is a bit of overkill for this algorithm, revert back to something closer to Rik's original code - DONE
- NUMAfication of the nonresident code
- benchmark;