This page describes a new page replacement design by Rik van Riel. This design should meet the most important [:PageReplacementRequirements:page replacement requirements] and fix the VM's behaviour in certain ProblemWorkloads.

== Design tenets ==
 * File IO is fundamentally more efficient than swap IO, for a number of reasons:
  * Anonymous pages are swapped out in an LRU-like fashion, and a page must be written to swap before it can be freed. File content is usually already on disk, so a clean file page can be dropped without any IO.
  * Multiple rounds of malloc and free can mix up application memory, so related anonymous pages end up scattered in swap. File contents are usually related, so we can do efficient readahead.
  * Swap space administration in Linux is very simple (which also keeps its overhead low).
 * We have to handle systems where swap is insignificantly small, e.g. a database server with 128GB of RAM, 2GB of swap and an 80GB shared memory segment.
  * We cannot waste our time scanning 100GB of anonymous memory to get at the 8GB of freeable page cache!
 * We need separate pageout selection lists for anonymous and file backed pages.
 * Belady's MIN "algorithm" needs to be modified: the primary goal of a page replacement algorithm is not to minimize the number of page cache and anonymous memory misses, but to minimize the number of IO operations required, because different misses cost different amounts of IO.
  * If we keep some statistics, we can measure how much more efficient file IO is than swap IO for the workload the system is currently running.
  * Using those statistics, in combination with other information, we can efficiently size the "LRU" pools for anonymous and file backed memory.
  * If there is no swap space, we do not try to shrink the anonymous memory pool.
  * Since the basic split is on "IO cost", memory mapped pages go into the file backed pool, except for shared memory segments, which are backed by swap.
 * We need a scan resistant algorithm (see AdvancedPageReplacement) to select which pages to free.

== Design details ==
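As a rough illustration of the IO-cost tenets above, here is one possible way to turn per-pool statistics into a decision about how much pageout scanning to aim at the anonymous pool. This is a minimal sketch under stated assumptions, not the formula this design specifies: the `PoolStats` fields, the `io_cost` weights and `anon_scan_share()` are all hypothetical. The idea is to favour the pool whose recently scanned pages were least often re-referenced, penalize it by its assumed per-page eviction IO cost, and leave the anonymous pool alone entirely when there is no free swap space.

```python
from dataclasses import dataclass


@dataclass
class PoolStats:
    """Hypothetical recent statistics for one pageout pool (anon or file)."""
    scanned: int    # pages examined by the pageout scan
    rotated: int    # scanned pages found referenced and kept in memory
    io_cost: float  # ASSUMED relative IO cost of evicting one page; must be > 0


def anon_scan_share(anon_pool: PoolStats, file_pool: PoolStats,
                    free_swap_pages: int) -> float:
    """Fraction (0.0 to 1.0) of pageout scanning to aim at the anonymous pool.

    Hypothetical heuristic for illustration only.
    """
    # Tenet: with no swap space, never try to shrink the anonymous pool.
    if free_swap_pages == 0:
        return 0.0

    def value(pool: PoolStats) -> float:
        # Scanned pages that were not re-referenced are likely to be freeable;
        # dividing by io_cost makes expensive (swap) evictions less attractive.
        # The +1 terms avoid division by zero for an idle pool.
        return (pool.scanned + 1) / ((pool.rotated + 1) * pool.io_cost)

    a, f = value(anon_pool), value(file_pool)
    return a / (a + f)


if __name__ == "__main__":
    # Illustrative numbers: anonymous pages are mostly re-referenced and
    # assumed 4x as costly to evict as file pages.
    anon = PoolStats(scanned=1000, rotated=900, io_cost=4.0)
    files = PoolStats(scanned=1000, rotated=100, io_cost=1.0)
    print(round(anon_scan_share(anon, files, free_swap_pages=1000), 3))  # 0.027
    print(anon_scan_share(anon, files, free_swap_pages=0))  # 0.0
```

With these made-up numbers almost all scanning goes to the file pool, which matches the intent of the tenets: the pool that yields freeable pages at the lowest IO cost is the one that shrinks.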