When a VM system is under high load, system throughput can decrease dramatically. In fact, it can degrade so badly that all processes spend the majority of their time waiting for the disk instead of running; this situation is known as SystemThrashing. The swap token mechanism can help a system avoid and recover from that situation.
The classical method of reducing thrashing is a BSD-style memory scheduler, which limits the number of processes that run simultaneously by temporarily suspending some processes completely. While this is guaranteed to reduce the pressure on the VM subsystem, it provides no guarantee that any of the processes in the system are able to make progress. Furthermore, while a properly tuned memory scheduler provides good results, it is hard to tune correctly, and an untuned memory scheduler can easily make the situation worse instead of better.
In contrast to the BSD memory scheduler, the swap token method does nothing to reduce the load on the VM subsystem. Instead, it temporarily gives immunity from pageout to one process, allowing that process to make progress. By giving out the swap token in turns, every process in the system gets a fair chance at making progress. This way the system can recover from a thrashing situation.
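The idea in the paragraph above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the code in mm/thrash.c: the class `ToyVM`, its methods, the single global LRU list, and the round-robin hand-off are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class ToyVM:
    """Toy model: one global LRU of (pid, page) entries and one swap token.

    All names here are hypothetical; this is not the kernel implementation.
    """

    def __init__(self, frames, pids):
        self.frames = frames          # number of physical page frames
        self.lru = OrderedDict()      # (pid, page) -> None, oldest first
        self.pids = list(pids)
        self.token_idx = 0            # index of the current token holder

    @property
    def token_holder(self):
        return self.pids[self.token_idx]

    def pass_token(self):
        """Hand the token to the next process (assumed round-robin policy)."""
        self.token_idx = (self.token_idx + 1) % len(self.pids)

    def pick_victim(self):
        """Evict the oldest page that is not owned by the token holder."""
        holder = self.token_holder
        victim = next((k for k in self.lru if k[0] != holder), None)
        if victim is None:
            # Only the holder has resident pages: fall back to plain LRU.
            victim = next(iter(self.lru))
        del self.lru[victim]
        return victim

    def touch(self, pid, page):
        """Access a page; return True if the access caused a page fault."""
        key = (pid, page)
        if key in self.lru:
            self.lru.move_to_end(key)  # mark as most recently used
            return False
        if len(self.lru) >= self.frames:
            self.pick_victim()
        self.lru[key] = None
        return True


vm = ToyVM(frames=4, pids=[1, 2])
for page in range(3):
    vm.touch(1, page)                 # process 1 holds the token
for page in range(3):
    vm.touch(2, page)                 # process 2 can only evict its own pages
refaults = sum(vm.touch(1, page) for page in range(3))
print("refaults for token holder:", refaults)
```

In this model the token holder keeps its whole working set resident while the other process pages against itself. Calling `pass_token()` moves that protection to the next process, which is how each process gets its turn to make progress.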
For more information, see the Simple token based thrashing protection paper by Song Jiang.
While an untuned swap token implementation (like that in the 2.6.11 kernel) already provides large benefits under heavy system load, it has a detrimental effect on performance when the load on the system is very low. In fact, because of this negative effect the swap token mechanism is disabled by default.
This means there are a lot of things left to do (copied from a kernelnewbies email Rik wrote):
- the swap token mechanism is bad for performance in very light VM loads, and switched off by default in the upstream kernel - it would be good if the swap token mechanism could detect the VM load and switch itself on and off on demand (implemented in the patches below)
- the policy for moving the swap token between processes is pretty braindead - having a more intelligent policy could probably get big performance gains!
- the paper linked from mm/thrash.c has some hints on what a better policy could look like - I just never got around to implementing it ;(
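The first item on the list, switching the token on and off with VM load, could be approached in many ways. The sketch below shows one possibility, a simple two-threshold hysteresis. It is not taken from the patches mentioned on this page: the class `TokenGovernor`, the choice of swap-ins per interval as the load metric, and both threshold values are assumptions made purely for illustration.

```python
class TokenGovernor:
    """Hypothetical on/off switch for swap token enforcement.

    Uses two thresholds so the token does not flap on and off when the
    load hovers around a single cut-off value.
    """

    def __init__(self, on_threshold=100, off_threshold=20):
        # Units: swap-ins per sampling interval. Both defaults are made up.
        if off_threshold >= on_threshold:
            raise ValueError("off_threshold must be below on_threshold")
        self.on_threshold = on_threshold
        self.off_threshold = off_threshold
        self.enabled = False          # mirrors the default: token disabled

    def update(self, swapins_this_interval):
        """Feed one load sample; return whether the token is now enforced."""
        if not self.enabled and swapins_this_interval >= self.on_threshold:
            self.enabled = True
        elif self.enabled and swapins_this_interval <= self.off_threshold:
            self.enabled = False
        return self.enabled


governor = TokenGovernor()
for sample in (5, 150, 50, 10):
    state = "on" if governor.update(sample) else "off"
    print(sample, state)
```

The gap between the two thresholds is the design choice worth measuring: too narrow and the token toggles constantly, too wide and it stays enabled under the light loads where it hurts performance.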
Chances are there is a lot of interesting research left to do around the swap token mechanism. It is interesting because the actual code needed for most of these policy changes should be relatively small, but getting them right will require a lot of careful analysis and measurement.
Rik van Riel posted some patches to the linux-kernel mailing list to automatically activate and deactivate the swap token enforcement depending on VM load:
PATCH 1/2 swap token tuning, the rwsem code additions
PATCH 2/2 swap token tuning, the actual swap token changes
Trying changes to the swap token policy? Did you find interesting research? Please leave your notes here, so we can learn from each other's research.
Ashwin Chaugule put up patches for an improved algorithm to transfer the token among processes. Follow the discussions on LKML here. These patches have been included upstream since 2.6.19.