Please add your problematic workload descriptions to the list. These workloads will be used as input when deciding how the VM needs to change. See also PageReplacementRequirements and PageReplacementDesign.
Database, 128GB RAM, 2GB swap
In this scenario, the VM wastes a lot of time scanning through essentially unswappable pages. The pages are essentially unswappable because the amount of swap is negligible compared to the total amount of memory. Besides, users do not want their database processes or SHM segments swapped out unless it is absolutely necessary.
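To put a number on "negligible", here is a back-of-the-envelope sketch. It assumes, pessimistically, that all of RAM holds anonymous or SHM pages that could only be evicted to swap; the function name is made up for illustration.

```python
def swappable_fraction(ram_gb, swap_gb):
    """Upper bound on the fraction of swap-backed memory that fits in swap.

    Assumption: every page in RAM needs swap space to be evicted
    (anonymous or SHM memory), so at most swap/RAM of them can go.
    """
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    return min(swap_gb / ram_gb, 1.0)


# The workload above: 128GB RAM, 2GB swap.
fraction = swappable_fraction(128, 2)
print(f"{fraction:.2%} of such pages could be swapped out at most")
```

Under that assumption, fewer than 2 in 100 of the pages the VM scans could actually be swapped out, so nearly all of the scanning work is wasted.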
Database + backup
While a backup is running, part of the working set of the database is evicted from memory. Clearly this should not happen, since the database memory is accessed frequently, while the backup runs only once a day.
This problem also affects various other workloads when they are combined with a backup. updatedb can provoke a similar problem, sometimes an even worse one, because the dentry and inode slab caches cannot be reclaimed as easily as the page cache.
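The effect can be reproduced with a toy model. The sketch below uses a strict LRU list, which is a simplification and not the actual Linux replacement algorithm; the class, function, and page names are hypothetical. A database working set that fits in memory is warmed up, then a backup streams through pages that are each touched exactly once.

```python
from collections import OrderedDict


class LRUCache:
    """Toy strict-LRU page cache (illustration only, not the Linux VM)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # oldest entry first

    def access(self, page):
        """Touch a page. Returns True on a hit, False on a miss."""
        if page in self.pages:
            self.pages.move_to_end(page)  # now the most recently used
            return True
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used
        self.pages[page] = None
        return False


def resident_db_pages_after_backup(capacity, working_set, backup_pages):
    """Count database pages still cached after a one-pass backup scan."""
    cache = LRUCache(capacity)
    db_pages = [("db", i) for i in range(working_set)]

    # Warm up: the database touches its working set repeatedly.
    for _ in range(3):
        for page in db_pages:
            cache.access(page)

    # The backup reads each of its pages exactly once.
    for i in range(backup_pages):
        cache.access(("backup", i))

    return sum(1 for page in db_pages if page in cache.pages)


print(resident_db_pages_after_backup(100, 80, 10))    # small scan
print(resident_db_pages_after_backup(100, 80, 1000))  # full backup
```

In this model the database is idle while the backup runs, which is the worst case. Under plain LRU, a scan larger than memory pushes out every database page, even though none of the backup pages will be touched again.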
Copy lots of data, then start the application
After copying a lot of data, an application is started to use that data. Unfortunately, the VM sometimes ends up swapping out the application while keeping the data in memory. It would be better if the data (page cache) were evicted first.
Webserver with lots of small cache files
A webserver has a cache directory with lots of little files. Together, the cache files are larger than available RAM. A cron job that expires old cache files runs something like find . -mtime -delete. The find pollutes the VFS caches and causes contention, because Apache is still trying to write new files while find is deleting old ones.
Very large working set
Some versions of Linux end up eating a lot of CPU time in the page reclaim code when an application references pretty much all memory. The issue is that whenever the pageout code scans memory, most pages will have been referenced. Even if this is not an issue in the current Linux VM (need to verify), this kind of bug can be reintroduced very easily.
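The cost can be illustrated with a simplified clock (second chance) scan. This is a sketch of the general technique, not the actual Linux reclaim code, and the function name is hypothetical. Each page has a referenced bit; the scan clears the bit and moves on if it is set, and picks the first page whose bit is clear.

```python
def clock_scan(referenced, hand=0):
    """Simplified second-chance scan (illustration only).

    `referenced` is a list of booleans, one per page; it is modified
    in place. Returns (victim_index, pages_examined).
    """
    if not referenced:
        raise ValueError("need at least one page")
    n = len(referenced)
    examined = 0
    while True:
        examined += 1
        if referenced[hand]:
            referenced[hand] = False  # give the page a second chance
            hand = (hand + 1) % n
        else:
            return hand, examined


# Mostly idle memory: a victim is found almost immediately.
print(clock_scan([True, True, False, True]))

# Every page was referenced since the last scan: the hand sweeps
# all of memory before it finds anything to evict.
print(clock_scan([True] * 1000))
```

When every bit is set, the scan has to visit all pages once just to clear the bits before the first victim turns up. If the application re-references those pages before the next reclaim pass, each pass pays the full sweep again.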
Garbage Collection and MM
Garbage-collected languages (like Java) do not behave optimally if their heap size is close to the size of the machine's RAM. In other words, paging is difficult for the kernel MM in this case. There are indications that cooperation between the kernel MM and user space tools could improve performance. Matthew Hertz did some academic research on the topic. His bookmarking collector, a garbage collection algorithm, demonstrates this sort of cooperation.
Lots of Virtual Machines
When running lots of virtual machines (e.g. KVM) on a Linux host, the guests are currently large, opaque consumers of memory. In the worst case, the conflict between their memory usage and reclaim uses a lot of CPU and leaves the guests satisfying page faults from disk. This could be viewed as a specific variant of "Very large working set" described above.