Just for my own sake, I'm calling every "memory pool"/zone/"resource group" a container here.
Requirements
A container must have some limit on the amount of memory it may use.
The overhead in storage, processing time, and lines of code added to the core kernel should be minimized.
Memory which is private to the container (say, anonymous memory) must be strictly accounted to that container.
Must scale to large numbers of CPUs and to NUMA environments.
Icing on the Cake
Overcommitting memory should be allowed. We should not allow memory on a system to go completely unused.
Memory for files may be accounted to either the container or a shared pool.
Some care should be taken to ensure that a container may not abuse this shared pool.
It is preferable to determine when sharing is really occurring, but approximate metrics should be OK. This requirement is very secondary to any overhead which exact tracking might introduce.
One useful idea would be being able to bind a directory hierarchy to a particular memory container. So you could e.g. assign all of /lib to the "common" container.
Should allow runtime flexibility in size and number of containers
A task might want to change containers at runtime. This might be a database or web server which wants to "do work" for a particular set of users, but doesn't want to go through the overhead of starting a whole new instance.
Software Zones
Use the existing Linux zone model to create sets of contiguous memory. Each of these is a subset of a current 'struct zone'. Each container gets one or more of these zones from which to allocate its pages. Pages shared between containers will be placed in centralized, "shared" zones.
Because this approach reuses the existing Linux structures, it could do things like page reclaim with the existing algorithms. This can also be done with the existing fake NUMA and cpusets support, without substantial kernel changes.
However, each page still needs a page to "software zone" lookup mechanism, at least for returning the page to the proper allocator lists on free_page(). The nice part is that we already have a page to 'struct zone' lookup via each node's node_zones[] array. However, substantially increasing the number of zones will substantially increase the number of bits in page->flags needed to do proper lookups. It may also become infeasible to use a simple array in the node for these lookups.
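To make the page->flags cost concrete, here is a rough sketch of how many bits are needed to identify one of N zones. The helper name zone_index_bits() is made up for illustration and is not a kernel function.

```c
#include <assert.h>

/*
 * Hypothetical helper (not kernel API): number of bits needed to
 * store an index in the range [0, nr_zones).
 */
static unsigned int zone_index_bits(unsigned int nr_zones)
{
    unsigned int bits = 0;

    assert(nr_zones > 0);

    /* Find the smallest 'bits' such that 2^bits >= nr_zones. */
    while (bits < 32 && (1u << bits) < nr_zones)
        bits++;

    return bits;
}
```

For example, 4 zones fit in 2 bits, while 1024 software zones would need 10 bits in every page's flags.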
Static Page Ownership (the classic CKRM way among others)
Add a pointer to 'struct page', and point it to an object that represents the container which caused the page's allocation. Don't change this until the page gets freed. Any other users of this page don't get charged for it.
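A minimal userspace sketch of this scheme, assuming a per-container page limit as in the requirements. All names here (struct static_container, struct owned_page, charge_page(), uncharge_page()) are invented for illustration. They are not existing kernel or CKRM interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container object: a page limit and a usage count. */
struct static_container {
    unsigned long limit;   /* maximum pages this container may own */
    unsigned long usage;   /* pages currently charged to it */
};

/* Stand-in for 'struct page' with the added owner pointer. */
struct owned_page {
    struct static_container *owner;   /* set at allocation, cleared at free */
};

/* Charge the page to the allocating container. Returns 0, or -1 if over limit. */
static int charge_page(struct owned_page *page, struct static_container *c)
{
    assert(page->owner == NULL);   /* the owner never changes while allocated */

    if (c->usage >= c->limit)
        return -1;

    c->usage++;
    page->owner = c;
    return 0;
}

/* Called when the page is freed: return the charge to the original owner. */
static void uncharge_page(struct owned_page *page)
{
    if (page->owner) {
        page->owner->usage--;
        page->owner = NULL;
    }
}
```

Note that nothing here runs when a second container starts using the page, which is exactly why later users are never charged.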
Partial Page Ownership (Beancounters????)
Make sure that any additional users get charged, even if they are not the "first" user. Multiple users in a single container should not be charged multiple times. The overhead of figuring this out exactly could be higher than that of the other approaches.
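The sketch below shows one way the "charge each container once per page" rule could be tracked, and why it costs more: every page needs per-container state, and every new user means a scan. The fixed-size sharers[] array, the MAX_SHARERS cap, and all names are assumptions made for this sketch. They do not describe how beancounters work.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SHARERS 4   /* arbitrary cap, chosen only for this sketch */

struct partial_container {
    unsigned long usage;   /* pages charged to this container */
};

/* One entry per container that uses the page. */
struct page_share {
    struct partial_container *c;
    unsigned int users;    /* users of this page inside container c */
};

struct shared_page {
    struct page_share sharers[MAX_SHARERS];
};

/* Add a user; the container is charged only for its first user of the page. */
static int page_add_user(struct shared_page *page, struct partial_container *c)
{
    struct page_share *free_slot = NULL;
    size_t i;

    for (i = 0; i < MAX_SHARERS; i++) {
        struct page_share *s = &page->sharers[i];

        if (s->c == c) {           /* already charged: just count the user */
            s->users++;
            return 0;
        }
        if (s->c == NULL && free_slot == NULL)
            free_slot = s;
    }
    if (free_slot == NULL)
        return -1;                 /* sketch limitation: too many sharers */

    free_slot->c = c;
    free_slot->users = 1;
    c->usage++;                    /* first user in this container: charge */
    return 0;
}

/* Remove a user; the charge is dropped when the container's last user goes. */
static void page_remove_user(struct shared_page *page, struct partial_container *c)
{
    size_t i;

    for (i = 0; i < MAX_SHARERS; i++) {
        struct page_share *s = &page->sharers[i];

        if (s->c != c)
            continue;
        if (--s->users == 0) {
            s->c = NULL;
            c->usage--;
        }
        return;
    }
}
```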
Only Count RSS
In this scenario, we only count a container's mapped pages. All of the accounting can be done with existing data structures (the rmap lists). When a container goes over its limit, the existing page reclaim algorithm can be used, with a modification to preferentially look for pages mapped by the container that is over its limit. The overhead here comes from looking at the rmap lists at map and unmap time to see if this use is the first or last for a container.
The big disadvantage to this approach is that it ignores things that aren't mapped.
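The first/last check at map and unmap time can be sketched as follows. This is a userspace simplification: the per-page linked list of mappings stands in for the rmap lists, which are organized differently in the real kernel, and every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct rss_container {
    unsigned long rss;   /* pages mapped at least once by this container */
};

/* One mapping of a page, tagged with the container that created it. */
struct mapping {
    struct rss_container *c;
    struct mapping *next;
};

/* Stand-in for 'struct page' plus its reverse-mapping list. */
struct mapped_page {
    struct mapping *rmap;
};

/* Walk the list: does container c already map this page? */
static int container_maps_page(const struct mapped_page *page,
                               const struct rss_container *c)
{
    const struct mapping *m;

    for (m = page->rmap; m != NULL; m = m->next)
        if (m->c == c)
            return 1;
    return 0;
}

/* Map time: bump RSS only if this is the container's first mapping. */
static void page_map(struct mapped_page *page, struct mapping *m)
{
    if (!container_maps_page(page, m->c))
        m->c->rss++;
    m->next = page->rmap;
    page->rmap = m;
}

/* Unmap time: drop RSS only if this was the container's last mapping. */
static void page_unmap(struct mapped_page *page, struct mapping *m)
{
    struct mapping **pp;

    for (pp = &page->rmap; *pp != NULL; pp = &(*pp)->next)
        if (*pp == m)
            break;
    if (*pp == NULL)
        return;                    /* m was not mapped: nothing to undo */

    *pp = m->next;                 /* unlink m from the list */
    m->next = NULL;
    if (!container_maps_page(page, m->c))
        m->c->rss--;
}
```

The list walk in container_maps_page() is the overhead mentioned above: it runs on every map and unmap, and it grows with the number of mappings of the page.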

