Just for my own sake, I'm calling every "memory pool"/zone/"resource group" a container here.


Icing on the Cake

Software Zones

Use the existing Linux zone model to create sets of contiguous memory. Each of these is a subset of a current 'struct zone'. Each container gets one or more of these zones from which to allocate its pages. Pages shared between containers will be placed in centralized, "shared" zones.

Because this approach reuses the existing Linux structures, it can do things like page reclaim with the existing algorithms. It can also be built on the existing fake NUMA and cpusets support, without substantial kernel changes.
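As a rough illustration of "each container gets one or more zones", here is a minimal userspace sketch in C. It is not kernel code: 'struct soft_zone', 'struct container', container_add_zone() and container_alloc_page() are hypothetical names invented for this example, and the per-zone free lists are reduced to a simple counter.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ZONES_PER_CONTAINER 8   /* arbitrary limit for the sketch */

/* Hypothetical software zone: a contiguous range of page frames. */
struct soft_zone {
    unsigned long start_pfn;   /* first page frame in the range */
    unsigned long nr_pages;    /* size of the range in pages */
    unsigned long nr_free;     /* pages not yet handed out */
};

/* Hypothetical container: owns one or more software zones. */
struct container {
    struct soft_zone *zones[MAX_ZONES_PER_CONTAINER];
    size_t nr_zones;
};

/* Assign a zone to a container. Returns 0 on success, -1 if the table is full. */
static int container_add_zone(struct container *c, struct soft_zone *z)
{
    assert(z->nr_free <= z->nr_pages);
    if (c->nr_zones == MAX_ZONES_PER_CONTAINER)
        return -1;
    c->zones[c->nr_zones++] = z;
    return 0;
}

/*
 * Take one page from the first zone that still has a free page.
 * Returns the zone the page came from, or NULL if every zone is
 * exhausted, which is the point where reclaim would have to run.
 */
static struct soft_zone *container_alloc_page(struct container *c)
{
    size_t i;

    for (i = 0; i < c->nr_zones; i++) {
        if (c->zones[i]->nr_free > 0) {
            c->zones[i]->nr_free--;
            return c->zones[i];
        }
    }
    return NULL;
}
```

In this sketch, growing a container means calling container_add_zone() with another small zone, so the container's memory as a whole does not have to be physically contiguous.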

However, each page still needs a page-to-"software zone" lookup mechanism, at least for returning the page to the proper allocator lists on free_page(). The nice part is that we already have a page-to-'struct zone' lookup via each node's node_zones[] array. The catch is that substantially increasing the number of zones also increases the number of bits in page->flags needed to do proper lookups. It may also become infeasible to use a simple array in the node for these lookups.
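The bit cost can be made concrete with a small userspace sketch in C. The layout here is an assumption made for illustration only: a 64-bit flags word whose top bits hold a software zone index. 'struct zpage', zone_bits_needed(), page_set_zone() and page_zone_index() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

#define FLAGS_WIDTH 64u   /* assumed width of the flags word */

/* Stand-in for 'struct page', reduced to its flags word. */
struct zpage {
    uint64_t flags;
};

/* Number of flag bits needed to tell nr_zones zones apart. */
static unsigned int zone_bits_needed(unsigned int nr_zones)
{
    unsigned int bits = 0;

    assert(nr_zones > 0 && nr_zones <= (1u << 16));
    while ((1u << bits) < nr_zones)
        bits++;
    return bits;
}

/* Store a zone index in the top zone_bits bits, leaving other flags alone. */
static void page_set_zone(struct zpage *page, unsigned int zone,
                          unsigned int zone_bits)
{
    unsigned int shift;
    uint64_t mask;

    assert(zone_bits >= 1 && zone_bits <= 32);
    assert((uint64_t)zone < (UINT64_C(1) << zone_bits));
    shift = FLAGS_WIDTH - zone_bits;
    mask = ((UINT64_C(1) << zone_bits) - 1) << shift;
    page->flags = (page->flags & ~mask) | ((uint64_t)zone << shift);
}

/* Recover the zone index, e.g. to pick the right free list on free. */
static unsigned int page_zone_index(const struct zpage *page,
                                    unsigned int zone_bits)
{
    assert(zone_bits >= 1 && zone_bits <= 32);
    return (unsigned int)(page->flags >> (FLAGS_WIDTH - zone_bits));
}
```

Under these assumptions, 4 zones cost 2 bits and 1024 zones cost 10 bits, and every bit spent on the zone index is one fewer available for other page flags.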

Static Page Ownership (the classic CKRM way among others)

Add a pointer to 'struct page', and point it to an object that represents the container which caused the page's allocation. Don't change this until the page gets freed. Any other users of this page don't get charged for it.
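A minimal userspace sketch of this scheme in C follows. 'struct owned_page' stands in for 'struct page' with the extra pointer, and 'struct container', page_charge() and page_uncharge() are hypothetical names for this example; the page-count limit is also an assumption, since the text does not say how limits are expressed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container with a simple page-count limit. */
struct container {
    unsigned long charged_pages;
    unsigned long limit_pages;
};

/* Stand-in for 'struct page' with the extra owner field. */
struct owned_page {
    struct container *owner;   /* set at allocation, cleared at free */
};

/*
 * Charge a freshly allocated page to the container that caused the
 * allocation. Returns 0 on success, -1 if the container is at its limit.
 */
static int page_charge(struct owned_page *page, struct container *c)
{
    assert(page->owner == NULL);   /* ownership never changes while allocated */
    if (c->charged_pages >= c->limit_pages)
        return -1;
    c->charged_pages++;
    page->owner = c;
    return 0;
}

/* Called when the page is freed; later users were never charged. */
static void page_uncharge(struct owned_page *page)
{
    if (page->owner) {
        page->owner->charged_pages--;
        page->owner = NULL;
    }
}
```

Note that nothing in the sketch runs when a second container starts using the page, which is exactly the "other users don't get charged" property.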

Partial Page Ownership (Beancounters????)

Make sure that any additional users get charged, even if they are not the "first" user. Multiple users in a single container should not be charged multiple times. The overhead of figuring this out exactly could be higher than with the other approaches.
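One way to picture the bookkeeping is a per-page use count for each container, charging only on the first use and uncharging on the last. The userspace C sketch below assumes a small fixed number of containers so the counts fit in an array; 'struct shared_page', 'struct container', page_add_user() and page_remove_user() are hypothetical names, and a real implementation would need a far more compact structure.

```c
#include <assert.h>

#define MAX_CONTAINERS 4   /* arbitrary limit for the sketch */

/* Hypothetical container, identified by a small index. */
struct container {
    unsigned int id;
    unsigned long charged_pages;
};

/* Stand-in for 'struct page' with one use count per container. */
struct shared_page {
    unsigned int users[MAX_CONTAINERS];
};

/* A task in container c starts using the page. */
static void page_add_user(struct shared_page *page, struct container *c)
{
    assert(c->id < MAX_CONTAINERS);
    /* Charge only when this is the container's first use of the page. */
    if (page->users[c->id]++ == 0)
        c->charged_pages++;
}

/* A task in container c stops using the page. */
static void page_remove_user(struct shared_page *page, struct container *c)
{
    assert(c->id < MAX_CONTAINERS);
    assert(page->users[c->id] > 0);
    /* Uncharge only when the container's last use goes away. */
    if (--page->users[c->id] == 0)
        c->charged_pages--;
}
```

The per-page array makes the storage cost visible: it grows with the number of containers, on top of whatever static ownership would need.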

Only Count RSS

In this scenario, we only count a container's mapped pages. All of the accounting can be done with existing data structures (the rmap lists). When a process goes over its limits, the existing page reclaim algorithm can be used, with a modification to preferentially look for pages mapped by the container that is over its limit. The overhead here comes from looking at the rmap lists at map and unmap time to see if this use is the first or last for a container.

The big disadvantage to this approach is that it ignores things that aren't mapped.
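The first-or-last check at map and unmap time can be sketched in userspace C as below. The rmap is modelled as a plain singly linked list of mappings per page, which is a deliberate simplification and not how the kernel's reverse mapping is organized; 'struct mapping', 'struct mapped_page', page_map() and page_unmap() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container that only tracks mapped pages (RSS). */
struct container {
    unsigned long rss_pages;
};

/* One mapping of a page, tagged with the container that owns it. */
struct mapping {
    struct container *owner;
    struct mapping *next;
};

/* Stand-in for a page with a simplified reverse-mapping list. */
struct mapped_page {
    struct mapping *rmap;
};

/* Walk the list: does container c map this page at all? */
static int container_maps_page(const struct mapped_page *page,
                               const struct container *c)
{
    const struct mapping *m;

    for (m = page->rmap; m; m = m->next)
        if (m->owner == c)
            return 1;
    return 0;
}

/* Add a mapping; count the page only if it is the container's first. */
static void page_map(struct mapped_page *page, struct mapping *m)
{
    if (!container_maps_page(page, m->owner))
        m->owner->rss_pages++;
    m->next = page->rmap;
    page->rmap = m;
}

/* Remove a mapping; uncount the page only if it was the container's last. */
static void page_unmap(struct mapped_page *page, struct mapping *m)
{
    struct mapping **link;
    int found = 0;

    for (link = &page->rmap; *link; link = &(*link)->next) {
        if (*link == m) {
            *link = m->next;
            m->next = NULL;
            found = 1;
            break;
        }
    }
    assert(found);   /* the mapping must have been added with page_map() */
    if (!container_maps_page(page, m->owner))
        m->owner->rss_pages--;
}
```

Both page_map() and page_unmap() walk the whole list, so the cost of each operation grows with the number of mappings of the page.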

Approaches compared: Software Zones, Static Page Ownership, Partial Page Ownership, Only Count RSS

enforces comprehensive memory limits

    Only Count RSS: doesn't account for page cache, cannot be extended to cover non-user-mapped memory use

code overhead

storage overhead

    Static Page Ownership: extra 'struct page' field

    Partial Page Ownership: at least the cost of 'static' page ownership

runtime overhead

    Only Count RSS: walking the rmap chains might get expensive

resize at runtime

    Software Zones: physical contiguity requirement will inhibit growth. But if you have lots of small zones, and allow several to be assigned to a single container, you can resize reasonably easily

creation at runtime

    Software Zones: must find a physically contiguous area to use, cannot simply take a bit from each existing container

recognize page sharing

    Software Zones: requires a "shared" zone

    Static Page Ownership: doesn't recognize use by multiple containers, but could have a "shared" container

support overcommit

    Software Zones: overcommit is trickier because of the static assignment of zones to containers. But with a few minor hooks in the kernel (directed reclaim and OOM notifications) it's possible for userspace to juggle the zone assignments to wherever they're needed, allowing overcommit

vulnerable to DOS attack

    Software Zones: no charge for shared data access means a container that stops using something cannot cause another to go over its limit

    Static Page Ownership: stop using shared data at an opportune time to force another container over its limit

    Partial Page Ownership: same as static scheme

    Only Count RSS: containers get no credit for sharing, so no penalty when sharing goes away

LinuxMM: SoftwareZones (last edited 2017-12-30 01:05:10 by localhost)