Corner cases are slightly non-obvious problems that come up when implementing huge pages. If you think of one, please record it here so it doesn't come back and bite us later.
NUMA-aware allocation - we don't want all the memory accesses going through one controller! See IBM's page about tuning STREAM.
- STREAM with hugetlb is a big problem no matter what you do unless you have chipset-level interleave (interleaving sketch after this list)
- Fork of an application with huge pages when no huge pages are left - must fall back to small pages (sketch below).
- mmap of a file that is already partially in the page cache
- mmap of a file that then gets replaced on disk (say glibc text, and then glibc gets upgraded to a newer version)
- MAP_GROWSDOWN for stacks, including automatic growth (sketch below)
- mprotect to executable (scenario: dlopen of a shared lib that requires an executable stack, causing all thread stacks to be mprotected executable; sketch below)
- write() of an mmaped area to the same file at an offset either inside or outside the same 2MB page (sketch below)
- same but for an O_DIRECT write (sketch below)
- same but for an AIO write (sketch below)
- sendfile() of a non-4KB region (sketch below)
- truncate of a 4MB mmaped file to 1MB, then truncate to 1.5MB, then to 0.5MB (creates a sparse file; sketch below)
- mmap of a 2MB region in 2 parts: the first 1MB gets mmaped writable private, the second 1MB gets mmaped writable shared; then write a few bytes to each part (sketch below)
- Running a large .bss segment workload with only a portion of its large page requirements made available to it... i.e. the app defines 10GB in arrays, but only 5GB of huge pages are made available.
- whether an entire segment must be dedicated to huge pages (256MB in the ppc environment) if even one huge page is used. Should huge pages be allocated by segment?
- can we define stages of transparent huge page functionality to enable customer production use in the near future?
- can a non-root userid be given access to huge pages? in a production environment, admins may need the ability to designate huge page users.
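
Some illustrative test sketches for the scenarios above follow; file names, sizes, and offsets are all made up. First, the interleaving point: a minimal sketch using libnuma's numa_alloc_interleaved() (link with -lnuma), so a STREAM-style buffer is spread round-robin across nodes instead of hammering one controller:

    #include <numa.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support\n");
            return 1;
        }
        /* Interleave pages round-robin across all allowed nodes so no
         * single memory controller serves every access. */
        size_t len = 512UL * 1024 * 1024;        /* illustrative size */
        double *a = numa_alloc_interleaved(len);
        if (a == NULL) { fprintf(stderr, "allocation failed\n"); return 1; }
        memset(a, 0, len);                       /* fault the pages in */
        numa_free(a, len);
        return 0;
    }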
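For the fork case, a sketch assuming 2MB huge pages and transparent huge pages enabled; madvise(MADV_HUGEPAGE) is only a hint, and nothing here forces 2MB alignment. The child's first write forces copy-on-write, which must fall back to small pages when no huge page is free:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define LEN (2 * 1024 * 1024)            /* one 2MB huge page */

    int main(void)
    {
        char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        madvise(p, LEN, MADV_HUGEPAGE);      /* hint: back with a huge page */
        memset(p, 0xaa, LEN);                /* fault it in in the parent */

        if (fork() == 0) {
            /* The child's first write triggers copy-on-write; with no
             * huge page free, the kernel must fall back to small pages
             * here instead of failing. */
            p[0] = 1;
            _exit(0);
        }
        wait(NULL);
        return 0;
    }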
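For MAP_GROWSDOWN, a hand-rolled stack mapping; the size is illustrative. A fault just below the mapping's start is what triggers automatic downward growth:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 256 * 1024;             /* illustrative initial size */
        char *stk = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS |
                         MAP_GROWSDOWN | MAP_STACK, -1, 0);
        if (stk == MAP_FAILED) { perror("mmap"); return 1; }

        stk[len - 1] = 1;                    /* touch inside the mapping */
        /* A fault just below the mapping's start grows it downward
         * automatically (subject to rlimits and the guard gap). */
        stk[-1] = 2;
        return 0;
    }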
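For the executable-stack case, an anonymous region stands in for a thread stack and gets flipped executable, which is what dlopen of an execstack library does to every thread stack; if the stack were huge-page-backed, this protection change would have to split or convert the mapping:

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 2 * 1024 * 1024;        /* one 2MB extent, illustrative */
        char *stack = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (stack == MAP_FAILED) { perror("mmap"); return 1; }
        stack[0] = 1;                        /* fault it in first */

        /* What dlopen of an execstack library does to each stack. */
        if (mprotect(stack, len, PROT_READ | PROT_WRITE | PROT_EXEC))
            perror("mprotect");
        return 0;
    }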
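For the write()-back case, "testfile" is illustrative and assumed to be at least 4MB. The source bytes come from the first 2MB extent of the mapping; the first pwrite lands inside that same extent, the second one outside it:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define HPAGE (2 * 1024 * 1024)

    int main(void)
    {
        int fd = open("testfile", O_RDWR);   /* assumed >= 4MB */
        if (fd < 0) { perror("open"); return 1; }

        char *map = mmap(NULL, 2 * HPAGE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        /* Destination inside the same 2MB extent as the source... */
        pwrite(fd, map, 4096, HPAGE / 2);
        /* ...and destination in the next 2MB extent. */
        pwrite(fd, map, 4096, HPAGE + 4096);

        munmap(map, 2 * HPAGE);
        close(fd);
        return 0;
    }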
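The O_DIRECT variant of the same thing; O_DIRECT needs buffer, length, and file offset aligned, and the page-aligned mapping satisfies that. The file is again illustrative and assumed at least one page long:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define HPAGE (2 * 1024 * 1024)

    int main(void)
    {
        int fd = open("testfile", O_RDWR | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        char *map = mmap(NULL, HPAGE, PROT_READ, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        /* DMA one base page straight out of the file's own mapping,
         * back into the file one huge-page extent further on. */
        if (pwrite(fd, map, 4096, HPAGE) < 0)
            perror("pwrite");

        munmap(map, HPAGE);
        close(fd);
        return 0;
    }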
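The AIO variant, sketched with libaio (link with -laio); kernel AIO is only truly asynchronous with O_DIRECT on most filesystems, but io_submit() exercises the path either way:

    #include <fcntl.h>
    #include <libaio.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define HPAGE (2 * 1024 * 1024)

    int main(void)
    {
        int fd = open("testfile", O_RDWR);          /* illustrative */
        if (fd < 0) { perror("open"); return 1; }

        char *map = mmap(NULL, HPAGE, PROT_READ, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        io_context_t ctx = 0;
        if (io_setup(1, &ctx) < 0) { fprintf(stderr, "io_setup failed\n"); return 1; }

        /* Queue an async write of one base page out of the mapping,
         * back into the same file one huge-page extent further on. */
        struct iocb cb;
        struct iocb *cbs[1] = { &cb };
        io_prep_pwrite(&cb, fd, map, 4096, HPAGE);
        if (io_submit(ctx, 1, cbs) != 1) { fprintf(stderr, "io_submit failed\n"); return 1; }

        struct io_event ev;
        io_getevents(ctx, 1, 1, &ev, NULL);         /* wait for completion */
        io_destroy(ctx);
        close(fd);
        return 0;
    }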
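For sendfile(), an odd offset and a length that is not a multiple of the base page size; both files are illustrative, and "testfile" is assumed to be at least 6000 bytes:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/sendfile.h>
    #include <unistd.h>

    int main(void)
    {
        int in = open("testfile", O_RDONLY);
        int out = open("outfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (in < 0 || out < 0) { perror("open"); return 1; }

        off_t off = 1000;                            /* unaligned start... */
        ssize_t n = sendfile(out, in, &off, 5000);   /* ...and odd length */
        if (n < 0) perror("sendfile");

        close(in);
        close(out);
        return 0;
    }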
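For the truncate sequence, shrinking and re-extending under a live mapping so the 1MB..1.5MB range becomes a hole; "testfile" is again illustrative:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MB (1024 * 1024)

    int main(void)
    {
        int fd = open("testfile", O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }
        ftruncate(fd, 4 * MB);

        char *map = mmap(NULL, 4 * MB, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }
        map[0] = 1;                      /* fault in the first page */

        ftruncate(fd, 1 * MB);           /* shrink under the mapping */
        ftruncate(fd, 3 * MB / 2);       /* extend: 1MB..1.5MB is a hole */
        ftruncate(fd, MB / 2);           /* shrink again */

        munmap(map, 4 * MB);
        close(fd);
        return 0;
    }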
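For the two-part mmap, reserve the 2MB extent first and then map each 1MB half over it with MAP_FIXED so the halves stay virtually contiguous; "testfile" is assumed to be at least 2MB:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MB (1024 * 1024)

    int main(void)
    {
        int fd = open("testfile", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        /* Reserve 2MB of address space, then map each 1MB half over it. */
        char *base = mmap(NULL, 2 * MB, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        mmap(base, MB, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_FIXED, fd, 0);
        mmap(base + MB, MB, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, fd, MB);

        base[0] = 1;        /* COW into the private half */
        base[MB] = 2;       /* dirties the shared half */

        munmap(base, 2 * MB);
        close(fd);
        return 0;
    }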