The Linux page allocator, from mm/page_alloc.c, is the main memory allocation mechanism in the Linux kernel. It has to deal with allocations from many parts of the Linux kernel, under many different circumstances. Consequently the Linux page allocator is fairly complex, and easiest to understand in the context of its environment. Because of this, this wiki article begins with an explanation of exactly what the page allocator needs to do, before going into the details of how things are done.

I am writing this article bit by bit whenever I feel like it. If you feel like writing something, go right ahead - RikvanRiel

memory allocators

Various different parts of the Linux kernel allocate memory, under different circumstances. Most memory allocations happen on behalf of userspace programs; these allocations can use any memory in the system (highmem, zone_normal and dma) and, if free memory is low, can wait for memory to be freed by the pageout code. Page cache and page table allocations can also use any memory in the system and can wait for memory to be freed.

Most kernel level allocations are different and can only use memory that is directly mapped into kernel address space (zone_normal and dma). Most, though not all, kernel level allocations can wait for memory to be freed, if free memory is low.

Allocations from interrupt context are different. They can not wait for memory to be freed, so if free memory on the system is low at the time of the allocation, the allocation will simply fail.

gfp mask

The gfp_mask is used to tell the page allocator which pages can be allocated, whether the allocator can wait for more memory to be freed, etc. All the gfp flags are defined in include/linux/gfp.h:

/* Zone modifiers in GFP_ZONEMASK (see linux/mmzone.h - low two bits) */
#define __GFP_DMA       0x01u
#define __GFP_HIGHMEM   0x02u

/*
 * Action modifiers - doesn't change the zoning
 *
 * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt
 * _might_ fail.  This depends upon the particular VM implementation.
 *
 * __GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
 * cannot handle allocation failures.
 *
 * __GFP_NORETRY: The VM implementation must not retry indefinitely.
 */
#define __GFP_WAIT      0x10u   /* Can wait and reschedule? */
#define __GFP_HIGH      0x20u   /* Should access emergency pools? */
#define __GFP_IO        0x40u   /* Can start physical IO? */
#define __GFP_FS        0x80u   /* Can call down to low-level FS? */
#define __GFP_COLD      0x100u  /* Cache-cold page required */
#define __GFP_NOWARN    0x200u  /* Suppress page allocation failure warning */
#define __GFP_REPEAT    0x400u  /* Retry the allocation.  Might fail */
#define __GFP_NOFAIL    0x800u  /* Retry for ever.  Cannot fail */
#define __GFP_NORETRY   0x1000u /* Do not retry.  Might fail */
#define __GFP_NO_GROW   0x2000u /* Slab internal usage */
#define __GFP_COMP      0x4000u /* Add compound page metadata */
#define __GFP_ZERO      0x8000u /* Return zeroed page on success */
#define __GFP_NOMEMALLOC 0x10000u /* Don't use emergency reserves */
#define __GFP_NORECLAIM  0x20000u /* No realy zone reclaim during allocation */

page allocation order

alloc_pages

buddy allocator

per-cpu page queues

hot/cold pages

NUMA tradeoffs

This page is part of the ["LinuxMMInternals"] section: ["CategoryLinuxMMInternals"]