LinuxMM:

(IA32, non-PAE specific information)

Kernel use of virtual memory begins very early in the boot process. head.S contains code to create provisional page tables and get the kernel up and running, but that is beyond the scope of this overview.

Every physical page of memory up to 896MB is mapped directly into kernel space. Memory above 896MB (high memory) is not permanently mapped; it is mapped temporarily using kmap and kmap_atomic (see HighMemory).

Initialization:

Paging is initialized in arch/i386/mm/init.c. The function 'paging_init()' is called once by setup_arch during kernel initialization. It immediately (in non-PAE) calls pagetable_init(). pagetable_init() starts by defining the base of the page table directory:

 pgd_t *pgd_base = swapper_pg_dir;

swapper_pg_dir is also defined in head.S, using .org directives. It points to 0x1000 above the 'root' of kernel memory. Kernel memory is defined to start at PAGE_OFFSET, which on x86 is 0xC0000000, or 3 gigabytes. This is where the 3GB/1GB split is defined: every virtual address at or above PAGE_OFFSET belongs to the kernel, and every address below it belongs to user space.
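
To make the split concrete, here is a minimal userspace sketch of how a 32-bit virtual address breaks down under two-level (non-PAE) paging. The constants use the usual i386 values; the helper names (pgd_index_of, pte_index_of, page_offset_of, is_kernel_address) are made up for this illustration and are not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT    12            /* 4 KB pages */
#define PGDIR_SHIFT   22            /* each page directory entry covers 4 MB */
#define PTRS_PER_PTE  1024u         /* entries in one page table */
#define PAGE_OFFSET   0xC0000000u   /* start of kernel virtual memory (3 GB) */

/* Index into the page directory: the top 10 bits of the address. */
static unsigned pgd_index_of(uint32_t vaddr)
{
    return vaddr >> PGDIR_SHIFT;
}

/* Index into the page table: the middle 10 bits of the address. */
static unsigned pte_index_of(uint32_t vaddr)
{
    return (vaddr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
}

/* Byte offset inside the page: the low 12 bits of the address. */
static unsigned page_offset_of(uint32_t vaddr)
{
    return vaddr & ((1u << PAGE_SHIFT) - 1);
}

/* Illustrative helper: does the address fall on the kernel side of the split? */
static bool is_kernel_address(uint32_t vaddr)
{
    return vaddr >= PAGE_OFFSET;
}
```

With these values, PAGE_OFFSET lands on page directory entry 768, so the kernel owns the top 256 of the 1024 directory entries.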

After some capability checking, pagetable_init() calls kernel_physical_mapping_init(), which performs the bulk of the kernel page table setup. Looping over each pmd and pte, it calls one_md_table_init() and one_page_table_init() respectively. These functions create new page middle directories and page tables, allocating space with the boot memory allocator. In non-PAE mode the pmd is not used, so no memory is allocated for it. Here is the important part of one_page_table_init:

 pte_t *page_table = (pte_t *)alloc_bootmem_low_pages(PAGE_SIZE);
 set_pmd(pmd, __pmd(__pa(page_table) | _PAGE_TABLE));

The first line allocates a page of memory to hold the table using the bootmem allocator; the second inserts the table's physical address, combined with the _PAGE_TABLE flags, into the pmd.
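
The allocate-then-insert pattern can be tried out in userspace with a rough simulation. Everything prefixed sim_ or SIM_ below is hypothetical: a small static pool stands in for the bootmem allocator, a pool offset stands in for a physical address, and the flag value is only an assumed stand-in for _PAGE_TABLE.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE       4096u
#define PTRS_PER_PTE    1024u
#define SIM_POOL_PAGES  8u
#define SIM_PAGE_TABLE  0x067u   /* assumed stand-in for the _PAGE_TABLE flags */

typedef uint32_t sim_pte_t;
typedef uint32_t sim_pmd_t;

/* Simulated low memory: page i has "physical" address i * PAGE_SIZE. */
static sim_pte_t sim_pool[SIM_POOL_PAGES][PTRS_PER_PTE];
static unsigned sim_next_free = 1;   /* page 0 stays unused so 0 means "empty" */

/* Stand-in for alloc_bootmem_low_pages(): returns the page's "physical" address. */
static uint32_t sim_alloc_page(void)
{
    assert(sim_next_free < SIM_POOL_PAGES);
    uint32_t phys = sim_next_free++ * PAGE_SIZE;
    memset(sim_pool[phys / PAGE_SIZE], 0, PAGE_SIZE);
    return phys;
}

/* Stand-in for __va(): turn a "physical" address back into a usable pointer. */
static sim_pte_t *sim_va(uint32_t phys)
{
    return sim_pool[phys / PAGE_SIZE];
}

/* Same shape as one_page_table_init(): allocate only if the slot is empty. */
static sim_pte_t *sim_one_page_table_init(sim_pmd_t *pmd)
{
    if (*pmd == 0) {
        uint32_t phys = sim_alloc_page();
        *pmd = phys | SIM_PAGE_TABLE;        /* address plus flags in one word */
    }
    return sim_va(*pmd & ~(PAGE_SIZE - 1));  /* strip the flags to find the table */
}
```

Calling sim_one_page_table_init() twice on the same entry returns the same table and allocates only once, which is the property the real function relies on when the loop revisits a directory slot.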

Once the table is returned, kernel_physical_mapping_init populates the page table using code similar to this:

 set_pte(pte, pfn_pte(pfn, PAGE_KERNEL));

This code populates the page tables in a linear fashion: the mapping from physical page number to virtual address is linear and differs only by PAGE_OFFSET. To translate a physical address to a virtual address, one only needs to add PAGE_OFFSET (0xC0000000). This can be seen in the __va macro from page.h:

#define __va(x)                 ((void *)((unsigned long)(x)+ PAGE_OFFSET))

The virtual address of x is returned by adding PAGE_OFFSET.
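
As a quick check of the arithmetic, the sketch below models __va() and its inverse __pa() on plain 32-bit integers so it can run in userspace. The model_va and model_pa names are invented for this example, and the model only makes sense for directly mapped (low) memory.

```c
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000u

/* Model of __va(): physical address -> kernel virtual address. */
static uint32_t model_va(uint32_t phys)
{
    return phys + PAGE_OFFSET;
}

/* Model of __pa(): kernel virtual address -> physical address. */
static uint32_t model_pa(uint32_t virt)
{
    return virt - PAGE_OFFSET;
}
```

For example, physical address 0x00100000 (1MB) shows up at kernel virtual address 0xC0100000, and applying model_pa() to that result gives back 0x00100000.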

Once the page tables have been set up, pagetable_init() calls permanent_kmaps_init() to prepare the page tables used by kmap. Recall that kmap temporarily maps high memory (>896MB) into the kernel as required.

Once this is done, control returns to paging_init(), which loads the new page table address into CR3:

load_cr3(swapper_pg_dir);

After the TLBs are flushed to force a reload from the new page tables, kmap_init() is the last piece of the paging setup. It completes the setup of the kmap area initialized above.

Paging is active.
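
To tie the steps together, here is a rough userspace simulation of the linear mapping and of the two-level lookup the MMU performs once CR3 points at the page directory. It is only a sketch: the directory holds C pointers rather than physical addresses, calloc stands in for the bootmem allocator, and all sim_ and SIM_ names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT      12
#define PGDIR_SHIFT     22
#define PTRS_PER_PGD    1024u
#define PTRS_PER_PTE    1024u
#define PAGE_OFFSET     0xC0000000u
#define SIM_PRESENT     0x001u        /* low bit marks a valid entry, as on x86 */
#define SIM_NO_MAPPING  0xFFFFFFFFu   /* sentinel; never a result in this small model */

/* Simulated page directory: NULL means "no page table for this 4 MB slot". */
static uint32_t *sim_pgd[PTRS_PER_PGD];

/*
 * Map npages physical frames linearly starting at PAGE_OFFSET.
 * Assumes npages is small enough that PAGE_OFFSET + size does not wrap.
 * Returns 0 on success, -1 if a table could not be allocated.
 */
static int sim_map_lowmem(uint32_t npages)
{
    for (uint32_t pfn = 0; pfn < npages; pfn++) {
        uint32_t vaddr   = PAGE_OFFSET + (pfn << PAGE_SHIFT);
        uint32_t pgd_idx = vaddr >> PGDIR_SHIFT;
        uint32_t pte_idx = (vaddr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);

        if (sim_pgd[pgd_idx] == NULL) {
            sim_pgd[pgd_idx] = calloc(PTRS_PER_PTE, sizeof(uint32_t));
            if (sim_pgd[pgd_idx] == NULL)
                return -1;
        }
        /* Frame address in the high bits, flags in the low bits. */
        sim_pgd[pgd_idx][pte_idx] = (pfn << PAGE_SHIFT) | SIM_PRESENT;
    }
    return 0;
}

/* Walk directory, then table, then add the page offset. */
static uint32_t sim_translate(uint32_t vaddr)
{
    uint32_t *table = sim_pgd[vaddr >> PGDIR_SHIFT];
    if (table == NULL)
        return SIM_NO_MAPPING;

    uint32_t pte = table[(vaddr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)];
    if (!(pte & SIM_PRESENT))
        return SIM_NO_MAPPING;

    return (pte & ~0xFFFu) | (vaddr & 0xFFFu);
}
```

After sim_map_lowmem(2048), which covers 8MB with two page tables, sim_translate(0xC0001234) yields 0x1234, while a user address such as 0x08048000 has no mapping in this model.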

<saxm> is everything from PAGE_OFFSET onwards paged?
<riel> saxm: depends, what do you mean by "paged" and what do you mean by "everything" ? ;))
* riel could find exceptions on either side of PAGE_OFFSET, depending on which meanings you want to use
<saxm> riel:  "paged" as in hardware paged by the cpu, "everything" meaning addressable memory
<riel> after bootup, all memory is accessed through the MMU
<ahu> riel, do you recall when current mainline 2.6.10 will decide not to cache a file?
<riel> so everything before and after PAGE_OFFSET is paged
<riel> not everything can be demand paged, though ...
<ahu> for example, when I do: open() seek() read() close()
<ahu> I seem to recall that sequential reads were special cased?
<saxm> riel:  but there's a difference between paging above and below PAGE_OFFSET?? Process pages below PAGE_OFFSET map to kernel pages above PAGE_OFFSET?
<riel> pages below PAGE_OFFSET belong to userspace
<riel> and can be demand paged
<riel> addresses above PAGE_OFFSET are kernel memory
<saxm> riel:  so there is no linear mapping between pages in virtual memory and consecutive area of physical memory?
<riel> there is a linear mapping for the first 900 MB of kernel memory
<riel> where physical address 0 - 896 MB is mapped into PAGE_OFFSET - PAGE_OFFSET+896MB
<Bertl> (depending on the split)
<saxm> riel:  ok, so there are 896*1024/4 physical frames addressable from PAGE_OFFSET->PAGE_OFFSET+896mb, and page directories/tables map userspace page accesses to the appropriate page within this range?
<riel> saxm: no, userspace does not have access to the virtual memory beyond PAGE_OFFSET
<riel> saxm: userspace only gets access to virtual addresses below PAGE_OFFSET
<saxm> riel: just trying to understand how virtual pages relate to this mapped area of memory from PAGE_OFFSET to PAGE_OFFSET+896?
<riel> memory above PAGE_OFFSET is kernel virtual memory
<riel> part of it is a direct map of the first part of physical memory
<riel> but that same physical memory could also get virtual mappings from elsewhere, eg. userspace
<riel> or vmalloc
<riel> also, userspace and vmalloc can map physical memory from outside the 896MB of direct mapped memory (as well as inside it)
<saxm> riel:  ok, multiple mappings to physical pages, that clears things up for me!
<saxm> riel:  so how does it work for kernel memory? kernel memory allocations (for page tables etc...) must come out of that 896meg chunk too?
<riel> most kernel memory allocation needs to come from that 896 MB, indeed
<riel> though page tables are the big exception ;)
<saxm> riel:  which means they're resident in memory all the time - if that's where physical memory is mapped to?
<riel> kernel data structures are always resident
<saxm> riel:  so where do page tables reside? Surely not below PAGE_OFFSET? Somewhere above PAGE_OFFSET+896mb then?
<riel> they could reside anywhere
<saxm> anywhere from 0->4gb (on x86 with no pae)?
<maks> once it was recommended for lower latency by audio folks, it turns out that todays ext3 is for them the best bet too.
<maks> echan pardon
<riel> saxm: yeah
<riel> saxm: so it could be either inside the low 896MB, or in highmem (or some page tables in both - more likely)
<saxm> riel: and that 896meg chunk of physical memory addressed at PAGE_OFFSET, is also pageable right? So kernel allocations (not including page tables) just set some flag to disable paging on that page?
<riel> ummmmmmmmmm, they map physical memory
<riel> physical memory is, by definition, not pageable
<riel> the contents of those pages might be pageable though
<riel> so you could have a page P at physical address 400MB
<riel> a process (eg. mozilla) is using that page
<riel> at virtual address 120MB
<riel> somewhere in its heap
<riel> the contents of the physical page can be paged out, at which point mozilla's heap page at 120MB is paged out
<riel> but the kernel mapping (at PAGE_OFFSET + 400MB) still maps the same page P
<riel> just with different contents ;)
<saxm> riel: thanks for that very helpful example!
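
The example riel gives at the end of the log (page P at physical 400MB, used by a process at virtual 120MB) can be restated as a small sketch. The struct and the sim_ helpers are invented for illustration; they only model the point that paging out touches the user mapping and leaves the kernel's direct mapping where it was.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000u
#define MB          (1024u * 1024u)

/* Hypothetical record of one userspace mapping of a physical page. */
struct sim_user_mapping {
    uint32_t vaddr;    /* where the process sees the page */
    uint32_t phys;     /* physical page backing it */
    bool     present;  /* false once the contents are paged out */
};

/* The direct-map address depends only on the physical address. */
static uint32_t sim_direct_map(uint32_t phys)
{
    return PAGE_OFFSET + phys;
}

/* Paging out changes the user mapping only. */
static void sim_page_out(struct sim_user_mapping *m)
{
    m->present = false;
}

/* Number of 4 KB frames covered by the 896 MB direct map (saxm's 896*1024/4). */
static uint32_t sim_lowmem_frames(void)
{
    return 896u * 1024u / 4u;
}
```

With phys = 400 * MB the direct-map address works out to 0xD9000000, before and after sim_page_out() is called on the process's mapping at 120 * MB.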

LinuxMM: VirtualMemory (last edited 2006-01-04 18:56:21 by 201-25-140-245)