I am a kernel intern through the Gnome Outreach Program for Women. RikvanRiel is my mentor. My project is to rid the swapoff code of the quadratic complexity in try_to_unuse().
This page will eventually become a proper home page, but until I learn enough about wiki editing to write proper articles, I will be using this as a scratch area for my thoughts, questions, and article stubs and drafts.
My Current Working State
Rik has asked me, "Can you find out, and describe to me, how the location of a certain place in swap is stored in the memory management data structures for a process? And what the two parts of the swap information describe?"
- Action: study struct page and friends in include/linux/mm_types.h
- Question: what is (are) the top level struct(s) for memory management? (A: struct mm_struct) What functions and structs hold them?
- Question: How is a swap area location described?
- Question: The first double word block of a struct page holds a union (implying only one member is used at a time) of a pointer to a struct address_space and a pointer to void intended for a slab object. I know that most memory comes from the buddy allocator, but the kernel has the slab allocator for small needs of its own. So, is the struct address_space associated with the buddy allocator?
- Question: Are the page allocator and the buddy allocator the same thing? I.e., does "page" describe what it does, and "buddy" describe how it works?
- Action: study handle_mm_fault() found at: memory.c:3783:int handle_mm_fault(struct mm_struct *mm, ...
- Question: What’s the relationship between a page and a vm_area? Specifically, what’s the correspondence between a page in memory and its representation in the swap area? A: It goes through the page table, d'oh!
- Question: What’s in asm/page.h? A: It is a dummy representation for NOMMU situations, but may be useful in providing things to grep for.
- Question: What’s a struct rb_root? (mm_struct member) A: It is defined in rbtree.h and is used for fast lookup of whatever type is assigned to it.
- Action: study the places where init_mm is used
- Question: What’s a struct vm_operations_struct? Is it analogous to a struct address_space_operations? mm.h line 210
- Question: Are the functions pointed to by the members of struct address_space_operations in fs.h (line 347) what I think they are, i.e. operations to transition a page between the states described in fig. 2 on this page http://www.redhat.com/magazine/001nov04/features/vm/ ?
- Action: figure out what the swap entry type is, find its definition, find what operations can be done on it by what functions, and trace its inclusion all the way to the top
Dec. 10: What I’ve learned:
There is often no way to know which task is using a certain page. It’s not important.
One of the main data structures that I need to understand, both in terms of how it works and how it is used, is struct mm_struct. There is a list of them (the struct holds a struct list_head), and they can be swapped. The short file init_mm.c instantiates the list handle, called init_mm. This structure seems to be the principal structure representing an access into the swap area.
A struct mm_struct holds a list of struct vm_area_structs. (note: although the struct vm_area_struct contains a struct list_head, it also contains pointers to prev and next, which are declared before the struct list_head and may be more important). These are chunks of some type of memory.
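To make the mm_struct / vm_area_struct relationship concrete, here is a minimal userspace sketch, not kernel code. The toy_ names and the helper toy_vma_containing() are hypothetical; the field names (mmap, vm_start, vm_end, vm_next, vm_prev) are modelled on my reading of the kernel structs, and everything else is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for struct vm_area_struct and struct mm_struct. */
struct toy_vma {
    unsigned long vm_start;     /* first address inside the area */
    unsigned long vm_end;       /* first address past the area */
    struct toy_vma *vm_next;    /* next area, in ascending address order */
    struct toy_vma *vm_prev;    /* previous area */
};

struct toy_mm {
    struct toy_vma *mmap;       /* head of the address-sorted area list */
};

/* Hypothetical helper: return the area containing addr, or NULL if unmapped. */
struct toy_vma *toy_vma_containing(const struct toy_mm *mm, unsigned long addr)
{
    assert(mm != NULL);
    for (struct toy_vma *vma = mm->mmap; vma != NULL; vma = vma->vm_next) {
        if (addr < vma->vm_start)
            return NULL;        /* list is sorted, so addr lies in a gap */
        if (addr < vma->vm_end)
            return vma;
    }
    return NULL;
}
```

Walking the list through vm_next is the simple way to see the structure; the rb_root member of the real mm_struct presumably exists so that lookups do not have to walk the whole list.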
Then, there is a struct page, defined in the same file as the struct mm_struct. These are also kept in a list (i.e. a struct page holds a struct list_head) (I don’t yet know where the handle is). A struct page holds (either) a struct address_space (or a slab object); not exactly sure what this does yet, but defined in the same file (fs.h) there is a struct address_space_operations. All the members of this struct are function pointers that seem to govern transitions between the states listed in fig. 2 here: http://www.redhat.com/magazine/001nov04/features/vm/
Dec. 11: What I’ve learned:
- Page table types are architecture specific, and are defined in files such as pgtable.h, pgtable_types.h, and page.h under arch/*/include/asm.
- Confirmed that struct mm_struct is the top level memory management structure for a process, and struct vm_area_structs are the chunks of virtual memory that are available to that process. (I still haven't answered Rik's question because VM is not swap)
- Control groups (cgroups) are for resource management. cgroup related data members can be ignored for now.
A pte_t (page table entry type) apparently contains a reference to a struct page (still looking for a definition of pte_t to verify), or at least has one associated with it. vm_normal_page() from memory.c line 742 returns a page table entry's associated struct page. On Dec. 12 my mentor comments: a pte_t can have a struct page associated with it, but doesn't necessarily. A pte indicates that it maps memory by having its present bit set; if that bit is not set, vm_normal_page() doesn't get called on it. Also, some types of memory, such as device memory, don't have a struct page associated with them, and the pte could be mapping that. I took another look at vm_normal_page() and it does indeed check for those situations.
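The two checks my mentor describes can be sketched in userspace as below. This is only an illustration under stated assumptions: the bit positions, the toy_ names, and the flat toy_mem_map array are all made up, since real pte layouts are architecture specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit layout; real layouts are architecture specific. */
#define TOY_PTE_PRESENT (1UL << 0)  /* pte maps memory */
#define TOY_PTE_SPECIAL (1UL << 1)  /* e.g. device memory, no struct page */
#define TOY_PFN_SHIFT   12
#define TOY_NR_FRAMES   16

struct toy_page { int unused; };
typedef struct { unsigned long val; } toy_pte_t;

struct toy_page toy_mem_map[TOY_NR_FRAMES];  /* one toy_page per page frame */

bool toy_pte_present(toy_pte_t pte)
{
    return (pte.val & TOY_PTE_PRESENT) != 0;
}

/* Loosely modelled on vm_normal_page(): NULL when no struct page backs the pte. */
struct toy_page *toy_normal_page(toy_pte_t pte)
{
    unsigned long pfn = pte.val >> TOY_PFN_SHIFT;

    if (pte.val & TOY_PTE_SPECIAL)
        return NULL;
    assert(pfn < TOY_NR_FRAMES);
    return &toy_mem_map[pfn];
}

/* The caller checks the present bit before asking for the page. */
struct toy_page *toy_lookup(toy_pte_t pte)
{
    if (!toy_pte_present(pte))
        return NULL;
    return toy_normal_page(pte);
}
```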
Dec. 12: What I’ve learned:
- What the LRU (Least Recently Used) cache is, and how it works. Its purpose is to prioritize pages on the basis of readiness to be swapped out. It consists of four lists of struct page: active and inactive anonymous pages, active and inactive file pages. The struct pages are strung onto the list by their struct list_head lru member, and the list handles are kept in an array (there are arithmetic-ready #defines to specify them) in a struct lruvec (defined in include/linux/mmzone.h).
- The main entry point into the swapout code is do_try_to_free_pages, but it has evolved into a wrapper, the actions of which are not straightforward. So, I started tracing the swapout action at shrink_lruvec.
- shrink_lruvec identifies scan candidates (pages at the tail ends of the two inactive lru lists) and passes them to shrink_inactive_list by way of shrink_list (which makes sure it is indeed an inactive list). shrink_lruvec will also, if it doesn't find enough pages to reclaim right away, shuffle some of the pages from the active to the inactive lists. Pages are moved down the lists, and from the active to the inactive list, according to the number of times they've been scanned versus the number of times they've been accessed. (Question: is this correct? What does "rotated" mean?)
- shrink_inactive_list first calls isolate_lru_pages to move the reclaim candidates to a special list for scanning. Then it passes this special list to shrink_page_list, which does the actual reclamation.
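The list layout and the isolation step above can be sketched in userspace as follows. The index arithmetic mirrors my understanding of the #defines in mmzone.h; the toy_ names, the minimal list implementation, and toy_isolate() are hypothetical simplifications, and only the four lists mentioned above are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Building blocks for the list index: base + file offset + active offset. */
#define TOY_LRU_BASE   0
#define TOY_LRU_ACTIVE 1
#define TOY_LRU_FILE   2

enum toy_lru_list {
    TOY_INACTIVE_ANON = TOY_LRU_BASE,
    TOY_ACTIVE_ANON   = TOY_LRU_BASE + TOY_LRU_ACTIVE,
    TOY_INACTIVE_FILE = TOY_LRU_BASE + TOY_LRU_FILE,
    TOY_ACTIVE_FILE   = TOY_LRU_BASE + TOY_LRU_FILE + TOY_LRU_ACTIVE,
    TOY_NR_LRU_LISTS
};

/* Minimal circular doubly linked list, in the spirit of struct list_head. */
struct toy_list { struct toy_list *next, *prev; };
struct toy_page { int id; struct toy_list lru; };
struct toy_lruvec { struct toy_list lists[TOY_NR_LRU_LISTS]; };

void toy_list_init(struct toy_list *head) { head->next = head->prev = head; }

/* Insert node right after head, i.e. at the "young" end of the list. */
void toy_list_add(struct toy_list *node, struct toy_list *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

void toy_list_del(struct toy_list *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->next = node->prev = node;
}

enum toy_lru_list toy_lru_index(int is_file, int is_active)
{
    return (enum toy_lru_list)(TOY_LRU_BASE
                               + (is_file ? TOY_LRU_FILE : 0)
                               + (is_active ? TOY_LRU_ACTIVE : 0));
}

/* Move up to nr pages from the tail of src onto dst; return how many moved. */
unsigned long toy_isolate(struct toy_list *src, struct toy_list *dst,
                          unsigned long nr)
{
    unsigned long taken = 0;

    assert(src != NULL && dst != NULL);
    while (taken < nr && src->prev != src) {
        struct toy_list *victim = src->prev;   /* oldest entry */
        toy_list_del(victim);
        toy_list_add(victim, dst);
        taken++;
    }
    return taken;
}
```

Here the private dst list plays the role of the special list that shrink_inactive_list hands on to shrink_page_list.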
Dec. 13: What I’ve learned (written up on Dec. 16): add_to_swap(), line by line
- lives at mm/swap_state.c line 163
- Receives two parameters: a struct page, which is the page to be swapped out; and a struct list_head, which is used only for the huge page case
- Has two local variables: a swp_entry_t (which packs its fields into a single unsigned long, so all "members" are extracted via helpers such as swp_type() and swp_offset()) and an int to collect the returned error code from add_to_swap_cache().
- Begins by asserting two conditions: that the page has been locked by the caller, and is up to date. It asserts with a BUG_ON-style macro, which triggers a kernel BUG (an oops, and possibly a panic) when the asserted condition does not hold.
- Allocates swap space for the page by calling get_swap_page(), which lives at swapfile.c line 640 (in a recent version of Linus' tree). Stores the handle to it in the local swp_entry_t.
- Handles the huge page case, which can be ignored for now.
- Inserts the swap entry into the kernel's swap management system (the radix tree?) by calling add_to_swap_cache(), which lives at swap_state.c line 121.
- Marks the page as dirty, so that bdflush(?) will do the actual write to disk, and exits returning 1 if it was successfully added to swap. Otherwise, if add_to_swap_cache returned an error code for an -ENOMEM allocation failure, cleans up by calling swapcache_free and exits returning 0.
- Also, I learned that the likely() and unlikely() macros are optimization instructions for the compiler. They affect the order of the generated assembly language instructions, but do not affect the flow of control. They can provide hints as to how a test will probably be resolved, but otherwise, just ignore them and look at what's inside.
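The control flow walked through above can be sketched as a small userspace program. This is a simplified model, not the kernel function: every toy_ name, the global knobs used to simulate failures, and the use of assert() in place of the kernel's assertion macro are assumptions of mine, and the huge page case is left out. The early return when no swap slot is available is included from my reading of the function. The likely()/unlikely() definitions mirror the GCC/Clang builtin that the kernel macros wrap.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Branch-prediction hints only; they do not change the flow of control. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

typedef struct { unsigned long val; } toy_swp_entry_t;
struct toy_page { bool locked, uptodate, dirty; };

/* Hypothetical knobs for simulating the two failure paths. */
unsigned long toy_next_slot = 1;  /* 0 means "no swap space left" */
int toy_cache_error = 0;          /* set to -ENOMEM to simulate failure */
int toy_freed_slots = 0;          /* counts cleanup calls */

toy_swp_entry_t toy_get_swap_page(void)
{
    toy_swp_entry_t entry = { toy_next_slot };
    if (toy_next_slot)
        toy_next_slot++;
    return entry;
}

int toy_add_to_swap_cache(struct toy_page *page, toy_swp_entry_t entry)
{
    (void)page;
    (void)entry;
    return toy_cache_error;
}

void toy_swapcache_free(toy_swp_entry_t entry)
{
    (void)entry;
    toy_freed_slots++;
}

/* Returns 1 if the page was added to swap, 0 otherwise. */
int toy_add_to_swap(struct toy_page *page)
{
    toy_swp_entry_t entry;
    int err;

    assert(page->locked);      /* caller must hold the page lock */
    assert(page->uptodate);

    entry = toy_get_swap_page();
    if (unlikely(!entry.val))
        return 0;              /* no swap space available */

    err = toy_add_to_swap_cache(page, entry);
    if (likely(!err)) {
        page->dirty = true;    /* something else does the actual write later */
        return 1;
    }

    toy_swapcache_free(entry); /* allocation failure: give the slot back */
    return 0;
}
```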
My current questions & things I want to learn:
- what a gfp_mask is
- what the radix tree is, and how it works, assuming it's important (I suspect it may be very important)
- how swap space is organized, and what the swap entries look like on disk
- why a swp_entry_t (packed into a single word) has to fit in an unsigned long
- what functions call add_to_swap()
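As a starting point for the swap entry question, here is one way to picture two fields packed into a single unsigned long and pulled back out with helpers. It is a sketch under assumptions: the 5-bit type field, its position at the top of the word, and all toy_ names are illustrative choices, and the real layout depends on kernel version and architecture. The "type" part selects a swap area and the "offset" part selects a page-sized slot within it.

```c
#include <assert.h>
#include <limits.h>

typedef struct { unsigned long val; } toy_swp_entry_t;

/* Hypothetical layout: top 5 bits hold the type, the rest hold the offset. */
#define TOY_TYPE_BITS   5
#define TOY_TYPE_SHIFT  (sizeof(unsigned long) * CHAR_BIT - TOY_TYPE_BITS)
#define TOY_OFFSET_MASK ((1UL << TOY_TYPE_SHIFT) - 1)

/* Pack a swap area number and a slot offset into one word. */
toy_swp_entry_t toy_swp_entry(unsigned long type, unsigned long offset)
{
    toy_swp_entry_t entry;

    assert(type < (1UL << TOY_TYPE_BITS));
    assert(offset <= TOY_OFFSET_MASK);
    entry.val = (type << TOY_TYPE_SHIFT) | offset;
    return entry;
}

/* Which swap area the entry refers to. */
unsigned long toy_swp_type(toy_swp_entry_t entry)
{
    return entry.val >> TOY_TYPE_SHIFT;
}

/* Which slot within that swap area. */
unsigned long toy_swp_offset(toy_swp_entry_t entry)
{
    return entry.val & TOY_OFFSET_MASK;
}
```

For example, toy_swp_entry(2, 12345) yields an entry whose toy_swp_type() is 2 and whose toy_swp_offset() is 12345, while the whole entry still occupies exactly sizeof(unsigned long) bytes.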