== Under Construction ==

Be advised that this page was written by an intern, and has not been checked for accuracy.

This page describes the call chain that the kernel follows when swapping pages out to free up memory. Here is the call chain we will discuss in this article:

 shrink_lruvec()
   shrink_list()
     shrink_inactive_list()
       isolate_lru_pages()
       shrink_page_list()
         add_to_swap()
           get_swap_page()
             scan_swap_map()
           add_to_swap_cache()
             __add_to_swap_cache()

----

The main entry point into the swap-out call chain is vmscan.c:2378:static unsigned long do_try_to_free_pages(). However, it has evolved into a wrapper whose actions are not straightforward. (I've outlined one possible call chain above; the complete picture involves zone handling, cgroups, and other such things, which are outside the scope of this article.) So instead, for this discussion we'll enter the call chain at:

 vmscan.c:1995:static void shrink_lruvec()

Before we dive into shrink_lruvec(), it's worth discussing what the LRU (Least Recently Used) cache is and how it works. Its purpose is to rank pages by how ready they are to be swapped out. It consists of four doubly linked lists of struct page: active and inactive anonymous pages, and active and inactive file pages. The struct pages are strung onto a list through their struct list_head lru member, and the list handles are kept in an array (there are macros to specify which one) in a struct lruvec (defined in include/linux/mmzone.h). This struct lruvec is the important parameter to shrink_lruvec().

The job of shrink_lruvec() is to identify swap candidates, considering pages at the tails of the two inactive LRU lists. (As we continue down the call chain, after the pages are swapped out, they will be removed from the LRU lists; this is how shrink_lruvec() will "shrink the LRU vector.") If it doesn't find enough candidates right away, it will shuffle some of the pages from the active to the inactive lists.
Pages are moved down the lists, and from the active to the inactive list, according to the number of times they've been scanned versus the number of times they've been rotated (i.e., found in the course of a scan to have been accessed, and thus moved back to the head of their respective active LRU list).

Once the lists have been sufficiently shuffled, shrink_lruvec() passes each of the lists to be shortened down the chain in turn (the actual parameters are the entire struct lruvec, the macro to index the intended list within it, and the number of pages at its tail to consider). It passes them via

 vmscan.c:1789:static unsigned long shrink_list()

which double-checks that the inactive list is indeed inactive and has enough pages on it, to

 vmscan.c:1414:shrink_inactive_list()

which will verify and cull pages that can be evicted to swap. It creates an accessory page list (a doubly linked list like the LRU lists, headed by struct list_head l_hold), which it populates with eviction candidates by passing it to

 vmscan.c:1227:isolate_lru_pages()

(again, the actual parameters are the lruvec and the macro to access the intended list). The pages are removed from the LRU list and added to the accessory list. After the accessory list is populated, shrink_inactive_list() passes it to

 vmscan.c:758:static unsigned long shrink_page_list()

shrink_page_list() performs many kinds of checking and management on the eviction candidates it receives; for instance, it handles the situations in which a page is under writeback, dirty, or congested. Essentially, it does a final sorting of the pages the kernel really wants to keep from the ones it can let go of. The most important thing it does, though, is to send the outgoing pages to the actual swapping process. For each page on the accessory list, it makes a final check of the page's references to be sure it isn't being actively used.
Then it removes the page from the accessory list and sends it along to:

 swap_state.c:163:add_to_swap()

add_to_swap() might be worth looking at in detail, so here's a step-by-step breakdown: @@@TODO add link @

It's also a good idea to examine how a location in swap is described. @@@TODO add link @

To add the page to swap, add_to_swap() must first find a location, described by a swp_entry_t, for the page; then it must add the page to the swap cache. It does the first by calling

 swapfile.c:640:swp_entry_t get_swap_page(void)

which first decides on an appropriate swap area based on a priority rating maintained as a static variable within swapfile.c. It then allocates space within the swap area, obtaining an offset into it by calling

 swapfile.c:469:static unsigned long scan_swap_map()

After get_swap_page() returns, add_to_swap() calls

 swap_state.c:121:int add_to_swap_cache()

passing it the page to be evicted and the location returned from get_swap_page(). add_to_swap_cache() does all its important work through its helper,

 swap_state.c:81:int __add_to_swap_cache()

which marks the page as being part of the swap cache. The I/O operation needed to write the page into its allocated location in the swap area will be carried out after add_to_swap_cache() returns: add_to_swap() marks the page as dirty, so that the writeback machinery will do the actual write to disk.