The goals for this implementation of non-resident page bookkeeping:
 * minimal space overhead
 * SMP scalability
 * reasonably fast

== Interface ==

{{{
extern int recently_evicted(struct address_space * mapping, unsigned long offset);
extern int remember_page(struct address_space * mapping, unsigned long offset);
}}}

The ''recently_evicted'' function is queried by the pagein or page cache allocation code to determine whether the data at offset ''offset'' of the page cache or process object ''mapping'' was recently evicted. It returns 1 if the page was found and 0 if it was not.

The ''remember_page'' function is called by the pageout code to tell the non-resident page management code that the page at offset ''offset'' of ''mapping'' was just paged out.

== Data structures ==

When looking up a page, we use ''page->offset'' and ''page->mapping'' to determine which ''nr_bucket'' hash bucket the page belongs in, and we hash ''page->offset'', ''page->mapping'' and ''page->mapping->host->i_ino'' to get the value stored in the ''nr_bucket.page'' array.

We recycle ''nr_bucket.page'' entries in a circular fashion, with ''hand'' pointing to the entry that will be updated next. This means we do not need a lookup list for these pages; we simply look through all the entries in the cacheline, doing simple comparisons. Hopefully the cost of comparing 31 entries (the value of ''NUM_NR'' with 128-byte cache lines and a 4-byte ''unsigned long'') is outweighed by only having 1 cache miss per lookup.

{{{
/* Number of non-resident pages per hash bucket */
#define NUM_NR ((L1_CACHE_BYTES - sizeof(atomic_t))/sizeof(unsigned long))

struct nr_bucket {
	atomic_t hand;
	unsigned long page[NUM_NR];
} ____cacheline_aligned;
}}}

== Uniqueness of mapping ==

Because the ''address_space'' structure underlying ''mapping'' can be freed by the kernel and then reallocated for another file, we need to take some measures to prevent the VM from thinking that a brand-new page of a newly opened file was recently evicted.
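
To make the bucket layout and the stale ''mapping'' problem concrete, here is a minimal userspace sketch of how the two interface functions might operate on such buckets. It is not the kernel implementation: ''mapping_stub'', ''nr_table'', ''NUM_BUCKETS'', ''CACHE_LINE_BYTES'', ''mix'', ''nr_hash'' and ''nr_cookie'' are hypothetical names, the hash is an arbitrary mixing function, ''hand'' is a plain integer rather than an ''atomic_t'' (so the sketch is not SMP safe), and both the return value of ''remember_page'' and the clearing of an entry on a successful lookup are assumptions.

{{{
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes for this sketch; the kernel code uses L1_CACHE_BYTES. */
#define CACHE_LINE_BYTES 128
#define NUM_NR ((CACHE_LINE_BYTES - sizeof(unsigned int)) / sizeof(uint32_t))
#define NUM_BUCKETS 1024	/* hypothetical table size, a power of two */

/* Hypothetical stand-in for struct address_space and its host inode. */
struct mapping_stub {
	unsigned long i_ino;
};

struct nr_bucket {
	unsigned int hand;	/* next slot to overwrite */
	uint32_t page[NUM_NR];	/* hashed identities of evicted pages, 0 = empty */
};

static struct nr_bucket nr_table[NUM_BUCKETS];

/* Arbitrary mixing step, standing in for whatever hash is really used. */
static uint32_t mix(uint32_t h, uint64_t v)
{
	h ^= (uint32_t)v ^ (uint32_t)(v >> 32);
	h *= 0x9e3779b1u;
	return h ^ (h >> 15);
}

/* The bucket is chosen from mapping and offset only. */
static struct nr_bucket *nr_hash(const struct mapping_stub *m, unsigned long offset)
{
	uint32_t h = mix(mix(0, (uintptr_t)m), offset);
	return &nr_table[h & (NUM_BUCKETS - 1)];
}

/* The stored value also folds in the inode number. */
static uint32_t nr_cookie(const struct mapping_stub *m, unsigned long offset)
{
	uint32_t c = mix(mix(mix(1, (uintptr_t)m), offset), m->i_ino);
	return c ? c : 1;	/* keep 0 reserved for empty slots */
}

/* Pageout side: overwrite the slot under the hand, then advance it. */
int remember_page(struct mapping_stub *m, unsigned long offset)
{
	struct nr_bucket *b = nr_hash(m, offset);

	b->page[b->hand] = nr_cookie(m, offset);
	b->hand = (b->hand + 1) % NUM_NR;
	return 0;
}

/* Pagein side: scan the whole bucket; returns 1 if found, 0 if not. */
int recently_evicted(struct mapping_stub *m, unsigned long offset)
{
	struct nr_bucket *b = nr_hash(m, offset);
	uint32_t cookie = nr_cookie(m, offset);
	size_t i;

	for (i = 0; i < NUM_NR; i++) {
		if (b->page[i] == cookie) {
			b->page[i] = 0;	/* assumption: a hit consumes the entry */
			return 1;
		}
	}
	return 0;
}
}}}

In this sketch, a page remembered under one inode number is not reported as recently evicted when the same ''mapping'' address later carries a different inode number, barring a hash collision.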

The obvious way to prevent such false positives would be to invalidate all entries of a particular ''mapping'' when its ''struct address_space'' is freed. However, we would need a much larger data structure to support efficient invalidation. The alternative is to simply hash the inode number into the non-resident page value, and hope that the next time a particular ''struct address_space'' is allocated to a different file, that file will have a different inode number.

The swap cache is a special case: because swap pages are freed one by one, we can invalidate entries when a process exits. We can simply call the ''recently_evicted'' function from ''remove_exclusive_swap_page''. This also covers swapoff(8) and a subsequent swapon(8), since the non-resident entries will be invalidated at swapoff time.

For other pages in the AdvancedPageReplacement category, please see CategoryAdvancedPageReplacement