The goals for this implementation of non-resident page bookkeeping:
- minimal space overhead
- SMP scalability
- reasonably fast
Data structures
The nr_page data structure represents one non-resident page, by virtue of pointing to the page mapping (or mm) and the page offset. The offset_and_gen field contains a cryptographic hash of offset, mapping->host->i_ino and mapping->host->i_sb. It would be more natural to have the cryptographic hash on the mapping field, but the consequences of a collision would be much more severe...
struct nr_page {
	void * mapping;
	unsigned long offset_and_gen;
};
We fit several of these nr_page structs in one (cacheline sized?) hash bucket. This means we do not need a lookup list for these pages; we simply look through all the objects in the cacheline, doing quick pointer comparisons. Having one spinlock per hash bucket, and keeping that spinlock in the same cacheline as the entries it protects, should help SMP scalability.
/* Number of non-resident pages per hash bucket */
#define NUM_NR ((L1_CACHE_BYTES - sizeof(spinlock_t))/sizeof(struct nr_page))

struct nr_bucket {
	spinlock_t lock;
	struct nr_page pages[NUM_NR];
} __cacheline_aligned;
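The bucket scan described above could look roughly like the following userspace sketch. It is not the kernel code: demo_lock_t is a hypothetical stand-in for spinlock_t that does no real locking, CACHE_BYTES is an assumed 64-byte cacheline standing in for L1_CACHE_BYTES, and the demo_remember and demo_recently_evicted helpers (including the "overwrite slot 0 when full" policy) are illustrative assumptions rather than anything specified on this page.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_BYTES 64          /* assumed; the kernel uses L1_CACHE_BYTES */

typedef int demo_lock_t;        /* hypothetical stand-in for spinlock_t */

struct nr_page {
	void *mapping;
	unsigned long offset_and_gen;
};

/* Number of non-resident pages per hash bucket */
#define NUM_NR ((CACHE_BYTES - sizeof(demo_lock_t)) / sizeof(struct nr_page))

struct nr_bucket {
	demo_lock_t lock;           /* would be taken around every scan */
	struct nr_page pages[NUM_NR];
};

/* Remember an evicted page: use the first empty slot (mapping == NULL),
 * otherwise overwrite slot 0. The replacement policy is an assumption. */
static void demo_remember(struct nr_bucket *b, void *mapping,
			  unsigned long cookie)
{
	size_t slot = 0;

	for (size_t i = 0; i < NUM_NR; i++) {
		if (b->pages[i].mapping == NULL) {
			slot = i;
			break;
		}
	}
	b->pages[slot].mapping = mapping;
	b->pages[slot].offset_and_gen = cookie;
}

/* Linear scan of the bucket: returns 1 and clears the entry if the
 * (mapping, cookie) pair is present, 0 otherwise. */
static int demo_recently_evicted(struct nr_bucket *b, void *mapping,
				 unsigned long cookie)
{
	if (mapping == NULL)
		return 0;               /* NULL marks an empty slot */

	for (size_t i = 0; i < NUM_NR; i++) {
		if (b->pages[i].mapping == mapping &&
		    b->pages[i].offset_and_gen == cookie) {
			b->pages[i].mapping = NULL;
			b->pages[i].offset_and_gen = 0;
			return 1;
		}
	}
	return 0;
}
```

Because the whole bucket is meant to fit in one cacheline, the scan touches no memory beyond the line that already had to be fetched to take the lock.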
Uniqueness of mapping
Because the address_space structure underlying mapping can be freed by the kernel and then reallocated for another file, we need to take some measures to prevent the VM from thinking a brand-new page of a newly opened file was recently evicted. The obvious way to do this would be invalidating all entries of a particular mapping when the struct address_space is freed. However, that would mean increasing the size of the nr_page structure, which is something we don't want to do.
What we can do instead is look at data other than the mapping pointer itself and use a cryptographic hash to fold that data into one of the fields in nr_page. E.g. for a pagecache struct address_space we could use a hash of mapping->host->i_ino and mapping->host->i_sb. After all, neither of these should change during normal system operation.
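A minimal sketch of that folding step, under loud assumptions: the page does not say which hash is used, so demo_mix64 below is the splitmix64 finalizer, a plain integer mixer that is not cryptographic and only stands in for whatever hash the real code picks. demo_nr_cookie and its parameter types are hypothetical; in the kernel the inputs would come from the page offset, mapping->host->i_ino and mapping->host->i_sb.

```c
#include <stdint.h>

/* Stand-in mixer (splitmix64 finalizer). NOT a cryptographic hash;
 * used here only to illustrate the folding. */
static uint64_t demo_mix64(uint64_t x)
{
	x ^= x >> 30;
	x *= 0xbf58476d1ce4e5b9ULL;
	x ^= x >> 27;
	x *= 0x94d049bb133111ebULL;
	x ^= x >> 31;
	return x;
}

/* Fold the offset, inode number and superblock pointer into a 32-bit
 * cookie suitable for an offset_and_gen style field (hypothetical helper). */
static uint32_t demo_nr_cookie(uint64_t offset, uint64_t i_ino,
			       const void *i_sb)
{
	uint64_t h = demo_mix64(offset);

	h = demo_mix64(h ^ i_ino);
	h = demo_mix64(h ^ (uint64_t)(uintptr_t)i_sb);
	return (uint32_t)h;         /* keep the low 32 bits */
}
```

The same (offset, inode, superblock) triple always produces the same cookie, so a reused struct address_space address with a different inode or superblock will, with high probability, fail to match stale entries.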
The swap cache is a special case, since we can invalidate entries when a process exits, because we free up swap pages one by one. We can simply call the recently_evicted function from remove_exclusive_swap_page. This also covers swapoff(8) and a subsequent swapon(8), since the non-resident entries will be invalidated at swapoff time.
For now I am hashing mapping->host->i_ino and mapping->host->i_sb into the offset_and_gen field. The birthday paradox limit for a 32 bit field should be 2^16 pages, or enough for 256MB with 4KB pages. I am not sure this is the right thing to do, but the alternative would be hashing something into the mapping field itself, and collisions there could have worse consequences than a single-page false positive... I'm not sure which option is best.
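The 2^16 figure follows from the usual birthday approximation. The 4 KiB page size used here is an assumption, chosen because it is consistent with the 256MB number quoted above.

```latex
% N = 2^{32} possible hash values, n tracked non-resident pages
P(\text{collision}) \approx 1 - e^{-n^{2}/(2N)}
% at n = \sqrt{N} = 2^{16}:
P \approx 1 - e^{-1/2} \approx 0.39
% memory covered by 2^{16} pages of 4 KiB each:
2^{16} \times 2^{12}\,\text{bytes} = 2^{28}\,\text{bytes} = 256\,\text{MiB}
```

So once about 2^16 entries are tracked, the chance that at least two of them share a 32-bit hash is already roughly 39%, and it grows quickly beyond that point.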
Back to AdvancedPageReplacement