Compressed Caching for 2.6.x kernels
[:NitinGupta:Nitin Gupta]
MailTo(nitingupta.mail AT gmail DOT com)
This project is now part of Google SoC...yay!!
(The org is Fedora. Here's the [http://fedoraproject.org/wiki/SummerOfCode/2006/NitinGupta SoC application]).
This page collects all the details for the compressed cache implementation. Many of the ideas are taken from previous work (see References) and many new ones have been introduced.
Here I will keep track of project progress. Please add any suggestions you have or edit anything you find incorrect!
[http://linuxcompressed.sourceforge.net Project Home Page] | [mailto:linuxcompressed-devel@sourceforge.net Mailing List] ([http://sourceforge.net/mailarchive/forum.php?forum_id=2050 Archives]) | [:CompressedCaching/Code:Code] | [http://cvs.sourceforge.net/viewcvs.py/linuxcompressed CVS]
Introduction
Compressed caching is the introduction of a new layer in the virtual memory hierarchy -- the compressed cache. It compresses and stores pages that would otherwise have been swapped out to slow disks or freed under memory pressure. This effectively increases the available RAM and avoids or reduces accesses to slow disks. It takes advantage of rapidly increasing CPU power and faster, lower-latency memories, compared with the sluggish improvement in hard-disk speeds.
Work Done
Currently the implementation simply replaces a page with a copy of itself (i.e. no actual compression yet) and replaces its reference in the page cache with a ‘chunk_head’ (all of this is detailed below). On page cache lookup, the original page is obtained by copying it back and pointing the corresponding page cache entry at this page again instead of the ‘chunk_head’. Also, one of the compression algorithms (WKdm) has been ported to kernel space. The idea is to first have a framework ready for further implementation and to gain more familiarity with the VMM code.
General Background
The system maintains two LRU lists – the active and the inactive LRU list. These lists may contain both page-cache (file-backed) and swap-cache (anonymous) pages. Under memory pressure, pages on the inactive list are freed as follows:
- Swap-cache pages are written out to swap disks using ‘swapper_space’ writepage() (swap_writepage()).
- Dirty page-cache pages are flushed to filesystem disks using filesystem specific writepage().
- Clean page-cache pages are simply freed.
For compressed cache to be effective, it needs to store both swap-cache and page-cache (clean and dirty) pages. So, a way is needed to take these pages in and out of the compressed cache transparently (i.e. with changes confined to the VMM subsystem).
About Page Cache
Please see: http://www.linuxsymposium.org/2005/linuxsymposium_procv2.pdf -- paper ‘Examining Linux 2.6 Page-Cache Performance’.
Each open file has a separate radix tree to maintain its pages in the page cache. So, in effect, each open file has its own page cache. The offset within the file is used as the key to locate the corresponding page in memory.
About Swap Cache
All swap cache pages are part of a single swapper_space – a single radix tree maintains all pages in the swap cache. swp_entry_t is used as the key to locate the corresponding pages in memory.
attachment:SwapCacheEntry.jpg
The ‘type’ field identifies which swap device we can swap to.
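For reference, the kernel already provides helpers to unpack this key; a minimal sketch of how the fields shown above are read (swp_type() and swp_offset() are real kernel macros, the function itself is only illustrative):
{{{
#include <linux/kernel.h>
#include <linux/swap.h>
#include <linux/swapops.h>

/*
 * Minimal sketch: swp_type() and swp_offset() are the kernel's own helpers
 * for unpacking a swp_entry_t. The 'type' selects one of MAX_SWAPFILES swap
 * devices; the whole entry is also the key used to look the page up in the
 * swapper_space radix tree.
 */
static void show_swap_key(swp_entry_t entry)
{
	unsigned long type   = swp_type(entry);		/* which swap device */
	unsigned long offset = swp_offset(entry);	/* slot within that device */

	printk(KERN_DEBUG "swap key: type=%lu offset=%lu\n", type, offset);
}
}}}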
During Swap-Out
shrink_cache() prepares a list of pages to be freed (these pages are taken from the inactive list) and hands this list over to shrink_list(), which then tries to free the pages in the list one at a time, handling swap-cache pages and (clean/dirty) page-cache pages as described above.
attachment:ShrinkCache.jpg
attachment:ShrinkList.jpg
During Swap-In
If the page is anonymous, do_swap_page() looks up the swap cache first (using the swp_entry_t stored in the pte as the key); if the page is not in the swap cache, it is read in from the swap disk (with some readahead), added to the swap cache and returned (mapped into the process’ VMA).
The same logic applies to file-backed pages (do_file_page()):
If the page is file-backed, the page cache is looked up first (using the offset within the file, stored in the pte, as the key); if the page is not in the page cache, it is read in from the filesystem disk (with some readahead), added to the page cache and returned (mapped into the process’ VMA).
attachment:HandleFault.jpg
pte_none(*pte) == true means the page is not present in memory, so call do_no_page() to get it?
Handling Page Cache Pages
Taking page cache pages to compressed cache
When a page is to be flushed to disk (dirty) or simply freed (clean), determine whether it should be added to the compressed cache (should_add_to_ccache()). If yes, compress it (compress_page()) and add it to the compressed cache (add_to_ccache()).
add_to_ccache() will not maintain separate data structures to record the location of pages in ccache. Instead it will modify the radix tree node to point to a chunk_head, which contains all the information needed to locate the page in ccache. This chunk_head has the flag PG_compressed set, to identify that it does not point to a real page but is a container for the information required to locate it in ccache.
So, when a page is looked up in the page cache, whether it is in ccache is determined just by checking the PG_compressed flag. In that case the chunk_head contains the information to locate the corresponding page in ccache. The page is taken from ccache, decompressed into a newly allocated page, and the radix tree node is again made to point to this newly decompressed page.
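A rough sketch of this lookup path, assuming hypothetical helpers for the ccache-specific steps (only find_get_page(), alloc_page() and page_address() are existing kernel calls):
{{{
#include <linux/mm.h>
#include <linux/pagemap.h>

/*
 * Hedged sketch of the lookup path described above. PageCompressed(),
 * decompress_chunks() and ccache_replace_entry() are hypothetical helpers
 * of this design, not existing kernel functions.
 */
static struct page *ccache_find_page(struct address_space *mapping,
				     unsigned long offset)
{
	struct page *page = find_get_page(mapping, offset);

	if (page && PageCompressed(page)) {
		/* 'page' is really a chunk_head: gather its chunks,
		 * decompress into a fresh page, and make the radix-tree
		 * slot point at a real page again. */
		struct chunk_head *ch = (struct chunk_head *)page;
		struct page *new_page = alloc_page(GFP_KERNEL);

		if (!new_page)
			return NULL;
		decompress_chunks(ch, page_address(new_page));
		ccache_replace_entry(mapping, offset, new_page);
		return new_page;
	}
	return page;
}
}}}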
Swap-out for page cache pages
attachment:SwapOutCC.jpg
Swap In for page cache pages
attachment:SwapInCC.jpg
This approach eliminates the need for double lookups: a single page cache lookup tells whether the page is in ccache or not and, if it is compressed, where it is placed in the ccache structure. This is better than having to look up the ccache every time a page is not found in the page cache.
Handling Swap Cache pages
We can take a completely analogous approach to handle swap cache pages, i.e.:
When a swap-cache page is to be moved to the swap disk, mark it PG_compressed and modify the tree node (as is done for page-cache pages) to point to a new struct chunk_head containing all the information required to locate the page in ccache. When the page is looked up again in the swap cache during a page fault, simply check whether PG_compressed is set; if so, decompress, clear PG_compressed, make the node point to this decompressed page again and return it.
There are MAX_SWAPFILES (default 32) swap disks, differentiated using the ‘type’ field (5 bits) in swp_entry_t. Mark one (or some) of them as ‘virtual swap disks’ and assign it the highest priority of all swap disks. The swapper_space radix tree can then be used for ccache too, and no separate data structure is required to record the location of a compressed page in the ccache structure (i.e. no double lookups).
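As a small illustration of this idea (CCACHE_SWAP_TYPE and the helper are assumed names of this design; only swp_type() is an existing kernel macro):
{{{
#include <linux/swapops.h>

/*
 * Hedged sketch: if one swap 'type' is reserved as the virtual (compressed)
 * swap device, the type field alone tells whether an entry refers to ccache.
 */
#define CCACHE_SWAP_TYPE 0	/* assumed: the highest-priority, virtual swap device */

static inline int entry_in_ccache(swp_entry_t entry)
{
	return swp_type(entry) == CCACHE_SWAP_TYPE;
}
}}}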
Compression Structure
Many of the ideas here are taken from: Adaptive Main Memory Compression by Irina Chihaia and Thomas Gross.
(http://www.lst.inf.ethz.ch/research/publications/publications/USENIX_2005/USENIX_2005.pdf)
The basic idea is to store compressed pages in variable-sized memory blocks called chunks. A compressed page can be stored in several of these chunks. Memory space for chunks is obtained by allocating 0-order pages, one at a time, and managing this space as chunks. All chunks are always linked together in a doubly linked list called the master chunk list. Related chunks are additionally linked in singly linked related-chunk lists, e.g. all free chunks are linked together, and all chunks belonging to the same compressed page are linked together. Thus every chunk is on the master chunk list and is also on one of the related-chunk lists (e.g. the free list, or the list of chunks belonging to the same compressed page).
attachment:CompressionStructure.jpg
Blocks of the same colour belong to the same compressed page; white is free space. Arrows indicate related chunks linked together as a singly linked list. The long horizontal line across the chunks shows that all chunks are also linked together as a doubly linked list, irrespective of which other list they belong to.
NOTE:
- A chunk cannot cross page boundaries (explained later), as is shown for the ‘green’ compressed page. A chunk is split unconditionally at page boundaries. Thus the maximum chunk size = PAGE_SIZE.
- This structure reduces fragmentation to a minimum as all the variable-sized free blocks are tracked.
When compressed pages are taken out, corresponding chunks are freed and adjacent free chunks are merged together (while making sure chunks do not cross page boundaries).
Free Lists will be maintained separately according to size of free chunks. e.g.:
‘Good’ free list: chunk size >= 1/2*PAGE_SIZE
‘Bad’ free list: 1/4*PAGE_SIZE <= chunk size < 1/2*PAGE_SIZE
‘Ugly’ free list: chunk size < 1/4*PAGE_SIZE
The ‘Ugly’ free list is, well… ugly! When the number of such small chunks increases, a single compressed page gets scattered over many chunks and correspondingly requires more metadata overhead. It is also more expensive to gather all these chunks together to reconstruct the original compressed page ready for decompression. Thus the use of chunks from this list is avoided, in the hope that they will soon be merged into bigger chunks as compressed pages are freed and chunks are merged.
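For illustration, a free chunk might be routed to one of these lists like this (a sketch assuming the thresholds listed above; the list names and helper are made up):
{{{
#include <asm/page.h>

/*
 * Hedged sketch: choosing a free list by chunk size, assuming the size
 * thresholds listed above. Names are illustrative only.
 */
enum cc_freelist { CC_FREELIST_GOOD, CC_FREELIST_BAD, CC_FREELIST_UGLY };

static enum cc_freelist free_list_for(unsigned int size)
{
	if (size >= PAGE_SIZE / 2)
		return CC_FREELIST_GOOD;
	if (size >= PAGE_SIZE / 4)
		return CC_FREELIST_BAD;
	return CC_FREELIST_UGLY;
}
}}}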
LRU lists will be maintained separately for clean page-cache pages, dirty page-cache pages and swap-cache pages. This allows pages of these types to be tracked separately and, when the compressed cache size reaches its limit, specific types of pages can be freed in preference to others (e.g. put more pressure on freeing clean page-cache pages than dirty page-cache pages, and the least on swap-cache pages).
Now, the two main structures involved are chunk_head and the dear chunk.
{{{
struct chunk_head {
	unsigned long flags;		/* compression algo used, no. of chunks etc. */
	atomic_t count;			/* usage count; free this struct when count is 0 */
	struct chunk *chunk_list;	/* points to the first chunk */
	struct list_head lru;		/* to add to one of the LRU lists */
};
}}}
{{{
struct chunk {
	void *start_addr;		/* page addr + offset within page where chunk starts */
	unsigned short size;		/* size: 12 LSB bits, flags: remaining 4 bits */
	struct chunk *next;		/* link to next related chunk */
	struct list_head chunks;	/* master chunk list: every chunk is linked here */
};
}}}
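The ‘size’ field packs two things into 16 bits; a possible (illustrative) split of that field:
{{{
/*
 * Hedged sketch of how the 16-bit 'size' field might be split: chunk size
 * in the 12 low bits, per-chunk flags in the top 4, as described above.
 * Macro and helper names are illustrative. (A full PAGE_SIZE chunk would
 * need a convention such as size 0 meaning PAGE_SIZE, since 12 bits hold
 * at most 4095.)
 */
#define CHUNK_SIZE_MASK		0x0fffU
#define CHUNK_FLAGS_SHIFT	12

static inline unsigned int chunk_size(const struct chunk *c)
{
	return c->size & CHUNK_SIZE_MASK;
}

static inline unsigned int chunk_flags(const struct chunk *c)
{
	return c->size >> CHUNK_FLAGS_SHIFT;
}
}}}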
CCache Operations
Page Insert
The uncompressed page is first compressed into a buffer page. Then a number of free chunks are taken from the free list (or a new page is allocated to get a new chunk), according to the size of the compressed page. These chunks are linked together as a singly linked related-list (using chunk->next). The remaining space from the last chunk is added back to the free list and merged with adjacent free chunks, if present. The entry in the page cache radix tree is now made to point to the chunk_head allocated for this compressed page. The newly added compressed page is then added to the tail of the LRU list of its type: clean page-cache, dirty page-cache or swap-cache (using chunk_head->lru).
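Put together, the insert path might look roughly like this (a sketch; apart from list_add_tail(), every helper and the scratch buffer are assumed names of this design, not existing functions):
{{{
#include <linux/list.h>
#include <linux/errno.h>

/*
 * Hedged sketch of the insert path described above. compress_page(),
 * get_free_chunks(), copy_to_chunks(), ccache_set_entry(), ccache_lru_for()
 * and ccache_compress_buffer are hypothetical names for this design.
 */
static int ccache_insert_page(struct address_space *mapping,
			      unsigned long offset, struct page *page)
{
	void *buf = ccache_compress_buffer;	/* assumed scratch page */
	int clen = compress_page(page, buf);	/* compressed length in bytes */
	struct chunk_head *ch;

	/* grab enough chunks for 'clen' bytes, allocating a fresh page if needed */
	ch = get_free_chunks(clen);
	if (!ch)
		return -ENOMEM;
	copy_to_chunks(ch, buf, clen);		/* scatter data over the chunk list */

	/* the radix-tree slot now points at the chunk_head, not a real page */
	ccache_set_entry(mapping, offset, ch);
	/* queue on the clean/dirty/swap LRU list matching this page's type */
	list_add_tail(&ch->lru, ccache_lru_for(page));
	return 0;
}
}}}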
Page Delete
When a lookup is performed on the page cache, a chunk_head is obtained (identified by PG_compressed being set) which gives a link to the first chunk. Since all related chunks are linked together (using chunk->next), the compressed data is collected from all these chunks into a separate page and then decompressed. The freed chunks are merged with adjacent free chunks, if present, and added to a free list (according to their size).
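The merge step has one subtlety worth showing: two free neighbours can only be merged when they share a page. A sketch of that check (helper names are illustrative; chunk_size() is the helper from the earlier sketch):
{{{
#include <asm/page.h>

/*
 * Hedged sketch: a freed chunk can be merged with the free chunk that
 * immediately follows it only if both lie within the same page, since
 * chunks never cross page boundaries.
 */
static int chunks_mergeable(struct chunk *a, struct chunk *b)
{
	char *end_a = (char *)a->start_addr + chunk_size(a);

	if (end_a != (char *)b->start_addr)	/* must be physically adjacent */
		return 0;
	/* must live on the same underlying page */
	return ((unsigned long)a->start_addr & PAGE_MASK) ==
	       ((unsigned long)b->start_addr & PAGE_MASK);
}
}}}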
Resizing Compressed Area
Expanding: Simply allocate a new page and add it to the free list.
Shrinking: Free pages by taking LRU pages from one of the LRU lists maintained. Free the chunks associated with these compressed pages. When merging these chunks with adjacent free chunks, if the merged chunk spans an entire page allocated for compressed-area storage, free that page. Continue until the required number of pages has been freed.
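In outline, shrinking could be driven by a loop like this (a sketch; evict_compressed_page() is an assumed helper, while list_empty() and list_entry() are the kernel's own list primitives):
{{{
#include <linux/list.h>

/*
 * Hedged sketch of the shrink loop described above: take entries from the
 * head of one of the LRU lists and evict them until enough whole pages have
 * been released. evict_compressed_page() is a hypothetical helper that frees
 * and merges the entry's chunks and returns how many complete pages it was
 * able to give back.
 */
static void ccache_shrink(struct list_head *lru, int nr_to_free)
{
	while (nr_to_free > 0 && !list_empty(lru)) {
		struct chunk_head *ch =
			list_entry(lru->next, struct chunk_head, lru);

		nr_to_free -= evict_compressed_page(ch);
	}
}
}}}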
Notes
- This chunked structure reduces fragmentation to a large extent, since it allows a single compressed page to be stored scattered over many chunks, and every free chunk is tracked.
- Inserting a page is fast since chunks are simply taken from the free list (allocating more pages if required). This does not involve touching any other live data within ccache.
- Expanding ccache is a piece of cake!
- Shrinking ccache is a bit of a mess.
Problems
- Metadata size problem: e.g., a compressed page taking 3 chunks requires sizeof(chunk_head) + 3*sizeof(chunk) of metadata (on a 32-bit system this is 72 bytes!!). This is a big problem. Also, each free chunk requires sizeof(chunk) of metadata (18 bytes). For very small chunks this is a problem, but free chunks are always merged with adjacent free chunks, so this should not be an issue.
- When extracting a compressed page, data is read from several chunks which can be on separate physical pages. This total destruction of locality of reference can seriously affect performance.
References
Rodrigo S. de Castro: Adaptive Compressed Caching: Design and Implementation ([http://linuxcompressed.sourceforge.net/files/docs/paper.pdf pdf])
Scott F. Kaplan: Compressed Caching and Modern Virtual Memory Simulation ([http://www.cs.amherst.edu/~sfkaplan/papers/sfkaplan-dissertation.ps.gz ps])
Irina Chihaia and Thomas Gross: Adaptive Main Memory Compression ([http://www.lst.inf.ethz.ch/research/publications/publications/USENIX_2005/USENIX_2005.pdf pdf])
If you want a thing done well, do it yourself. --Napoleon
Going to work for a large company is like getting on a train - Are you going sixty miles an hour or is the train going sixty miles an hour and you're just sitting still? --Jean Paul