Remappable memory
Drivers often implement mmap() to allow userspace to have direct access to memory that was allocated/reserved inside the kernel. For example, you may wish to allow userspace to have direct access to a kernel-allocated buffer that is used for DMA with a PCI device.
[http://lwn.net/Kernel/LDD3/ LDD3] chapter 15 discusses this topic, but one important point was not made clear to me: you cannot remap memory allocated with kmalloc() through the fault/nopage technique. mmap is a page-oriented interface, but kmalloc does not provide pages - it provides another type of memory object. You may be tempted to use virt_to_page() to get a struct page pointer to a kmalloced region and then remap that in your pagefault handler, but this is a violation of the abstraction as [http://marc.info/?l=linux-mm&m=121238525325385&w=2 explained by Johannes Weiner]. At least with the SLAB allocator, virt_to_page() on a kmalloced region returns pages with the PG_slab flag set, which will cause "Bad page state" messages during the munmap path (see free_pages_check() in mm/page_alloc.c).
That aside, it is legal to remap kmalloced memory into userspace with, for example, remap_pfn_range(). This technique installs the page table entries directly and does not touch the underlying page structures.
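As a rough illustration of the remap_pfn_range() approach, here is a minimal sketch of a misc device whose mmap handler maps a kmalloced buffer. The device name, buffer size and all example_* identifiers are made up for this example. It assumes the buffer is page-aligned (recent kernels naturally align power-of-two kmalloc sizes; older ones may not guarantee this) and that the code is built as a kernel module, not as a userspace program.

```c
#include <linux/fs.h>
#include <linux/io.h>
#include <linux/miscdevice.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/slab.h>

#define EXAMPLE_BUF_SIZE (4 * PAGE_SIZE)	/* hypothetical size, power of two */

static void *example_buf;	/* allocated in example_init() */

static int example_mmap(struct file *filp, struct vm_area_struct *vma)
{
	unsigned long size = vma->vm_end - vma->vm_start;
	unsigned long pfn;

	/* Only allow mapping from the start of the buffer, never past its end. */
	if (vma->vm_pgoff != 0 || size > EXAMPLE_BUF_SIZE)
		return -EINVAL;

	/* kmalloc memory lives in the kernel linear mapping, so
	 * virt_to_phys() is valid for it. */
	pfn = virt_to_phys(example_buf) >> PAGE_SHIFT;

	/* All PTEs are installed here, at mmap time; no fault handler runs
	 * and the struct pages behind the buffer are not touched. */
	return remap_pfn_range(vma, vma->vm_start, pfn, size, vma->vm_page_prot);
}

static const struct file_operations example_fops = {
	.owner = THIS_MODULE,
	.mmap  = example_mmap,
};

static struct miscdevice example_dev = {
	.minor = MISC_DYNAMIC_MINOR,
	.name  = "example_remap",	/* hypothetical device name */
	.fops  = &example_fops,
};

static int __init example_init(void)
{
	int ret;

	/* Zeroed, so no stale kernel data is exposed to userspace. */
	example_buf = kzalloc(EXAMPLE_BUF_SIZE, GFP_KERNEL);
	if (!example_buf)
		return -ENOMEM;

	ret = misc_register(&example_dev);
	if (ret)
		kfree(example_buf);
	return ret;
}

static void __exit example_exit(void)
{
	misc_deregister(&example_dev);
	kfree(example_buf);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");
```

Because the whole range is mapped up front, this variant gives up the per-page flexibility of a fault handler in exchange for simplicity.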
If you do need the flexibility offered by the nopage/fault VM handlers, use an interface such as alloc_pages() to obtain memory that is safe to remap through a page fault handler. Another option is to use vmalloc() to obtain virtually contiguous memory regions, whose pages can safely be looked up as page structure pointers using vmalloc_to_page().
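The vmalloc() variant might look roughly like the fragment below. It is a sketch only: the example_* names and the buffer size are invented, the buffer is assumed to have been allocated elsewhere (for instance with vmalloc_user(), which also zeroes the memory), and the fault handler signature shown is the one used by recent kernels. Older kernels, including those current when the linked discussion took place, pass a struct vm_area_struct pointer as well and return int, so adjust for the kernel you target.

```c
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

#define EXAMPLE_VBUF_PAGES 16	/* hypothetical buffer size, in pages */

/* Assumed to be set up at init time, e.g.
 * example_vbuf = vmalloc_user(EXAMPLE_VBUF_PAGES * PAGE_SIZE); */
static void *example_vbuf;

static vm_fault_t example_fault(struct vm_fault *vmf)
{
	struct page *page;

	/* vmf->pgoff is the faulting page's offset within the mapped object. */
	if (vmf->pgoff >= EXAMPLE_VBUF_PAGES)
		return VM_FAULT_SIGBUS;

	/* Unlike virt_to_page() on kmalloc memory, this lookup is legitimate:
	 * vmalloc areas are built from ordinary, individually allocated pages. */
	page = vmalloc_to_page(example_vbuf + (vmf->pgoff << PAGE_SHIFT));
	if (!page)
		return VM_FAULT_SIGBUS;

	/* Take a reference for the new mapping; it is dropped on unmap. */
	get_page(page);
	vmf->page = page;
	return 0;
}

static const struct vm_operations_struct example_vm_ops = {
	.fault = example_fault,
};

static int example_mmap(struct file *filp, struct vm_area_struct *vma)
{
	/* Nothing is mapped yet; pages are filled in one by one on first access. */
	vma->vm_ops = &example_vm_ops;
	return 0;
}
```

The same handler shape works for memory obtained with alloc_pages(); in that case the driver would index its own array of struct page pointers instead of calling vmalloc_to_page().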
mmap and real files
This is a cut'n'paste of an IRC conversation on the #kernelnewbies channel. One day this should be rewritten into a more easily readable article...
<bronaugh> if you're using mmap on a file descriptor, how are the changes eventually written to disk? what gets called?
<bronaugh> does the normal read/write function eventually get called?
<riel> bronaugh: two times
<riel> bronaugh: changes are written to disk either at/after msync(2) time, or after munmap(2) time
<riel> bronaugh: or, if the system has a memory shortage, by the pageout code
<bronaugh> alright. and that uses the normal read/write calls?
<rene> but I believe he means if the actual sys_read() / sys_write() code ie getting called. to that, no, the actual "dirty" pages are written
<riel> bronaugh: no
<riel> bronaugh: data changed through mmap does not go through read/write syscalls
<bronaugh> ok. here's why I'm asking.
<bronaugh> I'm modifying framebuffer code for some nefarious purposes. I don't want a memory-backed framebuffer; I want all calls like that to go over the network.
<bronaugh> now, framebuffers have an fb_read and an fb_write call associated with them. these end up being called in fbmem.c by the main handler for read and write, which is set up in the file_operation struct.
<bronaugh> my question is -- will those routines be called?
<bronaugh> (given that they will be called normally by a read/write system call)
<-- SGH has quit (Quit: Client exiting)
<bronaugh> sorry if I might be a bit confusing here.. just trying to get a handle on it myself
<riel> if you set those routines as the mmap read and write functions, yes
<bronaugh> ohh, special functions. ok.
<bronaugh> I'll dig into that.
<riel> you can set them at mmap(2) time
<bronaugh> ok, so how does one do that?
<bronaugh> (set the mmap read and write functions)
<riel> lets take a look at drivers/video/skeletonfb.c
<riel> static struct fb_ops xxxfb_ops = {
<bronaugh> alright.
<bronaugh> wish I'd looked at that. heh.
<riel> you can see it set .fb_read and .fb_write and .fb_mmap functions ?
<bronaugh> yup.
<bronaugh> I've set those up in my driver.
<bronaugh> they're stubbed but present.
<riel> wait, I forgot something important that is device driver specific
<riel> on a frame buffer, you want writes to show up on the screen immediately
<riel> you don't want to wait on msync() for your changes to hit the screen
<bronaugh> yeah.
<bronaugh> but this is a network framebuffer, so batching up writes is a plus.
<bronaugh> though you don't want to go -too- far with that.
<bronaugh> we'll just say it's a normal framebuffer as a simplifying assumption.
<bronaugh> normal but remote
<bronaugh> (ie, not in the same memory space)
<riel> one thing you could do every once in a while is initiate the msync from kernel space
<riel> not the cheapest thing to do, but ...
<bronaugh> it'd work in a pinch.
<riel> easy to verify the functionality, transparently to userspace
<bronaugh> ok so... back on topic. I don't see skeletonfb having an mmap func, just a stub.
<bronaugh> sorry. not a stub, just a declaration with no implementation.
<riel> indeed, the mmap function is in fbmem.c
<bronaugh> the main one, yeah. but that dispatches to others if they are present.
<bronaugh> I've looked at the main one, but I don't understand io_remap_pfn_range.
<bronaugh> I've followed the code, I know that eventually it mucks with page table entries.
<bronaugh> but beyond that it is opaque to me.
<riel> bronaugh: basically it maps physical addresses to page table entries
<riel> bronaugh: and may not be what you want when your frame buffer is backed by non-physically contiguous memory
<bronaugh> yeah, I was wondering about that.
<riel> I'm wondering if you might be better off hacking up ramfs and using a virtual file as your framebuffer
<bronaugh> so is there an alternate type of memory mapping I can set up; one such as used with files?
<bronaugh> because clearly that eventually has to call functions to do the IO; the problem is equivalent, a device with a different kind of address space.
<bronaugh> hmm, filemap.c...
<bronaugh> anyhow, how would one set up a mapping of that sort?
<riel> make it a file inside the page cache
<riel> then the VM can handle page faults for you
<bronaugh> ok, that's definitely what I want.
<bronaugh> but how do I go about doing that? is there somewhere I can read?
<riel> try fs/ramfs/
<bronaugh> alright.
<bronaugh> wow. short.
<riel> ramfs was written as a demonstration of what the VFS can do (and what filesystems do not have to do themselves)
<bronaugh> sounds like a worthy goal.
<bronaugh> ok, hmm. generic_file_mmap
<riel> you'll be able to chainsaw out lots of code from ramfs, since you won't need mounting, a directory, etc...
<bronaugh> yeah.
<bronaugh> it seems to me that I should be able to just plug in generic_file_mmap as my mmap handler.
<bronaugh> but - I need to see the code first.
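The first point made in the log (data changed through a shared mapping reaches the file via msync(2), munmap(2) or pageout, not via the write(2) path) can be tried from userspace. The sketch below is not from the conversation; the function name write_via_mmap and its error handling are choices made for this example. It stores a string into a regular file using only a MAP_SHARED mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Store msg into the file at path through a shared mapping only.
 * write(2) is never called. Returns 0 on success, -1 on failure.
 */
int write_via_mmap(const char *path, const char *msg)
{
	size_t len;
	char *map;
	int fd, rc;

	assert(path != NULL && msg != NULL);
	len = strlen(msg);
	if (len == 0)
		return -1;	/* mmap() rejects zero-length mappings */

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* The file must be at least as long as the range we store into. */
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping keeps its own reference to the file */
	if (map == MAP_FAILED)
		return -1;

	memcpy(map, msg, len);	/* dirties the page cache page(s) */

	/* Ask for the dirty pages to be written back before returning. */
	rc = msync(map, len, MS_SYNC);
	if (munmap(map, len) < 0)
		rc = -1;
	return rc;
}
```

After a successful call, reading the file back with an ordinary read(2) or with cat should show the text, even though the program only ever used memcpy() on the mapping.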
["CategoryLinuxMMInternals"]