== Remappable memory ==

Drivers often implement mmap() to give userspace direct access to memory that was allocated or reserved in kernel space. For example, you may wish to let userspace access a kernel-allocated buffer that is used for DMA with a PCI device.

[[http://lwn.net/Kernel/LDD3/|LDD3]] chapter 15 provides a decent introduction to this topic. In summary, LDD3 explains that you can either remap kernel buffers into userspace by calling remap_pfn_range() from your driver's mmap handler, or set up a nopage VM handler to remap on a page-by-page basis.

=== Physical addresses vs struct page pointers ===

LDD3 does not explicitly discuss one important difference between remap_pfn_range() and nopage: remap_pfn_range() operates on physical addresses, while nopage operates on struct page pointers. This matters because not all kinds of memory can be represented by struct page pointers, so there are scenarios where nopage cannot be used. An [[http://lwn.net/Articles/200213/|LWN article mentions this limitation]]:

 Meanwhile, one of the longstanding limitations of nopage() is that it can only handle situations where the relevant physical memory has a corresponding struct page. Those structures exist for main memory, but they do not exist when the memory is, for example, on a peripheral device and mapped into a PCI I/O memory region. [...] In such cases, drivers must explicitly map the memory into user space with remap_pfn_range() instead of using nopage().

Another very common scenario where nopage cannot be used is when you are trying to remap a buffer that was allocated by kmalloc(). You may be tempted to call virt_to_page(addr) to obtain a struct page pointer for a kmalloced address, but [[http://marc.info/?l=linux-mm&m=121238525325385&w=2|this is a violation of the abstraction]]: kmalloc() does not return pages, it returns another type of memory object. The remap_pfn_range() approach, on the other hand, is legal because remap_pfn_range() never touches the underlying struct pages - it works at the physical address (PFN) level.

It is, however, legal to remap buffers allocated by vmalloc() through the nopage handler, thanks to the vmalloc_to_page() function.
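To make the remap_pfn_range() side concrete, here is a minimal sketch of an mmap handler that maps a physically contiguous, kmalloc()-allocated buffer into userspace. The mydev_* names and buffer setup are purely illustrative, error handling is minimal, and the sketch assumes the mapping starts at offset 0:

{{{
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/io.h>

/* Illustrative driver state: a physically contiguous kmalloc()ed buffer. */
static void *mydev_buf;
static size_t mydev_buf_size;

static int mydev_mmap(struct file *file, struct vm_area_struct *vma)
{
	unsigned long size = vma->vm_end - vma->vm_start;

	if (size > mydev_buf_size)
		return -EINVAL;

	/*
	 * remap_pfn_range() works on PFNs, so convert the kernel virtual
	 * address to a physical address and shift it down by PAGE_SHIFT.
	 */
	return remap_pfn_range(vma, vma->vm_start,
			       virt_to_phys(mydev_buf) >> PAGE_SHIFT,
			       size, vma->vm_page_prot);
}

static const struct file_operations mydev_fops = {
	.owner = THIS_MODULE,
	.mmap  = mydev_mmap,
};
}}}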
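For comparison, here is a sketch of the page-by-page approach using the historical nopage operation, remapping a vmalloc()-allocated buffer with vmalloc_to_page(). This is essentially the pattern LDD3 uses for its vmalloc-backed example device; the mydev_* names are again made up, and the prototype shown is the one used by kernels that still had nopage. The driver's mmap handler would simply install mydev_vm_ops in vma->vm_ops and return 0:

{{{
#include <linux/mm.h>
#include <linux/vmalloc.h>

/* Illustrative driver state: a (possibly non-contiguous) vmalloc()ed buffer. */
static void *mydev_vbuf;
static size_t mydev_vbuf_size;

static struct page *mydev_vma_nopage(struct vm_area_struct *vma,
				     unsigned long address, int *type)
{
	unsigned long offset;
	struct page *page;

	/* Offset of the faulting address into the buffer. */
	offset = (address - vma->vm_start) + (vma->vm_pgoff << PAGE_SHIFT);
	if (offset >= mydev_vbuf_size)
		return NOPAGE_SIGBUS;

	/* vmalloc()ed memory does have struct pages, so this is legal. */
	page = vmalloc_to_page(mydev_vbuf + offset);
	get_page(page);			/* the VM drops this reference later */
	if (type)
		*type = VM_FAULT_MINOR;
	return page;
}

static struct vm_operations_struct mydev_vm_ops = {
	.nopage = mydev_vma_nopage,
};
}}}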
=== Introducing nopfn ===

The [[http://lwn.net/Articles/200213/|LWN article referenced above]] also discussed the proposal of a new VM operation named '''nopfn'''. nopfn solves the nopage limitation discussed above: nopage cannot remap addresses that have no corresponding struct page, whereas nopfn lets you remap based on the physical address (PFN). To implement a nopfn handler:

 1. Find the physical address of the page you want to remap, based on the faulting address within the VMA, and convert it to a PFN by right-shifting it by PAGE_SHIFT bits.
 2. Call vm_insert_pfn() to modify the process address space.
 3. Return NOPFN_REFAULT.

You must also set the VM_PFNMAP flag in vma->vm_flags from your mmap handler. nopfn was introduced in Linux 2.6.19.

=== Migrating nopage to fault ===

Linux 2.6.23 introduced an alternative to the nopage API, called '''fault'''. As usual, [[http://lwn.net/Articles/242625/|LWN has a good article]]. The nopage API was later removed once no users remained. The migration from nopage to fault is quite simple (see the sketch below), and there are plenty of examples in the kernel history.

=== Migrating nopfn to fault ===

'''fault''' was intended to replace nopfn too, but this did not become possible until Linux 2.6.26. nopfn will be removed in a future release, in favour of doing PFN remappings through the fault handler. Migrating is fairly easy: again, set the VM_PFNMAP flag on the VMA and call vm_insert_pfn() from your fault handler. Leave the struct page pointer in the vm_fault structure NULL where you might otherwise have filled it in, and return VM_FAULT_NOPAGE so the core VM knows that no struct page is being handed back. It may appear possible to implement a PFN-based remapper through fault on pre-2.6.26 kernels, but don't bother: you will hit a kernel BUG(), because the fault interface was not capable of PFN-based remappings in earlier releases.
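As an illustration of how mechanical the nopage-to-fault migration is, here is the vmalloc()-backed sketch from earlier rewritten for the fault API (the mydev_* names are still illustrative). The faulting offset now arrives pre-computed in vmf->pgoff, and the struct page is handed back through the vm_fault structure rather than the return value:

{{{
#include <linux/mm.h>
#include <linux/vmalloc.h>

static int mydev_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	/* vmf->pgoff already accounts for vma->vm_pgoff. */
	unsigned long offset = vmf->pgoff << PAGE_SHIFT;
	struct page *page;

	if (offset >= mydev_vbuf_size)
		return VM_FAULT_SIGBUS;

	page = vmalloc_to_page(mydev_vbuf + offset);
	get_page(page);
	vmf->page = page;	/* hand the struct page back through vmf */
	return 0;
}

static struct vm_operations_struct mydev_vm_ops = {
	.fault = mydev_vma_fault,
};
}}}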
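And here is a sketch of the PFN-based equivalent for 2.6.26 or later, for memory that has no struct page (a PCI BAR, for instance). mydev_phys_base stands in for a physical address the driver obtained elsewhere (for example from pci_resource_start()); like the other examples, this assumes a shared mapping starting at offset 0 and is not a drop-in implementation:

{{{
#include <linux/fs.h>
#include <linux/mm.h>

/* Illustrative: physical base of the region to map (no struct pages here). */
static unsigned long mydev_phys_base;

static int mydev_pfn_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	unsigned long pfn = (mydev_phys_base >> PAGE_SHIFT) + vmf->pgoff;
	int ret;

	/* Insert the PTE ourselves; there is no struct page to hand back. */
	ret = vm_insert_pfn(vma, (unsigned long)vmf->virtual_address, pfn);
	if (ret == -ENOMEM)
		return VM_FAULT_OOM;
	if (ret && ret != -EBUSY)	/* -EBUSY: already mapped by a racing fault */
		return VM_FAULT_SIGBUS;

	return VM_FAULT_NOPAGE;
}

static struct vm_operations_struct mydev_pfn_vm_ops = {
	.fault = mydev_pfn_fault,
};

static int mydev_pfn_mmap(struct file *file, struct vm_area_struct *vma)
{
	vma->vm_flags |= VM_PFNMAP;	/* required for vm_insert_pfn() */
	vma->vm_ops = &mydev_pfn_vm_ops;
	return 0;
}
}}}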
== mmap and real files ==

This is a cut'n'paste of an IRC conversation on the #kernelnewbies channel. One day this should be rewritten into a more easily readable article...

{{{
<bronaugh> if you're using mmap on a file descriptor, how are the changes eventually written to disk? what gets called?
<bronaugh> does the normal read/write function eventually get called?
<riel> bronaugh: two times
<riel> bronaugh: changes are written to disk either at/after msync(2) time, or after munmap(2) time
<riel> bronaugh: or, if the system has a memory shortage, by the pageout code
<bronaugh> alright. and that uses the normal read/write calls?
<rene> but I believe he means if the actual sys_read() / sys_write() code ie getting called. to that, no, the actual "dirty" pages are written
<riel> bronaugh: no
<riel> bronaugh: data changed through mmap does not go through read/write syscalls
<bronaugh> ok. here's why I'm asking.
<bronaugh> I'm modifying framebuffer code for some nefarious purposes. I don't want a memory-backed framebuffer; I want all calls like that to go over the network.
<bronaugh> now, framebuffers have an fb_read and an fb_write call associated with them. these end up being called in fbmem.c by the main handler for read and write, which is set up in the file_operation struct.
<bronaugh> my question is -- will those routines be called?
<bronaugh> (given that they will be called normally by a read/write system call)
<bronaugh> sorry if I might be a bit confusing here.. just trying to get a handle on it myself
<riel> if you set those routines as the mmap read and write functions, yes
<bronaugh> ohh, special functions. ok.
<bronaugh> I'll dig into that.
<riel> you can set them at mmap(2) time
<bronaugh> ok, so how does one do that?
<bronaugh> (set the mmap read and write functions)
<riel> lets take a look at drivers/video/skeletonfb.c
<riel> static struct fb_ops xxxfb_ops = {
<bronaugh> alright.
<bronaugh> wish I'd looked at that. heh.
<riel> you can see it set .fb_read and .fb_write and .fb_mmap functions ?
<bronaugh> yup.
<bronaugh> I've set those up in my driver.
<bronaugh> they're stubbed but present.
<riel> wait, I forgot something important that is device driver specific
<riel> on a frame buffer, you want writes to show up on the screen immediately
<riel> you don't want to wait on msync() for your changes to hit the screen
<bronaugh> yeah.
<bronaugh> but this is a network framebuffer, so batching up writes is a plus.
<bronaugh> though you don't want to go -too- far with that.
<bronaugh> we'll just say it's a normal framebuffer as a simplifying assumption.
<bronaugh> normal but remote
<bronaugh> (ie, not in the same memory space)
<riel> one thing you could do every once in a while is initiate the msync from kernel space
<riel> not the cheapest thing to do, but ...
<bronaugh> it'd work in a pinch.
<riel> easy to verify the functionality, transparently to userspace
<bronaugh> ok so... back on topic. I don't see skeletonfb having an mmap func, just a stub.
<bronaugh> sorry. not a stub, just a declaration with no implementation.
<riel> indeed, the mmap function is in fbmem.c
<bronaugh> the main one, yeah. but that dispatches to others if they are present.
<bronaugh> I've looked at the main one, but I don't understand io_remap_pfn_range.
<bronaugh> I've followed the code, I know that eventually it mucks with page table entries.
<bronaugh> but beyond that it is opaque to me.
<riel> bronaugh: basically it maps physical addresses to page table entries
<riel> bronaugh: and may not be what you want when your frame buffer is backed by non-physically contiguous memory
<bronaugh> yeah, I was wondering about that.
<riel> I'm wondering if you might be better off hacking up ramfs and using a virtual file as your framebuffer
<bronaugh> so is there an alternate type of memory mapping I can set up; one such as used with files?
<bronaugh> because clearly that eventually has to call functions to do the IO; the problem is equivalent, a device with a different kind of address space.
<bronaugh> hmm, filemap.c...
<bronaugh> anyhow, how would one set up a mapping of that sort?
<riel> make it a file inside the page cache
<riel> then the VM can handle page faults for you
<bronaugh> ok, that's definitely what I want.
<bronaugh> but how do I go about doing that? is there somewhere I can read?
<riel> try fs/ramfs/
<bronaugh> alright.
<bronaugh> wow. short.
<riel> ramfs was written as a demonstration of what the VFS can do (and what filesystems do not have to do themselves)
<bronaugh> sounds like a worthy goal.
<bronaugh> ok, hmm. generic_file_mmap
<riel> you'll be able to chainsaw out lots of code from ramfs, since you won't need mounting, a directory, etc...
<bronaugh> yeah.
<bronaugh> it seems to me that I should be able to just plug in generic_file_mmap as my mmap handler.
<bronaugh> but - I need to see the code first.
}}}

----
[[CategoryLinuxMMInternals]]