An OOM (Out Of Memory) error is what happens when the kernel has run out of memory in its own internal pools and is unable to reclaim memory from any other source. At that point it starts killing processes, more or less at random, and spits a lot of logging into dmesg.
How do I debug an OOM?
Read this page. Look at all the causes of OOM events, and try to figure out into which of the listed causes your OOM falls. Remember, very few OOM events are genuine kernel bugs. Virtually all of them are caused by user applications behaving badly.
What leads up to an OOM?
Generally, the system is lazy about reclaiming memory, preferring to let it lie around in caches until there is a genuine need for it. So it is not unusual to see memory usage grow and not shrink when there are no requests for memory. When a request does come in, the system may release some memory that nobody is using to satisfy it, or it may push data that is still in use out to swap space and hand over the now-available memory. If that swapped-out data is ever needed again, bringing it back in will displace some other piece of disused memory. An OOM actually occurs when this process of replacing things is judged to have stopped making progress.
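One way to watch this replacement process while a workload runs is vmstat (available on essentially all Linux systems); the si and so columns show pages swapped in and out per second, and the free column shows how little memory the system keeps idle. This is just a general observation technique, not an OOM-specific tool:
vmstat 5    # print memory, swap-in (si) and swap-out (so) statistics every 5 seconds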
If things get tight, whole processes are killed on the theory that that will free up gobs of memory. This is not a completely desirable solution, but it does (in theory) allow the system to keep running. In practice, however, people usually object to any of their processes being involuntarily terminated, and this is usually the point at which the problem comes to us.
What causes these OOM events?
The kernel is really out of memory. The workload used more memory than the system has RAM and swap space.
What are SwapFree and MemFree in /proc/meminfo? If both are very low (less than 1% of SwapTotal and MemTotal, respectively), then the workload may be at fault (unless mlock() or HugeTLB pages are involved; see below).
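For example, assuming the standard /proc/meminfo field names, a quick check is:
grep -E 'MemTotal|MemFree|SwapTotal|SwapFree' /proc/meminfo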
The kernel is out of low memory on 32-bit architectures.
What is LowFree in /proc/meminfo? If it is very low, but HighFree is much higher, then you have this condition. This workload may benefit from being run on a 64-bit platform or kernel.
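Note that the LowFree and HighFree fields only appear on kernels built with highmem support; a quick check might look like this (uname -m is only there to confirm the architecture):
uname -m                                                     # confirm whether a 32-bit kernel is running
grep -E 'LowTotal|LowFree|HighTotal|HighFree' /proc/meminfo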
There is a kernel data structure or memory leak.
What are SwapFree and MemFree in /proc/meminfo? If memory is exhausted even though no user process accounts for the usage, a kernel leak becomes a more likely suspect.
- What is the number of task_struct objects in slabinfo? Was the system forking so many processes that it ran out of memory? (See the example after this list.)
- What objects in /proc/slabinfo take up the most space? If one kind of object is taking up a vast portion of the system's total memory, that object may be responsible. Check with the subsystem experts for the area from which that object comes. To see the object usage, run this on the command-line:
awk 'NR>2 {printf "%5d MB %s\n", $3*$4/(1024*1024), $1}' < /proc/slabinfo | sort -n
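To answer the task_struct question above: the slab cache for process descriptors is normally named task_struct, so its object counts can be read straight out of /proc/slabinfo (reading it may require root on some systems):
grep task_struct /proc/slabinfo    # second column is active objects, third is total allocated objects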
The kernel is not using its swap space properly.
If an application uses mlock() or HugeTLBfs pages, the kernel cannot swap that memory out, so swap space does not help for that application. When this happens, SwapFree may still show a very large value at the time of the OOM. Because these two features prevent the affected memory from being swapped out, overusing them may exhaust system memory and leave the system with no other recourse.
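Most kernels export counters showing how much memory is pinned this way; if the fields below are present in your /proc/meminfo (Mlocked appears on reasonably recent kernels, the HugePages_* fields when HugeTLB is configured), they give a quick indication:
grep -E 'Mlocked|HugePages_Total|HugePages_Free|Hugepagesize' /proc/meminfo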
- It is also possible for the system to find itself in a sort of deadlock. Writing data out to disk may itself require allocating memory for various I/O data structures. If the system cannot find even that memory, the very functions used to create free memory will be hamstrung and the system will likely run out of memory. It is possible to do some minor tuning to start paging earlier, but if the system cannot write dirty pages out fast enough to free memory, one can only conclude that the workload is mis-sized for the installed memory and there is little to be done. Raising the value in /proc/sys/vm/min_free_kbytes causes the system to start reclaiming memory earlier than it otherwise would, which makes it harder to get into these kinds of deadlocks. If you run into a case where tuning this value helps, please report it; we may need to change the default.
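The current setting can be inspected and raised directly through /proc; the number below is purely illustrative, and the right value depends on how much memory the machine has:
cat /proc/sys/vm/min_free_kbytes             # show the current value
echo 65536 > /proc/sys/vm/min_free_kbytes    # as root; 65536 (64MB) is only an example value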
The kernel has made a bad decision and mis-read its statistics. It went OOM while it still had plenty of good RAM to use.
Something really pathological is happening. The kernel actually decides to go OOM after it has spent a "significant" amount of time scanning memory for something to free. As of 2.6.19, this "significant amount" is reached after the VM has scanned an amount equal to all of the (currently) active+inactive pages in a zone six times.
- If the kernel is rapidly scanning pages, but your I/O devices (swap, filesystem, or network fs) are too slow, the kernel may judge that there is no progress being made and trigger an OOM even if there is swap free.
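One way to see whether reclaim is scanning without making progress is to watch the page-scan and page-steal counters in /proc/vmstat during the test (the exact counter names vary between kernel versions, but they share these prefixes); scanned counts climbing much faster than stolen (reclaimed) counts is the pattern described above:
grep -E 'pgscan|pgsteal' /proc/vmstat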
Run this script during your test and during the OOM itself, then send the output to a VM expert and have them parse it. Then come back and update this page.
How to submit a good OOM bug report
- Describe the workload.
- Describe the system's state before and after the OOM event. Did it hurt anything?
- Include as complete a dmesg as possible, but still describe the problem concisely:
- Example: post a URL or attach the full dmesg, but only include the first OOM in your email or bugzilla report
- Try to differentiate between true OOM-killer situations and "page allocation failure"
- "page allocation failure" messages, especially in interrupts or network drivers are OK, non-fatal, and somewhat expected
- True OOM-killer messages look more like this: "Out of Memory: Killed process 18254 (ntop)." and are potentially much more serious. (See the grep example after this list.)
- Look for the "order:" of the allocation failure. The lower the order, the more serious the problem. Make sure to point the order out in your bug report.
- If possible, decode the flags: field. Look for GFP_ATOMIC being set, for instance.
- Keep an eye out for "Free swap:" and "Free pages:" in dmesg. If there _is_ free memory during an OOM, make sure to note that in your bug report.
- Does any zone report all_unreclaimable? How many pages_scanned does it have vs. the total amount of memory in the zone?
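When putting the report together, a grep along these lines can pull the most relevant lines out of the log; the exact message wording varies between kernel versions, so treat the patterns as examples rather than a definitive list:
dmesg | grep -i -E 'out of memory|killed process|oom-killer|page allocation failure|order:|all_unreclaimable'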