Respite from the OOM killer
The OOM killer exists because the Linux kernel, by default, can commit to supplying more memory than it can actually provide. Overcommitting memory in this way allows the kernel to make fuller use of the system's resources, because processes typically do not use all of the memory they claim. As an example, consider the fork() system call, which nominally copies all of a process's memory for the new child process. In fact, all it does is mark that memory as "copy on write" and allow parent and child to share it. Should either one change a page shared in this way, a true copy is made. In theory, the kernel could be called upon to copy all of the copy-on-write memory in this way; in practice, that does not happen. If the kernel reserved all of the necessary virtual memory (which includes swap space) up front, some of that space would certainly go unused. Rather than waste that space - and refuse to run programs or satisfy memory allocations that, in practice, it could have handled - the kernel overcommits itself and hopes for the best.
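To make the copy-on-write behavior concrete, here is a minimal C sketch (the 64MB buffer size is an arbitrary illustration): fork() returns without duplicating the buffer, and a write in the child copies only the page being touched, so the parent's data is unaffected.

```c
/* Minimal sketch of fork() and copy-on-write.  No 64MB copy is made at
 * fork() time; only the single page the child writes to gets duplicated. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    size_t size = 64 * 1024 * 1024;     /* 64MB of "copied" memory */
    char *buf = malloc(size);
    if (buf == NULL)
        return 1;
    memset(buf, 'p', size);             /* touch it so the pages exist */

    pid_t pid = fork();                 /* marks the pages copy-on-write */
    if (pid < 0) {
        perror("fork");                 /* under strict accounting this
                                           can fail with ENOMEM */
        return 1;
    }
    if (pid == 0) {
        buf[0] = 'c';                   /* first write: the kernel copies
                                           just this one page for the child */
        _exit(0);
    }
    wait(NULL);
    printf("parent still sees '%c'\n", buf[0]);   /* prints 'p' */
    return 0;
}
```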
When the best does not happen, the OOM killer comes into play; its job is to kill processes and free up some memory. Getting it to kill the right processes has been an ongoing challenge, however. One person's useless memory hog is another's crucial application. Thus, over the years, numerous efforts have been made to refine the OOM killer's heuristics, and patches like "oom_pardon" have been created.
Not everybody agrees that this is a fruitful use of developer time. Andries Brouwer came up with this analogy:
Overcommitting memory and fearing the OOM killer are not necessary parts of the Linux experience, however. Simply setting the sysctl parameter vm/overcommit_memory to 2 turns off the overcommit behavior and keeps the OOM killer forever at bay. Most modern systems should have enough disk space to provide an ample swap file for most situations. Rather than trying to keep pet processes from being killed when overcommitted memory runs out, it might be easier just to avoid the situation altogether.
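As a rough illustration of that suggestion (a sketch only: it assumes a 64-bit system, needs root to write the sysctl, and has the same effect as running sysctl vm.overcommit_memory=2), with strict accounting enabled an absurdly large allocation is refused immediately rather than being granted and inviting the OOM killer later.

```c
/* Sketch: switch to strict accounting (mode 2) and try an allocation far
 * larger than RAM + swap can back.  With overcommit off, malloc() fails
 * up front instead of succeeding and risking an OOM kill later. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "w");
    if (f != NULL) {                    /* needs root; equivalent to
                                           "sysctl vm.overcommit_memory=2" */
        fputs("2\n", f);
        fclose(f);
    }

    void *p = malloc((size_t)1 << 40);  /* ask for 1TB */
    if (p == NULL)
        printf("allocation refused up front; no OOM killer involved\n");
    else
        printf("allocation granted anyway (overcommit, or a very large machine)\n");
    free(p);
    return 0;
}
```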
Index entries for this article
Kernel | Memory management/Out-of-memory handling
Kernel | OOM killer
Posted Sep 30, 2004 2:35 UTC (Thu)
by dank (guest, #1865)
[Link]
'Course, you can't do that on a regular PC, where the user might get a bit ticked off :-)
Posted Sep 30, 2004 9:20 UTC (Thu)
by rjw (guest, #10415)
[Link] (3 responses)
Especially when it is a Hans Reiser classic, I like to see what caused it ;-)
Posted Sep 30, 2004 13:51 UTC (Thu)
by corbet (editor, #1)
[Link] (2 responses)
Posted Sep 30, 2004 14:39 UTC (Thu)
by southey (guest, #9466)
[Link]
Posted Sep 30, 2004 11:18 UTC (Thu)
by copsewood (subscriber, #199)
[Link] (15 responses)
Posted Sep 30, 2004 12:54 UTC (Thu)
by nix (subscriber, #2304)
[Link] (1 responses)
Is this entirely wise?
Posted Sep 30, 2004 13:39 UTC (Thu)
by fergal (guest, #602)
[Link]
Posted Sep 30, 2004 15:32 UTC (Thu)
by hppnq (guest, #14462)
[Link] (8 responses)
And that they should forbid overcommitting memory. ;-)
Posted Oct 2, 2004 20:35 UTC (Sat)
by giraffedata (guest, #1954)
[Link] (7 responses)
Does forbidding overcommitting memory make a more reliably performing system? When you forbid overcommitting memory, all you do is make a different process fail at a different time. A process that's minding its own business, using a small amount of memory and doing something very important fails when its fork() gets "out of memory." And this happens even though there's only a 1% chance that letting the fork() go through would lead to trouble. And it happens to dozens of applications while one broken application sucks up all the virtual memory resources.
But in the overcommitting case, the program would work fine and 1% of the time some other process which is likely to be doing something unimportant and/or likely to be the cause of the memory shortage dies.
I think you could say the overcommitting system is performing more reliably.
Posted Oct 4, 2004 17:22 UTC (Mon)
by jzbiciak (guest, #5246)
[Link] (2 responses)
But, just like my car, which currently idles rough, has an exhaust leak, and the "service engine" light's on, it still gets me to and from work.
The difference is in the failure mode. Do you degrade gracefully, or do you start to blow up at the first sign of error? If you're a user, you probably want graceful degradation--you can tolerate some excessive swapping to a point, and if it gets too bad, you reboot. At least OOo didn't implode taking your document with it. If you're a developer, you probably want to know ASAP something's wrong so you can fix it.
Thankfully, my car still runs (albeit not entirely happily), rather than flashing "service engine" and shutting down.
Posted Oct 5, 2004 3:50 UTC (Tue)
by mbp (subscriber, #2737)
[Link]
Even car designers have this problem: some modern cars will refuse to start if the engine is getting into a state where there is a chance of permanent damage. If it's approaching zero oil pressure, I think I would rather have an electronic cutout than an engine seizure.
Posted Oct 6, 2004 15:51 UTC (Wed)
by giraffedata (guest, #1954)
[Link]
True, but that's not an option with either of the cases being discussed -- overcommit or no overcommit. This choice comes into play only when there's no place left to swap to.
The no-overcommit case can cause OOo to fail more gracefully. If OOo is written to tolerate a failed fork (et al) and give you the chance to kill some other process and then save your document, then no-overcommit could be a good thing for OOo.
On the other hand, if you don't have the technical skills to find the right process to kill, you're going to have to reboot anyway and lose your document. By contrast, with overcommit, you probably wouldn't have lost the document. Either there never would have been a crisis in the first place, or the OOM killer would have killed some other process and let OOo continue normally.
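As a sketch of what "tolerate a failed fork" could look like (spawn_helper() and its messages are hypothetical, not taken from any real application): the process checks fork()'s return value, reports the ENOMEM it gets under strict accounting, and keeps running so the document can still be saved.

```c
/* Sketch of an application that survives a failed fork().  With overcommit
 * turned off, the failure shows up here as ENOMEM rather than as an OOM kill. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int spawn_helper(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        fprintf(stderr, "could not start helper: %s; "
                        "free some memory and try again\n", strerror(errno));
        return -1;                      /* degrade gracefully, don't abort */
    }
    if (pid == 0) {
        execlp("true", "true", (char *)NULL);   /* placeholder child work */
        _exit(127);
    }
    return 0;
}

int main(void)
{
    if (spawn_helper() != 0)
        fprintf(stderr, "continuing without the helper; "
                        "the document can still be saved\n");
    return 0;
}
```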
Posted Oct 8, 2004 20:05 UTC (Fri)
by tmcguire (guest, #25295)
[Link] (3 responses)
Of course, the process that it seemed to kill first was always inetd, which made the system completely useless and didn't take up many resources anyway. So AIX had to go on and kill other stuff, too.
And naturally system calls like sbrk would never fail, so no application had any opportunity to gracefully handle any problems. But then, no developer ever had the interest or the incentive to actually handle errors, so the situation was nicely symmetrical.
One of the programs best known for allocating memory and then not using it (in large amounts) was the X server, so turning off overallocation wasn't really an option. Fixing *that* bug probably wasn't an option either.
It's always nice to see modern systems learning from their elders, so to speak. But if Linux is going to repeat previous mistakes, it really should go all the way. It's much more fun. Or has someone introduced SIGDANGER*, and I've just missed the memo?
* The SIGDANGER signal was sent to all processes just before the OOM killer started to work. Theoretically, a process handling SIGDANGER could reduce its memory allocation. If it had time. And, if the programmer wanted to. The inetd maintainers apparently didn't. Or, a process could have a handler for SIGDANGER and then just ignore it---the OOM killer would skip any process that handled SIGDANGER.
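For reference, a sketch of what installing an AIX-style SIGDANGER handler looks like (SIGDANGER exists only on AIX, so the code is guarded; the handler body is purely illustrative):

```c
/* Sketch of an AIX-style SIGDANGER handler, as described in the footnote.
 * SIGDANGER is not defined on Linux, so the code compiles away there. */
#include <signal.h>
#include <string.h>
#include <unistd.h>

#ifdef SIGDANGER
static volatile sig_atomic_t low_memory_warning;

/* Called when paging space is running low: set a flag the main loop can
 * check to free caches or otherwise shrink its footprint. */
static void danger_handler(int sig)
{
    (void)sig;
    low_memory_warning = 1;
}
#endif

int main(void)
{
#ifdef SIGDANGER
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = danger_handler;
    /* As the footnote notes, merely handling the signal also exempted
     * the process from the AIX low-memory killer. */
    sigaction(SIGDANGER, &sa, NULL);
#endif
    pause();            /* placeholder for the program's real work */
    return 0;
}
```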
Posted Apr 27, 2021 18:31 UTC (Tue)
by hendrikboom3 (guest, #151927)
[Link] (2 responses)
If I were to get such a polite message, I'd open an xterm and kill firefox-esr (the most common culprit), which will usually restore its windows and tabs gracefully when I restart it after the crisis is over.
-- hendrik
Posted Apr 27, 2021 23:47 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Posted Apr 28, 2021 21:24 UTC (Wed)
by MrWim (subscriber, #47432)
[Link]
Posted Sep 30, 2004 18:17 UTC (Thu)
by mongre26 (guest, #4224)
[Link] (3 responses)
First you would have to move from a disk partition swap to a disk file swap. That hurts you because you are reading and writing not from a custom swap file system but from a file simulating a swap file system placed on a regular file system. That means your file system's ability to handle swap-like files will impact swap performance.
Also, when a swap file changes size it has to re-organize itself somewhat. This can hurt performance while the swap is resizing. If the swap resizes too much you waste a lot of disk activity unnecessarily.
Microsoft does dynamic swap allocation, and I think it is one of the weaknesses of the OS. For performance reasons I lock my swap file to a specific min and max so it does not do this on my gaming system.
In all cases I simply allocate sufficient swap to handle all my RAM and then some, often 4-8GB of swap. Since disks are massive these days committing 2-5% of the main disk for swap is not a problem.
Just allocate enough swap at install time to accommodate your expectations. If you really find you need more swap, add another disk and make another swap partition. Linux can use multiple swap partitions. You may even get a performance boost by load balancing swap across spindles.
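For what it's worth, a sketch of the multiple-swap-areas idea using the swapon(2) call directly (the device names are examples, the areas must already have been prepared with mkswap, and root is required); giving both areas the same priority makes the kernel stripe swap I/O across the spindles:

```c
/* Sketch: enable two swap partitions at equal priority so the kernel
 * round-robins swap I/O between them. */
#include <stdio.h>
#include <sys/swap.h>

int main(void)
{
    /* Priority 1 on both areas -> equal-priority striping. */
    int flags = SWAP_FLAG_PREFER | (1 << SWAP_FLAG_PRIO_SHIFT);

    if (swapon("/dev/sdb1", flags) != 0)    /* example device names */
        perror("swapon /dev/sdb1");
    if (swapon("/dev/sdc1", flags) != 0)
        perror("swapon /dev/sdc1");
    return 0;
}
```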
Posted Oct 1, 2004 15:48 UTC (Fri)
by Luyseyal (guest, #15693)
[Link] (1 responses)
Actually, this is not true anymore. On recent kernels (2.4+), swap file performance is exactly the same as swap partition performance because the kernel bypasses the filesystem layer (I believe it relies on the magic of 'dd' creating files without holes). The only annoying thing with swap files is that if you have a large one, it takes a while to initialize on boot.
Also, when a swap file changes size it has to re-organize itself somewhat. This can hurt performance while the swap is resizing. If the swap resizes too much you waste a lot of disk activity unnecessarily.
This is true. If you're going to allow overcommit, disallow OOM, and allow the kernel to create new swap files on the fly, it would probably be best done in its own subdirectory using a number of smaller swap files as adding a new swap is probably cheaper than resizing an existing one.
As an aside, I recently converted a swap partition to a swap file on a dedicated ext2 partition so I could use e2label to make swap still work in case of SCSI name reordering. Since performance is identical -- except the longer boot time -- it was worth it.
Cheers!
Posted Oct 7, 2004 23:29 UTC (Thu)
by nobrowser (guest, #21196)
[Link]
It already exists. Look for swapd in the ibiblio archive.
Posted Oct 2, 2004 20:21 UTC (Sat)
by giraffedata (guest, #1954)
[Link]
In all cases I simply allocate sufficient swap to handle all my RAM and then some,
If by RAM you mean physical memory, as most people do, swapping doesn't handle RAM -- it handles virtual memory. So the correct calculation isn't amount of RAM plus something, it's (roughly) amount of virtual memory minus amount of RAM. And there's no practical way to put a limit on the amount of virtual memory. The best you can do normally is watch your swap space and when you see it approaching full, add more.
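As a worked example with assumed numbers: a machine with 2GB of RAM whose processes' committed virtual memory peaks at 5GB needs roughly 5GB - 2GB = 3GB of swap; sizing swap as "RAM plus some" would only coincidentally land near that figure.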
Just allocate enough swap at install time to accommodate your expectations.
I agree that is the policy for which Linux is designed. The OOM killer is specifically to handle the pathological case that your expectations are exceeded. That's always a possibility. Consider program bugs.
The question of what to do when, despite the sysadmin's best intentions, the system runs out of virtual memory resources, is a difficult one; the various approaches (kill the system, kill some memory-using process, fail any attempt to get more virtual memory, make new swap space, etc.) all have their advantages and disadvantages.
Posted Sep 30, 2004 15:16 UTC (Thu)
by aashenfe (guest, #12212)
[Link] (1 responses)
If there is, is there a tunable parameter?
Posted Oct 1, 2004 0:44 UTC (Fri)
by ilmari (guest, #14175)
[Link]
Posted Oct 5, 2004 4:12 UTC (Tue)
by mbp (subscriber, #2737)
[Link] (1 responses)
If I remember correctly, it can be theoretically demonstrated that it is impossible to avoid exhaustion unless all resource requirements are known in advance. That is to say, the kernel always needs to handle the possibility of being unable to complete all the operations that are underway. There are various responses: panicking, killing various tasks, failing operations, etc., but there is no perfect solution.
If you want to avoid exhaustion, then turn off overcommit and design critical applications to avoid resource exhaustion: they must allocate or reserve everything up front, and handle failure gracefully. It may be hard to design general-purpose machines to do that, but it might be done in an embedded machine.
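A sketch of what "allocate or reserve everything up front" might look like for such a critical process (the 32MB worst-case figure is an assumed example): the worst-case working set is allocated and touched at startup, while failure is still easy to handle cleanly, and mlockall() keeps the pages resident afterward.

```c
/* Sketch of up-front reservation for a critical process, assuming
 * overcommit has been turned off as discussed above. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define WORST_CASE_BYTES (32UL * 1024 * 1024)   /* assumed worst case: 32MB */

int main(void)
{
    /* Grab the worst-case working set now, while failure is easy to
     * handle, and touch it so every page is really backed. */
    void *pool = malloc(WORST_CASE_BYTES);
    if (pool == NULL) {
        fprintf(stderr, "cannot reserve memory at startup, exiting cleanly\n");
        return 1;
    }
    memset(pool, 0, WORST_CASE_BYTES);

    /* Pin current and future pages so later memory pressure cannot
     * page this process's memory out either. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        perror("mlockall");             /* needs CAP_IPC_LOCK / root */

    /* ... run the critical work entirely out of `pool` ... */
    free(pool);
    return 0;
}
```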
Posted Oct 7, 2004 8:10 UTC (Thu)
by oever (guest, #987)
[Link]
Posted Jan 30, 2014 6:15 UTC (Thu)
by CameronNemo (guest, #94700)
[Link] (3 responses)
It would be cool to have that in systemd.
Posted Jan 30, 2014 6:34 UTC (Thu)
by zdzichu (subscriber, #17118)
[Link] (2 responses)
Posted Jan 30, 2014 6:43 UTC (Thu)
by CameronNemo (guest, #94700)
[Link] (1 responses)
Posted Jan 30, 2014 8:11 UTC (Thu)
by cortana (subscriber, #24596)
[Link]
Embrace the OOM killer!
At my last job, it turned out after much experimentation that the best way to handle OOM conditions in our embedded system was to panic and halt the system, and alert the developers that their application software had misbehaved.
Could you also post a link to the message in an archive rather than just a bare mail? I have always found this to be an annoying feature of LWN - sometimes it is very useful to see the mail in context, and read the thread.
That is a pretty common request. If I had an automated way of making archive links, I would be glad to. I really don't have the time to go digging through archive site pages trying to fish particular messages out of the discussions, though. Anything which slows down the writing of the kernel page can only result in less content there... I continue to ponder on how this could be done, however.
Archive links
How about just adding a Google Groups search on the email subject and perhaps author or identifier? Usually it provides a link to the message in the thread, and visiting that message has a link to the full thread (30 messages).
I think in most situations it would make sense for the kernel to extend swap space by creating swap files on any filesystem with free space to which it has write access. OK, there will be marginal cases where there is overcommitted memory and insufficient free disk space, and the shrinking free disk space this approach causes leads to serious thrashing. However, system admins who want a more reliably-performing system have long known that they need to provide adequate memory and disk resources in any case. Why not have swap space dynamically extensible in this manner?
This means that any malicious user without sufficiently vicious ulimits can exhaust not just memory but disk space as well, even on disks he can't write to.
Right now, such a user would cause random processes to be killed, which is arguably worse. Well-written programs can gracefully handle a lack of disk space; they cannot gracefully handle being killed by the OOM killer.
However, system admins who want a more reliably-performing system have long known that they need to provide adequate memory and disk resources in any case.
And that they should forbid overcommitting memory. ;-)
...and both are broken.
OK, graceful degradation is nice. But it's hard to tell whether overcommit helps or hurts.
If you're a user, you probably want graceful degradation--you can tolerate some excessive swapping to a point, and if it gets too bad, you reboot. At least OOo didn't implode taking your document with it.
You know, back in the old days when I was working with AIX (3.2.5, if that means anything to you), it had the policy of overcommitting memory and then randomly killing processes when it discovered that it was out.
Extending swap dynamically has performance issues I would rather avoid.
> it would probably be best done in its own subdirectory using
> a number of smaller swap files as adding a new swap is probably
> cheaper than resizing an existing one.
The response was completely underwhelming when it was young (long ago),
and I don't see why that should have changed.
A limit on overcommit?
Is there a limit on the overcommit? For instance, does the kernel only commit a percentage of available memory (swap + RAM)? For example, 150% would allow allocated memory to grow to one and a half times the total available memory.
It is tunable, almost exactly as you suggest. Quoting Documentation/vm/overcommit-accounting:
[S]trict overcommit. The total address space commit
for the system is not permitted to exceed swap + a
configurable percentage (default is 50) of physical RAM.
Depending on the percentage you use, in most situations
this means a process will not be killed while accessing
pages but will receive errors on memory allocation as
appropriate.
The overcommit policy is set via the sysctl `vm.overcommit_memory'.
The overcommit percentage is set via `vm.overcommit_ratio'.
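As a worked example with assumed numbers: with 1GB of RAM, 2GB of swap, and the default vm.overcommit_ratio of 50, the commit limit in mode 2 is 2GB + 50% of 1GB = 2.5GB. The current limit appears as CommitLimit in /proc/meminfo, and Committed_AS shows how much of it has been claimed.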
Turning off overcommit will cause the process that tried to allocate the last page to fail. This was considered for the OOM killer earlier on, and rejected as an insufficient solution: it may happen that the unlucky process is the most important one on the system.
This seems like a robust approach. Unfortunately, it requires all vital programs to be adapted to allocate memory in advance. It would be nice to be able to allocate an amount of memory for vital processes that cannot be taken by other programs without needing to change the code.