
Respite from the OOM killer

Thomas Habets had an unfortunate experience recently. His Linux system ran out of memory, and the dreaded "OOM killer" was loosed upon the system's unsuspecting processes. One of its victims turned out to be his screen locking program, leaving his session open to whoever might happen to walk by. His response was the oom_pardon patch, which allows the system administrator to exempt certain processes from the OOM killer's revenge. It turns out that SUSE has a similar patch which allows administrators to set the "OOM score" of specific processes, increasing or decreasing their chances of being chosen for an untimely demise.
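
The idea behind both patches later took a standard form in mainline kernels as the per-process /proc/<pid>/oom_score_adj file. As a minimal sketch (this interface postdates the article, and lowering a score requires privilege), a process can exempt itself like so:

    #include <stdio.h>

    /* Sketch: lower this process's OOM score so that the OOM killer
     * passes it over, much as oom_pardon would. -1000 means "never
     * kill", 1000 means "kill first"; lowering needs privilege. */
    int main(void)
    {
        FILE *f = fopen("/proc/self/oom_score_adj", "w");
        if (f == NULL) {
            perror("oom_score_adj");
            return 1;
        }
        fputs("-1000\n", f);
        fclose(f);
        return 0;
    }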

The OOM killer exists because the Linux kernel, by default, can commit to supplying more memory than it can actually provide. Overcommitting memory in this way allows the kernel to make fuller use of the system's resources, because processes typically do not use all of the memory they claim. As an example, consider the fork() system call, which copies all of a process's memory for the new child process. In fact, all it does is mark the memory as "copy on write" and allow parent and child to share it. Should either change a page shared in this way, a true copy is made. In theory, the kernel could be called upon to copy all of the copy-on-write memory in this way; in practice, that does not happen. If the kernel reserved all of the necessary virtual memory (which includes swap space), some of that space would certainly go unused. Rather than waste that space - and refuse program launches or memory allocations that, in practice, it could have satisfied - the kernel overcommits itself and hopes for the best.
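
The effect is easy to demonstrate. As an illustrative sketch (assuming a 64-bit system with the default overcommit policy and less than 8GB of free memory plus swap), the allocation below succeeds immediately; only touching the pages obliges the kernel to find real memory, and that is where an overcommitted system can come up short:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t len = (size_t)8 << 30;   /* 8GB of address space (64-bit build) */
        char *p = malloc(len);          /* usually succeeds under overcommit */

        if (p == NULL) {
            perror("malloc");
            return 1;
        }
        /* Writing each page forces the kernel to back it with real
         * memory; on an overcommitted system, this is where the OOM
         * killer may be summoned. */
        memset(p, 1, len);
        puts("all pages touched");
        free(p);
        return 0;
    }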

When the best does not happen, the OOM killer comes into play; its job is to kill processes and free up some memory. Getting it to kill the right processes has been an ongoing challenge, however. One person's useless memory hog is another's crucial application. Thus, over the years, numerous efforts have been made to refine the OOM killer's heuristics, and patches like "oom_pardon" have been created.

Not everybody agrees that this is a fruitful use of developer time. Andries Brouwer came up with this analogy:

An aircraft company discovered that it was cheaper to fly its planes with less fuel on board. The planes would be lighter and use less fuel and money was saved. On rare occasions however the amount of fuel was insufficient, and the plane would crash. This problem was solved by the engineers of the company by the development of a special OOF (out-of-fuel) mechanism. In emergency cases a passenger was selected and thrown out of the plane. (When necessary, the procedure was repeated.) A large body of theory was developed and many publications were devoted to the problem of properly selecting the victim to be ejected. Should the victim be chosen at random? Or should one choose the heaviest person? Or the oldest? Should passengers pay in order not to be ejected, so that the victim would be the poorest on board? And if for example the heaviest person was chosen, should there be a special exception in case that was the pilot? Should first class passengers be exempted? Now that the OOF mechanism existed, it would be activated every now and then, and eject passengers even when there was no fuel shortage. The engineers are still studying precisely how this malfunction is caused.

Overcommitting memory and fearing the OOM killer are not necessary parts of the Linux experience, however. Simply setting the sysctl parameter vm/overcommit_memory to 2 turns off the overcommit behavior and keeps the OOM killer forever at bay. Most modern systems should have enough disk space to provide an ample swap file for most situations. Rather than trying to keep pet processes from being killed when overcommitted memory runs out, it might be easier just to avoid the situation altogether.
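
The strict mode is also easy to monitor: /proc/meminfo exposes the kernel's accounting. As a small sketch (CommitLimit and Committed_AS are standard /proc/meminfo fields; in mode 2, Committed_AS may not exceed CommitLimit), a program can print how much room remains:

    #include <stdio.h>
    #include <string.h>

    /* Sketch: with vm.overcommit_memory=2, the kernel refuses any
     * allocation that would push Committed_AS past CommitLimit
     * (swap plus a configurable percentage of RAM). */
    int main(void)
    {
        char line[128];
        FILE *f = fopen("/proc/meminfo", "r");

        if (f == NULL) {
            perror("/proc/meminfo");
            return 1;
        }
        while (fgets(line, sizeof(line), f)) {
            if (strncmp(line, "CommitLimit:", 12) == 0 ||
                strncmp(line, "Committed_AS:", 13) == 0)
                fputs(line, stdout);
        }
        fclose(f);
        return 0;
    }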

Index entries for this article
Kernel: Memory management/Out-of-memory handling
Kernel: OOM killer



Embrace the OOM killer!

Posted Sep 30, 2004 2:35 UTC (Thu) by dank (guest, #1865) [Link]

At my last job, it turned out after much experimentation that the best way to handle OOM conditions in our embedded system was to panic and halt the system, and alert the developers that their application software had misbehaved.

'Course, you can't do that on a regular PC, where the user might get a bit ticked off :-)

Respite from the OOM killer

Posted Sep 30, 2004 9:20 UTC (Thu) by rjw (guest, #10415) [Link] (3 responses)

Could you also post a link to the message in an archive rather than just a bare mail? I have always found this to be an annoying feature of LWN - sometimes it is very useful to see the mail in context, and read the thread.

Especially when it is a Hans Reiser classic, I like to see what caused it ;-)

Archive links

Posted Sep 30, 2004 13:51 UTC (Thu) by corbet (editor, #1) [Link] (2 responses)

That is a pretty common request. If I had an automated way of making archive links, I would be glad to. I really don't have the time to go digging through archive site pages trying to fish particular messages out of the discussions, though. Anything which slows down the writing of the kernel page can only result in less content there... I continue to ponder on how this could be done, however.

Archive links

Posted Sep 30, 2004 14:39 UTC (Thu) by southey (guest, #9466) [Link]

How about just adding a Google Groups search on the email subject and perhaps author or identifier?

Usually it provides a link to the message in the thread, and visiting that message has a link to the full thread (30 messages).

Archive links

Posted Sep 30, 2004 16:21 UTC (Thu) by cgray4 (guest, #11599) [Link]

I suppose if you used gmane to read your mailing lists, then you could link to the article in the gmane web interface pretty easily.

Respite from the OOM killer

Posted Sep 30, 2004 11:18 UTC (Thu) by copsewood (subscriber, #199) [Link] (15 responses)

I think in most situations it would make sense for the kernel to extend swap space by creating swap files on any filesystem with free space to which it has write access. OK, there will be marginal cases where there is overcommitted memory and insufficient free disk space, leading to serious thrashing as this approach eats the remaining free disk space. However, system admins who want a more reliably-performing system have long known that they need to provide adequate memory and disk resources in any case. Why not have swap space dynamically extensible in this manner?

Respite from the OOM killer

Posted Sep 30, 2004 12:54 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

This means that any malicious user without sufficiently vicious ulimits can exhaust not just memory but disk space as well, even on disks he can't write to.

Is this entirely wise?

Respite from the OOM killer

Posted Sep 30, 2004 13:39 UTC (Thu) by fergal (guest, #602) [Link]

Right now, such a user would cause random processes to be killed, which is arguably worse. Well-written programs can gracefully handle a lack of disk space; they cannot gracefully handle being killed by the OOM killer.
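
To make the contrast concrete, here is a minimal sketch (the file name is hypothetical): a full disk shows up as an error return the program can act on, while the OOM killer shows up as an unblockable SIGKILL:

    #include <stdio.h>

    /* Sketch: running out of disk space is a reportable error;
     * being chosen by the OOM killer is an unblockable SIGKILL. */
    int main(void)
    {
        FILE *f = fopen("/tmp/output.dat", "w");   /* hypothetical path */

        if (f == NULL) {
            perror("fopen");
            return 1;
        }
        if (fputs("important data\n", f) == EOF || fclose(f) == EOF) {
            perror("write failed (disk full?)");   /* e.g. ENOSPC */
            return 1;
        }
        return 0;
    }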

Respite from the OOM killer

Posted Sep 30, 2004 15:32 UTC (Thu) by hppnq (guest, #14462) [Link] (8 responses)

> However, system admins who want a more reliably-performing system have long known that they need to provide adequate memory and disk resources in any case.

And that they should forbid overcommitting memory. ;-)

Respite from the OOM killer

Posted Oct 2, 2004 20:35 UTC (Sat) by giraffedata (guest, #1954) [Link] (7 responses)

> However, system admins who want a more reliably-performing system have long known that they need to provide adequate memory and disk resources in any case.
> And that they should forbid overcommitting memory. ;-)

Does forbidding overcommitting memory make a more reliably performing system? When you forbid overcommitting memory, all you do is make a different process fail at a different time. A process that's minding its own business, using a small amount of memory and doing something very important fails when its fork() gets "out of memory." And this happens even though there's only a 1% chance that letting the fork() go through would lead to trouble. And it happens to dozens of applications while one broken application sucks up all the virtual memory resources.

But in the overcommitting case, the program would work fine, and 1% of the time some other process, which is likely to be doing something unimportant and/or to be the cause of the memory shortage, dies.

I think you could say the overcommitting system is performing more reliably.
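
The failure mode being described is at least a visible one. As a sketch (hypothetical worker code, not from any particular application), under strict accounting a large process sees fork() fail with ENOMEM and can respond, rather than being killed:

    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Sketch: with vm.overcommit_memory=2, a large process's fork()
     * can fail with ENOMEM even though the child would have touched
     * almost none of the copy-on-write pages. */
    int main(void)
    {
        pid_t pid = fork();

        if (pid < 0) {
            if (errno == ENOMEM)
                fputs("fork: commit limit reached\n", stderr);
            else
                perror("fork");
            return 1;       /* the caller can retry or shed load */
        }
        if (pid == 0)
            _exit(0);       /* child: would exec a helper here */
        return 0;
    }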

Respite from the OOM killer

Posted Oct 4, 2004 17:22 UTC (Mon) by jzbiciak (guest, #5246) [Link] (2 responses)

...and both are broken.

But, just like my car, which currently idles rough, has an exhaust leak, and the "service engine" light's on, it still gets me to and from work.

The difference is in the failure mode. Do you degrade gracefully, or do you start to blow up at the first sign of error? If you're a user, you probably want graceful degradation--you can tolerate some excessive swapping to a point, and if it gets too bad, you reboot. At least OOo didn't implode taking your document with it. If you're a developer, you probably want to know ASAP something's wrong so you can fix it.

Thankfully, my car still runs (albeit not entirely happily), rather than flashing "service engine" and shutting down.

Respite from the OOM killer

Posted Oct 5, 2004 3:50 UTC (Tue) by mbp (subscriber, #2737) [Link]

OK, graceful degradation is nice. But it's hard to tell whether overcommit helps or hurts.

Even car designers have this problem: some modern cars will refuse to start if the engine is getting into a state where there is a chance of permanent damage. If it's approaching zero oil pressure, I think I would rather have an electronic cutout than an engine seizure.

Respite from the OOM killer

Posted Oct 6, 2004 15:51 UTC (Wed) by giraffedata (guest, #1954) [Link]

> If you're a user, you probably want graceful degradation--you can tolerate some excessive swapping to a point, and if it gets too bad, you reboot. At least OOo didn't implode taking your document with it.

True, but that's not an option with either of the cases being discussed -- overcommit or no overcommit. This choice comes into play only when there's no place left to swap to.

The no-overcommit case can cause OOo to fail more gracefully. If OOo is written to tolerate a failed fork (et al) and give you the chance to kill some other process and then save your document, then no-overcommit could be a good thing for OOo.

On the other hand, if you don't have the technical skills to find the right process to kill, you're going to have to reboot anyway and lose your document. By contrast, with overcommit, you probably wouldn't have lost the document. Either there never would have been a crisis in the first place, or the OOM killer would have killed some other process and let OOo continue normally.

Respite from the OOM killer

Posted Oct 8, 2004 20:05 UTC (Fri) by tmcguire (guest, #25295) [Link] (3 responses)

You know, back in the old days when I was working with AIX (3.2.5, if that means anything to you), it had the policy of overcommitting memory and then randomly killing processes when it discovered that it was out.

Of course, the process that it seemed to kill first was always inetd, which made the system completely useless and didn't take up many resources anyway. So AIX had to go on and kill other stuff, too.

And naturally system calls like sbrk would never fail, so no application had any opportunity to gracefully handle any problems. But then, no developer ever had the interest or the incentive to actually handle errors, so the situation was nicely symmetrical.

One of the programs best known for allocating memory and then not using it (in large amounts) was the X server, so turning off overallocation wasn't really an option. Fixing *that* bug probably wasn't an option either.

It's always nice to see modern systems learning from their elders, so to speak. But if Linux is going to repeat previous mistakes, it really should go all the way. It's much more fun. Or has someone introduced SIGDANGER*, and I've just missed the memo?

* The SIGDANGER signal was sent to all processes just before the OOM killer started to work. Theoretically, a process handling SIGDANGER could reduce its memory allocation. If it had time. And, if the programmer wanted to. The inetd maintainers apparently didn't. Or, a process could have a handler for SIGDANGER and then just ignore it---the OOM killer would skip any process that handled SIGDANGER.
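
For those who never met it, a sketch of what a SIGDANGER handler looked like (guarded so it also compiles on systems that lack the signal; AIX defines it in <signal.h>):

    #include <signal.h>
    #include <stdio.h>

    /* Sketch of an AIX-style SIGDANGER handler: on low memory the
     * process could free discretionary allocations here; merely
     * having a handler installed was enough to be skipped by
     * AIX's OOM killer. */
    #ifdef SIGDANGER
    static void on_danger(int sig)
    {
        (void)sig;
        /* release caches or other discretionary memory here */
    }
    #endif

    int main(void)
    {
    #ifdef SIGDANGER
        signal(SIGDANGER, on_danger);
    #else
        fputs("no SIGDANGER on this system\n", stderr);
    #endif
        /* ... application work ... */
        return 0;
    }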

Respite from the OOM killer

Posted Apr 27, 2021 18:31 UTC (Tue) by hendrikboom3 (guest, #151927) [Link] (2 responses)

For some users, the SIGDANGER signals could be useful to launch a process informing them that things are filling up and asking them if there are any processes they would like to terminate.

If I were to get such a polite message, I'd open an xterm and kill firefox-esr (the most common culprit), which will usually restore its windows and tabs gracefully when I restart it after the crisis is over.

-- hendrik

Respite from the OOM killer

Posted Apr 27, 2021 23:47 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (1 responses)

Sometimes I forget to contain my builds (yay, C++ templates) in a systemd unit with lower memory limits, and I'd like such a notice so I could go and kill the build rather than have the OOM killer take out my entire tmux session (since it's in the same slice/process group).

Respite from the OOM killer

Posted Apr 28, 2021 21:24 UTC (Wed) by MrWim (subscriber, #47432) [Link]

I have a bash alias for make and ninja to run them via systemd-run. It also sends a desktop notification when they complete. It works well.

Respite from the OOM killer

Posted Sep 30, 2004 18:17 UTC (Thu) by mongre26 (guest, #4224) [Link] (3 responses)

Extending swap dynamically has performance issues I would rather avoid.

First you would have to move from a disk-partition swap to a disk-file swap. That hurts you because you are reading and writing not to a dedicated swap partition but to a file that simulates one on top of a regular file system. That means your file system's ability to handle swap-like files will impact swap performance.

Also, when a swap file changes size it has to reorganize itself somewhat. This can hurt performance while the swap is resizing. If the swap resizes too much, you waste a lot of disk activity unnecessarily.

Microsoft does dynamic swap allocation, and I think it is one of the weaknesses of the OS. For performance reasons I lock my swap file to a specific min and max so it does not do this on my gaming system.

In all cases I simply allocate sufficient swap to handle all my RAM and then some, often 4-8GB of swap. Since disks are massive these days committing 2-5% of the main disk for swap is not a problem.

Just allocate enough swap at install time to accommodate your expectations. If you really find you need more swap, add another disk and make another swap partition. Linux can use multiple swap partitions. You may even get a performance boost by load-balancing swap across spindles.

Respite from the OOM killer

Posted Oct 1, 2004 15:48 UTC (Fri) by Luyseyal (guest, #15693) [Link] (1 responses)

> First you would have to move from a disk-partition swap to a disk-file swap. That hurts you because you are reading and writing not to a dedicated swap partition but to a file that simulates one on top of a regular file system. That means your file system's ability to handle swap-like files will impact swap performance.

Actually, this is not true anymore. On recent kernels (2.4+), swap file performance is exactly the same as swap partition performance, because the kernel bypasses the filesystem layer (I believe it relies on the magic of 'dd' creating files without holes). The only annoying thing with swap files is that if you have a large one, it takes a while to initialize on boot.

> Also, when a swap file changes size it has to reorganize itself somewhat. This can hurt performance while the swap is resizing. If the swap resizes too much, you waste a lot of disk activity unnecessarily.

This is true. If you're going to allow overcommit, disallow OOM, and allow the kernel to create new swap files on the fly, it would probably be best done in its own subdirectory using a number of smaller swap files as adding a new swap is probably cheaper than resizing an existing one.

As an aside, I recently converted a swap partition to a swap file on a dedicated ext2 partition so I could use e2label to make swap still work in case of SCSI name reordering. Since performance is identical -- except the longer boot time -- it was worth it.

Cheers!
-l

Respite from the OOM killer

Posted Oct 7, 2004 23:29 UTC (Thu) by nobrowser (guest, #21196) [Link]

> it would probably be best done in its own subdirectory using
> a number of smaller swap files as adding a new swap is probably
> cheaper than resizing an existing one.

It already exists. Look for swapd in the ibiblio archive.
The response was completely underwhelming when it was young (long ago),
and I don't see why that should have changed.

Respite from the OOM killer

Posted Oct 2, 2004 20:21 UTC (Sat) by giraffedata (guest, #1954) [Link]

> In all cases I simply allocate sufficient swap to handle all my RAM and then some,

If by RAM you mean physical memory, as most people do, swapping doesn't handle RAM -- it handles virtual memory. So the correct calculation isn't amount of RAM plus something, it's (roughly) amount of virtual memory minus amount of RAM. And there's no practical way to put a limit on the amount of virtual memory. The best you can do normally is watch your swap space and when you see it approaching full, add more.

> Just allocate enough swap at install time to accommodate your expectations.

I agree that is the policy for which Linux is designed. The OOM killer is specifically to handle the pathological case that your expectations are exceeded. That's always a possibility. Consider program bugs.

The question of what to do when, despite the sysadmin's best intentions, the system runs out of virtual memory resources, is a difficult one; the various approaches (kill the system, kill some memory-using process, fail any attempt to get more virtual memory, make new swap space, etc.) all have their advantages and disadvantages.

A limit on overcommit?

Posted Sep 30, 2004 15:16 UTC (Thu) by aashenfe (guest, #12212) [Link] (1 responses)

Is there a limit on the overcommit? For instance, will the kernel only commit a percentage of available memory (swap + RAM)? A limit of 150%, say, would allow allocated memory to grow to one and a half times the total available memory.

If there is, is there a tunable parameter?

A limit on overcommit?

Posted Oct 1, 2004 0:44 UTC (Fri) by ilmari (guest, #14175) [Link]

It is tunable, almost exactly as you suggest. Quoting Documentation/vm/overcommit-accounting:
	[S]trict overcommit. The total address space commit
	for the system is not permitted to exceed swap + a
	configurable percentage (default is 50) of physical RAM.
	Depending on the percentage you use, in most situations
	this means a process will not be killed while accessing
	pages but will receive errors on memory allocation as
	appropriate.

    The overcommit policy is set via the sysctl `vm.overcommit_memory'.

    The overcommit percentage is set via `vm.overcommit_ratio'.

So, for example, a machine with 1GB of RAM, 2GB of swap, and the default ratio of 50 gets a commit limit of 2.5GB.

Respite from the OOM killer

Posted Oct 5, 2004 4:12 UTC (Tue) by mbp (subscriber, #2737) [Link] (1 responses)

Turning off overcommit will cause the process that tried to allocate the last page to fail. This was considered for the OOM killer earlier on, and rejected as an insufficient solution: it may happen that the unlucky process is the most important one on the system.

If I remember correctly, it can be theoretically demonstrated that it is impossible to avoid exhaustion unless all resource requirements are known in advance. That is to say, the kernel always needs to handle the possibility of being unable to complete all the operations that are underway. There are various responses: panicking, killing various tasks, failing operations, etc., but there is no perfect solution.

If you want to avoid exhaustion, then turn off overcommit and design critical applications to avoid resource exhaustion: they must allocate or reserve everything up front, and handle failure gracefully. It may be hard to design general-purpose machines to do that, but it might be done in an embedded machine.
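
As a sketch of that up-front style (the arena size is hypothetical; mlockall() and pre-touching are the standard POSIX ingredients):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define ARENA_SIZE (64 << 20)   /* hypothetical worst-case need */

    /* Sketch: a critical process reserves everything at startup and
     * fails loudly then, rather than at some arbitrary later moment. */
    int main(void)
    {
        char *arena = malloc(ARENA_SIZE);

        if (arena == NULL) {
            perror("malloc");
            return 1;                   /* fail gracefully, at startup */
        }
        memset(arena, 0, ARENA_SIZE);   /* force real pages to be assigned now */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            perror("mlockall");         /* pinning pages needs privilege */
        /* ... run, allocating only from the arena ... */
        free(arena);
        return 0;
    }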

Respite from the OOM killer

Posted Oct 7, 2004 8:10 UTC (Thu) by oever (guest, #987) [Link]

This seems like a robust approach. Unfortunately, it requires all vital programs to be adapted to allocate memory in advance. It would be nice to be able to allocate an amount of memory for vital processes that cannot be taken by other programs, without needing to change the code.

Respite from the OOM killer

Posted Jan 30, 2014 6:15 UTC (Thu) by CameronNemo (guest, #94700) [Link] (3 responses)

You can tell Upstart to leave a service alone by adding the stanza `oom score never` to its job. There are finer-grained tuning values as well (-999 to 1000). Check it out here: http://upstart.ubuntu.com/cookbook/#oom-score

It would be cool to have that in systemd.

Respite from the OOM killer

Posted Jan 30, 2014 6:34 UTC (Thu) by zdzichu (subscriber, #17118) [Link] (2 responses)

It is already there: see OOMScoreAdjust= in systemd.exec(5).

Respite from the OOM killer

Posted Jan 30, 2014 6:43 UTC (Thu) by CameronNemo (guest, #94700) [Link] (1 responses)

I was looking in systemd.service (specifically). My fault for skimm/pping the description.

Respite from the OOM killer

Posted Jan 30, 2014 8:11 UTC (Thu) by cortana (subscriber, #24596) [Link]

FYI, systemd.directives is a master index of everything you can do to a unit, so if you are unsure whether a feature exists, you can look there first.


Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds