
Rebooting Linux temporarily loses (some) limits.conf settings

In any actively managed environment, you probably have custom settings in your /etc/security/limits.conf because of application-specific requirements. Maybe you have to allow for more open files. Maybe you have to reduce the memory allowed to a process. Or maybe you just like being ultra-hardcore in defining exactly what a user can do.
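For reference, each entry in limits.conf follows a `domain type item value` format. A couple of illustrative entries (the 96000 matches the value we use later; the group entry is made up):

```
# /etc/security/limits.conf -- illustrative entries
mysql   soft    nofile  96000     # soft open-file limit for the mysql user
mysql   hard    nofile  96000     # hard ceiling the soft limit can't exceed
@devs   hard    as      2097152   # cap address space (in KB) for the devs group
```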

As an example, our environment requires that we up the number of open files. A lot. We tend to have a lot of open stuff in the file system. Ordinarily, this isn’t an issue. Except yesterday when we ran into a weird case after rebooting a server.

But first, let’s backtrack a bit.

The limits.conf is part of the PAM chain. Specifically, it’s the configuration file for pam_limits.so. In order to make use of this file, your process has to have been run through, or inherited an environment that ran through, PAM at some point in its history. For example, if you log in with ssh, you run through PAM. If you use sudo, you use PAM. If you supply your username and password to an X Window System login screen, you probably use PAM.

The best way to tell if you’re able to use limits.conf is to look in the PAM configs and see what commands invoke it. The configs for CentOS exist in /etc/pam.d. Almost everything includes /etc/pam.d/system-auth where pam_limits.so is set as a required module for the session phase.
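A quick way to do that check (paths are CentOS-style, so adjust for your distro; the fallback echo is just so the command always prints something):

```shell
# List the PAM service configs that pull in pam_limits.so.
grep -l pam_limits /etc/pam.d/* 2>/dev/null || echo "no pam_limits references found"
```

On a CentOS box you should at least see /etc/pam.d/system-auth in the output.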

So, now that we have a bit of background on what this is and where it’s from, let’s continue.

We use limits.conf pretty extensively, especially for our MySQL servers. We rebooted one of our servers yesterday and discovered that clients were having issues afterwards and the MySQL instance was complaining about not being able to open some tables. This was odd. We set our open file limit pretty high. High enough to know that if we were hitting it, we had a pretty crazy problem. We confirmed that our limits.conf was correct, so we started poking the process itself to determine what was going on.

We suspected that mysqld was not observing the correct limits setting. But how do you determine that on a running process?

Every process shows its current set of limits in /proc/$PID/limits. In our case, we found a surprisingly low setting.

$ sudo cat /proc/$(sudo cat /var/lib/mysql/mysql.pid)/limits | grep open
Max open files            1185                 1185                 files

So, assumption confirmed. The running limit was definitely too low: nearly two orders of magnitude below the 96000 we had configured.

Now, we had just rebooted the machine, so we weren’t sure what was going on. We decided to restart the MySQL instance to see what happened. To our surprise, the open file settings went back to normal.

$ sudo cat /proc/$(sudo cat /var/lib/mysql/mysql.pid)/limits | grep open
Max open files            96000                96000                files

What the heck?

I had a suspicion. We knew several things.

  • The system was rebooted.
  • The system started mysql on boot.
  • Restarting mysql fixed the problem.
  • A process must go through PAM in order to use limits.conf.
  • init has no direct hook into PAM.

At boot time, init is invoking daemons and processes in order to get the system to a running state. We looked at other daemons to determine if they had similar issues. Some did, some didn’t.

I ended up posing this question on the LOPSA IRC channel.

pop quiz. /etc/security/limits.conf settings only get honored if you have something that goes through a pam context that invokes pam_limits.so … but at boot time, init doesn’t do this, so none of the correct settings get configured for limits. What’s the work around for this?

And got several responses, including this one:

geekosaur: this is why many startup scripts use su

And this was the clue we needed. It helped describe why this was only affecting some daemons, including MySQL. Here’s why.

At boot time, init has a default set of limits for root. When init runs the startup scripts in rcX.d, those scripts inherit that limit set. If a script starts a daemon AND that daemon needs some custom limit, very often the script will be designed to su to the user the daemon runs as. Since su is a PAM-enabled thing, pam_limits.so gets invoked and applies the configured limits.

In the case of mysqld, init runs mysqld_safe --user=mysql, which then invokes /usr/sbin/mysqld --user=mysql. I suspect mysqld is then just doing a setuid()/seteuid() to go from root to mysql. This bypasses the entire PAM chain.

The workaround would be to have the init script either su to mysql before invoking mysqld_safe (a possibly non-trivial change) or simply set the ulimits appropriately in the startup script itself. We do the latter for supervisord, as an example.
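A minimal sketch of both options, using a hypothetical init-script fragment (this is not the actual shipped mysqld initscript):

```shell
#!/bin/sh
# Hypothetical init-script fragment illustrating both workarounds.

# Option 1: start the daemon via su. su opens a PAM session, so
# pam_limits.so runs and applies /etc/security/limits.conf for "mysql".
# (Commented out so this fragment runs standalone without root.)
#su - mysql -s /bin/sh -c "/usr/bin/mysqld_safe --user=mysql &"

# Option 2: set the limit explicitly before starting the daemon,
# skipping PAM entirely. Raising the soft limit works unprivileged
# up to the hard limit; raising the hard limit itself requires root.
ulimit -n 4096 2>/dev/null || true
echo "daemon would start with $(ulimit -n) open files allowed"
```

The second option is the smaller change, since the rest of the script keeps running as root exactly as before.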

It’s important to know and understand how the different pieces of your architecture work together. In this case, we thought we understood how and why things worked. What we hadn’t taken into account was how these things interact directly after a system reboot. Our MySQL servers end up running longer than other systems in the environment, and they use a different mechanism to start their MySQL processes compared to other daemons, so it wasn’t something we would have encountered quickly.
