lxc exec vs ssh

alt text

Recently, I’ve implemented several improvements for lxc exec. In case you didn’t know, lxc exec is LXD’s client tool that uses the LXD client api to talk to the LXD daemon and execute any program the user might want. Here is a small example of what you can do with it:

One of our main goals is to make lxc exec feel as similar to ssh as possible since this is the standard of running commands interactively or non-interactively remotely. Making lxc exec behave nicely was tricky.

1. Handling background tasks

A long-standing problem was certainly how to correctly handle background tasks. Here’s an asciinema illustration of the problem with a pre LXD 2.7 instance:

What you can see there is that putting a task in the background will lead to lxc exec not being able to exit. A lot of command sequences can trigger this problem:

chb@conventiont|~
> lxc exec zest1 bash
root@zest1:~# yes &
y
y
y
.
.
.

Nothing would save you now. yes will simply write to stdout till the end of time as quickly as it can… The root of the problem lies with stdout being kept open which is necessary to ensure that any data written by the process the user has started is actually read and sent back over the websocket connection we established. As you can imagine this becomes a major annoyance when you e.g. run a shell session in which you want to run a process in the background and then quickly want to exit. Sorry, you are out of luck. Well, you were. The first, and naive approach is obviously to simply close stdout as soon as you detect that the foreground program (e.g. the shell) has exited. Not quite as good as an idea as one might think… The problem becomes obvious when you then run quickly executing programs like:

lxc exec -- ls -al /usr/lib

where the lxc exec process (and the associated forkexec process (Don’t worry about it now. Just remember that Go + setns() are not on speaking terms…)) exits before all buffered data in stdout was read. In this case you will cause truncated output and no one wants that. After a few approaches to the problem that involved, disabling pty buffering (Wasn’t pretty I tell you that and also didn’t work predictably.) and other weird ideas I managed to solve this by employing a few poll() “tricks” (In some sense of the word “trick”.). Now you can finally run background tasks and cleanly exit. To wit:

2. Reporting exit codes caused by signals

ssh is a wonderful tool. One thing however, I never really liked was the fact that when the command that was run by ssh received a signal ssh would always report -1 aka exit code 255. This is annoying when you’d like to have information about what signal caused the program to terminate. This is why I recently implemented the standard shell convention of reporting any signal-caused exits using the standard convention 128 + n where n is defined as the signal number that caused the executing program to exit. For example, on SIGKILL you would see 128 + SIGKILL = 137 (Calculating the exit codes for other deadly signals is left as an exercise to the reader.). So you can do:

chb@conventiont|~
> lxc exec zest1 sleep 100

Now, send SIGKILL to the executing program (Not to lxc exec itself, as SIGKILL is not forwardable.):

kill -KILL $(pidof sleep 100)

and finally retrieve the exit code for your program:

chb@conventiont|~
> echo $?
137

Voila. This obviously only works nicely when a) the exit code doesn’t breach the 8-bit wall-of-computing and b) when the executing program doesn’t use 137 to indicate success (Which would be… interesting(?).). Both arguments don’t seem too convincing to me. The former because most deadly signals should not breach the range. The latter because (i) that’s the users problem, (ii) these exit codes are actually reserved (I think.), (iii) you’d have the same problem running the program locally or otherwise. The main advantage I see in this is the ability to report back fine-grained exit statuses for executing programs. Note, by no means can we report back all instances where the executing program was killed by a signal, e.g. when your program handles SIGTERM and exits cleanly there’s no easy way for LXD to detect this and report back that this program was killed by signal. You will simply receive success aka exit code 0.

3. Forwarding signals

This is probably the least interesting (or maybe it isn’t, no idea) but I found it quite useful. As you saw in the SIGKILL case before, I was explicit in pointing out that one must send SIGKILL to the executing program not to the lxc exec command itself. This is due to the fact that SIGKILL cannot be handled in a program. The only thing the program can do is die… like right now… this instance… sofort… (You get the idea…). But a lot of other signals SIGTERM, SIGHUP, and of course SIGUSR1 and SIGUSR2 can be handled. So when you send signals that can be handled to lxc exec instead of the executing program, newer versions of LXD will forward the signal to the executing process. This is pretty convenient in scripts and so on.

In any case, I hope you found this little lxc exec post/rant useful. Enjoy LXD it’s a crazy beautiful beast to play with. Give it a try online https://linuxcontainers.org/lxd/try-it/ and for all you developers out there: Checkout https://github.com/lxc/lxd and send us patches. :) We don’t require any CLA to be signed, we simply follow the kernel style of requiring a Signed-off-by line. :)