lxc exec vs ssh

Recently, I’ve implemented several improvements for lxc exec. In case you
didn’t know, lxc exec is LXD’s client tool that uses the LXD client API to
talk to the LXD daemon and execute any program the user might want. Here is
a small example of what you can do with it:
One of our main goals is to make lxc exec feel as similar to ssh as
possible, since ssh is the standard way of running commands remotely, whether
interactively or non-interactively. Making lxc exec behave nicely was tricky.
1. Handling background tasks
A long-standing problem was how to correctly handle background tasks. Here’s an asciinema illustration of the problem with a pre-2.7 LXD instance:
What you can see there is that putting a task in the background will lead to
lxc exec not being able to exit. A lot of command sequences can trigger this
problem:
chb@conventiont|~
> lxc exec zest1 bash
root@zest1:~# yes &
y
y
y
.
.
.
Nothing would save you now. yes will simply write to stdout till the end of
time as quickly as it can…
The root of the problem is that stdout is kept open, which is necessary
to ensure that any data written by the process the user has started is actually
read and sent back over the websocket connection we established.
As you can imagine this becomes a major annoyance when you e.g. run a shell
session in which you want to run a process in the background and then quickly
want to exit. Sorry, you are out of luck. Well, you were.
The first, naive approach is obviously to simply close stdout as soon as
you detect that the foreground program (e.g. the shell) has exited. That is not
quite as good an idea as one might think… The problem becomes obvious when you
then run quickly executing programs like:
lxc exec zest1 -- ls -al /usr/lib
where the lxc exec process (and the associated forkexec process; don’t
worry about it now, just remember that Go and setns() are not on speaking
terms) exits before all buffered data in stdout was read. In this case
you will get truncated output, and no one wants that. After a few approaches
to the problem that involved disabling pty buffering (which wasn’t pretty, I
tell you, and also didn’t work predictably) and other weird ideas, I managed to
solve this by employing a few poll() “tricks” (in some sense of the word
“trick”). Now you can finally run background tasks and cleanly exit.
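To make the idea a bit more concrete, here is a rough Python sketch of one way to get this behaviour. It is not LXD’s actual implementation, and the names run_and_collect and queued_bytes are made up for illustration. The assumptions are a Linux host and a plain pipe standing in for the pty and websocket: poll() relays output while the foreground process is alive, and once it has exited only the bytes already queued in the pipe (as reported by the FIONREAD ioctl) are read, instead of waiting for an EOF that a background task may never deliver.

```python
import fcntl
import os
import select
import struct
import subprocess
import termios


def queued_bytes(fd):
    """Return how many bytes are currently buffered in the pipe (Linux FIONREAD)."""
    raw = fcntl.ioctl(fd, termios.FIONREAD, b"\0\0\0\0")
    return struct.unpack("i", raw)[0]


def run_and_collect(shell_command):
    """Run shell_command via /bin/sh; return (exit status, output) without waiting for EOF."""
    read_fd, write_fd = os.pipe()
    child = subprocess.Popen(["/bin/sh", "-c", shell_command], stdout=write_fd)
    os.close(write_fd)  # the parent keeps only the read end

    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    chunks = []

    # Phase 1: relay output for as long as the foreground process is alive.
    while child.poll() is None:
        if poller.poll(50):  # readable (or hung up) within 50 ms
            data = os.read(read_fd, 65536)
            if not data:  # every writer closed the pipe, nothing more can arrive
                child.wait()
                break
            chunks.append(data)

    # Phase 2: the foreground process is gone. Everything it wrote is already
    # queued in the pipe, so read exactly that much and stop, even if a
    # background task still holds the write end open.
    remaining = queued_bytes(read_fd)
    while remaining > 0:
        data = os.read(read_fd, remaining)
        if not data:
            break
        chunks.append(data)
        remaining -= len(data)

    os.close(read_fd)
    return child.returncode, b"".join(chunks)
```

With this, a call such as run_and_collect("sleep 30 & echo done") should return as soon as the shell exits, with the complete output, while a short-lived command with a lot of output still gets all of it relayed.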

2. Reporting exit codes caused by signals
ssh is a wonderful tool. One thing I never really liked, however, was the fact
that when the command run by ssh received a signal, ssh would always
report -1, aka exit code 255. This is annoying when you’d like to have
information about what signal caused the program to terminate. This is why
I recently implemented the standard shell convention of reporting any
signal-caused exit as 128 + n, where n is the number of the signal that
caused the executing program to exit. For
example, on SIGKILL you would see 128 + SIGKILL = 137 (calculating the exit
codes for other deadly signals is left as an exercise for the reader). So you
can do:
chb@conventiont|~
> lxc exec zest1 sleep 100
Now, send SIGKILL to the executing program (not to lxc exec itself, as
SIGKILL is not forwardable):
kill -KILL $(pidof sleep)
and finally retrieve the exit code for your program:
chb@conventiont|~
> echo $?
137
Voilà. This obviously only works nicely when a) the exit code doesn’t breach
the 8-bit wall-of-computing and b) the executing program doesn’t use
137 to indicate success (which would be… interesting(?)). Neither objection
seems too convincing to me. The former because most deadly signals
should not breach the range. The latter because (i) that’s the user’s problem,
(ii) these exit codes are actually reserved (I think), and (iii) you’d have the
same problem running the program locally or otherwise.
The main advantage I see in this is the ability to report back fine-grained
exit statuses for executing programs. Note that by no means can we report back
all instances where the executing program was killed by a signal. For example,
when your program handles SIGTERM and exits cleanly, there’s no easy way for
LXD to detect this and report back that the program was killed by a signal.
You will simply receive success, aka exit code 0.
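The mapping itself is tiny. As a sketch (in Python rather than LXD’s own Go code, with shell_style_exit_code being a hypothetical helper name), subprocess reports a child killed by signal n as the return code -n, which can be turned into the 128 + n convention like this:

```python
import signal
import subprocess


def shell_style_exit_code(returncode):
    """Map a subprocess return code to the shell's 128 + n convention."""
    if returncode < 0:  # subprocess reports death by signal n as -n
        return 128 + (-returncode)
    return returncode  # normal exit: pass the status through unchanged


# Start a long-running child, kill it, and translate how it died.
child = subprocess.Popen(["sleep", "100"])
child.send_signal(signal.SIGKILL)
child.wait()
print(shell_style_exit_code(child.returncode))  # prints 137 on Linux, where SIGKILL is 9
```

A program that catches SIGTERM and then exits with 0 goes down the second branch, which is exactly the case described above where the signal cannot be detected.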
3. Forwarding signals
This is probably the least interesting (or maybe it isn’t, no idea) but I found
it quite useful. As you saw in the SIGKILL case before, I was explicit in
pointing out that one must send SIGKILL to the executing program, not to the
lxc exec command itself. This is due to the fact that SIGKILL cannot be
handled in a program. The only thing the program can do is die… like right
now… this instant… sofort… (you get the idea…). But a lot of other
signals, such as SIGTERM, SIGHUP, and of course SIGUSR1 and SIGUSR2, can be
handled. So when you send signals that can be handled to lxc exec instead of
the executing program, newer versions of LXD will
forward the signal to the executing process. This is pretty convenient in
scripts and so on.
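The general pattern behind this kind of forwarding can be sketched in a few lines of Python. This is only an illustration of the idea for a local child process, not how lxc exec does it (there the signal has to travel to the LXD daemon first), and run_forwarding_signals and FORWARDED are names invented for the example:

```python
import signal
import subprocess

# Catchable signals we want to pass on. SIGKILL cannot be in this list
# because a process never gets the chance to handle it.
FORWARDED = (signal.SIGTERM, signal.SIGHUP, signal.SIGUSR1, signal.SIGUSR2)


def run_forwarding_signals(argv):
    """Run argv as a child and relay catchable signals we receive to it."""
    child = subprocess.Popen(argv)

    def forward(signum, _frame):
        # Hand the signal to the child instead of acting on it ourselves.
        child.send_signal(signum)

    # Install the forwarding handler and remember the previous handlers.
    previous = {sig: signal.signal(sig, forward) for sig in FORWARDED}
    try:
        return child.wait()  # the child's return code (negative if a signal killed it)
    finally:
        for sig, handler in previous.items():
            signal.signal(sig, handler)
```

A wrapper script could call run_forwarding_signals with the command it wants to supervise; sending SIGTERM or SIGUSR1 to the wrapper then reaches the child, which can handle it however it likes.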
In any case, I hope you found this little lxc exec post/rant useful. Enjoy
LXD; it’s a crazy beautiful beast to play with.
Give it a try online at https://linuxcontainers.org/lxd/try-it/ and for all you
developers out there: check out https://github.com/lxc/lxd and send us patches.
:) We don’t require any CLA to be signed; we simply follow the kernel style
of requiring a Signed-off-by line. :)

