lxc exec vs ssh
Recently, I’ve implemented several improvements for lxc exec. In case you
didn’t know, lxc exec is LXD’s client tool that uses the LXD client API to
talk to the LXD daemon and execute any program the user might want. Here is
a small example of what you can do with it:
One of our main goals is to make lxc exec feel as similar to ssh as possible,
since ssh is the standard way of running commands remotely, whether
interactively or non-interactively. Making lxc exec behave nicely was tricky.
1. Handling background tasks
A long-standing problem was how to correctly handle background tasks. Here’s an asciinema illustration of the problem with a pre-LXD 2.7 instance:
What you can see there is that putting a task in the background prevents
lxc exec from exiting. A lot of command sequences can trigger this problem:
chb@conventiont|~
> lxc exec zest1 bash
root@zest1:~# yes &
y
y
y
.
.
.
Nothing would save you now. yes will simply write to stdout till the end of
time as quickly as it can…
The root of the problem lies with stdout being kept open, which is necessary
to ensure that any data written by the process the user has started is actually
read and sent back over the websocket connection we established.
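The effect is easy to reproduce locally with a plain pipe, no LXD involved. In the sketch below (an illustration only, not LXD's code) cat stands in for the reader on the other end of stdout: it sees EOF only once every process holding the write end has exited, and that includes the backgrounded sleep.

```shell
#!/bin/sh
# Illustration only: `cat` stands in for whatever reads the command's stdout.

# Case 1: the background sleep inherits the pipe as its stdout, so the
# reader sees EOF only after sleep exits (about 2 seconds later).
start=$(date +%s)
sh -c 'sleep 2 & echo "foreground done"' | cat
end=$(date +%s)
held=$((end - start))

# Case 2: the background task's stdout and stderr point elsewhere, so the
# pipe closes as soon as the foreground shell exits.
start=$(date +%s)
sh -c 'sleep 2 >/dev/null 2>&1 & echo "foreground done"' | cat
end=$(date +%s)
detached=$((end - start))

echo "stdout held open:  reader waited ${held}s"
echo "stdout redirected: reader waited ${detached}s"
```

In the first case the foreground shell is long gone while the reader is still blocked, which is the same situation lxc exec found itself in.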
As you can imagine this becomes a major annoyance when you e.g. run a shell
session in which you want to run a process in the background and then quickly
want to exit. Sorry, you are out of luck. Well, you were.
The first, naive approach is obviously to simply close stdout as soon as
you detect that the foreground program (e.g. the shell) has exited. Not quite
as good an idea as one might think… The problem becomes obvious when you
then run quickly executing programs like:
lxc exec zest1 -- ls -al /usr/lib
where the lxc exec process (and the associated forkexec process (Don’t
worry about it now. Just remember that Go + setns() are not on speaking
terms…)) exits before all buffered data in stdout was read. In this case
you get truncated output and no one wants that. After a few approaches
to the problem that involved disabling pty buffering (wasn’t pretty, I tell
you, and also didn’t work predictably) and other weird ideas, I managed to
solve this by employing a few poll() “tricks” (in some sense of the word
“trick”). Now you can finally run background tasks and cleanly exit. To wit:
2. Reporting exit codes caused by signals
ssh is a wonderful tool. One thing, however, I never really liked was the fact
that when the command that was run by ssh received a signal, ssh would always
report -1 aka exit code 255. This is annoying when you’d like to have
information about what signal caused the program to terminate. This is why
I recently implemented the standard shell convention of reporting any
signal-caused exit as 128 + n, where n is the number of the signal that caused
the executing program to exit. For example, on SIGKILL you would see
128 + SIGKILL = 137 (calculating the exit codes for other deadly signals is
left as an exercise to the reader). So you can do:
chb@conventiont|~
> lxc exec zest1 sleep 100
Now, send SIGKILL to the executing program (not to lxc exec itself, as
SIGKILL is not forwardable):
kill -KILL $(pidof sleep)
and finally retrieve the exit code for your program:
chb@conventiont|~
> echo $?
137
Voila. This obviously only works nicely when a) the exit code doesn’t breach
the 8-bit wall-of-computing and b) the executing program doesn’t use 137
to indicate success (which would be… interesting(?)). Neither objection
seems too convincing to me. The former because most deadly signals
should not breach the range. The latter because (i) that’s the user’s problem,
(ii) these exit codes are actually reserved (I think), and (iii) you’d have the
same problem running the program locally or otherwise.
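The convention itself can be checked without a container, since a local shell reports signal deaths the same way. A minimal sketch, assuming a POSIX sh; the child shell here simply kills itself and is only a stand-in for the program run by lxc exec:

```shell
#!/bin/sh
# Stand-in for the executed program: a child shell that sends SIGKILL to
# itself ($$ expands to the child's own PID inside the single quotes).
status=0
sh -c 'kill -KILL $$' || status=$?
echo "exit status: $status"

# Statuses above 128 mean "killed by signal number (status - 128)".
if [ "$status" -gt 128 ]; then
    signum=$((status - 128))
    signame=$(kill -l "$signum")
    echo "terminated by signal $signum (SIG$signame)"
fi
```

Passing the number back through kill -l turns it into a signal name, which is handy when a script needs to log why a command died.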
The main advantage I see in this is the ability to report back fine-grained
exit statuses for executing programs. Note that by no means can we report
all instances where the executing program was killed by a signal. For example,
when your program handles SIGTERM and exits cleanly, there’s no easy way for
LXD to detect this and report back that the program was killed by a signal.
You will simply receive success aka exit code 0.
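This limitation is just as visible locally. In the sketch below (again with stand-in child shells rather than an lxc exec invocation) the first child installs a SIGTERM handler and exits cleanly, so the caller sees 0; the second has no handler and is reported as 128 + 15.

```shell
#!/bin/sh
# Child 1 traps SIGTERM and exits cleanly: the caller cannot tell that a
# signal was involved at all. The trailing `exit 1` is never reached.
handled=0
sh -c 'trap "exit 0" TERM; kill -TERM $$; exit 1' || handled=$?

# Child 2 has no handler, so SIGTERM kills it and the status is 128 + 15.
unhandled=0
sh -c 'kill -TERM $$' || unhandled=$?

echo "handled SIGTERM:   $handled"
echo "unhandled SIGTERM: $unhandled"
```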
3. Forwarding signals
This is probably the least interesting (or maybe it isn’t, no idea) but I found
it quite useful. As you saw in the SIGKILL case before, I was explicit in
pointing out that one must send SIGKILL to the executing program, not to the
lxc exec command itself. This is due to the fact that SIGKILL cannot be
handled in a program. The only thing the program can do is die… like right
now… this instance… sofort… (you get the idea…). But a lot of other
signals, such as SIGTERM, SIGHUP, and of course SIGUSR1 and SIGUSR2, can be
handled. So when you send signals that can be handled to lxc exec instead of
the executing program, newer versions of LXD will forward the signal to the
executing process. This is pretty convenient in scripts and so on.
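The forwarding pattern can be sketched in plain shell: a wrapper starts a child, traps the catchable signals, and re-sends each one to the child. This only illustrates the idea under the assumption of a POSIX sh and says nothing about how LXD implements it; the names child, forward and interrupted are made up for the example.

```shell
#!/bin/sh
# Stand-in for the program running inside the container.
sleep 30 &
child=$!

# Hypothetical helper: re-send a caught signal to the child and remember
# that `wait` was interrupted.
interrupted=0
forward() {
    interrupted=1
    kill -s "$1" "$child" 2>/dev/null || true
}

# SIGKILL is deliberately missing: it cannot be trapped, so it cannot be
# forwarded either.
for sig in TERM HUP USR1 USR2; do
    trap "forward $sig" "$sig"
done

# Simulate an outside sender delivering SIGTERM to this wrapper after 1s.
( sleep 1; kill -s TERM $$ ) &

# A trapped signal makes `wait` return early with a status above 128, so
# wait again until a wait finishes without being interrupted.
while :; do
    interrupted=0
    status=0
    wait "$child" || status=$?
    [ "$interrupted" -eq 1 ] || break
done
echo "child exit status: $status"
```

The wrapper ends up reporting the child's own status, so a forwarded SIGTERM that kills the child shows up as 128 + 15 again, matching the convention from the previous section.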
In any case, I hope you found this little lxc exec post/rant useful. Enjoy
LXD, it’s a crazy beautiful beast to play with.
Give it a try online at https://linuxcontainers.org/lxd/try-it/ and for all you
developers out there: check out https://github.com/lxc/lxd and send us patches.
:) We don’t require any CLA to be signed, we simply follow the kernel style
of requiring a Signed-off-by line. :)