AWS remote exec and HUP signal killing initd service #5113

Closed
byrnedo opened this issue Feb 12, 2016 · 7 comments

Comments

@byrnedo

byrnedo commented Feb 12, 2016

When running an init.d service command on Ubuntu 14.04, my service ends up getting killed once the SSH session exits. Is there a way to prevent this? It appears to receive a HUP signal.

@apparentlymart
Contributor

@byrnedo it sounds like you're using a remote-exec provisioner to start up a service by running its init script.

In most cases that should be fine, as long as the init script is correctly written and the service itself is properly "daemonizing"... which includes steps such as detaching from the shell's process group, disconnecting from the controlling terminal, etc.
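As a rough illustration of that detaching step (the daemon name here is a stand-in, not rexray's actual invocation), a process started under `setsid` survives its launching shell's exit because it gets its own session, disconnected from the controlling terminal:

```shell
# Hypothetical sketch: 'sleep 2' stands in for a real daemon binary.
# setsid(1) runs the command in a new session, detached from the
# controlling terminal, so the shell's exit does not HUP it.
setsid sleep 2 </dev/null >/dev/null 2>&1 &
pid=$!

# The detached child reports a different session id than this shell:
child_sid=$(ps -o sid= -p "$pid" | tr -d ' ')
shell_sid=$(ps -o sid= -p $$ | tr -d ' ')
echo "child_sid=$child_sid shell_sid=$shell_sid"
```

A correctly daemonizing service does the equivalent internally (fork, setsid, close the terminal fds), which is why it normally shrugs off the SSH session ending.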

Are you able to reproduce the same symptom if you SSH in directly and start the service, before logging out? If so, I would expect that the above is the problem: the service in question isn't fully daemonizing, and so when the launching shell exits the service is killed.

In that case, how to resolve this unfortunately depends entirely on what service you're running, although a general solution for running uncooperative programs as long-running services is to run them under some sort of daemon supervisor like supervisord or daemontools. With that approach it is the supervisor program that "daemonizes", and the target service is expected not to daemonize: it remains a child process of the supervisor, running in the "foreground" from its own perspective, while the supervisor monitors it to ensure it stays running.

Modern init replacements in Linux distributions often have a mechanism for this built in. For example, both upstart and systemd expect "foregrounded" applications by default, and only with additional settings will they expect a program to actively daemonize itself. At work we're now a 100% systemd environment, so very few of our programs actually daemonize themselves; in most cases we just let systemd supervise them.
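For illustration, a minimal systemd unit for a supervised foreground service might look something like this (the unit name, binary path, and flag are made up for the example, not taken from rexray):

```ini
[Unit]
Description=Example foreground service

[Service]
# Type=simple: systemd supervises the process directly; the program
# must stay in the foreground rather than daemonizing itself.
Type=simple
ExecStart=/usr/bin/example-daemon --foreground
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With this, starting and stopping goes through systemctl, and the HUP/exit behavior of whatever shell issued the command is irrelevant to the service.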

When using such a "supervisor" system, whether an add-on one or one built in to your main init system, you can then use remote-exec provisioners to interact with that system rather than directly with the service in question. For example, for systemd you might have remote-exec run `sudo systemctl start foo.service`, in which case the actual launching of foo.service happens "behind the scenes" in systemd, unaware of Terraform's SSH connection.
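Sketched as a Terraform provisioner (reusing the hypothetical `foo.service` unit name from above):

```hcl
provisioner "remote-exec" {
  # systemd launches and supervises foo.service itself, so the
  # service is unaffected when Terraform's SSH session ends.
  inline = ["sudo systemctl start foo.service"]
}
```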

I hope something in all of that was helpful. If you think your problem is not what I've described here, it'd help to have a bit more info on exactly what service you're trying to run.

@byrnedo
Author

byrnedo commented Feb 14, 2016

Hi @apparentlymart and thanks for the thorough reply!

The thing is, when I manually SSH onto the box, start the service, and then log off, it works fine.

I am interacting with the service using Ubuntu's `service` command via a remote-exec. I believe it's upstart that's running on my box, not systemd.

The provisioner line:

 provisioner "remote-exec" {
   inline = ["sudo service rexray restart"]
 }

What actually ends up working for me is forcing it to ssh manually from a local-exec:

 provisioner "local-exec" {
   command = "ssh -o IdentitiesOnly=yes -o StrictHostKeyChecking=no -i ${var.key_path} ubuntu@${self.public_ip} 'sudo service rexray restart'"
 }

I'm not too well versed in these things but is it something to do with creating a shell session?

Again, thanks for your exhaustive reply.


Further info:

the service is rexray.

The init.d script that rexray installs:

### BEGIN INIT INFO
# Provides:          rexray
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start daemon at boot time
# Description:       Enable service provided by daemon.
### END INIT INFO
[ -f /etc/default/rexray ] && . /etc/default/rexray

case "$1" in
  start)
    /usr/bin/rexray start
    ;;
  stop)
    /usr/bin/rexray stop
    ;;
  status)
    /usr/bin/rexray status
    ;;
  restart)
    /usr/bin/rexray restart
    ;;
  reload)
    /usr/bin/rexray reload
    ;;
  force-reload)
    /usr/bin/rexray force-reload
    ;;
  *)
    echo "Usage: $0 {start|stop|status|restart|reload|force-reload}"
esac

@apparentlymart
Contributor

@byrnedo I guess the main question now is what's different about how ssh runs that command vs. how Terraform runs it.

Terraform is internally using a native Go SSH implementation rather than running the ssh command directly, so unfortunately quite a few things are different, but I expect most of that would just boil down to not processing options set in your .ssh/config or global ssh_config files.

I found rexray/rexray#85, which seems to be old and resolved but the discussion in there did give me a lead: it would appear that rexray is sensitive to whether or not it's running in a pseudo-tty. IIRC, ssh by default will not create a pseudo-tty when running a command given on the command line (as opposed to creating a regular interactive session), but as far as I can tell Terraform will unconditionally create one when dealing with a remote-exec provisioner.

We could attempt to confirm this by taking your working local-exec example and adding the additional -t option, which will instruct OpenSSH to allocate a pseudo-tty even though you're directly running a command. If your rexray process starts getting SIGHUP in this case then that would confirm that the pseudo-tty is a problem.

Unfortunately I don't think Terraform currently provides a way to disable the pseudo-tty creation, so if this is the problem then the immediate solution would be to try to insulate rexray from the problem. (more on that below)

If the pseudo-tty doesn't seem to be the problem then I think a solution will come from further investigating differences between the ssh command and Terraform's SSH communicator.


I don't know upstart well, but based on my experience with systemd I'd guess that upstart is running that init script in some sort of "initscript emulation mode", since upstart's native service definition is a different format. You may be able to get more control over how rexray launches by writing a first-class upstart configuration for it. A quick skim of the rexray docs leads me to believe that you can specify an -f flag to get it to run in the foreground, so perhaps something like this is a starting point:

description "rexray"

start on runlevel [2345]
stop on runlevel [016] or unmounting-filesystem or deconfiguring-networking

respawn

script
    [ ! -s /etc/default/rexray ] || . /etc/default/rexray

    exec /usr/bin/rexray start -f
end script

I'll need to defer to the upstart docs for the details on that, but if upstart is similar to systemd then I expect upstart would then be able to run this program in a more controlled fashion, isolated from the environment where you're running the service command.

@byrnedo
Author

byrnedo commented Feb 15, 2016

I think your first assumption is right. I did the following

❯ ssh -o IdentitiesOnly=yes -i terraform.pem ubuntu@xxxx -t "sudo service rexray start"
INFO[0000] [linux]                                      
INFO[0000] [docker]                                     
INFO[0000] ec2                                          
INFO[0000] docker volume driver initialized              availabilityZone= iops= provider=docker size=5 volumeRootPath=/data volumeType=gp2
INFO[0000] os driver initialized                         provider=linux
INFO[0000] storage driver initialized                    provider=ec2
Starting REX-Ray...SUCCESS!

  The REX-Ray daemon is now running at PID 7049. To
  shutdown the daemon execute the following command:

    sudo /usr/bin/rexray stop

Connection to xxxx closed.
❯ ssh -o IdentitiesOnly=yes -i terraform.pem ubuntu@xxxx "sudo service rexray status"
REX-Ray is stopped

And it appears to not have started.

I'll try rolling my own upstart script. I had hoped I could avoid doing this, but it's not that bad really.

Is there no way to force Terraform to use the system's native ssh?

@apparentlymart
Contributor

Cool... I'm glad we got to the bottom of that. Sorry it ended up being a bit of a "busy-work" solution but hopefully it works out well.

Re: using the OpenSSH client: using the local-exec provisioner to run it, as you've done here, is currently the only way.

@byrnedo
Author

byrnedo commented Feb 15, 2016

That's grand, I can make do with that. Thanks for all the help!

@byrnedo byrnedo closed this as completed Feb 15, 2016
@ghost

ghost commented Apr 28, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 28, 2020