When I run docker-compose stop id, after about 10 seconds the docker log shows: nightly_id_1 exited with code 137
According to the Docker docs, docker-compose stop sends SIGTERM, waits 10 seconds, and then sends SIGKILL. I’m guessing the service isn’t responding to SIGTERM. I’ve increased the timeout to 60 seconds with the same result, so it doesn’t seem like I’m simply waiting for Lagom to stop.
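For reference, the 137 in that log line can be decoded by hand: a process killed by a signal reports an exit status of 128 plus the signal number. The sketch below is only an illustration run in a plain shell, not inside the container (the variable names are made up for the example). It has a throwaway `sh` kill itself with each signal and prints the resulting status.

```shell
# A throwaway shell kills itself so we can inspect the exit status the parent sees.
# "|| var=$?" keeps this working even if the calling script uses "set -e".
killed_status=0
sh -c 'kill -KILL $$' || killed_status=$?

term_status=0
sh -c 'kill -TERM $$' || term_status=$?

echo "exit status after SIGKILL: $killed_status"   # 128 + 9
echo "exit status after SIGTERM: $term_status"     # 128 + 15

# Map the code from the docker log back to a signal name.
signal_number=$((137 - 128))
signal_name=$(kill -l "$signal_number")
echo "137 means signal $signal_number (SIG$signal_name)"
```

So code 137 suggests the container was ended by the SIGKILL that follows the timeout, not by the SIGTERM itself.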
My docker packaging uses sbt-native-packager and is very close to the chirper example, except my image is "openjdk:8-jdk-slim-stretch".
Any ideas on how to get the service to shut down gracefully, and presumably faster?
It should shut down on SIGTERM with the default settings, though it’s possible that it could take more than ten seconds. Sixty seems like a lot. Are you seeing anything in the service logs after it receives the signal? There should be info-level logging as the shutdown process proceeds.
There were some known issues in older versions of Lagom where shutdown could deadlock. What version are you using?
root@9b62ded44b19:/opt/docker# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
daemon 1 0.0 0.0 4292 468 ? Ss 19:57 0:00 /bin/sh -c bin/
daemon 14 12.6 2.1 9996128 343576 ? Sl 19:57 0:45 /docker-java-ho
root 2322 0.0 0.0 19900 3520 pts/0 Ss 20:02 0:00 bash
root 2596 0.0 0.0 38384 3108 pts/0 R+ 20:03 0:00 ps aux
and then root@9b62ded44b19:/opt/docker# kill -n 15 14
It shut down almost instantly, and the logs showed a nice orderly shutdown.
I think docker stop sends the signal to PID 1, which doesn’t work for these images. I’m not super familiar with Docker yet, so I’m not sure if this is an issue with sbt-native-packager, with how it’s configured in build.sbt, or something else. My Docker settings are closely ripped from the chirper Kubernetes example, and you can see them in the TagWriter issue I posted at "TagWriter fails if cassandra isn’t started before lagom service".
I’m not at the stage of using Kubernetes yet, but I don’t see why this would be any different for anyone using the chirper example as a guide to build Docker images for any orchestration platform. Do you know if others are getting their containers to shut down gracefully, or are they all being killed after the timeout?
When a process receives SIGTERM and hasn’t registered a handler for it, the kernel applies the default action, which is to terminate the process. When the process is PID 1, that default doesn’t apply: the kernel drops the signal unless PID 1 has installed a handler for it.
By default, docker stop sends a SIGTERM and, after 10 seconds, sends a SIGKILL.
A shell such as bash or sh doesn’t forward signals to its child processes.
PID 1 is also responsible for reaping its child processes.
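The points above are why it matters which process ends up as PID 1. In the ps output earlier in the thread, PID 1 is `/bin/sh -c bin/...` and the JVM is a separate child (PID 14), so the shell receives the SIGTERM and the JVM never sees it. One common remedy is to have the service replace the shell instead of running under it, for example with `exec` or with Docker's exec-form entrypoint. Whether that applies to this particular image is an assumption until the Dockerfile is known. The sketch below only shows the PID behaviour in a plain shell; the helper and variable names are made up for the example.

```shell
# Each inner command prints its own PID; \$\$ is expanded by the innermost sh.

# Without exec: the outer sh stays alive and runs the command as a child,
# so the two PIDs differ (the trailing "true" keeps the child from being the last command).
without_exec=$(sh -c 'echo $$; sh -c "echo \$\$"; true')

# With exec: the command replaces the outer sh and keeps its PID.
with_exec=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

# Small helper to pick a line out of the captured output.
nth_line() { printf '%s\n' "$1" | sed -n "$2p"; }

echo "without exec: shell PID $(nth_line "$without_exec" 1), command PID $(nth_line "$without_exec" 2)"
echo "with exec:    shell PID $(nth_line "$with_exec" 1), command PID $(nth_line "$with_exec" 2)"
```

In a container, the second form means the service itself would be PID 1 and would receive the SIGTERM from docker stop directly.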
I think the containers produced are using a setup (not sure which) causing the SIGTERM generated by docker stop to not reach the Lagom process inside the container. Could you share the Dockerfile?
We have also been working on some improvements to the Docker images (such as using a JRE instead of a JDK to shrink them from 800 MB to 140 MB) and, finally, we’ve reviewed the tooling around k8s and Mesos DC/OS deployment. For all these reasons I suggest you have a look at build.sbt and KUBERNETES.md in https://github.com/lagom/lagom-java-sbt-chirper-example and upgrade.