Is your pod not in a running state, or does it run for a brief time before dying? Kubernetes got you down? Do you want to make more money? Sure, we all do…er, wait, I lost the topic here. Ah pods, stupid glorious pods! You’ve gotten yourself scheduled on a node, but the damn thing just isn’t doing what you need. Where do you go from here?
A pod enters the Running state when one or more of its containers is executing without issue. That doesn’t always mean the pod is doing what you need. Let’s look at an example.
$ kubectl apply -f https://raw.githubusercontent.com/gleamingthekube/troubleshooting-kubernetes/main/multicontainer-pod.yaml
Here we create a pod with two containers, nginx and busybox. If we look at the status of the pod right after deployment we will see both containers are running.
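The manifest itself lives at the URL above, but to ground the walkthrough, here is a minimal sketch of an equivalent Deployment. The image tags and the busybox command match what the describe output reports further down; the metadata names and labels are assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multicontainer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: multicontainer
  template:
    metadata:
      labels:
        app: multicontainer
    spec:
      containers:
      - name: nginx
        image: nginx
      - name: busybox
        image: busybox:1.34.0
        # keep an eye on this command; it matters shortly
        command: ["sleep", "10"]
```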
$ kubectl get pod
NAME                              READY   STATUS    RESTARTS   AGE
multicontainer-7d966666f8-d5bfc   2/2     Running   0          5s
If we continue to monitor the status a bit longer, we will see it flip between Running and NotReady, eventually showing a status of CrashLoopBackOff.
$ kubectl get pod -w
NAME                              READY   STATUS             RESTARTS   AGE
multicontainer-7d966666f8-d5bfc   2/2     Running            3          80s
multicontainer-7d966666f8-d5bfc   1/2     NotReady           3          80s
multicontainer-7d966666f8-d5bfc   1/2     CrashLoopBackOff   3          95s
We know one container is running fine and one is not. If we take a look at the application logs, we will not see anything meaningful.
$ kubectl logs multicontainer-7d966666f8-d5bfc -c busybox
$ kubectl logs multicontainer-7d966666f8-d5bfc -c nginx
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2021/08/22 19:41:25 [notice] 1#1: using the "epoll" event method
So far we can conclude this is likely not an application problem. We can move on to checking either the events output or the describe output for the pod. In the describe output we can focus on the Containers section. There we see it is our busybox container that is exiting.
$ kubectl describe pod multicontainer-7d966666f8-d5bfc
...
Containers:
  busybox:
    Container ID:   docker://42ba0efea197d513d684e2c8e3d73447c4a8d9789a1ddad1d68b958c47924068
    Image:          busybox:1.34.0
    Command:
      sleep
      10
    State:          Waiting
      Reason:      CrashLoopBackOff
    Last State:     Terminated
      Reason:      Completed
      Exit Code:   0
      Started:     Sun, 22 Aug 2021 15:45:17 -0400
      Finished:    Sun, 22 Aug 2021 15:45:27 -0400
    Ready:          False
    Restart Count:  5
  nginx:
    Container ID:   docker://da34126eeb974345ada8b988daacbf9e37e2d890ad8dd63cc48c087eca020fc9
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:4d4d96ac750af48c6a551d757c1cbfc071692309b491b70b2b8976e102dd3fef
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:     Sun, 22 Aug 2021 15:41:24 -0400
    Ready:          True
    Restart Count:  0
Notice, however, that the exit code is 0 and the reason is Completed. This means the container ran to a successful completion: the main process finished whatever work it was given, as noted by the 0 return code. If you dig a bit further into the busybox container, you will find that we are passing it a command to sleep for 10 seconds. Once that command completes, by the nature of containers, the container exits. Under the default Kubernetes restart policy (Always), the kubelet keeps starting the container back up, which is why we see it running again for a short period before eventually landing in CrashLoopBackOff.
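If the intent is for the busybox container to stay up alongside nginx, the usual fix is to give it a long-running command instead of one that completes. A sketch (not the repo’s actual manifest) of what that container spec might look like:

```yaml
      containers:
      - name: busybox
        image: busybox:1.34.0
        # an infinite loop keeps the main process alive,
        # so the container never exits and never restarts
        command: ["sh", "-c", "while true; do sleep 3600; done"]
```

Alternatively, if the container is genuinely meant to run once and finish, a bare Pod with `restartPolicy: OnFailure` (or a Job) is a better fit than a Deployment, since an exit code of 0 then won’t trigger a restart.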
Let’s troubleshoot a bit more, first clean up the previous deployment:
$ kubectl delete -f https://raw.githubusercontent.com/gleamingthekube/troubleshooting-kubernetes/main/multicontainer-pod.yaml
Go ahead and create a new deployment to troubleshoot.
$ kubectl apply -f https://raw.githubusercontent.com/gleamingthekube/troubleshooting-kubernetes/main/multicontainer-pod-bad-cmd.yaml
If we follow the pod output here, we will see the following.
$ kubectl get pod -w
NAME                              READY   STATUS              RESTARTS   AGE
multicontainer-54456d7cb4-gbhlc   1/2     RunContainerError   1          16s
multicontainer-54456d7cb4-gbhlc   1/2     CrashLoopBackOff    1          21s
Since the pod never reached a running state, we can assume something is wrong with one of the containers, so once again application logs won’t be relevant here. Take a look at the pod events to see if we can find anything wrong.
$ kubectl describe pod multicontainer-54456d7cb4-gbhlc | grep Events: -A20
Events:
  Type     Reason     Age                  From                     Message
  ----     ------     ----                 ----                     -------
  Normal   Scheduled  <unknown>                                     Successfully assigned default/multicontainer-54456d7cb4-gbhlc to docker-desktop
  Normal   Pulling    117s                 kubelet, docker-desktop  Pulling image "nginx"
  Normal   Started    116s                 kubelet, docker-desktop  Started container nginx
  Normal   Pulled     116s                 kubelet, docker-desktop  Successfully pulled image "nginx" in 847.3296ms
  Normal   Created    116s                 kubelet, docker-desktop  Created container nginx
  Warning  BackOff    44s (x4 over 113s)   kubelet, docker-desktop  Back-off restarting failed container
  Normal   Pulled     24s (x5 over 117s)   kubelet, docker-desktop  Container image "busybox:1.34.0" already present on machine
  Normal   Created    23s (x5 over 117s)   kubelet, docker-desktop  Created container busybox
  Warning  Failed     12s (x5 over 117s)   kubelet, docker-desktop  Error: failed to start container "busybox": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "fake": executable file not found in $PATH: unknown
Note the Failed warning for the busybox container. Here we see that the command we are attempting to run does not exist in the container ("executable file not found in $PATH"). We will need to either update the pod spec to use a command that exists in the image, or rebuild the image so that the referenced executable is included.
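Assuming the broken manifest simply points busybox at a nonexistent executable (the error names "fake"), the spec-side fix is to reference a binary the image actually ships. A hypothetical correction, not the repo’s file:

```yaml
      containers:
      - name: busybox
        image: busybox:1.34.0
        # "fake" is not on the image's $PATH; use a binary that exists
        command: ["sleep", "3600"]
```

When you’re not sure what an image contains, you can inspect it directly, for example with `docker run --rm busybox:1.34.0 ls /bin`, before wiring the command into a pod spec.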