Troubleshooting Running & Terminating Pods

Pod not in a running state, or running for only a brief time before dying? Kubernetes got you down? Do you want to make more money? Sure, we all do… er, wait, I lost the topic here. Ah, pods, stupid glorious pods! You’ve gotten yourself scheduled on a node, but the damn thing just isn’t doing what you need. Where do you go from here?

A pod enters the Running state once it has been bound to a node, all of its containers have been created, and at least one container is running (or is starting or restarting). This doesn’t always mean the pod is doing what you need. Let’s look at an example.

$ kubectl apply -f https://raw.githubusercontent.com/gleamingthekube/troubleshooting-kubernetes/main/multicontainer-pod.yaml

Here we create a pod with two containers, nginx and busybox. If we look at the status of the pod right after deployment we will see both containers are running.

$ kubectl get pod
NAME                              READY   STATUS    RESTARTS   AGE
multicontainer-7d966666f8-d5bfc   2/2     Running   0          5s

If we continue to monitor the status a bit longer, we will see it flip between Running and NotReady, eventually showing a status of CrashLoopBackOff.

$ kubectl get pod -w
NAME                              READY   STATUS             RESTARTS   AGE
multicontainer-7d966666f8-d5bfc   2/2     Running            3          80s
multicontainer-7d966666f8-d5bfc   1/2     NotReady           3          80s
multicontainer-7d966666f8-d5bfc   1/2     CrashLoopBackOff   3          95s

We know one container is running fine and one is not. If we take a look at the application logs, we will not see anything meaningful.

$ kubectl logs multicontainer-7d966666f8-d5bfc -c busybox
$ kubectl logs multicontainer-7d966666f8-d5bfc -c nginx
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2021/08/22 19:41:25 [notice] 1#1: using the "epoll" event method
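
When a container keeps restarting, kubectl logs only shows the current instance, so it can help to ask for the logs of the previous instance as well. The sketch below wraps that call in a small shell function. The function name previous_logs is made up for illustration, while --previous is a standard kubectl logs flag. In this example it will most likely still print nothing, since sleep writes no output, but for a container that dies with an error this is usually where the message ends up.

```shell
# Show logs from the previous (terminated) instance of a container.
# $1 = pod name, $2 = container name.
# "previous_logs" is a hypothetical helper name, not part of kubectl.
previous_logs() {
  kubectl logs "$1" -c "$2" --previous
}

# Usage, with the pod name from the example above:
#   previous_logs multicontainer-7d966666f8-d5bfc busybox
```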

So far we can conclude this is not likely an application problem. We can move on to checking either the events or the describe output of the pod. In the describe output, focus on the Containers section. Here we see it is our busybox container that is exiting.

$ kubectl describe pod multicontainer-7d966666f8-d5bfc
...
Containers:
  busybox:
    Container ID:  docker://42ba0efea197d513d684e2c8e3d73447c4a8d9789a1ddad1d68b958c47924068
    Image:         busybox:1.34.0
    Command:
      sleep
      10
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 22 Aug 2021 15:45:17 -0400
      Finished:     Sun, 22 Aug 2021 15:45:27 -0400
    Ready:          False
    Restart Count:  5
  nginx:
    Container ID:   docker://da34126eeb974345ada8b988daacbf9e37e2d890ad8dd63cc48c087eca020fc9
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:4d4d96ac750af48c6a551d757c1cbfc071692309b491b70b2b8976e102dd3fef
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Sun, 22 Aug 2021 15:41:24 -0400
    Ready:          True
    Restart Count:  0

Notice, however, that the exit code is 0 and the reason is Completed. This means the container’s main process ran to successful completion, as indicated by the 0 exit code. We know the process completed whatever work it was given. If you dig a bit further into the busybox container, you will find that we are passing it a command to sleep for 10 seconds. Once that command completes, the container exits, because a container lives only as long as its main process. Under the default Kubernetes restart policy (Always), the kubelet keeps restarting the container, which is why we see it running again for a short period of time. Each restart is delayed by a growing back-off, and the pod reports CrashLoopBackOff while it waits.
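
If you would rather not scroll through the full describe output, the same fields can be pulled out with a jsonpath query. This is a minimal sketch: the function name last_state is an invention for this example, and the query assumes the pod status has the usual containerStatuses layout. Containers that have never terminated (like nginx here) simply show empty reason and exit code columns.

```shell
# Print one line per container: name, restart count, and the reason and
# exit code of its last termination (tab separated).
# $1 = pod name. "last_state" is a hypothetical helper name.
last_state() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# Usage, with the pod name from the example above:
#   last_state multicontainer-7d966666f8-d5bfc
```

A reason of Completed with exit code 0 alongside a climbing restart count is a strong hint that the container is finishing its work rather than crashing.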

Let’s troubleshoot a bit more. First, clean up the previous deployment:

$ kubectl delete -f https://raw.githubusercontent.com/gleamingthekube/troubleshooting-kubernetes/main/multicontainer-pod.yaml

Go ahead and create a new deployment to troubleshoot.

$ kubectl apply -f https://raw.githubusercontent.com/gleamingthekube/troubleshooting-kubernetes/main/multicontainer-pod-bad-cmd.yaml

If we watch the pod status here, we will see the following.

$ kubectl get pod -w
NAME                              READY   STATUS              RESTARTS   AGE
multicontainer-54456d7cb4-gbhlc   1/2     RunContainerError   1          16s
multicontainer-54456d7cb4-gbhlc   1/2     CrashLoopBackOff    1          21s

Since the pod never became fully ready (it stays at 1/2), we can assume something is wrong with one of the containers. The RunContainerError status tells us a container failed to start at all, so again, application logs won’t be relevant here. Take a look at the pod events to see if we can find anything wrong.

$ kubectl describe pod multicontainer-54456d7cb4-gbhlc | grep Events: -A20
Events:
  Type     Reason     Age                 From                     Message
  ----     ------     ----                ----                     -------
  Normal   Scheduled  <unknown>                                    Successfully assigned default/multicontainer-54456d7cb4-gbhlc to docker-desktop
  Normal   Pulling    117s                kubelet, docker-desktop  Pulling image "nginx"
  Normal   Started    116s                kubelet, docker-desktop  Started container nginx
  Normal   Pulled     116s                kubelet, docker-desktop  Successfully pulled image "nginx" in 847.3296ms
  Normal   Created    116s                kubelet, docker-desktop  Created container nginx
  Warning  BackOff    44s (x4 over 113s)  kubelet, docker-desktop  Back-off restarting failed container
  Normal   Pulled     24s (x5 over 117s)  kubelet, docker-desktop  Container image "busybox:1.34.0" already present on machine
  Normal   Created    23s (x5 over 117s)  kubelet, docker-desktop  Created container busybox
  Warning  Failed     12s (x5 over 117s)  kubelet, docker-desktop  Error: failed to start container "busybox": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "fake": executable file not found in $PATH: unknown

Note the warning about the busybox container. Here we see that the command we are attempting to run in the container, fake, was not found in the container’s $PATH. We will need to either update the deployment to use the correct command or rebuild our image to make sure the referenced command is included.
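
Two quick checks can shorten this kind of hunt: listing only the Warning events for the pod, and printing the command each container is configured to run. The sketch below shows one way to do both. The function names warning_events and container_commands are made up for this example. The field selectors involvedObject.name and type are standard for events, but the exact formatting of the command column may differ between kubectl versions.

```shell
# List only Warning events that refer to the given object name.
# $1 = pod name. "warning_events" is a hypothetical helper name.
warning_events() {
  kubectl get events --field-selector "involvedObject.name=$1,type=Warning"
}

# Print each container's name and its configured command (tab separated).
# $1 = pod name. "container_commands" is a hypothetical helper name.
container_commands() {
  kubectl get pod "$1" -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.command}{"\n"}{end}'
}

# Usage, with the pod name from the example above:
#   warning_events multicontainer-54456d7cb4-gbhlc
#   container_commands multicontainer-54456d7cb4-gbhlc
```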