The nsenter command executes program in the namespace(s) that are specified in the command-line options
– Linux man pages
At some point while running Kubernetes you will undoubtedly run into a network problem, and there will be times when you want to see traffic from the pod's perspective using something like tcpdump. Options do exist for this, e.g., sidecar or ephemeral containers, but both have drawbacks: the former requires injecting a second container into your pod and restarting it, while the latter is only an alpha feature as of Kubernetes 1.22 and requires the EphemeralContainers feature gate to be enabled in the cluster.
Enter…nsenter. This handy utility lets you enter the network namespace of a running pod and execute any program installed on the host OS within that namespace. This means we now have a way to inject tcpdump into the namespace on demand to capture traffic, no restart required. The one requirement is that nsenter runs on the host, so you need shell access to the node the pod is scheduled on.
For our example we will use an ‘echo’ container, a simple application that echoes back request parameters to the user. Its manifest is hosted on GitHub and is applied directly in the first step below.
1. Create the new deployment.
$ kubectl apply -f https://raw.githubusercontent.com/gleamingthekube/kubernetes-echo/main/echo.yaml
service/echo created
deployment.apps/echo created
2. We will scale this down to 1 replica for ease of tracking. You can either modify the manifest or run the command below.
$ kubectl scale deployment echo --replicas 1
deployment.apps/echo scaled
3. Check that our pod is running.
$ kubectl get pod
echo-5fc5b5bc84-cp8rb 1/1 Running 0 44s
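Since the nsenter steps that follow must run on the node hosting the pod, it can help to look that node up first on a multi-node cluster. The minimal sketch below reads spec.nodeName from the first pod matching a label selector; pod_node is a hypothetical helper name, and the app=echo label is the one used by the echo deployment.

```shell
#!/usr/bin/env bash
# pod_node (hypothetical helper): print the node that the first pod
# matching a label selector is scheduled on. nsenter has to be run on
# that node, because the namespace belongs to a process on that host.
pod_node() {
  local selector="$1"   # e.g. app=echo, the label on the echo pods
  kubectl get pod -l "$selector" \
    -o jsonpath='{.items[0].spec.nodeName}'
}

# Usage (requires access to the cluster):
#   pod_node app=echo
```

`kubectl get pod -l app=echo -o wide` shows the same information in its NODE column if you prefer a table.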
4. To enter the network namespace for this container we need the PID of the container's process on the host, which we can pull from the Docker container with the docker inspect command. To find the Docker container backing the pod we have a couple of options: the docker ‘container ls’ command, or the container ID from the kubectl pod output.
a. docker container ls
$ docker container ls | grep -i k8s_echo
34b04fb4c5dc 4081d9a83108 "/usr/local/bin/run.…" 24 seconds ago Up 58 seconds k8s_echo_echo-5fc5b5bc84-cp8rb_default_c8f7b6a1-3a54-4201-ab0d-5d47058b380b_0
b. kubectl
$ kubectl get pod -l app=echo -o jsonpath='{.items[0].status.containerStatuses[0].containerID}' | awk -F"[://]+" '/docker:\/\//{print $2}'
34b04fb4c5dc.....
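The containerID field in the pod status has the form "<runtime>://<id>", for example "docker://34b04fb4c5dc...". As an alternative to the awk call, the minimal sketch below splits it with shell parameter expansion; runtime_of and strip_runtime_prefix are hypothetical helper names. The same split also works for other prefixes, such as containerd:// on clusters that use containerd.

```shell
#!/usr/bin/env bash
# runtime_of (hypothetical helper): print the part before "://",
# e.g. "docker" for "docker://34b04fb4c5dc".
runtime_of() {
  # ${1%%://*} deletes the longest suffix matching "://*"
  printf '%s\n' "${1%%://*}"
}

# strip_runtime_prefix (hypothetical helper): print the part after "://",
# i.e. the bare container ID that docker inspect expects.
strip_runtime_prefix() {
  # ${1#*://} deletes the shortest prefix matching "*://"
  printf '%s\n' "${1#*://}"
}

# Usage (requires access to the cluster):
#   raw=$(kubectl get pod -l app=echo \
#     -o jsonpath='{.items[0].status.containerStatuses[0].containerID}')
#   strip_runtime_prefix "$raw"
```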
5. Now that we have the container ID, we can use docker inspect to get the PID of the container's main process.
$ docker inspect --format '{{json .State.Pid}}' 34b04fb4c5dc
26835
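Steps 4 and 5 can be folded into a single helper. The minimal sketch below assumes the Docker runtime used throughout this walkthrough and takes the raw containerID string from the pod status; container_pid is a hypothetical name, and any non-Docker prefix is simply rejected rather than guessed at.

```shell
#!/usr/bin/env bash
# container_pid (hypothetical helper): print the host PID of a container's
# main process, given the containerID from the pod status ("docker://<id>").
# Assumes the Docker runtime; other runtimes are reported as unsupported.
container_pid() {
  local full_id="$1"
  case "$full_id" in
    docker://*)
      # .State.Pid is the container's main process as seen from the host
      docker inspect --format '{{.State.Pid}}' "${full_id#docker://}"
      ;;
    *)
      echo "container_pid: unsupported runtime in '$full_id'" >&2
      return 1
      ;;
  esac
}

# Usage (run on the node hosting the pod):
#   raw=$(kubectl get pod -l app=echo \
#     -o jsonpath='{.items[0].status.containerStatuses[0].containerID}')
#   pid=$(container_pid "$raw")
```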
6. Before entering the network namespace, take a quick look at the current interfaces on the host. We will compare against these in the next steps to confirm that we are indeed inside the namespace.
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether f2:5f:04:0b:e4:96 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether e8:5e:01:1b:e3:87 brd ff:ff:ff:ff:ff:ff
inet 192.168.10.98/24 brd 192.168.86.255 scope global noprefixroute wlan0
valid_lft forever preferred_lft forever
inet6 fe80::d417:6a42:d46e:c8e5/64 scope link noprefixroute
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:6b:bb:92:ee brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
<rest_of_output_omitted>
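7. We can now use nsenter to get into the network namespace. The -t switch passes in our target PID and the -n switch signifies that we want to enter that process's network namespace. Because no program is given on the command line, nsenter starts a shell inside the namespace (the man page notes it runs ${SHELL}, defaulting to /bin/sh).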
$ sudo nsenter -t 26835 -n
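nsenter also accepts a program after its options, in which case it runs that program in the target namespace and returns instead of opening an interactive shell. The minimal sketch below wraps that form; run_in_netns is a hypothetical helper name, and the -c 100 packet limit in the usage line is an illustrative value, not something the walkthrough requires.

```shell
#!/usr/bin/env bash
# run_in_netns (hypothetical helper): run a command inside the network
# namespace of PID $1, then return to the calling shell.
run_in_netns() {
  local pid="$1"
  shift
  # -t: target PID, -n: enter its network namespace; the remaining
  # arguments are the program (and its arguments) to run there
  sudo nsenter -t "$pid" -n "$@"
}

# Usage, with the PID from step 5 (26835 in this walkthrough):
#   run_in_netns 26835 ip addr
#   run_in_netns 26835 tcpdump -s 0 -i any -c 100 -w /home/net.pcap
```

This form is handy for scripting, since the capture stops on its own after the given packet count instead of waiting for Ctrl+C.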
8. Run ip addr again and you will notice a different set of IP addresses and far fewer interfaces. This confirms that you are now scoped within the network namespace of the container.
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
3: eth0@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether 7a:f6:f0:41:03:b4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 192.168.115.17/32 brd 192.168.149.111 scope global eth0
valid_lft forever preferred_lft forever
4: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
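9. Assuming tcpdump is installed on the host system, we can now start a capture. Only the network namespace has changed, so tcpdump sees just the pod's interfaces (-i any captures on all of them), while the -w output file is still written to the host filesystem; -s 0 captures full packets.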
$ tcpdump -s 0 -i any -w /home/net.pcap
10. You can now send requests or reproduce whatever issue you are experiencing while the capture runs. Once complete, press Ctrl+C to stop the capture.
11. Exit the namespace by typing exit and you will return to your former shell. You can now open the pcap to view and inspect the traffic.
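If you want a quick look at the capture from the terminal before opening it in a GUI tool, tcpdump can read the file back. The minimal sketch below wraps that; read_capture is a hypothetical helper name, it defaults to the /home/net.pcap path from step 9, and the -A flag is most useful if the echo traffic is plain, unencrypted HTTP.

```shell
#!/usr/bin/env bash
# read_capture (hypothetical helper): print a saved capture with tcpdump.
#   -nn  don't convert addresses or port numbers to names
#   -A   print each packet's payload as ASCII
#   -r   read packets from a file instead of a live interface
read_capture() {
  local file="${1:-/home/net.pcap}"   # default: the file written in step 9
  tcpdump -nn -A -r "$file"
}

# Usage:
#   read_capture /home/net.pcap | less
```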