How does kubectl port-forward work?

Dumlu Timuralp
6 min read · Nov 5, 2020

Thanks to @Yasen Simeonov and @Yuki Tsuboi for their great help in figuring this out.

Assumption: I had always thought that when you use “kubectl port-forward” to test an application pod, the traffic gets tunneled from the client to the kube-apiserver, and the Kubernetes master/control plane node then simply reaches the pod through the actual networking data plane on the respective worker node and onward to the eth0 of the application pod. Well, I was totally wrong.

What I was interested in is the data plane: how the packets actually get delivered.

The test environment is simple: three K8s nodes, one master/control plane node and two worker nodes.

Port forwarding is started on the client for the frontend pod running on worker1, with the command shown below.
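This maps local port 11111 on the client to port 80 of the frontend pod:

kubectl port-forward pod/frontend 11111:80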

The client’s IP is 172.16.100.2. kubeadm is used to bootstrap this cluster, which means kube-apiserver is actually a pod running in the host network of the master/control plane node on port 6443; that node’s IP is 10.79.1.200.
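One quick way to confirm this (a sketch; the exact pod name carries the control plane node’s hostname as a suffix) is to list the kube-system pods and check that kube-apiserver reports the node’s own IP:

kubectl get pod -n kube-system -o wide | grep kube-apiserver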

My assumption was:

Step 1: client IP -> master/control plane node IP (port 6443)

Step 2: master/control plane node IP -> Pod IP (port 80)

Considering the assumption above, a tcpdump capture was taken on the frontend pod’s eth0 interface while running “curl localhost:11111” on the client.
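To take such a capture from the worker1 node itself, one option (a sketch; <PID> stands for the PID of the frontend pod’s pause container, which is identified later in this post) is to enter the pod’s network namespace and listen on its eth0:

# On worker1: capture traffic on the frontend pod's eth0
nsenter -t <PID> -n tcpdump -i eth0 -nn port 80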

As shown above, there are no packets on the frontend pod’s interface, yet the frontend pod responds to the curl requests successfully (also shown above). How come?

As explained in this post, kubectl port-forward apparently relies on “socat” within the worker node. What actually happens is that the master/control plane node uses the kubelet communication channel to send the request to the worker node, and within the worker node the last hop is essentially localhost communication between the node and the frontend pod.
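In effect, for every forwarded connection, kubelet spawns something like the command below inside the pod’s network namespace (a sketch; the real invocation, including the target PID, shows up verbatim in the kubelet logs further down):

# Relay stdin/stdout to port 80 on localhost *inside the pod's network namespace*
nsenter -t <pause-container-pid> -n socat - TCP4:localhost:80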

To investigate this further, the logging level of the kubelet process on the worker1 node is increased. This can be done by editing “/var/lib/kubelet/kubeadm-flags.env” on the node and appending “--v=8” at the end. The kubelet process is then restarted with “sudo systemctl restart kubelet”.
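The steps on worker1 look roughly like this (a sketch; the --v=8 flag needs to land inside the quoted kubelet arguments string in that file):

# On worker1: raise kubelet verbosity, restart it and follow its logs
sudo vi /var/lib/kubelet/kubeadm-flags.env   # append --v=8 to the kubelet flags
sudo systemctl restart kubelet
journalctl -u kubelet -f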

As soon as “kubectl port-forward pod/frontend 11111:80” is run on the client, and we turn over to the worker node, the kubelet logs start giving clues.

root@worker1:/var/lib/kubelet# journalctl -u kubelet -f
-- Logs begin at Tue 2020-07-07 09:58:13 UTC. --
Nov 05 16:00:17 worker1 kubelet[14362]: I1105 16:00:17.354581 14362 auth.go:112] Node request attributes: user=&user.DefaultInfo{Name:"kube-apiserver-kubelet-client", UID:"", Groups:[]string{"system:masters", "system:authenticated"}, Extra:map[string][]string(nil)} attrs=authorizer.AttributesRecord{User:(*user.DefaultInfo)(0xc00082aa80), Verb:"create", Namespace:"", APIGroup:"", APIVersion:"v1", Resource:"nodes", Subresource:"proxy", Name:"worker1", ResourceRequest:true, Path:"/portForward/default/frontend"}
Nov 05 16:00:17 worker1 kubelet[14362]: I1105 16:00:17.386516 14362 upgradeaware.go:278] Connecting to backend proxy (direct dial) http://127.0.0.1:45367/portforward/IWMyVeDD
Nov 05 16:00:17 worker1 kubelet[14362]: Headers: map[Connection:[Upgrade] Content-Length:[0] Upgrade:[SPDY/3.1] User-Agent:[kubectl/v1.19.0 (darwin/amd64) kubernetes/e199641] X-Forwarded-For:[172.16.100.2, 10.79.1.200] X-Stream-Protocol-Version:[portforward.k8s.io]]
Nov 05 16:00:17 worker1 kubelet[14362]: I1105 16:00:17.387490 14362 httpstream.go:45] Upgrading port forward response
Nov 05 16:00:17 worker1 kubelet[14362]: I1105 16:00:17.387647 14362 httpstream.go:53] (conn=0xc0016d5ef0) setting port forwarding streaming connection idle timeout to 4h0m0s
Nov 05 16:00:17 worker1 kubelet[14362]: I1105 16:00:17.387677 14362 httpstream.go:209] (conn=0xc0016d5ef0) waiting for port forward streams

Notice the X-Forwarded-For header above: 172.16.100.2 is the client IP (where I run the kubectl commands) and 10.79.1.200 is the master/control plane node IP.

Then, as soon as “curl localhost:11111” is run on the client, the kubelet logs reveal the steps taking place on the worker1 node.

Nov 05 16:05:50 worker1 kubelet[14362]: I1105 16:05:50.807160   14362 httpstream.go:219] (conn=0xc0016d5ef0, request=0) received new stream of type error
Nov 05 16:05:50 worker1 kubelet[14362]: I1105 16:05:50.807707   14362 httpstream.go:128] (conn=0xc0016d5ef0, request=0) creating new stream pair
Nov 05 16:05:50 worker1 kubelet[14362]: I1105 16:05:50.895526   14362 httpstream.go:219] (conn=0xc0016d5ef0, request=0) received new stream of type data
Nov 05 16:05:50 worker1 kubelet[14362]: I1105 16:05:50.895607   14362 httpstream.go:124] (conn=0xc0016d5ef0, request=0) found existing stream pair
Nov 05 16:05:50 worker1 kubelet[14362]: I1105 16:05:50.895651   14362 httpstream.go:245] (conn=0xc0016d5ef0, request=0) invoking forwarder.PortForward for port 80
Nov 05 16:05:50 worker1 kubelet[14362]: I1105 16:05:50.896533   14362 httpstream.go:146] (conn=&{0xc00111f6b0 [0xc000396140 0xc0006c3860] {0 0} 0x1c10870}, request=0) successfully received error and data streams
Nov 05 16:05:50 worker1 kubelet[14362]: I1105 16:05:50.897375   14362 docker_streaming_others.go:55] executing port forwarding command: /usr/bin/nsenter -t 5941 -n /usr/bin/socat - TCP4:localhost:80
Nov 05 16:05:51 worker1 kubelet[14362]: I1105 16:05:51.071438   14362 httpstream.go:247] (conn=0xc0016d5ef0, request=0) done invoking forwarder.PortForward for port 80

Here “5941” is actually the PID of the frontend pod’s pause container; nsenter targets that PID with “-n” to enter the pod’s network namespace before running socat.
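One way to cross-check that PID (a sketch, assuming the Docker runtime these logs come from and the usual k8s_POD_<podname>_... naming of pause containers) is:

# On worker1: find the frontend pod's pause container and its PID on the host
PAUSE_ID=$(docker ps --filter "name=k8s_POD_frontend" --format '{{.ID}}' | head -n 1)
docker inspect --format '{{.State.Pid}}' "$PAUSE_ID"    # prints 5941 in this environment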

Taking a tcpdump capture on the ens160 interface of the worker1 node, filtered on kubelet port 10250, while continuously running “curl localhost:11111” on the client in a simple loop (i.e. while true; do curl localhost:11111 & sleep 0.01; done), demonstrates that the kubelet channel becomes very chatty because of the curl requests coming into the worker1 node. Shown below.

root@worker1:/var# tcpdump -i ens160 port 10250
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens160, link-type EN10MB (Ethernet), capture size 262144 bytes
16:28:51.989411 IP master.56978 > worker1.10250: Flags [P.], seq 3896068586:3896068610, ack 350940681, win 501, options [nop,nop,TS val 3526512926 ecr 1519619911], length 24
16:28:51.989470 IP master.56978 > worker1.10250: Flags [P.], seq 24:48, ack 1, win 501, options [nop,nop,TS val 3526512926 ecr 1519619911], length 24
16:28:51.989601 IP master.56978 > worker1.10250: Flags [P.], seq 48:74, ack 1, win 501, options [nop,nop,TS val 3526512926 ecr 1519619911], length 26
16:28:51.990014 IP worker1.10250 > master.56978: Flags [.], ack 74, win 501, options [nop,nop,TS val 1519647179 ecr 3526512926], length 0
16:28:51.990022 IP master.56978 > worker1.10250: Flags [P.], seq 74:100, ack 1, win 501, options [nop,nop,TS val 3526512926 ecr 1519619911], length 26
16:28:51.990045 IP master.56978 > worker1.10250: Flags [P.], seq 100:126, ack 1, win 501, options [nop,nop,TS val 3526512926 ecr 1519619911], length 26
16:28:51.990098 IP master.56978 > worker1.10250: Flags [P.], seq 126:149, ack 1, win 501, options [nop,nop,TS val 3526512926 ecr 1519619911], length 23
16:28:51.990335 IP worker1.10250 > master.56978: Flags [.], ack 149, win 501, options [nop,nop,TS val 1519647179 ecr 3526512926], length 0
16:28:51.990369 IP master.56978 > worker1.10250: Flags [P.], seq 149:172, ack 1, win 501, options [nop,nop,TS val 3526512927 ecr 1519647179], length 23
16:28:51.990417 IP master.56978 > worker1.10250: Flags [P.], seq 172:204, ack 1, win 501, options [nop,nop,TS val 3526512927 ecr 1519647179], length 32

The way the HTTP request gets to the pod can be verified by checking the netstat output within the frontend pod’s network namespace, as shown below.
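One way to do that directly from the worker1 node (a sketch, reusing the pause container PID seen in the kubelet logs) is:

# On worker1: run netstat inside the frontend pod's network namespace
nsenter -t 5941 -n netstat -ant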

The connections whose “State” is “TIME_WAIT” are the continuous HTTP requests that were sent by “curl localhost:11111” on the client side.

Note: Alternatively, you can just run “kubectl exec -it frontend -- sh” and then use “netstat” in that shell to see the same output shown above.

To summarize, the actual data plane pattern is as below:

Step 1: client IP -> master/control plane node IP (port 6443)

Step 2: master/control plane node IP -> worker1 node IP (port 10250) - the HTTP request is carried inside this kubelet API traffic

Step 3: worker1 node -> pod (port 80) - this happens as localhost communication within the worker1 node
