
[CKA] KodeKloud - Worker Node Failure

by 쯀리♥️ 2024. 8. 23.

Hello, this is 쯀리.

Following the earlier posts on Application Failure and ControlPlane Failure, today we'll cover Worker Node Failure.

Reference: Troubleshooting Applications (kubernetes.io)
https://kubernetes.io/docs/tasks/debug/debug-application/

Previous posts:
[CKA] KodeKloud - Application Failure: https://funlife-julie.tistory.com/80
[CKA] KodeKloud - ControlPlane Failure: https://funlife-julie.tistory.com/81


Quiz.

1. Fix the broken cluster (node01)
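When a worker node breaks, a sensible first pass (a sketch, assuming kubectl access on the controlplane and SSH access to the node) is to check node status from the controlplane, read the node's conditions, then SSH in and inspect the services on the node itself:

controlplane ~ ➜  kubectl get nodes              # node01 would show NotReady
controlplane ~ ➜  kubectl describe node node01   # check Conditions and recent events
controlplane ~ ➜  ssh node01                     # then inspect the node directly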

 

node01 ~ ➜  systemctl status containerd
● containerd.service - containerd container runtime
     Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor >
     Active: active (running) since Thu 2024-08-22 15:38:21 UTC; 15min ago
       Docs: https://containerd.io
   Main PID: 1183 (containerd)
      Tasks: 99
     Memory: 132.9M
     CGroup: /system.slice/containerd.service
             ├─1183 /usr/bin/containerd
             ├─2713 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 26>
             ├─2714 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 53>
             ├─kubepods-besteffort-pod6836abc1_6b74_4f0c_a21e_eb71f551449d.sl>
             │ └─2763 /pause
             ├─kubepods-besteffort-pod6836abc1_6b74_4f0c_a21e_eb71f551449d.sl>
             │ └─2965 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/>
             ├─kubepods-burstable-pod8fc07dc8_a6ce_486a_94ea_91973b22a39c.sli>
             │ └─2765 /pause
             └─kubepods-burstable-pod8fc07dc8_a6ce_486a_94ea_91973b22a39c.sli>
               └─3387 /opt/bin/flanneld --ip-masq --kube-subnet-mgr --iface=e>

Aug 22 15:39:29 node01 containerd[1183]: time="2024-08-22T15:39:29.920803087Z>
Aug 22 15:39:30 node01 containerd[1183]: time="2024-08-22T15:39:30.182698528Z>
Aug 22 15:39:30 node01 containerd[1183]: time="2024-08-22T15:39:30.202544490Z>
Aug 22 15:39:30 node01 containerd[1183]: time="2024-08-22T15:39:30.202610264Z>
Aug 22 15:39:30 node01 containerd[1183]: time="2024-08-22T15:39:30.202620287Z>
Aug 22 15:39:30 node01 containerd[1183]: time="2024-08-22T15:39:30.271256324Z>
Aug 22 15:39:30 node01 containerd[1183]: time="2024-08-22T15:39:30.888125268Z>
Aug 22 15:39:30 node01 containerd[1183]: time="2024-08-22T15:39:30.922267880Z>

node01 ~ ➜  systemctl status kubelet
○ kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor pre>
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: inactive (dead) since Thu 2024-08-22 15:48:02 UTC; 5min ago
       Docs: https://kubernetes.io/docs/
    Process: 2562 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELE>
   Main PID: 2562 (code=exited, status=0/SUCCESS)

Aug 22 15:39:27 node01 kubelet[2562]: I0822 15:39:27.781386    2562 reconcile>
Aug 22 15:39:27 node01 kubelet[2562]: I0822 15:39:27.781401    2562 reconcile>
Aug 22 15:39:27 node01 kubelet[2562]: I0822 15:39:27.781417    2562 reconcile>
Aug 22 15:39:27 node01 kubelet[2562]: I0822 15:39:27.781429    2562 reconcile>
Aug 22 15:39:28 node01 kubelet[2562]: I0822 15:39:28.805667    2562 kuberunti>
Aug 22 15:39:28 node01 kubelet[2562]: I0822 15:39:28.884666    2562 kuberunti>
Aug 22 15:39:29 node01 kubelet[2562]: I0822 15:39:29.884279    2562 kuberunti>
Aug 22 15:39:29 node01 kubelet[2562]: I0822 15:39:29.903389    2562 pod_start>
Aug 22 15:39:30 node01 kubelet[2562]: I0822 15:39:30.203314    2562 kubelet_n>
Aug 22 15:39:30 node01 kubelet[2562]: I0822 15:39:30.887778    2562 kuberunti>

The kubelet is dead. Let's start it back up.

node01 ~ ✖ systemctl start kubelet

node01 ~ ➜  systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor pre>
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Thu 2024-08-22 15:54:13 UTC; 2s ago
       Docs: https://kubernetes.io/docs/
   Main PID: 8277 (kubelet)
      Tasks: 26 (limit: 251379)
     Memory: 50.3M
     CGroup: /system.slice/kubelet.service
             └─8277 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/b>

 

Done. Checking from the controlplane:

controlplane ~ ➜  k get node
NAME           STATUS   ROLES           AGE   VERSION
controlplane   Ready    control-plane   15m   v1.30.0
node01         Ready    <none>          15m   v1.30.0
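If you want more assurance than the Ready column, the node's Conditions block gives detail (a quick extra check, not part of the original lab output):

controlplane ~ ➜  kubectl describe node node01 | grep -A 6 Conditions
# MemoryPressure / DiskPressure / PIDPressure should be False, Ready should be True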

 

 

2. The cluster is broken again. Investigate and fix the issue.

node01 ~ ✖ journalctl -u kubelet -f

Aug 22 15:57:32 node01 kubelet[10069]: I0822 15:57:32.040698   10069 server.go:205] "--pod-infra-container-image will not be pruned by the image garbage collector in kubelet and should also be set in the remote runtime"
Aug 22 15:57:32 node01 kubelet[10069]: E0822 15:57:32.044187   10069 run.go:74] "command failed" err="failed to construct kubelet dependencies: unable to load client CA file /etc/kubernetes/pki/WRONG-CA-FILE.crt: open /etc/kubernetes/pki/WRONG-CA-FILE.crt: no such file or directory"
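The error says the client CA file path doesn't exist. Before editing anything, it's worth confirming what actually lives under /etc/kubernetes/pki on node01 (ca.crt is the standard kubeadm name, and it matches the fix below):

node01 ~ ➜  ls -l /etc/kubernetes/pki/
# ca.crt should be present; WRONG-CA-FILE.crt will not be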

Reference: kubelet (kubernetes.io)
https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

The kubelet's default config file is /var/lib/kubelet/config.yaml.
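If you ever need to confirm that path rather than assume it, two hedged options: check the --config flag on the kubelet process (only visible while it's running between crash-loop restarts), or read the kubeadm drop-in unit that the status output above pointed at:

node01 ~ ➜  ps aux | grep /usr/bin/kubelet        # look for --config=/var/lib/kubelet/config.yaml
node01 ~ ➜  cat /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf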

node01 /var/lib/kubelet ➜  vi config.yaml
## Before
  x509:
    clientCAFile: /etc/kubernetes/pki/WRONG-CA-FILE.crt

## After
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
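After fixing the path, restart the kubelet (systemd may also pick it up on its own if the service was flapping) and confirm it stays up:

node01 /var/lib/kubelet ➜  systemctl restart kubelet
node01 /var/lib/kubelet ➜  systemctl status kubelet   # expect Active: active (running)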

 

3. The cluster is broken again. Investigate and fix the issue.

Aug 22 16:03:31 node01 kubelet[13373]: E0822 16:03:31.828174   13373 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://controlplane:6553/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 192.35.51.3:6553: connect: connection refused

node01 can't communicate with the controlplane.

First, we need to check which port the controlplane's API server is actually listening on:

controlplane ~ ➜  cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep port
    - --secure-port=6443
        port: 6443
        port: 6443
        port: 6443
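As a cross-check (assuming kubectl works on the controlplane), kubectl cluster-info also prints the API server endpoint the cluster advertises:

controlplane ~ ➜  kubectl cluster-info
# Kubernetes control plane is running at https://controlplane:6443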

 

node01 is configured with port 6553, so it needs to be changed to 6443.

node01 ~ ➜  cd /etc/kubernetes/
node01 /etc/kubernetes ➜  vi kubelet.conf
....
### Fix the controlplane port: 6553 -> 6443
    server: https://controlplane:6553
    ...


## Restart the kubelet
node01 ~ ✖ systemctl restart kubelet
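Before heading back to the controlplane, it helps to tail the kubelet logs for a few seconds and confirm the connection-refused errors have stopped:

node01 /etc/kubernetes ➜  journalctl -u kubelet -f
# the dial tcp ...:6553 "connection refused" messages should no longer appear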

node01 ~ ➜  exit
logout
Connection to node01 closed.

controlplane ~ ➜  k get node
NAME           STATUS   ROLES           AGE   VERSION
controlplane   Ready    control-plane   36m   v1.30.0
node01         Ready    <none>          36m   v1.30.0
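Optionally, confirm the workloads on node01 came back too (a quick sketch; the field selector is standard kubectl):

controlplane ~ ➜  kubectl get pods -A -o wide --field-selector spec.nodeName=node01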

Today we covered Worker Node Failure. Next time, we'll look at Troubleshoot Network.

 

 


References

※ Udemy Labs - Certified Kubernetes Administrator with Practice Tests