[CKA] KodeKloud - OS Upgrade

Hello, this is Julie.

Today, in the fourth chapter, we'll look at OS Upgrade.

Upgrading kubeadm clusters
https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

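What is an OS upgrade?

In Kubernetes (K8s), an "OS upgrade" means upgrading the operating system of each machine that makes up the cluster's nodes. It may be needed to apply security patches, improve performance, or use new features. An operating system upgrade in a Kubernetes cluster must be planned and carried out carefully.

Main steps

Node cordoning and draining: exclude the node to be upgraded from new workload scheduling and safely move the workloads currently running on it to other nodes.
Operating system upgrade: upgrade the node's operating system.
Node re-registration and workload recovery: bring the upgraded node back into service and let workloads be scheduled on it again.

1. Node cordoning and draining

Before upgrading a node, make sure no new pods are scheduled on it and move the pods currently running on it to other nodes.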

kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

## Example
kubectl cordon my-node
kubectl drain my-node --ignore-daemonsets --delete-emptydir-data
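
The two commands above can be wrapped into one small helper. This is only a minimal sketch: the function name prepare_node_for_maintenance is made up for illustration and is not part of kubectl. It uses only the cordon and drain invocations already shown.

```shell
#!/bin/sh
# Sketch: take one node out of service before OS maintenance.
# prepare_node_for_maintenance is a hypothetical helper name, not a kubectl feature.
prepare_node_for_maintenance() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: prepare_node_for_maintenance <node-name>" >&2
        return 2
    fi
    # Stop new pods from being scheduled on the node.
    kubectl cordon "$node" || return 1
    # Evict running pods; DaemonSet pods are skipped and emptyDir data is discarded.
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
}
```

Calling prepare_node_for_maintenance my-node runs the same two steps as the example. Since drain cordons the node by itself, the explicit cordon mainly documents the intent.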

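2. Operating system upgrade

Upgrade the operating system on the node. You can do this by connecting to the node over SSH and running the commands directly, or by using an automated script.

For example, to bring an Ubuntu server up to the latest patches, you can use the following commands:

sudo apt update
sudo apt upgrade -y
sudo reboot

After the reboot, the node starts up again.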
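
Before moving on, it can help to confirm from the control plane that the rebooted node reports Ready again. The sketch below assumes kubectl wait is available in your kubectl version. The helper name wait_for_node_ready and the 300s default timeout are my own choices, not something from the lab.

```shell
#!/bin/sh
# Sketch: block until a node reports the Ready condition, or time out.
# wait_for_node_ready and the default timeout are illustrative assumptions.
wait_for_node_ready() {
    node="$1"
    timeout="${2:-300s}"
    # kubectl wait returns non-zero if the condition is not met in time.
    kubectl wait --for=condition=Ready "node/$node" --timeout="$timeout"
}
```

For example, wait_for_node_ready my-node 600s waits up to ten minutes for my-node.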

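3. Node re-registration and workload recovery

Once the node is back up, mark it schedulable again so that it returns to service. Pods that were evicted are not moved back automatically; the node receives pods as new ones are created.

kubectl uncordon <node-name>

## Example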
kubectl uncordon my-node

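Cautions

Upgrade plan: to keep the cluster available, upgrade nodes sequentially, one at a time.
Backup: back up important data and configuration.
Testing: validate the upgrade procedure in a test environment before upgrading.
Monitoring: monitor the cluster state during and after the upgrade.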
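
The "one node at a time" advice could be sketched as a loop like the one below. Everything here is an assumption for illustration: upgrade_nodes_one_by_one and upgrade_node_os are hypothetical names, and upgrade_node_os is a placeholder you would replace with your own way of patching and rebooting a node (for example over SSH). The 600s timeout is also arbitrary.

```shell
#!/bin/sh
# Hypothetical hook: replace with your own OS patch + reboot procedure.
upgrade_node_os() {
    echo "upgrade_node_os: not implemented for $1" >&2
    return 1
}

# Sketch: drain, upgrade, wait, and uncordon each node in turn.
# Stops at the first failure so that only one node is ever out of service.
upgrade_nodes_one_by_one() {
    for node in "$@"; do
        echo "== $node =="
        kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
        upgrade_node_os "$node" || return 1
        kubectl wait --for=condition=Ready "node/$node" --timeout=600s || return 1
        kubectl uncordon "$node" || return 1
    done
}
```

Note that if a step fails, the node being worked on stays cordoned, which is usually what you want while you investigate.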

Quiz

1. Let us explore the environment first. How many nodes do you see in the cluster?
Including the controlplane and worker nodes.

controlplane ~ ➜  k get nodes -A
NAME           STATUS   ROLES           AGE     VERSION
controlplane   Ready    control-plane   4m34s   v1.30.0
node01         Ready    <none>          3m52s   v1.30.0

2 nodes

2. How many applications do you see hosted on the cluster?
Check the number of deployments in the default namespace.

controlplane ~ ➜  k get deploy
NAME   READY   UP-TO-DATE   AVAILABLE   AGE
blue   3/3     3            3           21s

1 application

3. Which nodes are the applications hosted on?
They are hosted on both controlplane and node01.

controlplane ~ ➜  k get pods -o wide
NAME                   READY   STATUS    RESTARTS   AGE    IP           NODE           NOMINATED NODE   READINESS GATES
blue-fffb6db8d-48jkp   1/1     Running   0          2m5s   10.244.0.4   controlplane   <none>           <none>
blue-fffb6db8d-8sjc6   1/1     Running   0          2m5s   10.244.1.3   node01         <none>           <none>
blue-fffb6db8d-bxdzv   1/1     Running   0          2m5s   10.244.1.2   node01         <none>           <none>

4. We need to take node01 out for maintenance. Empty the node of all applications and mark it unschedulable.

Node node01 Unschedulable
Pods evicted from node01

controlplane ~ ➜  k drain node01 --ignore-daemonsets
node/node01 already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-flannel/kube-flannel-ds-8h74f, kube-system/kube-proxy-l5f8r
evicting pod default/blue-fffb6db8d-bxdzv
evicting pod default/blue-fffb6db8d-8sjc6
pod/blue-fffb6db8d-bxdzv evicted
pod/blue-fffb6db8d-8sjc6 evicted
node/node01 drained

Safely Drain a Node
https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/

5. What nodes are the apps on now?

controlplane ~ ➜  k get pods -o wide
NAME                   READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
blue-fffb6db8d-48jkp   1/1     Running   0          11m   10.244.0.4   controlplane   <none>           <none>
blue-fffb6db8d-b29j4   1/1     Running   0          79s   10.244.0.6   controlplane   <none>           <none>
blue-fffb6db8d-jjrzs   1/1     Running   0          79s   10.244.0.5   controlplane   <none>           <none>

No pods remain on node01 now; all of them run on controlplane.
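
With only three pods it is easy to read the NODE column by eye, but on a bigger cluster a per-node count is handier. The helper below is a small sketch (count_pods_per_node is a name I made up) that counts node names read from standard input, one per line.

```shell
#!/bin/sh
# Sketch: count how many pods run on each node.
# Input: one node name per line on stdin. Output: "<node> <count>", sorted by node.
count_pods_per_node() {
    awk 'NF { count[$1]++ } END { for (n in count) print n, count[n] }' | sort
}

# Usage against a live cluster (prints only the node name of each pod):
#   kubectl get pods -o custom-columns=NODE:.spec.nodeName --no-headers | count_pods_per_node
```

A node with no pods simply does not appear in the output, which is what you would expect for node01 right after the drain.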

6. The maintenance tasks have been completed. Configure the node node01 to be schedulable again.

To make the node schedulable again, use uncordon.

controlplane ~ ➜  k get nodes
NAME           STATUS                     ROLES           AGE     VERSION
controlplane   Ready                      control-plane   6m28s   v1.30.0
node01         Ready,SchedulingDisabled   <none>          5m40s   v1.30.0

controlplane ~ ➜  k uncordon node01
node/node01 uncordoned

controlplane ~ ➜  k get nodes
NAME           STATUS   ROLES           AGE     VERSION
controlplane   Ready    control-plane   7m51s   v1.30.0
node01         Ready    <none>          7m3s    v1.30.0

7. How many pods are scheduled on node01 now in the default namespace?

No pods are scheduled on node01 yet.

controlplane ~ ➜  k get pods -o wide
NAME                   READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
blue-fffb6db8d-2bdqj   1/1     Running   0          4m44s   10.244.0.4   controlplane   <none>           <none>
blue-fffb6db8d-9vqhm   1/1     Running   0          3m47s   10.244.0.6   controlplane   <none>           <none>
blue-fffb6db8d-kcrwq   1/1     Running   0          3m48s   10.244.0.5   controlplane   <none>           <none>

8. Why are there no pods on node01?
Uncordoning a node does not move running pods back to it. node01 will only receive pods when new ones are created.

Only when new pods are created will they be scheduled on node01.

9. Why are the pods placed on the controlplane node? Check the controlplane node details.

controlplane ~ ✖ k describe nodes controlplane | grep Taints
Taints:             <none>

The controlplane node has no taints set, so regular pods can be scheduled on it.
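
The same check can be scripted instead of grepping the describe output. This is a sketch under the assumption that kubectl's jsonpath output is empty when .spec.taints is absent; node_has_taints is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the node has at least one taint, fail (exit 1) if none.
# Returns 2 if kubectl itself fails. node_has_taints is an illustrative name.
node_has_taints() {
    taints="$(kubectl get node "$1" -o jsonpath='{.spec.taints[*].key}')" || return 2
    [ -n "$taints" ]
}

# Usage:
#   if node_has_taints controlplane; then echo "tainted"; else echo "no taints"; fi
```

In this lab the check would report no taints for controlplane, which matches the output above.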

11. We need to carry out a maintenance activity on node01 again.
Try draining the node again using the same command as before: kubectl drain node01 --ignore-daemonsets

controlplane ~ ➜  k get nodes
NAME           STATUS   ROLES           AGE   VERSION
controlplane   Ready    control-plane   20m   v1.30.0
node01         Ready    <none>          20m   v1.30.0

controlplane ~ ➜  kubectl drain node01 --ignore-daemonsets
node/node01 cordoned
error: unable to drain node "node01" due to error:cannot delete cannot delete Pods that declare no controller (use --force to override): default/hr-app, continuing command...
There are pending nodes to be drained:
 node01
cannot delete cannot delete Pods that declare no controller (use --force to override): default/hr-app

We can see that this command does not drain the node.

12. Why did the drain command fail on node01? It worked the first time!
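This error occurs because, when drain tries to remove the pods from the node, one of them is not managed by a controller (for example a Deployment, StatefulSet, or DaemonSet). By default, kubectl drain refuses to delete a pod that no controller manages, since nothing would recreate it. This guards against data loss and service interruption.

There is a pod on node01 that is not part of a ReplicaSet.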
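
To spot such pods before draining, one option is to list each pod's owner kind and filter for the ones that have none. The sketch below assumes kubectl's custom-columns output prints <none> for a missing field; pods_without_controller is a hypothetical helper that filters that output.

```shell
#!/bin/sh
# Sketch: print pods on a given node that have no owning controller.
# Input on stdin: rows of "NAME OWNER NODE" without a header line.
pods_without_controller() {
    awk -v node="$1" '$2 == "<none>" && $3 == node { print $1 }'
}

# Usage against a live cluster (quotes keep the shell from expanding [0]):
#   kubectl get pods --no-headers \
#     -o 'custom-columns=NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind,NODE:.spec.nodeName' \
#     | pods_without_controller node01
```

Any pod this prints would be deleted for good by drain --force, so it is worth reviewing the list first.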

13. What is the name of the POD hosted on node01 that is not part of a replicaset?

controlplane ~ ✖ k get pods -o wide
NAME                   READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
blue-fffb6db8d-2bdqj   1/1     Running   0          20m     10.244.0.4   controlplane   <none>           <none>
blue-fffb6db8d-9vqhm   1/1     Running   0          19m     10.244.0.6   controlplane   <none>           <none>
blue-fffb6db8d-kcrwq   1/1     Running   0          19m     10.244.0.5   controlplane   <none>           <none>
hr-app                 1/1     Running   0          5m20s   10.244.1.4   node01         <none>           <none>

14. What would happen to hr-app if node01 is drained forcefully?

Try it and see for yourself.

controlplane ~ ➜  kubectl drain node01 --ignore-daemonsets --force
node/node01 already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-flannel/kube-flannel-ds-xdcwq, kube-system/kube-proxy-256b4; deleting Pods that declare no controller: default/hr-app
evicting pod default/hr-app

pod/hr-app evicted
node/node01 drained


controlplane ~ ➜  k get pods -o wide
NAME                   READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
blue-fffb6db8d-2bdqj   1/1     Running   0          22m   10.244.0.4   controlplane   <none>           <none>
blue-fffb6db8d-9vqhm   1/1     Running   0          21m   10.244.0.6   controlplane   <none>           <none>
blue-fffb6db8d-kcrwq   1/1     Running   0          21m   10.244.0.5   controlplane   <none>           <none>

evicted: forced off the node

pod/hr-app will be lost forever, because no controller exists to recreate it.

15. Oops! We did not want to do that! hr-app is a critical application that should not be destroyed. We have now reverted back to the previous state and re-deployed hr-app as a deployment.

16. hr-app is a critical app and we do not want it to be removed and we do not want to schedule any more pods on node01. Mark node01 as unschedulable so that no new pods are scheduled on this node.

Make sure that hr-app is not affected.
Node01 Unschedulable
hr-app still running on node01?

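The command to use here is cordon, not drain. The two differ as follows.

cordon

The cordon command marks a node as unschedulable. In this state no new pods are scheduled onto the node, but pods that are already running stay where they are.

drain

The drain command safely evicts all pods running on a node so that their controllers recreate them on other nodes. drain first cordons the node and then waits until the existing pods have been removed. Pods managed by a DaemonSet and pods using emptyDir volumes need explicit handling: drain stops on them unless --ignore-daemonsets and --delete-emptydir-data are given.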

controlplane ~ ✖ k cordon node01
node/node01 cordoned

controlplane ~ ➜  k get pods 
NAME                      READY   STATUS    RESTARTS   AGE
blue-fffb6db8d-2bdqj      1/1     Running   0          26m
blue-fffb6db8d-9vqhm      1/1     Running   0          25m
blue-fffb6db8d-kcrwq      1/1     Running   0          25m
hr-app-74c9788784-dksjm   1/1     Running   0          3m34s

controlplane ~ ➜  k get nodes
NAME           STATUS                     ROLES           AGE   VERSION
controlplane   Ready                      control-plane   31m   v1.30.0
node01         Ready,SchedulingDisabled   <none>          31m   v1.30.0
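
The SchedulingDisabled status comes from the node's spec.unschedulable field, which cordon sets to true. A script can read that field directly; the sketch below assumes the jsonpath output is empty for a node that is not cordoned, and is_cordoned is a name chosen for illustration.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the node is cordoned, fail otherwise.
# is_cordoned is a hypothetical helper name.
is_cordoned() {
    [ "$(kubectl get node "$1" -o jsonpath='{.spec.unschedulable}')" = "true" ]
}

# Usage:
#   if is_cordoned node01; then echo "node01 is unschedulable"; fi
```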

Today we looked at how to upgrade the OS.

Next time, we'll look at how to upgrade the cluster!

References

※ Udemy Labs - Certified Kubernetes Administrator with Practice Tests


Funlife Julie
