[Kubernetes] Installing the NVIDIA device plugin / 2023.06.15
강원대목동녀
2023. 6. 15. 22:52
I installed the NVIDIA device plugin so that pods in Kubernetes can use GPUs.
I followed the GitHub page below!!!
https://github.com/NVIDIA/k8s-device-plugin#quick-start
Step 1. Install nvidia-container-toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/libnvidia-container.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
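For reference, the first command only builds the repository path segment out of /etc/os-release. A minimal sketch of what it evaluates to is below; distribution_of is a hypothetical helper name (not part of the NVIDIA instructions), and it runs in a subshell so the sourced variables do not leak into the current shell.

```shell
# Hypothetical helper: print "$ID$VERSION_ID" from an os-release style file.
# Defaults to /etc/os-release. The parentheses make the body a subshell, so
# the variables sourced from the file stay out of the calling shell.
distribution_of() (
    . "${1:-/etc/os-release}"
    echo "${ID}${VERSION_ID}"
)
```

Running distribution_of with no argument on an Ubuntu 22.04 host prints ubuntu22.04, which is the value the curl URL in the third command is built from.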
Step 2. Configure Docker
sudo vim /etc/docker/daemon.json
# Copy in the configuration below
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
Restart Docker
sudo systemctl restart docker
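After the restart, it may be worth confirming that Docker actually picked up the new default runtime. The sketch below assumes the usual "Default Runtime:" line in the output of docker info; default_runtime_is_nvidia is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: read `docker info` output on stdin and succeed only
# if the reported default runtime is "nvidia".
default_runtime_is_nvidia() {
    grep -Eq '^[[:space:]]*Default Runtime:[[:space:]]*nvidia[[:space:]]*$'
}

# Usage (needs a running Docker daemon):
#   docker info 2>/dev/null | default_runtime_is_nvidia \
#     && echo "nvidia is the default runtime"
```

If the check fails, a syntax error in /etc/docker/daemon.json is a likely cause, since Docker will not start with an invalid file.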
Step 3. Set nvidia-container-runtime as the low-level runtime for containerd
sudo vim /etc/containerd/config.toml
# Copy in the configuration below
version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
Restart containerd
sudo systemctl restart containerd
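A quick way to double-check the edit is to look for the default_runtime_name setting in the config file. This is only a sketch: containerd_default_is_nvidia is a hypothetical helper name, and it does a plain text match rather than parsing TOML.

```shell
# Hypothetical helper: succeed if the given containerd config file sets
# default_runtime_name = "nvidia". Defaults to /etc/containerd/config.toml.
# This is a simple line match, not a real TOML parser.
containerd_default_is_nvidia() {
    grep -Eq '^[[:space:]]*default_runtime_name[[:space:]]*=[[:space:]]*"nvidia"' \
        "${1:-/etc/containerd/config.toml}"
}

# Usage:
#   containerd_default_is_nvidia && echo "containerd default runtime is nvidia"
```

To see the configuration containerd has actually loaded, rather than what is on disk, `sudo containerd config dump` prints the merged result.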
Step 4. Deploy the nvidia-device-plugin DaemonSet
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.0/nvidia-device-plugin.yml
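Once the DaemonSet is running, each GPU node should advertise the nvidia.com/gpu resource. The sketch below filters a node listing down to the nodes that report at least one GPU; gpu_nodes is a hypothetical helper name, and the custom-columns expression in the usage comment is an assumption about how to query the allocatable count.

```shell
# Hypothetical helper: read "NAME GPU" rows on stdin (header line included)
# and print only the nodes whose GPU column is a positive integer.
# Nodes without the resource show "<none>" and are skipped.
gpu_nodes() {
    awk 'NR > 1 && $2 ~ /^[0-9]+$/ && $2 > 0 { print $1 " " $2 }'
}

# Usage (needs cluster access):
#   kubectl get nodes \
#     -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
#     | gpu_nodes
```

An empty result usually means the plugin pod is not running on the node yet, or the container runtime from Steps 2 and 3 is not configured there.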
Step 5. Create a pod that uses a GPU
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
EOF
Check the logs with the command below; if the sample's output shows up without errors, it worked!!
kubectl logs gpu-pod
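Because the pod uses restartPolicy: Never, another way to confirm the result is to wait for its phase to become Succeeded. The following is a minimal sketch, not part of the original steps; wait_for_pod is a hypothetical helper name, and the retry count and 2-second interval are arbitrary choices.

```shell
# Hypothetical helper: poll a pod's phase until it is Succeeded or Failed,
# or until the retry budget (default 30 attempts) runs out.
# Returns 0 only when the pod reaches Succeeded.
wait_for_pod() {
    pod=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        phase=$(kubectl get pod "$pod" -o jsonpath='{.status.phase}')
        case $phase in
            Succeeded) return 0 ;;
            Failed)    return 1 ;;
        esac
        tries=$((tries - 1))
        sleep 2
    done
    return 1
}

# Usage:
#   wait_for_pod gpu-pod && kubectl logs gpu-pod
```

If the pod stays in Pending instead, `kubectl describe pod gpu-pod` shows whether the scheduler could find a node offering nvidia.com/gpu.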