To use GPUs in Kubernetes, I installed the NVIDIA device plugin.
The steps follow the quick-start guide at the GitHub link below.
https://github.com/NVIDIA/k8s-device-plugin#quick-start
Step 1. Install nvidia-container-toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/libnvidia-container.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
Step 2. Configure Docker
sudo vim /etc/docker/daemon.json
# Paste the following
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
Restart Docker
sudo systemctl restart docker
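Before moving on, it can help to confirm that Docker actually picked up the new default runtime. The following is a minimal sketch, not part of the original guide: `expect_runtime` is a hypothetical helper name, and the check relies on `docker info --format '{{.DefaultRuntime}}'` reporting the runtime set in daemon.json.

```shell
# Hypothetical helper: print a verdict for the runtime name passed as $1.
expect_runtime() {
    if [ "$1" = "nvidia" ]; then
        echo "OK: default runtime is nvidia"
    else
        echo "FAIL: default runtime is '${1:-unknown}', expected nvidia"
        return 1
    fi
}

# Only query Docker when the docker CLI is available on this machine.
if command -v docker >/dev/null 2>&1; then
    expect_runtime "$(docker info --format '{{.DefaultRuntime}}' 2>/dev/null)" \
        || echo "Re-check /etc/docker/daemon.json and restart Docker."
fi
```

If the check fails, a JSON syntax error in daemon.json is a likely cause, since Docker refuses to start with an invalid file.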
Step 3. Set nvidia-container-runtime as the low-level runtime
sudo vim /etc/containerd/config.toml
# Paste the following
version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
Restart containerd
sudo systemctl restart containerd
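To see whether containerd loaded the setting, one option is to dump its effective configuration and read back `default_runtime_name`. This is a sketch under assumptions: `default_runtime_name` (the shell function) is a hypothetical helper, and `containerd config dump` is assumed to be able to read /etc/containerd/config.toml as the current user (prepend `sudo` if it cannot).

```shell
# Hypothetical helper: read TOML from stdin and print the value of the
# first default_runtime_name key it finds.
default_runtime_name() {
    sed -n 's/^[[:space:]]*default_runtime_name[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p' \
        | head -n 1
}

# Only query containerd when the binary is available on this machine.
if command -v containerd >/dev/null 2>&1; then
    echo "containerd default runtime: $(containerd config dump 2>/dev/null | default_runtime_name)"
fi
```

The printed value should be `nvidia`; anything else suggests the edited file was not the one containerd loaded.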
Step 4. Deploy the nvidia-device-plugin DaemonSet
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.0/nvidia-device-plugin.yml
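Once the plugin pods are running, each GPU node should advertise an `nvidia.com/gpu` resource. The sketch below lists the nodes that do; `gpu_nodes` is a hypothetical helper, and it assumes the plugin has had a little time to register with the kubelet.

```shell
# Hypothetical helper: read "NODE GPU" rows from stdin and print the nodes
# whose GPU column is a positive integer (nodes without GPUs show <none>).
gpu_nodes() {
    awk '$2 ~ /^[0-9]+$/ && $2 > 0 { print $1 }'
}

# Only query the cluster when kubectl is available on this machine.
if command -v kubectl >/dev/null 2>&1; then
    kubectl get nodes --no-headers \
        -o custom-columns='NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
        2>/dev/null | gpu_nodes
fi
```

An empty result means no node is exposing GPUs yet, in which case the device plugin pod logs are the first place to look.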
Step 5. Create a pod that uses a GPU
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
EOF
Check the pod logs. If the vectorAdd sample reports `Test PASSED`, it worked!!
kubectl logs gpu-pod
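The same check can be scripted so it waits for the pod to finish before reading the logs. This is a rough sketch, not the guide's own method: `vectoradd_passed` is a hypothetical helper, the 120-second timeout is an arbitrary choice, and the `jsonpath` form of `kubectl wait` is assumed to be supported by the installed kubectl (older clients lack it).

```shell
# Hypothetical helper: succeed if the log text on stdin contains the
# "Test PASSED" line printed by the CUDA vectorAdd sample.
vectoradd_passed() {
    grep -q "Test PASSED"
}

# Only query the cluster when kubectl is available on this machine.
if command -v kubectl >/dev/null 2>&1; then
    if kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/gpu-pod --timeout=120s \
        && kubectl logs gpu-pod | vectoradd_passed; then
        echo "GPU test passed"
    else
        echo "GPU test did not pass; try: kubectl describe pod gpu-pod"
    fi
fi
```

If the pod stays in Pending, `kubectl describe pod gpu-pod` usually shows whether the scheduler found any node with a free `nvidia.com/gpu`.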