[PyTorch] Distributed training command / torch.distributed

CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --nproc_per_node="2"  --master_port=25210 train.py

Explanation

CUDA_VISIBLE_DEVICES=2,3 ➡ use GPUs 2 and 3 (inside the process they appear as cuda:0 and cuda:1)

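torch.distributed.launch ➡ starts distributed training by spawning the worker processes (recent PyTorch releases deprecate this module in favor of torchrun)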

--nproc_per_node="2" ➡ number of processes per node, here 2 (set it equal to the number of GPUs in use; this is not the number of nodes)

--master_port=25210 ➡ port the processes use to communicate with the master (rank 0) process; any port that is not already in use works

train.py ➡ the Python script to run
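
To make the roles of these options concrete, here is a minimal sketch of how train.py might read what the launcher hands to each worker process. The helper name resolve_dist_config and the returned dictionary keys are hypothetical, made up for illustration. The sketch assumes the launcher passes the per-process index either as a --local_rank / --local-rank flag or through the LOCAL_RANK environment variable, and that WORLD_SIZE and MASTER_PORT are set in the environment.

```python
import argparse
import os


def resolve_dist_config(argv=None, env=None):
    """Collect the settings the launcher gives to one worker process.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    env = os.environ if env is None else env

    parser = argparse.ArgumentParser()
    # The launcher may pass the per-process index as a CLI flag; the spelling
    # (--local_rank vs --local-rank) depends on the PyTorch version.
    parser.add_argument("--local_rank", "--local-rank",
                        dest="local_rank", type=int, default=None)
    # Ignore any other arguments that train.py itself might define.
    args, _ = parser.parse_known_args(argv)

    # Fall back to the LOCAL_RANK environment variable if no flag was given.
    local_rank = args.local_rank
    if local_rank is None:
        local_rank = int(env.get("LOCAL_RANK", 0))

    # CUDA_VISIBLE_DEVICES=2,3 renumbers the GPUs: cuda:0 -> GPU 2, cuda:1 -> GPU 3.
    visible = [d.strip() for d in env.get("CUDA_VISIBLE_DEVICES", "").split(",") if d.strip()]

    return {
        "local_rank": local_rank,
        "world_size": int(env.get("WORLD_SIZE", 1)),
        "master_port": env.get("MASTER_PORT"),
        "device": f"cuda:{local_rank}",
        "physical_gpu": visible[local_rank] if local_rank < len(visible) else None,
    }


if __name__ == "__main__":
    print(resolve_dist_config())
```

In an actual training script, the resolved local_rank would typically be passed to torch.cuda.set_device before calling torch.distributed.init_process_group(backend="nccl"), which by default reads the master address and port from the environment.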


AI 연구하는 깨굴이