PyTorch DataParallel 코드구현 및 성능 비교
개발공주
2023. 5. 30. 00:41
PyTorch's DataParallel module makes it possible to parallelize training across multiple GPUs. The model used in this implementation is a small-scale AlexNet.
1. Before applying DataParallel
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Load the dataset
    transform = transforms.Compose([
        transforms.Resize(size=(227, 227)),
        transforms.ToTensor(),  # convert images to PyTorch tensors, scaled to 0.0~1.0
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # per RGB channel, rescale to -1~1
    ])
    trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
    testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)  # TODO: fix

    # Initialize the model and set hyperparameters
    model = AlexNet().to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    # Train and evaluate
    num_epochs = 5
    for epoch in range(num_epochs):
        train_loss, train_acc = train(model, trainloader, criterion, optimizer, device)
        test_acc = test(model, testloader, device)
        print(f'epoch {epoch+1:02d}, train loss: {train_loss:.5f}, train acc: {train_acc:.5f}, test accuracy: {test_acc:.5f}')
2. After applying DataParallel
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Load the dataset
    transform = transforms.Compose([
        transforms.Resize(size=(227, 227)),
        transforms.ToTensor(),  # convert images to PyTorch tensors, scaled to 0.0~1.0
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # per RGB channel, rescale to -1~1
    ])
    trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
    testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)  # TODO: fix

    # Initialize the model and set hyperparameters
    model = AlexNet().to(device)
    if torch.cuda.device_count() > 1:
        # Split each batch across the available GPUs and gather the outputs
        model = nn.DataParallel(model)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    # Train and evaluate
    num_epochs = 5
    for epoch in range(num_epochs):
        train_loss, train_acc = train(model, trainloader, criterion, optimizer, device)
        test_acc = test(model, testloader, device)
        print(f'epoch {epoch+1:02d}, train loss: {train_loss:.5f}, train acc: {train_acc:.5f}, test accuracy: {test_acc:.5f}')
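One practical side effect of the nn.DataParallel wrap, worth knowing when saving checkpoints: the original model is nested under a "module" attribute, so its parameter names change. A small illustration (nn.Linear stands in for AlexNet here; the filename is arbitrary):

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 2)           # stand-in for AlexNet
wrapped = nn.DataParallel(net)

# Wrapping nests the original model under the attribute "module",
# so every parameter name gains a "module." prefix.
print(list(wrapped.state_dict().keys()))  # ['module.weight', 'module.bias']

# Saving the inner module's state_dict keeps the checkpoint loadable
# later without wrapping in DataParallel again.
torch.save(wrapped.module.state_dict(), "checkpoint.pt")
```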
3. Performance comparison before and after
Training Accuracy Avg(Baseline): 0.51755
Training Accuracy Avg(DP): 0.54058
Training Time Avg(Baseline): 12.302 sec/epoch
Training Time Avg(DP): 6.546 sec/epoch
With DP, training time per epoch dropped by 5.756 sec relative to the baseline (12.302 → 6.546 sec, roughly a 1.9x speedup).
With DP, training accuracy changed by +0.023 relative to the baseline.
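The post doesn't show how the per-epoch times were collected; a minimal sketch, assuming simple wall-clock timing around each epoch (timed_epochs is a hypothetical helper, not from the original code):

```python
import time

def timed_epochs(run_epoch, num_epochs):
    # Wall-clock seconds per epoch, averaged over num_epochs runs.
    times = []
    for _ in range(num_epochs):
        start = time.perf_counter()
        run_epoch()
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)
```

Because CUDA kernels launch asynchronously, a fair GPU measurement should call torch.cuda.synchronize() before each clock read so that pending kernels are counted.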
4. Checking GPU utilization
$ nvidia-smi
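Running nvidia-smi during training shows whether all GPUs are actually being used. The same information can also be queried from inside PyTorch; a small sketch:

```python
import torch

# A programmatic counterpart to nvidia-smi for quick sanity checks.
print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}, "
          f"{torch.cuda.memory_allocated(i) / 2**20:.1f} MiB allocated")
```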