
PyTorch DataParallel: Code Implementation and Performance Comparison

개발공주 2023. 5. 30. 00:41
PyTorch's nn.DataParallel module lets you parallelize training across multiple GPUs: it replicates the model on each device, splits every input batch among the replicas, runs the forward passes in parallel, and gathers the outputs back on the primary GPU. For reference, the model used in this implementation is AlexNet, which is relatively small.
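
Under the hood, a single forward pass follows a scatter/replicate/apply/gather pattern. A simplified sketch built on the primitives in torch.nn.parallel (kwargs handling and edge cases omitted; data_parallel_forward is an illustrative name, not the actual implementation):

from torch.nn.parallel import replicate, scatter, parallel_apply, gather

def data_parallel_forward(module, inputs, device_ids, output_device):
    # module is assumed to already live on device_ids[0]
    replicas = replicate(module, device_ids)        # copy the model onto each GPU
    scattered = scatter(inputs, device_ids)         # split the batch along dim 0
    outputs = parallel_apply(replicas[:len(scattered)], scattered)  # run forwards in parallel
    return gather(outputs, output_device)           # collect the outputs on one device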

 

1. Before applying DataParallel

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms


def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Load the dataset
    transform = transforms.Compose([
        transforms.Resize(size=(227, 227)),
        transforms.ToTensor(), # convert images to PyTorch tensors, scaled to [0.0, 1.0]
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) # normalize RGB channels to [-1, 1]
    ])

    trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

    testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False) # TODO: fix

    # Initialize the model and set hyperparameters
    model = AlexNet().to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    # Train and evaluate
    num_epochs = 5

    for epoch in range(num_epochs):
        train_loss, train_acc = train(model, trainloader, criterion, optimizer, device)
        test_acc = test(model, testloader, device)

        print(f'epoch {epoch+1:02d}, train loss: {train_loss:.5f}, train acc: {train_acc:.5f}, test accuracy: {test_acc:.5f}')
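
The AlexNet definition and the train/test helpers are omitted from the post. A minimal sketch of the two helpers, consistent with how they are called above (the bodies and the exact loss/accuracy bookkeeping are assumptions):

def train(model, loader, criterion, optimizer, device):
    # One training epoch; returns (average loss, accuracy) over the epoch
    model.train()
    total_loss, correct, total = 0.0, 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * images.size(0)
        correct += (outputs.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    return total_loss / total, correct / total

def test(model, loader, device):
    # Evaluation pass; returns accuracy on the given loader
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    return correct / total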

 

2. After applying DataParallel

The only change from the baseline is wrapping the model in nn.DataParallel when more than one GPU is available.

def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Load the dataset
    transform = transforms.Compose([
        transforms.Resize(size=(227, 227)),
        transforms.ToTensor(), # convert images to PyTorch tensors, scaled to [0.0, 1.0]
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) # normalize RGB channels to [-1, 1]
    ])

    trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

    testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False) # TODO: fix

    # Initialize the model and set hyperparameters
    model = AlexNet().to(device)
    if torch.cuda.device_count() > 1:
        # Wrap the model so each batch is split across all visible GPUs
        model = nn.DataParallel(model)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    # Train and evaluate
    num_epochs = 5

    for epoch in range(num_epochs):
        train_loss, train_acc = train(model, trainloader, criterion, optimizer, device)
        test_acc = test(model, testloader, device)

        print(f'epoch {epoch+1:02d}, train loss: {train_loss:.5f}, train acc: {train_acc:.5f}, test accuracy: {test_acc:.5f}')
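
One caveat: nn.DataParallel stores the wrapped model under a .module attribute, so its state_dict keys carry a "module." prefix. Unwrapping before saving keeps checkpoints loadable by the plain model (the file name below is just an example):

# Unwrap so the checkpoint keys match an unwrapped AlexNet
to_save = model.module if isinstance(model, nn.DataParallel) else model
torch.save(to_save.state_dict(), "alexnet_cifar10.pt")  # example path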

 

3. Performance comparison before and after

Training Accuracy Avg (Baseline): 0.51755
Training Accuracy Avg (DP): 0.54058
Training Time Avg (Baseline): 12.302 sec/epoch
Training Time Avg (DP): 6.546 sec/epoch

With DP, training time per epoch dropped by 5.756 sec versus the baseline, roughly a 1.88x speedup on this hardware.
With DP, training accuracy moved by +0.023 versus the baseline; a gap this small is more likely run-to-run variance than an effect of DataParallel itself.
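
The post does not show how the per-epoch times were collected; a minimal sketch that times the training call with time.perf_counter() (an assumption about the measurement method):

import time

for epoch in range(num_epochs):
    start = time.perf_counter()
    train_loss, train_acc = train(model, trainloader, criterion, optimizer, device)
    epoch_time = time.perf_counter() - start  # seconds spent on this training epoch
    test_acc = test(model, testloader, device)
    print(f'epoch {epoch+1:02d}, train time: {epoch_time:.3f} sec, test accuracy: {test_acc:.5f}')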

 

4. Checking GPU usage

While training runs, nvidia-smi should show memory allocated and non-zero utilization on every visible GPU once DataParallel is active; GPU 0 usually carries slightly more load because outputs are gathered there.

$ nvidia-smi
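
A quick sanity check from Python, to confirm how many devices DataParallel will actually split batches across:

import torch

print(torch.cuda.is_available())    # True if at least one CUDA device is visible
print(torch.cuda.device_count())    # number of GPUs nn.DataParallel will use by default
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))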

 
