
mfatih7/MNIST_grad_acc


Examining Effects of Gradient Accumulation and Gradient Checkpointing on GPU Memory Usage, GPU Utilization, and Training Time

Gradient Accumulation and Gradient Checkpointing are among the methods used to limit GPU memory usage during training.

Both methods trade additional training time for lower GPU memory usage.
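As a minimal sketch of gradient accumulation (assuming a generic PyTorch model, loss, and optimizer rather than the exact training loop in this repository), the loss is scaled by the number of accumulation steps and the optimizer is stepped only once every few micro-batches:

```python
import torch

def train_with_accumulation(model, loader, optimizer, criterion, device, accum_steps=4):
    """Gradient accumulation: update weights once every `accum_steps` micro-batches."""
    model.train()
    optimizer.zero_grad()
    for step, (images, labels) in enumerate(loader):
        images, labels = images.to(device), labels.to(device)
        loss = criterion(model(images), labels)
        # Scale the loss so the accumulated gradient matches a single large-batch gradient.
        (loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

The effective batch size becomes accum_steps times the DataLoader batch size, at the cost of running more forward/backward passes per optimizer update.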

This project investigates the effects of Gradient Accumulation and Gradient Checkpointing on GPU memory usage, GPU utilization, and training time.
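Gradient checkpointing can be sketched with torch.utils.checkpoint, which discards intermediate activations during the forward pass and recomputes them during backward. The block below uses a hypothetical two-stage model, not the repository's actual network:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedNet(nn.Module):
    """Hypothetical model whose first stage is recomputed during the backward pass."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Flatten(), nn.LazyLinear(10))

    def forward(self, x):
        # Activations of stage1 are not stored; they are recomputed during backward.
        x = checkpoint(self.stage1, x, use_reentrant=False)
        return self.stage2(x)
```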

The MNIST dataset is used as input, and the input images are upscaled to put pressure on GPU memory capacity.

Model accuracy is not reported since it is outside the scope of the project.

The project is implemented with PyTorch 2.0.1, and all of the source code and results are shared.

Each run uses 2 parallel threads: one thread performs training while the other observes GPU status.
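One way to implement such an observer thread (a sketch only; the repository may poll GPU status differently) is to sample nvidia-smi at a fixed interval while training runs in the main thread:

```python
import subprocess
import threading
import time

def monitor_gpu(stop_event, interval_s=1.0, log_path="gpu_log.csv"):
    """Poll nvidia-smi for memory usage and utilization until stop_event is set."""
    query = ["nvidia-smi",
             "--query-gpu=timestamp,memory.used,utilization.gpu",
             "--format=csv,noheader,nounits"]
    with open(log_path, "a") as log:
        while not stop_event.is_set():
            log.write(subprocess.check_output(query, text=True))
            time.sleep(interval_s)

# Usage: start the observer, train, then signal it to stop.
stop_event = threading.Event()
observer = threading.Thread(target=monitor_gpu, args=(stop_event,), daemon=True)
observer.start()
# ... training loop runs here in the main thread ...
stop_event.set()
observer.join()
```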

Four data loader workers are used to feed the GPU with input data.
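Assuming the four data loaders refer to DataLoader worker processes, and assuming an upscale to 512x512 purely for illustration (the actual target size in the repository may differ), loading the resized MNIST data could look like this:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Upscale the 28x28 MNIST digits to stress GPU memory; 512x512 is an assumed example size.
transform = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
])

train_set = datasets.MNIST(root="data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True,
                          num_workers=4, pin_memory=True)
```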

Results are grouped by normalization layer and optimizer; each group compares runs without and with Gradient Checkpointing:

Batch Normalization, ADAM optimizer: without / with Gradient Checkpointing
Batch Normalization, SGD optimizer: without / with Gradient Checkpointing
Group Normalization, ADAM optimizer: without / with Gradient Checkpointing
Group Normalization, SGD optimizer: without / with Gradient Checkpointing
