Examining Effects of Gradient Accumulation and Gradient Checkpointing on GPU Memory Usage, GPU Utilization, and Training Time
Gradient Accumulation and Gradient Checkpointing are among the methods used to limit GPU memory usage during training.
The memory savings from these methods typically come at the cost of longer training time.
This project investigates the effects of Gradient Accumulation and Gradient Checkpointing on GPU memory usage, GPU utilization, and training time.
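As a rough illustration of the two techniques, the sketch below combines them in a single PyTorch training loop: gradients are accumulated over several micro-batches before an optimizer step, and activations inside the model are recomputed during the backward pass via `torch.utils.checkpoint.checkpoint_sequential`. The model architecture, accumulation step count, and segment count are placeholder choices for the example, not the project's actual configuration.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Placeholder convolutional model; works for any input resolution.
model = nn.Sequential(
    nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 10),
).cuda()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
accumulation_steps = 4   # micro-batches per optimizer step (example value)
use_checkpointing = True # trade recomputation for lower activation memory

def train_epoch(loader):
    optimizer.zero_grad()
    for step, (images, labels) in enumerate(loader):
        images, labels = images.cuda(), labels.cuda()
        if use_checkpointing:
            # Split the Sequential into 2 segments; only segment boundaries
            # keep activations, everything else is recomputed in backward.
            outputs = checkpoint_sequential(model, 2, images, use_reentrant=False)
        else:
            outputs = model(images)
        # Scale the loss so the accumulated gradient matches a full batch.
        loss = criterion(outputs, labels) / accumulation_steps
        loss.backward()
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```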
The MNIST dataset is used as input, and the input images are upscaled to put pressure on GPU memory capacity.
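A minimal sketch of the upscaling step, assuming torchvision transforms are used; the target resolution is a hypothetical value, not the size used in the experiments:

```python
import torchvision
import torchvision.transforms as transforms

IMAGE_SIZE = 512  # hypothetical target size; 28x28 MNIST images are enlarged

transform = transforms.Compose([
    transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),  # inflate activation memory
    transforms.ToTensor(),
])

train_set = torchvision.datasets.MNIST(
    root="./data", train=True, download=True, transform=transform
)
```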
Model accuracy is not reported, as it is outside the scope of this project.
The project is implemented with PyTorch 2.0.1, and all source code and results are shared.
Each run uses two parallel threads: one thread performs training while the other monitors GPU status.
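One way to structure such a monitoring thread is sketched below. The use of pynvml and the one-second polling interval are assumptions for the example; the repository may query GPU status differently (for instance via nvidia-smi).

```python
import threading
import time
import pynvml  # NVIDIA Management Library bindings (assumed here)

def monitor_gpu(stop_event, interval=1.0, device_index=0):
    """Poll GPU utilization and memory usage until stop_event is set."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    while not stop_event.is_set():
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"gpu={util.gpu}% mem={mem.used / 1e9:.2f}/{mem.total / 1e9:.2f} GB")
        time.sleep(interval)
    pynvml.nvmlShutdown()

stop_event = threading.Event()
monitor = threading.Thread(target=monitor_gpu, args=(stop_event,), daemon=True)
monitor.start()
# ... run training in the main thread ...
stop_event.set()
monitor.join()
```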
Four data loader workers are used to feed the GPU with input data.
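With PyTorch's DataLoader this corresponds to something like the following, reusing the `train_set` from the earlier sketch; the batch size is a hypothetical value:

```python
from torch.utils.data import DataLoader

# Four worker processes load and transform batches in parallel.
train_loader = DataLoader(
    train_set, batch_size=32, shuffle=True,
    num_workers=4, pin_memory=True,
)
```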