Examining Effects of Gradient Accumulation and Gradient Checkpointing on GPU Memory Usage, GPU Utilization, and Training Time
Gradient Accumulation and Gradient Checkpointing are among the methods used to limit GPU memory usage during training.
The memory savings from these methods typically come at the cost of longer training time.
This project investigates the effects of Gradient Accumulation and Gradient Checkpointing on GPU memory usage, GPU utilization, and training time.
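As a rough illustration of the two techniques, the sketch below combines them in a single PyTorch training loop: gradients are accumulated over several micro-batches before an optimizer step, and activations inside the model are recomputed during the backward pass via `torch.utils.checkpoint.checkpoint_sequential`. The model architecture, accumulation step count, and segment count are placeholder choices for the example, not the project's actual configuration.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Placeholder convolutional model; works for any input resolution.
model = nn.Sequential(
    nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 10),
).cuda()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
accumulation_steps = 4   # micro-batches per optimizer step (example value)
use_checkpointing = True # trade recomputation for lower activation memory

def train_epoch(loader):
    optimizer.zero_grad()
    for step, (images, labels) in enumerate(loader):
        images, labels = images.cuda(), labels.cuda()
        if use_checkpointing:
            # Split the Sequential into 2 segments; only segment boundaries
            # keep activations, everything else is recomputed in backward.
            outputs = checkpoint_sequential(model, 2, images, use_reentrant=False)
        else:
            outputs = model(images)
        # Scale the loss so the accumulated gradient matches a full batch.
        loss = criterion(outputs, labels) / accumulation_steps
        loss.backward()
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```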
The MNIST dataset is used as input, and the input images are upscaled to put pressure on GPU memory capacity.
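A minimal sketch of the upscaling step, assuming torchvision transforms are used; the target resolution is a hypothetical value, not the size used in the experiments:

```python
import torchvision
import torchvision.transforms as transforms

IMAGE_SIZE = 512  # hypothetical target size; 28x28 MNIST images are enlarged

transform = transforms.Compose([
    transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),  # inflate activation memory
    transforms.ToTensor(),
])

train_set = torchvision.datasets.MNIST(
    root="./data", train=True, download=True, transform=transform
)
```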
Model accuracy is not reported, as it is outside the scope of this project.
The project is implemented with PyTorch 2.0.1, and all source code and results are shared.
Each run uses two parallel threads: one thread performs training while the other monitors GPU status.
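One way to structure such a monitoring thread is sketched below. The use of pynvml and the one-second polling interval are assumptions for the example; the repository may query GPU status differently (for instance via nvidia-smi).

```python
import threading
import time
import pynvml  # NVIDIA Management Library bindings (assumed here)

def monitor_gpu(stop_event, interval=1.0, device_index=0):
    """Poll GPU utilization and memory usage until stop_event is set."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    while not stop_event.is_set():
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"gpu={util.gpu}% mem={mem.used / 1e9:.2f}/{mem.total / 1e9:.2f} GB")
        time.sleep(interval)
    pynvml.nvmlShutdown()

stop_event = threading.Event()
monitor = threading.Thread(target=monitor_gpu, args=(stop_event,), daemon=True)
monitor.start()
# ... run training in the main thread ...
stop_event.set()
monitor.join()
```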
Four data loader workers are used to feed the GPU with input data.
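With PyTorch's DataLoader this corresponds to something like the following, reusing the `train_set` from the earlier sketch; the batch size is a hypothetical value:

```python
from torch.utils.data import DataLoader

# Four worker processes load and transform batches in parallel.
train_loader = DataLoader(
    train_set, batch_size=32, shuffle=True,
    num_workers=4, pin_memory=True,
)
```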