Dorefa-net

A PyTorch implementation of DoReFa-Net. The code is inspired by LaVieEnRoseSMZ and zzzxxxttt.

Requirements

  • python > 3.5
  • torch >= 1.1.0
  • torchvision >= 0.4.0
  • tb-nightly, future (for tensorboard)
  • nvidia-dali >= 0.12 (faster dataloader)

Cifar-10 Accuracy

Quantized models are trained from scratch.

Model      W_bit  A_bit  Acc
resnet-18  32     32     94.71%
resnet-18  4      4      94.36%
resnet-18  1      4      93.87%

ImageNet Accuracy

Quantized models are trained from scratch.

Model      W_bit  A_bit  Top1    Top5
resnet-18  32     32     69.80%  89.32%
resnet-18  4      4      66.60%  87.15%
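
For readers unfamiliar with the W_bit and A_bit columns, the following is a minimal sketch of the k-bit quantizers described in the DoReFa-Net paper, written in plain Python on scalars and lists so the arithmetic is easy to follow. The function names are illustrative assumptions and are not taken from this repository's code, which may differ in detail. During training the rounding step is paired with a straight-through estimator for the gradient, which is not shown here.

```python
import math

def quantize_k(r, k):
    """Uniformly quantize r in [0, 1] onto 2**k - 1 steps."""
    levels = 2 ** k - 1
    return round(r * levels) / levels

def quantize_activation(x, a_bit):
    """Clip the activation to [0, 1], then quantize it to a_bit bits."""
    if a_bit == 32:  # 32 bits is treated as "no quantization"
        return x
    return quantize_k(min(max(x, 0.0), 1.0), a_bit)

def quantize_weights(weights, w_bit):
    """Quantize a flat list of weights to w_bit bits (illustrative helper)."""
    if w_bit == 32:  # full precision, left unchanged
        return list(weights)
    if w_bit == 1:
        # Binary case: sign(w) scaled by the mean absolute weight.
        scale = sum(abs(w) for w in weights) / len(weights)
        return [scale if w >= 0 else -scale for w in weights]
    # Multi-bit case: squash with tanh, map to [0, 1], quantize, map to [-1, 1].
    squashed = [math.tanh(w) for w in weights]
    max_abs = max(abs(v) for v in squashed) or 1.0  # guard against all-zero weights
    return [2 * quantize_k(v / (2 * max_abs) + 0.5, w_bit) - 1 for v in squashed]
```

With `w_bit=2`, every weight lands on one of the four values -1, -1/3, 1/3, 1; with `w_bit=1`, every weight becomes plus or minus the mean absolute weight.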

Usage

Download the ImageNet dataset and move the validation images into labeled subfolders. To do this, you can use the following script
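
As a rough illustration of what "labeled subfolders" means, here is a hypothetical Python sketch that moves each validation image into a folder named after its class. It is not the script referred to above. The `labels` mapping (filename to class folder name) is an assumption; you would need to build it yourself from the ground-truth file that ships with the dataset.

```python
import shutil
from pathlib import Path

def sort_val_images(val_dir, labels):
    """Move each image in val_dir into val_dir/<class>/.

    labels is a hypothetical dict mapping an image filename to its class
    folder name. Returns the number of files moved.
    """
    val_dir = Path(val_dir)
    moved = 0
    for filename, label in labels.items():
        src = val_dir / filename
        if not src.is_file():
            continue  # already moved or missing, skip it
        class_dir = val_dir / label
        class_dir.mkdir(exist_ok=True)
        shutil.move(str(src), str(class_dir / filename))
        moved += 1
    return moved
```

The resulting `val/<class>/<image>` layout is the one that folder-based loaders such as torchvision's `ImageFolder` expect.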

  • To train the model
     python3 cifar_train_eval.py
     python3 imagenet_torch_loader.py --multiprocessing-distributed
     or
     python3 imagenet_dali_loader.py
  • To check the tensorboard log

     tensorboard --logdir='your_log_dir'
    

    Then navigate to http://localhost:6006.

  • To test the quantized model and BN fusion

    • convert to the quantized model for inference
     python3 test_fused_quant_model.py
    
    • test BN fusion on the float model
     python3 bn_fuse.py
    

    This fusion method is not suitable for quantized models. We plan to change the BN fusion in the future, following Section 3.2.2 of the paper.

    The BN fusion timings below are not a rigorous benchmark, but they are enough to show the effect qualitatively.

Model (on CPU)  Before fuse  After fuse
resnet-18       0.74 s       0.51 s
resnet-34       1.41 s       0.92 s
resnet-50       1.96 s       1.02 s
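
The float-model BN fusion timed above relies on standard batch-norm folding: at inference time, the per-channel scale and shift of a BN layer can be absorbed into the weights and bias of the preceding convolution or linear layer, so the BN layer is no longer computed. The sketch below shows that arithmetic in plain Python for a linear layer (a convolution folds the same way, per output channel). The helper names are illustrative assumptions and are not taken from `bn_fuse.py`.

```python
import math

def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN statistics into per-output-channel weights and biases."""
    fused_w, fused_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)           # per-channel BN scale
        fused_w.append([scale * w for w in w_c])
        fused_b.append((b_c - mu) * scale + bt)  # shifted and rescaled bias
    return fused_w, fused_b

def linear(x, weight, bias):
    """y_c = sum_i weight[c][i] * x[i] + bias[c]"""
    return [sum(w * xi for w, xi in zip(w_c, x)) + b
            for w_c, b in zip(weight, bias)]

def batchnorm_eval(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm using fixed running statistics."""
    return [g * (yi - mu) / math.sqrt(v + eps) + bt
            for yi, g, bt, mu, v in zip(y, gamma, beta, mean, var)]

# Tiny check: layer followed by BN gives the same output as the fused layer.
W, b = [[0.5, -1.0], [2.0, 0.25]], [0.1, -0.2]
gamma, beta, mean, var = [1.5, 0.8], [0.0, 0.3], [0.2, -0.1], [0.9, 1.6]
x = [1.0, 2.0]
reference = batchnorm_eval(linear(x, W, b), gamma, beta, mean, var)
fused = linear(x, *fold_bn(W, b, gamma, beta, mean, var))
```

Here `reference` and `fused` agree up to floating-point error. The equivalence holds only for float layers with fixed BN statistics, which is consistent with the note above that this fusion does not carry over directly to quantized models.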

To do

  • Train on imagenet2012
  • Fold bn
  • Test speedup from quantization and bn fold
  • Deploy models to embedded devices
  • ...