Depth prediction from monocular images for Autonomous Driving.
We solve this problem using a multi-scale regression CNN.
In this project we used the ApolloScape dataset and a custom dataset we created with Microsoft's AirSim on the Unity engine.
The ApolloScape dataset consists of 23k images; our custom dataset consists of 99k images.
We clipped the depth images at 80 m instead of 165 m, to focus on near-sight vision (depth up to 80 m).
We also removed the upper half of both the RGB and depth images to exclude the sky from the scene.
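A minimal sketch of this preprocessing, assuming depth is stored in metres in a single-channel NumPy array (the function name and defaults are illustrative, not from this repo):

```python
import numpy as np

def preprocess(rgb, depth, max_depth=80.0):
    """Clip depth at max_depth and drop the upper (sky) half of both images.

    rgb:   (H, W, 3) uint8 array
    depth: (H, W) float array with depth in metres
    """
    h = rgb.shape[0]
    depth = np.clip(depth, 0.0, max_depth)  # focus on near-sight vision (<= 80 m)
    rgb = rgb[h // 2:, :, :]                # keep only the lower half
    depth = depth[h // 2:, :]
    return rgb, depth
```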
The model architecture we used is:
The DCN block referenced above is:
Sample results (left to right): input, ground truth, output.
Here is the plot of the losses:
L1 is the mean absolute difference at 1/64 of the original resolution
L2 is the mean absolute difference at 1/16 of the original resolution
L3 is the mean absolute difference at 1/4 of the original resolution
L4 is the mean absolute difference at the original resolution
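For illustration, a sketch of how such a multi-scale mean-absolute-difference loss can be computed in TensorFlow 2; the equal weighting of the four terms is an assumption:

```python
import tensorflow as tf

def multiscale_l1_loss(gt_depth, preds):
    """gt_depth: (B, H, W, 1) tensor; preds: list of predictions at
    1/64, 1/16, 1/4 and full resolution."""
    total = 0.0
    for pred in preds:
        # Downsample the ground truth to the resolution of this prediction head.
        gt = tf.image.resize(gt_depth, tf.shape(pred)[1:3])
        total += tf.reduce_mean(tf.abs(gt - pred))  # mean absolute difference
    return total
```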
Here is the mean and standard deviation plot:
The mean and standard deviation plot shows that the predicted depth follows a gamma-like curve. Since the predicted depth should be linear in (ideally identical to) the ground-truth depth, we can apply a gamma correction to the output depth image to obtain better results.
We applied gamma correction to the predicted depth images:
Output image (left) and output image with gamma correction (right).
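A minimal sketch of this gamma-correction step, assuming the depth maps are normalized by the 80 m clip value before applying the exponent:

```python
import numpy as np

def gamma_correct(depth, gamma=0.5, max_depth=80.0):
    """Apply gamma correction to a predicted depth map clipped at max_depth."""
    norm = np.clip(depth, 0.0, max_depth) / max_depth  # scale to [0, 1]
    return (norm ** gamma) * max_depth                 # back to metres
```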
The R² scores for the different cases are as follows:
| Case | R² score |
|---|---|
| Output without gamma correction | 0.700598 |
| Output with GC (gamma = 0.8) | 0.757086 |
| Output with GC (gamma = 0.5) | 0.931790 |
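These scores can be reproduced with scikit-learn's r2_score on flattened depth maps; a sketch, assuming gt_depth and pred_depth are same-shape NumPy arrays:

```python
from sklearn.metrics import r2_score

# Flatten both maps and compare pixel-wise depth values.
score = r2_score(gt_depth.ravel(), pred_depth.ravel())
print(f"R2 score: {score:.6f}")
```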
Pseudo-color result images (left to right): ground truth, original output, output with gamma correction.
The entire codebase is compatible with TensorFlow 2.0. To install the other Python libraries, please use the requirements.txt file:
$ pip install -r requirements.txt
This text file contains all the requirements except OpenCV (cv2), which must be installed separately.
To train and validate your model, create two .json files named train.json and val.json in the following format:
{
    "<full path and name of input image>": "<full path and name of the corresponding ground truth image>"
}
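For example, such a pair file can be generated with a short script like the one below; the directory paths are hypothetical and it assumes the RGB and ground-truth images share file names:

```python
import json
import os

rgb_dir = "./data/rgb"      # hypothetical input-image directory
depth_dir = "./data/depth"  # hypothetical ground-truth directory

pairs = {
    os.path.abspath(os.path.join(rgb_dir, name)):
        os.path.abspath(os.path.join(depth_dir, name))
    for name in sorted(os.listdir(rgb_dir))
}
with open("train.json", "w") as fp:
    json.dump(pairs, fp, indent=4)
```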
First and foremost, please download the model checkpoint from here and save the uncompressed files inside ./tmp.
There are two things you can do here:
1. Train from scratch:
$ python execute.py -m train
2. Test on images in the ./test folder. Please make sure that the images you provide have a 4:3 (W:H) aspect ratio. Each image is first resized to (800, 600); the input to the network is IMAGE[H-288:H, W] (the bottom 288 rows), and the result is the depth estimate stitched side by side with the given input.
$ python execute.py -m test
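For reference, a minimal sketch of this test-time input handling, assuming OpenCV and a hypothetical file name:

```python
import cv2

img = cv2.imread("test/sample.png")  # hypothetical test image, 4:3 (W:H)
img = cv2.resize(img, (800, 600))    # resize to W=800, H=600
h = img.shape[0]
net_input = img[h - 288:, :]         # bottom 288 rows: IMAGE[H-288:H, W]
```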