
TensorRT YOLOv8 Instance Segmentation

Python scripts performing instance segmentation with the YOLOv8 model in Python 3.

[Image: ONNX YOLOv8 Instance Segmentation example output. Original image: Ultralytics]

This is the result of fine-tuning on COCO images: the model is able to detect the standard COCO classes within the image.

This represents my personal take on extensive online resources; since I could not replicate any of them verbatim, I developed a custom pipeline to execute YOLOv8-seg on TRT. I have only tested this on a Jetson TX2, so I cannot guarantee compatibility with other platforms. Although the process was challenging and the outcome is not flawless, I am sharing it to contribute to the community's progress, combining my research efforts and imperfect programming towards a more refined solution. The initial resources I used to get started are linked below; many congratulations and thanks to the authors, as I would not have been able to make any significant progress without their directions.

Requirements

  • Jetson Nano
    • nvidia@tegra-ubuntu:~/Documents/mybeatifulpath$ head -n 1 /etc/nv_tegra_release
      R32 (release), REVISION: 7.5, GCID: 36557527, BOARD: t210ref, EABI: aarch64, DATE: Tue Jun 11 23:12:44 UTC 2024

    • nvidia@tegra-ubuntu:~/Documents/mybeatifulpath$ python3 --version
      Python 3.6.9

  • Check the requirements.txt file.

Installation

git clone https://github.com/ibaiGorordo/ONNX-YOLOv8-Instance-Segmentation.git
pip install -r requirements.txt

1. Export ONNX model

Since ultralytics requires high-end resources and close-to-latest pip dependencies, it is highly suggested to export the model on a laptop, where you do not need to fight too much against CUDA and cuDNN versioning. To check your NVIDIA driver and CUDA version, run:

nvidia@tegra-ubuntu:~/Documents/mybeatifulpath$ nvidia-smi

This will return something like:

[Image: nvidia-smi output. The NVIDIA driver version is top left, while the CUDA version is top right.]

Assuming your GPU drivers are up to date, this only requires installing torch following the directions provided at: PyTorch Start Locally

For other CUDA versions that are not listed in the GUI, change:

--index-url https://download.pytorch.org/whl/cu118

into:

--index-url https://download.pytorch.org/whl/cu"YOUR_VERSION_HERE_NO_DOTS"

Pasting that URL into your browser will show the list of dependencies that will be installed along with torch (torchvision and torchaudio). For example, for CUDA 11.8 the full command is: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Now you can convert the PyTorch model to ONNX using the following Jupyter notebook: Jupyter notebook

N.B.: when you export the model, remember to set the same input size you will use to run inference with TRT, as in the sketch below.
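
For reference, here is a minimal sketch of the export step using the ultralytics Python API (the checkpoint name, input size, and opset below are assumptions; match imgsz to the resolution you will feed the TRT engine):

from ultralytics import YOLO

# Load the segmentation checkpoint (yolov8s-seg.pt is an assumption;
# use whichever variant you plan to deploy)
model = YOLO("yolov8s-seg.pt")

# Export to ONNX with a fixed input size (height, width); this must
# match the size used later for inference with TRT
model.export(format="onnx", imgsz=(832, 1088), opset=12)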

2. Build a TRT engine with trtexec

trtexec is a command-line wrapper that helps quickly utilize and prototype models with TensorRT, without requiring you to write your own inference application (link here).

trtexec comes preinstalled on Jetson platforms together with TensorRT. Its location should be: /usr/src/tensorrt/bin/trtexec

So, move the ONNX file generated on your laptop in the previous step to the board using scp or a USB drive, and run:

/usr/src/tensorrt/bin/trtexec  --onnx="yolov8-seg${LETTER}.onnx"  --saveEngine="yolov8-seg${LETTER}.engine"  --explicitBatch  >  trtexec_log.txt

For fp16 floating-point representation:

/usr/src/tensorrt/bin/trtexec  --onnx="yolov8-seg${LETTER}.onnx"  --saveEngine="yolov8-seg${LETTER}.engine"  --explicitBatch --fp16 >  trtexec_log.txt

to build the engine and save the log output to the trtexec_log.txt text file. The build process prints a lot of information, and even if some warnings pop up (depending on your TRT version), it should work flawlessly.
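
If you prefer building the engine programmatically instead of via trtexec, the TensorRT Python API that ships with JetPack can do the same job. This is a minimal sketch for the TensorRT 8.x bindings, not the project's own tooling; the workspace size is an assumption to tune for your board:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, engine_path, fp16=False):
    # Explicit-batch network, as required for ONNX models
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    # Parse the ONNX file and surface any parser errors
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parsing failed")

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28  # 256 MiB, an assumption; tune for your board
    if fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    # Build and serialize the engine to disk
    engine = builder.build_engine(network, config)
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())

build_engine("yolov8s-seg.onnx", "yolov8s-seg.engine", fp16=True)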


3. Run the inference

The engine parsing, together with the pre- and post-processing, has been condensed into a single Python file. Here the inference is done with pycuda, allocating memory from and to the GPU with high-level APIs. So simply run:

python3 Yolov8Seg_pycuda.py --engine_path yolov8s-seg.engine --image_path original_bus_832x1088.jpg --save_output True

To run with camera:

python3 Yolov8Seg_pycuda.py --engine_path yolov8s-seg.engine --camera True --camera_id 0 --width 640 --height 480 --save_output True
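
For reference, the engine loading and GPU buffer handling in the script follow the standard TensorRT + pycuda pattern. Below is a simplified sketch of that pattern, not the actual script; it uses the pre-TensorRT-10 binding API available on JetPack:

import numpy as np
import tensorrt as trt
import pycuda.autoinit  # creates and activates a CUDA context
import pycuda.driver as cuda

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine built with trtexec
with open("yolov8s-seg.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate page-locked host buffers and device buffers for every binding
inputs, outputs, bindings = [], [], []
stream = cuda.Stream()
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding))
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    host_mem = cuda.pagelocked_empty(size, dtype)
    device_mem = cuda.mem_alloc(host_mem.nbytes)
    bindings.append(int(device_mem))
    if engine.binding_is_input(binding):
        inputs.append((host_mem, device_mem))
    else:
        outputs.append((host_mem, device_mem))

def infer(preprocessed):
    # Host -> device copy, execution, device -> host copy, all on one stream
    np.copyto(inputs[0][0], preprocessed.ravel())
    cuda.memcpy_htod_async(inputs[0][1], inputs[0][0], stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for host_mem, device_mem in outputs:
        cuda.memcpy_dtoh_async(host_mem, device_mem, stream)
    stream.synchronize()
    return [host for host, _ in outputs]

The raw outputs (detections plus mask prototypes) still need the YOLOv8-seg post-processing implemented in Yolov8Seg_pycuda.py.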

Sources

Again, a big thank you to the main authors:

Other state-of-the-art resources
