YOLOv8 with NPU (VOXL2 Mini)
-
Dear Dev Team,
My goal is to run the yolov8s-oiv model on the NPU. I've quantized the model to INT8 following this method:
```python
from ultralytics import YOLO

model = YOLO('yolov8s-oiv7.pt')
model.export(format='tflite', int8=True, data='dataset.yaml',
             imgsz=640, nms=False, single_cls=False)
```

where `dataset.yaml` points to the calibration dataset.
When I run the quantized model on the VOXL2 Mini (SDK 1.6.3), inference silently falls back to the CPU, even though I have edited `voxl-tflite-server.conf` to set the delegate to `nnapi`.
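For reference, the relevant part of my config is below (other fields omitted; exact key names may differ slightly between SDK versions):

```json
{
    ...
    "delegate": "nnapi",
    ...
}
```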
I'd really appreciate any guidance on best practices for quantizing YOLOv8 for the NPU, along with pointers to the relevant documentation. As a sanity check, could you also share a known-good quantized model that I can use to verify the NPU itself is functional?
Thanks,
Yash