ModalAI Forum

    Neural network inference fails on VOXL2 Adreno GPU, but works on CPU, with Qualcomm SDK

    Software Development
    • dario-pisanti

      Hi,
      I hope you could help me with the following issue.

      SUMMARY:
      I am interested in running deep neural network inference on a VOXL2 using the Qualcomm Neural Processing SDK, ideally taking advantage of the onboard GPU and NPU.
      Specifically, I'm trying to run a pre-trained VGG-16 model from the ONNX framework, following the tutorial at https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-2/tutorial_onnx.html

      After successfully converting the model from ONNX to DLC format with the Qualcomm SDK, everything works fine when I run inference with the vgg16.dlc model (step 7 of the tutorial) on the VOXL2 CPU by running:

      cd $SNPE_ROOT/examples/Models/VGG/data/cropped
      snpe-net-run --input_list raw_list.txt --container ../../dlc/vgg16.dlc --output_dir ../../output
      

      with the expected output:

      -------------------------------------------------------------------------------
      Model String: N/A
      SNPE v2.15.4.231013125348_62905
      -------------------------------------------------------------------------------
      Processing DNN input(s):
      /opt/qcom/aistack/snpe/2.15.4.231013/examples/Models/VGG/data/cropped/kitten.raw
      Successfully executed!
      

      However, when I enable GPU usage by running:

      snpe-net-run --input_list raw_list.txt --container ../../dlc/vgg16.dlc --output_dir ../../output --use_gpu
      

      I get the following error:

      error_code=201; error_message=Casting of tensor failed. error_code=201; error_message=Casting of tensor failed. Failed to create input tensor: vgg0_dense0_weight_permute for Op: vgg0_dense0_fwd error: 1002; error_component=Dl System; line_no=817; thread_id=547788872288; error_component=Dl System; line_no=277; thread_id=547865747472
      

      In conclusion, why does the same model inference work on the VOXL2 CPU but not on its GPU? In addition, does anyone have experience running deep learning inference on the VOXL2 NPU with the Qualcomm SDKs?

      HOW TO REPRODUCE:
      I successfully set up the Qualcomm Neural Processing SDK on the VOXL2 following the instructions at
      https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-2/setup.html, using the binaries in $SNPE_ROOT/bin/aarch64-ubuntu-gcc7.5, and I modified $SNPE_ROOT/bin/envsetup.sh accordingly so that the environment variables are set up correctly; a rough sketch of that setup is shown below.
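
      For reference, the environment variables I export look roughly like this (a minimal sketch assuming the default install location from the setup guide; your paths may differ):

      # Sketch of the environment setup on the VOXL2 (aarch64-ubuntu-gcc7.5 build variant).
      # The install prefix below is the default from the setup guide; adjust to your installation.
      export SNPE_ROOT=/opt/qcom/aistack/snpe/2.15.4.231013
      export PATH=$SNPE_ROOT/bin/aarch64-ubuntu-gcc7.5:$PATH
      export LD_LIBRARY_PATH=$SNPE_ROOT/lib/aarch64-ubuntu-gcc7.5:$LD_LIBRARY_PATH
      export PYTHONPATH=$SNPE_ROOT/lib/python:$PYTHONPATH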

      I followed the instructions from step 1 to step 4 of the VGG tutorial at https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-2/tutori..., on the VOXL2.

      I converted the VGG ONNX model into the Qualcomm SDK DLC format (step 5) on a host machine running Ubuntu 20.04 with Clang 9 installed, where I set up the Qualcomm Neural Processing SDK using the binaries in $SNPE_ROOT/bin/x86_64-linux-clang (the conversion operation is not supported on the VOXL2 architecture); the conversion command looks roughly like the sketch below.
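
      For reference, the conversion on the host used the tutorial's snpe-onnx-to-dlc tool, roughly as follows (the input/output paths are from my local layout and may differ for you):

      # On the x86 host: convert the pre-trained VGG-16 ONNX model to DLC.
      cd $SNPE_ROOT/examples/Models/VGG
      snpe-onnx-to-dlc --input_network onnx/vgg16.onnx --output_path dlc/vgg16.dlc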

      I pushed the converted VGG model in DLC format to the VOXL2 (e.g. over adb, as sketched below) and followed the remaining instructions of the tutorial up to step 7, where I got the behaviour reported in the summary above.
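
      A minimal sketch of the transfer step, assuming adb and the same SDK install path on the VOXL2 (the destination directory here is an assumption based on my layout):

      # Copy the converted model from the host to the VOXL2 over adb.
      adb push vgg16.dlc /opt/qcom/aistack/snpe/2.15.4.231013/examples/Models/VGG/dlc/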

      VOXL2 SPECS:
      Architecture: Aarch64
      OS: Ubuntu 18.04
      CPU: Qualcomm® QRB5165: 8 cores up to 3.091 GHz, 8GB LPDDR5
      GPU: Adreno 650 GPU - 1024 ALU
      NPU: 15 TOPS AI embedded Neural Processing Unit
      ONNX PYTHON PACKAGES: onnx==1.14.1, onnxruntime==1.16.1

      HOST SPECS:
      Architecture: x86
      OS: Ubuntu 20.04
      CPU: Intel(R) Xeon(R) W-2125 8 cores @ 4.00GHz
      GPU: NVIDIA Corporation GP106GL [Quadro P2000]
      ONNX PYTHON PACKAGES: onnx==1.14.1, onnxruntime==1.16.1

      FURTHER DETAILS:
      I checked the availability of the GPU runtime on the VOXL2 by executing the snpe-platform-validator tool (available with the Qualcomm Neural Processing SDK) from my host machine:

      cd /opt/qcom/aistack/snpe/2.15.4.231013/bin/x86_64-linux-clang 
      python3 snpe-platform-validator-py --runtime="all" --directory=/opt/qcom/aistack/snpe/2.15.4.231013 --buildVariant="aarch64-ubuntu-gcc7.5"
      

      The platform validator results for GPU are:

      Runtime supported: Supported
      Library Prerequisites: Found
      Library Version: Not Queried
      Runtime Core Version: Not Queried
      Unit Test: Passed
      Overall Result: Passed
      
      • Moderator (ModalAI Team) @dario-pisanti

        @dario-pisanti Our efforts have focused on voxl-tflite-server. voxl-tflite-server can take advantage of the CPU, GPU and NPU depending on the model.
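
        One way to try it on the VOXL2, assuming the voxl-tflite-server package is installed and runs as a systemd service of the same name (the service name and commands here are a sketch, not verified on every SDK image):

        # Enable and start the tflite server, then follow its log output.
        systemctl enable voxl-tflite-server
        systemctl start voxl-tflite-server
        journalctl -u voxl-tflite-server -f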

        • Manu Bhardwaj 0 @dario-pisanti

          @dario-pisanti Hi Dario,

          Thanks for sharing your issue. I'm also working with the Qualcomm Neural Processing SDK on VOXL2 Mini.

          The error you mentioned could be due to:

          • Model compatibility issues during conversion.
          • SDK configuration, especially for the GPU.

          Double-check the GPU-specific settings in $SNPE_ROOT/bin/envsetup.sh and try running a simpler model to verify the GPU setup (a sketch follows below).
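
          For a quick GPU sanity check, something like the following with a smaller, already-converted model should help isolate the issue (the model name and paths here are illustrative, not from the VGG tutorial):

          # Run a smaller DLC model on the GPU runtime to check the setup in isolation.
          snpe-net-run --input_list raw_list.txt --container inception_v3.dlc --output_dir output_gpu --use_gpu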

          Have you managed to resolve this, or do you have any additional details? I'm also trying to use the QNN SDK with ONNX runtime on VOXL2 Mini.

          Best,
          Manu
