Hi @modaltb, @Chad-Sweet,
I hope you can help me with this issue:
SUMMARY:
I deployed my own .tflite model on VOXL2 by customizing the inference helper class (inference_helper.cpp) of the voxl-tflite-server. My model is supposed to take two input images that are pre-loaded on board and perform image matching; no input is taken from the VOXL cameras.
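For context, the customized inference path is conceptually equivalent to the minimal Python sketch below (the model path and the dummy inputs are placeholders, not the actual voxl-tflite-server code, and the Python interpreter can only run the SELECT_TF_OPS variant of the model described further down):

import numpy as np
import tensorflow as tf

# Illustrative only: in the real pipeline the two tensors are the pre-loaded,
# preprocessed images, and the model path points to the deployed .tflite file.
interpreter = tf.lite.Interpreter(model_path="path/to/model.tflite")
interpreter.allocate_tensors()
inputs = interpreter.get_input_details()

image0 = np.zeros(inputs[0]["shape"], dtype=inputs[0]["dtype"])
image1 = np.zeros(inputs[1]["shape"], dtype=inputs[1]["dtype"])

interpreter.set_tensor(inputs[0]["index"], image0)
interpreter.set_tensor(inputs[1]["index"], image1)
interpreter.invoke()
outputs = [interpreter.get_tensor(d["index"]) for d in interpreter.get_output_details()]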
When I run the server, it fails to apply GPU delegate, as shown in this output:
(base) voxl2:/$ voxl-tflite-server
=================================================================
skip_n_frames: 0
=================================================================
model: /usr/bin/dnn/outdoor_ds_640_ONNXop12_TFv2.8_ExpNewConv_custOps_float16.tflite
=================================================================
input_pipe: /run/mpa/hires/
=================================================================
delegate: gpu
=================================================================
allow_multiple: false
=================================================================
output_pipe_prefix: mobilenet
=================================================================
existing instance of voxl-tflite-server found, attempting to stop it
INFO: Created TensorFlow Lite delegate for GPU.
Failed to apply GPU delegate
------VOXL TFLite Server------
It also fails to apply the XNNPACK and NNAPI delegates.
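To narrow down which ops the GPU delegate cannot handle, I think the TFLite model analyzer can be used (it requires a newer TensorFlow than 2.8, 2.9+ as far as I know); a minimal sketch:

import tensorflow as tf

# Prints the model structure and flags the ops that are not compatible with
# the TFLite GPU delegate (requires tensorflow>=2.9).
tf.lite.experimental.Analyzer.analyze(
    model_path="/usr/bin/dnn/outdoor_ds_640_ONNXop12_TFv2.8_ExpNewConv_custOps_float16.tflite",
    gpu_compatibility=True,
)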
For the deployment on VOXL2, I modified the following files of the voxl-tflite-server:
- ./src/inference_helper.cpp
- ./include/inference_helper.h
- ./src/main.cpp
- ./scripts/qrb5165/voxl-configure-tflite
VOXL2 SPECS:
Architecture: AArch64
OS: Ubuntu 18.04
CPU: Qualcomm QRB5165: 8 cores up to 3.091 GHz, 8GB LPDDR5
GPU: Adreno 650 GPU - 1024 ALU
NPU: 15 TOPS AI embedded Neural Processing Unit
HOST (from which the voxl-tflite-server is deployed):
Architecture: x86
OS: Ubuntu 20.04
CPU: Intel(R) Xeon(R) W-2125 8 cores @ 4.00GHz
GPU: NVIDIA Corporation GP106GL [Quadro P2000]
MODEL CONVERSION DETAILS:
I converted my TensorFlow model to .tflite with post-training quantization, following the Python instructions at https://docs.modalai.com/voxl-tflite-server/
This is my Python code for the conversion with tensorflow==2.8.0, following the v2.8 API:
import tensorflow as tf

# Load the TensorFlow SavedModel into the converter
converter = tf.lite.TFLiteConverter.from_saved_model(tf_model_path)
# Set converter flags
converter.experimental_new_converter = True
converter.allow_custom_ops = True
# Post-training float16 quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
# Convert the model and save it to disk
tflite_model = converter.convert()
with open(tflite_model_path, 'wb') as f:
    f.write(tflite_model)
The model converts successfully, although the following warning messages are shown in the output:
2023-12-08 19:34:18.409799: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-08 19:34:19.967671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2945 MB memory: -> device: 0, name: Quadro P2000, pci bus id: 0000:65:00.0, compute capability: 6.1
2023-12-08 20:12:28.357684: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:357] Ignored output_format.
2023-12-08 20:12:28.357739: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:360] Ignored drop_control_dependency.
2023-12-08 20:12:28.359555: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: models/LoFTR/weights/outdoor_ds_640_ONNXop12_TFv2.8
2023-12-08 20:12:28.437131: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2023-12-08 20:12:28.437171: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: models/LoFTR/weights/outdoor_ds_640_ONNXop12_TFv2.8
2023-12-08 20:12:28.618928: I tensorflow/cc/saved_model/loader.cc:228] Restoring SavedModel bundle.
2023-12-08 20:12:29.814406: I tensorflow/cc/saved_model/loader.cc:212] Running initialization op on SavedModel bundle at path: models/LoFTR/weights/outdoor_ds_640_ONNXop12_TFv2.8
2023-12-08 20:12:30.886233: I tensorflow/cc/saved_model/loader.cc:301] SavedModel load for tags { serve }; Status: success: OK. Took 2526683 microseconds.
2023-12-08 20:12:32.444671: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:237] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-12-08 20:12:34.744161: W tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1903] The following operation(s) need TFLite custom op implementation(s):
Custom ops: Cast, Range, RealDiv, StridedSlice, Transpose
Details:
tf.Cast(tensor<1xf64>) -> (tensor<1xi64>) : {Truncate = false, device = ""}
tf.Cast(tensor<1xi64>) -> (tensor<1xf64>) : {Truncate = false, device = ""}
tf.Cast(tensor<?xf64>) -> (tensor<?xi64>) : {Truncate = false, device = ""}
tf.Cast(tensor<?xi64>) -> (tensor<?xf64>) : {Truncate = false, device = ""}
tf.Cast(tensor<f64>) -> (tensor<i64>) : {Truncate = false, device = ""}
tf.Cast(tensor<i64>) -> (tensor<f64>) : {Truncate = false, device = ""}
tf.Range(tensor<i64>, tensor<i64>, tensor<i64>) -> (tensor<?xi64>) : {device = ""}
tf.RealDiv(tensor<1xf64>, tensor<1xf64>) -> (tensor<1xf64>) : {device = ""}
tf.RealDiv(tensor<?xf64>, tensor<f64>) -> (tensor<?xf64>) : {device = ""}
tf.RealDiv(tensor<f64>, tensor<f64>) -> (tensor<f64>) : {device = ""}
tf.StridedSlice(tensor<5x2x60x80x60x1xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>) -> (tensor<2x60x80x60x1xi64>) : {begin_mask = 0 : i64, device = "", ellipsis_mask = 0 : i64, end_mask = 0 : i64, new_axis_mask = 0 : i64, shrink_axis_mask = 1 : i64}
tf.Transpose(tensor<1x128x5x60x5x80xf32>, tensor<6xi32>) -> (tensor<1x128x5x5x60x80xf32>) : {device = ""}
See instructions: https://www.tensorflow.org/lite/guide/ops_custom
If, in the Python conversion code, I disable the allow_custom_ops flag and enable the supported ops instead, as shown in this diff:
- converter.allow_custom_ops = True
+ converter.target_spec.supported_ops = [
+     tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
+     tf.lite.OpsSet.SELECT_TF_OPS,    # enable TensorFlow ops.
+ ]
then the last warnings in the output turn into the following:
2023-12-08 14:35:09.243817: W tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1892] TFLite interpreter needs to link Flex delegate in order to run the model since it contains the following Select TFop(s):
Flex ops: FlexCast, FlexRange, FlexRealDiv, FlexStridedSlice, FlexTranspose
Details:
tf.Cast(tensor<1xf64>) -> (tensor<1xi64>) : {Truncate = false, device = ""}
tf.Cast(tensor<1xi64>) -> (tensor<1xf64>) : {Truncate = false, device = ""}
tf.Cast(tensor<?xf64>) -> (tensor<?xi64>) : {Truncate = false, device = ""}
tf.Cast(tensor<?xi64>) -> (tensor<?xf64>) : {Truncate = false, device = ""}
tf.Cast(tensor<f64>) -> (tensor<i64>) : {Truncate = false, device = ""}
tf.Cast(tensor<i64>) -> (tensor<f64>) : {Truncate = false, device = ""}
tf.Range(tensor<i64>, tensor<i64>, tensor<i64>) -> (tensor<?xi64>) : {device = ""}
tf.RealDiv(tensor<1xf64>, tensor<1xf64>) -> (tensor<1xf64>) : {device = ""}
tf.RealDiv(tensor<?xf64>, tensor<f64>) -> (tensor<?xf64>) : {device = ""}
tf.RealDiv(tensor<f64>, tensor<f64>) -> (tensor<f64>) : {device = ""}
tf.StridedSlice(tensor<5x2x60x80x60x1xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>) -> (tensor<2x60x80x60x1xi64>) : {begin_mask = 0 : i64, device = "", ellipsis_mask = 0 : i64, end_mask = 0 : i64, new_axis_mask = 0 : i64, shrink_axis_mask = 1 : i64}
tf.Transpose(tensor<1x128x5x60x5x80xf32>, tensor<6xi32>) -> (tensor<1x128x5x5x60x80xf32>) : {device = ""}
See instructions: https://www.tensorflow.org/lite/guide/ops_select
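As a side note, a SELECT_TF_OPS model like this one can be tested directly in Python, since the full TensorFlow pip package already links the Flex delegate into its interpreter; a minimal check (the model path is a placeholder) could be:

import tensorflow as tf

# The standard tensorflow pip package bundles the Flex (select TF ops) kernels,
# so no extra delegate needs to be loaded here.
interpreter = tf.lite.Interpreter(model_path="path/to/select_tf_ops_model.tflite")
interpreter.allocate_tensors()
print(interpreter.get_input_details())
print(interpreter.get_output_details())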
When using the .tflite model generated with these new flags, the voxl-tflite-server still fails to apply the GPU delegate.
FURTHER DETAILS:
I tested the same .tflite model in C++ by building TensorFlow Lite with CMake on macOS Ventura 13.6 (x86), following the instructions at https://www.tensorflow.org/lite/guide/build_cmake
I built the Flex delegate shared library "libtensorflowlite_flex.so" with the following command (see the instructions at https://www.tensorflow.org/lite/guide/ops_select):
bazel build -c opt --config=monolithic tensorflow/lite/delegates/flex:tensorflowlite_flex
and I linked it into the application that runs my model.
I was able to successfully run inference with the model and got correct output.