Fail to apply GPU delegate with custom model on voxl-tflite-server
Hi @modaltb, @Chad-Sweet,
I hope you can help me with this issue:
SUMMARY:
I deployed my own .tflite model on VOXL2 by properly customizing the inference_helper.cpp class of the voxl-tflite-server. My model is supposed to take two input images pre-loaded on-board and perform image matching. No input is taken from the voxl cameras.
When I run the server, it fails to apply the GPU delegate, as shown in this output:
(base) voxl2:/$ voxl-tflite-server
=================================================================
skip_n_frames: 0
=================================================================
model: /usr/bin/dnn/outdoor_ds_640_ONNXop12_TFv2.8_ExpNewConv_custOps_float16.tflite
=================================================================
input_pipe: /run/mpa/hires/
=================================================================
delegate: gpu
=================================================================
allow_multiple: false
=================================================================
output_pipe_prefix: mobilenet
=================================================================
existing instance of voxl-tflite-server found, attempting to stop it
INFO: Created TensorFlow Lite delegate for GPU.
Failed to apply GPU delegate
------VOXL TFLite Server------

It also failed to apply the XNNPACK and NNAPI delegates.
For the deployment on VOXL2, I modified the following files of the voxl-tflite-server:
- ./src/inference_helper.cpp
- ./include/inference_helper.h
- ./src/main.cpp
- ./scripts/qrb5165/voxl-configure-tflite
VOXL2 SPECS:
Architecture: Aarch64
OS: Ubuntu 18.04
CPU: Qualcomm QRB5165, 8 cores up to 3.091 GHz, 8 GB LPDDR5
GPU: Adreno 650 GPU - 1024 ALU
NPU: 15 TOPS AI embedded Neural Processing Unit
HOST (from which the voxl-tflite-server is deployed):
Architecture: x86
OS: Ubuntu 20.04
CPU: Intel(R) Xeon(R) W-2125 8 cores @ 4.00GHz
GPU: NVIDIA Corporation GP106GL [Quadro P2000]
MODEL CONVERSION DETAILS:
I converted my .tflite model from a TensorFlow model with a post-training quantization as in the Python instructions at https://docs.modalai.com/voxl-tflite-server/
This is my Python code for the conversion with tensorflow==2.8.0, following v2.8 API:
import tensorflow as tf

# Load the TensorFlow model
converter = tf.lite.TFLiteConverter.from_saved_model(tf_model_path)

# Set converter flags
converter.experimental_new_converter = True
converter.allow_custom_ops = True

# Post-training quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

# Model conversion and saving
tflite_model = converter.convert()
with open(tflite_model_path, 'wb') as f:
    f.write(tflite_model)

The model is converted, although these warning messages are shown in the output:
2023-12-08 19:34:18.409799: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-08 19:34:19.967671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2945 MB memory: -> device: 0, name: Quadro P2000, pci bus id: 0000:65:00.0, compute capability: 6.1
2023-12-08 20:12:28.357684: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:357] Ignored output_format.
2023-12-08 20:12:28.357739: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:360] Ignored drop_control_dependency.
2023-12-08 20:12:28.359555: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: models/LoFTR/weights/outdoor_ds_640_ONNXop12_TFv2.8
2023-12-08 20:12:28.437131: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2023-12-08 20:12:28.437171: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: models/LoFTR/weights/outdoor_ds_640_ONNXop12_TFv2.8
2023-12-08 20:12:28.618928: I tensorflow/cc/saved_model/loader.cc:228] Restoring SavedModel bundle.
2023-12-08 20:12:29.814406: I tensorflow/cc/saved_model/loader.cc:212] Running initialization op on SavedModel bundle at path: models/LoFTR/weights/outdoor_ds_640_ONNXop12_TFv2.8
2023-12-08 20:12:30.886233: I tensorflow/cc/saved_model/loader.cc:301] SavedModel load for tags { serve }; Status: success: OK. Took 2526683 microseconds.
2023-12-08 20:12:32.444671: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:237] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-12-08 20:12:34.744161: W tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1903] The following operation(s) need TFLite custom op implementation(s):
Custom ops: Cast, Range, RealDiv, StridedSlice, Transpose
Details:
    tf.Cast(tensor<1xf64>) -> (tensor<1xi64>) : {Truncate = false, device = ""}
    tf.Cast(tensor<1xi64>) -> (tensor<1xf64>) : {Truncate = false, device = ""}
    tf.Cast(tensor<?xf64>) -> (tensor<?xi64>) : {Truncate = false, device = ""}
    tf.Cast(tensor<?xi64>) -> (tensor<?xf64>) : {Truncate = false, device = ""}
    tf.Cast(tensor<f64>) -> (tensor<i64>) : {Truncate = false, device = ""}
    tf.Cast(tensor<i64>) -> (tensor<f64>) : {Truncate = false, device = ""}
    tf.Range(tensor<i64>, tensor<i64>, tensor<i64>) -> (tensor<?xi64>) : {device = ""}
    tf.RealDiv(tensor<1xf64>, tensor<1xf64>) -> (tensor<1xf64>) : {device = ""}
    tf.RealDiv(tensor<?xf64>, tensor<f64>) -> (tensor<?xf64>) : {device = ""}
    tf.RealDiv(tensor<f64>, tensor<f64>) -> (tensor<f64>) : {device = ""}
    tf.StridedSlice(tensor<5x2x60x80x60x1xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>) -> (tensor<2x60x80x60x1xi64>) : {begin_mask = 0 : i64, device = "", ellipsis_mask = 0 : i64, end_mask = 0 : i64, new_axis_mask = 0 : i64, shrink_axis_mask = 1 : i64}
    tf.Transpose(tensor<1x128x5x60x5x80xf32>, tensor<6xi32>) -> (tensor<1x128x5x5x60x80xf32>) : {device = ""}
See instructions: https://www.tensorflow.org/lite/guide/ops_custom

If in the Python conversion code I disable the allow_custom_ops flag and I enable the supported ops as shown in this code snippet:
- converter.allow_custom_ops = True
+ converter.target_spec.supported_ops = [
+     tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
+     tf.lite.OpsSet.SELECT_TF_OPS  # enable TensorFlow ops.
+ ]

The last output warnings turn into these ones:
2023-12-08 14:35:09.243817: W tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1892] TFLite interpreter needs to link Flex delegate in order to run the model since it contains the following Select TFop(s):
Flex ops: FlexCast, FlexRange, FlexRealDiv, FlexStridedSlice, FlexTranspose
Details:
    tf.Cast(tensor<1xf64>) -> (tensor<1xi64>) : {Truncate = false, device = ""}
    tf.Cast(tensor<1xi64>) -> (tensor<1xf64>) : {Truncate = false, device = ""}
    tf.Cast(tensor<?xf64>) -> (tensor<?xi64>) : {Truncate = false, device = ""}
    tf.Cast(tensor<?xi64>) -> (tensor<?xf64>) : {Truncate = false, device = ""}
    tf.Cast(tensor<f64>) -> (tensor<i64>) : {Truncate = false, device = ""}
    tf.Cast(tensor<i64>) -> (tensor<f64>) : {Truncate = false, device = ""}
    tf.Range(tensor<i64>, tensor<i64>, tensor<i64>) -> (tensor<?xi64>) : {device = ""}
    tf.RealDiv(tensor<1xf64>, tensor<1xf64>) -> (tensor<1xf64>) : {device = ""}
    tf.RealDiv(tensor<?xf64>, tensor<f64>) -> (tensor<?xf64>) : {device = ""}
    tf.RealDiv(tensor<f64>, tensor<f64>) -> (tensor<f64>) : {device = ""}
    tf.StridedSlice(tensor<5x2x60x80x60x1xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>) -> (tensor<2x60x80x60x1xi64>) : {begin_mask = 0 : i64, device = "", ellipsis_mask = 0 : i64, end_mask = 0 : i64, new_axis_mask = 0 : i64, shrink_axis_mask = 1 : i64}
    tf.Transpose(tensor<1x128x5x60x5x80xf32>, tensor<6xi32>) -> (tensor<1x128x5x5x60x80xf32>) : {device = ""}
See instructions: https://www.tensorflow.org/lite/guide/ops_select

By using the .tflite model generated with these new flags, the voxl-tflite-server still fails to apply the GPU delegate.
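To make the converter warnings above easier to read, here is a minimal sketch that parses the "Details:" entries and groups, per op, the properties that might be a problem for a GPU delegate. The function name `summarize_ops` and the flagging rules are my own assumptions, not anything documented for voxl-tflite-server: I am assuming that 64-bit dtypes (f64, i64), tensors of rank above 4, and dynamic dimensions (`?`) are the likely suspects, and that would need to be confirmed against the TensorFlow Lite GPU delegate documentation.

```python
import re
from collections import defaultdict

# One entry of the converter's "Details:" list, e.g.
#   tf.Cast(tensor<1xf64>) -> (tensor<1xi64>)
OP_RE = re.compile(r"tf\.(\w+)\((.*?)\) -> \((.*?)\)")
# One tensor type, e.g. tensor<?xf64> or tensor<i64>
TENSOR_RE = re.compile(r"tensor<([^>]*)>")

# Assumed (not verified on VOXL2) limits of the GPU delegate.
SUSPECT_DTYPES = {"f64", "i64"}
MAX_RANK = 4


def summarize_ops(log_text):
    """Return {op_name: sorted list of suspicious properties} for a warning log."""
    reasons = defaultdict(set)
    for op, inputs, outputs in OP_RE.findall(log_text):
        for spec in TENSOR_RE.findall(inputs + " " + outputs):
            # "5x2x60xi64" -> dims ["5", "2", "60"], dtype "i64"
            *dims, dtype = spec.split("x")
            if dtype in SUSPECT_DTYPES:
                reasons[op].add(f"dtype {dtype}")
            if len(dims) > MAX_RANK:
                reasons[op].add(f"rank {len(dims)}")
            if "?" in dims:
                reasons[op].add("dynamic shape")
    return {op: sorted(found) for op, found in reasons.items()}


# A few entries copied from the warning above.
sample_log = """
tf.Cast(tensor<?xf64>) -> (tensor<?xi64>) : {Truncate = false, device = ""}
tf.Range(tensor<i64>, tensor<i64>, tensor<i64>) -> (tensor<?xi64>) : {device = ""}
tf.Transpose(tensor<1x128x5x60x5x80xf32>, tensor<6xi32>) -> (tensor<1x128x5x5x60x80xf32>) : {device = ""}
"""

for op_name, found in summarize_ops(sample_log).items():
    print(op_name, "->", ", ".join(found))
```

The same function can be run on the full converter output; every op listed in the warnings ends up with at least one flagged property under these assumed rules.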
FURTHER DETAILS:
I tested the same .tflite model in C++ by building TensorFlow Lite with CMake on macOS Ventura 13.6 (x86), following the instructions at https://www.tensorflow.org/lite/guide/build_cmake
I built the Flex delegate shared library "libtensorflowlite_flex.so" with the following command (see instructions at https://www.tensorflow.org/lite/guide/ops_select):
bazel build -c opt --config=monolithic tensorflow/lite/delegates/flex:tensorflowlite_flex
and I linked it to my model.
I was able to successfully run an inference of the model and get the correct output.
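Since the model runs on the CPU with the Flex delegate linked, one possible next check on the host is TensorFlow's model analyzer with its GPU compatibility option. The sketch below assumes a TensorFlow release in which `tf.lite.experimental.Analyzer.analyze` accepts `gpu_compatibility=True` (as far as I know this is newer than the 2.8.0 used for the conversion here). The helper names `gpu_compat_warnings` and `analyze_for_gpu` are hypothetical, and the "GPU COMPATIBILITY" marker used to filter the report is my assumption about the report wording.

```python
import contextlib
import io


def gpu_compat_warnings(report):
    """Keep only the report lines that mention GPU compatibility."""
    # Assumption: the analyzer prefixes such lines with "GPU COMPATIBILITY".
    return [line.strip() for line in report.splitlines()
            if "GPU COMPATIBILITY" in line]


def analyze_for_gpu(model_path):
    """Run the TFLite analyzer on a .tflite file and return its GPU warnings."""
    # Imported lazily so the filter above can be used without TensorFlow.
    import tensorflow as tf

    buffer = io.StringIO()
    # The analyzer prints its report, so capture stdout while it runs.
    with contextlib.redirect_stdout(buffer):
        tf.lite.experimental.Analyzer.analyze(
            model_path=model_path, gpu_compatibility=True)
    return gpu_compat_warnings(buffer.getvalue())


# Example call (the path is a placeholder for the converted model):
# for warning in analyze_for_gpu("model.tflite"):
#     print(warning)
```

This only reports what the host-side TensorFlow build considers GPU-compatible; the delegate build on VOXL2 may behave differently.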