Fail to apply GPU delegate with custom model on voxl-tflite-server

    dario-pisanti wrote:

    Hi @modaltb, @Chad-Sweet,

    I hope you can help me with this issue:

    SUMMARY:

    I deployed my own .tflite model on VOXL2 by customizing inference_helper.cpp in voxl-tflite-server. The model takes two input images preloaded on board and performs image matching. It takes no input from the VOXL cameras.

    When I run the server, it fails to apply the GPU delegate, as shown in this output:

    (base) voxl2:/$ voxl-tflite-server 
    
    ================================================================= 
    skip_n_frames:                    0 
    ================================================================= 
    model:                            /usr/bin/dnn/outdoor_ds_640_ONNXop12_TFv2.8_ExpNewConv_custOps_float16.tflite 
    ================================================================= 
    input_pipe:                       /run/mpa/hires/ 
    ================================================================= 
    delegate:                         gpu 
    ================================================================= 
    allow_multiple:                   false 
    ================================================================= 
    output_pipe_prefix:               mobilenet 
    ================================================================= 
    
    existing instance of voxl-tflite-server found, attempting to stop it 
    
    INFO: Created TensorFlow Lite delegate for GPU. 
    
    Failed to apply GPU delegate 
    
    ------VOXL TFLite Server------ 
    
    

    It also fails to apply the XNNPACK and NNAPI delegates.

    For the deployment on VOXL2, I modified the following files of voxl-tflite-server:

    • ./src/inference_helper.cpp
    • ./include/inference_helper.h
    • ./src/main.cpp
    • ./scripts/qrb5165/voxl-configure-tflite

    VOXL2 SPECS:
    Architecture: Aarch64
    OS: Ubuntu 18.04
    CPU: Qualcomm® QRB5165: 8 cores up to 3.091 GHz, 8GB LPDDR5
    GPU: Adreno 650 GPU - 1024 ALU
    NPU: 15 TOPS AI embedded Neural Processing Unit

    HOST (from which voxl-tflite-server is deployed):
    Architecture: x86
    OS: Ubuntu 20.04
    CPU: Intel(R) Xeon(R) W-2125 8 cores @ 4.00GHz
    GPU: NVIDIA Corporation GP106GL [Quadro P2000]

    MODEL CONVERSION DETAILS:

    I converted my TensorFlow model to .tflite with post-training quantization, following the Python instructions at https://docs.modalai.com/voxl-tflite-server/

    This is my Python code for the conversion with tensorflow==2.8.0, following v2.8 API:

    import tensorflow as tf

    # tf_model_path (SavedModel directory) and tflite_model_path (output file)
    # are assumed to be defined earlier in the script.

    # Load the TensorFlow SavedModel
    converter = tf.lite.TFLiteConverter.from_saved_model(tf_model_path)

    # Set converter flags
    converter.experimental_new_converter = True
    converter.allow_custom_ops = True

    # Post-training float16 quantization
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_types = [tf.float16]

    # Convert the model and save it
    tflite_model = converter.convert()
    with open(tflite_model_path, 'wb') as f:
        f.write(tflite_model)
    

    The model converts, but the output shows these warning messages:

    2023-12-08 19:34:18.409799: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA 
    
    To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
    
    2023-12-08 19:34:19.967671: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2945 MB memory:  -> device: 0, name: Quadro P2000, pci bus id: 0000:65:00.0, compute capability: 6.1 
    
    2023-12-08 20:12:28.357684: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:357] Ignored output_format. 
    
    2023-12-08 20:12:28.357739: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:360] Ignored drop_control_dependency. 
    
    2023-12-08 20:12:28.359555: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: models/LoFTR/weights/outdoor_ds_640_ONNXop12_TFv2.8 
    
    2023-12-08 20:12:28.437131: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve } 
    
    2023-12-08 20:12:28.437171: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: models/LoFTR/weights/outdoor_ds_640_ONNXop12_TFv2.8 
    
    2023-12-08 20:12:28.618928: I tensorflow/cc/saved_model/loader.cc:228] Restoring SavedModel bundle. 
    
    2023-12-08 20:12:29.814406: I tensorflow/cc/saved_model/loader.cc:212] Running initialization op on SavedModel bundle at path: models/LoFTR/weights/outdoor_ds_640_ONNXop12_TFv2.8 
    
    2023-12-08 20:12:30.886233: I tensorflow/cc/saved_model/loader.cc:301] SavedModel load for tags { serve }; Status: success: OK. Took 2526683 microseconds. 
    
    2023-12-08 20:12:32.444671: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:237] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable. 
    
    2023-12-08 20:12:34.744161: W tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1903] The following operation(s) need TFLite custom op implementation(s): 
    
    Custom ops: Cast, Range, RealDiv, StridedSlice, Transpose 
    
    Details: 
    
            tf.Cast(tensor<1xf64>) -> (tensor<1xi64>) : {Truncate = false, device = ""} 
    
            tf.Cast(tensor<1xi64>) -> (tensor<1xf64>) : {Truncate = false, device = ""} 
    
            tf.Cast(tensor<?xf64>) -> (tensor<?xi64>) : {Truncate = false, device = ""} 
    
            tf.Cast(tensor<?xi64>) -> (tensor<?xf64>) : {Truncate = false, device = ""} 
    
            tf.Cast(tensor<f64>) -> (tensor<i64>) : {Truncate = false, device = ""} 
    
            tf.Cast(tensor<i64>) -> (tensor<f64>) : {Truncate = false, device = ""} 
    
            tf.Range(tensor<i64>, tensor<i64>, tensor<i64>) -> (tensor<?xi64>) : {device = ""} 
    
            tf.RealDiv(tensor<1xf64>, tensor<1xf64>) -> (tensor<1xf64>) : {device = ""} 
    
            tf.RealDiv(tensor<?xf64>, tensor<f64>) -> (tensor<?xf64>) : {device = ""} 
    
            tf.RealDiv(tensor<f64>, tensor<f64>) -> (tensor<f64>) : {device = ""} 
    
            tf.StridedSlice(tensor<5x2x60x80x60x1xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>) -> (tensor<2x60x80x60x1xi64>) : {begin_mask = 0 : i64, device = "", ellipsis_mask = 0 : i64, end_mask = 0 : i64, new_axis_mask = 0 : i64, shrink_axis_mask = 1 : i64} 
    
            tf.Transpose(tensor<1x128x5x60x5x80xf32>, tensor<6xi32>) -> (tensor<1x128x5x5x60x80xf32>) : {device = ""} 
    
    See instructions: https://www.tensorflow.org/lite/guide/ops_custom
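
The "Details" list above is easier to read once it is condensed per op. The following is a minimal sketch that uses only the Python standard library and assumes the log lines keep the `tf.<Op>(...) -> (...) :` layout shown above. The `summarize` helper and the shortened `LOG` excerpt are illustrative, not part of voxl-tflite-server or TensorFlow. It reports which data types each flagged op touches and whether any of its shapes contains a `?` (dynamic) dimension. It does not establish that these properties are the reason the delegate is rejected.

```python
import re

# Three lines copied from the converter warning; paste the full list to cover every op.
LOG = '''
tf.Cast(tensor<?xf64>) -> (tensor<?xi64>) : {Truncate = false, device = ""}
tf.Range(tensor<i64>, tensor<i64>, tensor<i64>) -> (tensor<?xi64>) : {device = ""}
tf.Transpose(tensor<1x128x5x60x5x80xf32>, tensor<6xi32>) -> (tensor<1x128x5x5x60x80xf32>) : {device = ""}
'''

# Matches "tf.<Op>(<inputs>) -> (<outputs>) :" at the start of a line.
OP_LINE = re.compile(r"^\s*tf\.(\w+)\((.*)\)\s*->\s*\((.*)\)\s*:")
# Splits "tensor<1x128xf32>" into its dimension prefix ("1x128x") and dtype ("f32").
TENSOR = re.compile(r"tensor<((?:(?:\d+|\?)x)*)(\w+)>")


def summarize(log):
    """Return {op: {"dtypes": sorted list, "dynamic": bool}} for each tf.<Op> line."""
    found = {}
    for line in log.splitlines():
        match = OP_LINE.match(line)
        if match is None:
            continue
        op, inputs, outputs = match.groups()
        entry = found.setdefault(op, {"dtypes": set(), "dynamic": False})
        for dims, dtype in TENSOR.findall(inputs + "," + outputs):
            entry["dtypes"].add(dtype)
            # A "?" dimension marks a shape that is only known at run time.
            entry["dynamic"] = entry["dynamic"] or "?" in dims
    return {op: {"dtypes": sorted(e["dtypes"]), "dynamic": e["dynamic"]}
            for op, e in found.items()}


for op, info in summarize(LOG).items():
    print(op, info["dtypes"], "dynamic" if info["dynamic"] else "static")
```

Running it on the full warning groups the twelve detail lines under the five op names, which makes the 64-bit types and the dynamic shapes stand out.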
    

    If I remove the allow_custom_ops flag from the Python conversion code and enable the Select TF ops instead, as in this snippet:

    -  converter.allow_custom_ops = True
    +  converter.target_spec.supported_ops = [
    +      tf.lite.OpsSet.TFLITE_BUILTINS,  # enable TensorFlow Lite ops.
    +      tf.lite.OpsSet.SELECT_TF_OPS,  # enable TensorFlow ops.
    +  ]
    

    The final warnings in the output then become:

    2023-12-08 14:35:09.243817: W tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1892] TFLite interpreter needs to link Flex delegate in order to run the model since it contains the following Select TFop(s): 
    Flex ops: FlexCast, FlexRange, FlexRealDiv, FlexStridedSlice, FlexTranspose 
    
    Details: 
    
            tf.Cast(tensor<1xf64>) -> (tensor<1xi64>) : {Truncate = false, device = ""} 
    
            tf.Cast(tensor<1xi64>) -> (tensor<1xf64>) : {Truncate = false, device = ""} 
    
            tf.Cast(tensor<?xf64>) -> (tensor<?xi64>) : {Truncate = false, device = ""} 
    
            tf.Cast(tensor<?xi64>) -> (tensor<?xf64>) : {Truncate = false, device = ""} 
    
            tf.Cast(tensor<f64>) -> (tensor<i64>) : {Truncate = false, device = ""} 
    
            tf.Cast(tensor<i64>) -> (tensor<f64>) : {Truncate = false, device = ""} 
    
            tf.Range(tensor<i64>, tensor<i64>, tensor<i64>) -> (tensor<?xi64>) : {device = ""} 
    
            tf.RealDiv(tensor<1xf64>, tensor<1xf64>) -> (tensor<1xf64>) : {device = ""} 
    
            tf.RealDiv(tensor<?xf64>, tensor<f64>) -> (tensor<?xf64>) : {device = ""} 
    
            tf.RealDiv(tensor<f64>, tensor<f64>) -> (tensor<f64>) : {device = ""} 
    
            tf.StridedSlice(tensor<5x2x60x80x60x1xi64>, tensor<1xi64>, tensor<1xi64>, tensor<1xi64>) -> (tensor<2x60x80x60x1xi64>) : {begin_mask = 0 : i64, device = "", ellipsis_mask = 0 : i64, end_mask = 0 : i64, new_axis_mask = 0 : i64, shrink_axis_mask = 1 : i64} 
    
            tf.Transpose(tensor<1x128x5x60x5x80xf32>, tensor<6xi32>) -> (tensor<1x128x5x5x60x80xf32>) : {device = ""} 
    
    See instructions: https://www.tensorflow.org/lite/guide/ops_select
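
The two warnings can be compared directly. The sketch below is illustrative only: the `op_names` helper is hypothetical and the two strings are copied from the logs in this post. It checks whether the Flex ops are the same five ops that were reported as custom ops, once the `Flex` prefix is dropped.

```python
def op_names(line, prefix=""):
    """Parse 'Label: A, B, C' into a set of names, dropping an optional prefix."""
    cleaned = set()
    for name in line.partition(":")[2].split(","):
        name = name.strip()
        if prefix and name.startswith(prefix):
            name = name[len(prefix):]
        cleaned.add(name)
    return cleaned


custom = op_names("Custom ops: Cast, Range, RealDiv, StridedSlice, Transpose")
flex = op_names(
    "Flex ops: FlexCast, FlexRange, FlexRealDiv, FlexStridedSlice, FlexTranspose",
    prefix="Flex",
)

# True when both conversions flag the same set of ops.
print(custom == flex)
```

Both conversions flag the same ops. The flag change affects how they are packaged (custom op versus Select TF op), not which ops fall outside the TFLite builtins.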
    

    With the .tflite model generated using these new flags, voxl-tflite-server still fails to apply the GPU delegate.

    FURTHER DETAILS:

    I tested the same .tflite model in C++ by building TensorFlow Lite with CMake on macOS Ventura 13.6 (x86), following the instructions at https://www.tensorflow.org/lite/guide/build_cmake

    I built the Flex delegate shared library "libtensorflowlite_flex.so" with the following command (see instructions at https://www.tensorflow.org/lite/guide/ops_select)

    bazel build -c opt --config=monolithic tensorflow/lite/delegates/flex:tensorflowlite_flex
    

    and I linked it to my model.

    I was able to successfully run inference with the model and get the correct output.
