ModalAI Forum

    voxl-tflite-server v0.3.0 apq8096 yolov5

• Philemon Benner:

@Matt-Turi In the yolov5 export, did you also use the --half option for fp16 inference?

• A Former User:

@Philemon-Benner,

The threshold_class_confidence_ parameter just sets a lower bound below which detections are discarded, based on the confidence of that class being detected in the image. We have this information available since we do the NMS post-processing ourselves, and it can be turned up for more robust detections. Also, the --half option was used when exporting, so a different quantization technique would be needed for any further speed increase.
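For illustration, here is a minimal sketch of that kind of class-confidence thresholding applied ahead of NMS. The Detection struct and the filter function are hypothetical, not the actual voxl-tflite-server code; only the threshold_class_confidence_ name comes from the thread.

    #include <algorithm>
    #include <iterator>
    #include <vector>

    // Hypothetical detection record produced by the YOLOv5 output decoding step.
    struct Detection {
        float x, y, w, h;        // box in image coordinates
        int   class_id;          // index of the best-scoring class
        float class_confidence;  // objectness * class score for that class
    };

    // Drop every candidate whose class confidence falls below the configured
    // lower bound (threshold_class_confidence_ in the discussion above), so
    // that only reasonably confident boxes are handed to NMS.
    static std::vector<Detection> filter_by_class_confidence(
            const std::vector<Detection>& candidates,
            float threshold_class_confidence) {
        std::vector<Detection> kept;
        kept.reserve(candidates.size());
        std::copy_if(candidates.begin(), candidates.end(), std::back_inserter(kept),
                     [threshold_class_confidence](const Detection& d) {
                         return d.class_confidence >= threshold_class_confidence;
                     });
        return kept;  // NMS then runs on this reduced set
    }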

• Philemon Benner:

@Matt-Turi Ah, OK. How did you quantize the model?

• Philemon Benner:

Also, the model that I trained is slower than yours. Is that because of the bigger input image size? I use 640x640, yours is 320x320; with your model I get around 41 ms per image and with mine around 110 ms.
Edit: I also used yolov5s and trained from scratch; I exported with the tflite and half options.

• A Former User:

I also used the tflite and half options when exporting, but you could try integer quantization instead of sticking with a floating-point model. As for inference speed, that sounds about right. I chose input dimensions of 320x320 when testing on VOXL for a balance of detection performance and speed; going with larger inputs usually slows down inference drastically.

• Philemon Benner:

@Matt-Turi Is integer quantization faster than a floating-point model? The problem is that we are using a higher-resolution camera and flying at high altitudes, so cropping images down to a smaller resolution drastically changes the precision. Also, which input dimensions (camera size) are you using for testing, and have you also tested the model outside of your office?

• A Former User:

Integer quantization can drastically reduce the size of the model and speed up inference, but you typically do not target the GPU with an int model. See here for more details: https://www.tensorflow.org/lite/performance/post_training_quantization.
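To make the CPU-side handling of a fully integer-quantized model concrete, here is a minimal sketch using the TFLite C++ API. The model path, thread count, and tensor handling are assumptions for illustration and are not taken from voxl-tflite-server.

    #include <memory>
    #include "tensorflow/lite/interpreter.h"
    #include "tensorflow/lite/kernels/register.h"
    #include "tensorflow/lite/model.h"

    // Sketch: run an int8 post-training-quantized model on the CPU.
    // Quantized tensors carry a scale and zero_point that map the int8
    // values back to real numbers: real = scale * (q - zero_point).
    int main() {
        auto model = tflite::FlatBufferModel::BuildFromFile(
            "yolov5_int8.tflite");               // hypothetical model file
        if (!model) return 1;

        tflite::ops::builtin::BuiltinOpResolver resolver;
        std::unique_ptr<tflite::Interpreter> interpreter;
        tflite::InterpreterBuilder(*model, resolver)(&interpreter);
        interpreter->SetNumThreads(4);           // CPU threads instead of the GPU delegate
        if (interpreter->AllocateTensors() != kTfLiteOk) return 1;

        // Quantize the float image into the int8 input tensor.
        TfLiteTensor* input = interpreter->input_tensor(0);
        const float in_scale = input->params.scale;
        const int   in_zero  = input->params.zero_point;
        int8_t* in_data = interpreter->typed_input_tensor<int8_t>(0);
        // ... fill in_data[i] = (int8_t)(pixel[i] / in_scale + in_zero) ...

        if (interpreter->Invoke() != kTfLiteOk) return 1;

        // Dequantize the int8 outputs back to float scores/boxes.
        const TfLiteTensor* output = interpreter->output_tensor(0);
        const float out_scale = output->params.scale;
        const int   out_zero  = output->params.zero_point;
        const int8_t* out_data = interpreter->typed_output_tensor<int8_t>(0);
        float first_value = out_scale * (out_data[0] - out_zero);
        (void)first_value;
        return 0;
    }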

I understand the issue of cropping/downscaling images, and it may be worth implementing a tracker of sorts to selectively feed in different portions of the frame rather than just downscaling the entire image to a smaller resolution. Otherwise, if high fps is something that can be sacrificed, using a larger model would likely be better. The model has been tested outside of the office, and I tested with VGA [640x480] camera input but have only used the default yolo weights with no extra training.

• Philemon Benner:

@Matt-Turi Thanks for the answer. What do you mean by implementing a tracker that selectively feeds in different portions of the frame? Something like just running inference on the ROI of the image?

• Philemon Benner:

@Matt-Turi Also, somewhat off topic: you could add the following to InferenceHelper.h:

    // GPU options; for more info see:
    // https://github.com/tensorflow/tensorflow/blob/v2.8.0/tensorflow/lite/delegates/gpu/delegate.h
    static constexpr TfLiteGpuDelegateOptionsV2 gpuOptions = {
        -1,                                               // is_precision_loss_allowed (non-zero = allow)
        TFLITE_GPU_INFERENCE_PREFERENCE_SUSTAINED_SPEED,  // inference_preference
        TFLITE_GPU_INFERENCE_PRIORITY_MIN_LATENCY,        // inference_priority1
        TFLITE_GPU_INFERENCE_PRIORITY_AUTO,               // inference_priority2
        TFLITE_GPU_INFERENCE_PRIORITY_AUTO,               // inference_priority3
        TFLITE_GPU_EXPERIMENTAL_FLAGS_ENABLE_QUANT,       // experimental_flags
        1,                                                // max_delegated_partitions
        nullptr, nullptr                                  // serialization fields; probably don't touch
    };
                      

And possibly the following in InferenceHelper.cpp, because C++11 still requires an out-of-class definition for a static constexpr member:

    constexpr TfLiteGpuDelegateOptionsV2 InferenceHelper::gpuOptions;

With that you can move the creation of the GPU options to compile time, and they should always be the same. TensorFlow recommends doing this because the default configuration might change.
Here is the function that you are currently calling to create the options:
https://github.com/tensorflow/tensorflow/blob/v2.8.0/tensorflow/lite/delegates/gpu/delegate.cc#:~:text=TfLiteGpuDelegateOptionsV2-,TfLiteGpuDelegateOptionsV2Default,-() {
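For context, a minimal sketch of how such compile-time options are typically consumed with the TFLite C++ API; the attach_gpu_delegate helper is hypothetical and not part of InferenceHelper.

    #include "tensorflow/lite/delegates/gpu/delegate.h"
    #include "tensorflow/lite/interpreter.h"

    // Sketch: attach the GPU delegate using compile-time options such as the
    // gpuOptions constant above. Returns the delegate, which must stay alive
    // for the interpreter's lifetime and be released with
    // TfLiteGpuDelegateV2Delete() after the interpreter is destroyed.
    static TfLiteDelegate* attach_gpu_delegate(tflite::Interpreter& interpreter,
                                               const TfLiteGpuDelegateOptionsV2& options) {
        TfLiteDelegate* delegate = TfLiteGpuDelegateV2Create(&options);
        if (delegate == nullptr) return nullptr;
        if (interpreter.ModifyGraphWithDelegate(delegate) != kTfLiteOk) {
            TfLiteGpuDelegateV2Delete(delegate);
            return nullptr;
        }
        return delegate;
    }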

• Philemon Benner:

@Matt-Turi Also, is there any particular reason for having a custom resize function instead of OpenCV's resize function?

• A Former User:

Hey @Philemon-Benner,

In terms of a tracker, something we have done before is to run inference on the entire (downscaled) image only until we find an object of interest, and then switch to feeding in regions focused around the last detection, to avoid downscaling away information. This is only useful in certain cases, but it can help with a smaller input size.
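As an illustration of that detect-then-crop idea, here is a minimal sketch; the Rect type and the crop policy are assumptions, not the actual implementation.

    #include <algorithm>

    // Hypothetical image-space rectangle, also used for the last detection box.
    struct Rect { int x, y, w, h; };

    // Once an object has been found, build a crop window around the last
    // detection instead of downscaling the full frame. The crop is kept at
    // the network input size and clamped to the frame, so no information
    // is lost to resizing.
    static Rect roi_around_last_detection(const Rect& last_det,
                                          int frame_w, int frame_h,
                                          int net_w, int net_h) {
        const int cx = last_det.x + last_det.w / 2;
        const int cy = last_det.y + last_det.h / 2;
        const int x = std::min(std::max(cx - net_w / 2, 0), std::max(0, frame_w - net_w));
        const int y = std::min(std::max(cy - net_h / 2, 0), std::max(0, frame_h - net_h));
        return Rect{x, y, std::min(net_w, frame_w), std::min(net_h, frame_h)};
    }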

I will add your suggestion for the TfLiteGpuDelegateOptionsV2 in the next version. We have a custom resize function using a lookup table for speed improvements, as OpenCV is a bit too slow for low-latency inference.
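For illustration, a minimal sketch of a lookup-table-based nearest-neighbor resize of the kind described, for a single-channel row-major image; it is not the actual voxl-tflite-server resize code.

    #include <cstdint>
    #include <vector>

    // Precompute, once per input/output geometry, which source pixel each
    // destination pixel maps to. The per-frame resize then becomes a pure
    // table-driven copy with no per-pixel arithmetic.
    struct ResizeLUT {
        std::vector<int> src_x;  // src_x[dx] = source column for dest column dx
        std::vector<int> src_y;  // src_y[dy] = source row for dest row dy
    };

    static ResizeLUT build_resize_lut(int src_w, int src_h, int dst_w, int dst_h) {
        ResizeLUT lut;
        lut.src_x.resize(dst_w);
        lut.src_y.resize(dst_h);
        for (int dx = 0; dx < dst_w; ++dx) lut.src_x[dx] = dx * src_w / dst_w;
        for (int dy = 0; dy < dst_h; ++dy) lut.src_y[dy] = dy * src_h / dst_h;
        return lut;
    }

    // Nearest-neighbor resize of a single-channel, row-major image using the
    // precomputed table.
    static void resize_with_lut(const uint8_t* src, int src_w,
                                uint8_t* dst, int dst_w, int dst_h,
                                const ResizeLUT& lut) {
        for (int dy = 0; dy < dst_h; ++dy) {
            const uint8_t* src_row = src + lut.src_y[dy] * src_w;
            uint8_t* dst_row = dst + dy * dst_w;
            for (int dx = 0; dx < dst_w; ++dx)
                dst_row[dx] = src_row[lut.src_x[dx]];
        }
    }

The table would be built once when the camera and network dimensions are known, so the per-frame cost is just the copy loop.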

• Philemon Benner:

@Matt-Turi Ah, OK, that sounds interesting and might be worth a try. Good to know. Thanks for the fast answers, as always.
