ModalAI Forum

    voxl-tflite-server v0.3.0 apq8096 yolov5

    FAQs
    • Philemon Benner

      Hey,
      I just saw in the voxl-tflite-server dev branch that you now also support YOLOv5 models on apq8096. Do you have any performance results (inference time, max fps) for that model yet? Also, is there already a prebuilt ipk package for TensorFlow Lite v2.8.0?

      • Philemon Benner

        Also, which YOLOv5 model size are you using (n, s, m, l)?

        • A Former User

          Hey @Philemon-Benner,

          We have yet to record any benchmarks for YOLOv5 on apq8096, but they will be posted on our docs site once they are collected.

          I used the yolov5s weights for the model included in the voxl-tflite-server package. The ipk can be downloaded from http://voxl-packages.modalai.com/dists/apq8096/dev/binary-arm64/ or installed via opkg if you are pointing to the apq8096 dev repo.

          • Philemon Benner

            @Matt-Turi Thank you for the fast answer!

            • Philemon Benner

              @Matt-Turi Also, do you know what exactly the class confidence is?

              static constexpr float threshold_class_confidence_ = 0.20;    // not sure if this is too low or
              
              

              But thanks for getting the new version running on apq8096; it helps me a lot.

              • Philemon Benner

                @Matt-Turi In the YOLOv5 export, did you also use the --half option for FP16 inference?

                • A Former User

                  @Philemon-Benner,

                  The threshold_class_confidence_ parameter just sets a lower bound: detections whose class confidence falls below it are discarded. We have this information available because we do the NMS post-processing ourselves, and the threshold can be raised for more robust detections. Also, the --half option was used when exporting, so a different quantization technique would be needed for further speed gains.
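
A minimal sketch of that kind of post-processing, assuming each candidate box already carries a single class confidence and class id. The `Detection` struct, `filter_and_nms`, and the 0.45 IoU threshold used in the usage note are hypothetical; only the 0.20 class-confidence threshold comes from this thread. It is a generic greedy per-class NMS, not the voxl-tflite-server code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical detection record; box corners in pixels.
struct Detection {
    float x1, y1, x2, y2;  // top-left and bottom-right corners
    float confidence;      // class confidence for this candidate
    int class_id;
};

// Intersection-over-union of two axis-aligned boxes.
float iou(const Detection& a, const Detection& b) {
    const float ix1 = std::max(a.x1, b.x1);
    const float iy1 = std::max(a.y1, b.y1);
    const float ix2 = std::min(a.x2, b.x2);
    const float iy2 = std::min(a.y2, b.y2);
    const float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
    const float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    const float uni = area_a + area_b - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Drop candidates below the class-confidence threshold, then run greedy
// per-class non-maximum suppression on the survivors.
std::vector<Detection> filter_and_nms(std::vector<Detection> candidates,
                                      float conf_threshold,
                                      float iou_threshold) {
    assert(iou_threshold >= 0.0f && iou_threshold <= 1.0f);
    // 1) Confidence threshold: discard low-confidence candidates up front.
    candidates.erase(std::remove_if(candidates.begin(), candidates.end(),
                                    [&](const Detection& d) {
                                        return d.confidence < conf_threshold;
                                    }),
                     candidates.end());
    // 2) Visit candidates from highest to lowest confidence.
    std::sort(candidates.begin(), candidates.end(),
              [](const Detection& a, const Detection& b) {
                  return a.confidence > b.confidence;
              });
    // 3) Keep a box unless it overlaps an already kept box of the same class.
    std::vector<Detection> kept;
    for (const Detection& d : candidates) {
        bool suppressed = false;
        for (const Detection& k : kept) {
            if (k.class_id == d.class_id && iou(k, d) > iou_threshold) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) kept.push_back(d);
    }
    return kept;
}
```

Calling `filter_and_nms(candidates, 0.20f, 0.45f)` discards anything under 0.20 first, so raising that threshold both removes weak detections and shrinks the NMS workload.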

                  • Philemon Benner

                    @Matt-Turi Ah, OK. How did you quantize the model?

                    • Philemon Benner

                      Also, the model that I trained is slower than yours. Is that because of the bigger input image size? I use 640x640 and yours is 320x320; with your model I get around 41 ms per image and with mine around 110 ms.
                      Edit: I also used yolov5s, trained from scratch, and exported with the tflite and half options.

                      • A Former User

                        I also used the tflite and half options when exporting, but you could try integer quantization instead of sticking with a floating-point model. As for inference speed, that sounds about right. I chose 320x320 input dimensions when testing on VOXL as a balance between detection performance and speed; larger inputs usually slow down inference drastically.
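
As a rough sanity check on the numbers above: a 640x640 input has four times the pixels of a 320x320 one, while the measured latency grew about 2.7x (110 ms vs 41 ms). The helpers below (hypothetical names) only do that arithmetic; the fps values cover inference alone and ignore capture, resize, and post-processing time.

```cpp
#include <cassert>

// Upper bound on frame rate if inference were the only per-frame cost.
double max_fps(double latency_ms) {
    assert(latency_ms > 0.0);
    return 1000.0 / latency_ms;
}

// How many times more pixels a square input of side_a has than one of side_b.
double pixel_ratio(int side_a, int side_b) {
    assert(side_a > 0 && side_b > 0);
    return static_cast<double>(side_a) * side_a /
           (static_cast<double>(side_b) * side_b);
}
```

With the thread's measurements, `max_fps(41.0)` is about 24 fps and `max_fps(110.0)` about 9 fps, while `pixel_ratio(640, 320)` is 4.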

                        • Philemon Benner

                          @Matt-Turi Is integer quantization faster than a floating-point model? The problem is that we are using a higher-resolution camera and flying at high altitudes, so cropping or downscaling images to a smaller resolution drastically reduces precision. Also, which input dimensions (camera resolution) are you using for testing, and have you also tested the model outside of your office?

                          • A Former User

                            Integer quantization can drastically reduce model size and speed up inference, but you typically do not target the GPU with an integer model. See https://www.tensorflow.org/lite/performance/post_training_quantization for more details.
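
For reference, TFLite's integer quantization maps real values to int8 through a scale and zero point (per tensor or per channel), `real = scale * (q - zero_point)`. The sketch below (hypothetical helper names, not a TFLite API) shows that mapping: each value shrinks from 4 bytes to 1, at the cost of rounding and clamping error.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Affine int8 quantization: q = clamp(round(real / scale) + zero_point).
std::int8_t quantize(float real, float scale, int zero_point) {
    assert(scale > 0.0f);
    const long q = std::lround(real / scale) + zero_point;
    // Saturate to the int8 range instead of wrapping around.
    return static_cast<std::int8_t>(std::max(-128L, std::min(127L, q)));
}

// Inverse mapping: real = scale * (q - zero_point).
float dequantize(std::int8_t q, float scale, int zero_point) {
    return scale * (static_cast<int>(q) - zero_point);
}
```

Values far outside `scale * [-128 - zero_point, 127 - zero_point]` saturate, which is why the converter needs representative data to choose the scale for activations.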

                            I understand the issue with cropping/downscaling images. It may be worth implementing some kind of tracker that selectively feeds in different portions of the frame rather than downscaling the entire image to a smaller resolution. Otherwise, if high fps can be sacrificed, a larger model would likely be better. The model has been tested outside of the office; I tested with VGA (640x480) camera input but have only used the default YOLO weights with no extra training.
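
One simple way to feed portions of the frame at native resolution, sketched under assumptions: split the frame into model-input-sized tiles and run inference on each. `Tile`, `tile_offsets`, and `make_tiles` are hypothetical; the 640x480 frame and 320x320 input sizes come from this thread. Every tile costs a full inference, so four tiles per VGA frame cut the inference-bound frame rate to roughly a quarter.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical crop window in frame pixel coordinates.
struct Tile { int x, y, w, h; };

// Window start offsets along one axis so windows of `win` px cover [0, len).
// The last window is shifted back to end exactly at `len`, so edge tiles
// overlap their neighbours instead of needing padding.
std::vector<int> tile_offsets(int len, int win) {
    assert(len > 0 && win > 0);
    std::vector<int> offs;
    if (len <= win) { offs.push_back(0); return offs; }
    for (int o = 0; o + win < len; o += win) offs.push_back(o);
    offs.push_back(len - win);
    return offs;
}

// Row-major list of tiles covering a frame_w x frame_h image.
std::vector<Tile> make_tiles(int frame_w, int frame_h, int tile) {
    std::vector<Tile> tiles;
    for (int y : tile_offsets(frame_h, tile))
        for (int x : tile_offsets(frame_w, tile))
            tiles.push_back({x, y, std::min(tile, frame_w), std::min(tile, frame_h)});
    return tiles;
}
```

For a 640x480 frame and 320x320 tiles this yields four tiles with origins (0,0), (320,0), (0,160) and (320,160); the bottom row overlaps the top row by 160 px, so detections from overlapping tiles would still need a final NMS pass in frame coordinates.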

                            • Philemon Benner

                              @Matt-Turi Thanks for the answer. What do you mean by implementing a tracker that selectively feeds in different portions of the frame? Something like running inference only on an ROI of the image?

                              • Philemon Benner

                                @Matt-Turi Also, slightly off topic, you could add the following to InferenceHelper.h:

                                // GPU delegate options; for more info see: https://github.com/tensorflow/tensorflow/blob/v2.8.0/tensorflow/lite/delegates/gpu/delegate.h
                                static constexpr TfLiteGpuDelegateOptionsV2 gpuOptions = {
                                    -1,                                               // is_precision_loss_allowed (-1 = unset, the priorities below decide)
                                    TFLITE_GPU_INFERENCE_PREFERENCE_SUSTAINED_SPEED,  // inference_preference
                                    TFLITE_GPU_INFERENCE_PRIORITY_MIN_LATENCY,        // inference_priority1
                                    TFLITE_GPU_INFERENCE_PRIORITY_AUTO,               // inference_priority2
                                    TFLITE_GPU_INFERENCE_PRIORITY_AUTO,               // inference_priority3
                                    TFLITE_GPU_EXPERIMENTAL_FLAGS_ENABLE_QUANT,       // experimental_flags
                                    1,                                                // max_delegated_partitions
                                    nullptr, nullptr                                  // serialization_dir, model_token (serialization off; probably don't touch)
                                };
                                

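                                And you might also need this in InferenceHelper.cpp:

                                constexpr TfLiteGpuDelegateOptionsV2 InferenceHelper::gpuOptions;

                                This is needed because C++11 is awkward with constexpr: before C++17, a static constexpr data member that is odr-used (for example, when its address is taken) still needs an out-of-class definition.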

                                With that, creation of the GPU options moves to compile time, and the options are always the same. TensorFlow recommends doing this because the default configuration might change.
                                Here is the function you are calling when you create the options:
                                https://github.com/tensorflow/tensorflow/blob/v2.8.0/tensorflow/lite/delegates/gpu/delegate.cc#:~:text=TfLiteGpuDelegateOptionsV2-,TfLiteGpuDelegateOptionsV2Default,-() {
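
A self-contained sketch of the same pattern, using a hypothetical stand-in struct (with stand-in values) so it compiles without the TensorFlow headers. `TfLiteGpuDelegateV2Create` takes a pointer to the options, and taking the member's address odr-uses it, which is why C++11/14 need the out-of-class definition; since C++17, static constexpr data members are implicitly inline and that definition becomes redundant. In a real header/source split, the definition goes in the .cpp file.

```cpp
#include <cassert>

// Hypothetical stand-in for TfLiteGpuDelegateOptionsV2: a plain aggregate.
struct GpuOptionsStandIn {
    int is_precision_loss_allowed;
    int inference_preference;
    long long experimental_flags;
    int max_delegated_partitions;
    const char* serialization_dir;
};

class InferenceHelperSketch {
public:
    // Built at compile time; every delegate gets exactly these values.
    static constexpr GpuOptionsStandIn gpuOptions = {-1, 1, 1, 1, nullptr};

    // Returning the address (as passing it to a create function taking a
    // pointer would) odr-uses the member.
    static const GpuOptionsStandIn* options() { return &gpuOptions; }
};

// Out-of-class definition: required before C++17 because of the odr-use
// above; redundant (but still accepted) from C++17 on.
constexpr GpuOptionsStandIn InferenceHelperSketch::gpuOptions;
```

Without the last line, a C++11 build can compile cleanly and then fail at link time with an undefined reference to `gpuOptions`.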

                                • Philemon Benner

                                  @Matt-Turi Also, is there any particular reason for having a custom resize function instead of OpenCV's resize function?

                                  • A Former User

                                    Hey @Philemon-Benner,

                                    In terms of a tracker, something we have done before is to run inference on the entire (downscaled) image until we find an object of interest, and then switch to feeding in regions centered on the last detection, which avoids downscaling and losing information. This is only useful in certain cases, but it can help with a smaller input size.
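
A minimal sketch of that switching logic, assuming detections are pixel rectangles in full-resolution frame coordinates: search on the whole (downscaled) frame, then, once something is found, crop a model-input-sized window around the last detection and clamp it to the frame. `Rect`, `crop_around`, and `next_input_region` are hypothetical names; a real tracker would also map detections back from crop to frame coordinates and fall back to the full frame when the object is lost.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical rectangle in frame pixel coordinates.
struct Rect { int x, y, w, h; };

// Crop of size crop x crop centered on the detection, shifted to stay inside the frame.
Rect crop_around(const Rect& last_det, int frame_w, int frame_h, int crop) {
    assert(crop > 0 && crop <= frame_w && crop <= frame_h);
    const int cx = last_det.x + last_det.w / 2;
    const int cy = last_det.y + last_det.h / 2;
    const int x = std::max(0, std::min(cx - crop / 2, frame_w - crop));
    const int y = std::max(0, std::min(cy - crop / 2, frame_h - crop));
    return {x, y, crop, crop};
}

// Search mode: whole frame (the caller downscales it to the model input).
// Track mode: native-resolution crop around the last detection.
Rect next_input_region(bool have_last, const Rect& last_det,
                       int frame_w, int frame_h, int crop) {
    if (!have_last) return {0, 0, frame_w, frame_h};
    return crop_around(last_det, frame_w, frame_h, crop);
}
```

In a 640x480 frame with a 320x320 model input, a detection near the bottom-right corner yields the crop starting at (320, 160), and the crop is fed to the model without any downscaling.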

                                    I will add your suggestion for the TfLiteGpuDelegateOptionsV2 in the next version. As for resizing, we use a custom resize function with a lookup table for speed, since OpenCV is a bit too slow for low-latency inference.
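
The actual voxl-tflite-server resize code is not shown in this thread; the following is only a generic sketch of the lookup-table idea, assuming nearest-neighbor sampling of an interleaved 8-bit image. Because the source row and column offsets depend only on the input and output sizes, they are computed once in the constructor, and each frame reduces to table lookups and byte copies. `LutResizer` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Nearest-neighbor resize driven by precomputed lookup tables. The tables are
// built once for a fixed (src, dst) size pair and reused for every frame.
class LutResizer {
public:
    LutResizer(int src_w, int src_h, int dst_w, int dst_h, int channels)
        : dst_w_(dst_w), dst_h_(dst_h), ch_(channels),
          x_lut_(dst_w), y_lut_(dst_h) {
        assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0 && channels > 0);
        for (int x = 0; x < dst_w; ++x)
            x_lut_[x] = (x * src_w / dst_w) * channels;          // byte offset within a row
        for (int y = 0; y < dst_h; ++y)
            y_lut_[y] = (y * src_h / dst_h) * src_w * channels;  // byte offset of a row
    }

    // src: src_w*src_h*channels interleaved bytes; dst: dst_w*dst_h*channels bytes.
    void resize(const std::uint8_t* src, std::uint8_t* dst) const {
        for (int y = 0; y < dst_h_; ++y) {
            const std::uint8_t* row = src + y_lut_[y];
            for (int x = 0; x < dst_w_; ++x) {
                const std::uint8_t* px = row + x_lut_[x];
                for (int c = 0; c < ch_; ++c) *dst++ = px[c];
            }
        }
    }

private:
    int dst_w_, dst_h_, ch_;
    std::vector<int> x_lut_, y_lut_;
};
```

Nearest-neighbor sampling trades some quality (aliasing when downscaling) for speed; a bilinear variant could precompute its source indices and weights in the same way.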

                                    • Philemon Benner

                                      @Matt-Turi Ah, OK, that sounds interesting and might be worth a try. Good to know. Thanks for the fast answer, as always.
