ModalAI Forum

    Can anyone recommend a Tflite Colab Notebook for VOXL2 Training

    sansoy

      I'm at a total loss trying to get my TFLite models to work on the VOXL2.

      I've followed every thread.

      I've successfully trained models with TensorFlow as recommended and quantized them down to float16.

      I can run these successfully on my Linux box, my MacBook Pro, a Raspberry Pi, and the NVIDIA Jetson Nano.

      But when I upload them to the VOXL2, I get video but no detections whatsoever.

      Also, following https://docs.modalai.com/voxl-tflite-server/ I applied the post-training quantization instructions to both my frozen graph and my saved models.

      I used Netron to find the input/output parameters.

      I also successfully converted a YOLOv8 model into a TFLite model and ran object detection perfectly on every platform except the VOXL2.

      Here's a link to one of the Colab notebooks I've used to train an object detection model:
      https://colab.research.google.com/drive/1QdgpSl63OSQdLTnFwOyP8dxLQ7W0HtmW?usp=sharing

        sansoy @sansoy

        I successfully trained a YOLOv5 model using the following instructions:

        results.png

        https://docs.ultralytics.com/yolov5/
        https://docs.ultralytics.com/yolov5/tutorials/train_custom_data/
        https://docs.ultralytics.com/yolov5/tutorials/model_export/

        I followed the directions to export the model to TFLite at FP16 half precision:
        python export.py --weights best.pt --include tflite --half
        💡 ProTip: Add --half to export models at FP16 half precision for smaller file sizes

        When I bring it over to the VOXL2, I get the following error in /var/log/syslog:

        Jan 7 18:19:21 m0054 systemd[1]: Started voxl-tflite-server.
        Jan 7 18:19:21 m0054 bash[18587]: WARNING: Unknown model type provided! Defaulting post-process to object detection.
        Jan 7 18:19:21 m0054 bash[18587]: INFO: Created TensorFlow Lite delegate for GPU.
        Jan 7 18:19:29 m0054 bash[18587]: received SIGTERM
        Jan 7 18:19:29 m0054 systemd[1]: Stopping voxl-tflite-server...
        Jan 7 18:19:39 m0054 bash[18587]: INFO: Initialized OpenCL-based API.
        Jan 7 18:19:39 m0054 bash[18587]: INFO: Created 1 GPU delegate kernels.
        Jan 7 18:19:39 m0054 bash[18587]: ------VOXL TFLite Server------
        Jan 7 18:19:40 m0054 bash[18587]: Error in TensorData<float>: should not reach here
        Jan 7 18:19:40 m0054 bash[18587]: Segmentation fault:
        Jan 7 18:19:40 m0054 bash[18587]: Fault thread: voxl-tflite-ser(tid: 18670)
        Jan 7 18:19:40 m0054 systemd[1]: voxl-tflite-server.service: Main process exited, code=killed, status=11/SEGV
        Jan 7 18:19:40 m0054 systemd[1]: voxl-tflite-server.service: Failed with result 'signal'.
        Jan 7 18:19:40 m0054 systemd[1]: Stopped voxl-tflite-server.
        Jan 7 18:19:40 m0054 systemd[1]: Started voxl-tflite-server.
        Jan 7 18:19:40 m0054 bash[18674]: WARNING: Unknown model type provided! Defaulting post-process to object detection.
        Jan 7 18:19:40 m0054 bash[18674]: INFO: Created TensorFlow Lite delegate for GPU.
        Jan 7 18:19:41 m0054 bash[1425]: ERROR in pipe_client_init_channel opening request pipe: No such device or address
        Jan 7 18:19:41 m0054 bash[1425]: Most likely the server stopped without cleaning up
        Jan 7 18:19:41 m0054 bash[1425]: Client is cleaning up pipes for the server
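For anyone triaging a similar log, here is a minimal sketch that counts the crash-related markers in syslog lines. The marker strings are copied from the log above; the function name and the category labels are made up for illustration and are not part of any ModalAI tool.

```python
from collections import Counter

# Marker substrings copied from the syslog excerpt in this thread.
# The keys are hypothetical labels chosen for this sketch.
MARKERS = {
    "unknown_model": "Unknown model type provided",
    "tensor_type": "Error in TensorData",
    "segfault": "Segmentation fault",
    "service_killed": "status=11/SEGV",
}


def summarize_syslog(lines):
    """Count how many lines contain each known marker substring."""
    counts = Counter()
    for line in lines:
        for label, marker in MARKERS.items():
            if marker in line:
                counts[label] += 1
    return counts
```

One possible use is to feed it the filtered syslog, for example `summarize_syslog(open("/var/log/syslog", errors="replace"))`, and compare the counts before and after changing the model file.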

          A Former User @sansoy

          @sansoy

          Hey, happy to try and help resolve this!

          The first thing I notice is that the voxl-tflite-server is defaulting your model to object detection which seems incorrect as you have a YOLO model. This is because the tflite server does a string compare call to determine which model is being used as seen here. What this means is you'll need to rename your model to yolov5_float16_quant.tflite for the time being to get proper YOLO processing. Obviously this isn't ideal and we're working on making this functionality better in a future software release.

          However, I'm more curious about that "should not reach here" message. I've traced that back to inference_helper.cpp which is likely hitting this line or one of the other ones which are similar to it. So the TFLite Server is attempting to read in your Tensors in some expected format but it's differing from the type it's getting.

          What you should try first is the renaming suggestion I mentioned in the first paragraph. It's possible that because it's defaulting to an object detection model and not a YOLO model, that's causing the server to read in your Tensors as the wrong datatype. If that doesn't work, if you could provide me with the output of cat /etc/modalai/voxl-tflite-server.conf that might help me in better diagnosing your issue. If I can't help, I may need you to pass along your actual model file so that I can load in your exact configuration and do some debugging to find the issue.
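To check the tensor datatype question on a desktop machine before copying the model over, something like the following sketch may help. It assumes TensorFlow is installed on the host and uses the standard `tf.lite.Interpreter` Python API; the helper names `describe_tensors` and `inspect_model` are hypothetical, and this says nothing about what voxl-tflite-server itself expects beyond what is described above.

```python
def describe_tensors(details):
    """Reduce interpreter detail dicts to (name, shape, dtype name) tuples."""
    rows = []
    for d in details:
        dtype = d["dtype"]
        # TF Lite reports dtypes as numpy type classes, e.g. numpy.float32.
        dtype_name = getattr(dtype, "__name__", str(dtype))
        shape = tuple(int(x) for x in d["shape"])
        rows.append((d["name"], shape, dtype_name))
    return rows


def inspect_model(model_path):
    """Load a .tflite file and list its input and output tensors."""
    import tensorflow as tf  # imported lazily so the helper above works without it

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return {
        "inputs": describe_tensors(interpreter.get_input_details()),
        "outputs": describe_tensors(interpreter.get_output_details()),
    }
```

Calling `inspect_model("yolov5_float16_quant.tflite")` on the exported file and on the model that ships with the VOXL2 would show whether the input and output dtypes and shapes differ between the two.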

          Sorry about this!

          Thomas Patton
          thomas.patton@modalai.com

            sansoy @Guest

            @Thomas-Patton Thanks for your response. I didn't realize you were checking for exact names.

            I renamed my YOLO model to yolov5_float16_quant.tflite and updated the yolov5_labels.txt file,
            but I'm still getting an error:
            Jan 9 15:06:30 m0054 systemd[1]: Started voxl-tflite-server.
            Jan 9 15:06:30 m0054 bash[5690]: INFO: Created TensorFlow Lite delegate for GPU.
            Jan 9 15:06:47 m0054 bash[5690]: INFO: Initialized OpenCL-based API.
            Jan 9 15:06:47 m0054 bash[5690]: INFO: Created 1 GPU delegate kernels.
            Jan 9 15:06:47 m0054 bash[5690]: ------VOXL TFLite Server------
            Jan 9 15:06:47 m0054 bash[5690]: Segmentation fault:
            Jan 9 15:06:47 m0054 bash[5690]: Fault thread: voxl-tflite-ser(tid: 5770)
            Jan 9 15:06:47 m0054 bash[5690]: Fault address: 0x656972623d3d206e
            Jan 9 15:06:47 m0054 bash[5690]: Unknown reason.
            Jan 9 15:06:47 m0054 bash[1410]: ERROR in pipe_client_init_channel opening request pipe: No such device or address
            Jan 9 15:06:47 m0054 bash[1410]: Most likely the server stopped without cleaning up
            Jan 9 15:06:47 m0054 bash[1410]: Client is cleaning up pipes for the server
            Jan 9 15:06:47 m0054 systemd[1]: voxl-tflite-server.service: Main process exited, code=killed, status=11/SEGV
            Jan 9 15:06:47 m0054 systemd[1]: voxl-tflite-server.service: Failed with result 'signal'.
            Jan 9 15:06:48 m0054 systemd[1]: voxl-tflite-server.service: Service hold-off time over, scheduling restart.
            Jan 9 15:06:48 m0054 systemd[1]: voxl-tflite-server.service: Scheduled restart job, restart counter is at 16.
            Jan 9 15:06:48 m0054 systemd[1]: Stopped voxl-tflite-server.
            Jan 9 15:06:48 m0054 systemd[1]: Started voxl-tflite-server.
            Jan 9 15:06:48 m0054 bash[5774]: INFO: Created TensorFlow Lite delegate for GPU.

            here's my voxl-tflite-server.conf
            /**
             * This file contains configuration that's specific to voxl-tflite-server.
             *
             * skip_n_frames - how many frames to skip between processed frames. For 30Hz
             *                     input frame rate, we recommend skipping 5 frame resulting
             *                     in 5hz model output. For 30Hz/maximum output, set to 0.
             * model - which model to use. Currently support mobilenet, fastdepth,
             *                     posenet, deeplab, and yolov5.
             * input_pipe - which camera to use (tracking, hires, or stereo).
             * delegate - optional hardware acceleration: gpu, cpu, or nnapi. If
             *                     the selection is invalid for the current model/hardware,
             *                     will silently fall back to base cpu delegate.
             * allow_multiple - remove process handling and allow multiple instances
             *                     of voxl-tflite-server to run. Enables the ability
             *                     to run multiples models simultaneously.
             * output_pipe_prefix - if allow_multiple is set, create output pipes using default
             *                     names (tflite, tflite_data) with added prefix.
             *                     ONLY USED IF allow_multiple is set to true.
             */
            {
            "skip_n_frames": 0,
            "model": "/usr/bin/dnn/yolov5_float16_quant.tflite",
            "input_pipe": "/run/mpa/hires_color",
            "delegate": "gpu",
            "allow_multiple": false,
            "output_pipe_prefix": "mobilenet"
            }
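A quick way to sanity-check a config like the one above is to strip the comment header and parse the rest as JSON. The sketch below does that and flags two things mentioned in this thread: the model filename the server is said to match on, and `output_pipe_prefix` being set while `allow_multiple` is false. The function names are hypothetical, and the checks reflect only what is stated in the posts and the config header, not the server's actual source.

```python
import json
import posixpath
import re


def load_voxl_conf(text):
    """Strip /* ... */ block comments, then parse the remainder as JSON.

    Assumes no JSON string value contains the sequence "/*".
    """
    stripped = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    return json.loads(stripped)


def check_conf(conf, expected_name="yolov5_float16_quant.tflite"):
    """Return a list of human-readable notes about the parsed config."""
    notes = []
    model_name = posixpath.basename(conf.get("model", ""))
    if model_name != expected_name:
        notes.append(f"model file is {model_name!r}, expected {expected_name!r}")
    if conf.get("output_pipe_prefix") and not conf.get("allow_multiple"):
        notes.append("output_pipe_prefix is set but only used when allow_multiple is true")
    return notes
```

For the config posted above this would report only the unused `output_pipe_prefix`, which by the header comment is harmless, so the config itself does not look like the cause of the crash.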

              A Former User @sansoy

              @sansoy

              Thanks for an informative response. One thing that's confusing me is the message "ERROR in pipe_client_init_channel" as the pipe_client_init_channel method is deprecated. Do you mind letting me know what version of the SDK you're on? If it isn't the most recent SDK, it's probably worth upgrading to see if it fixes anything. I know we've put out a lot of changes in libmodal-pipe. You can read how to flash the latest SDK here.

              Unfortunately just from these debug messages I can't pin down the issue and so I might need you to provide me with a model file to help out more. I can understand if you don't want to leak your trained model file, though. One thing you could do in this case would be to just train for a single epoch just as a means of creating a model through the same process. If I have a model file I can do some more rigorous debugging to determine the issue.

              Thanks and sorry about all of this!

              Thomas Patton
              thomas.patton@modalai.com

                sansoy @Guest

                @Thomas-Patton

                voxl2:/$ voxl-version

                system-image: 1.6.2-M0054-14.1a-perf
                kernel: #1 SMP PREEMPT Fri May 19 22:19:33 UTC 2023 4.19.125

                hw version: M0054

                voxl-suite: 1.0.0

                will update to 1.0.1

                Can I email you my tflite and saved model for review? I'm doing a run right now that should be completed in a couple hours.
                Sabri

                  tom admin @sansoy

                  @sansoy You should upgrade to the latest SDK (1.1.2)

                    sansoy @tom

                    @tom I downloaded the upgrade and started it, but it has been stuck for about an hour.
                    How long does flashing the upgrade take?
                    Sabri

                    Flashing the following System Image:
                    Build Name: 1.7.1-M0054-14.1a-perf-nightly-20231025
                    Build Date: 2023-10-25
                    Platform: M0054
                    System Image Version: 1.7.1

                    Installing the following version of voxl-suite:
                    voxl-suite Version: 1.1.2

                    Would you like to continue with SDK install?

                    1. Yes
                    2. No
                      #? yes
                      [ERROR] invalid option
                      #? 1
                      [INFO] adb installed
                      [INFO] fastboot installed

                    ---- Starting System Image Flash ----
                    ----./flash-system-image.sh ----
                    Detected OS: Linux

                    Installer Version: 0.8
                    Image Version: 1.7.1

                    Please power off your VOXL, connect via USB,
                    then power on VOXL. We will keep searching for
                    an ADB or Fastboot device over USB
                    [INFO] Found ADB device
                    [INFO] Rebooting to fastboot
                    .
                    [INFO] Found fastboot device
                    [WARNING] This system image flash is intended only for the following
                    platform: VOXL2 (m0054)

                          Make sure that the device that will be flashed is correct.
                          Flashing a device with an incorrect system image will lead
                          the device to be stuck in fastboot.
                    

                    Would you like to continue with the VOXL2 (m0054) system image flash?

                    1. Yes
                    2. No
                      #? 1

                      tom admin @sansoy

                      @sansoy It should start right away, I would power cycle your voxl2 and try again

                        sansoy @tom

                        @tom I did all that and it's still stuck. Could it be what the warning says about getting stuck in fastboot?
                        It is the VOXL2 and not the VOXL2 Mini.

                        [WARNING] This system image flash is intended only for the following
                        platform: VOXL2 (m0054)

                              Make sure that the device that will be flashed is correct.
                              Flashing a device with an incorrect system image will lead
                              the device to be stuck in fastboot.
                        
                          tom admin @sansoy

                          @sansoy As long as you are using the voxl2 SDK and are indeed flashing voxl2 hardware then that warning can be ignored.

                            sansoy @tom

                            @tom Hey Tom, I'm having absolutely no luck.
                            I've tried 3 times and it still just hangs at

                            Would you like to continue with the VOXL2 (m0054) system image flash?

                            1. Yes
                            2. No
                              #? 1

                            I then followed the unbrick instructions and reinstalled everything per
                            https://docs.modalai.com/voxl2-unbricking/#ubuntu-host

                            Got the system back up and running and tried to install the latest SDK again with no luck.
                            It just hangs.

                              sansoy @sansoy

                              UPDATE: Got it working by running the install with sudo. Normally one would get a permission error, but here it just hung; I thought permissions might be the issue anyway, and sure enough they were. I recommend updating your docs to
                              say sudo ./install.sh

                                sansoy @sansoy

                                @tom I trained on a new batch of AR15 images and got really good numbers in terms of losses and mAPs. I ran both an unquantized and a quantized version in voxl-tflite-server and, again, nothing is being recognized.

                                Here's a link to the TFLite files and saved_models, with inference results on never-before-seen images.
                                Any insight on how to make these models work in your environment would be greatly appreciated.

                                https://drive.google.com/drive/folders/1N1pU0jMRTb3rODSfIuETPrBf66m4ody7?usp=drive_link

                                  tom admin @sansoy

                                  @sansoy Interesting, sudo isn't normally required. I'm curious, what linux distro are you running?

                                    tom admin @sansoy

                                    @sansoy @Thomas-Patton is the ML expert here and I'll let him comment on that front

                                      sansoy @tom

                                      @tom Ubuntu 22.04.3 LTS

                                        tom admin @sansoy

                                        @sansoy Huh, okay, that's what I run as well.

                                        What groups is your default user in? For example, here are mine:

                                        $ groups
                                        tom adm dialout cdrom sudo dip plugdev lpadmin lxd sambashare docker
                                        
                                          sansoy @tom

                                          @tom eve@eve:~$ groups
                                          eve adm cdrom sudo dip plugdev lpadmin lxd sambashare

                                            tom admin @sansoy

                                            @sansoy Can you try adding your user to the dialout group and seeing if that fixes the issue?

                                            sudo usermod -a -G dialout $USER
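To confirm whether the change is needed, or whether it has taken effect, the output of `groups` can be compared against the required group. A minimal sketch follows; the function name is hypothetical, and the assumption that `dialout` is the only relevant group comes solely from the suggestion above.

```python
def missing_groups(groups_output, required=("dialout",)):
    """Return the required groups that do not appear in `groups` output."""
    line = groups_output.strip()
    # `groups someuser` prints "someuser : g1 g2 ..."; plain `groups` prints "g1 g2 ...".
    if " : " in line:
        line = line.split(" : ", 1)[1]
    have = set(line.split())
    return [g for g in required if g not in have]
```

With the two outputs posted in this thread, the list for eve contains `dialout` and the list for tom is empty. Note that a group added with `usermod` only applies to new login sessions, so log out and back in before re-running `groups`.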
