ModalAI Forum

    PX4 qmi_error abort

    VOXL SDK
    • Rowan Dempster

      Hey ModalAI PX4 users, has anyone been running into a qmi_error that causes the PX4 process to abort? At Cleo it happens during boot roughly 1 in 20 to 1 in 100 times. Once the system has booted successfully it is rarer, roughly 1 in 200 to 1 in 500 times.
      Here's the full error from journalctl:

      terminate called after throwing an instance of 'qmi_error'
      Mar 19 15:33:57 m0054 voxl-px4[1854]:   what():  qmi_client_send_msg_sync() failed, (client_id=)0, result=0: qmi service error (-2)
      Mar 19 15:33:57 m0054 voxl-px4[1854]: /usr/bin/voxl-px4: line 140:  1868 Aborted                 GPS=$GPS RC=$RC OSD=$OSD EXTRA_STEPS=$EXTRA_STEPS px4 $DAEMON -s /usr/bin/voxl-px4-start
      Mar 19 15:33:57 m0054 systemd[1]: voxl-px4.service: Main process exited, code=exited, status=134/n/a
      Mar 19 15:33:57 m0054 systemd[1]: voxl-px4.service: Failed with result 'exit-code'.
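Because the abort only shows up once every few dozen to few hundred boots, counting occurrences by hand is tedious. The following is a minimal Python sketch, not a ModalAI tool, that scans journal lines in the `Mon DD HH:MM:SS host voxl-px4[PID]:` format shown in these logs (captured, for example, with `journalctl -u voxl-px4`) and reports each `qmi_error` termination. It also decodes the `status=134` that systemd reports: the wrapper script's exit status follows the shell convention of 128 plus the signal number that killed the child, and signal 6 is SIGABRT on Linux, which matches `std::terminate` aborting after the uncaught `qmi_error`. The function and class names are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches the journal line printed when the uncaught qmi_error reaches
# std::terminate (layout assumed from the logs in this thread).
QMI_ABORT_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"voxl-px4\[(?P<pid>\d+)\]: terminate called after throwing an instance of 'qmi_error'"
)

SIGNAL_EXIT_BASE = 128  # shell convention: exit status = 128 + signal number
SIGABRT_LINUX = 6       # SIGABRT on Linux


@dataclass
class QmiAbort:
    timestamp: str
    host: str
    pid: int


def find_qmi_aborts(lines):
    """Return one QmiAbort per qmi_error termination found in the journal lines."""
    aborts = []
    for line in lines:
        m = QMI_ABORT_RE.match(line.strip())
        if m:
            aborts.append(QmiAbort(m["ts"], m["host"], int(m["pid"])))
    return aborts


def killed_by_sigabrt(exit_status):
    """True if a shell-style exit status means the child died from SIGABRT."""
    return exit_status - SIGNAL_EXIT_BASE == SIGABRT_LINUX


if __name__ == "__main__":
    sample = [
        "Mar 02 12:59:17 m0054 voxl-px4[1832]: terminate called after throwing an instance of 'qmi_error'",
        "Mar 02 12:59:17 m0054 systemd[1]: voxl-px4.service: Main process exited, code=exited, status=134/n/a",
    ]
    for abort in find_qmi_aborts(sample):
        print(abort.timestamp, abort.host, abort.pid)
    print(killed_by_sigabrt(134))
```

Running this over the journal of every test boot gives a failure count to set against the total number of boots.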
      
      • Eric Katzfey (ModalAI Team) @Rowan Dempster

        @Rowan-Dempster Yes, we used to see this happen on older SDK versions. It was an indication that the DSP was crashing, and it happened at about that frequency. But there have been multiple bug fixes since then, and as far as I know it no longer happens. Are you using a recent version of VOXL SDK? Have you made any modifications to the SDK?

        • Rowan Dempster @Eric Katzfey

          @Eric-Katzfey Thanks for the response!

          "Are you using a recent version of VOXL SDK?"

          Cleo branched off of your repo at this tag: https://github.com/modalai/px4-firmware/tree/v1.14.0-2.0.36-dev

          "Have you made any modifications to the SDK?"

          Yup, we actively develop the PX4 modules, including the controllers and the EKF that run on the DSP.

          So the DSP crash may be caused by our code running on the DSP, or it could be one of the bugs in the https://github.com/modalai/px4-firmware/tree/v1.14.0-2.0.36-dev tag itself that you mentioned have since been fixed.

          As for a path forward, are there any methods you can suggest for inspecting the DSP to find the root cause of the crashes? For example, things we can add to the code, or a debug mode we can run the DSP modules in.

          Also, do you know of bug-fix commits in your repo's mainline that we at Cleo could backport to our fork, to see whether the DSP crashes go away for us as well?

          Thank you for your help,
          Rowan

          • Rowan Dempster @Rowan Dempster

            @Eric-Katzfey Any insight into this ^

            • Eric Katzfey (ModalAI Team) @Rowan Dempster

              @Rowan-Dempster Sorry, not sure why I didn't see your response. Let me look through the commits to see if any of those important bug fixes have been added since then.

              • Eric Katzfey (ModalAI Team) @Rowan Dempster

                @Rowan-Dempster Of course v1.14.0-2.0.36 is extremely old and there have been a lot of improvements / fixes since then. But the fixes for DSP crashes were made in the modalai-slpi codebase. What version of modalai-slpi are you running? One critical bug fix was added in v1.1.9 and another in v1.1.14.
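To check whether an installed modalai-slpi package already contains the two fixes mentioned here (v1.1.9 and v1.1.14), the version string can be compared numerically. This is a minimal Python sketch, assuming versions follow the `MAJOR.MINOR.PATCH` form seen in this thread, optionally followed by a `-<build timestamp>` suffix such as `1.1.20-202504131441`. The helper names are hypothetical, and treating a missing package as missing both fixes is a conservative assumption.

```python
# Versions mentioned in this thread as carrying critical DSP crash fixes.
CRITICAL_FIX_VERSIONS = ["1.1.9", "1.1.14"]


def parse_version(version):
    """Turn '1.1.20-202504131441' into (1, 1, 20), ignoring the build suffix."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def missing_fixes(installed):
    """Return the critical fix versions newer than the installed package.

    `installed` is None when modalai-slpi is not installed as its own package;
    that case is conservatively treated as missing every listed fix.
    """
    if installed is None:
        return list(CRITICAL_FIX_VERSIONS)
    have = parse_version(installed)
    return [v for v in CRITICAL_FIX_VERSIONS if parse_version(v) > have]


if __name__ == "__main__":
    for candidate in (None, "1.1.10", "1.1.19", "1.1.20-202504131441"):
        print(candidate, "->", missing_fixes(candidate) or "all listed fixes present")
```

Comparing tuples of integers rather than raw strings matters here: as plain strings, "1.1.9" sorts after "1.1.14".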

                • Rowan Dempster @Eric Katzfey

                  @Eric-Katzfey I am not familiar with the "modalai-slpi" codebase. Could you elaborate on what that is?

                  • Eric Katzfey (ModalAI Team) @Rowan Dempster

                    @Rowan-Dempster That codebase is not open source, since it is mostly proprietary Qualcomm code that runs on the DSP, so you cannot inspect the code. But it is a standard package in the VOXL SDK. If you run voxl-version, it will show you the versions of all installed SDK packages, including modalai-slpi. The latest version is located here: http://voxl-packages.modalai.com/dists/qrb5165/dev/binary-arm64/modalai-slpi_1.1.19-202407112016_arm64.deb

                    • Rowan Dempster @Eric Katzfey

                      @Eric-Katzfey Gotcha, thanks for the info, I didn't know about that! Is the version of modalai-slpi tightly coupled to the version of PX4 that we are using, or can we update modalai-slpi to get bug fixes without having to worry about compatibility with a specific version of PX4?

                      I will look into which version of modalai-slpi we are using and get back to you!

                      • Eric Katzfey (ModalAI Team) @Rowan Dempster

                        @Rowan-Dempster Some of the newer features in voxl-px4 require later versions of modalai-slpi but newer versions of modalai-slpi should work fine with older versions of voxl-px4. So I think you should be okay moving to the newer modalai-slpi.

                        • Rowan Dempster @Eric Katzfey

                          @Eric-Katzfey I do not see modalai-slpi in the output of voxl-version:

                          voxl2:/$ voxl-version | grep slpi
                                  qrb5165-slpi-test-sig                     01-r0
                                  voxl-slpi-uart-bridge                     1.0.1
                          

                          Here is the full output:

                          voxl2:/$ voxl-version 
                          --------------------------------------------------------------------------------
                          system-image: 1.7.8-M0054-14.1a-perf
                          kernel:       #1 SMP PREEMPT Sat May 18 00:10:25 UTC 2024 4.19.125
                          --------------------------------------------------------------------------------
                          hw version:   M0054
                          --------------------------------------------------------------------------------
                          voxl-suite:   1.0.0
                          --------------------------------------------------------------------------------
                          Packages:
                          Repo:  http://voxl-packages.modalai.com/ ./dists/qrb5165/sdk-1.0/binary-arm64/
                          Last Updated: 2023-03-02 13:01:31
                          List:
                                  kernel-module-voxl-fsync-mod-4.19.125     1.0-r0
                                  kernel-module-voxl-gpio-mod-4.19.125      1.0-r0
                                  kernel-module-voxl-platform-mod-4.19.125  1.0-r0
                                  libmodal-c2d                              0.1
                                  libmodal-cv                               0.3.2
                                  libmodal-exposure                         0.0.0+89cd3ac03
                                  libmodal-journal                          0.2.2
                                  libmodal-json                             0.4.3
                                  libmodal-pipe                             2.10.3
                                  libqrb5165-io                             0.3.3
                                  libvoxl-cci-direct                        0.2.3
                                  libvoxl-cutils                            0.1.1
                                  mv-voxl                                   0.1-r0
                                  qrb5165-bind                              0.1-r0
                                  qrb5165-dfs-server                        0.1.0
                                  qrb5165-imu-server                        0.6.0
                                  qrb5165-slpi-test-sig                     01-r0
                                  qrb5165-system-tweaks                     0.2.2
                                  qrb5165-tflite                            2.8.0-2
                                  voxl-bind-spektrum                        0.1.0
                                  voxl-camera-calibration                   0.4.0
                                  voxl-camera-server                        0.0.0+89cd3ac03
                                  voxl-configurator                         0.2.7
                                  voxl-cpu-monitor                          0.4.6
                                  voxl-docker-support                       1.2.5
                                  voxl-eigen3                               3.4.0
                                  voxl-elrs                                 0.0.7
                                  voxl-esc                                  1.2.2
                                  voxl-feature-tracker                      0.2.3
                                  voxl-flow-server                          0.3.3
                                  voxl-fsync-mod                            1.0-r0
                                  voxl-gphoto2-server                       0.0.10
                                  voxl-gpio-mod                             1.0-r0
                                  voxl-imu-server                           0.0.0+89cd3ac03
                                  voxl-jpeg-turbo                           2.1.3-5
                                  voxl-lepton-server                        1.1.2
                                  voxl-libgphoto2                           0.0.4
                                  voxl-libuvc                               1.0.7
                                  voxl-logger                               0.3.4
                                  voxl-mavcam-manager                       0.5.1
                                  voxl-mavlink                              0.1.1
                                  voxl-mavlink-server                       1.2.0
                                  voxl-microdds-agent                       2.4.1-0
                                  voxl-modem                                1.0.5
                                  voxl-mongoose                             7.7.0-1
                                  voxl-mpa-to-ros                           0.3.6
                                  voxl-mpa-tools                            1.0.4
                                  voxl-opencv                               4.5.5-1
                                  voxl-platform-mod                         1.0-r0
                                  voxl-portal                               0.5.9
                                  voxl-px4                                  1.14.0-2.0.36+deb
                                  voxl-px4-imu-server                       0.1.2
                                  voxl-px4-params                           0.1.8
                                  voxl-qvio-server                          0.0.0+89cd3ac03
                                  voxl-remote-id                            0.0.8
                                  voxl-slpi-uart-bridge                     1.0.1
                                  voxl-streamer                             0.0.0+89cd3ac03
                                  voxl-suite                                1.0.0
                                  voxl-tag-detector                         0.0.4
                                  voxl-tflite-server                        0.3.1
                                  voxl-utils                                1.3.1
                                  voxl-uvc-server                           0.1.6
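The absence is easier to confirm programmatically than by scanning a long list by eye. A small Python sketch, assuming the `voxl-version` layout shown above (a `List:` line followed by indented `name version` pairs); the function name is hypothetical and is not part of voxl-utils.

```python
def parse_package_list(voxl_version_output):
    """Map package name -> version from the 'List:' section of voxl-version output."""
    packages = {}
    in_list = False
    for raw in voxl_version_output.splitlines():
        line = raw.strip()
        if line == "List:":
            in_list = True  # package entries start after this marker
            continue
        if not in_list or not line:
            continue
        fields = line.split()
        if len(fields) == 2:  # "name   version" rows only
            name, version = fields
            packages[name] = version
    return packages


if __name__ == "__main__":
    sample = """\
Packages:
List:
        qrb5165-slpi-test-sig                     01-r0
        voxl-px4                                  1.14.0-2.0.36+deb
        voxl-slpi-uart-bridge                     1.0.1
"""
    pkgs = parse_package_list(sample)
    print(pkgs.get("voxl-px4"))      # 1.14.0-2.0.36+deb
    print(pkgs.get("modalai-slpi"))  # None: not installed as its own package
```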
                          
                          • Rowan Dempster @Rowan Dempster

                            @Eric-Katzfey The first distro I see modalai-slpi in is http://voxl-packages.modalai.com/dists/qrb5165/sdk-1.2/binary-arm64/

                            We install packages from http://voxl-packages.modalai.com/dists/qrb5165/sdk-1.0/binary-arm64/, which is probably why I don't see it in voxl-version!

                            Is it okay for me to install modalai-slpi from the latest distro (http://voxl-packages.modalai.com/dists/qrb5165/sdk-1.4/binary-arm64/) on my VOXL 2 alongside the existing older software, or will that break anything?

                            If modalai-slpi has something to do with PX4 communicating with the SLPI, how is it possible that I don't have any version of modalai-slpi installed, yet PX4 can still run software on the SLPI?

                            Thank you,

                            Rowan

                            • Eric Katzfey (ModalAI Team) @Rowan Dempster

                              @Rowan-Dempster The SLPI image used to be part of the main system image. It was later separated out into its own package for easier maintenance. So you have a very old version that is missing many important bug fixes. We've never tried installing the latest modalai-slpi package on an old SDK, but I think it will work. Give it a try and see what happens. But, obviously, it's really hard for us to support you when you use such old software with custom modifications on top of it. You should really try to structure any customizations so that they can easily be carried over to newer versions of VOXL SDK as they come out.

                              • Rowan Dempster @Eric Katzfey

                                @Eric-Katzfey

                                "The SLPI image used to be part of the main system image. It was then separated out into its own package for easier maintenance."

                                Gotcha, makes sense!

                                "So you have a very old version missing many important bug fixes. We've never tried installing the latest modalai-slpi package on an old SDK but I think it will work. Give it a try and see what happens."

                                Will do. I just wanted to confirm that it "might work" so I'm not totally wasting my time exploring this avenue, haha.

                                "But, obviously, it's really hard for us to support you when you use such old software with custom modifications on top of it. You should really try to make any customization such that they can easily be used with newer versions of VOXL SDK as they come out."

                                Yes, this is something we constantly run into at Cleo, as a small company trying to release a stable product while also keeping up with the latest and greatest from ModalAI and other open-source vendors. As a dev at Cleo, it feels like two things are pulling in opposite directions:

                                1. Having your base platform constantly updating, which then requires patches for API changes and sometimes for lower-level incompatibilities that come with those updates.
                                2. Getting the stability and functional improvements that come with those base platform updates.

                                I'm sure other teams feel these competing forces too, not just Cleo. It may be worth a call between Cleo and ModalAI devs to align on the best way to get the stability and functional improvements from each ModalAI release while minimizing the Cleo dev time spent on API patches, and to minimize incompatibilities, or at least forecast them, before we spend time on an upgrade only to discover an incompatibility.

                                • Rowan Dempster @Rowan Dempster

                                  Following up on the testing Cleo did based on @Eric-Katzfey's suggestion to install http://voxl-packages.modalai.com/dists/qrb5165/sdk-1.4/binary-arm64/modalai-slpi_1.1.19_arm64.deb, here are the results.

                                  Before installing the new package:

                                  • qmi_client_send_msg_sync() error at PX4 startup on boot number 48
                                  • qmi_client_send_msg_sync() error at PX4 startup on boot number 62
                                  • qmi_client_send_msg_sync() error at PX4 startup on boot number 95
                                  • qmi_client_send_msg_sync() error at PX4 startup on boot number 131
                                  • qmi_client_send_msg_sync() error at PX4 startup on boot number 22

                                  After installing the new package:

                                  • 550 boots in a row without any failures during PX4 startup
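A rough way to judge whether 550 clean boots is meaningful: if each "before" entry is read as the boot count at which a separate test run first hit the error (an assumption; the post does not say how the runs were structured), the old failure rate is 5 failures in 358 boots, roughly 1 in 72. The Python sketch below computes how likely 550 consecutive clean boots would be if that rate were unchanged, plus the standard exact 95% upper bound on the new rate after zero observed failures. It treats boots as independent trials, which is itself an assumption.

```python
# Boot numbers at which PX4 startup hit the qmi error before the package update,
# read (by assumption) as five independent runs that each ended at first failure.
BOOTS_TO_FAILURE = [48, 62, 95, 131, 22]
CLEAN_BOOTS_AFTER = 550


def failure_rate(boots_to_failure):
    """Failures per boot across runs that each ended at their first failure."""
    return len(boots_to_failure) / sum(boots_to_failure)


def prob_all_clean(rate, boots):
    """Probability of `boots` consecutive successes at a per-boot failure rate."""
    return (1.0 - rate) ** boots


def upper_bound_zero_failures(boots, confidence=0.95):
    """Exact one-sided upper bound on the rate when 0 failures were observed."""
    return 1.0 - (1.0 - confidence) ** (1.0 / boots)


if __name__ == "__main__":
    old_rate = failure_rate(BOOTS_TO_FAILURE)
    print(f"old rate ~ 1 in {1 / old_rate:.0f}")
    print(f"P(550 clean boots at old rate) = {prob_all_clean(old_rate, CLEAN_BOOTS_AFTER):.1e}")
    print(f"95% upper bound on new rate ~ 1 in {1 / upper_bound_zero_failures(CLEAN_BOOTS_AFTER):.0f}")
```

Under these assumptions, 550 clean boots would have had well under a 0.1% chance at the old rate, and the new startup failure rate is at most about 1 in 184 with 95% confidence. Zero failures shows a large improvement, not that the rate is zero.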

                                  Going forward Cleo will be installing http://voxl-packages.modalai.com/dists/qrb5165/sdk-1.4/binary-arm64/modalai-slpi_1.1.19_arm64.deb on all dronuts we build.

                                  Thank you for your help @Eric-Katzfey !

                                  • Rowan Dempster @Rowan Dempster

                                    @Eric-Katzfey Unfortunately, it seems that the crash happening about 12 seconds after power-on during boot was only one of the issues. After boot completes, we are still seeing PX4 crash with the same error message about 77 seconds after power-up:

                                    Mar 02 12:59:17 m0054 voxl-px4[1832]: terminate called after throwing an instance of 'qmi_error'
                                    Mar 02 12:59:17 m0054 voxl-px4[1832]:   what():  qmi_client_send_msg_sync() failed, (client_id=)0, result=0: qmi service error (-2)
                                    Mar 02 12:59:17 m0054 voxl-px4[1832]: /usr/bin/voxl-px4: line 140:  1838 Aborted                 GPS=$GPS RC=$RC OSD=$OSD EXTRA_STEPS=$EXTRA_STEPS px4 $DAEMON -s /usr/bin/voxl-px4-start
                                    Mar 02 12:59:17 m0054 systemd[1]: voxl-px4.service: Main process exited, code=exited, status=134/n/a
                                    Mar 02 12:59:17 m0054 systemd[1]: voxl-px4.service: Failed with result 'exit-code'.
                                    

                                    We were also able to capture the dmesg output from this boot, and these messages line up with the time PX4 crashed and printed the qmi_error:

                                    [   77.107460] Fatal error on slpi!
                                    [   77.107529] slpi subsystem failure reason: err_qdi.c:1079:PC=e61fc160,SP=317931e8,FP=31793268,LR=e621d784,BADVA=0,CAUSE=7003,TASK=Anonymous.
                                    [   77.107564] subsys-restart: subsystem_restart_dev(): Restart sequence requested for slpi, restart_level = RELATED.
                                    [   77.108605] adsprpc: fastrpc_restart_notifier_cb: slpi subsystem is restarting
                                    [   77.108612] subsys-restart: subsystem_shutdown(): [kworker/u19:0:2966]: Shutting down slpi
                                    [   77.120971] qcom_rpmh DRV:apps_rsc TCS Busy, retrying RPMH message send: addr=0x30030
                                    [   77.123099] adsprpc: fastrpc_rpmsg_remove: closed rpmsg channel of slpi
                                    [   77.123533] adsprpc: fastrpc_restart_notifier_cb: received RAMDUMP notification for slpi
                                    [   77.123932] coresight-remote-etm soc:ssc_etm0: Connection disconnected between QMI handle and 8 service
                                    [   77.123941] sysmon-qmi: ssctl_del_server: Connection lost between QMI handle and slpi's SSCTL service
                                    [   77.124485] subsys-restart: subsystem_powerup(): [kworker/u19:0:2966]: Powering up slpi
                                    [   77.124863] subsys-pil-tz 5c00000.qcom,ssc: slpi: loading from 0x0000000088c00000 to 0x000000008a600000
                                    [   77.198746] subsys-pil-tz 5c00000.qcom,ssc: slpi: Brought out of reset
                                    [   77.254413] subsys-pil-tz 5c00000.qcom,ssc: Subsystem error monitoring/handling services are up
                                    [   77.254573] subsys-pil-tz 5c00000.qcom,ssc: slpi: Power/Clock ready interrupt received
                                    [   77.259994] adsprpc: fastrpc_restart_notifier_cb: slpi subsystem is up
                                    [   77.259999] subsys-restart: subsystem_restart_wq_func(): [kworker/u19:0:2966]: Restart sequence for slpi completed.
                                    [   77.261053] -1836034584:Entered
                                    [   77.264781] -1836034584:SMD QRTR driver probed
                                    [   77.267518] sysmon-qmi: ssctl_new_server: Connection established between QMI handle and slpi's SSCTL service
                                    [   77.267568] coresight-remote-etm soc:ssc_etm0: Connection established between QMI handle and 8 service
                                    [   77.268271] adsprpc: fastrpc_rpmsg_probe: opened rpmsg channel for slpi
                                    [   77.274585] diag: In diag_send_peripheral_buffering_mode, buffering flag not set for 3
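The `slpi subsystem failure reason` line carries the DSP register state at the time of the crash: the program counter (PC), stack and frame pointers (SP, FP), link register (LR), BADVA, a CAUSE code and the task name. A minimal Python sketch that pulls those `KEY=value` fields out of the line so the addresses can be used in an offset calculation. It assumes the exact layout shown above and that the address fields are hexadecimal without a `0x` prefix; CAUSE is kept as a raw string because its encoding is not stated. The function name is hypothetical.

```python
import re

# KEY=value pairs in the dmesg "slpi subsystem failure reason" line.
# Values are alphanumeric, so the trailing '.' after TASK is excluded.
FIELD_RE = re.compile(r"\b(PC|SP|FP|LR|BADVA|CAUSE|TASK)=([0-9A-Za-z_]+)")
ADDRESS_FIELDS = ("PC", "SP", "FP", "LR", "BADVA")  # printed as hex without 0x


def parse_slpi_failure(line):
    """Return raw field strings plus integer versions of the address fields."""
    raw = dict(FIELD_RE.findall(line))
    addresses = {key: int(raw[key], 16) for key in ADDRESS_FIELDS if key in raw}
    return raw, addresses


if __name__ == "__main__":
    line = ("[   77.107529] slpi subsystem failure reason: err_qdi.c:1079:"
            "PC=e61fc160,SP=317931e8,FP=31793268,LR=e621d784,BADVA=0,"
            "CAUSE=7003,TASK=Anonymous.")
    raw, addresses = parse_slpi_failure(line)
    print(hex(addresses["PC"]), hex(addresses["LR"]), raw["TASK"])
```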
                                    
                                    • Eric Katzfey (ModalAI Team) @Rowan Dempster

                                      @Rowan-Dempster That program counter (PC=0xe61fc160) indicates that the crash happened in the loaded px4 library.

                                      • Eric Katzfey (ModalAI Team) @Rowan Dempster

                                        @Rowan-Dempster There is a way to figure out where in the code the crash happened. I can update the modalai-slpi package to add a debug print in mini-dm that shows the address where libpx4.so was loaded into memory. I just ran that, and it showed that libpx4.so was loaded at address 0xe6120000. Once you know the load address:

                                        1. Disassemble libpx4.so to get the address map, for example:
                                           /local/mnt/workspace/Qualcomm/Hexagon_SDK/4.1.0.4/tools/HEXAGON_Tools/8.4.05/Tools/bin/hexagon-llvm-objdump -d build/modalai_voxl2-slpi_default/platforms/qurt/libpx4.so > dsp-image.dis
                                        2. Take the PC from the fatal error message in dmesg (in this case 0xe61fc160) and subtract the base address: 0xe61fc160 - 0xe6120000 = 0xdc160.
                                        3. Look up that offset in the disassembled file. That will show you where it crashed.

                                        The fatal error message also includes the LR, which can help show where the crashing code was called from.
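The offset arithmetic and lookup can be sketched in Python. This assumes the `hexagon-llvm-objdump -d` output uses the usual LLVM layout, where each symbol starts with a header line such as `000dc100 <symbol_name>:`, and that the load address is the one reported by mini-dm for the boot that crashed. The symbol names in the sample are made up for illustration, and the helper names are hypothetical. An offset is attributed to the closest symbol header at or below it, which is the containing function as long as the symbol table is intact.

```python
import re

# Symbol header lines in llvm-objdump -d output, e.g. "000dc100 <foo>:".
SYMBOL_RE = re.compile(r"^([0-9a-fA-F]+) <([^>]+)>:\s*$")


def load_offset(address, base):
    """Offset of a runtime address inside libpx4.so, given its load base."""
    if address < base:
        raise ValueError("address lies below the library load address")
    return address - base


def enclosing_symbol(disassembly_lines, offset):
    """Return (symbol_start, symbol_name) of the closest symbol at or below offset."""
    best = None
    for line in disassembly_lines:
        m = SYMBOL_RE.match(line)
        if m:
            start = int(m.group(1), 16)
            if start <= offset and (best is None or start > best[0]):
                best = (start, m.group(2))
    return best


if __name__ == "__main__":
    base = 0xE6120000  # libpx4.so load address printed by the debug build
    pc_off = load_offset(0xE61FC160, base)  # PC from the dmesg fatal error
    lr_off = load_offset(0xE621D784, base)  # LR from the same message
    print(hex(pc_off), hex(lr_off))
    # Hypothetical excerpt of dsp-image.dis; real symbol names will differ.
    dis = [
        "000dc100 <example_function_a>:",
        "   dc160: ...",
        "000fd700 <example_function_b>:",
    ]
    print(enclosing_symbol(dis, pc_off))
    print(enclosing_symbol(dis, lr_off))
```

With the numbers from this thread, the PC offset is 0xdc160 and the LR offset is 0xfd784; the second lookup points at the caller's neighborhood rather than the crash site itself.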

                                        • Eric Katzfey (ModalAI Team) @Rowan Dempster

                                          @Rowan-Dempster The new package is here: http://voxl-packages.modalai.com/dists/qrb5165/dev/binary-arm64/modalai-slpi_1.1.20-202504131441_arm64.deb

                                          • Rowan Dempster @Eric Katzfey

                                            @Eric-Katzfey Thank you for the awesome debugging tools! We are looking into narrowing down the crash using them today.

                                            We also noticed that going from modalai-slpi_1.1.19_arm64.deb to modalai-slpi_1.1.20-202504131441_arm64.deb decreased the CPU utilization reported by mini-dm by 10 percentage points (from 36% to 26%). Is that expected? What changed from 1.1.19 to 1.1.20, and which should we be using in production?

                                            Thank you!
