PX4 qmi_error abort
-
Hey ModalAI PX4 users, has anyone been running into qmi_error causing the PX4 process to abort? At Cleo it happens at boot roughly 1 in 20 to 1 in 100 times. After a successful boot it's rarer, roughly 1 in 200 to 1 in 500 times.
Here's the full error from journalctl:
terminate called after throwing an instance of 'qmi_error'
Mar 19 15:33:57 m0054 voxl-px4[1854]: what(): qmi_client_send_msg_sync() failed, (client_id=)0, result=0: qmi service error (-2)
Mar 19 15:33:57 m0054 voxl-px4[1854]: /usr/bin/voxl-px4: line 140: 1868 Aborted GPS=$GPS RC=$RC OSD=$OSD EXTRA_STEPS=$EXTRA_STEPS px4 $DAEMON -s /usr/bin/voxl-px4-start
Mar 19 15:33:57 m0054 systemd[1]: voxl-px4.service: Main process exited, code=exited, status=134/n/a
Mar 19 15:33:57 m0054 systemd[1]: voxl-px4.service: Failed with result 'exit-code'.
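For anyone trying to measure how often this happens, here is a minimal Python sketch that counts these aborts in saved journal text. It assumes the journal has been exported to plain text (for example with `journalctl -u voxl-px4`) and that the lines look like the ones in this post; the function name `count_qmi_aborts` and the sample text are made up for illustration.

```python
import re

# The exception text printed by the px4 process before it aborts.
QMI_RE = re.compile(r"qmi_client_send_msg_sync\(\) failed")
# The line systemd logs when the main process exits with status 134
# (128 + 6, i.e. the process was killed by SIGABRT).
ABORT_RE = re.compile(
    r"voxl-px4\.service: Main process exited, code=exited, status=134"
)


def count_qmi_aborts(journal_text: str) -> int:
    """Count status-134 exits that were preceded by a qmi failure message."""
    count = 0
    saw_qmi_failure = False
    for line in journal_text.splitlines():
        if QMI_RE.search(line):
            saw_qmi_failure = True
        elif ABORT_RE.search(line):
            if saw_qmi_failure:
                count += 1
            # Reset so an unrelated later abort is not counted.
            saw_qmi_failure = False
    return count


# Sample lines copied from the journal output in this post.
SAMPLE = """\
Mar 19 15:33:57 m0054 voxl-px4[1854]: what(): qmi_client_send_msg_sync() failed, (client_id=)0, result=0: qmi service error (-2)
Mar 19 15:33:57 m0054 systemd[1]: voxl-px4.service: Main process exited, code=exited, status=134/n/a
Mar 19 15:33:57 m0054 systemd[1]: voxl-px4.service: Failed with result 'exit-code'.
"""

if __name__ == "__main__":
    print(count_qmi_aborts(SAMPLE))
```

Dividing the count by the number of boots or flights in the same log window gives a rate that can be compared before and after any change.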
-
@Rowan-Dempster Yes, we used to see this happen on older SDK versions. It was an indication that the DSP was crashing. It was happening at about that frequency. But there have been multiple bug fixes since then and as far as I know it no longer happens. Are you using a recent version of VOXL SDK? Have you made any modifications to the SDK?
-
@Eric-Katzfey Thanks for the response!
Are you using a recent version of VOXL SDK?
Cleo branched off of your repo at this tag: https://github.com/modalai/px4-firmware/tree/v1.14.0-2.0.36-dev
Have you made any modifications to the SDK?
Yup, we actively develop the PX4 modules, including the controllers and the EKF that run on the DSP.
So it may be our code running on the DSP causing the DSP crash, or it could be related to the bugs in the https://github.com/modalai/px4-firmware/tree/v1.14.0-2.0.36-dev tag itself that you mentioned have been fixed.
As far as a path forward, are there any methods you can suggest for inspecting the DSP to find the root cause of crashes? For example, things we can add to the code, or perhaps a debug mode we can run the DSP modules in?
Also, do you know of bug fix commits in your repo's mainline that we at Cleo can attempt to backport to our fork and see if we also no longer see the DSP crashes?
Thank you for your help,
Rowan
-
@Eric-Katzfey Any insight into this ^
-
@Rowan-Dempster Sorry, not sure why I didn't see your response. Let me look through the commits to see if any of those important bug fixes have been added since then.
-
@Rowan-Dempster Of course v1.14.0-2.0.36 is extremely old and there have been a lot of improvements / fixes since then. But the fixes for DSP crashes were made in the modalai-slpi codebase. What version of modalai-slpi are you running? One critical bug fix was added in v1.1.9 and another in v1.1.14.
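As a rough illustration of checking an installed version against those two releases, here is a small Python sketch. It assumes modalai-slpi versions are dotted integers with an optional `-suffix` or `+suffix`, and that a fix added in one version is present in every later one; neither assumption is confirmed in this thread, and the helper names are hypothetical.

```python
# Versions named in this post as containing critical DSP crash fixes.
FIX_VERSIONS = ("1.1.9", "1.1.14")


def parse_version(text: str) -> tuple:
    """Turn '1.1.19' or '1.1.19-202407112016' into (1, 1, 19)."""
    core = text.split("-")[0].split("+")[0]
    return tuple(int(part) for part in core.split("."))


def missing_fixes(installed: str) -> list:
    """Return the fix versions that are newer than the installed version."""
    have = parse_version(installed)
    return [v for v in FIX_VERSIONS if parse_version(v) > have]


if __name__ == "__main__":
    print(missing_fixes("1.1.10"))
```

The versions are compared as integer tuples rather than strings because a plain string comparison would rank "1.1.9" above "1.1.14".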
-
@Eric-Katzfey I am not familiar with the "modalai-slpi" codebase, could you elaborate on what that is.
-
@Rowan-Dempster That codebase is not open source since it is mostly proprietary Qualcomm code that runs on the DSP so you cannot inspect the code. But it is a standard package in the VOXL SDK. If you enter voxl-version it will show you all the versions of the installed SDK packages including the version of modalai-slpi. The latest version is located here: http://voxl-packages.modalai.com/dists/qrb5165/dev/binary-arm64/modalai-slpi_1.1.19-202407112016_arm64.deb
-
@Eric-Katzfey Gotcha thanks for the info I didn't know about that! Is the version of modalai-slpi highly coupled with the version of PX4 that we are using, or can we update modalai-slpi to get bug fixes without having to worry about compatibility with a specific version of PX4?
I will look into which version of modalai-slpi we are using and get back to you!
-
@Rowan-Dempster Some of the newer features in voxl-px4 require later versions of modalai-slpi but newer versions of modalai-slpi should work fine with older versions of voxl-px4. So I think you should be okay moving to the newer modalai-slpi.
-
@Eric-Katzfey I do not see modalai-slpi in the output of voxl-version:
voxl2:/$ voxl-version | grep slpi
qrb5165-slpi-test-sig 01-r0
voxl-slpi-uart-bridge 1.0.1
Here is the full output:
voxl2:/$ voxl-version
--------------------------------------------------------------------------------
system-image: 1.7.8-M0054-14.1a-perf
kernel: #1 SMP PREEMPT Sat May 18 00:10:25 UTC 2024 4.19.125
--------------------------------------------------------------------------------
hw version: M0054
--------------------------------------------------------------------------------
voxl-suite: 1.0.0
--------------------------------------------------------------------------------
Packages:
Repo: http://voxl-packages.modalai.com/ ./dists/qrb5165/sdk-1.0/binary-arm64/
Last Updated: 2023-03-02 13:01:31
List:
kernel-module-voxl-fsync-mod-4.19.125 1.0-r0
kernel-module-voxl-gpio-mod-4.19.125 1.0-r0
kernel-module-voxl-platform-mod-4.19.125 1.0-r0
libmodal-c2d 0.1
libmodal-cv 0.3.2
libmodal-exposure 0.0.0+89cd3ac03
libmodal-journal 0.2.2
libmodal-json 0.4.3
libmodal-pipe 2.10.3
libqrb5165-io 0.3.3
libvoxl-cci-direct 0.2.3
libvoxl-cutils 0.1.1
mv-voxl 0.1-r0
qrb5165-bind 0.1-r0
qrb5165-dfs-server 0.1.0
qrb5165-imu-server 0.6.0
qrb5165-slpi-test-sig 01-r0
qrb5165-system-tweaks 0.2.2
qrb5165-tflite 2.8.0-2
voxl-bind-spektrum 0.1.0
voxl-camera-calibration 0.4.0
voxl-camera-server 0.0.0+89cd3ac03
voxl-configurator 0.2.7
voxl-cpu-monitor 0.4.6
voxl-docker-support 1.2.5
voxl-eigen3 3.4.0
voxl-elrs 0.0.7
voxl-esc 1.2.2
voxl-feature-tracker 0.2.3
voxl-flow-server 0.3.3
voxl-fsync-mod 1.0-r0
voxl-gphoto2-server 0.0.10
voxl-gpio-mod 1.0-r0
voxl-imu-server 0.0.0+89cd3ac03
voxl-jpeg-turbo 2.1.3-5
voxl-lepton-server 1.1.2
voxl-libgphoto2 0.0.4
voxl-libuvc 1.0.7
voxl-logger 0.3.4
voxl-mavcam-manager 0.5.1
voxl-mavlink 0.1.1
voxl-mavlink-server 1.2.0
voxl-microdds-agent 2.4.1-0
voxl-modem 1.0.5
voxl-mongoose 7.7.0-1
voxl-mpa-to-ros 0.3.6
voxl-mpa-tools 1.0.4
voxl-opencv 4.5.5-1
voxl-platform-mod 1.0-r0
voxl-portal 0.5.9
voxl-px4 1.14.0-2.0.36+deb
voxl-px4-imu-server 0.1.2
voxl-px4-params 0.1.8
voxl-qvio-server 0.0.0+89cd3ac03
voxl-remote-id 0.0.8
voxl-slpi-uart-bridge 1.0.1
voxl-streamer 0.0.0+89cd3ac03
voxl-suite 1.0.0
voxl-tag-detector 0.0.4
voxl-tflite-server 0.3.1
voxl-utils 1.3.1
voxl-uvc-server 0.1.6
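To make the same check scriptable across several drones, here is a minimal Python sketch that reads saved voxl-version text and looks up a package. It assumes the package list follows a "List:" line with one whitespace-separated "name version" pair per line, which is how the output reads here but may not match every SDK release; `parse_packages` is a hypothetical helper and the sample is a short excerpt of the output in this post.

```python
def parse_packages(listing: str) -> dict:
    """Map package name to version for lines that follow the 'List:' marker."""
    packages = {}
    in_list = False
    for line in listing.splitlines():
        stripped = line.strip()
        if stripped == "List:":
            in_list = True
            continue
        if not in_list:
            continue
        parts = stripped.split()
        if len(parts) == 2:
            name, version = parts
            packages[name] = version
    return packages


# Excerpt of the voxl-version output in this post.
SAMPLE = """\
voxl-suite: 1.0.0
Packages:
List:
qrb5165-slpi-test-sig 01-r0
voxl-px4 1.14.0-2.0.36+deb
voxl-slpi-uart-bridge 1.0.1
"""

if __name__ == "__main__":
    pkgs = parse_packages(SAMPLE)
    # Prints None when the package is not installed.
    print(pkgs.get("modalai-slpi"))
    print(sorted(name for name in pkgs if "slpi" in name))
```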
-
@Eric-Katzfey The first distro I see modalai-slpi in is http://voxl-packages.modalai.com/dists/qrb5165/sdk-1.2/binary-arm64/
We install http://voxl-packages.modalai.com/dists/qrb5165/sdk-1.0/binary-arm64/ which is probably why I don't see it in voxl-version!
Is it okay for me to install the latest distro's (http://voxl-packages.modalai.com/dists/qrb5165/sdk-1.4/binary-arm64/) modalai-slpi on my voxl2 alongside the existing older software, or will that break anything?
If modalai-slpi has something to do with PX4 communication with the SLPI, how is it possible that I don't have any version of modalai-slpi installed but PX4 can still run software on the SLPI?
Thank you,
Rowan
-
@Rowan-Dempster The SLPI image used to be part of the main system image. It was then separated out into its own package for easier maintenance. So you have a very old version missing many important bug fixes. We've never tried installing the latest modalai-slpi package on an old SDK but I think it will work. Give it a try and see what happens. But, obviously, it's really hard for us to support you when you use such old software with custom modifications on top of it. You should really try to make any customizations such that they can easily be used with newer versions of VOXL SDK as they come out.
-
The SLPI image used to be part of the main system image. It was then separated out into its own package for easier maintenance.
Gotcha makes sense!
So you have a very old version missing many important bug fixes. We've never tried installing the latest modalai-slpi package on an old SDK but I think it will work. Give it a try and see what happens.
Will do, just wanted to confirm that it "might work" so not totally wasting my time exploring this avenue haha.
But, obviously, it's really hard for us to support you when you use such old software with custom modifications on top of it. You should really try to make any customizations such that they can easily be used with newer versions of VOXL SDK as they come out.
Yes, this is something that we constantly run into at Cleo as a small company trying to release a stable product but also keep up with the latest and greatest from Modal and other open source vendors. As a dev at Cleo it feels like two things are pulling in opposite directions:
- Having your base platform constantly updating, which then requires patches for API changes and sometimes more low level incompatibilities that come along with those updates.
- Getting the stability and functional improvements that come along with those base platform updates.
I'm sure these competing forces are felt by others as well, not just at Cleo. It's a conversation that is perhaps worthy of a call between Cleo and Modal devs to get aligned on the best way to get the stability and functional improvements from each vendor (ModalAI) release while minimizing the Cleo dev time spent on API patches. Ideally we would also minimize incompatibilities, or at least forecast them before spending the time on an upgrade and only then finding an incompatibility.