Firmware Structure (Adding VTOL support) - Clarifications
-
@Eric-Katzfey Those timeouts indicate that the processor is overloaded. Perhaps try enabling the new modules one by one to see if you can figure out which one is taking up too much time.
-
@Eric-Katzfey Surprisingly, after leaving voxl2 on overnight the QGC connection stopped being an issue. However, the baro/mag timeout still occurs.
However, I wonder how I can check if this is indeed processor overload. When I am running voxl-px4-hitl + connect voxl2 to QGC the total CPU usage is only slightly over 70%:
Similarly, when I looked at mavlink messages, nothing really stood out as too heavy:
-
onboard services:
-
from gcs:
Some of the things I tried:
- Adjusting the -r parameter in the voxl-px4-hitl-start script. I went as high as 500000. That affected neither the CPU usage nor the frequency of the baro/mag timeout error. Specifically, this is the line I changed:
# start the onboard fast link to connect to voxl-mavlink-server
mavlink start -x -u 14556 -o 14557 -r 100000 -n lo -m onboard
- I also tried adjusting udp_mtu in /etc/modalai/voxl-mavlink-server.conf. I went as high as 300, but it had no effect either.
- Adjusting the dsp_hitl baud rate to 2000000 in voxl-px4-hitl-start, also with little to no effect:
qshell dsp_hitl start -g -m -o -b 2000000
Correct me if I am wrong, but as far as I understand, the issue must be in the DSP communicating with the Linux side. Where could I look to try to speed it up, if possible?
I may also be able to get an ARK V6 Flight Controller running with voxl2. Would that solve my issue?
Thank you in advance for your response.
-
@garret You need to look at the CPU utilization in the DSP where the majority of the PX4 code is running. top will show you only what's happening on the applications processor (Linux) side. The PX4 log contains this information.
-
@garret If the Iris configuration doesn't produce timeouts then possibly it's Gazebo not sending sensor updates fast enough. But I'm just guessing at this point. We've never tried a VTOL configuration so you are in new territory here. I'd first take a PX4 log and verify that the DSP CPU load is not exceeding 65 - 70%. Then perhaps start putting some diagnostic prints around the reception of the sensor messages from Gazebo and see how often they are being sent. Just have to start isolating the root cause.
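
As a rough illustration of the load check suggested above, here is a minimal Python sketch. It assumes the DSP load samples have already been exported from the PX4 log as fractions between 0 and 1; the function name, the 0.70 limit (the upper end of the 65-70% range mentioned above), and the sample values are all hypothetical and not taken from the logs in this thread.

```python
from statistics import mean


def summarize_load(samples, limit=0.70):
    """Summarize CPU load samples given as fractions in [0, 1].

    `limit` is an assumed threshold, not a value defined by PX4.
    """
    if not samples:
        raise ValueError("no load samples")
    over = [s for s in samples if s > limit]
    return {
        "mean": mean(samples),
        "peak": max(samples),
        "fraction_over_limit": len(over) / len(samples),
    }


# Hypothetical samples for illustration only.
example = [0.52, 0.61, 0.74, 0.58]
report = summarize_load(example)
print(report)
```

If a meaningful fraction of the samples sits above the limit, overload is a plausible cause; if not, the sensor message timing is probably the next thing to check.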
-
@Eric-Katzfey I examined the logs for the default ModalAI PX4 firmware with the IRIS configuration and for my customized firmware with the VTOL modules enabled. Strangely, the CPU usage seems identical and is above 75% even in the default case.
Log plots here: https://drive.google.com/file/d/1UZfAYMRQ-VfYnyf1u__l9NAUHnM8Q0PJ/view?usp=sharing.
I assume one of the cpus is the apps one and the other is the dsp one. This is still strange, since according to my comment above the total cpu usage of the apps was slightly over 70%.
@garret said in Firmware Structure (Adding VTOL support) - Clarifications:
total CPU usage is only slightly over 70%
As for Gazebo sending messages, I am still looking into it. However, I ran a quick diagnostic and found that Gazebo's publication rate seems sufficient:
- IRIS case
root@nikita-ThinkPad-P16s-Gen-2:/usr/workspace/voxl2_hitl_gazebo# gz topic -d 1 -z /gazebo/modalai_world/modalai/mag
Hz: 89.91 Hz: 88.87 Hz: 76.84 Hz: 105.65 Hz: 87.52 Hz: 90.77 Hz: 83.98 Hz: 85.79
Hz: 57.25 Hz: 134.63 Hz: 82.31 Hz: 88.04 Hz: 48.27 Hz: 227.83 Hz: 87.30 Hz: 75.15
Hz: 60.36 Hz: 105.97 Hz: 101.91 Hz: 92.13 Hz: 89.28 Hz: 83.36 Hz: 88.95 Hz: 88.49
Hz: 55.99 Hz: 156.46 Hz: 85.59 Hz: 90.70 Hz: 50.35 Hz: 129.66 Hz: 92.53 Hz: 87.62
Hz: 84.50 Hz: 66.56 Hz: 90.34 Hz: 89.11 Hz: 73.71 Hz: 58.14 Hz: 117.21 Hz: 88.86
Hz: 49.12 Hz: 179.37 Hz: 91.40 Hz: 89.39 Hz: 62.59 Hz: 108.00 Hz: 87.86 Hz: 75.88
Hz: 84.17 Hz: 77.79 Hz: 72.78 Hz: 87.25 Hz: 93.21 Hz: 64.89 Hz: 73.10 Hz: 92.97
Hz: 53.24 Hz: 196.72 Hz: 90.09 Hz: 89.55 Hz: 48.60 Hz: 121.67 Hz: 68.07 Hz: 96.66
Hz: 91.59 Hz: 89.99 Hz: 91.62 Hz: 72.11 Hz: 91.43 Hz: 78.47 Hz: 91.06 Hz: 81.71
Hz: 72.23 Hz: 93.85 Hz: 82.83 Hz: 84.47 Hz: 73.80 Hz: 114.91 Hz: 85.34 Hz: 95.42
Hz: 79.51
root@nikita-ThinkPad-P16s-Gen-2:/usr/workspace/voxl2_hitl_gazebo# gz topic -d 1 -z /gazebo/modalai_world/modalai/baro
Hz: 45.30 Hz: 33.52 Hz: 40.19 Hz: 38.64 Hz: 39.93 Hz: 45.78 Hz: 41.65 Hz: 42.63
Hz: 44.85 Hz: 39.09 Hz: 45.65 Hz: 36.76 Hz: 41.51 Hz: 43.36 Hz: 34.07 Hz: 47.10
Hz: 45.87 Hz: 39.17 Hz: 53.24 Hz: 40.17 Hz: 37.47 Hz: 37.41 Hz: 50.90 Hz: 37.27
Hz: 53.60 Hz: 38.04 Hz: 45.86 Hz: 43.55 Hz: 43.98 Hz: 44.47 Hz: 40.48 Hz: 39.65
Hz: 44.65 Hz: 43.66 Hz: 41.32 Hz: 42.98 Hz: 40.23 Hz: 44.23 Hz: 42.25 Hz: 30.15
- VTOL case
root@nikita-ThinkPad-P16s-Gen-2:/usr/workspace/voxl2_hitl_gazebo# gz topic -d 1 -z /gazebo/standard_vtol_hitl_world/standard_vtol_hitl/mag
Hz: 261.34 Hz: 268.29 Hz: 165.38 Hz: 301.10 Hz: 219.42 Hz: 318.94 Hz: 258.53 Hz: 260.81
Hz: 259.69 Hz: 261.20 Hz: 230.35 Hz: 246.97 Hz: 105.02 Hz: 27011.70 Hz: 584.93 Hz: 245.19
Hz: 210.23 Hz: 137.50 Hz: 317.01 Hz: 217.37 Hz: 213.23 Hz: 245.10 Hz: 102.70 Hz: 7085.62
Hz: 255.86 Hz: 214.26 Hz: 358.80 Hz: 135.56 Hz: 601.92 Hz: 197.99 Hz: 289.66 Hz: 146.01
Hz: 710.52 Hz: 285.12 Hz: 183.18 Hz: 283.64 Hz: 267.36 Hz: 196.66 Hz: 293.54 Hz: 261.75
Hz: 189.79 Hz: 285.15 Hz: 269.74 Hz: 210.62 Hz: 247.68 Hz: 278.95 Hz: 92.07 Hz: 9.93
Hz: 39858.11 Hz: 249.81 Hz: 2192.14 Hz: 263.26 Hz: 255.81 Hz: 268.84 Hz: 204.21 Hz: 119.26
Hz: 9.63 Hz: 9.72 Hz: 9.71 Hz: 9.96 Hz: 2776.08 Hz: 133.24 Hz: 274.92 Hz: 823.84
Hz: 202.26 Hz: 276.13 Hz: 276.10 Hz: 216.36 Hz: 339.81 Hz: 270.43 Hz: 266.76 Hz: 261.38
Hz: 265.82 Hz: 268.36 Hz: 256.52 Hz: 276.77 Hz: 93.41 Hz: 1257.61 Hz: 263.08 Hz: 263.41
Hz: 146.94 Hz: 270.98 Hz: 243.40 Hz: 283.23 Hz: 150.90 Hz: 1511.17 Hz: 274.21 Hz: 258.02
Hz: 144.53 Hz: 2885.13 Hz: 247.98 Hz: 137.59 Hz: 829.43 Hz: 247.74 Hz: 254.85 Hz: 264.31
Hz: 209.19 Hz: 275.45 Hz: 267.43 Hz: 270.33 Hz: 245.57 Hz: 228.72 Hz: 232.26 Hz: 264.85
Hz: 93.60 Hz: 49265.94 Hz: 499.46 Hz: 233.43 Hz: 273.17 Hz: 234.92 Hz: 233.14 Hz: 258.48
Hz: 278.42 Hz: 225.91 Hz: 280.44 Hz: 135.68 Hz: 276.36 Hz: 134.54 Hz: 282.50
root@nikita-ThinkPad-P16s-Gen-2:/usr/workspace/voxl2_hitl_gazebo# gz topic -d 1 -z /gazebo/standard_vtol_hitl_world/standard_vtol_hitl/baro
Hz: 132.49 Hz: 129.18 Hz: 129.00 Hz: 130.49 Hz: 111.43 Hz: 151.83 Hz: 97.87 Hz: 133.37
Hz: 108.81 Hz: 145.29 Hz: 88.34 Hz: 257.54 Hz: 127.70 Hz: 102.37 Hz: 131.12 Hz: 124.66
Hz: 104.16 Hz: 135.17 Hz: 117.07 Hz: 123.88 Hz: 134.75 Hz: 130.16 Hz: 134.76 Hz: 70.86
Hz: 577.87 Hz: 123.25 Hz: 120.28 Hz: 104.71 Hz: 179.62 Hz: 125.18 Hz: 119.61 Hz: 121.50
Hz: 148.17 Hz: 126.87 Hz: 103.36 Hz: 132.89 Hz: 145.11 Hz: 114.83 Hz: 113.25 Hz: 148.53
Hz: 132.36 Hz: 128.08 Hz: 124.23 Hz: 72.73 Hz: 282.04 Hz: 127.52 Hz: 128.52 Hz: 124.72
Hz: 121.43 Hz: 112.05 Hz: 126.82 Hz: 108.68 Hz: 165.72 Hz: 128.03 Hz: 124.98 Hz: 100.47
Hz: 9.51 Hz: 137.65 Hz: 93.98 Hz: 88.69 Hz: 135.25 Hz: 129.01 Hz: 130.18 Hz: 89.62
Hz: 143.84 Hz: 126.38 Hz: 138.17 Hz: 133.42 Hz: 122.09 Hz: 136.52 Hz: 136.22 Hz: 117.27
Hz: 54.04 Hz: 39575.75 Hz: 160.93 Hz: 127.02 Hz: 87.96 Hz: 260.10 Hz: 93.48 Hz: 110.70
Hz: 144.94 Hz: 129.48 Hz: 132.85 Hz: 119.29 Hz: 123.63 Hz: 143.99 Hz: 103.66 Hz: 136.10
Hz: 9.17
Weirdly, there are some frequency jumps in the VTOL world. Could this be the cause of the error? If you have an idea where these come from, I would appreciate the hint.
However, when looking at the logs again, these are the magnetometer plots: https://drive.google.com/file/d/1yxWfiHGunIUJAk8WNnSOku5CxXks31so/view?usp=sharing
Apologies for the small dot size, but note that the magnetometer data coming from the vehicle is at twice the frequency of the data coming from the sensor. Oddly, these rates also differ from the frequencies pasted above. For the default IRIS the vehicle's magnetometer is at about 2 Hz, and the same holds for VTOL. The sensor frequency for IRIS is about 1 Hz, and again the same for VTOL. I wonder how this is possible, since EKF2 requires at least a 5 Hz update rate on its sensors to run smoothly. I assume I am misinterpreting the data, so if you spot a mistake in my analysis please let me know.
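
One way to quantify the frequency jumps in the VTOL output is to compare each reading against the median rate. The Python sketch below is only an illustration: the function names and the 0.5x / 2x thresholds are arbitrary choices, and the sample string is a short excerpt copied from the VTOL mag output in this post.

```python
import re
from statistics import median


def parse_hz(text):
    """Extract every 'Hz: <number>' reading from pasted gz topic output."""
    return [float(m) for m in re.findall(r"Hz:\s*([0-9.]+)", text)]


def flag_jumps(rates, low=0.5, high=2.0):
    """Return (median, stalls, bursts) using thresholds relative to the median."""
    mid = median(rates)
    stalls = [r for r in rates if r < low * mid]
    bursts = [r for r in rates if r > high * mid]
    return mid, stalls, bursts


# Excerpt of the VTOL mag readings around one of the jumps.
sample = "Hz: 246.97 Hz: 105.02 Hz: 27011.70 Hz: 584.93 Hz: 245.19 Hz: 210.23"
mid, stalls, bursts = flag_jumps(parse_hz(sample))
print(mid, stalls, bursts)
```

In the pasted output a slow reading is often followed immediately by a very high one (for example 105.02 then 27011.70). That pattern would be consistent with messages being held up and then delivered in a burst, although this is only a guess from the numbers.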
-
@Eric-Katzfey An update from me:
I have looked at the cpu usage from voxl-portal and got even more confused. When comparing with my previous posts, the results differ again. To summarize:
- After adb shell into voxl2, "top" suggested a total Linux CPU usage of ~70% when running PX4
- voxl-inspect-services gave about 15%
- PX4 logs give CPU usage varying from 0% to 88% depending on the core
- voxl-portal reports about 10% CPU usage on every core when running the PX4 HITL sim.
Is there a reason all these sources differ so much?
I also looked further into potential reasons for timeout errors on vtol. Keeping px4 parameters identical, the error shows up only when loading a vtol frame in gazebo docker. I wonder if there is anything special that you can think of about how gazebo docker is treating a vtol vs multicopter? I looked into both .world and .sdf files line by line, but nothing stood out. Wondering if there is something I am missing somewhere else.
Lastly, to follow up on super low rate for "sensor_mag" and "vehicle_magnetometer" from the px4 log.
@garret said in Firmware Structure (Adding VTOL support) - Clarifications: https://drive.google.com/file/d/1yxWfiHGunIUJAk8WNnSOku5CxXks31so/view?usp=sharing
Identical frequency in both multicopter and vtol case makes me wonder if there is a hardcoded bottleneck somewhere for this specific data - could you point it out to me if that is the case?
Hoping to hear from you and thank you for your cooperation.
-
@garret You have to be careful looking at CPU utilization on the Linux side. Sometimes the number reflects usage on a single core or the usage of a single process and sometimes it is total usage across all cores. The utilization on the DSP is what usually matters more for PX4 since most of the code runs on the DSP.
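
To make the single-core versus all-cores distinction concrete, here is a small Python sketch of the arithmetic. It assumes that the ~70% figure from top was expressed relative to one core and that the applications processor has eight cores; both are assumptions for illustration, and the function name is hypothetical.

```python
def per_core_to_total(percent_of_one_core, n_cores):
    """Convert a load expressed relative to one core into a share of all cores."""
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    return percent_of_one_core / n_cores


N_CORES = 8  # assumption: eight application cores
total_share = per_core_to_total(70.0, N_CORES)
print(total_share)  # 8.75
```

Under those assumptions, 70% of a single core is roughly 9% of the whole processor, which would be in line with the ~10% per core reported elsewhere in this thread.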
-
@garret Unfortunately, I only have experience with the IRIS model in Gazebo so cannot comment on why it would act differently with the VTOL.
-
@garret This is the code that deals with the Mavlink messages coming in from Gazebo and the sending of Mavlink messages back to Gazebo: https://github.com/modalai/px4-firmware/blob/voxl-dev/boards/modalai/voxl2-slpi/src/drivers/dsp_hitl/dsp_hitl.cpp
-
@Eric-Katzfey I am 99% confident I found the issue. When running Gazebo for IRIS, the real time factor is 1. When running VTOL, the real time factor drops to ~0.5 or lower. I just tried running Gazebo for IRIS and manually lowered the real time factor, and I was able to replicate the issue. I think it is because, in the MAVLink communication code you shared, the update for mag and baro happens only when a HIL_SENSOR message is received. If the message is delayed, the sensors time out.
I will start looking into troubleshooting, but if you know what can be done to speed up Gazebo's computation or to ease the load of the VTOL model, I would appreciate the hint.
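
The effect of the real time factor on message timing can be sketched as follows. This is a simplified Python model, not the dsp_hitl logic: it assumes messages are published at a fixed rate in simulation time and that a timeout fires when the wall-clock gap between messages exceeds a threshold. The rate and timeout values are hypothetical.

```python
def wall_clock_interval(sim_rate_hz, real_time_factor):
    """Wall-clock seconds between messages published at sim_rate_hz in sim time."""
    if sim_rate_hz <= 0 or real_time_factor <= 0:
        raise ValueError("rate and real time factor must be positive")
    return 1.0 / (sim_rate_hz * real_time_factor)


def times_out(sim_rate_hz, real_time_factor, timeout_s):
    """True if the wall-clock gap between messages exceeds the timeout."""
    return wall_clock_interval(sim_rate_hz, real_time_factor) > timeout_s


# Hypothetical numbers: 10 Hz in sim time, 0.15 s timeout.
ok_at_full_speed = times_out(10.0, 1.0, 0.15)
fails_at_half_speed = times_out(10.0, 0.5, 0.15)
print(ok_at_full_speed, fails_at_half_speed)
```

With these made-up numbers the gap is 0.1 s at a real time factor of 1 and 0.2 s at 0.5, so only the slower simulation crosses the threshold. The actual rates and timeout would have to be read from the firmware.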
-
@garret I also wonder if there is a reason for running the Gazebo provided in Docker other than IRIS compatibility. Could I theoretically run voxl2 with a newer version of Gazebo? It seems like newer versions might be more efficient at modelling physics.