voxl-px4 service hangs at startup sometimes with SDK 1.1.2
-
Hello community,
I have been using VOXL2 with SDK 1.1.2 with latest PX4 1.14 firmware on top. While this has been significant upgrade since last SDK and PX4 firmware version, I am experiencing strange problem with my voxl-px4 startup service. This service starts on boot but GPS LED (Holybro M8N) dosent' blink sometimes and even GCS dosent get connected to firmware. This then needs manual intervention like ssh into voxl and then perform "systemctl restart voxl-px4" or in worst case hard reboot since service restart also goes into same loop of not starting PX4 correctly.
If you are lucky, sometimes service starts correctly and everything is fine.Please find logs below and help me ahead to debug this problem.
voxl2:~$ /usr/bin/voxl-px4 [INFO] Reading from /etc/modalai/voxl-px4.conf Found DSP signature file [INFO] Daemon mode enabled ************************* GPS=AUTODETECT RC=M0065_SBUS ESC=VOXL2_IO_PWM_ESC POWER MANAGER=VOXLPM DISTANCE SENSOR=NONE OSD=DISABLE DAEMON_MODE=ENABLE SENSOR_CAL=ACTUAL EXTRA STEPS: ************************* INFO [px4] mlockall() enabled. PX4's virtual address space is locked into RAM. INFO [px4] assuming working directory is rootfs, no symlinks needed. INFO [muorb] Got muorb init command Sending initialization request Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete INFO [muorb] SLPI: muorb aggregator thread running INFO [muorb] muorb protobuf initalize method succeeded INFO [muorb] succesfully did ADVERTISE_TEST_TYPE INFO [muorb] SLPI: Creating pthread test_MUORB INFO [muorb] SLPI: Successfully created px4 task PX4_test_MUORB with tid 2097656 INFO [muorb] succesfully did SUBSCRIBE_TEST_TYPE INFO [muorb] SLPI: Creating pthread test_MUORB INFO [muorb] SLPI: Successfully created px4 task PX4_test_MUORB with tid 2097655 INFO [muorb] succesfully did TOPIC_TEST_TYPE INFO [muorb] SLPI: Creating pthread test_MUORB INFO [muorb] SLPI: Successfully created px4 task PX4_test_MUORB with tid 2097654 INFO [muorb] succesfully did UNSUBSCRIBE_TEST_TYPE INFO [muorb] SLPI: Creating pthread test_MUORB INFO [muorb] SLPI: Successfully created px4 task PX4_test_MUORB with tid 2097653 INFO [muorb] muorb test passed ______ __ __ ___ | ___ \ \ \ / / / | | |_/ / \ V / / /| | | __/ / \ / /_| | INFO [muorb] SLPI: Advertising remote topic log_message | | / /^\ \ \___ | \_| \/ \/ |_/ px4 starting. INFO [parameters] Starting param sync THREAD INFO [px4] startup script: /bin/sh /usr/bin/voxl-px4-start 0 ************************* GPS: AUTODETECT RC: M0065_SBUS ESC: VOXL2_IO_PWM_ESC POWER MANAGER: VOXLPM DISTANCE SENSOR: NONE OSD: DISABLE EXTRA STEPS: ************************* Running on M0054 INFO [muorb] SLPI: Starting param sync THREAD INFO [muorb] SLPI: before starting the qshell_entry task INFO [muorb] SLPI: Creating pthread qshell INFO [muorb] SLPI: Successfully created px4 task PX4_qshell with tid 2097652 INFO [muorb] SLPI: qshell entry..... INFO [muorb] SLPI: after starting the qshell_entry task INFO [muorb] SLPI: Init app map initialized INFO [param] selected parameter default file /data/px4/param/parameters INFO [muorb] SLPI: Marking DeviceNode(parameter_client_reset_request) as advertised in process_remote_topic INFO [uORB] Marking DeviceNode(parameter_client_reset_response) as advertised in process_remote_topic INFO [muorb] SLPI: Advertising remote topic parameter_update INFO [muorb] SLPI: Marking DeviceNode(parameter_client_set_value_request) as advertised in process_remote_to INFO [uORB] Marking DeviceNode(parameter_server_set_used_request) as advertised in process_remote_topic INFO [muorb] SLPI: Marking DeviceNode(parameter_server_set_used_response) as advertised in process_remote_to INFO [uORB] Marking DeviceNode(parameter_client_set_value_response) as advertised in process_remote_topic INFO [parameters] BSON document size 2650 bytes, decoded 2650 bytes (INT32:47, FLOAT:75) INFO [logger] logger started (mode=all) Starting IMU driver with no rotation INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: icm42688p start -s INFO [muorb] SLPI: arg0 = 'icm42688p' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = '-s' INFO [qshell] Send cmd: 'icm42688p start -s' INFO [muorb] SLPI: *** SPI Device ID 0x26000a 2490378 INFO [uORB] Advertising remote topic sensor_accel INFO [uORB] Advertising remote topic sensor_gyro INFO [muorb] SLPI: ICM42688P::probe successful! INFO [muorb] SLPI: on SPI bus 1 INFO [muorb] SLPI: icm42688p #0 on SPI bus 1 INFO [muorb] SLPI: INFO [muorb] SLPI: >>> ICM42688P this: 3176ec20 INFO [muorb] SLPI: Ok executing command: icm42688p start -s INFO [uORB] Advertising remote topic qshell_retval INFO [muorb] SLPI: >>> ICM42688P this: 3176ec20 INFO [qshell] qshell return value timestamp: 174626437, local time: 174631744 INFO [muorb] SLPI: >>> ICM42688P this: 3176ec20 INFO [muorb] SLPI: Register interrupt b21d31a4 e61fedec 3176ec20 INFO [uORB] Advertising remote topic sensor_gyro_fifo INFO [uORB] Advertising remote topic sensor_accel_fifo INFO [uORB] Advertising remote topic imu_server INFO [qshell] Send cmd: 'icp101xx start -I -b 5' INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: icp101xx start -I -b 5 INFO [muorb] SLPI: arg0 = 'icp101xx' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = '-I' INFO [muorb] SLPI: arg3 = '-b' INFO [muorb] SLPI: arg4 = '5' INFO [muorb] SLPI: *** I2C Device ID 0xb76329 12018473 INFO [muorb] SLPI: icp101xx #0 on I2C bus 5 INFO [muorb] SLPI: address 0x63 INFO [muorb] SLPI: INFO [muorb] SLPI: Ok executing command: icp101xx start -I -b 5 INFO [qshell] qshell return value timestamp: 174670478, local time: 174674102 Looking for qmc5883l magnetometer INFO [qshell] Send cmd: 'qmc5883l start -R 10 -X -b 1' INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: qmc5883l start -R 10 -X -b 1 INFO [muorb] SLPI: arg0 = 'qmc5883l' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = '-R' INFO [muorb] SLPI: arg3 = '10' INFO [muorb] SLPI: arg4 = '-X' INFO [muorb] SLPI: arg5 = '-b' INFO [muorb] SLPI: arg6 = '1' INFO [muorb] SLPI: *** I2C Device ID 0x80d09 527625 ERROR [muorb] SLPI: i2c probe failed INFO [muorb] SLPI: PX4_qshell: no instance started (no device on bus?) ERROR [muorb] SLPI: Failed to execute command: qmc5883l start -R 10 -X -b 1 INFO [qshell] cmd returned with: -1 INFO [qshell] qshell return value timestamp: 174713285, local time: 174716440 ERROR [qshell] Command failed Looking for ist8310 magnetometer INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: ist8310 start -R 10 -X -b 1 INFO [muorb] SLPI: arg0 = 'ist8310' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = '-R' INFO [muorb] SLPI: arg3 = '10' INFO [muorb] SLPI: arg4 = '-X' INFO [muorb] SLPI: arg5 = '-b' INFO [muorb] SLPI: arg6 = '1' INFO [qshell] Send cmd: 'ist8310 start -R 10 -X -b 1' INFO [muorb] SLPI: *** I2C Device ID 0x60e09 396809 INFO [muorb] SLPI: ist8310 #0 on I2C bus 1 INFO [muorb] SLPI: (external) INFO [muorb] SLPI: address 0xE INFO [muorb] SLPI: rotation 10 INFO [muorb] SLPI: INFO [muorb] SLPI: Ok executing command: ist8310 start -R 10 -X -b 1 INFO [muorb] SLPI: >>> ICM42688P this: 3176ec20 INFO [qshell] qshell return value timestamp: 174743444, local time: 174747090 INFO [qshell] Send cmd: 'gps start' INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: gps start INFO [muorb] SLPI: arg0 = 'gps' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: Creating pthread gps INFO [muorb] SLPI: Successfully created px4 task PX4_gps with tid 2097647 INFO [muorb] SLPI: Ok executing command: gps start INFO [qshell] qshell return value timestamp: 174786779, local time: 174788641 Looking for ncp5623c RGB LED INFO [muorb] SLPI: GPS UART baudrate set to 115200 INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: rgbled_ncp5623c start -X -b 1 -f 400 -a 56 INFO [muorb] SLPI: arg0 = 'rgbled_ncp5623c' INFO [muorb] SLPI: arg1 = 'start' INFO [qshell] Send cmd: 'rgbled_ncp5623c start -X -b 1 -f 400 -a 56' INFO [muorb] SLPI: arg2 = '-X' INFO [muorb] SLPI: arg3 = '-b' INFO [muorb] SLPI: arg4 = '1' INFO [muorb] SLPI: arg5 = '-f' INFO [muorb] SLPI: arg6 = '400' INFO [muorb] SLPI: arg7 = '-a' INFO [muorb] SLPI: arg8 = '56' INFO [muorb] SLPI: *** I2C Device ID 0x7b3809 8075273 INFO [muorb] SLPI: rgbled_ncp5623c #0 on I2C bus 1 INFO [muorb] SLPI: (external) INFO [muorb] SLPI: address 0x38 INFO [muorb] SLPI: INFO [muorb] SLPI: Ok executing command: rgbled_ncp5623c start -X -b 1 -f 400 -a 56 INFO [qshell] qshell return value timestamp: 174824162, local time: 174825291 INFO [uORB] Advertising remote topic sensor_mag INFO [qshell] Send cmd: 'pca9685_pwm_out start -a 0x40 -b 4' INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: pca9685_pwm_out start -a 0x40 -b 4 INFO [muorb] SLPI: arg0 = 'pca9685_pwm_out' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = '-a' INFO [muorb] SLPI: arg3 = '0x40' INFO [muorb] SLPI: arg4 = '-b' INFO [muorb] SLPI: arg5 = '4' INFO [uORB] Advertising remote topic sensor_baro INFO [uORB] Advertising remote topic actuator_outputs INFO [muorb] SLPI: *** I2C Device ID 0x694021 6897697 INFO [muorb] SLPI: running on I2C bus 4 address 0x40 INFO [muorb] SLPI: Ok executing command: pca9685_pwm_out start -a 0x40 -b 4 INFO [qshell] qshell return value timestamp: 175125498, local time: 175128045 INFO [muorb] SLPI: PCA9685 PWM frequency: target=50.00 real=50.03 Starting VOXL IO for PWM ESC with SBUS RC INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'voxl2_io start' INFO [muorb] SLPI: qshell gotten: voxl2_io start INFO [muorb] SLPI: arg0 = 'voxl2_io' INFO [muorb] SLPI: arg1 = 'start' INFO [uORB] Marking DeviceNode(actuator_outputs) as advertised in process_remote_topic INFO [muorb] SLPI: Opened UART connection to M0065 device on port 7 INFO [muorb] SLPI: u-blox firmware version: SPG 3.01 INFO [muorb] SLPI: u-blox protocol version: 18.00 INFO [muorb] SLPI: u-blox module: NEO-M8N-0 INFO [uORB] Advertising remote topic sensor_gps ERROR [qshell] command timed out ERROR [qshell] Command failed M0065 SBUS RC driver already started with PWM ESC start INFO [qshell] Send cmd: 'voxlpm start -X -b 2' INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic
-
@Aaky Can you provide the output of
dmesg
after a failure? This will help us to debug the issue. -
@Aaky How often do you see it fail to start properly?
-
@Aaky If you comment out the
qshell gps start
command in/usr/bin/voxl-px4-start
does it prevent the failure from happening? -
Here's output of dmesg when this happens:
Link to download dmesg logs : https://drive.google.com/file/d/143QVl4bhNdHPD0i8ACgGWLt7COsrljcy/view?usp=sharing
I see this happening more often, say 5 out of 10 times.
if i comment qshell gps start from /usr/bin/voxl-px4-start stillresult is similar, sometimes it starts sometimes it dosent.
Let me know further steps.
-
@Aaky What drone are you using? Is it a custom drone? In your first post you included the output from px4. But it doesn't look complete. Are you sure that is the complete log? If so, does it look like that every time?
-
@Eric-Katzfey Yes Eric its custom drone and the logs are actually complete posted in first post. Still if possible I will see if I can provide you with more logs. Any idea from these logs about the problem?
-
@Aaky There's only one log in the first message and it doesn't look complete. The
command timed out
andCommand failed
messages seem to be related to the attempt to start thevoxl2_io
driver. Can you comment that part out so that it skips trying to start the voxl2_io driver and see if it comes up then? -
@Eric-Katzfey Actually this driver failing happens very randomly, not always and also not because of any one particular driver in general. For example check one more dmesg and reset voxl-px4 log where voxl2_io driver spawned up correcly but still GPS LED wasn't blinking red and neither px4 got connected to GCS.
voxl-px4-logs : https://drive.google.com/file/d/1PHqQLldXs5XFKhHvenWqy-0nDLUOfbDB/view?usp=sharing
dmesg logs: https://drive.google.com/file/d/1ZOK7-ypm_9BfgbE4khD2NLKFdTecTHXD/view?usp=sharing
Let me know any hints to further debug this issue.
-
@Aaky , can you please confirm that you can reproduce the issue even with GPS driver completely disabled?
-
@Alex-Kushleyev Yes I confirm even if I disable GPS driver can reproduce this issue.
-
@Aaky Those 2 logs look normal except for the GPS taking a while to finally connect to the unit and advertise sensor_gps topic. But nothing to indicate there was any crash. So the GPS LED wasn't illuminated at all? Or was a different color? And PX4 not connecting to GCS doesn't necessarily mean that PX4 is having any problems.
-
@Eric-Katzfey Yes PX4 dosent crash but dosent get connected to QGC inspite of having good connectivity (I have been performing ping tests). GPS LED dosent blink at all when this particular scenario occurs. As I have told before this is very random occuring problem.
-
@Aaky I'm having trouble following your comments. You say that px4 does not crash. But in the first log you posted it stops only partially through the startup script without any error indications. In that case is PX4 still running? If so, it seems like it is stuck somehow. If not, perhaps something is killing the px4 process. When this happens, adb into the voxl2 and look at
voxl-inspect-services
. Does it show voxl-px4 enabled and running? You say you are running this on a custom drone. Do you also have a reference drone from ModalAI? If so, does everything work fine on that drone? That is the main purpose that we sell those reference drones. It allows you to see everything working so that when you build a custom drone and something doesn't work you can compare against the reference drone. -
@Eric-Katzfey Hi Eric, My apologies for the confusion.
I will correct my statement, PX4 never crashes, it just gets stuck which means the service is always running but never connects to QGC. On our custom drone post power boot up whenever we see GPS LED isn't blinking at all (neither red nor green) we come to know PX4 service is stuck somewhere. At this point if I ssh into VOXL2 and check voxl-inspect-services, I can see vozl-px4 service is definately running but GPS LED isn't blinking and also PX4 isn't connecting to QGC. These are the observations. Then if I do systemctl restart vozl-px4 , then sometimes problem gets solved and PX4 starts without getting stuck anywhere. I clarify once again, PX4 never crashes but just gets stuck somewhere. All all the logs starting from my first post involves the issue which I have described in this comment.
I am very old customer of ModalAI and have been working since voxl1 on multiple custom drones so I definitely understand the underlying architecture Eric.I hope this comment clarifies the confusion.
-
@Aaky Okay, this isn't anything we have encountered before. I'm not sure what could make PX4 get stuck like that. When it is in the stuck condition what does CPU utilization look like?
-
@Eric-Katzfey Here's screenshot of cpu utilization when this happpens. Everything looks normal.
-
@Aaky Well, unfortunately it isn't obvious what is going wrong. You will have to start by disabling everything and when px4 comes up reliably start enabling things to find out the culprit.