DSP Tasks Failing Unless mini-dm is Run
-
We are experiencing issues with voxl-px4. We notice on our GCS that we are constantly gaining and loosing rc connection and the mavlink telemetry stream is extremely laggy.
We seem to be observing strange behavior with tasks running on o observing strange behavior with tasks running on our DSP. If we start PX4 from a cold boot, it never works correctly, and we see sensor timeouts printing on the output of voxl-px4 from the SLPI shown in the console output below.
Even more strange is that when we plug in a usbc cable to our voxl2 and run mini-dm the problem seems to go away. After mini-dm is run once we are able to stop and start px4 with no issue. Once we power cycle our voxl then the problem returns until we plug in and run mini-dm again.
We are running a slightly modified version of this commit of px4.
https://github.com/modalai/px4-firmware/tree/cf055af349bcd7a141640936184e03054863022bit is the commit that is tagged in the voxl-px4 sdkv1.1.1 tag.
this is the voxl2 sku info
root@microUAS1:~/mav_ws# voxl-inspect-sku family code: MCCA-M0054 (voxl2-board-only) compute board: 4 (voxl2) hw version: 1 cam config: 8 SKU: MCCA-M0054-C8
we are also using the voxl2io board which we believe has the legacy firmware (red light with blinking blue light) to communicate with our sbus reciever.
Do you all have any thoughts on what might be happening here?
here is what our GCS shows:
here is the console output from running
root@microUAS1:~/mav_ws# voxl-px4 -g -n -s [INFO] Reading from /etc/modalai/voxl-px4.conf Found DSP signature file [INFO] GR-12L SBUS Reciever Selected [INFO] MRO M9N with MAG IST8308 selected [INFO] SDP3X airspeed sensor selected [INFO] Daemon mode enabled ************************* GPS=MRO_M9N RC=M0065_SBUS ESC=VOXL_ESC POWER MANAGER=VOXLPM DISTANCE SENSOR=NONE OSD=DISABLE DAEMON_MODE=ENABLE SENSOR_CAL=ACTUAL EXTRA STEPS: ************************* INFO [px4] mlockall() enabled. PX4's virtual address space is locked into RAM. INFO [px4] assuming working directory is rootfs, no symlinks needed. INFO [muorb] Got muorb init command Sending initialization request Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete Got topic data before configuration complete INFO [muorb] SLPI: muorb aggregator thread running INFO [muorb] muorb protobuf initalize method succeeded INFO [muorb] succesfully did ADVERTISE_TEST_TYPE INFO [muorb] SLPI: Creating pthread test_MUORB INFO [muorb] SLPI: Successfully created px4 task PX4_test_MUORB with tid 2097656 INFO [muorb] succesfully did SUBSCRIBE_TEST_TYPE INFO [muorb] SLPI: Creating pthread test_MUORB INFO [muorb] SLPI: Successfully created px4 task PX4_test_MUORB with tid 2097655 INFO [muorb] succesfully did TOPIC_TEST_TYPE INFO [muorb] SLPI: Creating pthread test_MUORB INFO [muorb] SLPI: Successfully created px4 task PX4_test_MUORB with tid 2097654 INFO [muorb] succesfully did UNSUBSCRIBE_TEST_TYPE INFO [muorb] SLPI: Creating pthread test_MUORB INFO [muorb] SLPI: Successfully created px4 task PX4_test_MUORB with tid 2097653 INFO [muorb] muorb test passed ______ __ __ ___ | ___ \ \ \ / / / | | |_/ / \ V / / /| | | __/ / \ / /_| | | | / /^\ \ \___ | \_| \/ \/ |_/ px4 starting. INFO [px4] startup script: /bin/sh /usr/bin/voxl-px4-start 0 INFO [muorb] SLPI: Advertising remote topic log_message INFO [parameters] Starting param sync THREAD INFO [muorb] SLPI: Starting param sync THREAD ************************* GPS: MRO_M9N RC: M0065_SBUS ESC: VOXL_ESC POWER MANAGER: VOXLPM DISTANCE SENSOR: NONE OSD: DISABLE AISPEED SENSOR: SDP3X EXTRA STEPS: ************************* Running on M0054 INFO [muorb] SLPI: before starting the qshell_entry task INFO [muorb] SLPI: Creating pthread qshell INFO [muorb] SLPI: Successfully created px4 task PX4_qshell with tid 2097652 INFO [muorb] SLPI: qshell entry..... INFO [muorb] SLPI: after starting the qshell_entry task INFO [muorb] SLPI: Init app map initialized INFO [param] selected parameter default file /data/px4/param/parameters INFO [muorb] SLPI: Marking DeviceNode(parameter_client_reset_request) as advertised in process_remote_topic INFO [uORB] Marking DeviceNode(parameter_client_reset_response) as advertised in process_remote_topic INFO [muorb] SLPI: Advertising remote topic parameter_update INFO [muorb] SLPI: Marking DeviceNode(parameter_client_set_value_request) as advertised in process_remote_to INFO [uORB] Marking DeviceNode(parameter_server_set_used_request) as advertised in process_remote_topic INFO [muorb] SLPI: Marking DeviceNode(parameter_server_set_used_response) as advertised in process_remote_to INFO [uORB] Marking DeviceNode(parameter_client_set_value_response) as advertised in process_remote_topic INFO [parameters] BSON document size 3367 bytes, decoded 3367 bytes (INT32:55, FLOAT:104) INFO [logger] logger started (mode=all) Starting IMU driver with no rotation INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: icm42688p start -s INFO [muorb] SLPI: arg0 = 'icm42688p' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = '-s' INFO [qshell] Send cmd: 'icm42688p start -s' INFO [muorb] SLPI: *** SPI Device ID 0x26000a 2490378 INFO [uORB] Advertising remote topic sensor_accel INFO [uORB] Advertising remote topic sensor_gyro INFO [muorb] SLPI: ICM42688P::probe successful! INFO [muorb] SLPI: on SPI bus 1 INFO [muorb] SLPI: icm42688p #0 on SPI bus 1 INFO [muorb] SLPI: INFO [muorb] SLPI: >>> ICM42688P this: 3176d110 INFO [muorb] SLPI: Ok executing command: icm42688p start -s INFO [uORB] Advertising remote topic qshell_retval INFO [muorb] SLPI: >>> ICM42688P this: 3176d110 INFO [qshell] qshell return value timestamp: 102630757, local time: 102636903 INFO [muorb] SLPI: >>> ICM42688P this: 3176d110 INFO [muorb] SLPI: Register interrupt b21d3164 e61f8ddc 3176d110 INFO [uORB] Advertising remote topic sensor_gyro_fifo INFO [uORB] Advertising remote topic sensor_accel_fifo INFO [uORB] Advertising remote topic imu_server INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'icp101xx start -I -b 5' INFO [muorb] SLPI: qshell gotten: icp101xx start -I -b 5 INFO [muorb] SLPI: arg0 = 'icp101xx' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = '-I' INFO [muorb] SLPI: arg3 = '-b' INFO [muorb] SLPI: arg4 = '5' INFO [muorb] SLPI: *** I2C Device ID 0xb76329 12018473 INFO [muorb] SLPI: icp101xx #0 on I2C bus 5 INFO [muorb] SLPI: address 0x63 INFO [muorb] SLPI: INFO [muorb] SLPI: Ok executing command: icp101xx start -I -b 5 INFO [qshell] qshell return value timestamp: 102673513, local time: 102677483 Looking for qmc5883l magnetometer INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'qmc5883l start -R 0 -X -b 1' INFO [muorb] SLPI: qshell gotten: qmc5883l start -R 0 -X -b 1 INFO [muorb] SLPI: arg0 = 'qmc5883l' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = '-R' INFO [muorb] SLPI: arg3 = '0' INFO [muorb] SLPI: arg4 = '-X' INFO [muorb] SLPI: arg5 = '-b' INFO [muorb] SLPI: arg6 = '1' INFO [muorb] SLPI: *** I2C Device ID 0x80d09 527625 INFO [muorb] SLPI: qmc5883l #0 on I2C bus 1 INFO [muorb] SLPI: (external) INFO [muorb] SLPI: address 0xD INFO [muorb] SLPI: INFO [muorb] SLPI: Ok executing command: qmc5883l start -R 0 -X -b 1 INFO [qshell] qshell return value timestamp: 102734319, local time: 102738429 INFO [muorb] SLPI: >>> ICM42688P this: 3176d110 INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'gps start' INFO [muorb] SLPI: qshell gotten: gps start INFO [muorb] SLPI: arg0 = 'gps' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: Creating pthread gps INFO [muorb] SLPI: Successfully created px4 task PX4_gps with tid 2097648 INFO [muorb] SLPI: Ok executing command: gps start INFO [qshell] qshell return value timestamp: 102790110, local time: 102791873 Looking for ncp5623c RGB LED INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'rgbled_ncp5623c start -X -b 1 -f 400 -a 56' INFO [muorb] SLPI: qshell gotten: rgbled_ncp5623c start -X -b 1 -f 400 -a 56 INFO [muorb] SLPI: arg0 = 'rgbled_ncp5623c' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = '-X' INFO [muorb] SLPI: arg3 = '-b' INFO [muorb] SLPI: arg4 = '1' INFO [muorb] SLPI: arg5 = '-f' INFO [muorb] SLPI: arg6 = '400' INFO [muorb] SLPI: arg7 = '-a' INFO [muorb] SLPI: arg8 = '56' INFO [uORB] Advertising remote topic sensor_mag INFO [muorb] SLPI: *** I2C Device ID 0x7b3809 8075273 INFO [muorb] SLPI: rgbled_ncp5623c #0 on I2C bus 1 INFO [muorb] SLPI: (external) INFO [muorb] SLPI: address 0x38 INFO [muorb] SLPI: INFO [muorb] SLPI: Ok executing command: rgbled_ncp5623c start -X -b 1 -f 400 -a 56 INFO [qshell] qshell return value timestamp: 102875392, local time: 102878649 INFO [uORB] Advertising remote topic sensor_baro Starting VOXL ESC driver INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'voxl_esc start' INFO [muorb] SLPI: qshell gotten: voxl_esc start INFO [muorb] SLPI: arg0 = 'voxl_esc' INFO [muorb] SLPI: arg1 = 'start' INFO [uORB] Advertising remote topic actuator_outputs INFO [muorb] SLPI: Ok executing command: voxl_esc start INFO [muorb] SLPI: Opened UART ESC device INFO [qshell] qshell return value timestamp: 103031754, local time: 103033825 Attempting to start M0065 SBUS RC driver for original M0065 FW INFO [uORB] Advertising remote topic esc_status INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: dsp_sbus start INFO [muorb] SLPI: arg0 = 'dsp_sbus' INFO [muorb] SLPI: arg1 = 'start' INFO [qshell] Send cmd: 'dsp_sbus start' INFO [muorb] SLPI: Creating pthread dsp_sbus_main INFO [muorb] SLPI: Successfully created px4 task PX4_dsp_sbus_main with tid 2097645 INFO [muorb] SLPI: Ok executing command: dsp_sbus start INFO [qshell] qshell return value timestamp: 103092449, local time: 103095538 INFO [uORB] Advertising remote topic input_rc INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'voxlpm start -X -b 2' INFO [muorb] SLPI: qshell gotten: voxlpm start -X -b 2 INFO [muorb] SLPI: arg0 = 'voxlpm' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = '-X' INFO [muorb] SLPI: arg3 = '-b' INFO [muorb] SLPI: arg4 = '2' INFO [muorb] SLPI: *** I2C Device ID 0xd24411 13779985 INFO [muorb] SLPI: Set i2c address 0x6a, fd 3 INFO [muorb] SLPI: Set i2c address 0x44, fd 3 INFO [uORB] Advertising remote topic battery_status INFO [muorb] SLPI: voxlpm #0 on I2C bus 2 INFO [muorb] SLPI: (external) INFO [muorb] SLPI: address 0x44 INFO [muorb] SLPI: INFO [muorb] SLPI: Ok executing command: voxlpm start -X -b 2 INFO [uORB] Advertising remote topic power_monitor INFO [qshell] qshell return value timestamp: 103159043, local time: 103161281 starting sdp3x INFO [muorb] SLPI: u-blox firmware version: SPG 4.04 INFO [muorb] SLPI: u-blox protocol version: 32.01 INFO [muorb] SLPI: u-blox module: NEO-M9N INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'sdp3x start -X -a 0x21 -b 4' INFO [muorb] SLPI: qshell gotten: sdp3x start -X -a 0x21 -b 4 INFO [muorb] SLPI: arg0 = 'sdp3x' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = '-X' INFO [muorb] SLPI: arg3 = '-a' INFO [muorb] SLPI: arg4 = '0x21' INFO [muorb] SLPI: arg5 = '-b' INFO [muorb] SLPI: arg6 = '4' INFO [muorb] SLPI: *** I2C Device ID 0x4a2121 4858145 INFO [muorb] SLPI: sdp3x #0 on I2C bus 4 INFO [muorb] SLPI: (external) INFO [muorb] SLPI: address 0x21 INFO [muorb] SLPI: INFO [muorb] SLPI: Ok executing command: sdp3x start -X -a 0x21 -b 4 INFO [qshell] qshell return value timestamp: 103212466, local time: 103214889 INFO [uORB] Advertising remote topic differential_pressure INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'sensors start' INFO [muorb] SLPI: qshell gotten: sensors start INFO [muorb] SLPI: arg0 = 'sensors' INFO [muorb] SLPI: arg1 = 'start' INFO [uORB] Advertising remote topic sensor_selection INFO [uORB] Advertising remote topic sensors_status_imu INFO [uORB] Advertising remote topic vehicle_acceleration INFO [uORB] Advertising remote topic vehicle_angular_velocity INFO [uORB] Advertising remote topic sensor_combined INFO [uORB] Advertising remote topic vehicle_air_data ERROR [muorb] SLPI: Timeout waiting for parameter_server_set_used_response for SENS_GPS_MASK INFO [uORB] Advertising remote topic vehicle_gps_position INFO [uORB] Advertising remote topic sensors_status_baro INFO [uORB] Advertising remote topic vehicle_magnetometer INFO [uORB] Advertising remote topic sensor_preflight_mag INFO [uORB] Advertising remote topic vehicle_imu INFO [uORB] Advertising remote topic vehicle_imu_status INFO [muorb] SLPI: Ok executing command: sensors start INFO [uORB] Advertising remote topic sensors_status_mag INFO [qshell] qshell return value timestamp: 103496668, local time: 103505109 INFO [uORB] Advertising remote topic airspeed INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: ekf2 start INFO [qshell] Send cmd: 'ekf2 start' INFO [muorb] SLPI: arg0 = 'ekf2' INFO [muorb] SLPI: arg1 = 'start' INFO [uORB] Advertising remote topic sensor_gps INFO [uORB] Advertising remote topic vehicle_attitude INFO [uORB] Advertising remote topic vehicle_local_position INFO [uORB] Advertising remote topic estimator_event_flags INFO [uORB] Advertising remote topic estimator_innovation_test_ratios INFO [uORB] Advertising remote topic estimator_innovation_variances INFO [uORB] Advertising remote topic estimator_innovations INFO [uORB] Advertising remote topic estimator_sensor_bias INFO [uORB] Advertising remote topic estimator_states INFO [uORB] Advertising remote topic estimator_status_flags INFO [uORB] Advertising remote topic estimator_status INFO [muorb] SLPI: Ok executing command: ekf2 start INFO [qshell] qshell return value timestamp: 103856924, local time: 103862916 INFO [uORB] Advertising remote topic ekf2_timestamps INFO [uORB] Advertising remote topic vehicle_odometry INFO [uORB] Advertising remote topic estimator_aid_src_baro_hgt INFO [uORB] Advertising remote topic estimator_aid_src_fake_pos INFO [uORB] Advertising remote topic estimator_aid_src_fake_hgt INFO [uORB] Advertising remote topic estimator_aid_src_mag_heading INFO [uORB] Advertising remote topic estimator_aid_src_mag INFO [uORB] Advertising remote topic estimator_aid_src_gravity INFO [uORB] Advertising remote topic estimator_aid_src_airspeed INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'mc_pos_control start vtol' INFO [muorb] SLPI: qshell gotten: mc_pos_control start vtol INFO [muorb] SLPI: arg0 = 'mc_pos_control' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = 'vtol' INFO [uORB] Advertising remote topic takeoff_status INFO [muorb] SLPI: Ok executing command: mc_pos_control start vtol INFO [qshell] qshell return value timestamp: 104032898, local time: 104035892 INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'mc_att_control start vtol' INFO [muorb] SLPI: qshell gotten: mc_att_control start vtol INFO [muorb] SLPI: arg0 = 'mc_att_control' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = 'vtol' INFO [muorb] SLPI: Ok executing command: mc_att_control start vtol INFO [qshell] qshell return value timestamp: 104111040, local time: 104112332 INFO [qshell] Send cmd: 'mc_rate_control start vtol' INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: mc_rate_control start vtol INFO [muorb] SLPI: arg0 = 'mc_rate_control' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = 'vtol' INFO [uORB] Advertising remote topic rate_ctrl_status INFO [muorb] SLPI: Ok executing command: mc_rate_control start vtol INFO [qshell] qshell return value timestamp: 104259148, local time: 104261585 INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'mc_hover_thrust_estimator start' INFO [muorb] SLPI: qshell gotten: mc_hover_thrust_estimator start INFO [muorb] SLPI: arg0 = 'mc_hover_thrust_estimator' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: Ok executing command: mc_hover_thrust_estimator start INFO [qshell] qshell return value timestamp: 104344496, local time: 104346060 INFO [qshell] Send cmd: 'mc_autotune_attitude_control start' INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: mc_autotune_attitude_control start INFO [muorb] SLPI: arg0 = 'mc_autotune_attitude_control' INFO [muorb] SLPI: arg1 = 'start' INFO [uORB] Advertising remote topic autotune_attitude_control_status INFO [muorb] SLPI: Ok executing command: mc_autotune_attitude_control start INFO [qshell] qshell return value timestamp: 104414886, local time: 104418065 INFO [qshell] Send cmd: 'land_detector start vtol' INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: land_detector start vtol INFO [muorb] SLPI: arg0 = 'land_detector' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = 'vtol' INFO [muorb] SLPI: Ok executing command: land_detector start vtol INFO [qshell] qshell return value timestamp: 104496547, local time: 104497727 INFO [uORB] Advertising remote topic vehicle_land_detected INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'manual_control start' INFO [muorb] SLPI: qshell gotten: manual_control start INFO [muorb] SLPI: arg0 = 'manual_control' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: Ok executing command: manual_control start INFO [uORB] Advertising remote topic manual_control_setpoint INFO [qshell] qshell return value timestamp: 104569964, local time: 104571796 INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'control_allocator start' INFO [muorb] SLPI: qshell gotten: control_allocator start INFO [muorb] SLPI: arg0 = 'control_allocator' INFO [muorb] SLPI: arg1 = 'start' INFO [uORB] Advertising remote topic control_allocator_status INFO [uORB] Marking DeviceNode(control_allocator_status) as advertised in process_remote_topic INFO [uORB] Advertising remote topic actuator_motors INFO [uORB] Advertising remote topic actuator_servos INFO [uORB] Advertising remote topic actuator_servos_trim INFO [muorb] SLPI: Ok executing command: control_allocator start INFO [qshell] qshell return value timestamp: 104924910, local time: 104928529 INFO [qshell] Send cmd: 'load_mon start' INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: load_mon start INFO [muorb] SLPI: arg0 = 'load_mon' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: Ok executing command: load_mon start INFO [uORB] Advertising remote topic cpuload INFO [qshell] qshell return value timestamp: 104994040, local time: 104996985 INFO [qshell] Send cmd: 'vtol_att_control start' INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: vtol_att_control start INFO [muorb] SLPI: arg0 = 'vtol_att_control' INFO [muorb] SLPI: arg1 = 'start' INFO [uORB] Advertising remote topic flaps_setpoint INFO [uORB] Advertising remote topic spoilers_setpoint INFO [uORB] Advertising remote topic vtol_vehicle_status INFO [uORB] Advertising remote topic vehicle_thrust_setpoint INFO [uORB] Advertising remote topic vehicle_torque_setpoint INFO [uORB] Marking DeviceNode(vehicle_thrust_setpoint) as advertised in process_remote_topic INFO [uORB] Marking DeviceNode(vehicle_torque_setpoint) as advertised in process_remote_topic INFO [muorb] SLPI: Ok executing command: vtol_att_control start INFO [qshell] qshell return value timestamp: 105128451, local time: 105136476 INFO [qshell] Send cmd: 'fw_rate_control start vtol' INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: fw_rate_control start vtol INFO [muorb] SLPI: arg0 = 'fw_rate_control' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = 'vtol' INFO [uORB] Marking DeviceNode(rate_ctrl_status) as advertised in process_remote_topic INFO [muorb] SLPI: Ok executing command: fw_rate_control start vtol INFO [qshell] qshell return value timestamp: 105278018, local time: 105281688 INFO [qshell] Send cmd: 'fw_att_control start vtol' INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: fw_att_control start vtol INFO [muorb] SLPI: arg0 = 'fw_att_control' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = 'vtol' INFO [muorb] SLPI: Ok executing command: fw_att_control start vtol INFO [uORB] Advertising remote topic landing_gear_wheel INFO [qshell] qshell return value timestamp: 105379888, local time: 105387644 INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'fw_pos_control start vtol' INFO [muorb] SLPI: qshell gotten: fw_pos_control start vtol INFO [muorb] SLPI: arg0 = 'fw_pos_control' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = 'vtol' INFO [uORB] Advertising remote topic actuator_controls_status_1 INFO [uORB] Advertising remote topic position_controller_status INFO [uORB] Advertising remote topic position_controller_landing_status INFO [uORB] Advertising remote topic tecs_status INFO [uORB] Advertising remote topic launch_detection_status INFO [uORB] Advertising remote topic landing_gear INFO [uORB] Marking DeviceNode(flaps_setpoint) as advertised in process_remote_topic INFO [uORB] Marking DeviceNode(spoilers_setpoint) as advertised in process_remote_topic INFO [muorb] SLPI: Ok executing command: fw_pos_control start vtol INFO [qshell] qshell return value timestamp: 105719884, local time: 105723062 INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'fw_autotune_attitude_control start vtol' INFO [muorb] SLPI: qshell gotten: fw_autotune_attitude_control start vtol INFO [muorb] SLPI: arg0 = 'fw_autotune_attitude_control' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: arg2 = 'vtol' INFO [uORB] Marking DeviceNode(autotune_attitude_control_status) as advertised in process_remote_topic INFO [muorb] SLPI: Ok executing command: fw_autotune_attitude_control start vtol INFO [qshell] qshell return value timestamp: 105788032, local time: 105790087 INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [qshell] Send cmd: 'airspeed_selector start' INFO [muorb] SLPI: qshell gotten: airspeed_selector start INFO [muorb] SLPI: arg0 = 'airspeed_selector' INFO [muorb] SLPI: arg1 = 'start' INFO [uORB] Advertising remote topic airspeed_validated INFO [uORB] Advertising remote topic airspeed_wind INFO [uORB] Marking DeviceNode(airspeed_wind) as advertised in process_remote_topic INFO [muorb] SLPI: Ok executing command: airspeed_selector start INFO [qshell] qshell return value timestamp: 105909388, local time: 105911596 INFO [qshell] Send cmd: 'rc_update start' INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: rc_update start INFO [muorb] SLPI: arg0 = 'rc_update' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: Ok executing command: rc_update start INFO [qshell] qshell return value timestamp: 106355536, local time: 106357417 INFO [uORB] Advertising remote topic rc_channels INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: commander start INFO [qshell] Send cmd: 'commander start' INFO [muorb] SLPI: arg0 = 'commander' INFO [muorb] SLPI: arg1 = 'start' INFO [muorb] SLPI: Creating pthread commander INFO [muorb] SLPI: Successfully created px4 task PX4_commander with tid 2097639 INFO [uORB] Advertising remote topic manual_control_input INFO [uORB] Advertising remote topic manual_control_switches INFO [uORB] Advertising remote topic action_request INFO [muorb] SLPI: Ok executing command: commander start INFO [uORB] Advertising remote topic led_control INFO [muorb] SLPI: LED: open /dev/led0 failed (22) INFO [uORB] Advertising remote topic tune_control INFO [qshell] qshell return value timestamp: 106650894, local time: 106653867 INFO [uORB] Advertising remote topic event INFO [uORB] Advertising remote topic mavlink_log INFO [muorb] SLPI: Preflight Fail: No connection to the ground control station INFO [uORB] Advertising remote topic health_report INFO [uORB] Advertising remote topic failsafe_flags INFO [uORB] Advertising remote topic actuator_armed INFO [uORB] Advertising remote topic vehicle_control_mode INFO [uORB] Advertising remote topic vehicle_thrust_setpoint_virtual_mc INFO [uORB] Advertising remote topic vehicle_thrust_setpoint_virtual_fw INFO [uORB] Advertising remote topic vehicle_torque_setpoint_virtual_fw INFO [uORB] Advertising remote topic vehicle_torque_setpoint_virtual_mc INFO [uORB] Marking DeviceNode(flaps_setpoint) as advertised in process_remote_topic INFO [uORB] Marking DeviceNode(spoilers_setpoint) as advertised in process_remote_topic INFO [uORB] Advertising remote topic vehicle_status INFO [uORB] Advertising remote topic failure_detector_status INFO [uORB] Advertising remote topic mc_virtual_attitude_setpoint INFO [uORB] Advertising remote topic vehicle_rates_setpoint INFO [uORB] Advertising remote topic vehicle_attitude_setpoint INFO [uORB] Advertising remote topic vehicle_local_position_setpoint INFO [uORB] Marking DeviceNode(mc_virtual_attitude_setpoint) as advertised in process_remote_topic INFO [qshell] Send cmd: 'flight_mode_manager start' INFO [muorb] SLPI: Marking DeviceNode(qshell_req) as advertised in process_remote_topic INFO [muorb] SLPI: qshell gotten: flight_mode_manager start INFO [muorb] SLPI: arg0 = 'flight_mode_manager' INFO [muorb] SLPI: arg1 = 'start' INFO [uORB] Advertising remote topic vehicle_command INFO [muorb] SLPI: Ok executing command: flight_mode_manager start INFO [qshell] qshell return value timestamp: 106931567, local time: 106935102 INFO [uORB] Advertising remote topic trajectory_setpoint INFO [uORB] Advertising remote topic vehicle_constraints INFO [dataman] data manager file '/data/px4/dataman' size is 7866640 bytes /usr/bin/voxl-px4-start: line 234: px4-modal_io_bridge: command not found /usr/bin/voxl-px4-start: line 237: microdds_client: command not found no dds namespace set using DOM ID: 1 INFO [muorb] SLPI: Advertising remote topic transponder_report INFO [muorb] SLPI: Advertising remote topic rtl_time_estimate INFO [muorb] SLPI: Advertising remote topic position_setpoint_triplet INFO [mavlink] mode: Onboard, data rate: 100000 B/s on udp port 14556 remote port 14557 INFO [uORB] Advertising remote topic actuator_controls_status_0 INFO [mavlink] partner IP: 127.0.0.1 INFO [muorb] SLPI: Advertising remote topic telemetry_status INFO [mavlink] mode: Normal, data rate: 100000 B/s on udp port 14558 remote port 14559 INFO [muorb] SLPI: Advertising remote topic timesync_status INFO [mavlink] MAVLink only on localhost (set param MAV_{i}_BROADCAST = 1 to enable network) INFO [px4] Startup script returned successfully INFO [muorb] SLPI: Advertising remote topic obstacle_distance INFO [muorb] SLPI: Advertising remote topic offboard_control_mode INFO [mavlink] partner IP: 127.0.0.1 INFO [muorb] SLPI: Marking DeviceNode(telemetry_status) as advertised in process_remote_topic INFO [uORB] Marking DeviceNode(mavlink_log) as advertised in process_remote_topic ERROR [muorb] SLPI: Accel #0 fail: TIMEOUT! INFO [uORB] Marking DeviceNode(mavlink_log) as advertised in process_remote_topic ERROR [muorb] SLPI: MAG #0 failed: TIMEOUT! INFO [uORB] Marking DeviceNode(landing_gear) as advertised in process_remote_topic INFO [muorb] SLPI: Marking DeviceNode(vehicle_command_ack) as advertised in process_remote_topic ERROR [muorb] SLPI: Accel #0 fail: TIMEOUT! ERROR [muorb] SLPI: MAG #0 failed: TIMEOUT! ERROR [muorb] SLPI: Accel #0 fail: TIMEOUT! ERROR [muorb] SLPI: MAG #0 failed: TIMEOUT! INFO [muorb] SLPI: Advertising remote topic ping ERROR [muorb] SLPI: Accel #0 fail: TIMEOUT! ERROR [muorb] SLPI: MAG #0 failed: TIMEOUT! ERROR [muorb] SLPI: Accel #0 fail: TIMEOUT! ERROR [muorb] SLPI: MAG #0 failed: TIMEOUT! ^C PX4 Exiting... Exiting NOW.
-
@ejohnson01 You said you are running a modified version of voxl-px4. Does this problem still happen with the unmodified version of voxl-px4 that came with the SDK release?
-
@Eric-Katzfey that was a good question. today I setup my bench to run the px4 version in sdkv1.1.1 and at first I was unable to recreate the problem.
After further testing I noticed the behavior happening when we would upload our qgc param file from the previous version. My co-worker and I spent some time and isolated the parameters that were the problem children. For what ever reason the parameters below when uploaded via a QGC file is causing the problem
1 1 COM_FLTMODE1 -1 6 1 1 COM_FLTMODE2 1 6 1 1 COM_FLTMODE3 -1 6 1 1 COM_FLTMODE4 2 6 1 1 COM_FLTMODE5 -1 6 1 1 COM_FLTMODE6 2 6
I then set the bench back to the stable sdkv1.1.1 version and created a parameter file with only the parameters above and I was able to see the same problem.
Steps to recreate
make sure you run all commands over ssh and do not create an adb session.- load stable version of px4. (
voxl-px4_1.14.0-2.0.59_arm64.deb
) - remove old parameters by running
rm -rf /data/px4/param/*
- run
voxl-px4
once to set the parameters back to default - power cycle
- make sure sbus receiver and controller are powered on and connected.
- run voxl-px4
- open up QGC and the problem should be visible
I honestly have no Idea why uploading these params in particular break px4. we were able to upload all the other params in the file just fine.
Work around
We did discover a work around. If we upload all our old params but the ones listed above and then configure them via QGC the problem does not present itself. The super weird thing about this is that I can then download the parameters and diff them vs the old ones and the above parameters will show as identical.Also I can not explain why connecting to the voxl with mini-dm once per boot resolves this issue. I did do further testing on though. It seems like if you open up an adb shell it also resolves this issue similar to using mini-dm, which i guess mini-dm works via adb so its probably the debug link that fixes this problem.
We are able to continue testing using our work around, but I would be very interested to see what the root of this issue is. Let me know if you all are able to recreate and or if you need more information.
- load stable version of px4. (
-
@ejohnson01 Wow, that's an odd one! Thanks for the update. Not sure when I can get to it but now I am also curious as to why that would cause an issue with px4.
-
@ejohnson01 I realized I messed up the steps to recreate
Steps to recreate
make sure you run all commands over ssh and do not create an adb session.load stable version of px4. (voxl-px4_1.14.0-2.0.59_arm64.deb)
remove old parameters by running rm -rf /data/px4/param/*
run voxl-px4 once to set the parameters back to default
power cycle
make sure sbus receiver and controller are powered on and connected.
run voxl-px4.
open up QGC
upload a QGC file with the above listed parameters
restart voxl-px4 and the problem should be present -
I just wanted to add some more context to hopefully work towards a solution as @ejohnson01 and I been trying to overcome this issue for the better part of a week now.
I think that the parameter issue may have been a red herring because we have now been able to reproduce the issue in other ways.
We were able to find that using px4-uorb top when things are working, yields around ~350kB/s of traffic.
Whenever we plug in a joystick from QGC or power on a transmitter via SBUS things start going haywire (coincidentally right when the uorb message "manual_control_setpoint" starts getting published). The data rate on uorb top tanks by over half, things start becoming unresponsive, and they don't recover until a PX4 restart.
Are there ways to profile the PX4 tasks that are running on the SLPI? The Linux side shows ~70% of a single core and mini-dm is showing ~70% utilization of the SLPI when things are going bad (contrasted by, I believe, ~50% on Linux and ~45% on SLPI when things are working).
I just cant help but think there is some task running either on Linux or on the SLPI that is occupying too many resources or filling a buffer that cant be cleared or something along those lines but we can't find the right tool to profile what's going on.
-
@bhanner-bell Hmmm, yes, 70% is a bit high. I generally like to see things 60% or lower. There isn't really a good tool to do that kind of profiling. There is a system level profiler included with the Hexagon SDK from Qualcomm that allows you to see how much CPU time each thread is taking. I have used that in the past to help me determine thread priorities. But it would be nice to have a statistical profiler that can tell you which lines of code are running most often. I think it would be possible to create a statistical profiler but it would be a fair amount of work. Without that you will have to use trial and error. Selectively start different drivers and modules to try and determine which ones are consuming the most resources. In normal use the CPU load goes way down after arming the drone and flying. There seems to be a bunch of preflight checking that the commander module does that consumes a lot of time and once the flight starts it no longer runs those checks. A weakness of the DSP is that it doesn't have HW support for floating point numbers so it has to use a software library for floating point operations. Also, the current version of PX4 (our voxl-dev branch) logs CPU load for the DSP. But without a new base DSP system image it will always report 0.1%. If you want to start using that I can provide the debian package that updates the DSP system image with this capability. That way you can see what happens after you arm and fly.
-
Hey @Eric-Katzfey we'd definitely appreciate a new dsp system image to test with.
I'm just going to dump some more info here in hopes that it may spark a thought:
We've been poking around all day and noticed there have been some recent software updates so I updated another one of our units to the SDK 1.1.1 image with our modified PX4-firmware version (it has added in vtol apps and some drivers for dsp_sbus and other sensors.. we are going to try again with 100% vanilla sdk 1.1.1 to make sure its present there too but we tried once before and experienced the same issues)
What I was able to observe when the system starts to become unresponsive is that the RAM consumption starts ballooning. voxl-px4 uses about 92364kb on htop when running normally but once the issue crops up the memory starts to increase and goes on forever until stopped... up into the hundreds of MB range.
Based on observed system behavior we believe the ballooning ram is due to a queue of uorb messages piling up. We can move the vehicle and observe many seconds later the output on the QGC AHRS indicator. The longer PX4 runs the longer the delay in the input->response loop.
We haven't been able to reliably trigger the fault but it seems significantly harder to trigger the fault if the usb c cable is plugged into the voxl and a host linux machine. Interestingly enough we arent actually running and adb commands or anything like that.. its just connected. With the cable completely disconnected the fault is pretty easy to reproduce: just power up the board and start voxl-px4.. if the radio (or a joystick to QGC) px4 seems to have problems.
-
@bhanner-bell I sent the DSP system image debian package to the both of you in email.
-
@Eric-Katzfey Seems like your email server rejected the email with the attachment.
-
@Eric-Katzfey Try
wget https://storage.googleapis.com/modalai_public/forum/modalai-slpi_1.1-12_arm64.deb
-
We were able to update the dsp image and also build voxl-px4 from the sdk-1.1.2 tag.
We are able to reproduce this issue with just a joystick connected to QGC (meaning RC in mode set to joystick only and voxl-io board disconnected).
When the issue occurs, this is what the cpuload topic shows:
Every 0.1s: px4-listener cpuload m0054: Thu Mar 2 13:11:09 2023 TOPIC: cpuload 2 instances Instance 0: cpuload timestamp: 730351047 (59.056290 seconds ago) process_load: 0.86000 system_load: 0.86000 ram_usage: 0.00000 platform: "QURT" Instance 1: cpuload timestamp: 788949259 (0.459716 seconds ago) process_load: 0.52000 system_load: 0.12578 ram_usage: 0.11849 platform: "POSIX"
Interesting to note: the timestamp starts to fall behind for most topics published by the QURT side of things...
Memory ballooning is still here. -
@bhanner-bell Wow, 86%, that's really high.
-
I did a lot of trial and error today. I am 95% sure I just found the problem.
First I got my bench in the messed up state. Then I rolled voxl-px4 back to sdk version 1.0.0 and noticed I was not able to recreate the issue. I immediately after ran dpkg -i on the sdk1.1.2 version and it was broken.
I then proceeded to binary search test all of the commits between 1.0.0 and 1.1.2. After some time I came to the following commits.
b746ab9434b7e4e71f67c9047ea3d3d49de81c00 - working
41a57bc30a40c42990995d5b4e8fe72389f66902 - brokenI saved these debs so i could switch between them multiple times and sure enough every time I switched to
41a57...
I would see link loss issue. as soon as I rolled back tob746a...
fixed.here is a link to the changes in
41a...
https://github.com/modalai/px4-firmware/commit/41a57bc30a40c42990995d5b4e8fe72389f66902the problem is caused by the removal of this line
qshell commander mode manual
I then rolled back to the sdk 1.1.2 commit and added
qshell commander mode manual
back to the start script and sure enough I can no longer recreate the link loss issue. I just figured this out and its almost 6pm on a friday so I have not had time to look into why this might break the rc control, but I suspect something is not getting initialized correctly without that call.I love to help you all recreate this on your end and get to the true root of the issue. Let me know if there is any other information you would like me to provide.
-
@ejohnson01 Okay, thanks for tracking that down. The commander used to come up in AUTO_LOITER mode by default which runs the CPU way too high (as you have witnessed) so I had the explicit command in the startup script to put it into manual mode. The commit changes the default mode to MANUAL so that the explicit statement in the startup script is no longer necessary. That seemed to be working for me but doesn't seem to be working in your case for some reason. If you query the commander status when you remove that what mode does it report?
-
As we have done more testing of the system, we have noticed that one of the symptoms of the issue is that the control inputs become laggy. All inputs from the controller are seen on uorb with more and more time delay over time. For instance sometimes we will let the voxl run for a minute and then look at
watch -n0 px4-listener manual_control_setpoint
and we will not see the corresponding input show up for 30 seconds. The delay gets worse over time. I was under the impression that uorb had a queue size limit, which if the case, it does not seem to be enforced. If the queue lmit was enforce, I would expect to see a maximum delay of the time to process the size of the queue. I see in a lot of places of the code that there are 16 message length queues so a task running a 50 hz should at most see lag of 800ms (16 * 50ms). I think this might imply a more systemic issue as a few minutes ago we saw thecpuload
message get over a minute behind in processing.Does this sound correct to you?
Also, is there any possibility of setting up a call to dig into this issue in more depth?
-
-
When I read this a second time I noticed you told me to check the commander mode. I noticed that when I used the game pad the commander mode was put into POS_HOLD. When this happens it seems to cause the lag to build. If I command the vehicle to switch to STABILIZED or ACRO eventually the system seems to catch up and we see both uorb messages be shown realtime and we can see the HUD update mostly real time as well.
I am guessing the issue has more to do with being in a mode that requires position lock. Interestingly enough, when we have brought the aircraft to the field and booted it with clear view of the sky so that the GPS can get lock, we have seen the aircraft either not have the lag and link loss or have lag and link loss for a few seconds then go away.
You said that you thought mode AUTO_LOITER used too much CPU, do you happen to have any thoughts on which of the modules are using the CPU or where the problem might be originating?
-
@ejohnson01 We noticed that AUTO_LOITER mode was taking a lot of CPU sort of by accident and switched it to default to MANUAL. I have not gone back to determine what was taking so much time in AUTO_LOITER.
-
@ejohnson01 I plan to spend some time looking into why POSCTL is taking so much CPU time. AUTO_LOITER hasn't been a very important mode for us but POSCTL certainly is.
-
@ejohnson01 I have started to look into this and I can recreate the issue. So I'll dig in and see if I can figure out a fix.