ModalAI Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Groups
    • Register
    • Login

    PX4 with custom driver on DSP chip crashes when mode is not manual

    Ask your questions right here!
    3
    10
    581
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • LucaVertiqL
      LucaVertiq
      last edited by

      @Moderator
      Hi, I'm adding support for serial connection to vertiq modules for a project and I'm running into a difficult to debug issue with PX4. I have the driver working on normal PX4 boards, and mostly working on the VOXL2 Mini.

      When I am disarmed everything works as expected. If I arm in manual mode, acro mode or stabilize mode (probably others that don't require pos/vel as well) everything works normally most of the time and motors respond as expected with telemetry.

      If I arm in position mode for example the DSP side of the chip seems to stop responding. The output of PX4 on the app processor becomes this:

      INFO  [muorb] SLPI: Armed by external command
      INFO  [logger] Start file log (type: full)
      INFO  [logger] [logger] /data/px4/log/2024-03-14/19_52_13.ulg
      WARN  [mavlink] Event dropped (5, 65526)
      WARN  [mavlink] Event dropped (5, 65526)
      INFO  [logger] Opened full log file: /data/px4/log/2024-03-14/19_52_13.ulg
      WARN  [mavlink] Dropped 65521 events (seq=65526)
      WARN  [mavlink] Dropped 65521 events (seq=65526)
      INFO  [muorb] SLPI: Advertising remote topic logger_status
      INFO  [uORB] Advertising remote topic sensor_gps
      INFO  [uORB] Advertising remote topic estimator_gps_status
      INFO  [uORB] Advertising remote topic estimator_aid_src_gnss_hgt
      INFO  [uORB] Advertising remote topic estimator_aid_src_gnss_pos
      INFO  [uORB] Advertising remote topic estimator_aid_src_gnss_vel
      WARN  [mavlink] Event dropped (65527, 9)
      WARN  [mavlink] Event dropped (65527, 9)
      WARN  [mavlink] Dropped 7 events (seq=9)
      WARN  [mavlink] Dropped 7 events (seq=9)
      >>> Got an exception from send_request <<<
      >>> Send succeeded after retries <<<
      Sending topic message
      --- msg_id: 1033
      --- topic name: offboard_control_mode
      Sending topic message
      --- msg_id: 1033
      --- topic name: vehicle_visual_odometry
      Sending topic message
      --- msg_id: 1033
      --- topic name: manual_control_input
      Sending topic message
      --- msg_id: 1033
      --- topic name: offboard_control_mode
      Sending topic message
      --- msg_id: 1033
      --- topic name: vehicle_visual_odometry
      Sending topic message
      --- msg_id: 1033
      --- topic name: manual_control_input
      Sending topic message
      --- msg_id: 1033
      --- topic name: offboard_control_mode
      Sending topic message
      --- msg_id: 1033
      --- topic name: vehicle_visual_odometry
      Sending topic message
      --- msg_id: 1033
      --- topic name: manual_control_input
      Sending topic message
      --- msg_id: 1033
      --- topic name: offboard_control_mode
      Sending topic message
      --- msg_id: 1033
      --- topic name: vehicle_visual_odometry
      Sending topic message
      --- msg_id: 1033
      --- topic name: offboard_control_mode
      Sending topic message
      --- msg_id: 1033
      --- topic name: vehicle_visual_odometry
      Sending topic message
      --- msg_id: 1033
      --- topic name: manual_control_input
      Sending topic message
      --- msg_id: 1033
      --- topic name: offboard_control_mode
      Sending topic message
      --- msg_id: 1033
      --- topic name: vehicle_visual_odometry
      Sending topic message
      --- msg_id: 1033
      --- topic name: manual_control_input
      Connection error: connection reset
      Sending topic message
      --- msg_id: 1033
      --- topic name: offboard_control_mode
      Got response cb 0
      Sending topic message
      --- msg_id: 1033
      --- topic name: vehicle_visual_odometry
      Got response cb 0
      Got flight controller event
      Received standard error event SNS_STD_ERROR_NOT_SUPPORTED
      Sending topic message
      --- msg_id: 1033
      --- topic name: cpuload
      Got flight controller event
      Received standard error event SNS_STD_ERROR_NOT_SUPPORTED
      Got response cb 0
      Sending topic message
      --- msg_id: 1033
      --- topic name: manual_control_input
      Got response cb 0
      Got flight controller event
      Received standard error event SNS_STD_ERROR_NOT_SUPPORTED
      Sending topic message
      --- msg_id: 1033
      --- topic name: offboard_control_mode
      Got response cb 0
      Got flight controller event
      Received standard error event SNS_STD_ERROR_NOT_SUPPORTED
      Sending topic message
      --- msg_id: 1033
      --- topic name: vehicle_visual_odometry
      Got response cb 0
      Got flight controller event
      Received standard error event SNS_STD_ERROR_NOT_SUPPORTED
      Sending topic message
      --- msg_id: 1033
      --- topic name: manual_control_input
      Got flight controller event
      Received standard error event SNS_STD_ERROR_NOT_SUPPORTED
      Got response cb 0
      Sending topic message
      --- msg_id: 1033
      --- topic name: offboard_control_mode
      Got response cb 0
      Got flight controller event
      Received standard error event SNS_STD_ERROR_NOT_SUPPORTED
      Sending topic message
      --- msg_id: 1033
      --- topic name: vehicle_visual_odometry
      Got response cb 0
      Got flight controller event
      Received standard error event SNS_STD_ERROR_NOT_SUPPORTED
      Sending topic message
      --- msg_id: 1033
      --- topic name: manual_control_input
      Got response cb 0
      Got flight controller event
      Received standard error event SNS_STD_ERROR_NOT_SUPPORTED
      Sending topic message
      --- msg_id: 1033
      --- topic name: offboard_control_mode
      Got response cb 0
      Got flight controller event
      Received standard error event SNS_STD_ERROR_NOT_SUPPORTED
      Sending topic message
      --- msg_id: 1033
      --- topic name: vehicle_visual_odometry
      Got response cb 0
      Got flight controller event
      Received standard error event SNS_STD_ERROR_NOT_SUPPORTED
      Sending topic message
      --- msg_id: 1033
      

      It seems to be some sort of issue with muorb? Maybe the overall volume of messages is too much? I'm not sure, but would love some help as it seems like it's at the intersection of the two chips possibly.

      Thanks,

      Luca

      Alex KushleyevA Eric KatzfeyE 2 Replies Last reply Reply Quote 0
      • Alex KushleyevA
        Alex Kushleyev ModalAI Team @LucaVertiq
        last edited by

        @LucaVertiq , does the issue persist if you disable your custom module for vertiq ESCs?

        1 Reply Last reply Reply Quote 0
        • Eric KatzfeyE
          Eric Katzfey ModalAI Team @LucaVertiq
          last edited by

          @LucaVertiq It's probably just an indication that the DSP crashed. Unfortunately the debugging support on DSP isn't great right now so you'll have to just start ruling things out. As Alex suggested verify that everything runs fine without your new driver. If so then you can start your driver with most functionality commented out and then slowly start adding it back in to see what is causing the crash.

          LucaVertiqL 1 Reply Last reply Reply Quote 0
          • LucaVertiqL
            LucaVertiq @Eric Katzfey
            last edited by

            @Eric-Katzfey @Alex-Kushleyev We determined it was only when we had the telemetry section of our driver working. For now I've disabled it, but it seems like either we're bogging it down, or that the communication through lib sensors is getting bogged down. When I restart the voxl-px4 on the apps side it works again so it seems like the dsp side hasn't actually crashed. I've started looking at work_queue and it seems that we slow down the rate a bit, so we might need to optimize. I may check out how you guys are using reporting esc telemetry. Maybe we should only report it once all escs have been reported on.

            Alex KushleyevA 1 Reply Last reply Reply Quote 0
            • Alex KushleyevA
              Alex Kushleyev ModalAI Team @LucaVertiq
              last edited by

              @LucaVertiq you can check if DSP crashed during execution:

              dmesg -w

              and look for something like this :

              [21911.381543] Fatal error on slpi!
              [21911.381625] slpi subsystem failure reason: err_qdi.c:1063:PC=b21d1a28,SP=e62c1fa8,FP=e62c1ff0,LR=b21d1a28,BADVA=89,CAUSE=2001,TASK=PX4_muorb_agg.
              [21911.381652] subsys-restart: subsystem_restart_dev(): Restart sequence requested for slpi, restart_level = RELATED.
              [21911.382686] adsprpc: fastrpc_restart_notifier_cb: slpi subsystem is restarting
              [21911.382692] subsys-restart: subsystem_shutdown(): [kworker/u19:2:17850]: Shutting down slpi
              

              Note that currently, the DSP (SLPI) will crash when you stop PX4 (it is "normal" behavior right now). So if you are debugging your issue, make sure to pay attention when the crash message appears (during execution of voxl-px4 or after you stopped it)

              LucaVertiqL 1 Reply Last reply Reply Quote 0
              • LucaVertiqL
                LucaVertiq @Alex Kushleyev
                last edited by

                @Alex-Kushleyev When you say 'the crash message' do you mean the ">>> Got an exception from send_request <<<"?

                That occurs while voxl-px4 is running. It won't crash and will continue to output the semi repeating messages seen after that until I kill voxl-px4. I'll test the dmesg -w on Monday when I get back into work.

                Alex KushleyevA 1 Reply Last reply Reply Quote 0
                • Alex KushleyevA
                  Alex Kushleyev ModalAI Team @LucaVertiq
                  last edited by Alex Kushleyev

                  @LucaVertiq , no i mean the crash message from dmesg telling you that SLPI has crashed. See if that message happens at the same time as your Got an exception from send_request in the voxl-px4 output.

                  Please note that the SLPI can crash and you could still have voxl-px4 process running on the CPU.

                  LucaVertiqL 1 Reply Last reply Reply Quote 0
                  • LucaVertiqL
                    LucaVertiq @Alex Kushleyev
                    last edited by

                    @Alex-Kushleyev So it looks like this message just repeats in dmesg -w

                    [   85.669815] Fatal error on slpi!
                    [   85.669887] slpi subsystem failure reason: err_qdi.c:1063:PC=e616ed3c,SP=31782ca0,FP=31782cc8,LR=e616ed40,BADVA=b220fb58,CAUSE=1e01,TASK=Anonymous.
                    [   85.669921] subsys-restart: subsystem_restart_dev(): Restart sequence requested for slpi, restart_level = RELATED.
                    [   85.671120] adsprpc: fastrpc_restart_notifier_cb: slpi subsystem is restarting
                    [   85.671127] subsys-restart: subsystem_shutdown(): [kworker/u19:0:1857]: Shutting down slpi
                    [   85.689766] adsprpc: fastrpc_rpmsg_remove: closed rpmsg channel of slpi
                    [   85.690236] coresight-remote-etm soc:ssc_etm0: Connection disconnected between QMI handle and 8 service
                    [   85.690290] sysmon-qmi: ssctl_del_server: Connection lost between QMI handle and slpi's SSCTL service
                    [   85.690690] adsprpc: fastrpc_restart_notifier_cb: received RAMDUMP notification for slpi
                    [   85.692605] subsys-restart: subsystem_powerup(): [kworker/u19:0:1857]: Powering up slpi
                    [   85.693232] subsys-pil-tz 5c00000.qcom,ssc: slpi: loading from 0x0000000088c00000 to 0x000000008a600000
                    [   85.768512] subsys-pil-tz 5c00000.qcom,ssc: slpi: Brought out of reset
                    [   85.824221] subsys-pil-tz 5c00000.qcom,ssc: Subsystem error monitoring/handling services are up
                    [   85.824375] subsys-pil-tz 5c00000.qcom,ssc: slpi: Power/Clock ready interrupt received
                    [   85.824481] qcom_rpmh DRV:apps_rsc TCS Busy, retrying RPMH message send: addr=0x30040
                    [   85.825742] adsprpc: fastrpc_restart_notifier_cb: slpi subsystem is up
                    [   85.825747] subsys-restart: subsystem_restart_wq_func(): [kworker/u19:0:1857]: Restart sequence for slpi completed.
                    [   85.827958] -1620030744:Entered
                    [   85.828116] -1620030744:SMD QRTR driver probed
                    [   85.831088] sysmon-qmi: ssctl_new_server: Connection established between QMI handle and slpi's SSCTL service
                    [   85.831100] coresight-remote-etm soc:ssc_etm0: Connection established between QMI handle and 8 service
                    [   85.834589] adsprpc: fastrpc_rpmsg_probe: opened rpmsg channel for slpi
                    [   85.882137] diag: In diag_send_peripheral_buffering_mode, buffering flag not set for 3
                    [   87.173372] Fatal error on slpi!
                    [   87.173448] slpi subsystem failure reason: err_qdi.c:1063:EF:sensor_process:0x1:SNS_SEE_I_1:0x67:sns_stream_service.c:436:req_found.
                    [   87.173497] subsys-restart: subsystem_restart_dev(): Restart sequence requested for slpi, restart_level = RELATED.
                    [   87.175158] adsprpc: fastrpc_restart_notifier_cb: slpi subsystem is restarting
                    [   87.175170] subsys-restart: subsystem_shutdown(): [kworker/u19:0:1857]: Shutting down slpi
                    [   87.192557] adsprpc: fastrpc_rpmsg_remove: closed rpmsg channel of slpi
                    [   87.192709] coresight-remote-etm soc:ssc_etm0: Connection disconnected between QMI handle and 8 service
                    [   87.192730] sysmon-qmi: ssctl_del_server: Connection lost between QMI handle and slpi's SSCTL service
                    [   87.194300] adsprpc: fastrpc_restart_notifier_cb: received RAMDUMP notification for slpi
                    [   87.197243] subsys-restart: subsystem_powerup(): [kworker/u19:0:1857]: Powering up slpi
                    [   87.197711] subsys-pil-tz 5c00000.qcom,ssc: slpi: loading from 0x0000000088c00000 to 0x000000008a600000
                    [   87.270116] subsys-pil-tz 5c00000.qcom,ssc: slpi: Brought out of reset
                    [   87.325674] subsys-pil-tz 5c00000.qcom,ssc: Subsystem error monitoring/handling services are up
                    [   87.325726] subsys-pil-tz 5c00000.qcom,ssc: slpi: Power/Clock ready interrupt received
                    [   87.325841] qcom_rpmh DRV:apps_rsc TCS Busy, retrying RPMH message send: addr=0x30040
                    [   87.331325] adsprpc: fastrpc_restart_notifier_cb: slpi subsystem is up
                    [   87.331330] subsys-restart: subsystem_restart_wq_func(): [kworker/u19:0:1857]: Restart sequence for slpi completed.
                    [   87.331891] -1620030744:Entered
                    [   87.340471] -1620030744:SMD QRTR driver probed
                    [   87.343180] adsprpc: fastrpc_rpmsg_probe: opened rpmsg channel for slpi
                    [   87.344830] sysmon-qmi: ssctl_new_server: Connection established between QMI handle and slpi's SSCTL service
                    [   87.344850] coresight-remote-etm soc:ssc_etm0: Connection established between QMI handle and 8 service
                    [   87.350149] diag: In diag_send_peripheral_buffering_mode, buffering flag not set for 3
                    [   88.637779] Fatal error on slpi!
                    

                    Then everything from Fatal error repeats.

                    Alex KushleyevA 1 Reply Last reply Reply Quote 0
                    • Alex KushleyevA
                      Alex Kushleyev ModalAI Team @LucaVertiq
                      last edited by

                      @LucaVertiq , ok so basically it looks like the SLPI does crash while the CPU-side voxl-px4 executable is still running, so the slpi is trying to re-start px4 and crashes again.

                      But the main point is that SLPI does crash, so there is a potential low-level issue like accessing wrong memory address or something like that.

                      LucaVertiqL 1 Reply Last reply Reply Quote 0
                      • LucaVertiqL
                        LucaVertiq @Alex Kushleyev
                        last edited by

                        @Alex-Kushleyev Thank you! I'll look through the code a bit harder to make sure I'm not doing anything silly. I'll let you know if I have any more questions.

                        1 Reply Last reply Reply Quote 0
                        • First post
                          Last post
                        Powered by NodeBB | Contributors