Optimizing DSP Load w.r.t. IO
-
Hi Modal,
When upping IMU_GYRO_RATEMAX (for lower end-to-end worst-case latency in the gyro movement -> actuation loop), I'm seeing the DSP load jump (measured by px4-listener cpuload). After digging into it, it looks like the main culprit is not the processing modules, but rather the FIFO read (transfer((uint8_t *)&buffer, (uint8_t *)&buffer, transfer_size) in the ICM42688P driver) and the UART write (uart_write(&data, sizeof(DataPacket)) != sizeof(DataPacket)) to our STM board (for actuation).
About 40% of the DSP is being used for the FIFO transfer and the UART write with IMU_GYRO_RATEMAX set to 1kHz (1000Hz).
Any recommendations on how to minimize the DSP being choked by these IO operations?
I can only think of marginal gains like decreasing how many bytes the FIFO is configured to contain, or optimizing our DataPacket size down to something like 12 bytes (currently 17 bytes). But maybe there is some more fundamental strategy that I'm missing? Or is IO just generally a bad time for the DSP?
Thanks for any advice!
Rowan (Cleo Robotics)
-
The SPI Read is a blocking call, so the DSP will spend time in that call while the data is being transferred, but it may be sleeping while waiting for the data to come into the SPI Hardware.
The UART Write is, actually, also a blocking call, so the DSP will write the data to the UART Hardware and sleep until the data is finished sending.
If you are measuring the times spent in those functions, it can add up. If you increase your UART baud rate, that time should go down. While the DSP is waiting inside those functions, it can process other threads.
The DSP utilization measurement is not very accurate, so it should not be trusted completely. How much of a utilization increase are you seeing from increasing the processing rate from 800Hz to 1000Hz?
Maybe @Eric-Katzfey can comment as well - thanks!
Alex
-
@Alex-Kushleyev The IMU is being configured with an 8K ODR, so by increasing IMU_GYRO_RATEMAX you are reading the FIFO more often but reading fewer samples each time. So you are mainly increasing the overhead of context switching. Can you characterize how much the load increases just by increasing IMU_GYRO_RATEMAX and not doing any of the extra UART IO? Unfortunately we don't have a lot of control over the low-level implementation of the IO drivers that are in the Qualcomm code, and there is no DMA that could help lower IO overhead. One idea would be to lower the ODR to 1K so that you are only reading one sample from the FIFO at each interrupt.