Inconsistent IMU sampling

Alex Kushleyev · 23 Aug 2024, 03:32

Can you please try to use voxl-logger to log the IMU data concurrently with mpa-to-ros pipeline?

voxl-logger will save imu data with human readable timestamps, so you could easily plot that and double check if the IMU samples produced by qrb5165-imu-server are indeed jumping around. It is possible that something related to ROS is introducing a timestamp issue.

Alex

Morten Nissov · 23 Aug 2024, 12:09

@Alex-Kushleyev Thanks! I'll try this, note I should have mentioned this is with a voxl2 mini computer and on the ROS side the only thing happening is recording to rosbag.

Morten

Alex Kushleyev · 26 Aug 2024, 18:16

@Morten-Nissov yes that makes sense.

so the data flow is as follows:

qrb5165-imu-server ->(mpa_msg) -> voxl-mpa-ros -> (ros_msg) -> rosbag record

So, yeah it should be good to confirm whether voxl logger is also experiencing the issue, where the flow would be:

qrb5165-imu-server ->(mpa_msg) -> voxl_logger

Also, please check if there is any difference when you set the cpu mode to perf:
voxl-set-cpu-mode perf

Morten Nissov · 6 Sept 2024, 16:59

@Alex-Kushleyev Sorry for the delay, some data for you in different configurations:

Getting data from log0000 and data.bag
voxl-logger
	mean: 0.9766542020947352 ms
	std dev: 0.0013052458028126252 ms
	total messages: 61393
rosbag record
	mean: 3.230353079590143 ms
	std dev: 2.992801413066607 ms
	total messages: 18361
----------
Getting data from log0001 and data_nodelay.bag
voxl-logger
	mean: 0.9766513801209619 ms
	std dev: 0.0013052148029740315 ms
	total messages: 61342
rosbag record
	mean: 3.2565453276935123 ms
	std dev: 3.0139953610144303 ms
	total messages: 17860
----------
Getting data from log0002 and data_perf.bag
voxl-logger
	mean: 0.976641095612789 ms
	std dev: 0.001267007877929727 ms
	total messages: 61384
rosbag record
	mean: 3.73874577006071 ms
	std dev: 3.513557787123189 ms
	total messages: 15819
----------

Note significant differences in messages for each, indicating lots of message drop by the ros node.

Considering the sampling rate and standard deviation for different configurations. Logging was done with voxl-logger -t 60 -i imu_apps -d bags/imu_test/ and rosbag record /imu_apps -O data --duration 60. I tried setting the rosbag to record with --tcpnodelay, but this seems to have no effect. During the performance run, the cpu usage was as follows:

Name   Freq (MHz) Temp (C) Util (%)
-----------------------------------
cpu0       1804.8     40.1    22.33
cpu1       1804.8     39.7    18.10
cpu2       1804.8     40.1    17.82
cpu3       1804.8     39.7    16.16
cpu4       2419.2     39.0     0.00
cpu5       2419.2     38.6     0.00
cpu6       2419.2     37.8     0.00
cpu7       2841.6     39.0     0.00
Total                 40.1     9.30
10s avg                       12.26
-----------------------------------
GPU         305.0     37.4     0.00
GPU 10s avg                    0.00
-----------------------------------
memory temp:       37.8 C
memory used:   734/7671 MB
-----------------------------------
Flags
CPU freq scaling mode: performance

The only thing which seems out of the ordinary here is that many CPUs are operating consistently at low frequency. Maybe you have a comment on why that could be? Otherwise, it seems the transfer from the voxl-imu-server to ros causes quite a bit of loss in messages. Here is some voxl 2 mini install information:

--------------------------------------------------------------
system-image: 1.7.8-M0104-14.1a-perf
kernel:       #1 SMP PREEMPT Sat May 18 03:34:36 UTC 2024 4.15
--------------------------------------------------------------
hw platform:  M0104
mach.var:     2.0.0
--------------------------------------------------------------
voxl-suite:   1.3.3
--------------------------------------------------------------

with this IMU config:

        "imu0_enable":  true,
                "imu0_sample_rate_hz":  1000,
                "imu0_lp_cutoff_freq_hz":       92,
                "imu0_rotate_common_frame":     true,
                "imu0_fifo_poll_rate_hz":       100,
                "aux_imu1_enable":      false,
                "aux_imu1_bus": 1,
                "aux_imu1_sample_rate_hz":      1000,
                "aux_imu1_lp_cutoff_freq_hz":   92,
                "aux_imu1_fifo_poll_rate_hz":   100,
                "aux_imu2_enable":      false,
                "aux_imu2_spi_bus":     14,
                "aux_imu2_sample_rate_hz":      1000,
                "aux_imu2_lp_cutoff_freq_hz":   92,
                "aux_imu2_fifo_poll_rate_hz":   100,
                "aux_imu3_enable":      false,
                "aux_imu3_spi_bus":     5,
                "aux_imu3_sample_rate_hz":      1000,
                "aux_imu3_lp_cutoff_freq_hz":   92,
                "aux_imu3_fifo_poll_rate_hz":   100

Morten Nissov · 6 Sept 2024, 17:07

@Alex-Kushleyev Followup, I can change the fifo_poll_rate_hz parameter and this has significantly increased the performance at the cost of significantly increase CPU load on the slow cores. Maybe you can advise how we should proceed.

"imu0_enable":	true,
	"imu0_sample_rate_hz":	1000,
	"imu0_lp_cutoff_freq_hz":	92,
	"imu0_rotate_common_frame":	true,
	"imu0_fifo_poll_rate_hz":	500,

Name   Freq (MHz) Temp (C) Util (%)
-----------------------------------
cpu0       1804.8     41.7    44.95
cpu1       1804.8     41.7    33.94
cpu2       1804.8     42.0    46.88
cpu3       1804.8     42.4    44.00
cpu4       2419.2     40.5     0.00
cpu5       2419.2     40.5     0.00
cpu6       2419.2     39.7     0.00
cpu7       2841.6     40.1     0.00
Total                 42.4    21.22
10s avg                       15.10
-----------------------------------
GPU         305.0     38.6     0.00
GPU 10s avg                    0.00
-----------------------------------
memory temp:       39.0 C
memory used:   746/7671 MB
-----------------------------------
Flags
CPU freq scaling mode: performance
Standby Not Active
-----------------------------------

Note even with this change some IMU messages are still dropped:

Getting data from log0003 and data_perf_fifo500.bag
voxl-logger
	mean: 0.9766390443514245 ms
	std dev: 0.001688066415815054 ms
	total messages: 61397
rosbag record
	mean: 1.0005186426507142 ms
	std dev: 0.15095098578685434 ms
	total messages: 59599

Morten Nissov · 7 Sept 2024, 10:23

@Morten-Nissov Note, although this does make things look better, there still is a pretty significant difference between the voxl logger and recording a rosbag, with a relatively non-insignificant performance difference between the two.

Also, not really sure why the voxl-logger doesn't manage a clean 1kHz, following the configuration. Seems to be consistently off the commanded value.

Morten Nissov · 7 Sept 2024, 11:48

@Alex-Kushleyev Hi again, Sorry to keep spamming. I just wanted to make sure this is up-to-date as I progress.

A bit silly I didn't realize before, but I think the problem lies in the FIFO buffer reading vs IMU queue size in the voxl-mpa-to-ros, that being said the conversion from pipe to ros also costs some latency it seems.

In voxl-mpa-to-ros, there is the following

    m_rosPublisher = m_rosNodeHandle.advertise<sensor_msgs::Imu>(topicName, 1);

Given that the FIFO buffer (with default settings) reads 10 IMU messages at a time, I think this should be changed to

    m_rosPublisher = m_rosNodeHandle.advertise<sensor_msgs::Imu>(topicName, 25);

so it has a little extra depending on what happens. This results in a very large improvement:

Getting data from log0004 and data_incr_queue.bag
voxl-logger
	mean: 0.9766417070988479 ms
	std dev: 0.0014018018261035131 ms
	total messages: 61363
rosbag record
	mean: 0.9766411567432235 ms
	std dev: 0.02937541759378401 ms
	total messages: 59305

although you can see there are still some dropped measurements and still a worsening of the standard deviation. I realized the IMU reading can also directly publish to ros from the voxl-imu-server, and this seems a little more benefit as well

rosbag record
	mean: 0.9768075697417741 ms
	std dev: 0.012558722853297727 ms
	total messages: 61147

still not totally without message dropping. This also still has some worsening about the standard deviation, I wonder if that's simply due to the conversion in _clock_monotonic_to_ros_time.

Alex Kushleyev · 11 Sept 2024, 03:48

@Morten-Nissov , Sorry about the delay. Let me answer your questions / comments in order..

the 8 cpu frequencies ranging from 1.8Ghz to 2.8Ghz are correct for the performance mode. The VOXL2 cpu has 3 types of cores and what you see are the maximum frequencies for those cores. Performance mode pegs the frequencies to max unless the cpu overheats, when frequency throttling will occur.
the IMU frequency is not exactly 1Khz because the IMU is running using its own internal oscillator, which is does not perfectly align with 1000Hz output frequency, so the closest frequency the IMU can achieve is about 976Hz or so.
Finally, you are absolutely right that the ROS publisher queue size being set to 1 is causing the messages to be dropped because they are published too quickly and are simply discarded when queue overflows.

I really don't see the down side of increasing the buffer size and i agree it should be set to at least the sample_rate / fifo_poll_rate , which is the number of samples that would be published at a time.

I will discuss this with the team.. thank you for pointing out the issue and good job figuring it out!

Alex

Morten Nissov · 11 Sept 2024, 07:01

@Alex-Kushleyev Thanks for getting back to me, I must've missed reading the part regarding the CPU frequencies, now that you say it seems obvious.

the IMU frequency is not exactly 1Khz because the IMU is running using its own internal oscillator, which is does not perfectly align with 1000Hz output frequency, so the closest frequency the IMU can achieve is about 976Hz or so.

This is interesting actually, because the dt and sampling rate seems to consistently be slightly higher than 1kHz, approximately 1.03kHz. Can this make sense for the IMU?

Finally, you are absolutely right that the ROS publisher queue size being set to 1 is causing the messages to be dropped because they are published too quickly and are simply discarded when queue overflows.
I really don't see the down side of increasing the buffer size and i agree it should be set to at least the sample_rate / fifo_poll_rate , which is the number of samples that would be published at a time.
I will discuss this with the team.. thank you for pointing out the issue and good job figuring it out!

I didn't see a major change in CPU consumption, at least of the two solutions increasing the FIFO queue cost much more CPU than changing the queue size did. I would offer to make a PR if the change was more complicated than just changing a number

Anyway thanks. I'm a bit curious about the IMU, but otherwise I think this is solved.

Morten

Alex Kushleyev · 11 Sept 2024, 16:33

@Morten-Nissov ,

About the IMU dt, sorry, i got it backwards, the dt is 0.976ms, so that would be slightly above 1khz. So we are on the same page.

Yes, reducing the FIFO size results in more SPI read transactions, which means that the CPU / Kernel has to do more work. There is a per transaction overhead that is much more significant than just getting more data per single transaction.

Let's not bother with the PR because it is a 2-character change . I will get to it and test it soon.

Alex