@James-Strawson I still haven't found the root cause, but I wanted to share some of my findings. Maybe you can help me understand them. A little bit of background: we don't want to power off the drone, so when we are not flying we go into a sleep state in which we shut down some of the voxl programs and our own programs. When we wake up, we start the programs again and everything should work as before. The issue shows up after about 6 of these cycles, so it also takes a while to reproduce.
Here is some logging of where the problem just kicked in:
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: write to ch: 0 id: 14 result: -1 errno: 32
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: write error: Broken pipe
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: previous client state was 2
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: Client voxl_pipe_handler-232508 (id 14) disconnected from channel 0
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: write to ch: 0 id: 15 result: -1 errno: 32
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: write error: Broken pipe
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: previous client state was 1
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: Client voxl_pipe_handler695511 (id 15) disconnected from channel 0
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: ERROR in pipe_server_write_to_client, client_id should be between 0 & 15
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: write to ch: 1 id: 14 result: -1 errno: 32
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: write error: Broken pipe
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: previous client state was 2
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: Client voxl_pipe_handler672850 (id 14) disconnected from channel 1
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: write to ch: 1 id: 15 result: -1 errno: 32
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: write error: Broken pipe
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: previous client state was 1
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: Client voxl_pipe_handler507424 (id 15) disconnected from channel 1
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: ERROR in pipe_server_write_to_client, client_id should be between 0 & 15
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: ERROR in pipe_server_write_to_client, client_id should be between 0 & 15
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: ERROR in pipe_server_write_to_client, client_id should be between 0 & 15
Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: ERROR in pipe_server_write_to_client, client_id should be between 0 & 15
voxl_pipe_handler is our own program. When this was logged we weren't properly closing the pipe when our program was terminated, so that could explain the write errors. But after handling that correctly, we still have the "client_id should be between 0 & 15" error.
A few questions. The id behind the client name seems to be a random number, although the commit that adds the random number is dated after SDK-1.0.0 (https://gitlab.com/voxl-public/voxl-sdk/core-libs/libmodal-pipe/-/commit/b5fc28c9fc41184e0bbeee09c7f1867f9dbc1121). How is that possible? Was it deployed correctly? This is the version we use:
QUAD:~$ apt show voxl-vision-hub
Package: voxl-vision-hub
Version: 1.6.6
Priority: optional
Section: base
Maintainer: James Strawson <james@modalai.com>
Installed-Size: unknown
Provides: voxl-vision-px4
Depends: librc-math,libmodal-pipe(>=2.4.0),libmodal-json,voxl-mpa-tools(>=0.2.5),voxl-mavlink-server,libmodal-cv(>=0.3.1)
Conflicts: voxl-vision-px4,voxl-mavlink-server(<<1.0.0)
Replaces: voxl-vision-px4
Download-Size: 88.4 kB
APT-Manual-Installed: yes
APT-Sources: file:/data/voxl-suite-offline-packages ./ Packages
Description: main hub managing communication between VOXL MPA services and autopilots
In https://gitlab.com/voxl-public/voxl-sdk/core-libs/libmodal-pipe/-/blob/master/library/src/misc.c?ref_type=heads#L168, I think you're taking the modulo of a possibly negative number. In C the result of % takes the sign of the dividend, so the result can also be negative (https://stackoverflow.com/questions/7594508/modulo-operator-with-negative-values) and then falls outside the intended range. See also the logging above, where one of the client names contains a negative number (voxl_pipe_handler-232508).
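To illustrate what I mean, here is a minimal sketch of the sign behaviour of % in C and one way to force the result to be non-negative. The helper names and the modulus of 1000000 are my own assumptions for illustration (the suffixes in the log look like 6-digit numbers); this is not the actual code from misc.c.

```c
#include <assert.h>

/* Hypothetical helper: plain % as defined since C99. The quotient is
 * truncated toward zero, so the remainder keeps the sign of `value`. */
static int naive_mod(int value, int m)
{
    assert(m > 0);
    return value % m;
}

/* Hypothetical helper: same remainder, shifted into [0, m-1] when the
 * plain result is negative. */
static int positive_mod(int value, int m)
{
    assert(m > 0);
    int r = value % m;
    return (r < 0) ? r + m : r;
}
```

With these, naive_mod(-232508, 1000000) gives -232508, which matches the shape of the suffix in the log, while positive_mod(-232508, 1000000) gives 767492.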
I suspect that reconnecting to a pipe from the same program allocates a new client id every time, possibly because the random number in the name means that no two client names are ever the same. I also saw that you changed something in that part of the code in this commit: https://gitlab.com/voxl-public/voxl-sdk/core-libs/libmodal-pipe/-/commit/773de63cb73000c08df1b04bc4e2f1e78e700816 Do you think that could be the cause? I will try to test that.
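To make the hypothesis concrete, here is a toy model of it. This is not libmodal-pipe code: the table layout, the function names, and the assumption that a slot is only reused when the client name matches exactly (and is never freed on disconnect) are all mine. The only thing taken from the log is the limit of 16 ids (0 to 15).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CLIENTS 16  /* matches the "between 0 & 15" range in the log */
#define NAME_LEN    64

/* Toy slot table (hypothetical, for illustration only). */
struct slot_table {
    char names[MAX_CLIENTS][NAME_LEN];
    int  used;
};

/* Return the slot already holding `name`, or allocate the next free one.
 * Returns -1 once all slots are taken. Slots are never freed here. */
static int toy_connect(struct slot_table *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (int i = 0; i < t->used; i++) {
        if (strcmp(t->names[i], name) == 0) return i;
    }
    if (t->used >= MAX_CLIENTS) return -1;
    snprintf(t->names[t->used], NAME_LEN, "%s", name);
    return t->used++;
}

/* Connect `cycles` times and return the id from the last connect.
 * With unique_suffix != 0 every connect uses a different name, which is
 * what a random suffix would do in practice. */
static int toy_reconnect_cycles(int cycles, int unique_suffix)
{
    struct slot_table t = {0};
    char name[NAME_LEN];
    int id = -1;
    for (int c = 0; c < cycles; c++) {
        if (unique_suffix) snprintf(name, sizeof name, "voxl_pipe_handler%d", c);
        else               snprintf(name, sizeof name, "voxl_pipe_handler");
        id = toy_connect(&t, name);
    }
    return id;
}
```

In this model a client that reconnects under the same name keeps getting id 0, while a client whose name changes on every reconnect gets ids 0 through 15 and is refused on the 17th connect. If several of our clients reconnect per sleep/wake cycle, that could fit with the problem appearing after only a handful of cycles, but I have not verified that the library behaves like this.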