ModalAI Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Groups
    • Register
    • Login

    Possible bug in libmodal_pipe server.c

    Modal Pipe Architecture (MPA)
    2
    6
    550
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • TjarkT
      Tjark
      last edited by

      I was looking through the libmodal_pipe code because we have some issues with an error ERROR in pipe_server_write_to_client, client_id should be between 0 & 15 and I'm not sure what is causing this. We are restarting voxl programs often and I think something goes wrong with connecting/disconnecting pipes but I haven't figured it out yet.

      But when I looked through the code I saw that here all file descriptors are closed: https://gitlab.com/voxl-public/voxl-sdk/core-libs/libmodal-pipe/-/blob/master/library/src/server.c?ref_type=heads#L1611. But a few lines above that (https://gitlab.com/voxl-public/voxl-sdk/core-libs/libmodal-pipe/-/blob/master/library/src/server.c?ref_type=heads#L1568), c[ch].n_clients is set to 0 so I think the code at line 1611 isn't doing anything. I don't know if it gives problems and it is probably not related to my problem but I wanted to point it out to you.

      James StrawsonJ 1 Reply Last reply Reply Quote 0
      • James StrawsonJ
        James Strawson ModalAI Team @Tjark
        last edited by

        @Tjark

        Good find. Likely the file descriptors were being closed in the following step when they are removed from the file system which is why this hasn't presented itself as a problem. However you are right that the loop at line 1611 was not doing anything. I made the following change to address this:

        https://gitlab.com/voxl-public/voxl-sdk/core-libs/libmodal-pipe/-/commit/cf148fcf4b0fc43f29b3f2baeb30fa34fe3b508a

        Let me know what you find with regards to the client_id error and if there are any other issues in libmodal_pipe you come across.

        Best,
        James

        TjarkT 1 Reply Last reply Reply Quote 0
        • TjarkT
          Tjark @James Strawson
          last edited by

          @James-Strawson I still haven't found the root cause but I wanted to share some of my findings. Maybe you can help my understanding. A little bit of background: we don't want to power off the drone but when we are not flying we go into a sleep state where we shut down some of the voxl programs and our own programs. Then when we wake up we start up the programs and everything should work again. It is after about 6 of these cycles where we have this issue (so it also takes a while to reproduce).

          Here is some logging of where the problem just kicked in:

          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: write to ch: 0 id: 14 result: -1 errno: 32
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: write error: Broken pipe
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: previous client state was 2
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: Client voxl_pipe_handler-232508 (id 14) disconnected from channel 0
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: write to ch: 0 id: 15 result: -1 errno: 32
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: write error: Broken pipe
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: previous client state was 1
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: Client voxl_pipe_handler695511 (id 15) disconnected from channel 0
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: ERROR in pipe_server_write_to_client, client_id should be between 0 & 15
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: write to ch: 1 id: 14 result: -1 errno: 32
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: write error: Broken pipe
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: previous client state was 2
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: Client voxl_pipe_handler672850 (id 14) disconnected from channel 1
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: write to ch: 1 id: 15 result: -1 errno: 32
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: write error: Broken pipe
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: previous client state was 1
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: Client voxl_pipe_handler507424 (id 15) disconnected from channel 1
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: ERROR in pipe_server_write_to_client, client_id should be between 0 & 15
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: ERROR in pipe_server_write_to_client, client_id should be between 0 & 15
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: ERROR in pipe_server_write_to_client, client_id should be between 0 & 15
          Aug 19 20:15:14 QUAD voxl-vision-hub[1524]: ERROR in pipe_server_write_to_client, client_id should be between 0 & 15
          

          voxl_pipe_handler is our own program. When this was logged we weren't properly closing the pipe when our program was terminated so that could explain the write errors. But after handling that correctly, we still have the issue of client_id should be between 0 & 15.

          A few questions. The id behind the client name seems to be a random number although the commit which adds the random number is after SDK-1.0.0. (https://gitlab.com/voxl-public/voxl-sdk/core-libs/libmodal-pipe/-/commit/b5fc28c9fc41184e0bbeee09c7f1867f9dbc1121). How is that possible? Did you deploy it well? This is the version we use:

          QUAD:~$ apt show voxl-vision-hub 
          Package: voxl-vision-hub
          Version: 1.6.6
          Priority: optional
          Section: base
          Maintainer: James Strawson <james@modalai.com>
          Installed-Size: unknown
          Provides: voxl-vision-px4
          Depends: librc-math,libmodal-pipe(>=2.4.0),libmodal-json,voxl-mpa-tools(>=0.2.5),voxl-mavlink-server,libmodal-cv(>=0.3.1)
          Conflicts: voxl-vision-px4,voxl-mavlink-server(<<1.0.0)
          Replaces: voxl-vision-px4
          Download-Size: 88.4 kB
          APT-Manual-Installed: yes
          APT-Sources: file:/data/voxl-suite-offline-packages ./ Packages
          Description: main hub managing communication between VOXL MPA services and autopilots
          

          In https://gitlab.com/voxl-public/voxl-sdk/core-libs/libmodal-pipe/-/blob/master/library/src/misc.c?ref_type=heads#L168, I think you're taking the modulo of a possible negative number. The modulo can then also be a negative number (https://stackoverflow.com/questions/7594508/modulo-operator-with-negative-values) so your result is not within the specified range. See also the logging above containing a negative number.

          I think that reconnecting a pipe with the same name makes that a new client id is used, possibly due to the random number in the name and then no two client names will be the same. I also saw that you changed something in that part of the code in this commit: https://gitlab.com/voxl-public/voxl-sdk/core-libs/libmodal-pipe/-/commit/773de63cb73000c08df1b04bc4e2f1e78e700816 Do you think that that could be the case? I will try to test that.

          TjarkT 1 Reply Last reply Reply Quote 0
          • TjarkT
            Tjark @Tjark
            last edited by

            @James-Strawson Any comment on this?

            James StrawsonJ 1 Reply Last reply Reply Quote 0
            • James StrawsonJ
              James Strawson ModalAI Team @Tjark
              last edited by

              @Tjark

              Good find on the negative number. I just pushed a fix for that to the dev branch.

              The random number feature was added after the last stable SDK release (1.0.0) and is therefore not considered finished or stable until it's included with the SDK 1.0.1 release. So far we have been using it internally in our nightly builds and also the one customer who requested it had included it in their nightly tests without issues since it was tagged v2.9.1

              I've never seen pipe_server_write_to_client throw that error and I don't know how to reproduce it. Are you calling that function manually or are you using the higher level pipe_server_write() family of functions? I also just pushed to the dev branch a commit with a couple more helpful warning prints for when it does occur.

              reconnecting a pipe with a new random number does not increase the client id unless the old client remains in a connected state. Since you see the message "Client voxl_pipe_handler507424 (id 15) disconnected from channel 1" this means the client ID is being flagged as disconnected and that particular client id is now available for reuse.

              Here are a couple comments in the client connection logic to help explain:

              https://gitlab.com/voxl-public/voxl-sdk/core-libs/libmodal-pipe/-/blob/dev/library/src/server.c?ref_type=heads#L660
              https://gitlab.com/voxl-public/voxl-sdk/core-libs/libmodal-pipe/-/blob/dev/library/src/server.c?ref_type=heads#L788

              Hopefully this helps you track down the issue. Let me know what you find.

              Best,
              James

              TjarkT 1 Reply Last reply Reply Quote 0
              • TjarkT
                Tjark @James Strawson
                last edited by

                @James-Strawson Thanks for your reply! It made me realise what the error must be. We use a locally cloned repository for libmodalpipe because we want to build our software directly on the drone. This is then used in the build and link process of our software. This happened to be on the 'master' branch and not the 'SDK-1.0.0' branch. So the pipe clients use a different version than the pipe servers which is likely the cause of our errors.

                1 Reply Last reply Reply Quote 0
                • First post
                  Last post
                Powered by NodeBB | Contributors