ModalAI Forum

    'ERROR failed to set scheduler' after restart of voxl-qvio-server

    GPS-denied Navigation (VIO)
    • Tjark

      When voxl-qvio-server is started, it sets itself to use the FIFO scheduler with high priority. This is done here: https://gitlab.com/voxl-public/voxl-sdk/services/voxl-qvio-server/-/blob/master/server/main.cpp#L973

      On the first start after the drone boots, this succeeds. journalctl -u voxl-qvio-server reports:

      Jan 01 00:00:08 Drone_201 voxl-qvio-server[2552]: setting scheduler
      Jan 01 00:00:08 Drone_201 voxl-qvio-server[2552]: set FIFO priority successfully!
      

      When I then execute systemctl restart voxl-qvio-server, the server is no longer able to switch to the FIFO scheduler with high priority. journalctl -u voxl-qvio-server reports:

      Jul 01 08:23:52 Drone_201 voxl-qvio-server[11943]: WARNING Failed to set priority, errno = 1
      Jul 01 08:23:52 Drone_201 voxl-qvio-server[11943]: This seems to be a problem with ADB, the scheduler
      Jul 01 08:23:52 Drone_201 voxl-qvio-server[11943]: should work properly when this is a background process
      Jul 01 08:23:52 Drone_201 voxl-qvio-server[11943]: ERROR failed to set scheduler
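
      The errno = 1 in this log is EPERM. To check from outside the process which policy it actually ended up with, the rt_priority and policy fields of /proc/<pid>/stat can be decoded. The following is a minimal sketch, not something taken from voxl-qvio-server: the function names are made up for illustration, and it only reports the main thread of the process.

```shell
# Map a Linux scheduling policy number to its name.
sched_policy_name() {
    case "$1" in
        0) echo SCHED_OTHER ;;
        1) echo SCHED_FIFO ;;
        2) echo SCHED_RR ;;
        3) echo SCHED_BATCH ;;
        5) echo SCHED_IDLE ;;
        6) echo SCHED_DEADLINE ;;
        *) echo "UNKNOWN($1)" ;;
    esac
}

# Print "<policy name> <rt priority>" for one line of /proc/<pid>/stat.
# Field 2 (comm) may contain spaces and parentheses, so everything up to
# the last ") " is dropped first; what remains starts at field 3 (state).
sched_from_stat_line() {
    rest=${1##*") "}
    # Word-split the remaining fields into positional parameters.
    set -- $rest
    # Fields 40 (rt_priority) and 41 (policy) are now at positions 38 and 39.
    echo "$(sched_policy_name "${39}") ${38}"
}
```

      Usage on the drone would be something like sched_from_stat_line "$(cat /proc/$(pidof voxl-qvio-server)/stat)".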
      

      It seems that this is similar to the issue reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1467919.

      What can we do to make sure that voxl-qvio-server is always running with the FIFO scheduler?


      Version information:

      yocto:~$ opkg list | grep voxl
      libvoxl_cutils - 0.0.2 - ModalAI's c utils
      libvoxl_io - 0.5.4 - ModalAI library allowing apps processor access to accessory serial ports
      voxl-camera-calibration - 0.0.1 - On-board camera calibration for VOXL
      voxl-camera-server - 0.8.1 - publishes camera frames over named pipe interface
      voxl-cpu-monitor - 0.1.7 - publishes CPU Data over MPA pipe and provides fan tools
      voxl-docker-support - 1.1.3 - tools to improve the usability of docker on VOXL
      voxl-imu-server - 0.8.1 - VOXL IMU interface for Modal Pipe Architecture
      voxl-mavlink - 0.0.2 - mavlink headers
      voxl-mpa-tools - 0.2.7 - misc tools for modal pipe architecture
      voxl-nodes - 0.1.7 - ROS nodes supported by ModalAI
      voxl-portal - 0.1.2
      voxl-qvio-server - 0.3.1 - publishes QVIO data over named pipe interface
      voxl-qvio-server - 0.3.4
      voxl-streamer - 0.2.3 - Gstreamer-based application to handle RTSP streaming
      voxl-tag-detector - 0.0.2 - Detect apriltags for MPA
      voxl-tflite - 0.0.1 - 64-bit tensorflow lite libraries
      voxl-tflite-server - 0.1.1 - client of voxl-camera-server that does deep learning (object detection, monocular depth estimation)
      voxl-utils - 0.8.5
      voxl-vision-px4 - 0.9.2 - Interface between VOXL's computer vision services and PX4
      

      Here is what I have figured out so far from the link mentioned above.

      When the drone starts up the voxl-qvio-server task lives in the root cgroup:

      yocto:~$ cat /sys/fs/cgroup/cpu,cpuacct/tasks | grep $(pidof voxl-qvio-server)
      2514
      

      It therefore uses the realtime runtime budget of the root cgroup, which is 0.95 seconds per second (the default value):

      yocto:~$ cat /sys/fs/cgroup/cpu,cpuacct/cpu.rt_runtime_us
      950000
      

      A task needs a realtime runtime budget greater than 0 to be able to set its scheduler to FIFO, so this works.
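
      Instead of grepping individual tasks files, the cgroup a process belongs to can also be read from /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path on cgroup v1. A small sketch, with cpu_cgroup_of as a made-up helper name:

```shell
# Print the cgroup path of the "cpu" controller from a /proc/<pid>/cgroup file.
cpu_cgroup_of() {
    awk -F: '{
        # $2 is a comma-separated controller list such as "cpu,cpuacct".
        n = split($2, ctrl, ",")
        for (i = 1; i <= n; i++)
            if (ctrl[i] == "cpu") {
                # The path is everything after the second colon.
                print substr($0, length($1) + length($2) + 3)
                exit
            }
    }' "$1"
}
```

      Called as cpu_cgroup_of /proc/$(pidof voxl-qvio-server)/cgroup, it prints / for a task in the root cgroup and a path below /system.slice for a task that systemd has placed in a service cgroup.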

      Now when I execute systemctl restart voxl-qvio-server things change. The task doesn't live in the root cgroup anymore:

      yocto:~$ cat /sys/fs/cgroup/cpu,cpuacct/tasks | grep $(pidof voxl-qvio-server)
      yocto:~$
      

      But now it lives in a new group:

      yocto:~$ cat /sys/fs/cgroup/cpu,cpuacct/system.slice/voxl-qvio-server.service/tasks | grep $(pidof voxl-qvio-server)
      13562
      

      This new group, however, doesn't have any realtime runtime budget:

      yocto:~$ cat /sys/fs/cgroup/cpu,cpuacct/system.slice/voxl-qvio-server.service/cpu.rt_runtime_us                     
      0
      

      The server is therefore unable to set the scheduler to FIFO with high priority. This is also described in section 2.2 of https://www.kernel.org/doc/Documentation/scheduler/sched-rt-group.txt:

      By default all bandwidth is assigned to the root group and new groups get the
      period from /proc/sys/kernel/sched_rt_period_us and a run time of 0. If you
      want to assign bandwidth to another group, reduce the root group's bandwidth
      and assign some or all of the difference to another group.
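
      Because budget is handed down from the root group, it helps to look at cpu.rt_runtime_us at every level between the root and the service group rather than only at the leaf. The sketch below does that; show_rt_budgets is a hypothetical helper, and it assumes the cgroup v1 layout under /sys/fs/cgroup/cpu,cpuacct seen in this post.

```shell
# Print "<cgroup path> <cpu.rt_runtime_us>" for the root group and for each
# level down to the given cgroup.
#   $1: mount point of the cpu controller hierarchy
#   $2: cgroup path relative to that mount point
show_rt_budgets() {
    root=${1%/}
    rest=${2#/}
    path=
    printf '%s %s\n' / "$(cat "$root/cpu.rt_runtime_us")"
    # Descend one path component at a time.
    while [ -n "$rest" ]; do
        part=${rest%%/*}
        path=$path/$part
        printf '%s %s\n' "$path" "$(cat "$root$path/cpu.rt_runtime_us")"
        case $rest in
            */*) rest=${rest#*/} ;;
            *)   rest= ;;
        esac
    done
}
```

      For example: show_rt_budgets /sys/fs/cgroup/cpu,cpuacct /system.slice/voxl-qvio-server.service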

      If I manually assign bandwidth (realtime runtime budget) to the voxl-qvio-server group, the server is able to set the scheduler to FIFO with high priority:

      echo 550000 > /sys/fs/cgroup/cpu,cpuacct/cpu.rt_runtime_us
      echo 200000 > /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us
      echo 200000 > /sys/fs/cgroup/cpu,cpuacct/system.slice/voxl-qvio-server.service/cpu.rt_runtime_us
      

      I tried to script this and add it to the service, but then it fails at the first startup because /sys/fs/cgroup/cpu,cpuacct/system.slice/voxl-qvio-server.service/cpu.rt_runtime_us doesn't exist yet. It may be possible to apply it conditionally, but I haven't been able to make that robust yet. This link also looks interesting: https://lists.freedesktop.org/archives/systemd-devel/2017-July/039353.html. At this point I thought it was better to ask on this forum whether you are aware of this problem and perhaps already have a solution for it.
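
      A conditional variant of the manual commands could look roughly like the sketch below. It is not a tested fix: the function name assign_rt_budget and its optional root-path argument are made up for illustration, the budget values are simply the ones from the manual commands, and it can only help if it runs after the service cgroup has been created and before voxl-qvio-server requests the FIFO scheduler. How to hook it into the unit file so that this ordering holds is left open here.

```shell
# Assign realtime budgets top-down, but only when the service cgroup exists.
#   $1: mount point of the cpu controller hierarchy (optional)
assign_rt_budget() {
    root=${1:-/sys/fs/cgroup/cpu,cpuacct}
    svc=$root/system.slice/voxl-qvio-server.service
    # On the first start after boot the service cgroup is absent and the task
    # runs in the root group, which already has budget: change nothing.
    if [ ! -e "$svc/cpu.rt_runtime_us" ]; then
        echo "no service cgroup, leaving budgets untouched" >&2
        return 0
    fi
    # Same order as the manual commands: root, then the slice, then the service.
    echo 550000 > "$root/cpu.rt_runtime_us" &&
    echo 200000 > "$root/system.slice/cpu.rt_runtime_us" &&
    echo 200000 > "$svc/cpu.rt_runtime_us"
}
```

      The function returns non-zero if any of the three writes is rejected, so a caller can log the failure instead of silently continuing without a budget.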

      My service file is this:

      yocto:~$ cat /etc/systemd/system/voxl-qvio-server.service
      #
      # Copyright (c) 2021 ModalAI, Inc.
      #
      
      [Unit]
      Description=voxl-qvio-server
      SourcePath=/usr/bin/voxl-qvio-server
      After=voxl-wait-for-fs.service
      Requires=voxl-wait-for-fs.service
      
      [Service]
      User=root
      Type=simple
      PIDFile=/run/voxl-qvio-server.pid
      ExecStart=/usr/bin/voxl-qvio-server
      
      [Install]
      WantedBy=multi-user.target
      

      I hope you can help me out. I think it is important that voxl-qvio-server is always running with the FIFO scheduler and high priority.

      • James Strawson (ModalAI Team)

        Thanks for investigating this. I was unable to recreate this over ADB but was able to over SSH. Are you logged in through ADB or SSH?

        • Tjark

          I was logged in through SSH. However, we also run our own blowup handler on the drone, which executes systemctl restart voxl-qvio-server when it detects a blowup. In that case we are not connected through SSH or ADB, and the server still fails to set the scheduler to FIFO. I'm not sure whether ModalAI software ever restarts voxl-qvio-server itself, but if it does I would expect the same result.

          • Tjark

            Is there any update on this issue?

            • A Former User

              Hi,

              We're still not sure what caused the scheduler issue. Setting the scheduler was a new feature we were hoping to integrate into the stack when you found this bug, but we aren't able to right now. In the meantime we've pushed updated packages with these calls removed so that all of the packages can run as intended. If you update to the latest packages via opkg, there should be no issues with the scheduler.
