ModalAI Forum

    Minimizing voxl-camera-server CPU usage in SDK1.6

    Video and Image Sensors
    • Rowan Dempster

      Hi Modal,

      As we at Cleo update voxl-camera-server to SDK 1.6 (from SDK 1.0 with lots of backported changes), I've done some preliminary CPU profiling and have some questions on how to keep voxl-camera-server CPU usage down in SDK 1.6. Two things stand out to me:

      1. Using the new tracking_misp_norm pipe for our tracking cameras uses ~25% more of a CPU core than using the tracking pipe. That 25% is the total change across our 3 tracking cameras, so the usage goes from ~85% of a core to ~110% of a core (exact numbers change based on core allocation). To illustrate this, you can run voxl-inspect-cam top bottom back hires_small_color tof_depth, record the CPU usage of voxl-camera-server, then run voxl-inspect-cam top_misp_norm bottom_misp_norm back_misp_norm hires_small_color tof_depth, record the CPU usage again, and compare. Is there a way to prevent this from happening, or can you explain why the tracking_misp_norm pipes use more CPU?

      2. Adding additional clients to the camera pipe topics causes voxl-camera-server to use more CPU. To illustrate this, I run only voxl-camera-server (no other services), inspect our baseline pipes (e.g. voxl-inspect-cam top_misp_norm bottom_misp_norm back_misp_norm hires_small_color tof_depth), and then open new clients in additional terminals, again using voxl-inspect-cam. For example, when I open 4 new terminals (on top of the baseline terminal) and run voxl-inspect-cam top_misp_norm bottom_misp_norm back_misp_norm in each, the voxl-camera-server CPU usage increases by ~70% of a core (from ~110% to ~180%). This behavior on its own doesn't quite make sense to me: why would additional clients change the CPU load of the server, and by that much? Shouldn't the images be computed only once, not once per client? What's even stranger is that the type of tracking pipe also matters. If the baseline terminal is instead running voxl-inspect-cam top bottom back hires_small_color tof_depth and the 4 new client terminals are running voxl-inspect-cam top bottom back, then the CPU usage increases by only ~7% (from ~85% to ~92%). The different behavior between the two types of tracking pipe doesn't make sense to me. Could you explain it?
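A minimal sketch of one way to record the voxl-camera-server CPU figures compared above, assuming a Linux /proc filesystem and Python 3 on the target. The helper names (`parse_stat_ticks`, `cpu_percent`, `find_pid`, `measure`) are hypothetical and not part of any ModalAI tool; the thread does not say how the original numbers were collected.

```python
import os
import time

COMM_MAX = 15  # the kernel truncates /proc/<pid>/comm to 15 characters


def parse_stat_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The comm field is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before indexing the remaining fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the state (field 3); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, elapsed_s, clk_tck):
    """Percent of one core used between two tick samples."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return 100.0 * (ticks_after - ticks_before) / clk_tck / elapsed_s


def find_pid(name):
    """Return the first PID whose comm matches name (truncated), or None."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name[:COMM_MAX]:
                    return int(entry)
        except OSError:
            continue  # the process exited while we were scanning
    return None


def measure(name="voxl-camera-server", interval_s=5.0):
    """Average CPU use of the named process over interval_s, or None if not found."""
    pid = find_pid(name)
    if pid is None:
        return None
    clk_tck = os.sysconf("SC_CLK_TCK")
    with open(f"/proc/{pid}/stat") as f:
        before = parse_stat_ticks(f.read())
    t0 = time.monotonic()
    time.sleep(interval_s)
    with open(f"/proc/{pid}/stat") as f:
        after = parse_stat_ticks(f.read())
    return cpu_percent(before, after, time.monotonic() - t0, clk_tck)
```

Calling `measure()` once while the plain tracking pipes are being inspected and once with the _misp_norm pipes gives the two figures to compare. The result is summed across all threads of the process, so it can exceed 100% of one core.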

      Overall, at Cleo we are looking to use the tracking_misp_norm pipes going forward, and to be able to have multiple clients consuming those pipes without having to worry about increasing the CPU usage of voxl-camera-server. If you could comment on whether these asks are possible, or explain why not (either at this time, or never), that would help us a great deal in our process of taking advantage of Modal's great work in robotic perception!

      • Alex Kushleyev (ModalAI Team), replying to @Rowan Dempster

        Hi @Rowan-Dempster,

        We have been looking at some optimizations to help reduce the overall CPU usage of the camera server (not yet in the SDK). Let me test your exact use case and see what can be done.

        Just FYI, we recently added support for sharing ION buffers directly from the camera server, which means the camera clients get the images using a zero-copy approach. This saves the CPU cycles otherwise wasted on sending the image bytes over the pipe, especially when there are multiple clients.

        If you would like to learn more about how to use the ION buffers, I can post some examples soon. On the client side, the API for receiving an ION buffer vs. a regular buffer is almost the same. One thing that will be different is that the shared ION buffer has to be released by all clients before it can be re-used by the camera server (which makes sense).
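The release rule described above can be pictured with a toy model. This is only an illustration of the bookkeeping, not the camera server's implementation or the client API; the class `SharedBufferPool` and the client names are hypothetical.

```python
class SharedBufferPool:
    """Toy model: a buffer is reusable only after every client has released it."""

    def __init__(self, num_buffers):
        self.free = list(range(num_buffers))
        self.holders = {}  # buffer id -> set of clients still holding it

    def publish(self, clients):
        """Hand the next free buffer to all clients; return its id, or None if none is free."""
        if not self.free:
            return None  # every buffer is still held by at least one client
        buf = self.free.pop(0)
        if clients:
            self.holders[buf] = set(clients)
        else:
            self.free.append(buf)  # nobody subscribed, reusable immediately
        return buf

    def release(self, buf, client):
        """Record that one client is done; recycle the buffer after the last release."""
        remaining = self.holders.get(buf)
        if remaining is None:
            return
        remaining.discard(client)
        if not remaining:
            del self.holders[buf]
            self.free.append(buf)


pool = SharedBufferPool(num_buffers=2)
first = pool.publish(["vio", "tracker"])
second = pool.publish(["vio", "tracker"])
stalled = pool.publish(["vio", "tracker"])  # None: both buffers are still held
pool.release(first, "vio")                  # one of two holders released
pool.release(first, "tracker")              # last holder released, buffer is free again
```

One consequence of this rule is that a single slow client that holds buffers for a long time can leave the producer with nothing to write into, so clients would want to release each buffer as soon as they are done with it.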

        Even without ION buffer sharing there is room to reduce the CPU usage, so I will follow up after testing this a bit. Regarding your question about whether sending the image to multiple clients should cause significant extra CPU usage: you are correct, ideally it should not. The reason it happens here is related to how we set up the ION buffer cache policy. Currently, when the CPU accesses the buffers for the misp_norm images (which come from the GPU), the CPU reads are not cached and read access is expensive. Reading the same buffer multiple times results in repeated CPU-to-RAM accesses for data that would normally be fully cached after the first send to a client pipe. However, in some other cases (when the buffer is not used by the CPU, but is shared as an ION buffer and the client sends it directly to the GPU), the uncached approach results in even lower CPU usage. So I think we need to choose the buffer cache policy based on the use case. More details will come soon.
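As a rough check of this explanation against the figures in the first post, the extra server CPU can be divided by the number of added subscriptions (4 extra terminals, 3 tracking pipes each). The assumption that the cost grows linearly with the number of subscriptions is ours, and the function name is hypothetical.

```python
def cost_per_subscription(cpu_before, cpu_after, extra_clients, pipes_per_client):
    """Extra CPU (percent of one core) per added pipe subscription."""
    return (cpu_after - cpu_before) / (extra_clients * pipes_per_client)


# Figures from the first post: 4 extra terminals, 3 tracking pipes each.
misp_norm = cost_per_subscription(110, 180, extra_clients=4, pipes_per_client=3)
plain = cost_per_subscription(85, 92, extra_clients=4, pipes_per_client=3)

print(f"misp_norm pipes: {misp_norm:.1f}% of a core per subscription")  # 5.8
print(f"plain pipes:     {plain:.1f}% of a core per subscription")      # 0.6
print(f"ratio:           {misp_norm / plain:.1f}x")                     # 10.0
```

Under that assumption, each additional misp_norm subscription costs roughly ten times as much as a subscription to a plain tracking pipe, which is consistent with every send re-reading an uncached buffer from RAM.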

        Alex

        • Rowan Dempster, replying to @Alex Kushleyev

          @Alex-Kushleyev Hi Alex, I appreciate you looking into our use case for us (as always!). Please let me know if I can help by providing more details regarding the structure of our clients that are consuming the camera pipes. If there is a detail Cleo can't share publicly on the forum I can always email you as well or hop on a quick call to elaborate.

          Understood about the misp pipes, that expensive read access would explain both point #1 and the strange part of point #2 in my original post.

          We at Cleo will be monitoring this closely, since CPU usage regressions are pretty much the only gating item for us in upgrading our robotic perception stack to SDK 1.6. The misp_norm pipes are a great benefit to that perception stack and we'd of course love to take advantage of them as soon as possible. We are therefore definitely open to trying out zero-copy client approaches for keeping CPU usage down, or any other optimizations you could share examples of for us to try on our use case.

          • Rowan Dempster, replying to @Rowan Dempster

            @Alex-Kushleyev Hi Alex, just following up to ask for any update regarding the shared ION buffers or other methods to work around the CPU hit taken by having many clients on misp-based image pipes. Happy to try out any methods/suggestions with our specific clients!

            Rowan

            • Alex Kushleyev (ModalAI Team), replying to @Rowan Dempster

              Hi @Rowan-Dempster ,

              I started a new branch where I will be working on some performance optimizations in the camera server.

              https://gitlab.com/voxl-public/voxl-sdk/services/voxl-camera-server/-/tree/perf-optimizations

              In my initial testing, with the CPU set to perf mode and running one or two instances of the following:

              voxl-inspect-cam tracking_front_misp_norm tracking_down_misp_norm
              

              I was seeing:

              1 instance (2 inspected streams) : 42% CPU (of one core)
              2 instances (4 inspected streams): 58% CPU (of one core)
              

              With the changes I just committed, I am seeing:

              1 instance: 31% cpu
              2 instances : 36% cpu
              

              If you would like, you can test the camera server from this branch and see if you can reproduce the results.
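To put the before and after figures side by side, the short calculation below derives the relative reduction and the marginal cost of the second voxl-inspect-cam instance. It uses only the four percentages reported above; the variable names are ours.

```python
# CPU usage of the camera server (percent of one core), keyed by instance count.
before = {1: 42, 2: 58}  # original build
after = {1: 31, 2: 36}   # perf-optimizations branch

for n in (1, 2):
    saved = 100.0 * (before[n] - after[n]) / before[n]
    print(f"{n} instance(s): {before[n]}% -> {after[n]}% ({saved:.0f}% lower)")

# Marginal cost of adding the second instance (two more inspected streams).
marginal_before = before[2] - before[1]
marginal_after = after[2] - after[1]
print(f"second instance: +{marginal_before} points before, +{marginal_after} points after")
```

The cost of the second client drops from 16 to 5 percentage points, which is the effect one would expect if repeated reads of the same frame now hit the CPU cache.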

              notes:

              • The internal buffers were switched from uncached to cached, and proper buffer management was added to ensure that data written by the GPU is properly accessed by the CPU.
              • With these changes, if you use the _encoded stream from the tracking camera, it will work, but in dmesg you will see messages related to qbuf cache ops failed. This is still under investigation and will be fixed soon.

              Meanwhile, I will work on a simple example that shows the usage of ION buffers. I will try to share it a bit later today.

              Alex

              • Alex Kushleyev (ModalAI Team), replying to @Alex Kushleyev

                Hi @Rowan-Dempster ,

                Please take a look at this example (you can build and run it too): https://gitlab.com/voxl-public/voxl-sdk/utilities/voxl-mpa-tools/-/blob/add-new-image-tools/tools/voxl-image-repub.cpp

                This app can accept a regular image (RAW8 or YUV) and either re-publish it unchanged or crop and publish the result. Sometimes this is useful for quickly cropping an image that is fed into a different application that expects a smaller image or different aspect ratio.

                The app shows how to subscribe to and handle ION buffer streams.

                Usage:

                voxl2:/$ voxl-image-repub                          
                ERROR: Pipe name not specified
                
                    
                    Re-publish cropped camera frames (RAW8 or YUV)
                    
                    Options are:
                    -x, --crop-offset-x    crop offset in horizontal dimension
                    -y, --crop-offset-y    crop offset in vertical dimension
                    -w, --crop-size-x      crop size in horizontal dimension (width)
                    -h, --crop-size-y      crop size in vertical dimension (height)
                    -o, --output-name      output pipe name
                    -u, --usage            print this help message
                    
                    The cropped image will be centered if the crop offsets are not provided. 
                    
                    typical usage:
                    /# voxl-image-repub tracking --crop-size-x 256 --crop-size-y 256
                    /# voxl-image-repub tracking --crop-size-x 256 --crop-size-y 256 --crop-offset-x 128 --crop-offset-y 128
                

                Example of re-publishing an ION buffer image as a regular image (which you can view in voxl-portal):

                voxl-image-repub tracking_front_misp_norm_ion -o test
                

                (You can see which ION pipes are available by running voxl-list-pipes | grep _ion)

                Please note that without the fix I posted above, a client process that receives an uncached ION buffer will incur extra CPU load while accessing this buffer. For example, the same voxl-image-repub client uses 1.7% CPU while republishing the normalized image (cached ION buffer), but 7.3% CPU while republishing an image from an uncached ION buffer (CPU usage measured on one of the smaller cores).

                Please try and let me know if you have any questions.

                I know this cached / uncached buffering may be a bit confusing, but I will document it further to help explain it better.

                Alex
