ModalAI Forum

    Utilizing all CPUs on Voxl2

    • Mrunal Sarvaiya

      Hi,

      I have a question about utilizing all the cores available on the VOXL2. I'm using a C++ library that runs a multi-threaded optimization using OpenMP and lets me specify the number of threads. Performance improves as I increase the number of threads up to 4, but beyond that there is no further improvement. I would have expected performance to keep improving up to 8 threads, since there are 8 CPUs on the VOXL2. During my tests, voxl-inspect-cpu shows that CPUs 0-3 have high utilization while CPUs 4-7 have very low utilization, even when using 8 threads.

      Could you shed some light on whether this is expected and, if so, how I could better utilize the last 4 CPUs? (I'm pretty confident that the optimization is resource limited: the same optimization problem runs faster on a laptop with more CPUs.)

      Thanks!

      • Darshit Desai @Mrunal Sarvaiya

        @Mrunal-Sarvaiya Set the CPU mode to performance with voxl-set-cpu-mode and then try running your experiment. By default, some CPU cores are slower than others, even in perf mode.

        • Mrunal Sarvaiya @Darshit Desai

          @Darshit-Desai Yup, all these tests were run with the cpu mode set to performance mode

          • Alex Kushleyev (ModalAI Team) @Mrunal Sarvaiya

            @Mrunal-Sarvaiya, VOXL2 has three types of cores: 0-3 are low-power cores, 4-6 are medium, and core 7 is the fastest. It seems that OpenMP may not automatically understand how to use this type of CPU architecture. You may want to look into explicitly specifying which cores OpenMP should use (assigning each thread to a specific core). You may also need to pass a CPU type as a build flag.

            The CPU on VOXL2 is the Kryo 585 / Snapdragon 865, which is a combination of Cortex-A55 and Cortex-A77 cores. OpenMP may treat these as completely different CPUs and not "dare" to use the four additional, faster cores by default.
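
One way to see the core types described above is to read each core's maximum clock from sysfs. This is only a minimal sketch, assuming the kernel exposes the standard Linux cpufreq files under /sys/devices/system/cpu. The function name list_core_max_freq and its optional root-directory argument are illustrative and not part of any VOXL tool.

```shell
# Print "cpuN <max frequency in kHz>" for every core that exposes cpufreq data.
# $1 is an optional sysfs root (an assumption, added so the function can be
# pointed at another directory); it defaults to the usual Linux location.
list_core_max_freq() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*; do
        f="$dir/cpufreq/cpuinfo_max_freq"
        # Skip cores (or unmatched globs) that have no readable cpufreq entry.
        if [ -r "$f" ]; then
            printf '%s %s\n' "${dir##*/}" "$(cat "$f")"
        fi
    done
}

list_core_max_freq
```

Cores that report the same maximum frequency should belong to the same cluster, which is a quick way to confirm which CPU numbers are worth pinning to.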

            Alex

            • Mrunal Sarvaiya @Alex Kushleyev

              @Alex-Kushleyev Thanks a ton! I was able to explicitly specify the cpus to use and the optimization is 3-4x faster.

              Posting the commands here in case someone else stumbles upon this post. Export the following environment variables before running the application:
              export OMP_PROC_BIND=close # this may not be necessary
              export GOMP_CPU_AFFINITY="4 5 6 7" # here 4 5 6 7 specifies the cpus to use
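
The same pinning can also be expressed with the environment variables defined by the OpenMP specification instead of the GNU-specific GOMP_CPU_AFFINITY. A minimal sketch, assuming an OpenMP 4.0 or newer runtime; run_on_fast_cores is a hypothetical helper name and ./my_optimizer stands in for your own binary.

```shell
# Run a command with its OpenMP threads pinned to cores 4-7.
# The variables are set inside a subshell, so the calling shell is unchanged.
run_on_fast_cores() {
    (
        export OMP_NUM_THREADS=4                # one thread per pinned core
        export OMP_PLACES="{4},{5},{6},{7}"     # one place per core
        export OMP_PROC_BIND=close              # bind threads to those places
        export OMP_DISPLAY_ENV=true             # runtime reports its settings at startup
        "$@"
    )
}

# Usage (hypothetical binary name):
# run_on_fast_cores ./my_optimizer
```

Wrapping the settings in a function keeps them scoped to one run, so other OpenMP programs started from the same shell are not pinned by accident.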

              • Alex Kushleyev (ModalAI Team) @Mrunal Sarvaiya

                @Mrunal-Sarvaiya, so you still used 4 cores, but just the more powerful ones? Were you able to make use of all 8 cores? (You would need 8 threads in your application.)

                Alex

                • Mrunal Sarvaiya @Alex Kushleyev

                  @Alex-Kushleyev Correct, I just used the last 4, more powerful CPUs. That was enough of a performance boost to run my optimization at the frequency I was hoping for. I didn't try using all 8 cores with 8 threads, but I can give it a shot if that's useful information for you.
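
Whether all 8 cores help can be checked by timing the same run under different affinity settings. A rough sketch, assuming GNU libgomp (for GOMP_CPU_AFFINITY) and GNU date (for the %N nanosecond format); time_with_affinity is a hypothetical helper and ./my_optimizer stands in for your own binary.

```shell
# Time a command under a given thread count and GOMP_CPU_AFFINITY list.
# Prints "<cpu list> (<n> threads): <elapsed> ms".
time_with_affinity() {
    threads="$1"; cpus="$2"; shift 2
    start=$(date +%s%N)
    # The variables apply only to this one command invocation.
    OMP_NUM_THREADS="$threads" GOMP_CPU_AFFINITY="$cpus" "$@" >/dev/null
    end=$(date +%s%N)
    echo "$cpus ($threads threads): $(( (end - start) / 1000000 )) ms"
}

# Usage (hypothetical binary name):
# time_with_affinity 4 "4 5 6 7" ./my_optimizer
# time_with_affinity 8 "0 1 2 3 4 5 6 7" ./my_optimizer
```

If the library splits work evenly across threads, the slower cores 0-3 may set the pace, so 8 threads are not guaranteed to beat 4 threads on cores 4-7; measuring both is the only reliable way to tell.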
