ModalAI Forum

    Utilizing all CPUs on Voxl2

    • M
      Mrunal Sarvaiya
      last edited by Mrunal Sarvaiya 8 Jun 2024, 12:16 8 Jun 2024, 12:13

      Hi,

      I have a question about utilizing all the cores available on the VOXL2. I'm using a C++ library that runs a multithreaded optimization using OpenMP and lets me specify the number of threads. Performance improves as I increase the number of threads up to 4, but beyond that there's no further improvement. I would have expected gains up to 8 threads, since there are 8 CPUs on the VOXL2. During my tests, voxl-inspect-cpu shows that CPUs 0-3 have high utilization while CPUs 4-7 have very low utilization, even when using 8 threads.

      Could you shed some light on whether this is expected and, if so, how I could better utilize the last 4 CPUs? (I'm pretty confident that the optimization is resource-limited: the same optimization problem runs faster on a laptop with more CPUs.)

      Thanks!

      • D
        Darshit Desai @Mrunal Sarvaiya
        last edited by 9 Jun 2024, 15:16

        @Mrunal-Sarvaiya Set the CPU mode to performance (voxl-set-cpu-mode) and then try running your experiment. By default, some CPU cores are slower than others, even in perf mode.

        • M
          Mrunal Sarvaiya @Darshit Desai
          last edited by 10 Jun 2024, 14:45

          @Darshit-Desai Yup, all these tests were run with the CPU mode set to performance.

          • A
            Alex Kushleyev ModalAI Team @Mrunal Sarvaiya
            last edited by 11 Jun 2024, 05:35

            @Mrunal-Sarvaiya, VOXL2 has three types of cores: 0-3 are low-power cores, 4-6 are medium, and core 7 is the fastest. It seems that OpenMP may not automatically understand how to use this type of CPU architecture. You may want to look into explicitly specifying which cores OpenMP should use (assigning each thread to a specific core). Also, you may need to provide a CPU type as a build flag.

            The CPU on VOXL2 is the Kryo 585 / Snapdragon 865, which is a combination of Cortex-A55 and Cortex-A77 cores. OpenMP may think that these are completely different CPUs and may not "dare" to use the four additional, faster cores by default.

            Alex
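
The idea of assigning each thread to a specific core can be sketched roughly as follows. This is a minimal illustration, not the library's or OpenMP's own mechanism: it assumes Linux with glibc (where `sched_setaffinity` with pid 0 acts on the calling thread), and the helper names `core_for_thread` and `pin_current_thread` are hypothetical.

```cpp
// Sketch: pin the calling thread to one core chosen from a list (Linux/glibc only).
#include <sched.h>   // sched_setaffinity, cpu_set_t, CPU_ZERO, CPU_SET
#include <cstddef>
#include <vector>

// Hypothetical helper: choose a core for a thread index, wrapping round-robin
// over the supplied list. Returns -1 if the list is empty or the index is negative.
int core_for_thread(int thread_index, const std::vector<int>& cores) {
    if (cores.empty() || thread_index < 0) return -1;
    return cores[static_cast<std::size_t>(thread_index) % cores.size()];
}

// Hypothetical helper: restrict the calling thread to a single core.
// Returns true on success, false if the core id is out of range or the call fails.
bool pin_current_thread(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) return false;
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    // On Linux, pid 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(mask), &mask) == 0;
}
```

Inside an OpenMP parallel region, each thread could then call something like `pin_current_thread(core_for_thread(omp_get_thread_num(), {4, 5, 6, 7}))`. The core list here follows the core numbering described above and would need adjusting for other hardware.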

            • M
              Mrunal Sarvaiya @Alex Kushleyev
              last edited by 11 Jun 2024, 17:51

              @Alex-Kushleyev Thanks a ton! I was able to explicitly specify the cpus to use and the optimization is 3-4x faster.

              Posting the commands needed here in case someone else stumbles upon this post. Export the following environment variables:
              export OMP_PROC_BIND=close # this may not be necessary
              export GOMP_CPU_AFFINITY="4 5 6 7" # here 4 5 6 7 specifies the cpus to use
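
One way to check that such settings took effect is to have each thread report which CPUs it is allowed to run on. The following is a rough sketch assuming Linux with glibc; `allowed_cpus` and `current_cpu` are hypothetical helper names, and whether the binding shows up in a given thread depends on the OpenMP runtime in use (`GOMP_CPU_AFFINITY` is specific to GNU libgomp).

```cpp
// Sketch: inspect the calling thread's CPU affinity (Linux/glibc only).
#include <sched.h>   // sched_getaffinity, sched_getcpu, cpu_set_t, CPU_ISSET
#include <vector>

// Hypothetical helper: list the CPUs the calling thread may run on, in
// ascending order. Returns an empty list if the query fails.
std::vector<int> allowed_cpus() {
    std::vector<int> cpus;
    cpu_set_t mask;
    CPU_ZERO(&mask);
    // On Linux, pid 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return cpus;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &mask)) cpus.push_back(cpu);
    }
    return cpus;
}

// Hypothetical helper: the CPU the calling thread is executing on right now,
// or -1 on error.
int current_cpu() {
    return sched_getcpu();
}
```

Calling these from inside the parallel region, once per thread, and printing the results alongside `omp_get_thread_num()` would show whether the threads are confined to the intended cores.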

              • A
                Alex Kushleyev ModalAI Team @Mrunal Sarvaiya
                last edited by 11 Jun 2024, 19:31

                @Mrunal-Sarvaiya, so you still used 4 cores, but just the more powerful ones? Were you able to make use of all 8 cores? (You would need 8 threads in your application.)

                Alex

                • M
                  Mrunal Sarvaiya @Alex Kushleyev
                  last edited by 12 Jun 2024, 14:30

                  @Alex-Kushleyev Correct, I just used the last 4, more powerful CPUs. That was enough of a performance boost to run my optimization at the frequency I was hoping for. I didn't try using all 8 cores with 8 threads, but I can give it a shot if that's useful information for you.
