ModalAI Forum

    Voxl2 Docker (Ubuntu 22) with OpenCL/Adreno

      eric @Alex Kushleyev

      @Alex-Kushleyev That aligns pretty well with my own experience so far. Thanks again for looking into this!

        Alex Kushleyev ModalAI Team @eric

        @eric I was able to get the GPU device query working inside an Ubuntu 22.04 Docker container using the following steps. It may be possible to reduce the number of devices and libraries mapped into the container, but I am giving you this information now so you can test; I will try to clean it up a bit later. I have only tried the device query so far, but I wanted to let you know that there is progress.

        #run docker
        docker run -it --rm --privileged --device=/dev/kgsl-3d0 --device=/dev/ion -v /proc:/proc -v /firmware/image:/firmware/image -v /lib/firmware:/lib/firmware -v /sys/class:/sys/class -v /sys/bus:/sys/bus -v /sys/devices:/sys/devices -v /data:/data -v /usr/lib/liblog.so.0:/usr/lib/liblog.so.0 -v /usr/lib/libOpenCL.so:/usr/lib/libOpenCL.so -v /usr/lib/libcutils.so.0:/usr/lib/libcutils.so.0 -v /usr/lib/libllvm-qcom.so:/usr/lib/libllvm-qcom.so -v /usr/lib/libion.so.0.0.0:/usr/lib/libion.so.0.0.0 -v /usr/lib/libsync.so.0.0.0:/usr/lib/libsync.so.0.0.0 -v /usr/lib/libgsl.so:/usr/lib/libgsl.so -v /usr/lib/libCB.so:/usr/lib/libCB.so -v /usr/lib/aarch64-linux-gnu/libglib-2.0.so.0.5600.4:/usr/lib/aarch64-linux-gnu/libglib-2.0.so.0.5600.4 -v `pwd`:/opt/code -w /opt/code arm64v8/ubuntu:22.04 bash
        
        apt-get update
        apt install --no-install-recommends -y pocl-opencl-icd
        

        Then run your test app to query the device.

          eric @Alex Kushleyev

          @Alex-Kushleyev 🙇

            Alex Kushleyev ModalAI Team @eric

            OK, after a little more clean-up, this seems to be the minimal set of libraries and devices needed:

            docker run -it --rm --privileged \
            	-v /usr/lib/libOpenCL.so:/usr/lib/libOpenCL.so \
            	-v /usr/lib/libCB.so:/usr/lib/libCB.so \
            	-v /usr/lib/libgsl.so:/usr/lib/libgsl.so \
            	-v /usr/lib/liblog.so.0:/usr/lib/liblog.so.0 \
            	-v /usr/lib/libcutils.so.0:/usr/lib/libcutils.so.0 \
            	-v /usr/lib/libsync.so.0.0.0:/usr/lib/libsync.so.0.0.0 \
            	-v /usr/lib/libion.so.0.0.0:/usr/lib/libion.so.0.0.0 \
            	-v /usr/lib/libllvm-qcom.so:/usr/lib/libllvm-qcom.so \
            	-v /usr/lib/aarch64-linux-gnu/libglib-2.0.so.0.5600.4:/usr/lib/aarch64-linux-gnu/libglib-2.0.so.0.5600.4 \
            	-v `pwd`:/opt/code -w /opt/code \
            	arm64v8/ubuntu:22.04 bash
            

            (--privileged mode maps all the needed devices to the docker container)
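Since every library in the command above is mounted at the same path inside the container, the `-v` arguments can be generated from a plain list of paths. The following is a minimal sketch; `bind_mount_args` is a hypothetical helper name, and it assumes none of the paths contain spaces.

```shell
#!/bin/sh
# Print one "-v <path>:<path>" bind-mount pair per line for each host library path.
# bind_mount_args is a hypothetical helper, not part of Docker or the VOXL2 SDK.
bind_mount_args() {
    for lib in "$@"; do
        printf '%s %s:%s\n' -v "$lib" "$lib"
    done
}

# Possible usage on the VOXL2 host (shown as a comment, not executed here):
#   docker run -it --rm --privileged \
#       $(bind_mount_args /usr/lib/libOpenCL.so /usr/lib/libCB.so /usr/lib/libgsl.so) \
#       -v "$(pwd)":/opt/code -w /opt/code \
#       arm64v8/ubuntu:22.04 bash
```

Keeping the library list in one place makes it easier to experiment with dropping individual mounts when looking for the true minimal set.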

            Then install some more packages (not sure if this can be reduced, not clear exactly what is missing):

            apt-get update
            apt install --no-install-recommends -y pocl-opencl-icd
            

            Maybe we can figure out which library is still missing so that pocl-opencl-icd does not have to be installed. At least the issue was a missing library, not a mapped device.
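One possible way to hunt for the missing library is to run `ldd` on the test binary and on the mounted Qualcomm libraries inside the container, and list whatever it cannot resolve. This is only a sketch: `unresolved_libs` is a hypothetical helper name, and `ldd` reports link-time dependencies only, so anything loaded at run time with `dlopen` will not show up.

```shell
#!/bin/sh
# Read `ldd` output on stdin and print the names of libraries it could not resolve.
# unresolved_libs is a hypothetical helper name used for illustration.
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}

# Possible usage inside the container (shown as comments, not executed here):
#   ldd ./simple_query        | unresolved_libs
#   ldd /usr/lib/libOpenCL.so | unresolved_libs
```

If `ldd` reports nothing missing, running the test app with glibc's `LD_DEBUG=libs` set may show which library lookups fail at run time.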

            For testing, I used a device query script from here:

            root@733a6d4d5fdb:/opt/code# ./simple_query 
            1. Device: QUALCOMM Adreno(TM)
             1.1 Hardware version: OpenCL 2.0 Adreno(TM) 650
             1.2 Software version: OpenCL 2.0 QUALCOMM build: commit # changeid # Date: 11/10/21 Wed Local Branch:  Remote Branch:  Compiler E031.37.12.01
             1.3 OpenCL C version: OpenCL C 2.0 Adreno(TM) 650
             1.4 Parallel compute units: 3
            

            I also verified that a simple matrix multiplication app worked (not provided here).

            @eric, can you please let me know if this works for you?

            Alex

              eric @Alex Kushleyev

              @Alex-Kushleyev

              OMG IT WORKS!!

              I was able to extract all these libraries from the host and install them directly inside the Docker image, and now the pocl-opencl-icd installation isn't needed.

              This is really important for us, since it allows us to build external dependencies that rely on OpenCL in our pipeline directly without bind mounts (outside the host environment).

              Really, really appreciate all your help!

              FROM arm64v8/ubuntu:22.04
              
              # Install necessary dependencies
              RUN apt-get update && \
                  apt-get install -y \
                  cmake \
                  build-essential \
                  libglib2.0-0
              
              # Copy Adreno GPU dependencies
              # - libcutils0_0-r1_arm64.deb
              # - libsync_1.0-r1_arm64.deb
              # - qti-libion_0-r1_arm64.deb
              # - liblog0_1.0-r1_arm64.deb
              # - qti-adreno_1.0-r0_arm64.deb
              COPY dep /root/dep
              
              # Create required directory for qti-adreno install
              RUN mkdir -p /usr/include/KHR && dpkg -i /root/dep/*.deb
              
              # Copy and build test script
              COPY ./hellocl /root/hellocl
              RUN cd /root/hellocl && mkdir build && cd build && cmake .. && make
              
              CMD ["bash"]
              
              voxl2:~/opencl$ docker run -it --rm --privileged opencl:latest ./root/hellocl/build/hellocl
              Platform Information:
              Platform Name: QUALCOMM Snapdragon(TM)
              Platform Vendor: QUALCOMM
              Platform Version: OpenCL 2.0 QUALCOMM build: commit # changeid # Date: 11/10/21 Wed Local Branch:  Remote Branch: 
              Platform Profile: FULL_PROFILE
              Platform Extensions:  
              ------------------------------------
              Device Information:
              Device Name: QUALCOMM Adreno(TM)
              Device Vendor: QUALCOMM
              Driver Version: OpenCL 2.0 QUALCOMM build: commit # changeid # Date: 11/10/21 Wed Local Branch:  Remote Branch:  Compiler E031.37.12.01
              Device Version: OpenCL 2.0 Adreno(TM) 650
              Device OpenCL C Version: OpenCL C 2.0 Adreno(TM) 650
              Device Max Compute Units: 3
              This should be three: 3
              
                Alex Kushleyev ModalAI Team @eric

                Hi @eric,

                Nice! Very clean.

                Did you use dpkg-repack to create debs of installed packages, such as:

                apt-get install dpkg-repack
                dpkg-repack qti-adreno
                

                Cool trick!

                I will test this out and add it to our docs.

                Alex

                  eric @Alex Kushleyev

                  @Alex-Kushleyev Yes: dpkg -S <file path> to figure out which debs installed which libraries (e.g., dpkg -S /usr/lib/libOpenCL.so), apt-cache show to see the source (Ubuntu PPA vs ModalAI), then dpkg-repack to repack the ModalAI debs.

                  Thanks again!
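The workflow described above could be scripted roughly as follows. This is a sketch under assumptions: `owning_package` and `repack_owners` are hypothetical helper names, dpkg-repack is assumed to be installed on the VOXL2 host, and the sample `dpkg -S` lines in the comments are illustrative rather than captured output.

```shell
#!/bin/sh
# Extract the package name from `dpkg -S` output read on stdin.
# dpkg -S prints lines shaped like "qti-adreno: /usr/lib/libOpenCL.so" or, for
# multi-arch packages, "libglib2.0-0:arm64: /usr/lib/..." (illustrative examples).
owning_package() {
    cut -d: -f1 | sort -u
}

# Repack the package that owns each given file into a .deb in the current directory.
# Hypothetical helper; run on the VOXL2 host, typically as root.
repack_owners() {
    for file in "$@"; do
        pkg=$(dpkg -S "$file" | owning_package)
        [ -n "$pkg" ] || return 1
        # Inspect `apt-cache show "$pkg"` by hand to decide whether the package
        # comes from Ubuntu (installable with apt in the image) or from ModalAI.
        dpkg-repack "$pkg"
    done
}

# Possible usage (shown as a comment, not executed here):
#   repack_owners /usr/lib/libOpenCL.so /usr/lib/libcutils.so.0
```

The resulting .deb files can then be copied into the image and installed with `dpkg -i`, as in the Dockerfile earlier in the thread.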

                    Alex Kushleyev ModalAI Team @eric

                    @eric, thanks again for your input on this. I have posted a complete tutorial on how to enable OpenCL in Docker on VOXL2: https://docs.modalai.com/voxl-2-opencl-in-docker/

                    Alex

                      eric @Alex Kushleyev

                      @Alex-Kushleyev Awesome! Thanks again for all your help with this!

                        Peter Milani @eric

                        @Alex-Kushleyev @eric I've implemented your solution and get the same result.

                        I did get a bit confused as running clinfo only returned a single device of type CPU and without the name "Adreno".

                        However, I added a query on the device type to your test script and it returned GPU, so I guess it's only finding the GPU. I would have expected it to return a few more devices, as the Qualcomm OpenCL guide (https://docs.qualcomm.com/bundle/publicresource/80-NB295-11_REV_C_Qualcomm_Snapdragon_Mobile_Platform_Opencl_General_Programming_and_Optimization.pdf) suggests that the DSP and CPU could have been returned as well, so I'm not sure what is happening there. I didn't have to map any devices; I only shared the relevant libraries as volumes. I would have expected the CPU to be returned as a matter of course, as that is what happens with the Intel implementation.

                        My additional lines to the script (given for info) are:

                          cl_device_type device_type;
                          clGetDeviceInfo(devices[j], CL_DEVICE_TYPE, sizeof(cl_device_type), &device_type, NULL);
                          printf("Device type: ");
                          if (device_type & CL_DEVICE_TYPE_CPU)
                              printf("CPU ");
                          if (device_type & CL_DEVICE_TYPE_GPU)
                              printf("GPU ");
                          if (device_type & CL_DEVICE_TYPE_ACCELERATOR)
                              printf("ACCELERATOR ");
                          if (device_type & CL_DEVICE_TYPE_DEFAULT)
                              printf("DEFAULT ");
                          printf("\n");
                        
                        

                        Which returns

                        OpenCL platform count: 1
                        OpenCL device count: 1
                        1. Device: QUALCOMM Adreno(TM)
                         1.1 Hardware version: OpenCL 2.0 Adreno(TM) 650
                        Device type: GPU 
                         1.2 Software version: OpenCL 2.0 QUALCOMM build: commit # changeid # Date: 11/10/21 Wed Local Branch:  Remote Branch:  Compiler E031.37.12.01
                         1.3 OpenCL C version: OpenCL C 2.0 Adreno(TM) 650
                         1.4 Parallel compute units: 3
                        
                        
                          Alex Kushleyev ModalAI Team @Peter Milani

                          @Peter-Milani, it looks like the Qualcomm CPU is not supported as a device by Qualcomm's OpenCL library.

                          clinfo may be confused: installing and running clinfo natively on VOXL2 does not return any platforms. The OpenCL libraries that apt installs are most likely not compatible with the VOXL2 GPU.

                          Alex

                            Peter Milani @Alex Kushleyev

                            @Alex-Kushleyev I was able to get the following when running clinfo within the Docker container:

                             clinfo
                            Number of platforms                               1
                              Platform Name                                   Portable Computing Language
                              Platform Vendor                                 The pocl project
                              Platform Version                                OpenCL 1.2 pocl 1.4, None+Asserts, LLVM 9.0.1, RELOC, SLEEF, POCL_DEBUG
                              Platform Profile                                FULL_PROFILE
                              Platform Extensions                             cl_khr_icd
                              Platform Extensions function suffix             POCL
                            
                              Platform Name                                   Portable Computing Language
                            Number of devices                                 1
                              Device Name                                     pthread-0x805
                              Device Vendor                                   Qualcomm
                              Device Vendor ID                                0x13b5
                              Device Version                                  OpenCL 1.2 pocl HSTR: pthread-aarch64-unknown-linux-gnu-GENERIC
                              Driver Version                                  1.4
                              Device OpenCL C Version                         OpenCL C 1.2 pocl
                              Device Type                                     CPU
                              Device Profile                                  FULL_PROFILE
                              Device Available                                Yes
                              Compiler Available                              Yes
                              Linker Available                                Yes
                              Max compute units                               8
                              Max clock frequency                             1804MHz
                              Device Partition                                (core)
                                Max number of sub-devices                     8
                                Supported partition types                     equally, by counts
                                Supported affinity domains                    (n/a)
                              Max work item dimensions                        3
                              Max work item sizes                             4096x4096x4096
                              Max work group size                             4096
                              Preferred work group size multiple              8
                              Preferred / native vector sizes                 
                                char                                                16 / 16      
                                short                                                8 / 8       
                                int                                                  4 / 4       
                                long                                                 2 / 2       
                                half                                                 0 / 0        (n/a)
                                float                                                4 / 4       
                                double                                               2 / 2        (cl_khr_fp64)
                              Half-precision Floating-point support           (n/a)
                              Single-precision Floating-point support         (core)
                                Denormals                                     No
                                Infinity and NANs                             Yes
                                Round to nearest                              Yes
                                Round to zero                                 No
                                Round to infinity                             No
                                IEEE754-2008 fused multiply-add               No
                                Support is emulated in software               No
                                Correctly-rounded divide and sqrt operations  No
                              Double-precision Floating-point support         (cl_khr_fp64)
                                Denormals                                     Yes
                                Infinity and NANs                             Yes
                                Round to nearest                              Yes
                                Round to zero                                 Yes
                                Round to infinity                             Yes
                                IEEE754-2008 fused multiply-add               Yes
                                Support is emulated in software               No
                              Address bits                                    64, Little-Endian
                              Global memory size                              5896568832 (5.492GiB)
                              Error Correction support                        No
                              Max memory allocation                           2147483648 (2GiB)
                              Unified memory for Host and Device              Yes
                              Minimum alignment for any data type             128 bytes
                              Alignment of base address                       1024 bits (128 bytes)
                              Global Memory cache type                        None
                              Image support                                   Yes
                                Max number of samplers per kernel             16
                                Max size for 1D images from buffer            134217728 pixels
                                Max 1D or 2D image array size                 2048 images
                                Max 2D image size                             8192x8192 pixels
                                Max 3D image size                             2048x2048x2048 pixels
                                Max number of read image args                 128
                                Max number of write image args                128
                              Local memory type                               Global
                              Local memory size                               33554432 (32MiB)
                              Max number of constant args                     8
                              Max constant buffer size                        33554432 (32MiB)
                              Max size of kernel argument                     1024
                              Queue properties                                
                                Out-of-order execution                        Yes
                                Profiling                                     Yes
                              Prefer user sync for interop                    Yes
                              Profiling timer resolution                      1ns
                              Execution capabilities                          
                                Run OpenCL kernels                            Yes
                                Run native kernels                            Yes
                              printf() buffer size                            16777216 (16MiB)
                              Built-in kernels                                (n/a)
                              Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_fp64
                            
                            NULL platform behavior
                              clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Portable Computing Language
                              clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [POCL]
                              clCreateContext(NULL, ...) [default]            Success [POCL]
                              clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
                                Platform Name                                 Portable Computing Language
                                Device Name                                   pthread-0x805
                              clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  Success (1)
                                Platform Name                                 Portable Computing Language
                                Device Name                                   pthread-0x805
                              clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No devices found in platform
                              clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
                              clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
                              clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
                                Platform Name                                 Portable Computing Language
                                Device Name                                   pthread-0x805
                            
                            ICD loader properties
                              ICD loader Name                                 OpenCL ICD Loader
                              ICD loader Vendor                               OCL Icd free software
                              ICD loader Version                              2.2.11
                              ICD loader Profile                              OpenCL 2.1
                            
                            

                            but only after installing:

                            apt install -y -qq pocl-opencl-icd;
                            
                              Alex Kushleyev ModalAI Team @Peter Milani

                              @Peter-Milani, I see. This looks like a generic third-party implementation of OpenCL for ARM (not from Qualcomm), and I think it also overwrites the proprietary OpenCL libraries, disabling GPU OpenCL support. However, you could make two separate Docker images, one for each use case (CPU and GPU).

                              Alex
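The two-image suggestion could be set up along these lines. This is a sketch only: `Dockerfile.gpu`, `Dockerfile.cpu`, the `opencl:gpu` / `opencl:cpu` tags, and the `build_command` helper are all hypothetical names. The GPU Dockerfile would install the repacked Qualcomm debs as shown earlier in the thread, and the CPU one would install pocl-opencl-icd instead.

```shell
#!/bin/sh
# Print the docker build command for one image variant ("gpu" or "cpu").
# File names and tags are hypothetical; adjust them to the project layout.
build_command() {
    case "$1" in
        gpu|cpu)
            printf 'docker build -f Dockerfile.%s -t opencl:%s .\n' "$1" "$1"
            ;;
        *)
            echo "unknown variant: $1" >&2
            return 1
            ;;
    esac
}

# Possible usage on the VOXL2 host (shown as comments, not executed here):
#   eval "$(build_command gpu)"
#   eval "$(build_command cpu)"
```

Keeping the two variants in separate images avoids having the pocl packages and the Qualcomm libraries installed side by side.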
