GPU "hello world"
-
Hi,
I'm trying to write a benchmark for GPU performance on a VOXL 2 (voxl-suite 1.3.5).
So far I've not been able to get a hardware-accelerated OpenGL context.
Is there a "hello world" example that draws something on the GPU and reads the resulting image?
This package
https://moderngl.readthedocs.io/en/latest/techniques/headless_ubuntu_18_server.html
looked promising, but when I run it I only get a software-rendering context.
Or is there a similar tiny OpenCL app? (I tried "hellocl.zip", mentioned in another thread, and it doesn't find any hardware; neither does "clinfo".)
Thanks,
James.
-
You can use the following example to query the GPU using OpenCL: https://github.com/yell0wd0g/clDeviceQuery/blob/master/clDeviceQuery.cpp
Download that to voxl2, build it using
g++ -O2 clDeviceQuery.cpp -lOpenCL -o opencl-query
and run
voxl2:~/opencl$ ./opencl-query
clDeviceQuery Starting...
1 OpenCL Platforms found
CL_PLATFORM_NAME: QUALCOMM Snapdragon(TM)
CL_PLATFORM_VERSION: OpenCL 2.0 QUALCOMM build: commit # changeid # Date: 11/10/21 Wed Local Branch: Remote Branch:
OpenCL Device Info:
1 devices found supporting OpenCL on: QUALCOMM Snapdragon(TM)
----------------------------------
Device QUALCOMM Adreno(TM)
---------------------------------
CL_DEVICE_NAME: QUALCOMM Adreno(TM)
CL_DEVICE_VENDOR: QUALCOMM
CL_DRIVER_VERSION: OpenCL 2.0 QUALCOMM build: commit # changeid # Date: 11/10/21 Wed Local Branch: Remote Branch: Compiler E031.37.12.01
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 3
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 / 1024 / 1024
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1 MHz
CL_DEVICE_ADDRESS_BITS: 64
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 256 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 1024 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 32 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 64
CL_DEVICE_IMAGE <dim>
2D_MAX_WIDTH 16384
2D_MAX_HEIGHT 16384
3D_MAX_WIDTH 16384
3D_MAX_HEIGHT 16384
3D_MAX_DEPTH 2048
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, FLOAT 1, DOUBLE 1
clDeviceQuery, Platform Name = QUALCOMM Snapdragon(TM), Platform Version = OpenCL 2.0 QUALCOMM build: commit # changeid # Date: 11/10/21 Wed Local Branch: Remote Branch: , NumDevs = 1, Device = QUALCOMM Adreno(TM)
System Info:
Local Time/Date = 03:55:01, 11/22/2024
CPU Name: none
# of CPU processors: 8
Linux version 4.19.125 (oe-user@oe-host) (gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04), GNU ld (GNU Binutils for Ubuntu) 2.30) #1 SMP PREEMPT Sat May 18 00:10:25 UTC 2024
TEST PASSED
In a similar way, you should be able to find and build standard OpenCL tests.
Qualcomm provides an OpenCL SDK, which you should be able to download from their web site. There are also some examples here: https://github.com/willhua/QualcommOpenCLSDKNote/tree/master/src/examples . They range from basic tests like matrix manipulation to specially optimized routines for image conversion, convolution, filtering, matching, etc.
You mentioned drawing, but I don't have good examples for drawing. You should also be able to use OpenGL; if you really need an OpenGL sample app, I can find one.
We are working on integrating GPU image processing into our SDK, so this will be coming soon!
Alex
-
@Alex-Kushleyev - Thanks.
I downloaded and compiled clDeviceQuery.cpp.
It gives the same result as "clinfo" and "hellocl.c": no devices found.
$ ./opencl-query
clDeviceQuery Starting...
Error -1001 in clGetPlatformIDs Call!
System Info:
Local Time/Date = 13:38:42, 11/22/2024
CPU Name: none
# of CPU processors: 8
Linux version 4.19.125 (oe-user@oe-host) (gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04), GNU ld (GNU Binutils for Ubuntu) 2.30) #1 SMP PREEMPT Fri May 17 23:29:23 UTC 2024
TEST FAILED !!!
Perhaps I need to load a kernel module to enable OpenCL? This is what's loaded:
$ lsmod
Module                 Size     Used by
voxl_platform_mod      16384    0
voxl_gpio_mod          16384    0
voxl_fsync_mod         16384    0
machine_dlkm           159744   0
wcd938x_slave_dlkm     16384    0
wcd938x_dlkm           110592   1 machine_dlkm
wcd9xxx_dlkm           49152    1 wcd938x_dlkm
mbhc_dlkm              45056    1 wcd938x_dlkm
tx_macro_dlkm          106496   0
rx_macro_dlkm          102400   0
va_macro_dlkm          98304    0
wsa_macro_dlkm         69632    1 machine_dlkm
swr_ctrl_dlkm          57344    4 wsa_macro_dlkm,tx_macro_dlkm,rx_macro_dlkm,va_macro_dlkm
bolero_cdc_dlkm        57344    5 machine_dlkm,wsa_macro_dlkm,tx_macro_dlkm,rx_macro_dlkm,va_macro_dlkm
wsa881x_dlkm           45056    1 machine_dlkm
wcd_core_dlkm          32768    7 wsa881x_dlkm,machine_dlkm,wsa_macro_dlkm,tx_macro_dlkm,rx_macro_dlkm,va_macro_dlkm,wcd938x_dlkm
stub_dlkm              16384    0
hdmi_dlkm              24576    0
swr_dlkm               24576    4 wsa881x_dlkm,wcd938x_dlkm,swr_ctrl_dlkm,wcd938x_slave_dlkm
pinctrl_lpi_dlkm       20480    0
pinctrl_wcd_dlkm       16384    0
usf_dlkm               57344    0
native_dlkm            163840   0
platform_dlkm          2195456  1 native_dlkm
q6_dlkm                909312   9 bolero_cdc_dlkm,machine_dlkm,pinctrl_lpi_dlkm,usf_dlkm,va_macro_dlkm,swr_ctrl_dlkm,wcd9xxx_dlkm,native_dlkm,platform_dlkm
adsp_loader_dlkm       16384    0
apr_dlkm               229376   4 q6_dlkm,usf_dlkm,adsp_loader_dlkm,platform_dlkm
snd_event_dlkm         16384    5 bolero_cdc_dlkm,machine_dlkm,q6_dlkm,pinctrl_lpi_dlkm,apr_dlkm
q6_notifier_dlkm       16384    3 q6_dlkm,pinctrl_lpi_dlkm,apr_dlkm
q6_pdr_dlkm            16384    1 q6_notifier_dlkm
88XXau                 2342912  0
8821cu                 2465792  0
8188eu                 1200128  0
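For reference, the -1001 reported by clGetPlatformIDs above is CL_PLATFORM_NOT_FOUND_KHR, the status the OpenCL ICD loader returns when it finds no vendor platform. That typically points at which libOpenCL was picked up rather than at a missing kernel module. A minimal sketch of a lookup helper for decoding such codes; the function name cl_error_name is made up for illustration, and only a handful of codes are covered:

```cpp
#include <string>

// Hypothetical helper (not part of clDeviceQuery.cpp): maps a few OpenCL
// status codes to their symbolic names so log lines are easier to read.
std::string cl_error_name(int code) {
    switch (code) {
        case 0:     return "CL_SUCCESS";
        case -1:    return "CL_DEVICE_NOT_FOUND";
        case -2:    return "CL_DEVICE_NOT_AVAILABLE";
        case -5:    return "CL_OUT_OF_RESOURCES";
        case -6:    return "CL_OUT_OF_HOST_MEMORY";
        case -30:   return "CL_INVALID_VALUE";
        case -32:   return "CL_INVALID_PLATFORM";
        case -33:   return "CL_INVALID_DEVICE";
        case -1001: return "CL_PLATFORM_NOT_FOUND_KHR";  // from the cl_khr_icd extension
        default:    return "unknown OpenCL error " + std::to_string(code);
    }
}
```

Calling cl_error_name(-1001) on the value printed above gives the symbolic name, which is easier to search for than the bare number.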
-
@jamesbowman, this is strange. I have not seen this error before.
Can you do a quick check right after booting:
voxl2:~/opencl$ dmesg | grep gsl
[ 1.676429] arm-smmu 3da0000.kgsl-smmu: Linked as a consumer to regulator.72
[ 1.676512] arm-smmu 3da0000.kgsl-smmu: non-coherent table walk
[ 1.676520] arm-smmu 3da0000.kgsl-smmu: (IDR0.CTTW overridden by FW configuration)
[ 1.676531] arm-smmu 3da0000.kgsl-smmu: stream matching with 6 register groups
[ 1.818850] subsys-pil-tz soc:qcom,kgsl-hyp: for a650_zap segments only will be dumped.
[ 1.818883] subsys-pil-tz soc:qcom,kgsl-hyp: for md_a650_zap segments only will be dumped.
[ 1.827612] iommu-debug soc:kgsl_iommu_test_device: Linked as a consumer to 3da0000.kgsl-smmu
[ 1.827666] iommu: Adding device soc:kgsl_iommu_test_device to group 6
[ 1.949719] platform 3d6a000.qcom,gmu:gmu_user: Linked as a consumer to 3da0000.kgsl-smmu
[ 1.950189] platform 3d6a000.qcom,gmu:gmu_kernel: Linked as a consumer to 3da0000.kgsl-smmu
[ 1.950877] kgsl-3d 3d00000.qcom,kgsl-3d0: Linked as a consumer to regulator.72
[ 1.950895] kgsl-3d 3d00000.qcom,kgsl-3d0: Linked as a consumer to regulator.73
[ 1.951118] platform 3da0000.qcom,kgsl-iommu:gfx3d_user: Linked as a consumer to 3da0000.kgsl-smmu
[ 1.951138] iommu: Adding device 3da0000.qcom,kgsl-iommu:gfx3d_user to group 34
[ 1.951327] platform 3da0000.qcom,kgsl-iommu:gfx3d_secure: Linked as a consumer to 3da0000.kgsl-smmu
[ 1.951344] iommu: Adding device 3da0000.qcom,kgsl-iommu:gfx3d_secure to group 35
[ 352.615269] subsys-pil-tz soc:qcom,kgsl-hyp: a650_zap: loading from 0x00000000ede00000 to 0x00000000ede01000
[ 352.621129] subsys-pil-tz soc:qcom,kgsl-hyp: a650_zap: Brought out of reset
kgsl is the kernel module for the GPU.
Alex
-
We recompiled with:
g++ -O2 clDeviceQuery.cpp -L /usr/lib -l OpenCL -o opencl-query
and now I have an OpenCL device. (FWIW, my dmesg output looks very much like yours above.)
Thanks,
J.
-
@jamesbowman, so the only difference in compilation is adding -L /usr/lib, or was there something else? I believe this should be redundant, as that path should already be in the library search path. Hmm.
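One possible explanation is that more than one libOpenCL.so is installed: directories given with -L are searched before the linker's default directories, so the flag can change which copy gets linked. A minimal sketch for checking that, assuming nothing about the VOXL 2 layout; the helper name dirs_containing is made up for illustration, and the directories you pass in (for example /usr/lib and /usr/lib/aarch64-linux-gnu) are guesses to be adjusted, not paths confirmed in this thread:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the subset of `dirs` in which a file named
// `lib_name` exists and can be opened for reading.
std::vector<std::string> dirs_containing(const std::vector<std::string>& dirs,
                                         const std::string& lib_name) {
    std::vector<std::string> hits;
    for (const std::string& dir : dirs) {
        std::ifstream f(dir + "/" + lib_name, std::ios::binary);
        if (f.good()) {
            hits.push_back(dir);  // the library file is present in this directory
        }
    }
    return hits;
}
```

If a call with "libOpenCL.so" returns more than one directory, running ldd on the working and non-working builds should show which copy each one resolves to at run time.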