tflite GPU usage
-
Hello, I'm running the yolov5_float16_quant.tflite model with the voxl-tflite-server on a VOXL-2. It is getting the gpu delegate with the output below.
Questions:
- When I run it while watching voxl-inspect-cpu, the GPU utilization is always 0.00. Is there a way to verify it is actually using the GPU? I am getting around 17ms average preprocessing time and 26ms inference time on hires_small_color, if that sounds like normal GPU performance. I also built another YOLOv5 model with a larger input (640x512 vs 320x320), and it still shows 0.00 GPU utilization with a GPU delegate.
- Where is the source code that outputs to the cpu_monitor pipe? I wanted to see how the GPU utilization is calculated.
- Is there another tool to monitor CPU/GPU usage and temperature? I want to log to CSV. I can post-process the voxl-inspect-cpu output, but I am checking whether there are other methods.
=================================================================
skip_n_frames: 0
=================================================================
model: /usr/bin/dnn/yolov5_float16_quant.tflite
=================================================================
input_pipe: /run/mpa/hires_small_color/
=================================================================
delegate: gpu
=================================================================
allow_multiple: false
=================================================================
output_pipe_prefix: mobilenet
=================================================================
INFO: Created TensorFlow Lite delegate for GPU.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
Successfully built interpreter

------------------------------------------
TIMING STATS (on 3271 processed frames)
------------------------------------------
Preprocessing Time  -> Total: 56328.69ms, Average: 17.22ms
Inference Time      -> Total: 87041.48ms, Average: 26.61ms
Postprocessing Time -> Total: 2468.84ms, Average: 0.75ms
------------------------------------------
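On the CSV logging question, one low-tech option is a small poller that reads a sysfs value and appends a timestamped row. The sketch below is only an illustration: the function names are hypothetical, the path is supplied by the caller, and the "40 %" text format is assumed from the gpu_busy_percentage output quoted later in this thread.

```c
#include <stdio.h>
#include <time.h>

/* Parses text such as "40 %" into an integer percentage.
 * Returns -1 when no leading integer is found. */
int parse_busy_percentage(const char *text)
{
    int pct = 0;
    if (text == NULL || sscanf(text, "%d", &pct) != 1) {
        return -1;
    }
    return pct;
}

/* Formats one CSV row "unix_time,gpu_busy_percent\n" into out.
 * Returns 0 on success, -1 if the row does not fit. */
int format_csv_row(char *out, size_t out_len, long unix_time, int pct)
{
    int n = snprintf(out, out_len, "%ld,%d\n", unix_time, pct);
    return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}

/* Reads the given file once and appends one row to the CSV stream.
 * Returns 0 on success, -1 on any failure. */
int log_gpu_busy_once(const char *sysfs_path, FILE *csv)
{
    char text[32];
    char row[64];
    int pct;

    FILE *f = fopen(sysfs_path, "r");
    if (f == NULL) {
        return -1;
    }
    if (fgets(text, sizeof(text), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    pct = parse_busy_percentage(text);
    if (pct < 0) {
        return -1;
    }
    if (format_csv_row(row, sizeof(row), (long)time(NULL), pct) != 0) {
        return -1;
    }
    return (fputs(row, csv) == EOF) ? -1 : 0;
}
```

Calling log_gpu_busy_once() in a loop with a sleep between iterations, passing a path such as /sys/class/kgsl/kgsl-3d0/gpu_busy_percentage and a FILE opened in append mode, would produce a two-column CSV. Temperature and CPU columns are left out because their sources are not shown in this thread.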
-
@cegeyer that does seem suspicious
Here's the code for cpu-monitor https://gitlab.com/voxl-public/voxl-sdk/services/voxl-cpu-monitor/-/blob/master/server/voxl-cpu-monitor.c?ref_type=heads
-
@Moderator Thanks. I was able to build the voxl-cpu-monitor with some additional debug. While voxl-tflite-server is processing with multiple detections, I'm getting gpu busy counter values like these for the two float values:
395796.000000 10062344192.000000
When converted to a percent, it's a very small value. If I stop voxl-tflite-server, it does return to 0 usage. I just want to make sure this seems correct where the model inference is being done with the GPU and the usage isn't from preprocessing or something.
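To put a number on "very small": treating the first value as busy and the second as total, the ratio works out to roughly 0.0039%, which rounds to 0.00 when printed with two decimals. The sketch below reproduces that arithmetic. The helper names are hypothetical, and the two-decimal format is an assumption based on the Util (%) column in the voxl-inspect-cpu output.

```c
#include <stdio.h>

/* Hypothetical helper: turns the two gpubusy counters into a percentage.
 * Returns 0 when the total is not positive, to avoid dividing by zero. */
double busy_percent(double busy, double total)
{
    if (total <= 0.0) {
        return 0.0;
    }
    return (busy / total) * 100.0;
}

/* Formats a percentage with two decimals (assumed display format). */
void format_percent(double pct, char *out, size_t out_len)
{
    snprintf(out, out_len, "%.2f", pct);
}
```

With the values above, busy_percent(395796.0, 10062344192.0) is about 0.0039, so the formatted string is "0.00" even though the counter is not zero.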
-
@Moderator I think I might have found the problem. This portion of the code:
This is reading the /sys/class/kgsl/kgsl-3d0/gpubusy file contents into a 15-byte buffer. The problem is that the contents of the file are exactly 15 bytes, so the buffer has no room for a terminating null byte, and when sscanf is called it keeps parsing past the 15th byte into whatever follows in memory.
I updated it to use a 16-byte buffer, zeroing the 16th byte and still reading only 15 bytes, and it now gives a proper GPU utilization percentage, matching what is in the /sys/class/kgsl/kgsl-3d0/gpubusy and /sys/class/kgsl/kgsl-3d0/gpu_busy_percentage files.
Name   Freq (MHz) Temp (C) Util (%)
-----------------------------------
cpu0        691.2     76.8    24.55
cpu1        691.2     76.0    17.78
cpu2        691.2     76.0    16.55
cpu3        691.2     76.8    17.34
cpu4       1286.4     76.8     1.74
cpu5       1286.4     79.9    33.74
cpu6       1286.4     76.8     0.58
cpu7        844.8     77.2     0.00
Total                 77.0    14.04
10s avg                       14.53
-----------------------------------
small cores only              19.06
big cores only                 9.02
-----------------------------------
GPU         587.0     77.6    39.16
GPU 10s avg                   38.52
-----------------------------------
memory temp: 79.2 C
memory used: 2930/7671 MB
-----------------------------------
Flags
CPU freq scaling mode: auto
Standby Not Active
-----------------------------------

$ cat /sys/class/kgsl/kgsl-3d0/gpubusy
418136 1034679
$ cat /sys/class/kgsl/kgsl-3d0/gpu_busy_percentage
40 %

// gets gpu busy value
static float _get_gpu_busy()
{
    fflush(stdout);
    float gpu_busy[2]; // stores busy values
    float gpu_busy_ret = 0;
    char buf[16];
    int fd, ret;

    buf[15] = 0;
    fd = open(SYSTEM_GPU_BUSY_COUNTER, O_RDONLY);
    if(fd<0){
        perror("ERROR failed to open gpu busy counter for reading");
        return 0;
    }
    ret = read(fd, buf, sizeof(buf)-1);
    if(ret<1){
        perror("ERROR failed to read gpu busy counter");
        close(fd);
        return 0;
    }
    sscanf(buf, "%f %f", &gpu_busy[0], &gpu_busy[1]);
    if(en_debug){
        printf("gpu busy: %f %f\n", (double)gpu_busy[0], (double)gpu_busy[1]);
    }
    if (gpu_busy[1] == 0){ // check if gpu_busy[1] is 0 to avoid divide by 0 errors
        close(fd);
        return 0;
    }
    gpu_busy_ret = (gpu_busy[0] / gpu_busy[1])*(float)100;
    close(fd);
    return gpu_busy_ret;
}
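One possible hardening of the fix, offered as a sketch rather than as the voxl-cpu-monitor implementation: the function above writes the terminator at index 15 regardless of how many bytes read() returned, so a shorter read would leave uninitialized bytes between the data and the terminator. Terminating at the returned length and checking the sscanf result avoids depending on the file being exactly 15 bytes. The helper name parse_gpu_busy and the 64-byte scratch buffer are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of voxl-cpu-monitor: parses the
 * "<busy> <total>" text read from the gpubusy file.
 * raw need not be NUL-terminated; len is the byte count read() returned.
 * Returns 0 and stores the percentage in *pct on success, -1 otherwise. */
int parse_gpu_busy(const char *raw, size_t len, float *pct)
{
    char buf[64];
    float busy = 0.0f;
    float total = 0.0f;

    if (raw == NULL || pct == NULL || len == 0 || len >= sizeof(buf)) {
        return -1;
    }

    memcpy(buf, raw, len);
    buf[len] = '\0'; /* terminate at the number of bytes actually read */

    if (sscanf(buf, "%f %f", &busy, &total) != 2) {
        return -1; /* fewer than two numbers were parsed */
    }

    /* A zero total is reported as 0% instead of dividing by zero. */
    *pct = (total == 0.0f) ? 0.0f : (busy / total) * 100.0f;
    return 0;
}
```

In the monitor this would be called as parse_gpu_busy(buf, (size_t)ret, &value) right after a successful read(), with the caller deciding what to report when it returns -1.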