tflite GPU usage

cegeyer

Hello, I'm running the yolov5_float16_quant.tflite model with voxl-tflite-server on a VOXL-2. It picks up the GPU delegate, as shown in the output below.

Questions:

1. While the model runs, voxl-inspect-cpu always shows 0.00 GPU utilization. Is there a way to verify that inference is actually running on the GPU? I am getting around 17ms average preprocessing time and 26ms inference time on hires_small_color; does that sound like normal GPU performance? I also built another YOLOv5 model with a larger input (640x512 vs 320x320), and it still shows 0.00 GPU utilization with the GPU delegate.
2. Where is the source code that writes to the cpu_monitor pipe? I wanted to see how the GPU utilization is calculated.
3. Is there another tool for monitoring CPU/GPU usage and temperature? I want to log to CSV. I can post-process the voxl-inspect-cpu output, but I am checking whether there are other methods.

=================================================================
skip_n_frames:                    0
=================================================================
model:                            /usr/bin/dnn/yolov5_float16_quant.tflite
=================================================================
input_pipe:                       /run/mpa/hires_small_color/
=================================================================
delegate:                         gpu
=================================================================
allow_multiple:                   false
=================================================================
output_pipe_prefix:               mobilenet
=================================================================
INFO: Created TensorFlow Lite delegate for GPU.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
Successfully built interpreter

------------------------------------------
TIMING STATS (on 3271 processed frames)
------------------------------------------
Preprocessing Time  -> Total: 56328.69ms, Average:  17.22ms
Inference Time      -> Total: 87041.48ms, Average:  26.61ms
Postprocessing Time -> Total: 2468.84ms, Average:   0.75ms
------------------------------------------

Moderator

@cegeyer that does seem suspicious

Here's the code for cpu-monitor https://gitlab.com/voxl-public/voxl-sdk/services/voxl-cpu-monitor/-/blob/master/server/voxl-cpu-monitor.c?ref_type=heads

cegeyer

@Moderator Thanks. I was able to build voxl-cpu-monitor with some additional debug output. While voxl-tflite-server is processing frames with multiple detections, I'm getting GPU busy counter values like the ones below for the two floats parsed at this line:

https://gitlab.com/voxl-public/voxl-sdk/services/voxl-cpu-monitor/-/blob/master/server/voxl-cpu-monitor.c?ref_type=heads#L958:

395796.000000 10062344192.000000

Converted to a percentage, that is a very small value. If I stop voxl-tflite-server, usage does return to 0. I just want to confirm that this looks right: that model inference is running on the GPU and the usage isn't coming from preprocessing or something else.

cegeyer

@Moderator I think I might have found the problem. This portion of the code:

https://gitlab.com/voxl-public/voxl-sdk/services/voxl-cpu-monitor/-/blob/master/server/voxl-cpu-monitor.c?ref_type=heads#L958

That code reads the contents of /sys/class/kgsl/kgsl-3d0/gpubusy into a 15-byte buffer. The file contents are exactly 15 bytes, so the read fills the buffer and leaves no room for a terminating NUL. When sscanf is then called, it keeps parsing past the 15th byte into whatever follows in memory, which would explain the inflated second value above.
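
The over-read can be reproduced without undefined behavior by laying out the neighboring memory explicitly. This is a minimal sketch, not code from voxl-cpu-monitor: parse_gpubusy, parse_unterminated and parse_terminated are illustrative names, and the trailing digits "92" are a made-up stand-in for whatever bytes actually follow the buffer on the VOXL-2. A 16-byte buffer with a terminator is included for comparison.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse "<busy> <total>" from a NUL-terminated string.
 * Returns 1 on success, 0 if two numbers were not found. */
static int parse_gpubusy(const char *text, double *busy, double *total)
{
	assert(text != NULL && busy != NULL && total != NULL);
	return sscanf(text, "%lf %lf", busy, total) == 2;
}

/* Simulate the bug: 15 bytes of file content sit directly in front of
 * unrelated bytes (here the made-up digits "92") with no terminator in
 * between, so the parser runs on into them. */
static double parse_unterminated(const char *content)
{
	char memory[32] = {0};
	double busy = 0, total = 0;

	memcpy(memory, content, 15);   /* what read() wrote into the buffer */
	memcpy(memory + 15, "92", 2);  /* hypothetical neighboring bytes */
	parse_gpubusy(memory, &busy, &total);
	return total;
}

/* Comparison: one extra byte that always holds the terminator. */
static double parse_terminated(const char *content)
{
	char buf[16];
	double busy = 0, total = 0;

	memcpy(buf, content, 15);
	buf[15] = '\0';
	parse_gpubusy(buf, &busy, &total);
	return total;
}
```

With the 15-character input " 418136 1034679", parse_unterminated returns 103467992 because the stray digits are appended to the second number, while parse_terminated returns 1034679.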

I updated it to use a 16-byte buffer, zeroing out the 16th byte and still reading only 15 bytes. It now gives a proper GPU utilization percentage that matches the contents of the /sys/class/kgsl/kgsl-3d0/gpubusy and /sys/class/kgsl/kgsl-3d0/gpu_busy_percentage files.

Name   Freq (MHz) Temp (C) Util (%)
-----------------------------------
cpu0        691.2     76.8    24.55
cpu1        691.2     76.0    17.78
cpu2        691.2     76.0    16.55
cpu3        691.2     76.8    17.34
cpu4       1286.4     76.8     1.74
cpu5       1286.4     79.9    33.74
cpu6       1286.4     76.8     0.58
cpu7        844.8     77.2     0.00
Total                 77.0    14.04
10s avg                       14.53
-----------------------------------
small cores only              19.06
big cores only                 9.02
-----------------------------------
GPU         587.0     77.6    39.16
GPU 10s avg                   38.52
-----------------------------------
memory temp:       79.2 C
memory used:  2930/7671 MB
-----------------------------------
Flags
CPU freq scaling mode: auto
Standby Not Active
-----------------------------------

$ cat /sys/class/kgsl/kgsl-3d0/gpubusy             
 418136 1034679

$ cat /sys/class/kgsl/kgsl-3d0/gpu_busy_percentage 
40 %
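
As a cross-check of the numbers above, the percentage is simply the first counter divided by the second. A small sketch of that arithmetic, assuming the two fields are a busy count and a total count as the monitor code treats them (gpu_busy_percent is an illustrative name):

```c
#include <assert.h>

/* Busy percentage from the two gpubusy counters; 0 when total is 0. */
static double gpu_busy_percent(double busy, double total)
{
	assert(busy >= 0 && total >= 0);
	return total > 0 ? busy / total * 100.0 : 0.0;
}
```

For the reading " 418136 1034679" this gives about 40.4, in line with the 40 % from gpu_busy_percentage. For the earlier mis-parsed pair, 395796 and 10062344192, it gives about 0.004, which is displayed as 0.00.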

// gets gpu busy value
static float _get_gpu_busy()
{
	fflush(stdout);
	float gpu_busy[2]; // stores busy values
	float gpu_busy_ret = 0;
	
	char buf[16];
	int fd, ret;

	buf[15] = 0;

	fd = open(SYSTEM_GPU_BUSY_COUNTER, O_RDONLY);
	if(fd<0){
		perror("ERROR failed to open gpu busy counter for reading");
		return 0;
	}

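	ret = read(fd, buf, sizeof(buf)-1);
	if(ret<1){
		perror("ERROR failed to read gpu busy counter");
		close(fd);
		return 0;
	}
	buf[ret] = 0; // terminate right after the bytes actually read
	if(sscanf(buf, "%f %f", &gpu_busy[0], &gpu_busy[1]) != 2){
		fprintf(stderr, "ERROR failed to parse gpu busy counter\n");
		close(fd);
		return 0;
	}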

	if(en_debug){
		printf("gpu busy: %f %f\n", (double)gpu_busy[0], (double)gpu_busy[1]);
	}

	if (gpu_busy[1] == 0){ // check if gpu_busy[1] is 0 to avoid divide by 0 errors
		close(fd);
		return 0;
	}
	gpu_busy_ret = (gpu_busy[0] / gpu_busy[1])*(float)100;

	close(fd);
	return gpu_busy_ret;
}
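
For the CSV logging question, a more defensive variant of the same idea can be sketched with stdio: fgets terminates the buffer for any file length, and the sscanf result is checked before use. This is only a sketch under assumptions: read_gpu_busy_percent and format_csv_row are hypothetical helpers, the 64-byte buffer size is a guess at "comfortably large", and the file path is passed in by the caller (for example /sys/class/kgsl/kgsl-3d0/gpubusy).

```c
#include <assert.h>
#include <stdio.h>

/* Read "<busy> <total>" from a gpubusy-style file and return the busy
 * percentage, or -1.0 if the file cannot be opened or parsed. */
static double read_gpu_busy_percent(const char *path)
{
	char buf[64];                 /* assumed ample for two counters */
	double busy, total;
	FILE *f;

	assert(path != NULL);
	f = fopen(path, "r");
	if (f == NULL) return -1.0;

	/* fgets always NUL-terminates, whatever the file length */
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1.0;
	}
	fclose(f);

	if (sscanf(buf, "%lf %lf", &busy, &total) != 2) return -1.0;
	if (total <= 0) return 0.0;   /* avoid dividing by zero */
	return busy / total * 100.0;
}

/* Format one CSV row "timestamp_s,gpu_busy_percent" into out.
 * Returns the number of characters snprintf wanted to write. */
static int format_csv_row(char *out, size_t len, long timestamp_s, double pct)
{
	return snprintf(out, len, "%ld,%.2f\n", timestamp_s, pct);
}
```

A logging loop would call read_gpu_busy_percent once per sample, skip negative results, and append the row from format_csv_row to a file opened with fopen(..., "a"). Temperature is left out because the relevant sysfs path is not given in this thread.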