ModalAI Forum

    tflite GPU usage

    • cegeyer

      Hello, I'm running the yolov5_float16_quant.tflite model with voxl-tflite-server on a VOXL-2. It selects the GPU delegate, as shown in the output below.

      Questions:

      1. While I watch voxl-inspect-cpu, the GPU utilization always reads 0.00. Is there a way to verify that inference is actually running on the GPU? I am getting an average of about 17 ms preprocessing time and 26 ms inference time on hires_small_color; does that sound like normal GPU performance? I also built another YOLOv5 model with a larger input (640x512 instead of 320x320), and it still shows 0.00 GPU utilization with the GPU delegate.

      2. Where is the source code that writes to the cpu_monitor pipe? I wanted to see how the GPU utilization is calculated.

      3. Is there another tool for monitoring CPU/GPU usage and temperature? I want to log to CSV. I can post-process the voxl-inspect-cpu output, but wanted to check whether there are other methods.

      =================================================================
      skip_n_frames:                    0
      =================================================================
      model:                            /usr/bin/dnn/yolov5_float16_quant.tflite
      =================================================================
      input_pipe:                       /run/mpa/hires_small_color/
      =================================================================
      delegate:                         gpu
      =================================================================
      allow_multiple:                   false
      =================================================================
      output_pipe_prefix:               mobilenet
      =================================================================
      INFO: Created TensorFlow Lite delegate for GPU.
      INFO: Initialized OpenCL-based API.
      INFO: Created 1 GPU delegate kernels.
      Successfully built interpreter
      
      ------------------------------------------
      TIMING STATS (on 3271 processed frames)
      ------------------------------------------
      Preprocessing Time  -> Total: 56328.69ms, Average:  17.22ms
      Inference Time      -> Total: 87041.48ms, Average:  26.61ms
      Postprocessing Time -> Total: 2468.84ms, Average:   0.75ms
      ------------------------------------------
      
      • Moderator, ModalAI Team, in reply to @cegeyer

        @cegeyer that does seem suspicious

        Here's the code for cpu-monitor https://gitlab.com/voxl-public/voxl-sdk/services/voxl-cpu-monitor/-/blob/master/server/voxl-cpu-monitor.c?ref_type=heads

        • cegeyer, in reply to @Moderator

          @Moderator Thanks. I was able to build voxl-cpu-monitor with some additional debug output. While voxl-tflite-server is processing frames with multiple detections, I'm getting GPU busy counter values like these for the two floats parsed at this line:

          https://gitlab.com/voxl-public/voxl-sdk/services/voxl-cpu-monitor/-/blob/master/server/voxl-cpu-monitor.c?ref_type=heads#L958:

          395796.000000 10062344192.000000

          When converted to a percentage, this is a very small value. If I stop voxl-tflite-server, the usage does return to 0. I just want to confirm that the model inference is running on the GPU and that this usage isn't coming from preprocessing or something else.
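
          To make the arithmetic concrete, here is a minimal sketch of the busy/total conversion applied to the two values above. The helper name `gpu_busy_percent` is hypothetical and is not taken from voxl-cpu-monitor; it only mirrors the ratio described in this thread.

```c
/* Hypothetical helper: busy time as a percentage of total time.
 * Returns 0 when the total is 0 to avoid dividing by zero. */
static double gpu_busy_percent(double busy, double total)
{
	if (total == 0.0) {
		return 0.0;
	}
	return busy / total * 100.0;
}
```

          With the values from this post, `gpu_busy_percent(395796.0, 10062344192.0)` is roughly 0.004, which rounds to 0.00 when shown with two decimal places.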

          • cegeyer, in reply to @Moderator

            @Moderator I think I might have found the problem. This portion of the code:

            https://gitlab.com/voxl-public/voxl-sdk/services/voxl-cpu-monitor/-/blob/master/server/voxl-cpu-monitor.c?ref_type=heads#L958

            This reads the contents of the /sys/class/kgsl/kgsl-3d0/gpubusy file into a 15-byte buffer. The problem is that the file contents are exactly 15 bytes, so the buffer has no room for a null terminator, and sscanf keeps reading whatever is in memory past the 15th byte.
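
            The effect can be reproduced safely with an ordinary string. This is only an illustrative sketch: `parse_gpubusy` is a hypothetical helper, and the trailing digits used in the usage note are made up to stand in for whatever happened to follow the buffer in memory.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse "busy total" from at most the first n bytes
 * of src. The bytes are copied into a local buffer that always ends in a
 * null terminator, so sscanf cannot run past the copied data.
 * Returns 0 on success, -1 if two numbers could not be parsed. */
static int parse_gpubusy(const char *src, size_t n, double *busy, double *total)
{
	char buf[16];

	if (n > sizeof(buf) - 1) {
		n = sizeof(buf) - 1; /* keep one byte for the terminator */
	}
	memcpy(buf, src, n);
	buf[n] = '\0';

	return sscanf(buf, "%lf %lf", busy, total) == 2 ? 0 : -1;
}
```

            For example, take the 15 bytes ` 418136 1034679` followed by the made-up leftover digits `4192`. Calling sscanf on that whole string yields a total of 10346794192, while `parse_gpubusy(mem, 15, &busy, &total)` stops at the 15th byte and yields 1034679.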

            I changed it to a 16-byte buffer, zeroed the 16th byte, and still read only 15 bytes. It now reports a proper GPU utilization percentage that matches the /sys/class/kgsl/kgsl-3d0/gpubusy and /sys/class/kgsl/kgsl-3d0/gpu_busy_percentage files.

            Name   Freq (MHz) Temp (C) Util (%)
            -----------------------------------
            cpu0        691.2     76.8    24.55
            cpu1        691.2     76.0    17.78
            cpu2        691.2     76.0    16.55
            cpu3        691.2     76.8    17.34
            cpu4       1286.4     76.8     1.74
            cpu5       1286.4     79.9    33.74
            cpu6       1286.4     76.8     0.58
            cpu7        844.8     77.2     0.00
            Total                 77.0    14.04
            10s avg                       14.53
            -----------------------------------
            small cores only              19.06
            big cores only                 9.02
            -----------------------------------
            GPU         587.0     77.6    39.16
            GPU 10s avg                   38.52
            -----------------------------------
            memory temp:       79.2 C
            memory used:  2930/7671 MB
            -----------------------------------
            Flags
            CPU freq scaling mode: auto
            Standby Not Active
            -----------------------------------
            
            $ cat /sys/class/kgsl/kgsl-3d0/gpubusy             
             418136 1034679
            
            $ cat /sys/class/kgsl/kgsl-3d0/gpu_busy_percentage 
            40 %
            
            // gets gpu busy value as a percentage
            static float _get_gpu_busy()
            {
            	fflush(stdout);
            	float gpu_busy[2] = {0, 0}; // stores busy values: busy time, total time
            	float gpu_busy_ret = 0;
            	
            	// 15 bytes of file contents plus a null terminator; zeroing the
            	// whole buffer keeps it terminated even after a short read
            	char buf[16] = {0};
            	int fd, ret;
            
            	fd = open(SYSTEM_GPU_BUSY_COUNTER, O_RDONLY);
            	if(fd<0){
            		perror("ERROR failed to open gpu busy counter for reading");
            		return 0;
            	}
            
            	// read at most 15 bytes so the last byte stays 0 for sscanf
            	ret = read(fd, buf, sizeof(buf)-1);
            	if(ret<1){
            		perror("ERROR failed to read gpu busy counter");
            		close(fd);
            		return 0;
            	}
            	close(fd);
            
            	// bail out if both values could not be parsed
            	if(sscanf(buf, "%f %f", &gpu_busy[0], &gpu_busy[1]) != 2){
            		fprintf(stderr, "ERROR failed to parse gpu busy counter\n");
            		return 0;
            	}
            
            	if(en_debug){
            		printf("gpu busy: %f %f\n", (double)gpu_busy[0], (double)gpu_busy[1]);
            	}
            
            	if (gpu_busy[1] == 0){ // check if gpu_busy[1] is 0 to avoid divide by 0 errors
            		return 0;
            	}
            	gpu_busy_ret = (gpu_busy[0] / gpu_busy[1])*(float)100;
            
            	return gpu_busy_ret;
            }
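
            On the CSV logging question from the first post, one possible approach is to poll the gpu_busy_percentage file shown above and append rows to a file. This is a rough sketch, not part of voxl-cpu-monitor: the helper names `read_leading_int` and `log_gpu_sample` are hypothetical, and it logs only the GPU busy percentage because no temperature path is given in this thread.

```c
#include <stdio.h>
#include <time.h>

/* Path taken from the cat output earlier in this thread. */
#define GPU_BUSY_PERCENT_PATH "/sys/class/kgsl/kgsl-3d0/gpu_busy_percentage"

/* Hypothetical helper: read the leading integer from a small text file
 * such as one containing "40 %". Returns 0 on success, -1 on failure. */
static int read_leading_int(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	if (f == NULL) {
		return -1;
	}
	int ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Hypothetical helper: append one "unix_time,gpu_busy_percent" row to an
 * already open CSV stream. Returns 0 on success, -1 on failure. */
static int log_gpu_sample(FILE *csv, const char *path)
{
	int percent;

	if (read_leading_int(path, &percent) != 0) {
		return -1;
	}
	if (fprintf(csv, "%ld,%d\n", (long)time(NULL), percent) < 0) {
		return -1;
	}
	return fflush(csv) == 0 ? 0 : -1;
}
```

            A caller would open the output file, write a header line such as `time,gpu_busy_percent`, and then call `log_gpu_sample(csv, GPU_BUSY_PERCENT_PATH)` at a fixed interval, for example once per second.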
            