Enabling Hires Camera for AprilTag Detection

Madelyne Rossmann

I am using the VOXL 2 Flight Deck (Config: 2.4/5ghz WiFi, SKU: MDK-F0006-2-03) and am interested in utilizing the included hires sensor (IMX214) for AprilTag detection and localization. It seems that I could enable this in the /etc/modalai/voxl-tag-detector.conf file, but this requires having a calibration file available. I haven't had success in using the voxl-calibrate-camera package to calibrate the hires, but I believe there is a workaround using ROS 2 that I can try. For reference, this was the command I used for attempting to calibrate the hires:

voxl2:~$ voxl-calibrate-camera hires -s 6x8 -l 0.0223
Waiting for valid pipe...

It gets stuck here and does not open a calibration stream on the VOXL portal. I can confirm that the hires sensor does stream to the portal and QGC, so it is being recognized by the system. The AprilTag detection package does work with the tracking camera, but I need to run it with a more powerful sensor for my intended use case.

Here is a printout of my system information:

voxl2:~$ voxl-version
--------------------------------------------------------------------------------
system-image: 1.7.10-M0054-14.1a-perf
kernel:   	#1 SMP PREEMPT Fri Sep 27 21:18:59 UTC 2024 4.19.125
--------------------------------------------------------------------------------
hw platform:  M0054
mach.var: 	1.0.0
--------------------------------------------------------------------------------
voxl-suite:   1.3.5
--------------------------------------------------------------------------------
Packages:
Repo:  http://voxl-packages.modalai.com/ ./dists/qrb5165/sdk-1.3/binary-arm64/
Last Updated: 2025-01-08 16:56:37
List:
kernel-module-voxl-fsync-mod-4.19.125 	1.0-r0
kernel-module-voxl-gpio-mod-4.19.125  	1.0-r0
kernel-module-voxl-platform-mod-4.19.125  1.0-r0
libfc-sensor                          	1.0.7
libmodal-cv                           	0.5.11
libmodal-exposure                     	0.1.1
libmodal-journal                      	0.2.2
libmodal-json                         	0.4.3
libmodal-pipe                         	2.10.2
libqrb5165-io                         	0.4.6
libvoxl-cci-direct                    	0.2.3
libvoxl-cutils                        	0.1.1
modalai-slpi                          	1.1.19
mv-voxl                               	0.1-r0
qrb5165-bind                          	0.1-r0
qrb5165-dfs-server                    	0.2.0
qrb5165-imu-server                    	1.0.1
qrb5165-rangefinder-server            	0.1.3
qrb5165-slpi-test-sig                 	01-r0
qrb5165-system-tweaks                 	0.3.0
qrb5165-tflite                        	2.8.0-2
voxl-bind-spektrum                    	0.1.1
voxl-camera-calibration               	0.5.7
voxl-camera-server                    	2.0.1
voxl-ceres-solver                     	2:1.14.0-10
voxl-configurator                     	0.8.5
voxl-cpu-monitor                      	0.4.8
voxl-docker-support                   	1.3.1
voxl-elrs                             	0.2.2
voxl-esc                              	1.4.7
voxl-feature-tracker                  	0.4.1
voxl-flow-server                      	0.3.6
voxl-fsync-mod                        	1.0-r0
voxl-gphoto2-server                   	0.0.10
voxl-gpio-mod                         	1.0-r0
voxl-jpeg-turbo                       	2.1.3-5
voxl-lepton-server                    	1.2.3
voxl-lepton-tracker                   	0.0.2
voxl-libgphoto2                       	0.0.4
voxl-libuvc                           	1.0.7
voxl-logger                           	0.4.7
voxl-mavcam-manager                   	0.5.7
voxl-mavlink                          	0.1.1
voxl-mavlink-server                   	1.4.2
voxl-microdds-agent                   	2.4.1-0
voxl-modem                            	1.1.3
voxl-mongoose                         	7.7.0-1
voxl-mpa-to-ros                       	0.3.9
voxl-mpa-to-ros2                      	0.0.2
voxl-mpa-tools                        	1.2.3
voxl-neopixel-manager                 	0.0.3
voxl-open-vins                        	0.4.14
voxl-open-vins-server                 	0.2.75
voxl-opencv                           	4.5.5-2
voxl-osd                              	0.0.1
voxl-platform-mod                     	1.0-r0
voxl-portal                           	0.7.0
voxl-px4                              	1.14.0-2.0.84
voxl-px4-imu-server                   	0.1.2
voxl-px4-params                       	0.5.3
voxl-qvio-server                      	1.0.4
voxl-remote-id                        	0.0.9
voxl-reset-slpi                       	0.0.1
voxl-ros2-foxy                        	0.0.1
voxl-state-estimator                  	0.0.2
voxl-streamer                         	0.7.4
voxl-suite                            	1.3.5
voxl-tag-detector                     	0.0.4
voxl-tflite-server                    	0.3.4
voxl-utils                            	1.4.3
voxl-uvc-server                       	0.1.7
voxl-vision-hub                       	1.8.9
voxl-vtx                              	1.1.0
voxl2-io                              	0.0.3
voxl2-system-image                    	1.7.10-r0
voxl2-wlan                            	1.0-r0
--------------------------------------------------------------------------------

Here is my /etc/modalai/voxl-camera-server.conf file:

/**
 * voxl-camera-server Configuration File
 *
 * Each camera has configurations for up to 4 HAL3 streams:
 *	- `preview` stream for raw unprocessed images from CV cameras
 *	- `small_video` 720p (ish) h264/h265 compressed for fpv video streaming
 *	- `large_video` 4k (ish) h264/h265 for onboard video recording to disk
 *	- `snapshot` ISP-processed JPG snapshots that get saved to disk
 *
 * on QRB5165 platforms (VOXL2 and VOXL2 mini) you can only have 3 of the 4 enabled
 *
 * This file is generated from default values by voxl-configure-cameras.
 * Do not expect arbitrary resolutions to work, the ISP and video compression
 * pipelines only support very specific resolutions.
 *
 * The default video compression mode is cqp or Constant Quantization Parameter
 *
 *
 *
 */
{
	"version":	0.1,
	"fsync_en":	false,
	"fsync_gpio":	109,
	"cameras":	[{
        	"type":	"ov7251",
        	"name":	"stereo_front",
        	"enabled":	true,
        	"camera_id":	0,
        	"camera_id_second":	1,
        	"independent_exposure":	false,
        	"fps":	30,
        	"en_rotate":	true,
        	"en_rotate_second":	true,
        	"en_preview":	true,
        	"preview_width":	640,
        	"preview_height":	480,
        	"en_raw_preview":	true,
        	"ae_mode":	"lme_msv",
        	"ae_desired_msv":	60,
        	"exposure_min_us":	20,
        	"exposure_max_us":	33000,
        	"gain_min":	54,
        	"gain_max":	8000,
        	"exposure_soft_min_us":	5000,
        	"ae_filter_alpha":	0.600000023841858,
        	"ae_ignore_fraction":	0.20000000298023224,
        	"ae_slope":	0.05000000074505806,
        	"ae_exposure_period":	1,
        	"ae_gain_period":	1
    	}, {
        	"type":	"ov7251",
        	"name":	"tracking",
        	"enabled":	true,
        	"camera_id":	2,
        	"fps":	30,
        	"en_rotate":	false,
        	"en_preview":	true,
        	"preview_width":	640,
        	"preview_height":	480,
        	"en_raw_preview":	true,
        	"ae_mode":	"lme_msv",
        	"ae_desired_msv":	60,
        	"exposure_min_us":	20,
        	"exposure_max_us":	33000,
        	"gain_min":	54,
        	"gain_max":	8000,
        	"exposure_soft_min_us":	5000,
        	"ae_filter_alpha":	0.600000023841858,
        	"ae_ignore_fraction":	0.20000000298023224,
        	"ae_slope":	0.05000000074505806,
        	"ae_exposure_period":	1,
        	"ae_gain_period":	1
    	}, {
        	"type":	"imx214",
        	"name":	"hires",
        	"enabled":	true,
        	"camera_id":	3,
        	"fps":	30,
        	"en_preview":	false,
        	"preview_width":	640,
        	"preview_height":	480,
        	"en_raw_preview":	false,
        	"en_small_video":	true,
        	"small_video_width":	1024,
        	"small_video_height":	768,
        	"small_venc_mode":	"h265",
        	"small_venc_br_ctrl":	"cqp",
        	"small_venc_Qfixed":	30,
        	"small_venc_Qmin":	15,
        	"small_venc_Qmax":	40,
        	"small_venc_nPframes":	9,
        	"small_venc_mbps":	2,
        	"en_large_video":	true,
        	"large_video_width":	4208,
        	"large_video_height":	3120,
        	"large_venc_mode":	"h265",
        	"large_venc_br_ctrl":	"cqp",
        	"large_venc_Qfixed":	38,
        	"large_venc_Qmin":	15,
        	"large_venc_Qmax":	50,
        	"large_venc_nPframes":	29,
        	"large_venc_mbps":	30,
        	"en_snapshot":	true,
        	"en_snapshot_width":	4208,
        	"en_snapshot_height":	3120,
        	"exif_focal_length":	0,
        	"exif_focal_length_in_35mm_format":	0,
        	"exif_fnumber":	0,
        	"ae_mode":	"isp"
    	}, {
        	"type":	"ov7251",
        	"name":	"stereo_rear",
        	"enabled":	true,
        	"camera_id":	5,
        	"camera_id_second":	4,
        	"independent_exposure":	false,
        	"fps":	30,
        	"en_rotate":	false,
        	"en_rotate_second":	false,
        	"en_preview":	true,
        	"preview_width":	640,
        	"preview_height":	480,
        	"en_raw_preview":	true,
        	"ae_mode":	"lme_msv",
        	"ae_desired_msv":	60,
        	"exposure_min_us":	20,
        	"exposure_max_us":	33000,
        	"gain_min":	54,
        	"gain_max":	8000,
        	"exposure_soft_min_us":	5000,
        	"ae_filter_alpha":	0.600000023841858,
        	"ae_ignore_fraction":	0.20000000298023224,
        	"ae_slope":	0.05000000074505806,
        	"ae_exposure_period":	1,
        	"ae_gain_period":	1
    	}]
}

Additionally, here is my /etc/modalai/voxl-tag-detector.conf file contents:

/**
 * This file contains configuration parameters for voxl-tag-detector.
 * You can specify up to 3 cameras to do detection on simultaneously.
 * For the stereo camera pair, only the left camera is used.
 */
{
	"detector_0":	{
    	"enable":	true,
    	"input_pipe":	"tracking",
    	"en_fast_mode":	true,
    	"n_threads":	1,
    	"en_undistortion":	true,
    	"undistort_scale":	0.600000023841858,
    	"overlay_name":	"tracking_tag_overlay",
    	"lens_cal_file":	"/data/modalai/opencv_tracking_intrinsics.yml",
    	"skip_n_frames":	5
	},
	"detector_1":	{
    	"enable":	false,
    	"input_pipe":	"stereo",
    	"en_fast_mode":	true,
    	"n_threads":	1,
    	"en_undistortion":	true,
    	"undistort_scale":	0.899999976158142,
    	"overlay_name":	"stereo_tag_overlay",
    	"lens_cal_file":	"/data/modalai/opencv_stereo_intrinsics.yml",
    	"skip_n_frames":	5
	},
	"detector_2":	{
    	"enable":	false,
    	"input_pipe":	"extra",
    	"en_fast_mode":	true,
    	"n_threads":	1,
    	"en_undistortion":	false,
    	"undistort_scale":	1,
    	"overlay_name":	"extra_tag_overlay",
    	"lens_cal_file":	"/data/modalai/opencv_extra_intrinsics.yml",
    	"skip_n_frames":	5
	}
}

If able to utilize the hires sensor for AprilTag detection, I do plan to modify its position and orientation on the flight deck to more closely resemble that of the tracking camera’s. It seems that would necessitate an update to the /etc/modalai/extrinsics.conf file. Are there any other files that would need to be modified to reflect a change in position/orientation of the sensor?

Please let me know if any more information about my setup is needed and I am happy to provide it. Any help is greatly appreciated, thank you!

Alex Kushleyev

@Madelyne-Rossmann, what you are doing should be possible.

the first thing, for calibrating the hires camera, you will need to run the calibration and provide the exact camera stream name (and make sure it is available). In your case, the "preview" stream is disabled in camera config, so you do not have a stream that is called just "hires". However, you have enabled the large encoded stream, so along with h265 stream, you will also have a YUV stream (hires_large_color) and monochrome stream (hires_large_grey). Since for calibration and tag detection you do not need color, it is best to use the monochrome stream to reduce cpu usage for processing large images.

You can double check using voxl-list-pipes to see which camera streams are available and you can view the stream using voxl-portal, but due to the high resolution of the image, voxl-portal will not be able to display the stream at full fps and.. you can use it for quick testing of the stream.

So, in your case, the command for calibrating would be:

voxl-calibrate-camera hires_large_grey -s 6x8 -l 0.0223

Since you need the highest possible resolution, IMX214 has max resolution of 4208x3120, which you already have selected for your large encoded stream, so that's fine.

However in the past, I have had issues calibrating hires cameras at full resolution because it was too slow processing such large images. I just tried running calibration on a similar resolution and the calibrator was running at only 2 FPS. I have previously worked around this issue by calibrating the camera at half resolution, however 2104x1560 is not a standard resolution ( you can check by running voxl-camera-server -l and look for resolutions listed under ANDROID_SCALER_AVAILABLE_STREAM_CONFIGURATIONS.

If calibrating using downscaled images, you would want to have the down scaling to be exact (say factor of 2), so that you know that you can simply scale the calibration results (focal length and principal points) by a factor of 2.

One option: you can pick a scaled down supported resolution, such as 2048 x 1536 instead of your full frame 4208x3120 (set "large_video_width": 2048, "large_video_height": 1536). The downscaling and cropping will be done by the ISP on VOXL, so it will not use any CPU. Then you can just run the calibration using this resolution and also try the detection using the same resolution.

Please keep in mind if you are changing resolutions, you may have to re-calibrate because depending on exact resolution, there may be cropping / scaling happening and it is not always clear exactly what the ISP does, so the rule of thumb is.. if resolution changes, recalibrate, unless you are 100% sure that the new resolution is a scaled version of the original resolution and you can just scale the calibration params accordingly.

For calibration purposes you can reduce your camera fps to 10, so that the images are not back up in the camera calibrator. After calibration, you can restore the original 30 fps.

You will also need to update your tag config file to use hires_large_grey image stream and make sure you point it to the correct intrinsics file that you generated.

Please try and let me know if you get stuck on any steps. After you get the scaled down resolution to work (2048x1536), i can help you with enabling the full resolution (but not sure if tag detector can keep up at full fps).

Alex

Madelyne Rossmann

@Alex-Kushleyev Thank you! That worked! I ended up using hires_small_grey since the large stream would not calibrate for me, and the reduced-size did not seem to cause any issues with detection in my code. I was able to confirm tag detection using the hires camera after calibration and all is good. I appreciate the help!

Alex Kushleyev

@Madelyne-Rossmann , sounds good! We will work on enabling larger resolutions for calibration and tag detection. It is a bit of a hassle due to very large frames, so some optimizations need to be made.

Alex