Enabling Hires Camera for AprilTag Detection
-
I am using the VOXL 2 Flight Deck (Config: 2.4/5ghz WiFi, SKU: MDK-F0006-2-03) and am interested in utilizing the included hires sensor (IMX214) for AprilTag detection and localization. It seems that I could enable this in the
/etc/modalai/voxl-tag-detector.conf
file, but this requires having a calibration file available. I haven't had success in using thevoxl-calibrate-camera
package to calibrate the hires, but I believe there is a workaround using ROS 2 that I can try. For reference, this was the command I used for attempting to calibrate the hires:voxl2:~$ voxl-calibrate-camera hires -s 6x8 -l 0.0223 Waiting for valid pipe...
It gets stuck here and does not open a calibration stream on the VOXL portal. I can confirm that the hires sensor does stream to the portal and QGC, so it is being recognized by the system. The AprilTag detection package does work with the tracking camera, but I need to run it with a more powerful sensor for my intended use case.
Here is a printout of my system information:
voxl2:~$ voxl-version -------------------------------------------------------------------------------- system-image: 1.7.10-M0054-14.1a-perf kernel: #1 SMP PREEMPT Fri Sep 27 21:18:59 UTC 2024 4.19.125 -------------------------------------------------------------------------------- hw platform: M0054 mach.var: 1.0.0 -------------------------------------------------------------------------------- voxl-suite: 1.3.5 -------------------------------------------------------------------------------- Packages: Repo: http://voxl-packages.modalai.com/ ./dists/qrb5165/sdk-1.3/binary-arm64/ Last Updated: 2025-01-08 16:56:37 List: kernel-module-voxl-fsync-mod-4.19.125 1.0-r0 kernel-module-voxl-gpio-mod-4.19.125 1.0-r0 kernel-module-voxl-platform-mod-4.19.125 1.0-r0 libfc-sensor 1.0.7 libmodal-cv 0.5.11 libmodal-exposure 0.1.1 libmodal-journal 0.2.2 libmodal-json 0.4.3 libmodal-pipe 2.10.2 libqrb5165-io 0.4.6 libvoxl-cci-direct 0.2.3 libvoxl-cutils 0.1.1 modalai-slpi 1.1.19 mv-voxl 0.1-r0 qrb5165-bind 0.1-r0 qrb5165-dfs-server 0.2.0 qrb5165-imu-server 1.0.1 qrb5165-rangefinder-server 0.1.3 qrb5165-slpi-test-sig 01-r0 qrb5165-system-tweaks 0.3.0 qrb5165-tflite 2.8.0-2 voxl-bind-spektrum 0.1.1 voxl-camera-calibration 0.5.7 voxl-camera-server 2.0.1 voxl-ceres-solver 2:1.14.0-10 voxl-configurator 0.8.5 voxl-cpu-monitor 0.4.8 voxl-docker-support 1.3.1 voxl-elrs 0.2.2 voxl-esc 1.4.7 voxl-feature-tracker 0.4.1 voxl-flow-server 0.3.6 voxl-fsync-mod 1.0-r0 voxl-gphoto2-server 0.0.10 voxl-gpio-mod 1.0-r0 voxl-jpeg-turbo 2.1.3-5 voxl-lepton-server 1.2.3 voxl-lepton-tracker 0.0.2 voxl-libgphoto2 0.0.4 voxl-libuvc 1.0.7 voxl-logger 0.4.7 voxl-mavcam-manager 0.5.7 voxl-mavlink 0.1.1 voxl-mavlink-server 1.4.2 voxl-microdds-agent 2.4.1-0 voxl-modem 1.1.3 voxl-mongoose 7.7.0-1 voxl-mpa-to-ros 0.3.9 voxl-mpa-to-ros2 0.0.2 voxl-mpa-tools 1.2.3 voxl-neopixel-manager 0.0.3 voxl-open-vins 0.4.14 voxl-open-vins-server 0.2.75 voxl-opencv 4.5.5-2 voxl-osd 0.0.1 voxl-platform-mod 1.0-r0 voxl-portal 0.7.0 voxl-px4 1.14.0-2.0.84 voxl-px4-imu-server 0.1.2 voxl-px4-params 0.5.3 voxl-qvio-server 1.0.4 voxl-remote-id 0.0.9 voxl-reset-slpi 0.0.1 voxl-ros2-foxy 0.0.1 voxl-state-estimator 0.0.2 voxl-streamer 0.7.4 voxl-suite 1.3.5 voxl-tag-detector 0.0.4 voxl-tflite-server 0.3.4 voxl-utils 1.4.3 voxl-uvc-server 0.1.7 voxl-vision-hub 1.8.9 voxl-vtx 1.1.0 voxl2-io 0.0.3 voxl2-system-image 1.7.10-r0 voxl2-wlan 1.0-r0 --------------------------------------------------------------------------------
Here is my
/etc/modalai/voxl-camera-server.conf
file:/** * voxl-camera-server Configuration File * * Each camera has configurations for up to 4 HAL3 streams: * - `preview` stream for raw unprocessed images from CV cameras * - `small_video` 720p (ish) h264/h265 compressed for fpv video streaming * - `large_video` 4k (ish) h264/h265 for onboard video recording to disk * - `snapshot` ISP-processed JPG snapshots that get saved to disk * * on QRB5165 platforms (VOXL2 and VOXL2 mini) you can only have 3 of the 4 enabled * * This file is generated from default values by voxl-configure-cameras. * Do not expect arbitrary resolutions to work, the ISP and video compression * pipelines only support very specific resolutions. * * The default video compression mode is cqp or Constant Quantization Parameter * * * */ { "version": 0.1, "fsync_en": false, "fsync_gpio": 109, "cameras": [{ "type": "ov7251", "name": "stereo_front", "enabled": true, "camera_id": 0, "camera_id_second": 1, "independent_exposure": false, "fps": 30, "en_rotate": true, "en_rotate_second": true, "en_preview": true, "preview_width": 640, "preview_height": 480, "en_raw_preview": true, "ae_mode": "lme_msv", "ae_desired_msv": 60, "exposure_min_us": 20, "exposure_max_us": 33000, "gain_min": 54, "gain_max": 8000, "exposure_soft_min_us": 5000, "ae_filter_alpha": 0.600000023841858, "ae_ignore_fraction": 0.20000000298023224, "ae_slope": 0.05000000074505806, "ae_exposure_period": 1, "ae_gain_period": 1 }, { "type": "ov7251", "name": "tracking", "enabled": true, "camera_id": 2, "fps": 30, "en_rotate": false, "en_preview": true, "preview_width": 640, "preview_height": 480, "en_raw_preview": true, "ae_mode": "lme_msv", "ae_desired_msv": 60, "exposure_min_us": 20, "exposure_max_us": 33000, "gain_min": 54, "gain_max": 8000, "exposure_soft_min_us": 5000, "ae_filter_alpha": 0.600000023841858, "ae_ignore_fraction": 0.20000000298023224, "ae_slope": 0.05000000074505806, "ae_exposure_period": 1, "ae_gain_period": 1 }, { "type": "imx214", "name": "hires", "enabled": true, "camera_id": 3, "fps": 30, "en_preview": false, "preview_width": 640, "preview_height": 480, "en_raw_preview": false, "en_small_video": true, "small_video_width": 1024, "small_video_height": 768, "small_venc_mode": "h265", "small_venc_br_ctrl": "cqp", "small_venc_Qfixed": 30, "small_venc_Qmin": 15, "small_venc_Qmax": 40, "small_venc_nPframes": 9, "small_venc_mbps": 2, "en_large_video": true, "large_video_width": 4208, "large_video_height": 3120, "large_venc_mode": "h265", "large_venc_br_ctrl": "cqp", "large_venc_Qfixed": 38, "large_venc_Qmin": 15, "large_venc_Qmax": 50, "large_venc_nPframes": 29, "large_venc_mbps": 30, "en_snapshot": true, "en_snapshot_width": 4208, "en_snapshot_height": 3120, "exif_focal_length": 0, "exif_focal_length_in_35mm_format": 0, "exif_fnumber": 0, "ae_mode": "isp" }, { "type": "ov7251", "name": "stereo_rear", "enabled": true, "camera_id": 5, "camera_id_second": 4, "independent_exposure": false, "fps": 30, "en_rotate": false, "en_rotate_second": false, "en_preview": true, "preview_width": 640, "preview_height": 480, "en_raw_preview": true, "ae_mode": "lme_msv", "ae_desired_msv": 60, "exposure_min_us": 20, "exposure_max_us": 33000, "gain_min": 54, "gain_max": 8000, "exposure_soft_min_us": 5000, "ae_filter_alpha": 0.600000023841858, "ae_ignore_fraction": 0.20000000298023224, "ae_slope": 0.05000000074505806, "ae_exposure_period": 1, "ae_gain_period": 1 }] }
Additionally, here is my
/etc/modalai/voxl-tag-detector.conf
file contents:/** * This file contains configuration parameters for voxl-tag-detector. * You can specify up to 3 cameras to do detection on simultaneously. * For the stereo camera pair, only the left camera is used. */ { "detector_0": { "enable": true, "input_pipe": "tracking", "en_fast_mode": true, "n_threads": 1, "en_undistortion": true, "undistort_scale": 0.600000023841858, "overlay_name": "tracking_tag_overlay", "lens_cal_file": "/data/modalai/opencv_tracking_intrinsics.yml", "skip_n_frames": 5 }, "detector_1": { "enable": false, "input_pipe": "stereo", "en_fast_mode": true, "n_threads": 1, "en_undistortion": true, "undistort_scale": 0.899999976158142, "overlay_name": "stereo_tag_overlay", "lens_cal_file": "/data/modalai/opencv_stereo_intrinsics.yml", "skip_n_frames": 5 }, "detector_2": { "enable": false, "input_pipe": "extra", "en_fast_mode": true, "n_threads": 1, "en_undistortion": false, "undistort_scale": 1, "overlay_name": "extra_tag_overlay", "lens_cal_file": "/data/modalai/opencv_extra_intrinsics.yml", "skip_n_frames": 5 } }
If able to utilize the hires sensor for AprilTag detection, I do plan to modify its position and orientation on the flight deck to more closely resemble that of the tracking camera’s. It seems that would necessitate an update to the
/etc/modalai/extrinsics.conf
file. Are there any other files that would need to be modified to reflect a change in position/orientation of the sensor?Please let me know if any more information about my setup is needed and I am happy to provide it. Any help is greatly appreciated, thank you!
-
@Madelyne-Rossmann, what you are doing should be possible.
the first thing, for calibrating the hires camera, you will need to run the calibration and provide the exact camera stream name (and make sure it is available). In your case, the "preview" stream is disabled in camera config, so you do not have a stream that is called just "hires". However, you have enabled the large encoded stream, so along with h265 stream, you will also have a YUV stream (
hires_large_color
) and monochrome stream (hires_large_grey
). Since for calibration and tag detection you do not need color, it is best to use the monochrome stream to reduce cpu usage for processing large images.You can double check using
voxl-list-pipes
to see which camera streams are available and you can view the stream usingvoxl-portal
, but due to the high resolution of the image,voxl-portal
will not be able to display the stream at full fps and.. you can use it for quick testing of the stream.So, in your case, the command for calibrating would be:
voxl-calibrate-camera hires_large_grey -s 6x8 -l 0.0223
Since you need the highest possible resolution, IMX214 has max resolution of 4208x3120, which you already have selected for your large encoded stream, so that's fine.
However in the past, I have had issues calibrating hires cameras at full resolution because it was too slow processing such large images. I just tried running calibration on a similar resolution and the calibrator was running at only 2 FPS. I have previously worked around this issue by calibrating the camera at half resolution, however 2104x1560 is not a standard resolution ( you can check by running
voxl-camera-server -l
and look for resolutions listed underANDROID_SCALER_AVAILABLE_STREAM_CONFIGURATIONS
.If calibrating using downscaled images, you would want to have the down scaling to be exact (say factor of 2), so that you know that you can simply scale the calibration results (focal length and principal points) by a factor of 2.
One option: you can pick a scaled down supported resolution, such as
2048 x 1536
instead of your full frame 4208x3120 (set"large_video_width": 2048
,"large_video_height": 1536
). The downscaling and cropping will be done by the ISP on VOXL, so it will not use any CPU. Then you can just run the calibration using this resolution and also try the detection using the same resolution.Please keep in mind if you are changing resolutions, you may have to re-calibrate because depending on exact resolution, there may be cropping / scaling happening and it is not always clear exactly what the ISP does, so the rule of thumb is.. if resolution changes, recalibrate, unless you are 100% sure that the new resolution is a scaled version of the original resolution and you can just scale the calibration params accordingly.
For calibration purposes you can reduce your camera fps to 10, so that the images are not back up in the camera calibrator. After calibration, you can restore the original 30 fps.
You will also need to update your tag config file to use
hires_large_grey
image stream and make sure you point it to the correct intrinsics file that you generated.Please try and let me know if you get stuck on any steps. After you get the scaled down resolution to work (2048x1536), i can help you with enabling the full resolution (but not sure if tag detector can keep up at full fps).
Alex
-
@Alex-Kushleyev Thank you! That worked! I ended up using
hires_small_grey
since the large stream would not calibrate for me, and the reduced-size did not seem to cause any issues with detection in my code. I was able to confirm tag detection using the hires camera after calibration and all is good. I appreciate the help! -
@Madelyne-Rossmann , sounds good! We will work on enabling larger resolutions for calibration and tag detection. It is a bit of a hassle due to very large frames, so some optimizations need to be made.
Alex