Running QVIO on a hires camera
-
Hi Modal team,
Has anyone tried to run QVIO on one of the MPA pipes produced by a hires camera (either IMX214 or 412 I suppose)? Wondering if that's feasible, or if the rolling shutter on those sensors makes feature tracking quite impossible under dynamic movements. Happy to try it out myself if the answer is "unsure"!
Thanks,
Rowan (Cleo Robotics) -
We have not tried this recently, but it should work. Here are some tips:
- Use IMX412 camera (M0161 or similar) because it has great image quality and the fastest readout speed of all of our cameras (IMX214 is not recommended for this, it is an old and "slow" camera sensor)
- faster readout = less rolling shutter skew
- use the latest camera drivers, which max out the camera operating speed in all modes : https://storage.googleapis.com/modalai_public/temp/imx412_test_bins/20250919/imx412_fpv_eis_20250919_drivers.zip
- The readout times are documented here for all modes : https://docs.modalai.com/camera-video/low-latency-video-streaming/#imx412-operating-modes
- for example, the 1996x1520 (2x2 binned) mode has about 5.5ms readout time, which is pretty short
- QVIO (mvVISLAM.h) has a "readout time" parameter, which suggests that it supports rolling shutter. I have not tried it myself, but I heard that it does work:
mvVISLAM_Initialize(... float32_t readoutTime ...)
@param readoutTime Frame readout time (seconds). n times row readout time. Set to 0 for global shutter camera. Frame readout time should be (close to) but smaller than the rolling shutter camera frame period.
Here is where this param is currently set to 0 in voxl-qvio-server: https://gitlab.com/voxl-public/voxl-sdk/services/voxl-qvio-server/-/blob/master/server/main.cpp?ref_type=heads#L371
- in order to correctly use the readout time, you have to ensure that the camera pipeline indeed selects the correct camera mode (for which there is the corresponding readout time): https://docs.modalai.com/camera-video/low-latency-video-streaming/#how-to-confirm-which-camera-resolution-was-selected-by-the-pipeline
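To make the units concrete, here is a minimal sketch (my own helper, not part of mvVISLAM.h or voxl-qvio-server) of how that readoutTime value could be derived from the number of rows and the per-row readout time:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of mvVISLAM.h): derive the readoutTime
// argument for mvVISLAM_Initialize(). Per the header comment it is
// n (rows) times the row readout time, in seconds, and should stay
// (close to but) below the frame period. 0 means global shutter.
static double computeReadoutTimeSec(int numRows, double rowReadoutTimeSec,
                                    double fps)
{
    double t = numRows * rowReadoutTimeSec;
    assert(t < 1.0 / fps); // frame readout must fit within one frame period
    return t;
}
```

For the 1996x1520 mode with ~5.5ms total readout, the per-row time is roughly 0.0055 / 1520 seconds.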
- also, readout time is printed out by voxl-camera-server when you run it in -d mode (readout time in nanoseconds here):
VERBOSE: Received metadata for frame 86 from camera imx412
VERBOSE: Timestamp: 69237313613
VERBOSE: Gain: 1575
VERBOSE: Exposure: 22769384
VERBOSE: Readout Time: 16328284
- keep the exposure low to avoid motion blur (IMX412 has quite high analog gain, up to 22x, and 16x digital gain). If you want to prioritize gain vs exposure, you would need to tweak the auto exposure params in camera server (when you get to that point, I can help you).
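Note that the "Readout Time" printed in the -d log above is in nanoseconds, while the QVIO readoutTime parameter is in seconds; the conversion is a straight scale (illustrative helper name, not a VOXL API):

```cpp
#include <cassert>
#include <cmath>

// Convert the "Readout Time" printed by voxl-camera-server in -d mode
// (nanoseconds) into the seconds expected by the QVIO readoutTime param.
static double readoutNsToSec(long long readoutNs)
{
    return static_cast<double>(readoutNs) * 1e-9;
}
```

For example, the 16328284 ns above works out to about 0.0163 s.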
- it would be interesting to compare performance against QVIO with AR0144 - that would probably require collecting images from AR0144 and IMX412 (side by side) + IMU data and running QVIO offline with each camera.
Good luck if you try it! Let me know if you have any other questions. Please keep in mind that QVIO is based on a closed-source library from Qualcomm, so our support of QVIO is limited.
Alex
-
@Alex-Kushleyev Glad to hear you are optimistic about this! And thank you, as always, for the tips to get us started. I will hopefully dive into experimenting with this in the next week or so. Do you have a suggestion for which of the hires camera pipes we should use?
-
@Rowan-Dempster, you should use a monochrome stream (_grey), since QVIO needs a RAW8 image. If you are not using MISP on hires cameras, that is fine, you can start off using the output of the ISP.
You should calibrate the camera using whatever resolution you decide to try. This is to avoid any confusion, since if you are using the ISP pipeline, the camera pipeline may select a higher resolution and downscale + crop. So whenever you are changing resolutions, it is always good to do a quick camera calibration to confirm the camera parameters.
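To illustrate why a quick re-calibration is safer than just rescaling numbers: for a pure resize the pinhole intrinsics scale linearly, but any crop also shifts the principal point, which naive scaling misses. A sketch (struct and function names are mine, not from any VOXL API):

```cpp
#include <cassert>
#include <cmath>

// Illustrative pinhole intrinsics (names are mine, not a VOXL API).
struct PinholeIntrinsics { double fx, fy, cx, cy; };

// For a pure resize (no crop), intrinsics scale linearly with the
// resolution ratio. A crop additionally shifts cx/cy, which this
// naive scaling misses -- hence the advice to re-calibrate after
// any resolution change.
static PinholeIntrinsics scaleIntrinsics(const PinholeIntrinsics& k,
                                         double sx, double sy)
{
    return { k.fx * sx, k.fy * sy, k.cx * sx, k.cy * sy };
}
```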
When using MISP, we have more control over which camera mode is selected, because MISP gets the RAW data, not processed by the ISP, so we know the exact dimensions of the image sent from camera.
Alex
-
Hi @Alex-Kushleyev,
Resurrecting this old thread: we now have the IMX412 on a drone and we are ready to give VIO on the IMX412 our full attention and lots of testing effort. Where I'm at now is I have a prototype working, and QVIO does run on the IMX412 camera and outputs estimates that seem reasonable, but I'm 100% sure it's not configured as well as it could be, because I made so many assumptions that I would like your input on:
Which camera data / pipe to use
Ideally, we would like the IMX412 VIO to perform close to (or of course better than!) an ar0144 tracking camera, in terms of the quality of the image for feature tracking, low CPU usage, low latency frames, etc. In this spirit, I've been looking into how to get the MISP normalized pipes coming from the IMX412, and also how to get the camera server producing ION data to get the same CPU usage gains we saw in https://forum.modalai.com/topic/4893/minimizing-voxl-camera-server-cpu-usage-in-sdk1-6.
I saw that in the pipe setup, the normalized code for IMX412 was commented out

After commenting it back in I was able to see in the portal a decent looking normalized stream. I also see the ION pipe pop up for that norm stream but I haven't tried that ION pipe on the QVIO server yet (I'm confident it would work though, just waiting for https://gitlab.com/voxl-public/voxl-sdk/core-libs/libmodal-pipe/-/commit/d18521776e3e88f396d85aa657769c47f29e9c9f to get tagged!).
Do you see any issue with using the MISP norm pipe for IMX412 VIO, or is that actually what you would recommend?
Which resolution to use
I know you talked about some resolution advice above, but I'm a little bit confused on the specifics of where to put those numbers. You had suggested 1996x1520 for a 5.5ms readout time. Do these numbers go into the Preview Width config fields? Here is the entire diff of the config settings I have been using for my testing:

The other values I have a question about in that diff are the MISP width / height fields. I chose 998x760, which is half of the Preview resolution you suggested. I did this because I wasn't sure of any compute bottlenecks that would pop up if I fed a 1996x1520 image into QVIO. Do you think 998x760 is good, or maybe I should pick a ~0.75 downsample, so something like 1497x1140, for the MISP width/height?
Camera driver files to use and how to version control and deploy those
Could you confirm that the binary files in https://storage.googleapis.com/modalai_public/temp/imx412_test_bins/20250919/imx412_fpv_eis_20250919_drivers.zip are still the latest and the recommended binaries to use? Could you also advise on how to version control these files and deploy them to the voxl2 when the camera server .deb is deployed? I want to keep all files related to bringing up the camera in the voxl-camera-server debian if possible. I see some binaries files being stored in this path:

So if I understand the process correctly, those files will end up in /usr/share/modalai/chi-cdk/imx412-fps-misp. Is that where voxl-configure-cameras C looks for them? Also, do I have to do anything with the com.qti.sensor.imx412_fpv.so file in the zip link that you sent, or do I just ignore that file?
My end desired behavior is that when I install the voxl-camera-server .deb, I don't have to worry about also copying binary files over to the voxl, or moving any files around on the voxl, or having to remember to run voxl-configure-cameras C. So maybe the path forward there is to have all files deployed into the right places by the .deb install, and then in the postinst script auto run voxl-configure-cameras C? What do you think?
Aspect ratio concerns and their effect on field of view and camera calibration
Is the aspect ratio you suggested (1996x1520) the actual aspect ratio of the sensor? Or does the sensor support multiple aspect ratios, or is there something more complex going on that I don't understand here? I just want to make sure that we're using as much FoV as the sensor supports. That should be the goal for VIO feature detection and for streaming, right, maximize field of view?
Also, how does changing the aspect ratio / resolution affect the camera calibration? We're using kalibr, which asks for a focal length bootstrap, which we've been giving it as 470 for the ar0144 camera. Do you know of a way that we can figure out an accurate focal length bootstrap for the images that we end up using for the IMX412?
How to not lose other IMX412 features like 4K recording and EIS streaming etc
This is kinda my biggest concern about the feasibility of this whole thing: Will we still be able to get the other awesome IMX412 features, like high quality streaming to the GCS with EIS, as well as high quality 4K recordings even in difficult low light environments, AND at the same time optimize the IMX412 for VIO, which demands stuff like fast readout for less skew?
Any advice you can give here on mapping out the tradeoffs? Are there any non-starters, like not being able to get 4K recording if we opt to use the MISP norm pipe for VIO? Or are you confident that we can get the best of all 3 worlds?

Exposure time concerns
I agree with your initial point that needing low exposure for low motion blur is important. As I mentioned in the intro to this message, I have a prototype working and I am now at a place where I can indeed tune the gain vs exposure / auto exposure params. Could you help me with that? I assume this tradeoff also applies to the ar0144 camera, any lessons I can take from there?
QVIO readoutTime param
Great find, and thanks for pointing that out!! To confirm the specific numbers here: if I use the 1996x1520 preview width/height, which has a documented readout time of 5.5ms (I should confirm this using the -d mode), then I should put 0.0055 for that parameter?
As always, thank you for your help in camera related matters, we would be nowhere close to where we are now with robotic perception without your guidance!
-
Please see the following commit where we recently enabled publishing the normalized frame from IMX412 and IMX664 camera via regular and ION buffers: https://gitlab.com/voxl-public/voxl-sdk/services/voxl-camera-server/-/commit/c42e2febbc6370f9bbc95aff0659718656af6906
The parameters for 1996x1520 look good. Basically, you will be getting the 2x2 binned (full) frame and then a further down-scale to 998x760. Since you are doing an exact 2x downscale in MISP, you can also remove interpolation, which will make the image a bit sharper; you can see this for reference: link -- basically, change the sampler filter mode from linear (interpolate) to nearest. If you use a non-integer down-sample, keep the linear interpolation.
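For intuition on the linear-vs-nearest sampler point: with an exact 2x downscale, "nearest" just picks one source pixel per output pixel, so no interpolation blur is introduced. A rough sketch of that (my own code, not the actual MISP implementation):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of an exact 2x nearest-neighbor downscale of a grey image
// stored row-major. "Nearest" picks one source pixel per output pixel
// (no averaging), while "linear" would blend neighbors, slightly
// blurring fine features. Illustrative only, not the MISP code.
static std::vector<uint8_t> downscale2xNearest(const std::vector<uint8_t>& src,
                                               int w, int h)
{
    std::vector<uint8_t> dst((w / 2) * (h / 2));
    for (int y = 0; y < h / 2; y++)
        for (int x = 0; x < w / 2; x++)
            dst[y * (w / 2) + x] = src[(2 * y) * w + (2 * x)];
    return dst;
}
```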
Regarding the resolution to use with VIO: I think 998x760 with nearest sampling should behave the same as or better than AR0144 at 1280x800 resolution, mainly because the IMX412 has a much bigger lens (while still being pretty wide), so image quality is going to be better (of course, you need to calibrate intrinsics). Also, the 4:3 aspect ratio may help capture more features in the vertical direction. That of course does not account for rolling shutter effects.
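On calibrating intrinsics at the new resolution: a common rough focal-length bootstrap for tools like kalibr comes from the pinhole model, f_px = (W/2) / tan(HFOV/2). A sketch (the function name is mine; the actual lens FoV should come from the camera module spec):

```cpp
#include <cassert>
#include <cmath>

// Rough focal-length bootstrap (in pixels) from image width and
// horizontal field of view, assuming a simple pinhole model. This is
// only a starting guess; calibration (e.g. kalibr) refines it.
static double focalPxFromHfov(double imageWidthPx, double hfovDeg)
{
    const double pi = std::acos(-1.0);
    // hfovDeg * pi / 360 == (hfovDeg / 2) in radians
    return (imageWidthPx / 2.0) / std::tan(hfovDeg * pi / 360.0);
}
```

For example, at 998 px width and a hypothetical 90 degree HFOV this gives about 499 px; re-run it with the real IMX412 lens FoV, and if the image is cropped, use the cropped width with the cropped FoV.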
There can definitely be benefit in going up in resolution and using 1996x1520, but you kind of have to use the extra resolution correctly. Typically you would detect features on a lower resolution image and then refine using the full resolution (also for tracking features). However, in practice, often a small blur is applied to the image to get rid of pixel noise, etc, so very fine features won't get picked up. Unfortunately, we do not know exactly what QVIO does internally; it may do some kind of pyramidal image decomposition to handle these things in a smart way. You should try it and check the CPU usage.
Using MISP you can downsample and crop (while maintaining aspect ratio) to any resolution, so it's easy to experiment.
If I had to test QVIO at different resolutions, I would log raw bayer images and IMU data using voxl-logger, and then use voxl-replay + offline MISP + offline QVIO to run tests on the same data sets with different processing parameters. This may sound complicated, but it's really not:
- voxl-logger can log raw10 frames (on this branch): https://gitlab.com/voxl-public/voxl-sdk/utilities/voxl-logger/-/tree/extend-cam-logging
- qvio relies on timestamps from incoming messages, so it can work with live data or playback data
- the only missing piece is an offline MISP implementation, which is partially available in this tool: https://gitlab.com/voxl-public/voxl-sdk/utilities/voxl-mpa-tools/-/blob/add-new-image-tools/tools/voxl-convert-image.cpp -- and we are working on being able to run exactly the same implementation as in camera-server. The only missing piece in the code listed here is AWB -- however, white balance should not affect VIO too much, since only the Y channel is used, so you can set the white balance gains to a fixed value.
- running misp offline allows you to experiment with different resolutions / processing directly from the source raw10 bayer image (lossless)
So, if you are really serious about using hires camera for QVIO, since there are a lot of unknowns, you should consider setting up an offline processing pipeline, so that you can run repeatable tests and parameter sweeps. It requires some upfront work, but the pay-off will be significant. You can also use the offline pipeline for regression testing of performance and comparing to other VIO algorithms (which just need the MPA interface). We can discuss this topic more, if you are interested.
The binaries in imx412_fpv_eis_20250919_drivers.zip are the latest for IMX412. We should really make them the default ones shipped in the VOXL2 SDK, but we have not done that yet.
Since you are maintaining your own version of voxl-camera-server, you should add them to your voxl-camera-server repo and .deb, and install them somewhere like /usr/share/modalai/voxl-camera-server/drivers. Then modify the voxl-configure-cameras script to first look for imx412 drivers in that folder and then fall back to searching /usr/share/modalai/chi-cdk/. In fact, this is something I am considering myself, as maintaining camera drivers in the core system image is less flexible.
EDIT: I guess the older version of the imx412 drivers are already in the repo, so you can just replace them with the new ones in your camera server repo: link
Let me know if you have any more questions. Sounds like a fun project!

Alex