Running QVIO on a hires camera
-
Hi Modal team,
Has anyone tried to run QVIO on one of the MPA pipes produced by a hires camera (either IMX214 or 412 I suppose)? Wondering if that's feasible, or if the rolling shutter on those sensors makes feature tracking quite impossible under dynamic movements. Happy to try it out myself if the answer is "unsure"!
Thanks,
Rowan (Cleo Robotics)
-
We have not tried this recently, but it should work. Here are some tips:
- Use IMX412 camera (M0161 or similar) because it has great image quality and the fastest readout speed of all of our cameras (IMX214 is not recommended for this, it is an old and "slow" camera sensor)
- faster readout = less rolling shutter skew
- use the latest camera drivers, which max out the camera operating speed in all modes : https://storage.googleapis.com/modalai_public/temp/imx412_test_bins/20250919/imx412_fpv_eis_20250919_drivers.zip
- The readout times are documented here for all modes : https://docs.modalai.com/camera-video/low-latency-video-streaming/#imx412-operating-modes
- for example, the 1996x1520 (2x2 binned) mode has about 5.5ms readout time, which is pretty short
- QVIO (mvVISLAM.h) has a parameter "readout time", which suggests that it supports rolling shutter. I have not tried it myself, but I heard that it does work.

    mvVISLAM_Initialize(...float32_t readoutTime ..)
    @param readoutTime Frame readout time (seconds). n times row readout time. Set to 0 for global shutter camera. Frame readout time should be (close to) but smaller than the rolling shutter camera frame period.

- Here is where this param is currently set to 0 in voxl-qvio-server: https://gitlab.com/voxl-public/voxl-sdk/services/voxl-qvio-server/-/blob/master/server/main.cpp?ref_type=heads#L371
- in order to correctly use the readout time, you have to ensure that the camera pipeline indeed selects the correct camera mode (for which there is the corresponding readout time): https://docs.modalai.com/camera-video/low-latency-video-streaming/#how-to-confirm-which-camera-resolution-was-selected-by-the-pipeline
- also, readout time is printed out by voxl-camera-server when you run it in -d mode (readout time in nanoseconds here):

    VERBOSE: Received metadata for frame 86 from camera imx412
    VERBOSE: Timestamp: 69237313613
    VERBOSE: Gain: 1575
    VERBOSE: Exposure: 22769384
    VERBOSE: Readout Time: 16328284

- keep the exposure low to avoid motion blur (IMX412 has quite a high analog gain, up to 22x, and 16x digital gain). If you want to prioritize gain vs exposure, you would need to tweak the auto exposure params in camera server (when you get to that point, I can help you).
- it would be interesting to compare performance against QVIO with AR0144 - that would probably require collecting images from AR0144 and IMX412 (side by side) + IMU data and running QVIO offline with each camera.
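
As a rough illustration of the unit handling in the tips above: the camera server prints readout time in nanoseconds, while the `readoutTime` parameter is documented in seconds and should be smaller than the frame period. The sketch below pulls the value out of a log line in the format quoted above and converts it. The helper names and the 30 fps frame rate are assumptions made for the example, not part of voxl-camera-server or mvVISLAM.

```python
import re

# Matches the "Readout Time" line in the -d output format quoted above.
READOUT_RE = re.compile(r"Readout Time:\s*(\d+)")


def readout_times_ns(log_text):
    """Return every readout time (nanoseconds) found in the log text."""
    return [int(value) for value in READOUT_RE.findall(log_text)]


def readout_seconds(readout_ns):
    """Convert nanoseconds to seconds, the unit readoutTime is documented in."""
    return readout_ns / 1e9


def fits_frame_period(readout_s, fps):
    """Check the header's note: readout should be smaller than the frame period."""
    return 0.0 < readout_s < 1.0 / fps


sample_log = """VERBOSE: Received metadata for frame 86 from camera imx412
VERBOSE: Timestamp: 69237313613
VERBOSE: Gain: 1575
VERBOSE: Exposure: 22769384
VERBOSE: Readout Time: 16328284"""

ns = readout_times_ns(sample_log)[0]
secs = readout_seconds(ns)
# fps=30 is a hypothetical frame rate; use the rate your pipe actually runs at.
print(ns, secs, fits_frame_period(secs, fps=30))
```

The value in this sample log is about 16.3 ms, so it evidently comes from a slower mode than the 5.5 ms binned mode mentioned above; checking the printed value is a quick way to notice that the pipeline picked a different mode than expected.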
Good luck if you try it! Let me know if you have any other questions. Please keep in mind that QVIO is based on a closed-source library from Qualcomm and our support of QVIO is limited.
Alex
-
@Alex-Kushleyev Glad to hear you are optimistic about this! And thank you, as always, for the tips to get us started. I will hopefully dive into experimenting with this in the next week or so. Do you have a suggestion for which of the hires camera pipes we should use?
-
@Rowan-Dempster, you should use a monochrome stream (_grey), since QVIO needs a RAW8 image.
If you are not using MISP on hires cameras, that is fine, you can start off using the output of the ISP.
You should calibrate the camera using whatever resolution you decide to try. This is to avoid any confusion: if you are using the ISP pipeline, the camera pipeline may select a higher resolution and then downscale + crop. So whenever you change resolutions, it is always good to do a quick camera calibration to confirm the camera parameters.
When using MISP, we have more control over which camera mode is selected, because MISP gets the RAW data, not processed by the ISP, so we know the exact dimensions of the image sent from camera.
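
To illustrate why a downscale + crop changes the calibration, here is a minimal sketch assuming an ideal pinhole model: resizing scales the focal length and principal point (in pixels), and cropping shifts the principal point. The `Intrinsics` class, the `crop_then_scale` helper, and all numbers are made up for the example; they are not IMX412 values, and the half-pixel center convention is ignored. This is only a rough prediction for sanity checking, not a substitute for recalibrating.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    fx: float  # focal length in pixels (x)
    fy: float  # focal length in pixels (y)
    cx: float  # principal point x, in pixels
    cy: float  # principal point y, in pixels


def crop_then_scale(k, x0, y0, scale):
    """Remove x0/y0 pixels from the left/top edge, then resize by `scale`."""
    return Intrinsics(
        fx=k.fx * scale,
        fy=k.fy * scale,
        cx=(k.cx - x0) * scale,
        cy=(k.cy - y0) * scale,
    )


# Hypothetical intrinsics for a 1996x1520 image (illustrative numbers only).
full = Intrinsics(fx=1000.0, fy=1000.0, cx=998.0, cy=760.0)

# Pure 2x downscale: focal length and principal point both halve.
halved = crop_then_scale(full, x0=0, y0=0, scale=0.5)

# A 100-pixel crop on the left before the same downscale moves cx only.
cropped = crop_then_scale(full, x0=100, y0=0, scale=0.5)

print(halved)
print(cropped)
```

If the pipeline silently crops, the focal length in pixels looks the same as for a pure downscale but the principal point and field of view differ, which is the kind of mismatch a quick recalibration would catch.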
Alex
-
Hi @Alex-Kushleyev,
Resurrecting this old thread: we now have the IMX412 on a drone and are ready to give VIO on the IMX412 our full attention and lots of testing effort. I have a prototype working: QVIO runs on the IMX412 camera and outputs estimates that seem reasonable. However, I'm sure it's not configured as well as it could be, because I made many assumptions that I would like your input on:
Which camera data / pipe to use
Ideally, we would like the IMX412 VIO to perform close to (or of course better than!) an ar0144 tracking camera, in terms of image quality for feature tracking, low CPU usage, low frame latency, etc. In this spirit, I've been looking into how to get the MISP normalized pipes coming from the IMX412, and also how to get the camera server producing ION data to get the same CPU usage gains we saw in https://forum.modalai.com/topic/4893/minimizing-voxl-camera-server-cpu-usage-in-sdk1-6.
I saw that in the pipe setup, the normalized code for IMX412 was commented out

After commenting it back in I was able to see in the portal a decent looking normalized stream. I also see the ION pipe pop up for that norm stream but I haven't tried that ION pipe on the QVIO server yet (I'm confident it would work though, just waiting for https://gitlab.com/voxl-public/voxl-sdk/core-libs/libmodal-pipe/-/commit/d18521776e3e88f396d85aa657769c47f29e9c9f to get tagged!).
Do you see any issue with using the MISP norm pipe for IMX412 VIO, or is that actually what you would recommend?
Which resolution to use
I know you gave some resolution advice above, but I'm a little confused about where exactly to put those numbers. You had suggested 1996x1520 for a 5.5ms readout time. Do these numbers go into the Preview Width / Height config fields? Here is the entire diff of the config settings I have been using for my testing:

The other values I have a question about in that diff are the MISP width / height fields. I chose 998x760, which is half of the Preview resolution you suggested. I did this because I wasn't sure whether any compute bottlenecks would pop up if I fed a 1996x1520 image into QVIO. Do you think 998x760 is good, or should I pick something like a 0.75 downsample, i.e. 1497x1140, for the MISP width/height?
Camera Driver files to use and how to version control and deploy those
Could you confirm that the binary files in https://storage.googleapis.com/modalai_public/temp/imx412_test_bins/20250919/imx412_fpv_eis_20250919_drivers.zip are still the latest and the recommended binaries to use? Could you also advise on how to version control these files and deploy them to the voxl2 when the camera server .deb is deployed? I want to keep all files related to bringing up the camera in the voxl-camera-server debian if possible. I see some binary files being stored in this path:

So if I understand the process correctly, those files will end up in /usr/share/modalai/chi-cdk/imx412-fps-misp. Is that where voxl-configure-cameras C looks for them? Also, do I have to do anything with the com.qti.sensor.imx412_fpv.so file in the zip link that you sent, or do I just ignore that file?
My end desired behavior is that when I install the voxl camera server .deb, I don't have to worry about also copying binary files over to the voxl, or moving any files around on the voxl, or having to remember to run voxl-configure-cameras C. So maybe the path forward there is to have all files deployed into the right places by the .deb install and then have the postinst script automatically run voxl-configure-cameras C? What do you think?
Aspect ratio concerns and their effect on field of view and camera calibration
Is the aspect ratio you suggested (1996x1520) the actual aspect ratio of the sensor? Or does the sensor support multiple aspect ratios, or is there something more complex going on that I don't understand here? I just want to make sure that we're using as much FoV as the sensor supports. That should be the goal for VIO feature detection and for streaming, right: maximize field of view?
Also, how does changing the aspect ratio / resolution affect the camera calibration? We're using kalibr, which asks for a focal length bootstrap. We've been giving it 470 for the ar0144 camera. Do you know of a way that we can figure out an accurate focal length bootstrap for the images that we end up using for the IMX412?
How to not lose other IMX412 features like 4k recording and EIS streaming etc
This is kinda my biggest concern about the feasibility of this whole thing: will we still be able to get the other awesome IMX412 features, like high quality streaming to the GCS with EIS and high quality 4K recordings even in difficult low light environments, AND at the same time optimize the IMX412 for VIO, which demands things like fast readout for less skew?
Any advice you can give here on mapping out the tradeoffs? Are there any non-starters, like not being able to get 4K recording if we opt to use the MISP norm pipe for VIO? Or are you confident that we can get the best of all 3 worlds?

Exposure time concerns
I agree with your initial point that keeping exposure low to limit motion blur is important. As I mentioned in the intro to this message, I have a prototype working and I am now at a place where I can indeed tune the gain vs exposure / auto exposure params. Could you help me with that? I assume that this tradeoff also applies to the ar0144 camera; are there any lessons I can take from there?
QVIO readoutTime param
Great find and thanks for pointing that out!! To confirm the specific numbers here, if I use the 1996x1520 preview width/height, which has a documented readout time of 5.5ms (I should confirm this using the -d mode), then I should put 0.0055 for that parameter?
As always, thank you for your help in camera related matters, we would be nowhere close to where we are now with robotic perception without your guidance!
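
For reference, the arithmetic behind that number can be sketched as follows. The header describes the frame readout time as "n times row readout time", so 5.5 ms over the 1520 rows of the binned mode works out to a few microseconds per row. The helper names below are hypothetical, and the sketch assumes rows are read out at a uniform rate and that the documented 5.5 ms applies to the mode actually selected.

```python
def row_readout_time_s(frame_readout_s, num_rows):
    """Per-row readout time, assuming every row takes the same time."""
    return frame_readout_s / num_rows


def row_time_offset_s(row, frame_readout_s, num_rows):
    """Capture delay of `row` relative to the first row of the frame."""
    if not 0 <= row < num_rows:
        raise ValueError("row index out of range")
    return row * row_readout_time_s(frame_readout_s, num_rows)


frame_readout_s = 5.5e-3  # 5.5 ms expressed in seconds, i.e. 0.0055
num_rows = 1520           # rows in the 1996x1520 mode

per_row = row_readout_time_s(frame_readout_s, num_rows)
last_row_delay = row_time_offset_s(num_rows - 1, frame_readout_s, num_rows)

print(f"per-row readout: {per_row * 1e6:.2f} us")
print(f"last row delay:  {last_row_delay * 1e3:.3f} ms")
```

The delay of the last row relative to the first is the skew that a nonzero `readoutTime` would presumably let QVIO account for, instead of treating all rows as captured at the same instant.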