Image Stabilization calibration and pipe size clarification

jameskuesel

I've been working with EIS a bit more and was hoping for some clarification/advice on a couple of things.

First, I want to clarify calibration for EIS. From the docs, it seems fine to calibrate at the full 4040x3040 (by disabling EIS and temporarily setting misp_width and misp_height) or, alternatively, at half resolution, 2020x1520. Could I also hypothetically calibrate at 1010x760? We have our own calibration routine, and the smaller the image, the faster it runs.

I’m assuming as long as the aspect ratio is the same it should be fine? Furthermore, if I were to use the alternate resolution of 1996x1520, I should obviously do a calibration at that resolution? Or, keeping the aspect ratio, perhaps at 998x760?

Basically, I just want to confirm that as long as I calibrate at some scale of whatever I set the preview frame to, it should be fine? And once I'm done calibrating, I'm OK to then set misp_width and misp_height to my desired output resolution? This leads to my next clarification.

I also wanted to clarify that EIS runs on whatever I set the preview frame to, correct? As in, that frame/resolution is what the algorithm runs on? misp_width and misp_height are just what gets output (as in, it's not like the algorithm is also run at that resolution)?

In that case what do you recommend eventually setting the output misp_width and misp_height to? Can they be anything? (Obviously something small enough to stream for example) I noticed in one example you chose 1280x720 but notably that is not the same aspect ratio as 4040x3040 or 1996x1520. Will stretching or warping occur? Should I pick something that is the same aspect ratio? Or does that not matter?

Any guidance much appreciated, thanks!

Alex Kushleyev

Hi @jameskuesel ,

When you enable en_raw_preview, which is what we do when MISP is enabled, the preview_width and preview_height basically force the selection of a specific camera mode with those dimensions. This means that the camera will send a bayer image of that size.

After the image is received by the ISP (which does nothing else in RAW-only mode), then MISP consumes the raw bayer image and performs debayering on the GPU. Then misp_width and misp_height will be used to specify the dimensions of the output image, sampled from the original bayer image.

If the output dimensions have exactly the same ratio as the input dimensions, then MISP will perform (arbitrary) down-scaling only (no crop = same horizontal and vertical FOV). However, if the output dims have different aspect ratio, the output image will be a cropped (and downscaled) version of the original image, such that the width and height ratio is maintained (features are not stretched). The crop is selected to maximize the fit of the output image within the input image.
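The crop-versus-scale behaviour described above can be sketched in a few lines. This is only an illustration of the geometry, not MISP's actual code: the function name `misp_crop` is made up here, and it only computes the size of the largest input region that matches the output aspect ratio (where that region sits inside the frame is not covered).

```python
from fractions import Fraction


def misp_crop(in_w, in_h, out_w, out_h):
    """Return (crop_w, crop_h, scale) for the largest input region that has
    the output's aspect ratio. scale is output pixels per input pixel.

    Hypothetical helper: it mirrors the description in the post, not the
    real MISP implementation.
    """
    in_ar = Fraction(in_w, in_h)
    out_ar = Fraction(out_w, out_h)
    if out_ar >= in_ar:
        # Output is relatively wider: keep the full width, trim the height.
        crop_w = Fraction(in_w)
        crop_h = crop_w / out_ar
    else:
        # Output is relatively taller: keep the full height, trim the width.
        crop_h = Fraction(in_h)
        crop_w = crop_h * out_ar
    scale = Fraction(out_w) / crop_w
    return float(crop_w), float(crop_h), float(scale)


# Same aspect ratio: pure downscale, full FOV is kept.
print(misp_crop(4040, 3040, 2020, 1520))  # (4040.0, 3040.0, 0.5)
# 16:9 output from the 4040x3040 frame: full width, reduced height.
print(misp_crop(4040, 3040, 1280, 720)[:2])  # (4040.0, 2272.5)
```

For the 1280x720 case, only about 2272 of the 3040 input rows end up in the output, so the vertical FOV shrinks while features keep their proportions.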

With that in mind, if you select 4040x3040 as the preview resolution and select a misp resolution such that both width and height are scaled by the same factor (does not have to be an integer), you can calibrate the intrinsics using the small image, and after the calibration you can upscale the intrinsics by the ratio of the full resolution to the calibration resolution:

- multiply the principal point offsets by that factor
- multiply the focal length by that factor
- keep the same fisheye distortion coefficients (since they are a function of angle, not pixels)

See another version of this explanation in a different context : https://forum.modalai.com/topic/4900/running-qvio-on-a-hires-camera/12 (look for "intrinsics")
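As a small worked example of the intrinsics upscaling described above, here is a sketch that applies the plain multiplication from the post. The function name and the numeric values are made up for illustration; the factor of 4 corresponds to calibrating at 1010x760 and running at 4040x3040.

```python
def scale_intrinsics(fx, fy, cx, cy, dist_coeffs, factor):
    """Scale pinhole/fisheye intrinsics from the calibration resolution to
    another resolution with the same aspect ratio.

    factor = target_width / calibration_width (hypothetical helper).
    """
    return (
        fx * factor,         # focal length scales with the image
        fy * factor,
        cx * factor,         # principal point scales with the image
        cy * factor,
        list(dist_coeffs),   # fisheye coefficients depend on angle, not pixels
    )


# Illustrative values only: calibrated at 1010x760, used at 4040x3040.
factor = 4040 / 1010
print(scale_intrinsics(500.0, 500.0, 505.0, 380.0, [0.1, -0.02, 0.0, 0.0], factor))
# (2000.0, 2000.0, 2020.0, 1520.0, [0.1, -0.02, 0.0, 0.0])
```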

- Input to the EIS algorithm is the full bayer image that is defined by preview_width and preview_height.
- The output image size is defined by misp_width and misp_height, just like when EIS is off. The misp resolution can be arbitrary.
- An important parameter is misp_zoom, which defines the relationship between the original image and the output image in terms of zoom level. If you set zoom = 1.0 with EIS enabled, then the misp output will produce the largest undistorted image that fits within the original bayer image.
- There will be no deformation (stretching). The EIS ROI size will fit within the original bayer image, and the zoom level really controls how much the FOV is reduced and how much stabilization margin you have.
- You can enable the ROI display (PIP or side-by-side) to see how the EIS ROI fits within the original full image.
- The actual resolution of the misp output has nothing to do with how the EIS ROI fits within the original bayer image: the zoom value and fisheye undistortion dictate that. The output misp resolution just defines how many pixels the output image will contain, not the field of view (FOV).
- For output resolution, you can pick whatever you need for your application.

By the way, you can have multiple EIS outputs on the same camera concurrently (MISP supports up to 4 channels). So you can have one stream that is small resolution (for streaming over wifi) and another one higher resolution (and can be different aspect ratio) for high quality recording. https://docs.modalai.com/camera-video/electronic-image-stabilization/#using-eis-with-misp-channels

Please note that currently, when you use multiple misp channels from the same camera, the image is processed separately, meaning if you have a large resolution and small resolution misp stream, the small resolution is NOT generated by downsampling the output of the large stream (which could be a nice optimization, in some cases).

Please let me know if you have any other questions.

Alex

jameskuesel

@Alex-Kushleyev

Thanks for the detailed response Alex.

I think this addresses my most important concern, which was stretching of the misp pipes. I didn't realize that it was just cropping, which is good news.

One more thing: it seems enabling Snapshots isn't compatible with MISP. Is that a fundamental limitation, or more a consequence of how the stack is built? And is there currently any recommended way to still save a still image with MISP enabled? Or if not, a future implementation plan?

What I’m aiming for long term is high-res recording, lower-res streaming, and the option to grab a still in one configuration. With MISP ofc.

Best,
James

Alex Kushleyev

Hi @jameskuesel ,

Yes there is an issue when both MISP and snapshot are enabled. I will look into it again.

By the way, what snapshot are you looking to grab? There are a few options:

- a frame from high-res recording (with eis?)
- a frame from low-res streaming (with eis?)
- full frame, full resolution (no stabilization)
- what format? jpg or raw bayer (for offline processing)

We can implement a snapshot feature in MISP (without going through the ISP).

Alex

jameskuesel

@jameskuesel

Hi Alex,

To be greedy, ideally as many of those as possible. The ones that appeal most to us would be:

1. A frame from high-res recording (with eis)
2. A full frame, full res (no eis)

Then for format, JPG probably makes the most sense, but ofc, the more options the better haha.

Alex Kushleyev

@jameskuesel ,

For item 1 (a frame from hires recording with eis), you can do this already by capturing the YUV from the hires_misp_color stream, using the existing tool voxl-record-raw-image.

For item 2, the full frame can also be captured (using the same tool) but only in raw bayer format. This would be good for offline processing, if you wanted to get the maximum image quality, however this image would not have any processing applied from MISP.

Additionally, since MISP supports multiple outputs (misp channels), as you already know, you could set up one channel to be a full frame image (which you normally don't stream), with EIS off.

So with the correct voxl-camera-server.conf, you should be able to get all the streams you want:

- full frame
- hires recording
- low-res streaming

And you can grab YUVs from any of those streams. The last part would then be just encoding them to JPG (which could be done separately or as part of voxl-record-raw-image, which we could add).
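For anyone post-processing a captured YUV frame by hand, the sketch below shows how a pixel is laid out in an NV12 buffer and converted to RGB. It rests on several assumptions that are not confirmed in this thread: that the captured frame is NV12, that it is tightly packed (no row padding or file header), and that full-range BT.601 coefficients (the ones JPEG uses) are appropriate. The function name is made up.

```python
def nv12_pixel_to_rgb(buf, width, height, x, y):
    """Read one pixel from a tightly packed NV12 frame and return (r, g, b).

    Assumed layout: width*height luma bytes, followed by width*height/2
    bytes of interleaved U,V samples, one pair per 2x2 pixel block.
    """
    if len(buf) < width * height * 3 // 2:
        raise ValueError("buffer too small for an NV12 frame of this size")
    luma = buf[y * width + x]
    # The chroma plane starts right after the luma plane.
    uv_index = width * height + (y // 2) * width + (x // 2) * 2
    u = buf[uv_index] - 128
    v = buf[uv_index + 1] - 128
    # Full-range BT.601 conversion (assumption, see lead-in).
    r = luma + 1.402 * v
    g = luma - 0.344136 * u - 0.714136 * v
    b = luma + 1.772 * u

    def clamp(c):
        return max(0, min(255, int(round(c))))

    return clamp(r), clamp(g), clamp(b)


# A 4x2 mid-gray frame: 8 luma bytes plus 4 chroma bytes, all 128.
frame = bytes([128] * 12)
print(nv12_pixel_to_rgb(frame, 4, 2, 3, 1))  # (128, 128, 128)
```

In practice a library such as OpenCV or turbojpeg would do the whole frame at once; the point here is only the buffer layout.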

If you send me your voxl-camera-server.conf (or the part specific to the hires camera), I can update it to show you how you can get the three streams.

Alex

jameskuesel

@Alex-Kushleyev

Hi Alex,

Sorry, it has been a while; I keep having to step away and revisit this.

First, it looks like snapshot has been fixed in one of your latest dev releases; I tried it out and it seems to work, so that’s awesome. I'm able to capture a full-frame non-MISP image from the 412. That being said, although I tested voxl-record-raw-image on the hires_misp_color stream and was eventually able to convert the result to a jpg, it's not as usable when it isn't in a standard image format by default.

In addition I had a few more clarifying questions.

1. It seems like there’s no way to have the large hires pipe on at the same time as the MISP pipe? Meaning it’s not possible to get a recording of the full frame, with no eis, while the MISP pipe is being used? This would be useful because, although I could have a high-resolution MISP pipe for recording, it would have the same zoom level as the other misp pipes, since zoom seems to be a shared parameter. I assume this is a system limitation and the voxl2 might not be able to handle multiple 4k pipes and encoding them. I just want to verify this.

2. I noticed there is an imx412 and an imx412-misp driver. What is the difference between these two? MISP seems to work fine using the regular imx412 driver (and that’s what I’ve been using, since https://docs.modalai.com/camera-video/electronic-image-stabilization/ didn’t mention the other one).

Is it just the same thing, different name or are there gains to be made switching to the dedicated MISP driver? What are those?

Let me know, Thanks!

P.S love to see the continued work being done by you all here on the camera server very cool!

Alex Kushleyev

hi @jameskuesel , nice to hear from you

Please note that we recently added new functionality to voxl-record-raw-image to save to jpeg using the software jpeg encoder (turbojpeg, similarly to how it's done in voxl-portal for sending the images to the browser):

voxl-record-raw-image -h


Record a raw image from an MPA pipe to disk
This is typically used for inspecting raw or YUV
image data formatting or for loading into something
like MATLAB or OpenCV for post-processing.

Optional arguments are:
-d, --dir {dir}       directory to save output files in (default: /data/raw-images/)
-h, --help            print this help message
-j, --jpeg            convert to JPEG before saving (supports NV12, NV21, RGB, RAW8 formats)
-q, --jpeg-quality    JPEG quality for compression (1-100, default: 90)
-n, --num-images {n}  number of images to save from the pipe (default 1)
-m, --meta            save metadata in file name (timestamp (us), exposure (us), gain)
-s, --skip {n}        skip n frames before saving a new frame

VOXL2 should be able to encode up to 4x 4K30 streams (the video encoder can do 8K30), so there is no limitation on encoding two 4K30 streams (even if the stream is larger than 4K, such as 4040x3040, two of them should still be fine). Since MISP supports up to 4 output channels, that should be OK, but it seems that your issue is that you want different zoom on different channels. This is a feature we have also been discussing internally, and it should be easy to add a param that selects either a shared zoom for all misp channels or an individual zoom per channel (set via config and controlled via the control pipe). Is that something that would work for you? We could add this pretty quickly.
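A quick back-of-the-envelope check of the encoder numbers above, using raw pixel rate as a rough proxy. This assumes "4K" means 3840x2160 and "8K" means 7680x4320, and it ignores any other encoder limits (bitrate, codec, number of sessions), so treat it as a sanity check rather than a guarantee.

```python
def pixel_rate(width, height, fps):
    """Pixels per second for a stream of the given size and frame rate."""
    return width * height * fps


budget_8k30 = pixel_rate(7680, 4320, 30)    # assumed encoder ceiling
uhd_4k30 = pixel_rate(3840, 2160, 30)
full_frame_30 = pixel_rate(4040, 3040, 30)  # full IMX412 frame at 30 fps

print(budget_8k30 == 4 * uhd_4k30)            # True: 8K30 equals 4x 4K30
print(2 * full_frame_30, "<=", budget_8k30)   # 736896000 <= 995328000
print(round(2 * full_frame_30 / budget_8k30, 2))  # 0.74 of the budget
```

Under these assumptions, two 4040x3040 streams at 30 fps use roughly three quarters of the 8K30 pixel budget.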
The latest IMX412 driver (which is here: https://storage.googleapis.com/modalai_public/temp/imx412_test_bins/20250919/imx412_fpv_eis_20250919_drivers.zip) has been optimized to get the following:

- various custom resolutions, as mentioned here: https://docs.modalai.com/camera-video/low-latency-video-streaming/#imx412-operating-modes
- maximized throughput in order to reduce rolling shutter skew (this also enables 4040x3040 @ 60 fps, allowing 4K60 EIS output!)

BTW, there has also been quite a bit of headache over IMX412 camera interference with GPS; we are going to release updated IMX412 drivers very soon. See:
https://forum.modalai.com/topic/5116/gnss-emi-mitigation-guidelines
https://docs.modalai.com/emi-mitigation-for-gnss/

Once we release the new IMX412 drivers that do not affect GPS, we will provide better documentation of the existing drivers and their differences. For now, you should just use the "eis" driver for imx412, as I linked above.

Alex

Alex Kushleyev

@jameskuesel , I have been testing a version of camera server that supports independent zoom / drag on each of the 4 misp channels and also independent EIS modes. This means that you can have one full-size unstabilized image and one or more stabilized with EIS, with arbitrary zoom and look-at positions for each stream.

There is also an updated voxl-portal version with a multi view that supports showing and interacting with multiple streams at the same time.

If you would like to test it, let me know; I will need to document it a bit. Please provide some more information, such as what camera you use and what use cases you need, so I can provide an exact camera server config to test (basically specs for each misp channel: fps, dims, EIS on/off and EIS mode).

Alex

jameskuesel

@Alex-Kushleyev

Hi Alex, apologies again!

Yes, independent zoom is what we would be looking for. Awesome to hear you’ve been working on it. The three main use cases for us would be recording, streaming, and calibration (it makes it easier for an assembly process to not have to switch camera configs). So here is the minimum channel use case we were looking for:

We currently use J6_UPPER_SENSOR: imx412-fpv

Channel 1 (Potential Recording Pipe 1 (Full Frame)):
Dimensions: 4040x3040
FPS: 30fps
Zoom: OFF/1x
EIS: OFF
EIS MODE: N/A

Channel 2 (Potential Recording Pipe 2 (Variable Zoom)):
Dimensions: 4040x3040
FPS: 30fps
Zoom: Variable (1x-whatever, would be synced with the streaming pipe. We would implement this logic)
EIS: ON
EIS MODE: Horizon (or full-follow)

Notes: We would likely leave it to our users whether to record full frame or be synced with the streaming pipe, so we would only ever need one of the recording pipes, likely never both at once.

Channel 3 (Streaming Pipe):
Dimensions: 1280x800 (16:10 matches our tablet aspect ratio, not sure if there is something better to use here.)
FPS: 30fps
Zoom: Variable (1x-whatever)
EIS: ON
EIS MODE: Horizon (or full-follow)

Channel 4 (Calibration Pipe):
Dimensions: 1010x760 (we’ve been calibrating at quarter resolution with our own internal calibration routine)
FPS: 30fps
Zoom: OFF/1x (zero zoom, full picture)
EIS: OFF
EIS MODE: N/A
Notes: Really only need the grayscale/normalized from this.

Anyways, happy to give it a try if you have it available. Zero rush, currently juggling a few projects.

Best, James