ModalAI Forum

    Global Shutter Color Camera? (Not for VIO)

    • Peter Lingås 0

      Hello ModalAI Team

      I am looking for a global shutter color camera, preferably 1280x720 resolution or similar. Reading other posts, it seems the AR0144 has been tested to work, but still needs some development to integrate and function well with the VOXL 2.
      Is that only the case when it is used for VIO? I will be using it as an object detection camera, so it should preferably be global shutter for clear images. I can't find any global shutter color cameras in your store. Is it possible to purchase one somehow?

      I have tried emailing about this a few times over the past months, but I don't seem to have received a response or even a confirmation that my email went through.

      Best Regards
      Peter

      • Zachary Lowell 0 (ModalAI Team)

        @Peter-Lingås-0 I do not have an answer regarding global shutter cameras, but what I can tell you is that the AR0144 camera can be used for anything, not just VIO. The frames being piped out go into the software dedicated to OpenVINS, but they can also be piped anywhere else thanks to the pub/sub architecture, so you should have no problem piping them into your computer vision software.

        • Alex Kushleyev (ModalAI Team) @Peter Lingås 0

          @Peter-Lingås-0 ,

          We did have a sample batch of color AR0144 global shutter cameras, but I think we have probably run out of those. I can double-check. Due to low demand, we are not actively supporting them.

          For general object detection applications, I would recommend the IMX412 camera (part number M0161, for example). This camera has 4056x3040 resolution, but it also supports a 2x2 binned mode. The advantage of the binned mode, in this context, is that it reduces the rolling shutter effect.

          The amount of rolling shutter distortion / skew is a function of the readout time (the time between the start of exposure of the first line and of the last line). This time depends on the camera and the specific operating mode.

          However, the IMX412 camera in combination with VOXL2 supports a 2x2 binned mode where you get a 1920x1080 image with only about 4.2ms of rolling shutter skew, meaning there is only 4.2ms between the first and last line. You can see more details on the readout time for different modes here : https://docs.modalai.com/camera-video/low-latency-video-streaming/#camera-pipeline-latency-in-different-operating-modes

          Additionally, the IMX412 camera module has an M12 lens mount, which allows more flexibility in lens choice and supports generally larger lenses, which will improve image quality as well.

          Would this be an option for you or is having a global shutter camera a requirement?

          Alex

          • Peter Lingås 0 @Alex Kushleyev

            @Alex-Kushleyev

            Hello! Sorry for the late reply, we have been moving offices for a week.

            We are experimenting at the moment and have not landed on specific requirements yet, so global shutter is not a strict requirement.
            Our plan was to use an AR0144 for object detection and then an IMX412 (4k resolution) to analyze the object we detected.

            We would prefer to test a global shutter setup so that we can compare it to our IMX412, but if that is not readily available then maybe we should try the 2x2 binned mode you mentioned, and possibly we could use two IMX412.
            We estimate to be flying at a speed of 4 m/s at a height of 5m, and our image height with the current lens at that height is 1m.
            So with a 4.2ms readout time that will be a max skew of ~18 pixels in 1920x1080 mode?
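That arithmetic can be checked with a short sketch; all numbers are the ones stated in the post (4 m/s, 1 m of ground covered over 1080 rows, 4.2 ms readout):

```python
# Rough rolling-shutter skew estimate from the numbers quoted above.
speed_mps = 4.0          # drone ground speed
image_height_m = 1.0     # ground distance covered by the image height at 5 m
readout_s = 4.2e-3       # sensor readout time in 2x2 binned mode
rows = 1080              # vertical resolution in 1920x1080 mode

ground_per_px = image_height_m / rows      # ~0.93 mm of ground per pixel row
skew_px = speed_mps * readout_s / ground_per_px
print(round(skew_px, 1))                   # ~18.1 px, matching the estimate
```

So the ~18 pixel figure checks out for the 2x2 binned 1080p mode.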
            It seems like the 1920x1080 can be reduced to 1280x720 with binning, but that is a factor of 1.5 and not 2x2. How does that binning work?

            Also, would electronic image stabilization also help to make the image clearer or is that simply to fight vibrations or unpredictable motion?

            Lastly, if I understand this correctly, we could try to:

            • Lower exposure time
            • Choose 1920x1080 resolution for faster readout time
            • Use the 2x2 binned mode for less rolling shutter distortion

            Peter

            • Alex Kushleyev (ModalAI Team) @Peter Lingås 0

              Hello @Peter-Lingås-0 ,

              I just checked and we have 0 stock of the color AR0144 cameras. At this moment, we do not have a plan to build more (due to lack of demand, as I mentioned). Unfortunately this means that we do not have any color global shutter cameras available.

              Regarding rolling shutter, please keep in mind that in 2x2 binned mode, the 4.2ms skew is top to bottom across a pretty large vertical FOV (94 degrees for the lens used in IMX412 camera). Depending on the size of the object that you are looking for, you can calculate the distortion due to motion + rolling shutter skew.
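As a rough sketch of that calculation (the object size, speed, and altitude here are illustrative assumptions; the 94-degree vertical FOV is from the lens mentioned above):

```python
import math

readout_s = 4.2e-3    # full readout time, 2x2 binned mode (from above)
rows = 1080           # image rows in 2x2 binned 1080p mode
object_rows = 100     # rows the object spans in the image (assumed example)
speed_mps = 4.0       # relative ground speed (assumed example)
altitude_m = 5.0      # flight altitude (assumed example)
vfov_deg = 94.0       # vertical FOV of the stock IMX412 lens

# ground coverage of the full image height at nadir
ground_h_m = 2 * altitude_m * math.tan(math.radians(vfov_deg / 2))
ground_per_px = ground_h_m / rows

# only the readout time spanning the object's own rows skews it locally
local_readout_s = readout_s * object_rows / rows
local_skew_px = speed_mps * local_readout_s / ground_per_px
print(round(local_skew_px, 2))    # well under one pixel
```

With the wide stock lens, the local skew across a small object comes out at a fraction of a pixel.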

              EIS can correct for camera rotation; however, motion compensation would require depth information and is a lot more complicated. We do support EIS on VOXL2:

              • https://docs.modalai.com/camera-video/electronic-image-stabilization/
              • here is the performance you might expect (from tests some time ago):
                • dual camera EIS on Starling 2 Max (hand carried) : https://www.youtube.com/watch?v=-BA_nU4kjQs
                • down-facing camera EIS on Starling 2 Max (flight) : https://www.youtube.com/watch?v=-B5xKmBBCAc

              You can see the rolling shutter effects present in the hand carried video. The down-facing video demo is a flight at about 5m, actually, might be close to what you are looking for. Both videos were actually shot in 4K (4056x3040 full frame) input mode (longer readout time) and then downscaled to 1080p during EIS (but EIS can also produce 4K output).

              Additional notes:

              • lower exposure time will not reduce rolling shutter skew, it will reduce motion blur
              • 2x2 binned mode automatically gives you 1920x1080 instead of 3840x2160, however you can also get 3840x2160 (or even full frame) and re-scale the image down. You will get better image quality (because the source image is higher resolution), but will have higher rolling shutter skew (12ms vs 4ms -- note it's not exactly 4 times difference due to how the camera operates internally).

              Since we don't have an option for global shutter color camera, and the EIS performance looks acceptable, you could try going with IMX412 + EIS (single or dual IMX412 cameras). BTW, VOXL2 can run dual EIS as well, as shown in one of the videos. EIS can work with either full resolution or 2x2 binned as input and output resolution can be arbitrary.

              Alex

              • Peter Lingås 0 @Alex Kushleyev

                @Alex-Kushleyev

                Alright, the AR0144 is definitely out of the question then.

                The quality of the video with EIS is very good and should definitely be good enough for us.
                It seems like the rolling shutter skew at 12ms is not a big issue for object detection, but I'm curious how it will perform when trying to analyze very small AprilTags from a distance. I estimate that the tags will cover about 2.5% of the image height.

                Do you have any data on the compute spent for dual 4K streams + 1080p EIS? I'm worried that would leave little room for a YOLOv8 tflite model on top of the other services and code that have to run.

                Also, if there is any documentation or other sources where I can read more about the 2x2 binning, that would be good. It's not entirely clear to me yet where and when the binning happens, how it affects readout, and whether the VOXL does the binning itself, which would require compute power.

                Peter

                • Alex Kushleyev (ModalAI Team) @Peter Lingås 0

                  @Peter-Lingås-0

                  • some approximate perf specs for EIS are mentioned here : https://docs.modalai.com/camera-video/electronic-image-stabilization/#performance
                    • in short, 4K60 (60fps) will use about 30% of the GPU, so two 4K30 would be similar GPU usage. CPU usage for EIS is minimal, as all the heavy lifting is done on the GPU.
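A back-of-envelope check of that claim, assuming EIS GPU load scales roughly with pixel throughput (an assumption, not a measured figure):

```python
# Two 4K30 streams move the same pixels per second as one 4K60 stream,
# so under a throughput-proportional model the GPU load comes out equal.
px_4k = 3840 * 2160
load_4k60 = 0.30                         # ~30% GPU for 4K60 (from the docs link)
throughput_4k60 = px_4k * 60
throughput_dual_4k30 = 2 * px_4k * 30
est_load = load_4k60 * throughput_dual_4k30 / throughput_4k60
print(est_load)                          # 0.3
```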

                  Does your YOLOv8 run on CPU or GPU?

                  There is no documentation on binning, but it is actually very simple: when we talk about 2x2 binning, we typically mean (unless otherwise noted) that the binning happens on the camera itself. Here is a diagram that shows how the pixels are typically binned in the RGGB Bayer pattern: https://www.1stvision.com/cameras/IDS/IDS-manuals/uEye_Manual/hw_binning.html
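A minimal software sketch of that 2x2 same-color binning on an RGGB mosaic (illustrative only; the sensor does this on-chip, and its analog implementation differs):

```python
import numpy as np

def bayer_bin2x2(raw: np.ndarray) -> np.ndarray:
    """Average 2x2 same-color neighbours of a Bayer mosaic.

    Input is a full-resolution RGGB mosaic; output is a half-resolution
    mosaic with the same RGGB pattern.
    """
    h, w = raw.shape
    assert h % 4 == 0 and w % 4 == 0
    out = np.zeros((h // 2, w // 2), dtype=raw.dtype)
    # each Bayer phase (R, Gr, Gb, B) sits on its own stride-2 grid
    for dy in (0, 1):
        for dx in (0, 1):
            plane = raw[dy::2, dx::2].astype(np.float64)
            binned = (plane[0::2, 0::2] + plane[0::2, 1::2] +
                      plane[1::2, 0::2] + plane[1::2, 1::2]) / 4
            out[dy::2, dx::2] = binned.astype(raw.dtype)
    return out
```

Binning same-color neighbours keeps the output a valid Bayer mosaic, which is why the ISP can debayer it exactly as it would a native half-resolution sensor.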

                  Because the camera does the binning internally, this particular camera can do its processing faster, and there is less data to send out, so the readout time is reduced. Please note that not all cameras that output 2x2 binned images actually reduce the readout time: some cameras are limited by the analog ADC stage, which still has to sample all pixels before binning them, so those cameras produce a 2x2 binned image with the same readout time. The readout time is mainly driven by how fast the camera can read the pixels (ADC conversion), process them, and send them out via MIPI. The MIPI output is usually not the bottleneck; the analog/digital processing is.

                  There is no overhead for VOXL2 to receive a 2x2 binned image; in fact, the power and CPU usage will be slightly reduced (less data to handle). However, you will lose detail if you use the 2x2 binned image.

                  For analyzing small AprilTags in the distance: well, the smaller they are, the lower the rolling shutter skew will be (locally) 🙂 so motion blur will probably dominate the distortion, and you would need to set the exposure low. The IMX412 has good low-light sensitivity, which allows you to reduce exposure and still get a good image.
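A quick sketch of the exposure-vs-blur trade-off, reusing the ~0.93 mm/pixel ground resolution and 4 m/s speed quoted earlier in the thread (the exposure values are illustrative):

```python
# Motion blur in pixels = ground motion during exposure / ground per pixel.
speed_mps = 4.0
ground_per_px_m = 1.0 / 1080     # ~0.93 mm per pixel, from the earlier numbers

for exposure_ms in (8.0, 2.0, 0.5):
    blur_px = speed_mps * (exposure_ms * 1e-3) / ground_per_px_m
    print(f"{exposure_ms} ms exposure -> ~{blur_px:.1f} px of motion blur")
```

At these numbers, motion blur dwarfs the sub-pixel local rolling shutter skew, which is why lowering exposure matters more for small tags.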

                  One last note: when using EIS, we typically use a full-frame image (4056x3040, not 3840x2160), which is a 4:3 aspect ratio and provides a lot of margin for stabilization of a 3840x2160 output. The readout time for the full-frame image is proportionally larger (16ms instead of 12ms); however, the local readout skew does not change (larger image = larger total rolling shutter skew). Also, EIS supports arbitrary output dimensions, so you can choose how much stabilization margin you want versus the stabilized FOV.
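The extra margin from feeding the 4:3 full frame into a 16:9 stabilized output can be sketched as:

```python
# Stabilization margin = unused input pixels on each side of the output crop.
in_w, in_h = 4056, 3040      # IMX412 full frame (4:3)
out_w, out_h = 3840, 2160    # stabilized 4K output (16:9)

margin_x = (in_w - out_w) / 2    # per-side horizontal margin in pixels
margin_y = (in_h - out_h) / 2    # per-side vertical margin in pixels
print(margin_x, margin_y)        # 108.0 440.0
```

The 4:3 input leaves far more vertical than horizontal margin; choosing a smaller output buys more margin at the cost of stabilized FOV.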

                  Out of curiosity, are you building a custom drone to support this effort or using one of our drones?

                  Alex

                  • Peter Lingås 0 @Alex Kushleyev

                    @Alex-Kushleyev

                    Thank you for the thorough reply.
                    We are a student organization (~40 people) working on a custom drone swarm based on your ecosystem.
                    We would be happy to meet online with you and your team to talk more about our efforts, if that works for you.

                    We are currently working on integrating our code and making custom services on the VOXL.
                    The plan is to run the YOLOv8 on the GPU.
                    Considering the EIS runs on the GPU, I don't think compute power will be much of an issue.

                    It didn't hit me that the local distortion for the AprilTags would be so little; that's a great point.
                    I think the optimal solution for us is to have a dual IMX412 setup with 4K and then a 2x2 binned 1920x1080 -> 1280x720.
                    We will be operating in a well-lit area, so I think we can lower the exposure time significantly to reduce the motion blur.

                    Peter

                    • Peter Lingås 0 @Alex Kushleyev

                      @Alex-Kushleyev

                      Also, I came across this comment in your source code: https://gitlab.com/voxl-public/voxl-sdk/services/voxl-tflite-server/-/blame/master/src/model_helper/yolov8_model_helper.cpp#L167

                      Is this an old comment, or does YOLOv8 with the GPU delegate still not work in SDK 1.6?
                      Potentially we would also want to use YOLOv11 for its improved mAP.

                      Peter
