Posts made by blossomrd

blossomrd

@Alex-Kushleyev Hi Alex, awesome! - and that's a good question. We don't necessarily know where the text will be in the image beforehand, so we do have to look for it. On the other hand, looking for text and reading it are two different stages, so we are planning to try a pipeline where we first find text at a lower resolution/lower quality and then process selected crops at higher resolution and/or with more processing. In short, debayering only selected parts of the image is something we are definitely interested in if it is possible, but we only know which regions to process at runtime, not before we look at the image. Thanks!
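As a rough illustration of the two-stage idea above (not tested on VOXL hardware), the sketch below builds a half-resolution RGB preview from a raw Bayer frame for the text-finding pass, then cuts out a raw crop whose corners are snapped to even coordinates so the crop keeps the same Bayer phase as the full frame. The RGGB layout, the even frame dimensions, and the function names `half_res_preview` and `crop_raw_aligned` are assumptions made for the example; the actual sensor pattern would need to be checked.

```python
import numpy as np


def half_res_preview(raw):
    """Collapse each 2x2 Bayer cell into one RGB pixel (assumes RGGB layout)."""
    r = raw[0::2, 0::2].astype(np.float32)
    # Average the two green samples in each cell.
    g = (raw[0::2, 1::2].astype(np.float32) + raw[1::2, 0::2].astype(np.float32)) / 2.0
    b = raw[1::2, 1::2].astype(np.float32)
    return np.stack([r, g, b], axis=-1)


def crop_raw_aligned(raw, x, y, w, h):
    """Return a raw crop covering the requested box, expanded to even coordinates.

    Starting the crop on an even row and column keeps the Bayer phase
    unchanged, so the crop can be demosaiced like the full frame.
    Assumes the frame width and height are even.
    """
    height, width = raw.shape
    x0 = max(0, x - (x % 2))
    y0 = max(0, y - (y % 2))
    x1 = min(width, x + w + ((x + w) % 2))
    y1 = min(height, y + h + ((y + h) % 2))
    return raw[y0:y1, x0:x1], (x0, y0, x1, y1)
```

A box found in the preview maps back to full-resolution coordinates by multiplying by 2; the resulting raw crop can then be handed to whichever higher-quality demosaicing method is being evaluated.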

blossomrd

@Alex-Kushleyev
Hi Alex, thanks for the response and sorry for the delay. We've been running some preliminary experiments and are already getting better results from simple (and in retrospect maybe obvious) improvements, like using the NV12 images (optionally encoded at high JPG quality) rather than the default JPG images provided by the snapshot tool.
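For anyone trying the same thing, here is a minimal sketch of splitting an NV12 buffer into its planes so the full-resolution luma can be fed to OCR directly. NV12 stores a full-resolution Y plane followed by a half-resolution plane of interleaved U and V samples. The sketch assumes a tightly packed buffer with no row padding, which may not match what a given camera pipeline delivers; `nv12_to_planes` is a name made up for this example.

```python
import numpy as np


def nv12_to_planes(buf, width, height):
    """Split a tightly packed NV12 buffer into Y, U and V planes.

    Assumes no stride padding and even width and height.
    """
    data = np.frombuffer(buf, dtype=np.uint8)
    expected = width * height * 3 // 2
    if data.size != expected:
        raise ValueError(f"expected {expected} bytes, got {data.size}")
    y = data[: width * height].reshape(height, width)
    # Chroma is subsampled 2x in both directions, with U and V interleaved.
    uv = data[width * height:].reshape(height // 2, width // 2, 2)
    return y, uv[..., 0], uv[..., 1]
```

The Y plane on its own is a full-resolution grayscale image, which is often enough for text detection and avoids any chroma upsampling.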

However, we would definitely be interested in trying out the raw bayer images - going forward we might be looking at trying out some more experimental ways of processing the image data, probably not in real time but after data collection. I made a quick attempt at getting these images with the preview stream but couldn't get it working, so some explicit steps would be appreciated :).

The resolution we are interested in is the (native?) resolution, 4056 x 3040, with the IMX412. More system details:

system-image: 1.7.4-M0054-14.1a-perf
kernel:       #1 SMP PREEMPT Fri Feb 9 22:38:25 UTC 2024 4.19.125
--------------------------------------------------------------------------------
hw platform:  M0054
mach.var:     1.2
--------------------------------------------------------------------------------
voxl-suite:   1.1.3-1

As for background information, our application is reading small text (via OCR) at a distance, such that we might only get a few pixels per character. Clearly our ability to do so also depends on the lenses, gain, etc., so we are optimizing those in parallel. There is also some ML-based processing that will happen after data capture, but the outcome will likely be better the more information we can get from the image, hence working with raw data is something we are interested in exploring. Luckily, we don't have to do this at frame rate - we will do this offline, after data capture, so we can afford to try some more sophisticated but slower methods. Moreover, we only need to do this for images captured every few seconds, rather than continuously as in a video. (That said, we are still exploring, and might also consider a burst capture of 3-4 frames as an alternative way to get more information out of the images.)
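To make the burst idea concrete, the simplest way to combine a short burst is a per-pixel average, which reduces independent noise by roughly the square root of the number of frames. The sketch below assumes the frames are already aligned and the scene is static, which will not hold on a moving platform without a registration step; `average_burst` is a hypothetical helper name.

```python
import numpy as np


def average_burst(frames):
    """Average a burst of same-sized frames in float32.

    Assumes the frames are already registered to each other.
    """
    if len(frames) == 0:
        raise ValueError("need at least one frame")
    # Convert before summing so integer pixel types cannot overflow.
    stack = np.stack([np.asarray(f, dtype=np.float32) for f in frames])
    return stack.mean(axis=0)
```

The same function works on raw Bayer frames as long as every frame shares the same Bayer phase, since each pixel is only ever averaged with the same pixel position in the other frames.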

blossomrd

We are also seeing the repeating image artifact, even after using the ISPs linked in https://forum.modalai.com/topic/3111/hi-res-image-quality-difference-voxl-and-voxl2/7.

blossomrd

@Alex-Kushleyev Hello! We're just starting a new project where we need to get as much detail from high-resolution images as we can, so this is pretty relevant for us. I can confirm that the new ISP is much better for our purposes; in the older firmware, the processing tended to oversmooth images and introduce artifacts. See, for example, the included image, which compares the old and new ISP. Note the conditions are not exactly the same, but hopefully it shows what I'm talking about; with the old ISP the letters are kind of smushed and there are some ringing artifacts. So thanks for the new ISP.

One thing that would be useful is flipped versions of these; currently we're getting all our images upside down ;).

Even more useful, as you touched on, would be to have some more control on the ISP side, as well as access to the raw images - I understand there's a tradeoff in efficiency and compressibility for video - in our case, we are willing to sacrifice latency for quality. Thanks!