Problem with bitrate and OMX encoder
-
Hi,
I have recently started exploring the video interface of the VOXL2, utilising the Hires camera setup. Unfortunately, some issues arose when working with the provided OMX encoder.
When using the preinstalled voxl-streamer application with a bitrate of 1 Mbps and a resolution of 640x480 in NV12 format, the output image is grainy.
I tried writing my own GStreamer plugin (voxlsrc) and a pipeline that encodes frames using omxh264enc and sends them over RTP:

gst-launch-1.0 voxlsrc device=hires ! video/x-raw,width=640,height=480,format=NV12,framerate=30/1 ! videoconvert ! omxh264enc control-rate=2 target-bitrate=1000000 ! video/x-h264,width=640,height=480,profile=main ! h264parse ! rtph264pay
On my own computer, I am able to produce a clear image at the same 1 Mbps bitrate by simply swapping the encoder to x264enc (a CPU encoder):
gst-launch-1.0 v4l2src ! videoscale ! videoconvert ! video/x-raw,width=640,height=480,framerate=30/1,format=NV12 ! x264enc tune=zerolatency bitrate=1000 ! h264parse ! avdec_h264 ! xvimagesink sync=false
When using voxl-portal or voxl-streamer with a bitrate of 10 Mbps or higher, we see a clear picture. Bitrate is a key element in our system, and we are unable to support the current bitrate (10 Mbps or higher). Is there a method to reduce the bitrate and still maintain a clear picture?
Links to output pictures: https://postimg.cc/gallery/3RR7j5V
Thanks,
Amit Hershkovich.
-
You need to configure the resolution in voxl-camera-server to get a resolution higher than 640x480.
-
@Amither There are multiple parts of the processing chain that can affect video quality. What type of network link are you using (e.g. WiFi, Ethernet, etc.)? I often start with Ethernet to make sure that I can get everything working well without having to deal with the uncertainties of a wireless link. Then, when everything is as good as I can get it over Ethernet I will switch to wireless and see how that affects the quality. Just out of curiosity, what does your custom GStreamer plugin voxlsrc do?
-
Hi @Chad-Sweet,
I tried configuring a higher resolution (720p) by editing /etc/modalai/voxl-camera-server.conf as follows; however, the output stays the same.

voxl2:/$ cat /etc/modalai/voxl-camera-server.conf
{
    "version": 0.1,
    "cameras": [{
        "name": "tracking",
        "enabled": true,
        "frame_rate": 30,
        "type": "ov7251",
        "camera_id": 0,
        "ae_desired_msv": 60,
        "ae_filter_alpha": 0.600000023841858,
        "ae_ignore_fraction": 0.20000000298023224,
        "ae_slope": 0.05000000074505806,
        "ae_exposure_period": 1,
        "ae_gain_period": 1
    }, {
        "name": "hires",
        "enabled": true,
        "frame_rate": 30,
        "type": "imx214",
        "camera_id": 1,
        "preview_width": 1280,
        "preview_height": 720,
        "snapshot_width": 3840,
        "snapshot_height": 2160
    }]
}
Hi @Eric-Katzfey,
We are working over Ethernet, which should provide the best output. I tried implementing a simple GStreamer plugin that allows me to pull a video stream from the camera server. I am sharing the repo in case you find it useful:
https://github.com/amit-hers/Voxl-Plugin

Thanks,
Amit
-
Hi,
In addition, when trying to work with the omxh264enc plugin provided by Qualcomm, I am facing the following issues:
- The picture is completely distorted, as can be seen here:
https://www.youtube.com/watch?v=niex_WXB6iI
The CLI command is:
sudo gst-rtsp-launch "( qtiqmmfsrc camera=1 device=hires latency=0 ! video/x-raw,width=640,height=480,format=NV12,framerate=30/1 ! omxh264enc qos=1 ! video/x-h264, profile=(string)high, width=(int)640, height=(int)480, framerate=(fraction)30/1, format=(string)NV12, interlace-mode=(string)progressive, chroma-format=(string)4:2:0,profile-level-id=428014 ! h264parse ! rtph264pay name=pay0 pt=96 )"
- I did manage to receive a clear stream by using Qualcomm's SDK and Qmmf server - https://www.youtube.com/watch?v=e0B045-gBUA
with the following CLI command:
sudo gst-rtsp-launch "( qtiqmmfsrc camera=1 device=hires video_0::bitrate=1200000 ! video/x-h264,width=640,height=480,format=NV12,framerate=30/1 ! h264parse ! rtph264pay name=pay0 pt=96 )"
When I try to use voxlsrc with omxh264enc, which rides on top of voxl-camera-server, I see a grainy picture; omxh264enc can't handle a low bitrate in that situation: https://www.youtube.com/watch?v=UXO9PGNwfE8
voxlsrc device=hires ! video/x-raw,width=640,height=480,format=NV12,framerate=30/1 ! videoconvert ! omxh264enc control-rate=2 target-bitrate=1200000 ! video/x-h264,width=640,height=480,profile=main ! h264parse ! rtph264pay name=pay0 pt=96
Our scenario requires two different streams: one encoded and the other a raw output.
Our current setup works with GStreamer and v4l2src / v4l2h264enc. Since VOXL uses a Qualcomm-based board, I tried converting the current implementation to use GStreamer with qmmfsrc, voxlsrc (my own plugin), and omxh264enc. My question is: how can I make omxh264enc work with qmmfsrc or voxlsrc?
I hope this clarifies the situation.
Thanks,
Amit
-
I'm experiencing similar issues with omxh264enc showing grainy video. Using identical GStreamer pipelines except for swapping the x264enc and omxh264enc elements, I see clear video with SW-only encoding but grainy video with the OpenMAX element. Any updates on the omxh264enc GStreamer element?
-
@Connor-Fuhrman and @Amither,
Our best suggestion is to use voxl-camera-server, which allows you to concurrently output encoded frames alongside raw YUV frames (at different resolutions!). Perhaps more details on voxl-camera-server.conf will help. Here is a snippet:
{ "version": 0.1, "cameras": [{ "type": "imx214", "name": "hires", "enabled": true, "camera_id": 1, "fps": 30, "en_preview": true, "preview_width": 1280, "preview_height": 720, "en_small_video": false, "small_video_width": 1280, "small_video_height": 720, "small_venc_mode": "h265", "small_venc_br_ctrl": "cqp", "small_venc_Qfixed": 30, "small_venc_Qmin": 15, "small_venc_Qmax": 40, "small_venc_nPframes": 9, "small_venc_mbps": 5, "en_large_video": true, "large_video_width": 3840, "large_video_height": 2160, "large_venc_mode": "h265", "large_venc_br_ctrl": "cbr", "large_venc_Qfixed": 38, "large_venc_Qmin": 15, "large_venc_Qmax": 50, "large_venc_nPframes": 29, "large_venc_mbps": 1, "en_snapshot": false, "en_snapshot_width": 1920, "en_snapshot_height": 1080, "ae_mode": "off", "en_rotate": false, "en_raw_preview": false }] }
In this configuration, the large encoded video stream will be 4K30; there is also a smaller encoded stream available at 720p, and a "preview" frame in YUV is available at 720p (1280x720).
So with this configuration, you can use the voxl-streamer application to stream the large or small encoded video, and use the standard MPA interface to subscribe to raw YUV frames and process them. Please note that if you change the YUV frame size to 4K, there will be a lot of data moving between processes; see this related post: https://forum.modalai.com/topic/2980/how-to-use-gstreamer-with-mipi-camera-imx412-hi-res-cam-in-voxl2

Also, please note that in this example I set the small encoded video's bitrate control to cqp (variable) but the large one to cbr (constant bit rate). You can change those as you need and experiment. Also, please note that it is a known issue that venc_mbps is not correctly reflected in the actual bitrate of the video, but it does have an effect, and you can adjust the setting until you reach the desired quality / video size.

If you have any more questions about this approach, please let us know.
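To make this concrete, here is a sketch of a voxl-streamer configuration pointed at the small encoded stream. The pipe name hires_small_encoded is typical but should be verified against the handles under /run/mpa on your system, and the bitrate field should only matter when voxl-streamer is encoding a raw input itself:

{
    "input-pipe": "hires_small_encoded",
    "bitrate": 1000000,
    "decimator": 1,
    "port": 8900,
    "rotation": 0
}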
-
@Alex-Kushleyev thank you for the quick reply! My application requires the ability to encode an arbitrary image frame, not necessarily directly from a camera stream. Can you please comment on the feasibility of using the modal-pipes library to provide frame data to the video server (or maybe the proper module is voxl-streamer?)? I've used this solution in the past to provide data to the TensorFlow Lite server, so I'm hoping the same approach can be taken here.

From this file on ModalAI's GitLab it looks like I can configure voxl-streamer to accept a raw image frame in YUV colorspace? I can then make the pipe contain arbitrary image data, no?
-
@Connor-Fuhrman, yes, you can send raw frames (YUV) to voxl-streamer and it will use a hardware encoder to encode the frames into a video. You just need to publish the frames via MPA to a new "handle" and tell voxl-streamer which handle to use.

So your processing application can subscribe to YUV frames, do some work on the frames, annotate them, and send them out under a different handle via MPA. Just keep in mind that sending very large frames over MPA (such as 4K) will incur noticeable CPU usage.
Do you need an example of how to publish a YUV frame via MPA? It sounded like you already know how to do it.
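In case it helps anyway, here is a minimal sketch of a standalone YUV publisher, built from the same pipe_server_create / pipe_server_write_camera_frame calls that appear in the code later in this thread. The pipe name, the header name, and the 720p geometry are placeholders, so treat this as a starting point rather than a verified program:

#include <chrono>
#include <cstdint>
#include <cstring>
#include <vector>

#include <modal_pipe_server.h> // assumed libmodal-pipe server header

int main()
{
    // Describe the pipe; the name becomes the "handle" that voxl-streamer subscribes to
    pipe_info_t info;
    std::memset(&info, 0, sizeof(info));
    std::strcpy(info.name, "my_yuv_stream"); // placeholder handle
    std::strcpy(info.type, "camera_image_metadata_t");
    std::strcpy(info.server_name, "yuv-publisher");
    info.size_bytes = 16 * MODAL_PIPE_DEFAULT_PIPE_SIZE; // room for several frames

    if (pipe_server_create(0, info, 0) != 0) return -1;

    // Fixed metadata for a 1280x720 YUV420 frame
    const int w = 1280, h = 720;
    camera_image_metadata_t meta;
    std::memset(&meta, 0, sizeof(meta));
    meta.magic_number = CAMERA_MAGIC_NUMBER;
    meta.width = static_cast<std::int16_t>(w);
    meta.height = static_cast<std::int16_t>(h);
    meta.stride = w;                 // luma row stride in bytes
    meta.format = IMAGE_FORMAT_YUV420;
    meta.size_bytes = w * h * 3 / 2; // full-res Y plane + quarter-res U and V planes
    meta.framerate = 30;

    // Flat gray test frame (Y = 128, U = V = 128 is neutral chroma)
    std::vector<std::uint8_t> frame(meta.size_bytes, 128);

    // Publish one frame; a real application would loop at the camera rate
    meta.timestamp_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now().time_since_epoch()).count();
    meta.frame_id++;
    return pipe_server_write_camera_frame(0, meta, frame.data());
}

voxl-streamer's input-pipe would then be set to my_yuv_stream.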
-
@Alex-Kushleyev I've got this working using a custom ModalAI pipe sending an RGB image (I tried YUV because I thought that was the expected input, but looking through the source code I saw that voxl-streamer could accept RGB as well).

I am unfortunately only successful with a 480p image and nothing at a higher resolution. I do not get any error messages writing to the pipe with a larger image size, and I don't see voxl-streamer receive an initial frame. Do you have thoughts as to why I'm not seeing any output from voxl-streamer and not getting an error when writing to the pipe?

I've attached the logs from my application and from running /usr/bin/voxl-streamer -v 0.

My configuration file is:
{ "input-pipe": "ros_to_voxl_streamer", "bitrate": 10000000, "decimator": 1, "port": 8900, "rotation": 0 }
The logs from voxl-streamer can be found here for 480p images and here for 720p images. The logs from my application can be found here for the 480p image output and here for the 720p image output.

When I write to the pipe, my function is:
bool image_pipe::write_frame(cv::Mat frame_bgr)
{
    namespace time = std::chrono;
    log_verbose(fmt::format("Input frame has size [width={}, height={}]",
                            frame_bgr.size().width, frame_bgr.size().height));

    static auto first_frame = true;
    static std::size_t num_frame_write_errors = 0;
    static constexpr std::size_t num_acceptable_frame_write_errors = 10;

    // Resize the image to the output size specified
    cv::resize(frame_bgr, frame_bgr, img_size);
    log_verbose(fmt::format("Resized input frame to size [width={}, height={}]",
                            frame_bgr.size().width, frame_bgr.size().height));

    cv::Mat frame_rgb;
    cv::cvtColor(frame_bgr, frame_rgb, cv::COLOR_BGR2RGB);

    // If this is the first frame we've sent then configure the camera metadata
    if (first_frame) {
        camera_mdata.height = static_cast<std::int16_t>(img_size.height);
        camera_mdata.width = static_cast<std::int16_t>(img_size.width);
        camera_mdata.size_bytes = static_cast<std::int32_t>(frame_rgb.total()) *
                                  static_cast<std::int32_t>(frame_rgb.elemSize());
        camera_mdata.stride = static_cast<std::int32_t>(frame_rgb.step[0]);
        log_info(fmt::format("First frame written to image_pipe object. Metadata:\n{}",
                             describe_camera_pipe_metadata()));
        first_frame = false;
    }

    camera_mdata.timestamp_ns = time::duration_cast<time::nanoseconds>(
        time::steady_clock::now().time_since_epoch()).count();
    log_verbose(fmt::format("Sending image through pipe with metadata\n{}",
                            describe_camera_pipe_metadata()));

    const auto* frame_data = reinterpret_cast<const void*>(frame_rgb.data);
    const auto ret = pipe_server_write_camera_frame(0, camera_mdata, frame_data);
    if (ret != 0) {
        if (num_frame_write_errors++ == num_acceptable_frame_write_errors) {
            throw std::runtime_error("Maximum number of consecutive frame write errors reached");
        }
        log_warning(fmt::format("Was not able to write a frame to the pipe. "
                                "There are {} more error(s) left before failure",
                                num_acceptable_frame_write_errors - num_frame_write_errors));
        return false;
    }
    num_frame_write_errors = 0;

    if (++camera_mdata.frame_id % 100 == 0) {
        log_verbose(fmt::format("Camera metadata:\n{}", describe_camera_pipe_metadata()));
    }
    return true;
}
Note that the calls to the log_ functions are what generate the above-linked log files.

The construction of the pipe occurs here:
static inline pipe_info_t make_pipe_info(const std::string_view pipe_name)
{
    pipe_info_t info; // = PIPE_INFO_INITIALIZER;
    std::strcpy(info.name, pipe_name.data());
    std::strcpy(info.type, "camera_image_metadata_t");
    std::strcpy(info.server_name, "ros-voxl-streamer");
    info.size_bytes = 2 * MODAL_PIPE_DEFAULT_PIPE_SIZE;
    return info;
}

image_pipe::image_pipe(std::string pipe_name, cv::Size _img_size)
    : voxl_pipe_info(make_pipe_info(pipe_name)),
      img_size(_img_size)
{
    // Configure the camera_mdata member
    camera_mdata.magic_number = CAMERA_MAGIC_NUMBER;
    camera_mdata.exposure_ns = -1;
    camera_mdata.gain = -1;
    camera_mdata.format = IMAGE_FORMAT_RGB; // IMAGE_FORMAT_YUV420;
    camera_mdata.frame_id = 0;
    camera_mdata.framerate = 5;

    const auto ret = pipe_server_create(0, voxl_pipe_info, 0);
    if (ret != 0) {
        throw std::runtime_error("Cannot create pipe");
    }
}
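For reference, a minimal usage of the class above looks like this (the pipe name matches the input-pipe in my config, and the solid-color frame is just a stand-in for real image data):

// Create the pipe, then push one 640x480 BGR test frame through it
image_pipe pipe("ros_to_voxl_streamer", cv::Size(640, 480));
cv::Mat test_frame(480, 640, CV_8UC3, cv::Scalar(0, 255, 0)); // solid green, BGR order
pipe.write_frame(test_frame); // resized, converted to RGB, and published over MPA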
-
@Connor-Fuhrman, I am not sure.
Please try to enable debug-level logging so that you may see some errors (both in your application and in voxl-streamer).
In your application you can add the following:
M_JournalSetLevel((M_JournalLevel) debugLevel);
voxl-streamer already has this option; you just need to pass a command-line option: https://gitlab.com/voxl-public/voxl-sdk/services/voxl-streamer/-/blob/master/src/main.c?ref_type=heads#L383
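For completeness, a minimal sketch of enabling this in an application (verify the header name and the M_DEBUG macro against your libmodal-journal version; level 0 is presumably the most verbose, matching the voxl-streamer -v 0 invocation earlier in the thread):

#include <modal_journal.h> // assumed header for libmodal-journal

int main()
{
    // Lower levels print more; 0 selects the most verbose output
    int debugLevel = 0;
    M_JournalSetLevel((M_JournalLevel) debugLevel);

    // Messages at or above the configured level are now printed
    M_DEBUG("debug logging enabled\n");
    return 0;
}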