VOXL2 RTSP decoding fails
-
@Alex-Kushleyev Thanks for the quick response. I tried only viewing the RTSP-to-MPA stream over voxl-portal and it failed once. On the next reboot it did not fail there, but it failed when I opened the tflite MPA stream. So my suspicion is the MPA publishing. The failure also happens if I close the RTSP MPA stream by clicking the "VOXL-PORTAL" icon and then open it again. Also, on tflite I am using YoloV5 and passing an IMAGE_FORMAT_RAW8 1920x1080 stream after RTSP decoding.
-
@Aaky, are you able to try a different resolution and/or a different RTSP source, just to see if this behavior is consistent?
-
@Alex-Kushleyev I tried 640x480 resolution and it is working consistently across multiple reboots. I am still stress testing this, so I am not sure whether it might still fail at some point. I have now shifted back to hardware decoding at 640x480.
-
@Alex-Kushleyev Update: I was testing 640x480 with the hardware-encoded stream. This time I opened the RTSP stream in QGC and, in parallel, opened the tflite MPA stream before the RTSP one in voxl-portal, and the VOXL hit some hard fault that made it reboot. There seems to be some memory leak or corruption happening in the MPA publish pipeline (just an assumption). Please advise on how to proceed.
-
@Aaky, OK, i will double check if the python mpa publisher has a memory leak, this should be easy to verify.
Alex
-
@Aaky,
I ran the example from here for 4 minutes and did not observe any memory leak. I tried using the SW decoder and the HW decoder with resize (as provided in the example).
This is for SW decoder (hence high cpu usage)
You can use `top` to check memory usage while you are running the test.
In my test, i had the following setup:
- `voxl-camera-server` creating h264 stream 1920x1080 from imx214 camera
- using `voxl-streamer -i hires_large_encoded` to create rtsp stream
- using `python3 rtsp_rx_mpa_pub.py` to receive rtsp stream and publish a new image
- using `voxl-portal` to live stream the `rtsp-debug` mpa image stream to the browser
You could try a similar setup to see if the issue occurs with a local stream.
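If you want something more repeatable than watching `top`, here is a minimal sketch that samples the resident memory (VmRSS) of a process from `/proc` and reports the growth over the test. It assumes a Linux `/proc` filesystem; the helper names (`parse_rss_kib`, `read_rss_kib`, `sample_rss`, `rss_growth_kib`) are made up for illustration and are not part of any VOXL tool.

```python
import re
import time


def parse_rss_kib(status_text):
    """Extract the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS not found in status text")
    return int(match.group(1))


def read_rss_kib(pid):
    """Read the current resident set size of a running process, in KiB."""
    with open(f"/proc/{pid}/status") as f:
        return parse_rss_kib(f.read())


def sample_rss(pid, count=10, interval_s=1.0):
    """Collect `count` RSS samples, `interval_s` seconds apart."""
    samples = []
    for _ in range(count):
        samples.append(read_rss_kib(pid))
        time.sleep(interval_s)
    return samples


def rss_growth_kib(samples):
    """Difference between the last and the first sample, in KiB."""
    return samples[-1] - samples[0]
```

For example, `rss_growth_kib(sample_rss(pid, count=240, interval_s=1.0))` with the PID of the running `rtsp_rx_mpa_pub.py` covers about 4 minutes. A value that keeps climbing across repeated runs would hint at a leak; some growth right after startup is normal.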
-
@Alex-Kushleyev Alex, I confirm that when I tried the local stream with the tracking camera (since I don't have a hires camera on my VOXL), I still get a hard fault, even if I only run "voxl-inspect-cam -a". Can you please urgently send me the deb files that you installed on top of SDK 1.1.3 on your setup, and also guide me further? Can this be a hardware issue?
-
@Aaky , this is strange. Can you please disable the tflite server and try again? Just running camera server and inspect cam should not reboot voxl2. Also, how are you powering your voxl2?
-
@Alex-Kushleyev I tried disabling tflite. I ran rtsp_rx_mpa_pub.py with the tracking camera RTSP URL, went to voxl-portal and viewed the rtsp-debug MPA pipe, then went back and forth with the voxl-portal home page, and the VOXL rebooted again. We are powering the VOXL in the standard configuration: the power module has one end connected to a 4S battery and the other end to the ESC, and it supplies regulated power to the VOXL2. The trigger seems to be opening the MPA pipe / RTSP URL multiple times. Sometimes the fault occurs the first time, and sometimes randomly on the n-th time.
-
@Alex-Kushleyev Can you try to play the same python file which I have attached above?
-
@Aaky, sure i will try it later today
-
@Alex-Kushleyev Thanks. One more observation: I tried setting the FPS to 30 in the rtsp_rx_mpa_pub python script and the faults have stopped for the moment. Is the FPS or any other parameter important in the RTSP MPA -> tflite -> streamer pipeline?
-
@Alex-Kushleyev Update: changing the FPS doesn't help. I am still facing the failure. On my Google Drive above I have uploaded my latest failing RTSP decoding python script, the voxl-opencv debian, my startup service for kickstarting the RTSP decoding script, and also the voxl-mpa-tools debian. Please install them and see if there is any problem on SDK 1.1.3. I cloned voxl-mpa-tools from here with branch pympa-experimental, and I downloaded voxl-opencv from this thread. There is some incompatibility causing this failure. Please let me know your analysis.
I even tried skipping RTSP decoding entirely, feeding the tracking camera directly to the tflite model and then RTSP streaming, and that is also having a hard time. I am clueless about these failures. Also, at random, the VOXL keeps rebooting and never comes out of the reboot cycle when tflite is active. Please help.
-
@Alex-Kushleyev This issue is extremely urgent for me because of an upcoming demonstration. It would be really helpful if you could provide a solution for this problem as soon as possible.
-
@Alex-Kushleyev One more update on this: I am using libmodal-pipe version 2.10.0 on my SDK 1.1.3. I guess this version of libmodal-pipe is the one supported on the next SDK, 1.2.0. Can this be a problem? I think it came in as a dependency while installing voxl-mpa-tools.
Say I update to SDK 1.2.0, what exact voxl-opencv and voxl-mpa-tools versions should be installed? I think they are conflicting somewhere, leading to hard faults.
-
@Alex-Kushleyev Apologies for the trailing messages. Any update on this problem?
-
@Aaky The source of truth for SDK 1.2 packages can be found here: http://voxl-packages.modalai.com/dists/qrb5165/sdk-1.2/binary-arm64/
-
@Aaky , sorry for the delay.
You can see what packages are shipped with each SDK here: https://docs.modalai.com/sdk-1.1-release-notes/#sdk-113-package-list (link is pointing to SDK 1.1.3). Tom also provided the address above where all the packages are available for download for each major release. Also, you can see the tags in the actual git repo for each package, for example here:
You can see that SDK-1.1.0 released version v2.9.2 and the only other release was for SDK 1.2.0.
When I originally made the post with pympa tools (including the rtsp example), i was using SDK 1.1.3 and everything was working fine. I used the libmodal-pipe that shipped with SDK.
I just installed SDK 1.2.0 and then on top of that i installed the opencv with python and voxl-mpa-tools i posted before, i am re-posting the links for clarity:
https://storage.googleapis.com/modalai_public/temp/voxl2-misc-packages/voxl-opencv_4.5.5-3_arm64.deb
https://storage.googleapis.com/modalai_public/temp/voxl2-misc-packages/voxl-mpa-tools_1.1.5_arm64.deb
Then i connected OV7251 tracking camera to my VOXL2 with the basic `voxl-camera-server.conf` which just publishes raw8 640x480 image.
Next, i ran `voxl-streamer -i tracking` to encode and create an rtsp stream from the raw8 images.
Finally, i just updated the rtsp address (just `stream_url = 'rtsp://127.0.0.1:8900/live'`) in the modified `rtsp_rx_mpa_pub.py` script that you shared.
And the last part is i started voxl-portal to view the rtsp stream that is re-published back to mpa as `rtsp-debug` mpa channel.
So.. everything is working fine, no issues, no reboots.
Doing further investigation, i looked at `dmesg -w` output while the script is running and i saw messages like the following:
[ 2306.392927] msm_vidc: err : 00000002: h264d: qbuf cache ops failed: CAPTURE: idx 15 fd 74 off 0 daddr dc900000 size 786432 filled 0 flags 0x0 ts 0 refcnt 2 mflags 0x1, extradata: fd 80 off 245760 daddr de7bc000 size 16384 filled 0 refcnt 2
[ 2306.424439] msm_vidc: err : 00000002: h264d: dqbuf cache ops failed: CAPTURE: idx 16 fd 76 off 0 daddr dc800000 size 786432 filled 786432 flags 0x10 ts 11920339000 refcnt 2 mflags 0x0, extradata: fd 80 off 262144 daddr de7c0000 size 16384 filled 16384 refcnt 2
[ 2306.426040] msm_vidc: err : 00000002: h264d: qbuf cache ops failed: CAPTURE: idx 16 fd 76 off 0 daddr dc800000 size 786432 filled 0 flags 0x0 ts 0 refcnt 2 mflags 0x1, extradata: fd 80 off 262144 daddr de7c0000 size 16384 filled 0 refcnt 2
[ 2306.457763] msm_vidc: err : 00000002: h264d: dqbuf cache ops failed: CAPTURE: idx 17 fd 78 off 0 daddr dc700000 size 786432 filled 786432 flags 0x10 ts 11953488000 refcnt 2 mflags 0x0, extradata: fd 80 off 278528 daddr de7c4000 size 16384 filled 16384 refcnt 2
[ 2306.461468] msm_vidc: err : 00000002: h264d: qbuf cache ops failed: CAPTURE: idx 17 fd 78 off 0 daddr dc700000 size 786432 filled 0 flags 0x0 ts 0 refcnt 2 mflags 0x1, extradata: fd 80 off 278528 daddr de7c4000 size 16384 filled 0 refcnt 2
[ 2306.491300] msm_vidc: err : 00000002: h264d: dqbuf cache ops failed: CAPTURE: idx 0 fd 44 off 0 daddr dd800000 size 786432 filled 786432 flags 0x10 ts 11987146000 refcnt 2 mflags 0x0, extradata: fd 80 off 0 daddr de780000 size 16384 filled 16384 refcnt 2
[ 2306.492874] ion_sgl_sync_range: 291 callbacks suppressed
[ 2306.492878] Partial cmo only supported with 1 segment is dma_set_max_seg_size being set on dev:kgsl-3d0
[ 2306.492892] msm_vidc: err : 00000002: h264d: qbuf cache ops failed: CAPTURE: idx 0 fd 44 off 0 daddr dd800000 size 786432 filled 0 flags 0x0 ts 0 refcnt 2 mflags 0x1, extradata: fd 80 off 0 daddr de780000 size 16384 filled 0 refcnt 2
[ 2306.524454] Partial cmo only supported with 1 segment is dma_set_max_seg_size being set on dev:kgsl-3d0
[ 2306.524479] msm_vidc: err : 00000002: h264d: dqbuf cache ops failed: CAPTURE: idx 1 fd 46 off 0 daddr dd700000 size 786432 filled 786432 flags 0x10 ts 12020491000 refcnt 2 mflags 0x0, extradata: fd 80 off 16384 daddr de784000 size 16384 filled 16384 refcnt 2
[ 2306.526015] kgsl_iommu_fault_handler: 141 callbacks suppressed
[ 2306.526025] kgsl kgsl-3d0: GPU PAGE FAULT: addr = 500211000 pid= 10785 name=python3
[ 2306.526052] kgsl kgsl-3d0: context=gfx3d_user TTBR0=0x30001c873f000 CIDR=0x2a21 (read translation fault)
[ 2306.526096] kgsl kgsl-3d0: FAULTING BLOCK: UCHE: TP
[ 2306.526111] kgsl kgsl-3d0: ---- nearby memory ----
[ 2306.526134] kgsl kgsl-3d0: [0000000500130000 - 0000000500211000] (pid = 10785) (2d) ..
so there are actually two issues going on here (it seems)
- error messages from the decoder (h264d)
- some sort of GPU page fault (GPU is used for doing image format conversion / resize). Note that a page fault is not necessarily an issue, but I am not sure if this is a normal page fault or something that should not be occurring (read translation fault).
With these errors, my VOXL2 is not crashing, but still these are probably not good messages to see..
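To see quickly which of the two kinds of messages show up in a capture, a rough sketch like the following can count them. The patterns are taken only from the log excerpt above; the category names and the `classify` / `summarize` helpers are hypothetical, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns derived from the log excerpt in this thread; category names are illustrative.
PATTERNS = {
    "decoder_cache_ops": re.compile(r"msm_vidc: err .*h264d: d?qbuf cache ops failed"),
    "gpu_page_fault": re.compile(r"kgsl kgsl-3d0: GPU PAGE FAULT"),
}


def classify(line):
    """Return the category name for a kernel log line, or None if it matches nothing."""
    for name, pattern in PATTERNS.items():
        if pattern.search(line):
            return name
    return None


def summarize(lines):
    """Count how many lines fall into each known category."""
    counts = Counter()
    for line in lines:
        category = classify(line)
        if category is not None:
            counts[category] += 1
    return counts
```

You could feed it the lines of a saved `dmesg` capture, for example `summarize(open("dmesg.txt"))` where `dmesg.txt` is whatever file you saved the output to.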
I changed the stream string in the test script to use software decoder and the errors are no longer printed in `dmesg`:
stream = 'gst-launch-1.0 rtspsrc location=' + stream_url + ' latency=0 ! queue ! rtph264depay ! h264parse config-interval=-1 ! avdec_h264 ! autovideoconvert ! appsink'
Additionally, using HW decoder but sw-based videoconvert also works without any errors in `dmesg` (note using `videoconvert` instead of `autovideoconvert`. I believe `autovideoconvert` uses GPU to do the format conversion):
stream = 'gst-launch-1.0 rtspsrc location=' + stream_url + ' latency=0 ! queue ! rtph264depay ! h264parse config-interval=-1 ! qtivdec turbo=true ! videoconvert ! appsink'
You can try these basic tests to see if you also see the errors in `dmesg` and if the errors and crash go away after changing to SW decoder or SW-based `videoconvert`.
Regarding the error printed in `dmesg`, i am not sure what is actually causing it. It is not coming from ModalAI software, so we should try to work around the issue.
With all that being said.. none of the tests that i ran resulted in a crash or reboot of VOXL2.. I suggest that you run `dmesg -w` in a separate window before running your test and see what is printed right before the system reboots. This can help. If you cannot see anything in the `dmesg -w` output via adb, you can also check `/var/log/kern.log` to see the messages from previous boot (at the end of the log.. note that the log can be large as it saves previous kernel logs).
Alex
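To make switching between the variants less error prone, the two stream strings above can be generated from one small helper. This is only a sketch: `build_pipeline` is a made-up name, and the decoder and converter element strings are exactly the ones from the two strings in this post (`stream_url` is the local address used earlier in the thread).

```python
def build_pipeline(stream_url, decoder, converter):
    """Assemble the stream string with the given decoder and converter elements."""
    return (
        "gst-launch-1.0 rtspsrc location=" + stream_url
        + " latency=0 ! queue ! rtph264depay ! h264parse config-interval=-1 ! "
        + decoder + " ! " + converter + " ! appsink"
    )


stream_url = "rtsp://127.0.0.1:8900/live"

# Software decoder variant from the post.
sw_decode = build_pipeline(stream_url, "avdec_h264", "autovideoconvert")

# Hardware decoder with software-based format conversion, also from the post.
hw_decode_sw_convert = build_pipeline(stream_url, "qtivdec turbo=true", "videoconvert")
```

Assigning either result to `stream` in the test script should be equivalent to pasting the corresponding string by hand.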
-
@Alex-Kushleyev Thanks for all the information, Alex. Actually I got it solved: the problem was with my VOXL power cable coming from the VOXLPM. After changing this wire, the reboots stopped. I wonder if there was a loose connection that caused the problem when GPU acceleration started on the VOXL. I was monitoring the current consumption of my system with voxl-inspect-battery and noted that it was 0.9 A before running the ML model and rose to 1.2 A with ML acceleration.
A few observations from voxl-inspect-cpu once ML acceleration started: the temperature rose from 70 (no ML acceleration) to 80 (with ML acceleration). I am using YoloV5 on the GPU and saw 50% GPU usage after acceleration started. Maybe my video decoding is also hardware based, with some GPU usage there adding to the total.
-
@Aaky , I am glad you figured it out. Most of the time, unexpected reboots are due to some sort of power issue. That is why I originally asked about the source of power: if a power supply is used to power VOXL2 and it cannot provide sufficient current, the system will reset. But sometimes cabling is the issue.
Alex