Voxl2 custom segmentation model lead to unwanted reboot

Riccardo Franceschini · 26 Jan 2024, 13:33

Hello everyone, I'm currently dealing with an issue while running a custom segmentation model on the tflite server. The board is maintaining an approximate inference time of 5Hz, which is good. However, I'm facing problems when trying to pass segmentation masks over pipes to be retrieved by the voxl-mpa-to-ros pipeline.

I've implemented a solution using a new data structure for the masks, which are larger than normal bounding boxes. I publish these masks to the pipe, and on the receiving end, the mpa_to_ros node reads and forwards the mask as a custom message. This process worked well with SDK 0.9.5, but after upgrading to SDK 1.1.2, the board crashes and reboots when attempting to subscribe to the tflite data (I can get few samples but then it crash). The code remains the same, and I've also tested changing the buffer, but the issue persists.

Interestingly, subscribing to the camera portal works fine, but subscribing to the pipe causes the crash. I'm wondering if it could be a problem with the SDK, perhaps an issue with memory management causing the board to crash and reboot. Or could it be a power supply problem?

Has anyone else experienced a similar issue after upgrading to SDK 1.1.2, or does anyone have insights into what might be causing the problem? Any suggestions for troubleshooting or potential solutions would be greatly appreciated. Thank you!

26 Jan 2024, 18:49

@Riccardo-Franceschini

Hey Riccardo, happy to try and help you out with this.

Per the repository graph for tflite-server it doesn't seem like we've changed anything in terms of what's being released in the SDKs. Of course, as you point out, the issue could be related to something entirely different.

When you're passing the segmentation masks over libmodal-pipe, can I ask how you're doing it? If you're trying to put that data into an ai_detection_t it likely isn't well-supported and you may be using to much memory (as you point out). If this were the case, you'd be better of making your own struct for the masks and then setting up your own pipe that can appropriately handle the data.

Let me know, hopefully we can get to the bottom of this!

Thomas Patton

Riccardo Franceschini · 29 Jan 2024, 10:09

Hi @thomas,

I createad a custom ai_segmentation_t struct as

#define MASK_HEIGHT 480
#define MASK_WIDTH 640
typedef struct ai_segmentation_t
{
    uint32_t magic_number;
    int64_t timestamp_ns;
    uint32_t class_id;
    int32_t frame_id;
    char class_name[BUF_LEN];
    char cam[BUF_LEN];
    bool first_segmentation;
    bool last_segmentation;
    float class_confidence;
    float detection_confidence;
    float x_min;
    float y_min;
    float x_max;
    float y_max;
    int8_t mask[MASK_HEIGHT * MASK_WIDTH]; // 2D array for segmentation mask
} __attribute__((packed)) ai_segmentation_t;

which are then written down to the pipe as (main.cpp file of the voxl-tflite-server)


            std::vector<ai_segmentation_t> detections;
            if (!inf_helper->postprocess_seg(output_image, detections, last_inference_time))
                continue;
            if (!detections.empty())
            {

                    for (unsigned int i = 0; i < detections.size(); i++)
                    {
                        pipe_server_write(DETECTION_CH, (char *)&detections[i], sizeof(ai_segmentation_t));
                    }

            }
            new_frame->metadata.timestamp_ns = rc_nanos_monotonic_time();
            pipe_server_write_camera_frame(IMAGE_CH, new_frame->metadata, (char *)output_image.data);
        }

And retrieved by the ai_detection_interface.h in the callback function of voxl-mpa-to-ros

static void _helper_cb(__attribute__((unused)) int ch, char *data, int bytes, void *context)
{
    AiDetectionInterface *interface = (AiDetectionInterface *)context;
    if (interface->GetState() != ST_RUNNING)
        return;
    ros::Publisher &publisher = interface->GetPublisher();

    voxl_mpa_to_ros::AiSegmentation &obj = interface->GetSegMsg();
    std::vector<voxl_mpa_to_ros::AiSegmentation> obj_array;
    voxl_mpa_to_ros::AiSegmentationArray obj_arrayMsg;

    int n_packets = bytes / sizeof(ai_segmentation_t);
    ai_segmentation_t *detections = (ai_segmentation_t *)data;
    
    bool first_segmentation = false;
    bool last_segmentation = false;
    for (int i = 0; i < n_packets; i++)
    {
        if (detections[i].magic_number != AI_DETECTION_MAGIC_NUMBER)
            return;

        // publish the sample
        obj.timestamp_ns = detections[i].timestamp_ns;
        obj.class_id = detections[i].class_id;
        obj.frame_id = detections[i].frame_id;
        obj.class_name = detections[i].class_name;
        obj.cam = detections[i].cam;
        obj.class_confidence = detections[i].class_confidence;
        obj.detection_confidence = detections[i].detection_confidence;
        obj.x_min = detections[i].x_min;
        obj.y_min = detections[i].y_min;
        obj.x_max = detections[i].x_max;
        obj.y_max = detections[i].y_max;
        obj.first_segmentation = detections[i].first_segmentation;
        obj.last_segmentation = detections[i].last_segmentation;

        if (detections[i].first_segmentation)
        {
            first_segmentation = true;
        }

        if (detections[i].last_segmentation)
        {
            last_segmentation = true;
        }
        last_segmentation = true;
        first_segmentation = true;

        obj.mask = std::vector<int8_t>(detections[i].mask, detections[i].mask + sizeof(detections[i].mask) / sizeof(detections[i].mask[0]));

        // add the
        obj_array.push_back(obj);

    }

    obj_arrayMsg.segmentation_array = obj_array;

    if (n_packets > 0 && publisher.getNumSubscribers() > 0 && first_segmentation && last_segmentation)
    {
        publisher.publish(obj_arrayMsg);
    }

    return;
}

and inside the ai_detection_interface has the ai_segmentation and ai_segmentation_array defined as (same as the tflite-server):

typedef struct ai_segmentation_t
{
    uint32_t magic_number;
    int64_t timestamp_ns;
    uint32_t class_id;
    int32_t frame_id;
    char class_name[BUF_LEN];
    char cam[BUF_LEN];
    bool first_segmentation;
    bool last_segmentation;
    float class_confidence;
    float detection_confidence;
    float x_min;
    float y_min;
    float x_max;
    float y_max;
    int8_t mask[MASK_HEIGHT * MASK_WIDTH]; // 2D array for segmentation mask
} __attribute__((packed)) ai_segmentation_t;

typedef struct ai_array_segmentation_t {
    std::vector<ai_segmentation_t> segmentations;
} __attribute__((packed)) ai_array_segmentation_t;

The problem is that the model runs and the output of the detection makes sense :

The usage of the GPU is:


[Name   Freq (MHz) Temp (C) Util (%)
-----------------------------------
cpu0       1075.2     67.1    33.30
cpu1       1075.2     65.5    26.08
cpu2       1075.2     65.9    26.77
cpu3       1075.2     65.9    26.34
cpu4        710.4     67.1    16.08
cpu5        710.4     65.5    25.00
cpu6        710.4     65.5    23.67
cpu7        844.8     67.1     6.61
Total                 67.1    22.98
10s avg                       22.78
-----------------------------------
GPU         587.0     67.1    97.92
GPU 10s avg                   49.08
-----------------------------------
memory temp:       65.0 C
memory used:  1125/7671 MB
-----------------------------------
Flags
CPU freq scaling mode: auto
Standby Not Active
-----------------------------------]

However, when I attempt to retrieve the segmentation from the voxl-mpa-to-ros the voxl2 reboot, even though some messages are received. While I might expect a crash at the code level or a segmentation fault, a total reboot occurs, which can create a few problems, especially when flying.

Also even tought it occurs more sporadically, the voxl2 also reboot by simply running the model and displaying on the portal the segmentations. However the voxl-inspect-cpu does not complain about overheating neither I notice any extra memory usage.

Any help is welcome!

29 Jan 2024, 16:52

@Riccardo-Franceschini

Yeah, this is super strange behavior. One thing I want to check on: we allocate a certain number of bytes for the tflite_data pipe as you can see here. I have no idea why this allocation is not dynamic with respect to the size of ai_detection_t but I wonder if you might be pushing out too many bytes and the program is quietly segfaulting somewhere. You should also check out any spot in the code that uses ai_detection_t (e.g. here) as things might not be getting published correctly for your struct.

Hope this helps!

Thomas Patton

Alex Kushleyev · 29 Jan 2024, 19:53

@Riccardo-Franceschini , can you please tell us how you are powering VOXL2? Specifically, what power regulator are you using to provide 5V and how that regulator is powered?

Also, how quickly after you start your segmentation model does the board reboot? Maybe the GPU does draw a lot of power, but does not have time to heat up sufficiently until a reboot happens (hinting to a possible issue with 5V power quality)

Riccardo Franceschini · 14 Feb 2024, 14:02

Hi @Alex-Kushleyev @thomas sorry for the late reply, but the problem seems to be solved when we changed the power module which probably was causing the failure.

14 Feb 2024, 15:33

@Riccardo-Franceschini

All good, glad you got it fixed!