Poor VIO performance when tracking distant features
-
Hi everyone,
I am using a Sentinel drone upgraded to SDK 1.1.0 to perform indoor autonomous missions.
The vibrations are all green with voxl-inspect-vibration; the number of features is >10; illumination is good: the indoor environment is lit with artificial light that doesn't cause any flickering. I also tried a tracking cam pitched down 15 degrees instead of 45, but the following results are the same.
However, I experienced about 3m of drift with respect to the ground truth over 10m of flight. In that case the features were roughly 10-15m away from the drone. As soon as the features were around 30m away, the drone started to oscillate around the point sent by the mission with an amplitude of ~0.5m. Consider the indoor environment as a room where most of the features are concentrated on only one side of the walls.
1) Is this behavior expected with the current VIO? If not, what solutions do you propose?
2) What drift should I expect over 10m in good environment conditions?
3) Slightly unrelated question: Is it possible to use the tracking cam and stereo_rear cam at the same time as input to the qvio-server?
Thank you in advance for your response.
-
@afdrus ,
If the features are 10-15 meters away, and especially 30 meters away, that would definitely create additional difficulty for QVIO.
Take a look at VIO initialization params, documented here: https://developer.qualcomm.com/sites/default/files/docs/machine-vision-sdk/api/v1.2.13/group__mvvislam.html#ga6f8af5b410006a44dbcf59f7bc6a6b38
Specifically,
logDepthBootstrap Initial point depth [log(meters)], where log is the natural log. By default, initial depth is set to 1m. However, if e.g. a downward facing camera on a drone is used and it can be assumed that feature depth at initialization is always e.g. 4cm, then we can set this parameter to 4cm (or -3.2). This will improve tracking during takeoff, accelerate state space convergence, and lead to more accurate and robust pose estimates.
If you know that the majority of features will be far away, you can increase this number. I believe the default is 1m, but you can check your qvio-server params.
Also, pay attention to:
useLogCameraHeight and logCameraHeightBootstrap
In order to rule out vibrations being an issue, you can also test VIO by holding the vehicle and moving it by hand, taking all necessary precautions to make sure the propellers would not spin up.
Whenever you are tuning complicated parameters, it is important to be able to replay the same data set with different parameters, otherwise testing can be extremely complicated and inconclusive. You should look into using logging and playback: https://docs.modalai.com/voxl-logger/ . You can log several data sets, play them back on VOXL2 with different parameters, and compare the results.
-
@Alex-Kushleyev Thank you very much for the precise response! I noticed that logDepthBootstrap is hardcoded in voxl-qvio-server/server/main.cpp to log(1) (i.e. 0) here: https://gitlab.com/voxl-public/voxl-sdk/services/voxl-qvio-server/-/blob/master/server/main.cpp?ref_type=heads#L360 but I will try to add it to the qvio-server params so that I can tune it more easily. Setting useLogCameraHeight to false should enable logDepthBootstrap instead of logCameraHeightBootstrap, right?
Regarding these two "bootstraps":
- Could you please explain the difference between the two, and when one should be preferred over the other?
- From the comments in https://developer.qualcomm.com/sites/default/files/docs/machine-vision-sdk/api/v1.2.13/group__mvvislam.html#ga6f8af5b410006a44dbcf59f7bc6a6b38 it is not clear to me what the meaning of these two params is: are they expected to work only during the initialization phase? What influence would they have during the rest of the flight?
Thanks again in advance for your reply.
-
I believe that both logDepthBootstrap and logCameraHeightBootstrap are approaches for initializing the depth of new features, not only during VIO initialization. However, the effect of these values is greatest during initialization, because the VIO state is not fully converged and it is difficult to triangulate feature depths without a good VIO solution. If VIO is fully initialized and tracking well, the effect of these values is minimal, because feature depth can be resolved very quickly even with a little motion (so in that case the initial feature depth does not matter much).
However, if VIO is not working very well in a case of distant features, then initializing the feature depth to a larger value could help not only during initialization of VIO but also during normal tracking. The farther the features are, the more difficult it is to determine their depth without significant motion, so a good prior on the feature depth could help.
The logic behind useLogCameraHeight is the following: if your vehicle has a camera pointing significantly down, let's say 45 degrees or more, then the majority of the features it sees during initialization will likely be on the floor. If the camera is looking at a flat floor, the features are arranged according to the flat-plane geometry, and if you know the camera height above the ground, which is easy to measure, you specify that in logCameraHeightBootstrap (well, the log of the actual height). Using this approach, feature depths are initialized assuming the features in the image lie on the ground plane, so the depth is calculated from where the pixel appears in the image (it can be projected onto the ground plane). If the camera is not looking sufficiently downward, the assumption that features are on the ground plane will not be very helpful, so it is better to just use an expected average feature depth during initialization.
In the past, we have pretty much always set useLogCameraHeight to true and logCameraHeightBootstrap according to the camera height above ground for best initialization results, because the majority of our vehicles have tracking cameras pointing down at least 30 or 45 degrees. This helps during initialization because when the vehicle takes off, the feature motion is very significant during the initial moments, and knowing that the features are on the ground plane significantly helps with tracking during take-off. After take-off, as I mentioned before, a good prior for feature depths is no longer needed.
Regarding dual cameras: QVIO does not support using two cameras in one VIO instance. We are working on another solution that will support multiple cameras, but we are not ready to share the details yet, hopefully in about 2 months.