Alright, solved it. I set the MCS pose rate to 30 Hz (anything up to 50 Hz should be fine according to the PX4 documentation), although this did not seem to have any significant impact.
The actual problem was EKF2_AID_MASK: for motion capture it must be set to 24 (vision position and yaw fusion only) instead of 280 (the default with VIO, which sends odometry messages that include velocities). So the bitmask fields must be set as follows:
...
vision position fusion <-- Enable
vision yaw fusion <-- Enable
...
vision velocity fusion <-- Disable for motion capture (enabled by default for VIO)
I did not have to modify any other parameters.
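For anyone checking their own value, here is a minimal Python sketch of how 24 and 280 break down into the three vision flags. The bit positions (3, 4 and 8) are my reading of the PX4 parameter reference for firmware that still uses EKF2_AID_MASK, and the names `VISION_BITS` and `decode_vision_flags` are made up for this example, so please verify against your firmware version.

```python
# Bit values for the three vision-related flags in EKF2_AID_MASK.
# Assumption: positions taken from the PX4 parameter reference;
# check them against the firmware version you are running.
VISION_BITS = {
    "vision position fusion": 1 << 3,  # 8
    "vision yaw fusion": 1 << 4,       # 16
    "vision velocity fusion": 1 << 8,  # 256
}


def decode_vision_flags(mask: int) -> dict:
    """Return which of the vision flags are set in a mask value."""
    return {name: bool(mask & bit) for name, bit in VISION_BITS.items()}


# Motion capture: position and yaw only.
mocap_mask = VISION_BITS["vision position fusion"] | VISION_BITS["vision yaw fusion"]

# VIO default: position, yaw and velocity.
vio_mask = mocap_mask | VISION_BITS["vision velocity fusion"]

print(mocap_mask, decode_vision_flags(mocap_mask))
print(vio_mask, decode_vision_flags(vio_mask))
```

The only difference between the two values is the 256 bit, which is the velocity fusion flag that has to be off for motion capture.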
@James-Strawson, thanks again for the help!