Local position diverges from visual odometry data

PX4 version: v1.8.0 stable (px4fmu-v2)

We are flying a quadcopter with a Pixhawk autopilot. An Intel depth camera (Intel RealSense SDK 2.0, build 2.17.0) provides vision data through the ORB_SLAM library. We have set the EKF2_AID_MASK parameter so that vision position fusion and vision yaw fusion are both enabled.
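For reference, this is a minimal sketch of how we arrive at the mask value. It assumes the bit layout from the PX4 parameter reference as we understand it (bit 3 for vision position fusion, bit 4 for vision yaw fusion); the constant and function names are our own and not part of any PX4 or MAVROS API.

```python
# Assumed bit positions for EKF2_AID_MASK (check against the parameter
# reference for your firmware version before relying on them).
VISION_POSITION_FUSION = 1 << 3  # bit 3
VISION_YAW_FUSION = 1 << 4       # bit 4


def aid_mask(*flags):
    """Combine individual fusion flags into a single bitmask value."""
    mask = 0
    for flag in flags:
        mask |= flag
    return mask


def is_enabled(mask, flag):
    """Return True if the given fusion flag is set in the mask."""
    return (mask & flag) != 0


mask = aid_mask(VISION_POSITION_FUSION, VISION_YAW_FUSION)
print(mask)  # 24
```

With these assumptions, enabling only the two vision bits gives a parameter value of 24.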

The problem is that the drone wobbles quite a bit on takeoff. During this jerky motion, the local position (i.e. the position in /mavros/local_position/pose) starts to diverge from the vision data we send through MAVROS; it shoots up a long way and then suddenly resets to zero. Sometimes the vision data itself also stops arriving (as seen in MAVROS). Once the quad is more stable, the local position follows the vision data normally again.
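To make the symptoms above measurable, here is a rough sketch of an offline check over logged samples. It assumes we have already extracted timestamped positions for both streams into plain Python tuples, both in the same frame; the function names, the 0.5 m threshold and the 0.2 s gap limit are hypothetical values chosen for illustration, not PX4 or MAVROS defaults.

```python
import math


def position_error(local_xyz, vision_xyz):
    """Euclidean distance between the local position and the vision position."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(local_xyz, vision_xyz)))


def first_divergence(samples, threshold_m=0.5):
    """Return the first timestamp where the two positions differ by more
    than threshold_m, or None if they never do.

    samples: iterable of (timestamp_s, local_xyz, vision_xyz) tuples.
    """
    for stamp, local_xyz, vision_xyz in samples:
        if position_error(local_xyz, vision_xyz) > threshold_m:
            return stamp
    return None


def vision_gaps(stamps, max_gap_s=0.2):
    """Return (start, end) pairs where consecutive vision timestamps are
    more than max_gap_s apart, i.e. where vision data stopped arriving."""
    return [
        (prev, cur)
        for prev, cur in zip(stamps, stamps[1:])
        if cur - prev > max_gap_s
    ]


# Made-up sample data, only to show the expected input shape.
samples = [
    (0.0, (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
    (0.5, (0.1, 0.0, 0.4), (0.1, 0.0, 0.3)),
    (1.0, (0.2, 0.1, 2.5), (0.1, 0.0, 0.6)),
]
print(first_divergence(samples))
print(vision_gaps([0.0, 0.5, 1.0, 2.0], max_gap_s=0.6))
```

Running something like this over a flight log would show whether the divergence starts before or after the vision stream drops out, which should help narrow down the cause.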

Why does this happen, and is there any way to prevent it?