Advice on Vision-based Position Control

The goal of the testing is to fuse visual odometry from a ZED stereo camera and GPS data with the PX4 flight stack.
I have written a small snippet that subscribes to /mavros/global_position/global and /zed/pose_cov and does the following:

  1. Convert the ZED stereo camera odometry to the PX4 NED frame using a static transform.
  2. If the covariance of /mavros/global_position/global is below a threshold, inflate the covariance of the camera pose to a very high value; otherwise, leave the camera pose as it is.
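The covariance-gating step described above can be sketched as a small pure function (the threshold and inflation values here are illustrative placeholders, not from the original node, and the 6x6 row-major covariance layout matches geometry_msgs/PoseWithCovariance):

```python
GPS_COV_THRESHOLD = 4.0   # m^2, hypothetical gate on GPS horizontal variance
INFLATED_VARIANCE = 1e6   # "very high" variance to down-weight the camera pose

def gate_camera_covariance(gps_pos_cov, cam_pose_cov):
    """If the GPS covariance is below the threshold (GPS is healthy),
    inflate the camera pose covariance so the EKF trusts GPS instead.
    cam_pose_cov is a 6x6 row-major covariance (36 floats), as in
    geometry_msgs/PoseWithCovariance."""
    out = list(cam_pose_cov)
    if gps_pos_cov < GPS_COV_THRESHOLD:
        for i in range(6):
            out[i * 6 + i] = INFLATED_VARIANCE  # inflate the diagonal only
    return out
```

In a real node this function would sit in the /zed/pose_cov callback, with the latest GPS covariance cached from the /mavros/global_position/global callback.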

My flight stack details are as follows:

  1. Version : 1.9
  2. Estimator : EKF2
  3. EKF2_AID_MASK : GPS + vision position + optical flow + multi-rotor drag fusion
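For reference, that combination corresponds to the following bitmask value (bit positions as I recall them from the PX4 v1.9 parameter reference; worth double-checking against your firmware before flying):

```
EKF2_AID_MASK = 43   # 1 (GPS) + 2 (optical flow) + 8 (vision position) + 32 (multi-rotor drag fusion)
```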

Type of Platform: Hexacopter

My questions are:

  1. What is the difference between the topics /vision/pose and /vision/pose_cov? Isn't it essential to have a covariance when fusing with the EKF? Can someone share some insights into how this is handled inside the flight stack?
  2. Any comments on, or modifications to, my approach?


GPS+VISION_position is not recommended. Does your VIO odometry also include velocities? That would be the better approach at the moment. Look here.

@kamilritz Thanks a lot for your reply.
I don’t have VIO, only VO.

  1. If I understand your solution correctly, we shouldn’t fuse position odometry from multiple sources, but velocity can be fused?
  2. How is an external vision sensor different from an onboard VO sensor?
    I will upgrade to a VIO sensor soon.

Unfortunately, the naming is not very consistent: external_vision, vision, and VIO are all names used for some source of odometry.
The issue is that if you provide two sources of position, such as GPS and vision, you would either need to estimate the offset between the origins or do some kind of incremental-change fusion. The latter is currently implemented, but I did not achieve good results with it. You may try it if you want.
If you provide velocities, you don’t need to estimate the offset between the origins.

@kamilritz Thanks a lot for your support.

  1. I will continue with my experimentation. Meanwhile, if possible, can you share any relevant papers/strategies/code/material on the incremental-change fusion of position estimates? I will start going through them to find out the issues/limitations of the approach.

  2. My rough understanding of incremental-change fusion: we estimate position using GPS + the Pixhawk IMUs, and everything works fine until we enter a place without GPS (when the GPS covariance rises above a threshold). Then I switch on my VIO/VO/external position estimation, and the change between consecutive timestamps is given to the estimator (ECL-EKF), which then updates the position estimate. Am I correct? If so, can you tell me the GPS thresholds so that I can take care of them while experimenting?
    Thanks a lot.
    I appreciate your help and time.

You can find the implementation here: . I did not find any literature about this.

At the moment there is no logic in place to switch from GPS position to vision position if we lose GPS.