Weird behaviour using vision and offboard mode

Hi guys, I have set up a hardware-in-the-loop (HITL) simulation on the Pixhawk 4 Mini using PX4 1.10.2 firmware.

Setup details

  • Airframe: HIL quadrotor x
  • Simulation: Gazebo, IRIS quad model with added lidar and sonar altitude sensor
  • ROS: I have a node that publishes the SLAM pose and sonar altitude to /mavros/vision_pose/pose, and this is reflected on /mavros/local_position/pose. I also have another ROS node that follows this process:
      1. Arm the quad.
      2. Once the quad is armed, stream setpoints on /mavros/setpoint_raw/local.
      3. Switch to offboard mode and continue to stream setpoints.
    Currently I am just streaming xyz = [0 0 1.0] (meters).
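For anyone reproducing step 2, here is a minimal sketch of how the type_mask for a position-only setpoint on /mavros/setpoint_raw/local could be built. The bit values are assumed to match the constants in the mavros_msgs/PositionTarget message definition, and the helper names (position_only_type_mask, describe_setpoint) are hypothetical. It is plain Python with no ROS dependency, so it only illustrates the arithmetic, not the node from the post.

```python
# Bit values assumed to match mavros_msgs/PositionTarget; check them against
# the message definition shipped with your mavros version.
IGNORE_PX, IGNORE_PY, IGNORE_PZ = 1, 2, 4
IGNORE_VX, IGNORE_VY, IGNORE_VZ = 8, 16, 32
IGNORE_AFX, IGNORE_AFY, IGNORE_AFZ = 64, 128, 256
FORCE = 512
IGNORE_YAW, IGNORE_YAW_RATE = 1024, 2048
FRAME_LOCAL_NED = 1


def position_only_type_mask(use_yaw=True):
    """Return a mask that keeps x/y/z (and optionally yaw) and ignores the rest."""
    mask = (IGNORE_VX | IGNORE_VY | IGNORE_VZ
            | IGNORE_AFX | IGNORE_AFY | IGNORE_AFZ
            | IGNORE_YAW_RATE)
    if not use_yaw:
        mask |= IGNORE_YAW
    return mask


def describe_setpoint(x, y, z, use_yaw=True):
    """Hypothetical helper: the fields one would copy into a PositionTarget message."""
    return {
        "coordinate_frame": FRAME_LOCAL_NED,
        "type_mask": position_only_type_mask(use_yaw),
        "position": (x, y, z),
    }


if __name__ == "__main__":
    # The hover setpoint from the post: xyz = [0 0 1.0] meters.
    print(describe_setpoint(0.0, 0.0, 1.0))
```

If the mask leaves velocity or acceleration bits unset while those fields are zero, the controller may treat the zeros as real targets, so it is worth checking which fields the node actually populates.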

Parameters:

  • EKF2_AID_MASK: currently set to 24 (vision position and yaw fusion). I have also tried 88 to convert the coordinates to NED, since ROS uses ENU. From what I understand, though, mavros should already take care of this.
  • EKF2_HGT_MODE: Vision
  • EKF2_EV_DELAY: I have tried increasing this, but it had little effect.

I think these are the only parameters relevant to the problem.
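To make the two EKF2_AID_MASK values easier to compare, here is a small sketch that decodes the bitmask. The bit names are written from memory of the PX4 parameter reference for the 1.10 series and should be treated as an assumption, so verify them against the parameter documentation for your firmware. The function name decode_aid_mask is hypothetical.

```python
# Assumed bit layout of EKF2_AID_MASK (PX4 1.10 series); verify against the
# parameter reference for your firmware before relying on it.
AID_MASK_BITS = {
    0: "use GPS",
    1: "use optical flow",
    2: "inhibit IMU bias estimation",
    3: "vision position fusion",
    4: "vision yaw fusion",
    5: "multi-rotor drag fusion",
    6: "rotate external vision",
    7: "GPS yaw fusion",
    8: "vision velocity fusion",
}


def decode_aid_mask(value):
    """Return the names of the bits that are set in value, lowest bit first."""
    return [name for bit, name in sorted(AID_MASK_BITS.items())
            if value & (1 << bit)]


if __name__ == "__main__":
    for value in (24, 88):
        print(value, decode_aid_mask(value))
```

Under that assumed layout, 88 differs from 24 only by bit 6 (value 64), so the two settings fuse the same vision inputs and differ only in how the external vision frame is handled.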

This setup can successfully arm the quad and take off, but the flight is very unstable and the quad crashes fairly quickly.
Flight log: https://review.px4.io/plot_app?log=5e5c927c-4a5e-4e28-bd37-abdf07181714
The roll and pitch seem to be the issue here, but I am not very experienced at interpreting these kinds of graphs.

I am not sure if this stems from the modified IRIS model. The controller seems to work well if I just try a normal HITL takeoff as the documentation shows. It may also come from time delays between ROS and PX4.

If I can help with any more evidence or information, please ask.