Uncommanded Altitude change in position hold - EKF divergence


We are testing a large quad with a Pixhawk 2.1 Cube running stock PX4 1.8.2. During a long hover, the aircraft began an uncommanded, slow climb. It gained almost 25 meters, then stopped.

We’ve dug through the logs, and it appears that the vertical position and velocity covariances go to zero, at which point the EKF starts to ignore the altitude sensors. Both GPS and baro register the altitude change, but the fused altitude does not.

The covariance can be seen in the plot below. Covariances 6 and 9 (VZ, Z) both go to zero.

The observation variances also go flat, but at a “normal” level.

Because the observation variances are normal and the state covariance goes to zero, the Kalman gain collapses: the EKF relies almost exclusively on its prediction “model” of physics and stops using the data from the sensors, which results in a vertical drift.
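To illustrate why a zero covariance makes the filter deaf to its sensors, here is a minimal scalar Kalman update in Python. It is only a sketch of the general mechanism, not PX4's EKF2 code; the function names and the numbers (observation variance 0.25, a 25 m measured altitude) are made up for illustration.

```python
def kalman_gain(p: float, r: float) -> float:
    """Scalar Kalman gain for a direct measurement of the state (H = 1).

    p: state variance (how uncertain the filter thinks its estimate is)
    r: observation variance (how noisy the filter thinks the sensor is)
    """
    return p / (p + r)


def measurement_update(x: float, p: float, z: float, r: float):
    """One scalar measurement update; returns (new state, new variance, innovation)."""
    k = kalman_gain(p, r)
    innovation = z - x              # sensor minus prediction
    x_new = x + k * innovation      # correction is scaled by the gain
    p_new = (1.0 - k) * p
    return x_new, p_new, innovation


if __name__ == "__main__":
    r = 0.25   # assumed "normal" observation variance (illustrative value)
    z = 25.0   # sensor reports 25 m
    x = 0.0    # filter still believes 0 m
    for p in (1.0, 1e-3, 0.0):
        x_new, _, innov = measurement_update(x, p, z, r)
        print(f"P={p:g}  gain={kalman_gain(p, r):.4f}  "
              f"innovation={innov:.1f}  corrected estimate={x_new:.3f}")
```

With a healthy state variance the estimate moves most of the way to the measurement. As the variance approaches zero the gain approaches zero too, so the innovation keeps growing while the estimate barely moves, which is the pattern we think we are seeing in the log.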

This can be seen in the innovation plot:

Eventually the innovation check trips and resets the EKF to the current sensor values at the correct altitude. The drift happens one more time later in the log.
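For anyone unfamiliar with that kind of check, a rough sketch of a generic innovation consistency test follows. It assumes the common form where the squared innovation is compared against a gate (in standard deviations) times the innovation variance; the function names, the 5-sigma gate, and the variance value are assumptions for illustration, and the real estimator's reset logic (timeouts and so on) is not modelled.

```python
def innovation_test_ratio(innovation: float,
                          innovation_variance: float,
                          gate_sigma: float) -> float:
    """Normalised squared innovation; values above 1.0 fail the gate."""
    return innovation ** 2 / (gate_sigma ** 2 * innovation_variance)


def first_rejected_index(innovations, innovation_variance, gate_sigma):
    """Index of the first innovation that fails the gate, or None if all pass."""
    for i, innov in enumerate(innovations):
        if innovation_test_ratio(innov, innovation_variance, gate_sigma) > 1.0:
            return i
    return None


if __name__ == "__main__":
    gate_sigma = 5.0            # assumed gate size in standard deviations
    innovation_variance = 0.25  # assumed; stays flat once state variance is ~0
    # Innovation growing steadily while the estimate drifts away from the sensor.
    drift = [0.5 * step for step in range(10)]
    idx = first_rejected_index(drift, innovation_variance, gate_sigma)
    print("first rejected sample:", idx, "innovation:", drift[idx])
```

The point of the sketch is that with the state variance stuck near zero, the innovation variance stays flat, so the check only trips once the raw innovation itself has grown past the gate.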

The altitude behavior can be seen here:

Vibrations are the usual suspect, but ours aren’t high enough to clip, and they didn’t change at all during the flight.

logs: https://logs.px4.io/plot_app?log=a6efe798-71c9-493e-905f-a5dce17aa484

Any ideas as to what may have caused this would be helpful!
@Paul_Riseborough