EKF filter fault warning and problems during flights

Hi! A coworker and I have been testing two drones running PX4, and we have run into EKF warnings, high vibrations and height divergence. After testing multiple options, it’s still not clear to us why the estimator is failing.

Problem description

When the drone is flying and a “big movement” is commanded (for example, when moving from one mission waypoint to another, during RTL mode, or when controlling the drone with the RC), the EKF gives the following warning:

  • [EKF2] primary EKF changed 3 (filter fault) → 2

The EKF then starts switching from one instance to another. Sometimes it switches only once, but it can switch multiple times.
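To locate these switches in a log, a minimal sketch like the following could be used. It assumes the primary instance index has already been extracted from the log into two plain lists (timestamps in microseconds and instance indices). Which topic and field hold that value (we believe `estimator_selector_status` / `primary_instance`) is an assumption to check against the firmware version, and `find_primary_switches` is just an illustrative name.

```python
def find_primary_switches(timestamps_us, primary_instance):
    """Return (timestamp_us, old_instance, new_instance) for every change
    of the primary EKF instance.

    Both inputs are plain, equally long sequences; how they are read from
    the log is left out on purpose (assumption: one sample per log entry).
    """
    switches = []
    for i in range(1, len(primary_instance)):
        previous, current = primary_instance[i - 1], primary_instance[i]
        if current != previous:
            switches.append((timestamps_us[i], previous, current))
    return switches


# Made-up samples that mimic the warning above (instance 3 -> 2).
example = find_primary_switches([0, 100, 200, 300], [3, 3, 2, 2])
```

The returned timestamps can then be lined up with the altitude and vibration plots to see what happens around each switch.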

On some flights, at the exact moment the EKF started to give warnings, the altitude started to increase or decrease, and in some situations the pilot had to take manual control. During log analysis we noticed that, in some of the logs, the fused altitude estimate started to diverge at the same timestamp.

At first we thought the problem might be caused by vibrations, as they’re a bit higher than the recommended values and most of the time the EKF warning seems to happen when the vibrations are highest. After more tests, however, we’re no longer sure about this, and we would appreciate some advice on how to verify it.
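One way we could imagine checking this is to compare the vibration level close to each EKF warning with the level during the rest of the flight. The sketch below is only an illustration under assumptions: the vibration metric and the warning timestamps are passed in as plain lists already extracted from the log, the 2-second window is an arbitrary choice, and `vibration_near_events` is a hypothetical helper name.

```python
from statistics import mean


def vibration_near_events(vibe_t_us, vibe_values, event_t_us, window_s=2.0):
    """Compare vibration samples near EKF warnings with all other samples.

    A sample counts as "near" if it lies within +/- window_s seconds of any
    event timestamp. Returns (mean_near, mean_far); a value is None when
    its group has no samples.
    """
    window_us = window_s * 1e6
    near, far = [], []
    for t, value in zip(vibe_t_us, vibe_values):
        if any(abs(t - event) <= window_us for event in event_t_us):
            near.append(value)
        else:
            far.append(value)
    return (mean(near) if near else None, mean(far) if far else None)


# Made-up numbers: one warning at t = 2 s, samples at 0, 1, 2, 3 and 10 s.
mean_near, mean_far = vibration_near_events(
    [0, 1_000_000, 2_000_000, 3_000_000, 10_000_000],
    [1.0, 2.0, 3.0, 2.0, 1.0],
    [2_000_000],
    window_s=1.0,
)
```

If the mean near the warnings is not clearly higher than the mean elsewhere, vibration alone is probably not the whole explanation.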

Setup:

  • Drone chassis: T-Motor M690A
  • Propellers: Original ones that come with the T-Motor, carbon fiber, low weight.
  • FMU: Pixhawk 4 (Holybro)
  • GPS: Pixhawk 4 GPS Module (Holybro M8N GPS)
  • Firmware version: PX4 v1.12.3
  • Onboard computer: Nvidia Jetson Nano
  • Telemetry: Microhard module
  • Battery: 16000mAh 4S with and without BMS

More setup information

  • The drones’ sensors have been calibrated and the PIDs were adjusted
  • Internal magnetometer disabled (using the GPS one) due to interference and inconsistency between the internal and external mag → EKF running with 2 instances (1 mag and 2 IMUs)
  • FMU mounted to the chassis using 3M double-sided foam
  • Primary height source → Barometer

Both drones have the same setup and have experienced the same problems, so we don’t think it’s an FMU hardware problem.

Flight testing

First test block. Errors start to happen

The errors first happened during simple missions. The internal mag was disabled as stated above by setting CAL_MAG0_PRIO to 0 (Disabled).

Log examples:

In both cases the fused altitude estimate started to change when the error occurred, and the drone experienced an altitude change.

During log analysis we found that the EKF instance numbers didn’t make much sense, as there should only be two instances, and in fact the process_logdata_ekf.py script fails to analyse these logs. This was because setting CAL_MAG0_PRIO to Disabled doesn’t automatically adjust SENS_MAG_MODE and EKF2_MULTI_MAG.
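As a sanity check on the instance numbers, the expected instance count can be worked out from the two multi-EKF parameters. The sketch below assumes, based on our reading of the PX4 documentation, that the count is the product of EKF2_MULTI_IMU and EKF2_MULTI_MAG (each treated as at least 1) and that instances are numbered from 0. Both function names are hypothetical.

```python
def expected_ekf_instances(ekf2_multi_imu, ekf2_multi_mag):
    """Expected number of EKF instances for the given parameter values.

    Assumption: count = max(IMUs, 1) * max(mags, 1).
    """
    return max(ekf2_multi_imu, 1) * max(ekf2_multi_mag, 1)


def instance_index_is_plausible(instance_index, ekf2_multi_imu, ekf2_multi_mag):
    """True if a zero-based instance index fits the expected instance count."""
    return 0 <= instance_index < expected_ekf_instances(ekf2_multi_imu, ekf2_multi_mag)


# Intended setup: 2 IMUs and 1 mag -> 2 instances, so an index of 3
# (as in the warning "3 (filter fault) -> 2") should not be possible.
intended_count = expected_ekf_instances(2, 1)
index_3_ok = instance_index_is_plausible(3, 2, 1)
```

With the intended configuration an index of 3 is out of range, which matches the observation that the parameters were not actually set to what we expected.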

The vibration graphs seem a bit high and the EKF warnings tend to happen when the vibrations are at their peak. We thought that might be causing the EKF problem and decided to change the FMU mounting system.

Second test block. Using an FMU mounting bed

Parameters modified:

The FMU was mounted using a platform similar to this one: FMU mounting platform

Sensors were recalibrated and the PIDs had to be slightly readjusted. A few flights were performed and the error didn’t happen with this setup.

Log examples:

The problem seemed to be fixed: there were no EKF warnings during any of the tests with this setup and the altitude was always stable. However, the online analysis tool indicates that vibrations are similar and still high. Running process_logdata_ekf.py also reports that IMU vibrations are high, and the graphs look comparable; sometimes the peak values are even higher with this mounting platform.

Third test block. FMU mounted on gel and on the same pads as in the first test.

As some EKF parameters were changed before the second test block, we decided to test again with the original mounting system (same as in the first test block) and also with a green mounting gel. The original PID values were restored, as they looked better during log analysis and the drone flew more stably with them.

Log examples:

The EKF warning started to happen again, with both mounting systems. Vibrations remain similar to those obtained with the mounting platform and are still a bit high. Somehow the platform seems to solve the problem, but the vibration metrics don’t show why this would be.

Fourth test block. Testing internal magnetometer divergence

The internal magnetometer had to be disabled on these drones because of interference: the indicated heading started to diverge (by between 60 and 95 degrees) after the drone was armed and went back to the correct value once it was disarmed.
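When comparing two headings like this, the difference has to be wrapped so that, for example, 350 and 10 degrees count as 20 degrees apart rather than 340. A minimal helper for that, with a hypothetical name and assuming both headings are given in degrees:

```python
def heading_error_deg(heading_a_deg, heading_b_deg):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (heading_a_deg - heading_b_deg + 180.0) % 360.0 - 180.0


# Headings on either side of north are 20 degrees apart, not 340.
across_north = heading_error_deg(350.0, 10.0)
```

The sign tells which way the first heading is off relative to the second; the absolute value is the size of the divergence.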

More tests were performed to check whether this interference, or the fact that only one magnetometer was in use, had any relation to the EKF problems (for example, by affecting other sensor readings).

As we suspected that the battery BMS might be causing the noise, it was removed. The onboard computer and the Microhard telemetry were also disconnected, and a basic SiK radio telemetry system was used instead. The internal mag was enabled again, the sensors were recalibrated and the EKF was left at its defaults (4 instances running).

With this basic setup the divergence problem seems to be fixed (only 0-3 degrees of error during arming, after which it stabilizes). The EKF error still happens.

Conclusion

The error seems to be somehow related to vibrations, as the setup with the mounting platform seems to fix the EKF problem. However, the log analysis shows that vibrations are still higher than recommended and comparable to the other mounting methods tested, sometimes even higher.

We would appreciate any ideas or information that could help us check what is really causing the EKF to switch instances, and what solutions could be tested to reduce the vibrations, as all the mounting systems tested (including the original Pixhawk 4 pads, which were tried in earlier tests) seem to give rather poor results.

Please contact me if you need more information or if you need me to upload the logs directly.

Thanks.