Hexacopter testing - 1 great flight, 1 failed flight - [sensors] - Mag#0 fail - TIMEOUT! - large 1000 class Hexacopter failure RTL on Geofence violation

Hi Folks,

Did a round of testing today - Ubuntu 14/QGC 3.72 and PX4 1.73 stable; initial flying was great, QGC worked excellent, did manual, stabilized and hold mode. Need to adjust hover throttle but all is good.

Flight file here:
https://logs.px4.io/plot_app?log=bef8e733-48cd-402d-93d3-9c0f9330acb6
Video here:

So we decided to go up and fly longer. Anyhow, second flight, we had a loss of control/failure during RTL on a geofence violation. Flight data is here:
Flight file here:
https://logs.px4.io/plot_app?log=a4ed74ad-4729-4995-a257-0f004c4f7624
Video here:

Seems the magnetometer goes wonky at 19:30 concident with the Mag failure message.
You can also see in the video after this message and subsequent partial loss of control,
there is about 5 more seconds of flight before it crashes and 2-3 second before shutdown
after that.

Has anyone seen this type of behavior before? Do most people fly 2 mags with a splitter? In this case, it seems perhaps a driver or real HW failure. I only found unrelated issues so-far but the GPS and mag are brand-new, any way to debug further? Perhaps the set-down in the first flight was too hard and the internal mag was damaged?

Any input people have on this would be great - it flies just awesome until this happened and thank you everyone for the great work getting this far. I’ll be able to load up on the test stand again next week before free flight (luckily we’ve only broke a skid and cracked a CF prop, waiting for parts). If anyone has ideas on what else to check or debug, that would be greatly appreciated.

Thank you!
-James

in src/modules/sensors/voted_sensors_update.cpp:941, where we get the failover index the next time we call check_failover(), I would expect the sensor.voter.failover_state() to return No error so that the failover_index() can be set. In other words, it seems check_failover() returned true to indicate a switch occured, but the next time we didn’t see the switch.

On Pixhawk-1 shouldn’t wesee the line for the second Mag switch-over (e.g. external compass on GPS)? e.g. PX4_INFO("%s sensor switch from #%i", sensor_name, failover_index); Where does failover_state() get updated?