Loss of ICM sensors after sequential *SOFT* restarts

Hello all,

Because we often work with our drones remotely, we frequently need to perform multiple soft restarts within a single power cycle (usually via the “Reboot Vehicle” option in the QGC vehicle parameters menu). Recently we’ve noticed on our Cube Orange-based vehicles, running PX4 v1.12.3 stable, that after several soft resets either the ICM20948 IMU, the ICM20649 IMU, or both will report an error from their respective drivers at boot and then stop functioning and publishing data.

Sometimes this happens after as few as 2 or 3 resets, sometimes it takes 5-7, but once a sensor fails to start at boot, it does not recover until we do a complete power cycle. Attempting to start the driver manually at the console results in the same error as seen on boot: “no device on bus”. So far a full power cycle has always brought the devices back, so our initial impression is that the devices are entering some state that the driver cannot handle or reset, and the driver can therefore no longer communicate with the device until power is removed and the device returns to a known state. This causes the most trouble when a vehicle calibration is performed while these sensors are not running, because the non-running devices do not receive calibration values.

As an example, here’s part of the boot log of the 1st boot after a power cycle:

Board sensors: /etc/init.d/rc.board_sensors
ms5611 #0 on SPI bus 4
icm20602 #0 on SPI bus 4 rotation 12
icm20948 #0 on SPI bus 4 rotation 10
ms5611 #1 on SPI bus 1
icm20649 #0 on SPI bus 1

Everything is booting normally, and we see all 3 accels, all 3 gyros, and the internal mag reported with the sensors status command.

Here’s the boot log from the 5th soft reset after the initial power on:

Board sensors: /etc/init.d/rc.board_sensors
ms5611 #0 on SPI bus 4
icm20602 #0 on SPI bus 4 rotation 12
icm20948 #0 on SPI bus 4 rotation 10
ms5611 #1 on SPI bus 1
WARN  [SPI_I2C] icm20649: no instance started (no device on bus?)

Once this “no device on bus?” warning appears, we no longer see data from that sensor, and it never recovers on subsequent soft reboots.
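In case it helps anyone trying to reproduce this over many reboots, here is a minimal Python sketch for checking a captured boot log for drivers that failed to start. It is not part of PX4 or QGC: the function name `scan_boot_log` is hypothetical, and the two regular expressions assume only the two line formats quoted in the logs above, so other drivers or firmware versions may print something different.

```python
import re

# Both patterns are assumptions based solely on the log lines quoted in this post.
STARTED = re.compile(r"^(?P<driver>\w+) #(?P<instance>\d+) on (?:SPI|I2C) bus (?P<bus>\d+)")
FAILED = re.compile(r"^WARN\s+\[SPI_I2C\] (?P<driver>\w+): no instance started")


def scan_boot_log(text):
    """Return (started, failed) for a captured boot log.

    started: list of (driver, bus) tuples for drivers that reported an instance.
    failed:  list of driver names that logged "no instance started".
    """
    started, failed = [], []
    for raw in text.splitlines():
        line = raw.strip()
        match = STARTED.match(line)
        if match:
            started.append((match["driver"], int(match["bus"])))
            continue
        match = FAILED.match(line)
        if match:
            failed.append(match["driver"])
    return started, failed


# The log from the 5th soft reset, copied from above.
BOOT_LOG = """\
Board sensors: /etc/init.d/rc.board_sensors
ms5611 #0 on SPI bus 4
icm20602 #0 on SPI bus 4 rotation 12
icm20948 #0 on SPI bus 4 rotation 10
ms5611 #1 on SPI bus 1
WARN  [SPI_I2C] icm20649: no instance started (no device on bus?)
"""

if __name__ == "__main__":
    started, failed = scan_boot_log(BOOT_LOG)
    print("started:", started)
    print("failed:", failed)
```

Running it on the log from each reboot and checking whether `failed` is empty would make it easier to count how many soft resets it takes before a sensor drops out.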

This is annoying but manageable at boot time. Our primary concern at this stage is determining whether these devices can enter this bad state and stop communicating during flight or at run time. Any insight into this issue would be greatly appreciated. Please let me know if there are any other details I can provide or debugging steps I should take. Thank you for your time.