fault_2025_09_12_16_59_45.pdf (62.3 KB)
fault_1970_01_01_00_01_25.pdf (66.1 KB)
fault_2025_09_12_15_40_57.pdf (63.1 KB)
See the attached crash dumps. The mavlink_if2 / mavlink_rcv_if2 hard fault is repeatable if you first connect to the autopilot (Cube Orange+) via USB in QGC, then unplug the USB and connect to the autopilot via radio. We are running PX4 v1.15.
Thanks for the note. It would be nice to fix this, of course. However, from the git hash displayed in your pdf, I can’t exactly see what the PX4 version is that you’re using (c9a99f215c026b4bb0e1dee4929207d876428036). I have to assume you have your own fork which is fine but means I can’t directly use your stack traces.
Can you reproduce it with the stock PX4 v1.15.4? And what radio are you using, and connected how? Best case, you share all your parameters. The more information I have, the more likely I can reproduce this on my end and fix it.
Also, it might be worth checking if it still happens with v1.16.0. If it is fixed, we just have to figure out which commit fixed it; if not, it’s worth fixing in general.
Following the hard fault debugging documentation, I get this output:
(gdb) info line *0x0802e5c8
Line 1283 of "serial/serial.c" starts at address 0x802e5c6 <uart_ioctl+10> and ends at 0x802e5cc <uart_ioctl+16>.
(gdb) info line *0x0803015f
Line 74 of "vfs/fs_ioctl.c" starts at address 0x803015e <file_vioctl+80> and ends at 0x8030168 <file_vioctl+90>.
(gdb) info line *0x0802eba1
Line 678 of "usbdev/cdcacm.c" starts at address 0x802eba0 <cdcacm_release_rxpending+40> and ends at 0x802ebb4 <cdcacm_release_rxpending+60>.
(gdb) info line *0x0802ebc8
Line 486 of "usbdev/cdcacm.c" starts at address 0x802ebbe <cdcacm_release_rxpending+70> and ends at 0x802ebd4 <cdcacm_release_rxpending+92>.
Unfortunately we don’t have time right now to try reproducing with stock PX4. We’re just going to avoid connecting that way since we haven’t found any issues if you stick to a comms link.
This is the SiK radio we are using: FPVDrone 500MW Radio Telemetry Kit 915Mhz Air and Ground Data Transmit Module for APM 2.6 APM 2.8 Pixhawk Flight Controller (Amazon.com listing).
The vehicle-based radio is connected to the Cube Orange+ on TELEM1. The GCS-based radio is connected to a laptop via USB.
I’m not able to reproduce this on main.
To debug a hardfault correctly, the following is required:
- The exact ELF file that was created together with the .px4 file that was flashed onto the drone when the error occurred. The git hash alone is not enough, as a different compiler (or compiler version) may have been used, in which case the addresses in the fault log will not match.
- Access to your fork, so we can check out the relevant commit and match the addresses to actual code lines.
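Once the matching ELF is at hand, the address lookups from the fault log can be batched instead of typed one by one. The following is only a rough sketch: it builds a single gdb invocation in batch mode from a list of addresses. The gdb binary name (arm-none-eabi-gdb) and the ELF path are assumptions and need to be adapted to the actual toolchain and build; the addresses are the ones quoted earlier in this thread.

```python
import shlex


def gdb_info_line_command(elf_path, addresses, gdb="arm-none-eabi-gdb"):
    """Build a gdb batch command that resolves each address to a source line."""
    argv = [gdb, "-batch"]
    for addr in addresses:
        # Accept ints or hex strings and normalise to 0x-prefixed 8-digit hex.
        value = addr if isinstance(addr, int) else int(addr, 16)
        argv += ["-ex", f"info line *0x{value:08x}"]
    # The ELF must come from the same build as the flashed .px4 file.
    argv.append(elf_path)
    return argv


# Addresses taken from the fault log quoted in this thread.
addresses = ["0x0802e5c8", "0x0803015f", "0x0802eba1", "0x0802ebc8"]

# Hypothetical ELF path, shown for illustration only.
cmd = gdb_info_line_command("build/firmware.elf", addresses)

# Print a shell-ready command line that can be copied into a terminal.
print(shlex.join(cmd))
```

The script only prints the command; running it against an ELF from a different build would still produce output, but the reported source lines would not be trustworthy.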
Hey,
we recently lost a vehicle mid-air due to a hardfault and our investigation led us to this thread. We are still looking into the issue, but I wanted to share our findings so far:
- The hardfaults are related to the MAVlink over USB connection.
- I was able to extract the stack traces from three hardfaults, one while airborne, two on the bench. See attachments below.
- There are a lot of things in common between the hardfault logs shared above and ours.
- There are at least three different code paths that lead to the faults.
Hope this gives some additional insight for fixing this bug together!
Cheers!
Details about our system:
Crash logs and analysis including reconstructed stacktrace:
stacktrace_fault_2025_11_27_14_27_15.pdf (36.9 KB)
stacktrace_fault_2025_12_02_19_23_18.pdf (35.7 KB)
stacktrace_px4-crashlog.pdf (35.7 KB)
Ok, any hints on how to reproduce it exactly? Plug in USB, then unplug it again? Any other conditions?
Yeah, they do look familiar… I have seen them every now and then over the last few years. Usually, I just recommend not using USB in flight, but that’s not always a satisfying answer.
When we first observed the crash on the bench, we were changing some parameters over USB while another serial MAVlink connection was active. Plugging and unplugging USB multiple times also triggered the issue once on the bench. Before the aircraft crashed mid-flight, we had connected the vehicle over USB because we were experiencing telemetry radio issues. In particular, parameters were adjusted and a geofence was set.
Oh, that’s tricky!
Ok, I’m taking note and will try to reproduce this.
I have it all on my desk with MAVlink over:
- a SiK radio
- an onboard high bandwidth serial link
- and USB
I’m trying to unplug and replug the USB link, but so far I haven’t been able to reproduce the hardfault. I’ve tried with main and 1.15.4.
(not saying it doesn’t happen, I just unfortunately can’t reproduce it when I want to…)
I was able to narrow it down further:
So in general, the MAVlink instance for USB CDC should be stopped when USB is unplugged. That does not happen in our setup. Feel free to reproduce it using the following steps:
- Connect to the autopilot from PC1 via telemetry radio. Start QGC there and switch to the MAVlink shell.
- Connect to USB on PC2.
- On PC1, check cdcacm_autostart status and mavlink status.
- Unplug USB on PC2.
- On PC1, check cdcacm_autostart status and mavlink status again. The ttyACM0 should be gone, but the mavlink instance for USB CDC is still alive. This should not happen and causes a use-after-free, as the ttyACM0 file descriptor is already invalidated. In particular, mavlink stop -d /dev/ttyACM0 was already called by src/drivers/cdcacm_autostart/cdcacm_autostart.cpp:160-171, but did not have the desired effect.
- Optionally, provoke the crash:
  - Stop mavlink manually using mavlink stop -d /dev/ttyACM0.
  - Verify that it is stopped using mavlink status.
  - Connect again to USB on PC2.
  - Observe the crash of PX4.
So, two things are coming together here: First, the mavlink code does not check the USB CDC device for its status or existence and thus can use an invalidated file descriptor. Second, the mavlink instance does not stop reliably, and cdcacm_autostart ignores this. Together, this causes a use-after-free.
As a workaround, the cdcacm_autostart should at least prevent the system from flying when mavlink stop … does not return 0. Even crashing the whole flightstack would still be better than continuing with corrupted memory.
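To make the proposed workaround concrete, here is a minimal Python model of the control flow, not PX4 code: stop the instance, check the return code, verify that the instance is really gone, and block arming otherwise. The two hooks, stop and is_running, as well as the names Teardown, teardown_usb_mavlink and arming_allowed, are hypothetical and only stand in for whatever the firmware would actually call. The simulated scenario at the end (stop reports success while the instance stays alive) is one assumed failure mode, not a confirmed one.

```python
from enum import Enum


class Teardown(Enum):
    OK = "ok"
    STOP_FAILED = "stop command returned non-zero"
    STILL_RUNNING = "instance still running after stop"


def teardown_usb_mavlink(stop, is_running, device="/dev/ttyACM0"):
    """Stop the MAVLink instance on `device` and verify that it is gone.

    `stop(device)` returns an exit code and `is_running(device)` returns a
    bool. Both are hypothetical hooks standing in for real firmware calls.
    """
    if stop(device) != 0:
        return Teardown.STOP_FAILED
    if is_running(device):
        return Teardown.STILL_RUNNING
    return Teardown.OK


def arming_allowed(result):
    # Fail safe: any teardown result other than OK blocks arming.
    return result is Teardown.OK


# Assumed scenario: stop reports success, but the instance stays alive.
result = teardown_usb_mavlink(stop=lambda dev: 0, is_running=lambda dev: True)
print(result.name, arming_allowed(result))
```

Checking both the return code and the actual instance state matters here, because a stop command that reports success while the instance keeps running would otherwise go unnoticed.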
I agree with most of what you say, but I don’t see the problem where mavlink does not stop. For me, it does stop reliably, and that’s probably why I can’t reproduce the crash. What’s different for you that prevents it from stopping? And does it get force stopped after 3s?
I have made these fixes so far but I couldn’t actually test them as I can’t reproduce the issue just yet: [Sponsored by CubePilot] Try to fix potential mavlink segfaults on USB disconnect by julianoes · Pull Request #26083 · PX4/PX4-Autopilot · GitHub
Have you tried reproducing the steps I posted above? One difference between your setup and mine might be that XRCE and MAVlink are running under load on TELEM1/2, which are both connected to a companion computer.
I was not able to observe a force stop. Do you have any ideas on how I could introspect the stopping process? Unfortunately, the Cube Orange Plus with Mini Carrier board does not have a debug serial or JTAG exposed as far as I know.
Config that differs from the fixed-wing default:
- MAV_0_MODE=Onboard
- MAV_0_RATE=0
- SER_TEL1_BAUD=460800
- UXRCE_DDS_CFG=TELEM2
It does have SWD on the bottom of the connector board; however, I think the connector is not populated by default. (I’m using some dev boards that have this port populated.)
I was not aware that you’re also using XRCE? Is this on the same port as MAVLink or other ports?
Oh, I did not know about this. Unfortunately, I could not get hold of schematics of the Mini Carrier board anywhere.
It is a different telemetry port than MAVlink: MAVlink to the onboard computer is on TELEM1, XRCE on TELEM2. From my understanding, XRCE should not interfere with (USB-)MAVlink at all, but who knows.
It should not interfere, indeed.
I use the standard carrier board. I would be surprised if the Mini exposes it.
If I were you, I’d try to find a standard carrier board or use a Pixhawk 6X or 6C with the debug adapter, if you happen to have any of those.