I am experiencing severe MAVLink communication issues on a companion computer setup (MAVROS connected to a Pixhawk via serial). The core problem is that the MAVLink sending thread on the PX4 flight controller completely freezes during operation, causing communication with the companion computer to fail. During the failure, all transmission statistics freeze. The connection eventually recovers by itself, but results in severe command sequence desynchronization.
Detailed Symptoms & Timeline
Failure Occurs (approx. 11 minutes into flight):
PX4 Logs: Consecutive [mavlink] Dropped XX events warnings appear, followed by vehicle_command lost and COMMAND_LONG vehicle_command lost errors with large generation number jumps (e.g., 69 → 2304), culminating in a CRITICAL Connection to mission computer lost message.
MAVROS Side: Reports [WARN] CON: Lost connection, HEARTBEAT timed out. and the connected flag in /mavros/state becomes false.
Key System Metric: The total CPU load (cpuload) on PX4 drops from a normal ~0.42 to around 0.35.
During Failure (approx. 1 minute):
Evidence of Sending Thread Freeze:
The mavlink_status.tx_buffer_overruns counter stops at 195 (or another high number) and ceases to increase.
mavlink_status.tx_message_count and txavgratecompletely freeze and stop updating.
This indicates the MAVLink transmit path is fully blocked, and the thread is no longer consuming CPU cycles, explaining the overall load drop.
Data Stream Behavior: Outgoing data streams from PX4 (e.g., local_position, attitude, imu) stop updating in MAVROS. Commands (setpoints) sent from the companion computer to the flight controller might still be partially processed, leading to state confusion.
Connection Recovery (approx. 12 minutes 30 seconds into flight):
Logs show Onboard controller regained, and the communication link is re-established.
However, due to the massive backlog of unprocessed commands accumulated during the thread freeze, severe command loss (generation jumps) and state desynchronization occur after recovery.
What rates are you trying to send out? What is the configuration of streams? And what baudrate are you using? And are you connecting and using flow control?
Baudrate: 921600 (serial connection on /dev/ttyS1, TELEM port)
Flow control: OFF
Mode: Onboard
Current stream rates/configuration: Default for Onboard mode, but I previously modified the source code in mavlink_main.cpp to set HIGHRES_IMU to 200Hz (original was 50Hz). This significantly increased TX bandwidth usage. Current mavlink status for the companion instance shows:
TX rate: around 36 KB/s (avg), with tx rate max set to 80000 B/s
No manual stream configuration overrides yet, but planning to lower some rates.
Latest discovery (from last night):
I was able to perfectly reproduce the exact same behavior as during yesterday’s flight (sudden HEARTBEAT timeout on MAVROS side, outgoing topics freeze, but setpoints still partially work, and it recovers by itself after a while) by manually stopping and restarting the MAVLink instance in the nsh console
mavlink stop -d /dev/ttyS1:
Immediate HEARTBEAT timeout on MAVROS, all PX4 topics stop updating (exactly like the flight freeze).
mavlink start -d /dev/ttyS1 -b 921600 -m onboard -r 80000:
Connection recovers in 1-2 seconds, all topics resume.
This strongly suggests that during flight, the MAVLink send thread on the PX4 side gets “stuck” or deadlocked under high load then tx_message_count and txavgrate freeze/stop updating,causing HEARTBEAT to stop being sent temporarily. Restarting the instance clears the queue and resets everything, mimicking the spontaneous recovery in flight.
Do you think this could be related to TX buffer overruns causing a temporary deadlock in the send path when rates are too high (especially with HIGHRES_IMU at 200Hz)? Any recommendations for safer rates or ways to prevent the freeze without restarting the module?
I’m just trying to reproduce this now. The first thing I found so far is that you are using Telem 3 (because that’s ttyS1). I often use Telem 2 for the connection to the companion computer and haven’t seen that problem there. However, Telem 2 is using DMA while Telem 3 does not. You could try to use Telem 2 instead and check whether you can reproduce it there as well, or not.
I have this working stable and haven’t seen drops yet. However, I’m not sending setpoints but only receiving at this stage. I might have to add in some traffic.
Also, do you have other mavlink instances running, and do you have forwarding enabled? It might be worth sending the output of mavlink status to see what is going on.
I also changed HIGHRES_IMU to 200 Hz which strangely sends it down at 250 Hz…
Actually, after reverting some of the aggressive rate increases (HIGHRES_IMU at 200 Hz plus moderate increases on a few other streams), I have not been able to reproduce the freeze again in extended ground tests and several additional flights.
So far, the MAVLink send thread freeze (with stuck tx stats and HEARTBEAT timeout) has only occurred:
Once during the original flight (the one that started this thread)
And consistently when I manually stop and restart the MAVLink instance (mavlink stop -d /dev/ttyS1 + mavlink start ...), which perfectly reproduces the same symptoms and recovery
In normal operation since then (multiple hours of armed Offboard with vision_pose ~50 Hz and setpoints 50 Hz), everything stays stable:
TX rate around 46 KB/s
No accumulating tx_buffer_overruns
tx_message_count and txavgrate continue increasing normally
No spontaneous freeze or HEARTBEAT timeout
For reference, the configuration I tested in the stable runs included:
HIGHRES_IMU → 200 Hz
ATTITUDE → 200 Hz
ATTITUDE_TARGET → 200 Hz
LOCAL_POSITION_NED → 100 Hz
GLOBAL_POSITION_INT → 100 Hz
My question remains: Are there any known conditions (e.g., prolonged backlog in the TX queue, specific stream patterns, mutex contention, or scheduling issues) under which the MAVLink send thread can temporarily freeze/stop sending messages (including HEARTBEAT) and then spontaneously recover ?
Thanks again for your time — really appreciate the insights from the dev team!