Mavsdk_server UDP connection requires PX4 restart

Hi,

I have a PX4 simulator running in a k8s cluster with open UDP endpoints and can connect, disconnect, and reconnect with it via QGC. However, if I run the mavsdk_server against the simulator, it will only complete initialization if it is the first connection made to the simulator since it was restarted. So after restarting the simulator, mavsdk_server can connect. If I connect first with QGC, then disconnect QGC, and attempt to connect with mavsdk_server, it fails to finish initialization. If I successfully connect with mavsdk_server, disconnect, and reconnect with the mavsdk_server, then it again fails to finish initialization.

When I say it fails to initialize I mean that it hangs with the following output:

[11:57:08|Info ] MAVSDK version: v1.4.13 (mavsdk_impl.cpp:20)
Waiting to discover system on udp://<non-zero IP>:18570...
[11:57:08|Debug] Initializing connection to remote system... (mavsdk_impl.cpp:494)

On a successful connection, it says the system was discovered and I can see other messages. While it is hanging on this message, if I restart the simulator then it will connect once the simulator comes back online.

Is there a special way that I need to disconnect? Is this behavior an issue on the PX4 side, or the MAVSDK side?

Thanks

This doesn’t look right. Which IP is that and what port?

Read this comment carefully:

<non-zero IP>:18570 is the <IP>:<Port> for the PX4 simulator. In this case, the mavsdk_server is acting as a client to the simulator, so I think this is correct. Unless I am misunderstanding your meaning?

Same for me: I run the PX4 sim in Docker and try to connect to udp://172.17.0.2:18570 with mavsdk_server. It connects successfully the first time and fails on every following attempt, while QGC can connect and disconnect many times, exit, reconnect, etc.

This looks like a mavsdk_server bug or problem. What can we do to solve it? Should we report an issue on GitHub?

Oh, so you’re using mavsdk_server as the simulator side, similar to how jMAVSim or Gazebo would talk to PX4?

Or is mavsdk_server connecting to PX4 SITL, just as you would usually connect with QGC?

Let’s first figure out what is wrong here. It would also be me responding on GitHub.

I think it has to do with the port that traffic is returned to. The port in the other direction is chosen by the operating system, at least for MAVSDK.

Also see:

So that would mean PX4 sends traffic to a random UDP port. If you then disconnect and connect again, PX4 might still send traffic to the previous port.

For QGC, it could be that it selects a specific port and that way can re-connect.

I think it would be worthwhile to look at the port numbers using Wireshark to confirm my theory.
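
To illustrate the theory without Wireshark, here is a minimal Python sketch (not MAVSDK code, just plain sockets; the helper name and the loopback address are made up for the example). It shows that a UDP client which does not bind explicitly gets its local port from the operating system, and that two such clients end up on different local ports:

```python
import socket

def open_unbound_client(remote=("127.0.0.1", 18570)):
    """Create a UDP socket without bind() and return it with its OS-chosen local port.

    The remote address is only illustrative; connect() on a UDP socket sends
    nothing, it just makes the OS assign a local (ephemeral) port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.connect(remote)
    return sock, sock.getsockname()[1]

# Both sockets are kept open at the same time, so the OS must pick two different ports.
first_sock, first_port = open_unbound_client()
second_sock, second_port = open_unbound_client()
print("first client local port: ", first_port)
print("second client local port:", second_port)

first_sock.close()
second_sock.close()
```

If the remote side keeps replying to the first port, a restarted client listening on the second port would not see any of that traffic.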

This one. mavsdk_server is connecting to the PX4 SITL in the same manner as QGC.

Yes.

Could mavsdk_server select a specific port somehow as well?

I wouldn’t do that if it can be avoided, because it leads to bind errors if something else is already occupying the port.
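
As a rough illustration of that bind error, here is a small Python sketch with plain sockets (nothing MAVSDK-specific, and the port is whatever the OS hands out for the demo). A second socket that insists on an already occupied UDP port fails to bind:

```python
import socket

# First socket: let the OS pick a free port so the demo does not depend on a fixed number.
first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
first.bind(("127.0.0.1", 0))
port = first.getsockname()[1]

# Second socket: insist on the very same port, as a client with a hard-coded port would.
second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    second.bind(("127.0.0.1", port))
    bind_failed = False
except OSError as err:
    bind_failed = True
    print(f"bind to port {port} failed: {err}")
finally:
    second.close()
    first.close()
```

Letting the OS choose the port avoids this class of failure, at the cost of the port changing on every start.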

So what could we, or you, do?

This is what you can do.

And you can give me the minimal steps on how I can reproduce it.

Ok, will try now.

  1. Run PX4 SITL in Docker:
docker run -it --rm \
                --privileged \
                --name="${CONTAINER_NAME}" \
                -e LOCAL_USER_ID="$(id -u)" \
                -v /tmp/.X11-unix:/tmp/.X11-unix:ro \
                -e DISPLAY="${DISPLAY}" \
                run_script.sh

run_script.sh contains the following:

#!/usr/bin/env bash

# Pass the drone's home coordinates
export PX4_HOME_LAT=${LATITUDE}
export PX4_HOME_LON=${LONGITUDE}
export PX4_HOME_ALT=${ELEVATION}
# Disable pxh> command prompt to prevent excessive logging
export NO_PXH=1
git clone https://github.com/PX4/PX4-Autopilot.git
cd PX4-Autopilot
git checkout d6b523b574875a9f640620d1e90c8277fa13781c
git submodule update --init --recursive
./Tools/setup/ubuntu.sh --no-nuttx
make px4_sitl gazebo-classic_typhoon_h480
  2. It will print [Msg] Publicized address: 172.17.0.2, which is the IP to connect to.
  3. Then run mavsdk_server:
$ ./mavsdk_server udp://172.17.0.2:18570
[04:51:34|Info ] MAVSDK version: v1.4.0-442-gefa68dea (mavsdk_impl.cpp:24)
[04:51:34|Info ] Waiting to discover system on udp://172.17.0.2:18570... (connection_initiator.h:20)
[04:51:34|Debug] Initializing connection to remote system... (mavsdk_impl.cpp:668)
[04:51:34|Debug] New: System ID: 1 Comp ID: 1 (mavsdk_impl.cpp:354)
[04:51:34|Debug] Component Autopilot (1) added. (system_impl.cpp:385)
[04:51:34|Info ] System discovered (connection_initiator.h:63)
[04:51:34|Info ] Server started (grpc_server.cpp:161)
[04:51:34|Info ] Server set to listen on 0.0.0.0:50051 (grpc_server.cpp:162)
[04:51:34|Debug] MAVLink: info: Preflight Fail: No manual control input  (system_impl.cpp:259)
[04:51:34|Debug] Component Gimbal (154) added. (system_impl.cpp:385)
[04:51:35|Warn ] Vehicle type changed (new type: 13, old type: 0) (system_impl.cpp:233)
^C
  4. Press Ctrl-C, then run again:
$ ./mavsdk_server udp://172.17.0.2:18570
[04:52:32|Info ] MAVSDK version: v1.4.0-442-gefa68dea (mavsdk_impl.cpp:24)
[04:52:32|Info ] Waiting to discover system on udp://172.17.0.2:18570... (connection_initiator.h:20)
[04:52:32|Debug] Initializing connection to remote system... (mavsdk_impl.cpp:668)

The first time it connects to the sim; the second time it does not.

Thanks for the steps, that’s helpful. What container do you use for CONTAINER_NAME? The default PX4 one I assume?

So what I can see is that you’re connecting to the local port of PX4.

Are you aware of how it’s done in JonasVautherin/px4-gazebo-headless on GitHub, an unofficial Ubuntu-based container that builds and runs PX4 SITL (Software In The Loop) through Gazebo?
That could be an alternative to unblock you.

Ok, I can reproduce the issue, even without docker, just against SITL, that’s interesting.

The thing is that the port 18570 is not really a port to rely on. Usually the way it’s described is that the drone broadcasts on 14550 and 14540 but the choice of 18570 is more of an implementation detail.

So what is happening is that PX4 sends traffic to the port it first happened to receive traffic from, but doesn’t reset that remote port after the connection is lost. A client that reconnects from a different port therefore never receives anything.

I’m not quite sure yet how to change the PX4 side.

The whole logic on the PX4 side is a bit convoluted if you ask me.
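
To make the behaviour described above concrete, here is a toy Python model. It is not PX4's actual code; `ToyEndpoint`, `run_demo` and the `sticky` flag are invented for this sketch. With `sticky=True` the endpoint latches onto the first sender and keeps replying there, and with `sticky=False` it replies to whoever sent last:

```python
import socket

class ToyEndpoint:
    """Toy UDP endpoint that replies to a remembered remote address.

    sticky=True:  remember the first sender forever (the behaviour observed here).
    sticky=False: always switch to the most recent sender (one possible remedy).
    """

    def __init__(self, sticky):
        self.sticky = sticky
        self.remote = None
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind(("127.0.0.1", 0))  # OS-chosen port, demo only
        self.sock.settimeout(1.0)
        self.addr = self.sock.getsockname()

    def handle_one(self):
        """Receive one datagram and send a reply to the remembered remote."""
        _, sender = self.sock.recvfrom(1024)
        if self.remote is None or not self.sticky:
            self.remote = sender
        self.sock.sendto(b"heartbeat", self.remote)

    def close(self):
        self.sock.close()

def run_demo(sticky):
    """Two clients talk to the endpoint in turn; return whether each got a reply."""
    endpoint = ToyEndpoint(sticky)
    clients, results = [], []
    for _ in range(2):
        client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        client.settimeout(0.5)
        # Earlier clients stay open so the OS cannot hand the same port to the next one.
        clients.append(client)
        client.sendto(b"hello", endpoint.addr)
        endpoint.handle_one()
        try:
            client.recvfrom(1024)
            results.append(True)
        except socket.timeout:
            results.append(False)
    for client in clients:
        client.close()
    endpoint.close()
    return results

print("sticky remote:  ", run_demo(sticky=True))   # [True, False]
print("updating remote:", run_demo(sticky=False))  # [True, True]
```

The second client in the sticky case sits there without a reply, which matches mavsdk_server hanging on "Initializing connection to remote system...".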

Yes, we used it, then switched to pure Ubuntu to reduce the image size by a few GB.

Hmm, very cool image! However, it is not clear how it can unblock us. Stream the simulation to a specific IP and port 14540 specifically for MAVSDK? I’ll ask our team whether we can know or specify the MAVSDK IP beforehand for the simulator. Some scripts in the repo have not been updated for years, so I’m not sure how they’ll work: our old scripts did not work with the newest PX4 changes, so we were forced to update them as well. Also, current PX4 has problems with HEADLESS mode, etc. We need to try it, however: the latest release was in January.

Our scripts also allow us to stream to a specific IP. I need to check which port they stream to.

I tried to connect mavsdk_server to 14550 and 14540, but it does not connect at all, even the first time. It only connects to 18570.

So you think the problem is in PX4? I’m not sure what the correct logic is there: if a client reconnects, should PX4 reroute traffic to the new port/client, or always send only to the first client/port?

BTW, PX4 reports that the connection is regained when mavsdk_server is rerun:

[Wrn] [Event.cc:61] Warning: Deleting a connection right after creation. Make sure to save the ConnectionPtr from a Connect call
INFO  [commander] GCS connection regained
INFO  [health_and_arming_checks] Preflight Fail: No manual control input
INFO  [commander] Connection to ground station lost
INFO  [health_and_arming_checks] Preflight Fail: No manual control input
INFO  [commander] GCS connection regained
INFO  [health_and_arming_checks] Preflight Fail: No manual control input

However, mavsdk_server does not reflect this.

Yes, because you’re connecting from the other direction.

It generally should work, as it handles the connection stuff with some changes on the PX4 startup to route the connections outside of the container.

Here is a fix for you to try: PX4/PX4-Autopilot pull request #21608, "mavlink: allow UDP client to reconnect" by julianoes.
It will probably need more fixing and testing, so it might not be merged in a hurry, but it should unblock you.

So you propose to patch PX4, not MAVSDK? We had problems with the latest PX4 versions: as you may have seen above, we used commit d6b523b574875a9f640620d1e90c8277fa13781c because of the bug in HEADLESS mode. I will try your branch now, as I found a workaround for it. I hope it will work.

BTW, could this situation appear with a real drone? Maybe mavsdk_server needs to be patched as well?