Reinforcement learning - PX4/ROS/Gazebo

Hey guys,

I have a big question. I want the drone's camera to detect, for example, a person, and the drone should then follow this person based on the camera image. But instead of just doing some basic operations to keep the person in the middle of the camera image, I would like to train this behavior with reinforcement learning. Requirements would be, e.g., that the person stays in the middle of the camera image and that the distance between the drone and the person is X meters.
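To make these requirements a bit more concrete, here is a rough sketch of how I imagine the reward could look. It assumes a hypothetical observation struct (my own, not from any library) holding the person's bounding-box center in normalized image coordinates and an estimated drone-to-person distance. The weights and the lost-person penalty are placeholder values, not tuned ones.

```cpp
#include <cmath>

// Hypothetical observation (my own struct): where the detected person is and how far away.
struct Observation {
    double center_x;      // bounding-box center, normalized to [-1, 1] (0 = image center)
    double center_y;      // bounding-box center, normalized to [-1, 1] (0 = image center)
    double distance_m;    // estimated drone-to-person distance in meters
    bool person_visible;  // false if the detector lost the person
};

// Reward: 0 is best, more negative is worse.
// target_distance_m is the "X meters" requirement; the weights are placeholders.
double computeReward(const Observation& obs, double target_distance_m,
                     double w_center = 1.0, double w_distance = 0.5,
                     double lost_penalty = -10.0) {
    if (!obs.person_visible) {
        return lost_penalty;  // strongly discourage losing the person
    }
    // Euclidean offset of the person from the image center
    const double center_error = std::hypot(obs.center_x, obs.center_y);
    // Absolute deviation from the desired following distance
    const double distance_error = std::fabs(obs.distance_m - target_distance_m);
    return -(w_center * center_error + w_distance * distance_error);
}
```

For example, a person exactly in the image center at the target distance gives a reward of 0, while a person off-center or too far away gives a negative reward proportional to the error.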

I am using ROS and Gazebo, and my code is written in C++. Is it possible to do reinforcement learning with PX4 in Gazebo? Does anybody have experience with this and could give me some hints? The example above is not a specific problem that I really want to solve, but it should be a good starting point for me to test reinforcement learning with PX4 and Gazebo. Are there any video tutorials or other resources that would make it easier for me to implement this?
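As far as I understand, most RL setups expect an environment with a reset/step interface, so my guess is that the Gazebo/PX4 side would have to be wrapped like this. Below is a minimal sketch of that idea; `FollowEnv`, `ToyFollowEnv` and `StepResult` are hypothetical names of my own, not part of PX4, ROS or Gazebo, and the toy one-dimensional "simulator" only stands in for Gazebo so the loop can run without it.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical result of one environment step (my own names, not a PX4/ROS API).
struct StepResult {
    double observation;  // here: current drone-to-person distance in meters
    double reward;
    bool done;
};

// Generic environment interface an RL agent would talk to.
class FollowEnv {
public:
    virtual ~FollowEnv() = default;
    virtual double reset() = 0;                  // start a new episode, return first observation
    virtual StepResult step(double action) = 0;  // apply action, advance the simulation
};

// Toy stand-in for Gazebo: the "drone" moves along one axis toward/away from a static person.
// In a real setup, step() would instead send a velocity setpoint to PX4 (for example via
// MAVROS) and wait for the next camera/pose update.
class ToyFollowEnv : public FollowEnv {
public:
    explicit ToyFollowEnv(double target_distance_m) : target_(target_distance_m) {}

    double reset() override {
        distance_ = 10.0;
        steps_ = 0;
        return distance_;
    }

    StepResult step(double action) override {
        // action = forward velocity command in m/s, clamped to an assumed range
        const double v = std::clamp(action, -2.0, 2.0);
        distance_ = std::max(0.0, distance_ - v * dt_);
        ++steps_;
        const double reward = -std::fabs(distance_ - target_);
        const bool done = steps_ >= max_steps_ || distance_ <= 0.0;
        return {distance_, reward, done};
    }

private:
    double target_;
    double distance_ = 10.0;
    int steps_ = 0;
    static constexpr double dt_ = 0.1;  // simulated time step in seconds
    static constexpr int max_steps_ = 200;
};
```

An agent would then call `reset()` once per episode and `step()` in a loop until `done` is true. Keeping the interface separate from the simulator would, in principle, let the same agent code run against the toy environment first and a Gazebo-backed implementation later (requires C++17 for `std::clamp`).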

Really appreciate your help!