Get the world coordinates of an object from depth and GPS data

Hello, I’m new to PX4 and the drone space. For a project, I have a drone with a depth camera, and I want to combine the global position I get from PX4 with the camera data to compute the world position of an object detected via computer vision.

I know the object’s depth (the distance to it) and its bounding box in pixels. I’m thinking of using tf2 with ROS 2 after defining the TF tree from camera to base, but this seems non-trivial with PX4 SITL. Is this feasible, or is there a better method to follow?

The stack is ROS 2 and Gazebo Garden, connected via MicroXRCEAgent and ros_gz.
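
For concreteness, the transform chain in question can be sketched without tf2 as plain matrix math. This is only a rough sketch under assumptions that are not stated in the post: a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), a depth value measured along the optical axis rather than as straight-line range, a forward-facing camera mounted at the body origin, and a vehicle pose already expressed in a local NED frame with a body-to-world quaternion ordered `(w, x, y, z)`. All function names and numeric values here are hypothetical, and converting latitude/longitude to a local frame is left out.

```python
import numpy as np

def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with Z-depth into the camera optical frame
    (x right, y down, z forward). Assumes an undistorted pinhole model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

def quat_to_rot(q):
    """Rotation matrix from a unit quaternion given as (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def camera_to_world(p_cam, R_body_cam, t_body_cam, q_world_body, t_world_body):
    """Chain camera -> body -> world. R_body_cam and t_body_cam describe the
    camera mount; q_world_body and t_world_body describe the vehicle pose."""
    p_body = R_body_cam @ p_cam + t_body_cam
    return quat_to_rot(q_world_body) @ p_body + t_world_body

# Assumed mount: forward-facing camera, optical frame -> FRD body frame
# (body x = cam z, body y = cam x, body z = cam y), no lever arm.
R_BODY_CAM = np.array([[0.0, 0.0, 1.0],
                       [1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
T_BODY_CAM = np.zeros(3)

# Hypothetical numbers: bounding-box centre at the principal point, 5 m away,
# vehicle level with zero yaw at NED position (10, 20, -3).
p_cam = pixel_to_camera(u=320, v=240, depth=5.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
p_world = camera_to_world(p_cam, R_BODY_CAM, T_BODY_CAM,
                          q_world_body=(1.0, 0.0, 0.0, 0.0),
                          t_world_body=np.array([10.0, 20.0, -3.0]))
print(p_world)
```

In a tf2 setup the same chain would be spread across a static camera-to-base transform and a dynamic base-to-world transform built from the PX4 pose, so the main open point is which frame conventions (NED/FRD versus ENU/FLU) each side uses.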