3D object detection

Hi, are you aware of any 3D object detection datasets captured from a drone’s perspective? I mean something similar to the KITTI 3D object detection benchmark for cars (The KITTI Vision Benchmark Suite). Of course, it could be LIDAR-only or stereo-only.

Regards,


When we started our project with DJI drones, I found this in their documentation:
https://developer.dji.com/onboard-sdk/documentation/sample-doc/advanced-sensing-object-detection.html

It looks like it uses an object detection algorithm named YOLO, which should work with ROS…
But I have never tried to get it running… It would be nice if you could let me know whether it works or not.

Hi, @hsu-ret.

YOLO is in general a 2D algorithm. Your DJI link uses YOLOv2, but the latest, more accurate version is YOLOv3. YOLO has its advantages: it is quite fast and has a simple one-stage design. But as of 2019, YOLO is “outdated”. There are alternatives with a better speed/accuracy tradeoff, and, to my knowledge, YOLO is mostly no longer an active project and its development framework (Darknet) never became popular.
If you want to use 2D object detection, the simplest way is to use the OpenCV dnn module wrapped in a ROS node. That way you can use YOLO, but also more powerful TensorFlow graphs. Unfortunately, the OpenCV dnn module currently doesn’t support NVIDIA CUDA. If you want to use Jetsons, it’s better to run TensorFlow/PyTorch models directly in Python or C++, wrapped in ROS nodes. NVIDIA also has ready-to-use fast models in TensorRT: https://github.com/dusty-nv/ros_deep_learning

OK, but back to the topic: if anybody knows about labeled point clouds from drones, I’m interested :slight_smile:

By the way, if I don’t find any dataset, I’ll probably use KITTI.
I’m also looking for people interested in this topic who would like to help develop a ROS node, similar to avoidance, based on a Jetson TX2 with a RealSense D435 and the LibTorch/CUDA (PyTorch) framework.
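If KITTI ends up being the fallback, its object labels are plain-text files with one object per line. Below is a minimal sketch of parsing one such line, assuming the field layout described in the KITTI object devkit readme (15 fields for ground truth, plus a trailing score in detection result files). The class name `KittiObject` and the function `parse_kitti_label_line` are hypothetical, and the example line is illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KittiObject:
    label: str               # object class, e.g. "Car", "Pedestrian", "DontCare"
    truncated: float         # 0.0 (fully in image) to 1.0 (fully truncated)
    occluded: int            # occlusion state, 0..3 (-1 in result files)
    alpha: float             # observation angle in radians
    bbox: List[float]        # 2D image box: left, top, right, bottom (pixels)
    dimensions: List[float]  # 3D box size: height, width, length (meters)
    location: List[float]    # 3D position x, y, z in camera coordinates (meters)
    rotation_y: float        # yaw around the camera Y axis, in radians
    score: Optional[float] = None  # only present in detection result files


def parse_kitti_label_line(line: str) -> KittiObject:
    """Parse one whitespace-separated line of a KITTI object label file."""
    f = line.split()
    if len(f) not in (15, 16):
        raise ValueError(f"expected 15 or 16 fields, got {len(f)}")
    return KittiObject(
        label=f[0],
        truncated=float(f[1]),
        occluded=int(f[2]),
        alpha=float(f[3]),
        bbox=[float(v) for v in f[4:8]],
        dimensions=[float(v) for v in f[8:11]],
        location=[float(v) for v in f[11:14]],
        rotation_y=float(f[14]),
        score=float(f[15]) if len(f) == 16 else None,
    )


# Illustrative line in the KITTI label format (values for demonstration only).
obj = parse_kitti_label_line(
    "Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01"
)
print(obj.label, obj.dimensions, obj.location)
```

For a drone dataset, a label file in the same layout would let KITTI tooling be reused unchanged, which is one reason to mimic it.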

@hsu-ret, I just noticed that your link has a second part about stereo camera depth estimation. I’ll look into it.

OK, my understanding is that they just assign 3D points to 2D bounding boxes and then compute some basic statistics, like the nearest point or the average point (~ center). This is far from obtaining real 3D bounding boxes. If somebody needs this kind of simple approach, I can point you to a ROS repo based on the OpenCV dnn module and the ZED stereo camera.
It’s a shame that DJI (from what I see) stopped development in this area a long time ago.
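The simple approach described above can be sketched roughly as follows. This is not DJI’s code; it assumes the 3D points (from stereo depth or LIDAR) are already expressed in the camera frame (x right, y down, z forward) and that a pinhole intrinsic matrix `K` is known. The function name `box_point_stats` is hypothetical.

```python
import numpy as np


def box_point_stats(points_cam, K, boxes):
    """For each 2D box, collect the 3D points whose projection falls inside it.

    points_cam: (N, 3) points in the camera frame (x right, y down, z forward).
    K: (3, 3) pinhole intrinsic matrix.
    boxes: iterable of (x1, y1, x2, y2) pixel boxes from a 2D detector.
    Returns one dict per box (nearest point, mean point, point count), or None
    if no point projects into that box.
    """
    pts = np.asarray(points_cam, dtype=float).reshape(-1, 3)
    pts = pts[pts[:, 2] > 0]                    # keep only points in front of the camera
    proj = pts @ np.asarray(K, dtype=float).T   # homogeneous pixel coords (N, 3)
    uv = proj[:, :2] / proj[:, 2:3]             # perspective division -> (u, v)

    results = []
    for x1, y1, x2, y2 in boxes:
        inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
        sel = pts[inside]
        if len(sel) == 0:
            results.append(None)
            continue
        nearest = sel[np.argmin(np.linalg.norm(sel, axis=1))]  # closest to the camera
        results.append({"nearest": nearest, "mean": sel.mean(axis=0), "count": len(sel)})
    return results


K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
points = [[0, 0, 2], [0, 0, 4], [1, 0, 2], [0, 0, -1]]
stats = box_point_stats(points, K, [(40, 40, 60, 60), (200, 200, 300, 300)])
print(stats)
```

Note that a 2D box also contains background points behind the object, so the mean is pulled toward the background and the nearest point can be a depth outlier. That is one reason this falls short of a real 3D bounding box.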

We formally define the 3D object detection task and present the common regression and classification losses used to measure the performance of models tackling the 3D object detection task.
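The post does not say which losses it means. One common combination in LiDAR detectors is a smooth L1 loss on the box parameters plus a cross-entropy (or focal) loss on the class score. The sketch below assumes that combination, a 7-DoF box `(x, y, z, w, l, h, yaw)`, and regression only on positive samples. The function names and the `reg_weight` value are illustrative assumptions, not the authors’ formulation.

```python
import math


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 (Huber-style) loss summed over box parameters."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def binary_cross_entropy(prob, label, eps=1e-7):
    """Classification loss for one anchor/proposal; label is 0 or 1."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def detection_loss(box_pred, box_target, cls_prob, cls_label, reg_weight=1.0):
    """Classification loss plus box regression loss on positive samples only.

    Boxes are assumed to be 7-DoF tuples (x, y, z, w, l, h, yaw) or their
    encoded residuals; reg_weight is an illustrative balancing factor.
    """
    loss = binary_cross_entropy(cls_prob, cls_label)
    if cls_label == 1:  # background samples carry no box target
        loss += reg_weight * smooth_l1(box_pred, box_target)
    return loss


pred = (1.1, 2.0, 0.5, 1.6, 3.9, 1.5, 0.1)
target = (1.0, 2.0, 0.5, 1.6, 4.0, 1.5, 0.0)
print(detection_loss(pred, target, cls_prob=0.8, cls_label=1))
```

Smooth L1 behaves quadratically for small errors and linearly for large ones, so a few badly matched boxes do not dominate the gradient the way a plain squared error would.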

Next, we divide LiDAR 3D object detection networks into two categories: networks with input-wise permutation invariance, which exploit this symmetry property to process raw point clouds directly, and networks with grid representations of point clouds, which rely on ordered, structured representations. The pros and cons of these two categories of networks are discussed in detail.
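To make the two categories concrete, here is a toy sketch. The first function applies the same per-point layer to every point and then max-pools (the symmetric-function idea popularized by PointNet), so its output does not change when the points are reordered. The second turns the same points into an ordered occupancy-count voxel grid that a convolutional network could consume. The weights are random, the function names are hypothetical, and neither function implements a specific published network.

```python
import numpy as np


def shared_mlp_maxpool(points, W, b):
    """Permutation-invariant global feature from raw points.

    points: (N, 3); W: (3, F); b: (F,). The same weights are applied to every
    point, then max pooling (a symmetric function) aggregates over points.
    """
    per_point = np.maximum(points @ W + b, 0.0)  # shared linear layer + ReLU
    return per_point.max(axis=0)                 # order of points does not matter


def voxelize(points, voxel_size, grid_min, grid_shape):
    """Ordered grid representation: count how many points fall in each voxel."""
    points = np.asarray(points, dtype=float)
    idx = np.floor((points - np.asarray(grid_min, dtype=float)) / voxel_size).astype(int)
    valid = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)  # drop out-of-range points
    grid = np.zeros(grid_shape, dtype=np.int32)
    np.add.at(grid, tuple(idx[valid].T), 1)  # unbuffered add handles repeated voxels
    return grid


rng = np.random.default_rng(0)
cloud = rng.normal(size=(100, 3))
W, b = rng.normal(size=(3, 16)), np.zeros(16)
shuffled = cloud[rng.permutation(len(cloud))]
print(np.allclose(shared_mlp_maxpool(cloud, W, b), shared_mlp_maxpool(shuffled, W, b)))
print(voxelize(cloud, 0.5, [-2.0, -2.0, -2.0], (8, 8, 8)).sum())
```

The tradeoff shows up directly: the point-based path keeps every coordinate exactly but needs a symmetric aggregation, while the grid path quantizes positions to `voxel_size` and discards points outside the grid, in exchange for a regular structure that standard convolutions handle well.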