Real-time 3D Object Detection on Point Clouds

Date

2020

Publisher

Tartu Ülikool

Abstract

The demand for precise and fast object detection frameworks has increased since the autonomous vehicle industry started to attract more attention. While the progress made so far in the 2D object detection task with state-of-the-art approaches such as convolutional neural networks seems promising, it remains difficult to obtain the same level of performance in 3D modalities such as lidar point clouds. The main reasons are that point clouds are sparse and three-dimensional, whereas state-of-the-art 2D object detection models work on camera images. Some early works tried to ease these challenges using either 3D convolutional neural networks or bird’s-eye-view approaches; nevertheless, they were not able to achieve the desired level of performance in 3D perception. PointPillars is a recent model that runs fast with good accuracy on point clouds. Its main advantage arises from the way it encodes the points in each pillar into spatial features using PointNet. It divides the whole point cloud into a grid of vertical pillars and applies a state-of-the-art 2D detection network to the resulting top-down view, in which the spatial features are encoded. Although this operation enables the network to keep the positional information of the points within each pillar, it does not take into account the point densities in different parts of the point cloud. This thesis aims to improve the PointPillars network by utilizing positional encoding and extending the detection area. Positional encoding helps the network utilize positional features by introducing two additional input channels before each convolutional and deconvolutional layer. Additionally, different positional encoding schemes are compared to gain more insight into the effectiveness of the introduced positional channels. Moreover, this thesis presents a simple scheme to train a 360-degree model with ground truths provided only for the camera field of view (FOV).
The positional encoding scheme provides better accuracy at a speed similar to that of the original network. On the other hand, even though a 360-degree model is supposedly the type of model that should be used with lidar, experiments show that it outputs many false positives (FPs).
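
The two mechanisms mentioned in the abstract, assigning points to vertical pillars and feeding two extra positional channels into a layer, can be illustrated with a minimal pure-Python sketch. This is not the thesis implementation: the function names `pillar_index` and `add_positional_channels`, the grid parameters, and the choice to scale coordinates to [-1, 1] are all assumptions made for illustration, and the PointNet pillar encoder and the detection network are omitted.

```python
from typing import List, Tuple

# A feature map indexed as [channel][row][col] (illustrative type alias).
FeatureMap = List[List[List[float]]]


def pillar_index(x: float, y: float,
                 x_min: float, y_min: float,
                 pillar_size: float) -> Tuple[int, int]:
    """Map a point's ground-plane (x, y) to the (row, col) of its pillar.

    The height coordinate is ignored because pillars span the full
    vertical extent of the point cloud.
    """
    col = int((x - x_min) // pillar_size)
    row = int((y - y_min) // pillar_size)
    return row, col


def add_positional_channels(features: FeatureMap) -> FeatureMap:
    """Return a new feature map with two positional channels appended.

    The first added channel stores each cell's column position and the
    second its row position. Scaling to [-1, 1] is an assumption of this
    sketch, not something stated in the abstract.
    """
    height = len(features[0])
    width = len(features[0][0])

    def norm(i: int, n: int) -> float:
        # A single-cell axis maps to 0.0 to avoid division by zero.
        return 0.0 if n == 1 else 2.0 * i / (n - 1) - 1.0

    x_channel = [[norm(c, width) for c in range(width)] for _ in range(height)]
    y_channel = [[norm(r, height) for _ in range(width)] for r in range(height)]
    # List concatenation builds a new outer list; the input is not modified.
    return features + [x_channel, y_channel]
```

In a network of the kind the abstract describes, a step like `add_positional_channels` would run on the input of each convolutional and deconvolutional layer, so each such layer would be declared with two more input channels than its feature input alone requires.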

Keywords

Object detection, 3D human detection, Positional encoding, Data augmentation
