Real-time 3D Object Detection on Point Clouds

Ozipek, Enes

Files

ozipek_computerscience_2020.pdf (13.67 MB)

Date

2020

Authors

Ozipek, Enes

Publisher

Tartu Ülikool

Abstract

Demand for precise and fast object detection frameworks has grown as the autonomous vehicle industry has attracted more attention. While progress in 2D object detection with state-of-the-art approaches such as convolutional neural networks is promising, the same level of performance has not yet been reached on 3D modalities such as lidar point clouds. The main reasons are that point clouds are sparse and three-dimensional, whereas state-of-the-art 2D object detection models work on camera images. Some early works tried to ease these challenges using either 3D convolutional neural networks or bird’s-eye-view approaches; nevertheless, they did not achieve the desired level of performance in 3D perception.

PointPillars is a recent model that runs fast with good accuracy on point clouds. Its main advantage comes from the way it uses PointNet to encode the points in each pillar into spatial features: it divides the whole point cloud into a grid of vertical pillars and applies a state-of-the-art 2D detection network to the resulting top-down view, in which the spatial features are encoded. Although this operation lets the network keep the positional information of the points within each pillar, it does not take into account the point densities in different parts of the point cloud.

This thesis aims to improve the PointPillars network by utilizing positional encoding and extending the detection area. Positional encoding helps the network use positional features by introducing two additional input channels before each convolutional and deconvolutional layer. Different positional encoding schemes are also compared to gain more insight into the effectiveness of the introduced positional channels. In addition, the thesis presents a simple scheme for training a 360-degree model with ground truths provided only for the camera field of view (FOV).

The positional encoding scheme provides better accuracy at a speed similar to that of the original network. On the other hand, although a 360-degree model is supposedly the type of model that should be used with lidar, experiments show that it outputs many false positives (FPs).
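
The abstract says that positional encoding adds two input channels before each convolutional and deconvolutional layer, but it does not specify how those channels are built. The sketch below shows one plausible reading, assuming the two channels hold the normalized x and y grid coordinates of each cell in the top-down feature map (in the spirit of CoordConv). The function name `add_positional_channels`, the `[-1, 1]` normalization, and the `(C, H, W)` layout are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np


def add_positional_channels(features: np.ndarray) -> np.ndarray:
    """Append two coordinate channels to a (C, H, W) feature map.

    Assumption: the channels are the x and y cell coordinates, each
    scaled linearly to [-1, 1]. The thesis may use a different scheme.
    """
    _, height, width = features.shape
    ys = np.linspace(-1.0, 1.0, height, dtype=features.dtype)
    xs = np.linspace(-1.0, 1.0, width, dtype=features.dtype)
    # With indexing="ij", both outputs have shape (H, W):
    # y_channel varies along rows, x_channel along columns.
    y_channel, x_channel = np.meshgrid(ys, xs, indexing="ij")
    return np.concatenate(
        [features, x_channel[None], y_channel[None]], axis=0
    )


# Hypothetical top-down pseudo-image with 64 feature channels.
pseudo_image = np.zeros((64, 4, 6), dtype=np.float32)
augmented = add_positional_channels(pseudo_image)
print(augmented.shape)
```

Under this reading, a layer that previously received C input channels receives C + 2, so each convolution can condition on where in the grid a feature is located.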

Keywords

Object detection, 3D human detection, Positional encoding, Data augmentation

URI

https://hdl.handle.net/10062/94000

Collections

MTAT magistritööd – Master's theses
