Real-time 3D Object Detection on Point Clouds

dc.contributor.advisor: Matiisen, Tambet, supervisor
dc.contributor.author: Ozipek, Enes
dc.contributor.other: University of Tartu, Faculty of Science and Technology
dc.contributor.other: University of Tartu, Institute of Computer Science
dc.date.accessioned: 2023-11-02T14:17:36Z
dc.date.available: 2023-11-02T14:17:36Z
dc.date.issued: 2020
dc.description.abstract: The demand for precise and fast object detection frameworks has increased as the autonomous vehicle industry has attracted more attention. While the progress made so far on the 2D object detection task with state-of-the-art approaches such as convolutional neural networks seems promising, the same level of performance has yet to be reached on 3D modalities such as lidar point clouds. The main reasons are that a point cloud is sparse and three-dimensional, whereas state-of-the-art 2D object detection models work on camera images. Some early works tried to ease these challenges using either 3D convolutional neural networks or bird's-eye-view approaches; nevertheless, they were not able to achieve the desired level of performance in 3D perception. PointPillars is one of the recent models that runs fast with good accuracy on point clouds. Its main advantage arises from the way it encodes the points in pillars into spatial features using PointNet: it divides the whole point cloud into a grid of vertical pillars and applies a state-of-the-art 2D detection network to this top-down view in which the spatial features are encoded. Even though this operation lets the network keep the positional information of the points within each pillar, it does not take into account the point densities in different parts of the point cloud. This thesis aims to improve the PointPillars network by utilizing positional encoding and by extending the detection area. Positional encoding helps the network exploit positional features by introducing two additional input channels before each convolutional and deconvolutional layer, and different positional encoding schemes are compared to gain more insight into the effectiveness of the introduced channels. Moreover, the thesis presents a simple scheme to train a 360-degree model with ground truths provided only for the camera field of view (FOV). The positional encoding scheme provides better accuracy at a speed similar to that of the original network. On the other hand, even though a 360-degree model is arguably the kind of model that should be used with lidar, in experiments it is observed to output many false positives (FPs).
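The positional encoding described in the abstract adds coordinate information as extra input channels, in the spirit of CoordConv. Below is a minimal PyTorch sketch of this idea, assuming two (x, y) channels normalized to [-1, 1] and concatenated before a convolution; the module name, channel counts, and feature-map size are illustrative assumptions, not taken from the thesis.

    import torch
    import torch.nn as nn

    class PositionalEncoding2d(nn.Module):
        """Concatenate two channels of normalized (x, y) coordinates
        onto a feature map (illustrative CoordConv-style sketch)."""

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            n, _, h, w = x.shape
            # Coordinates normalized to [-1, 1] across the feature map.
            ys = torch.linspace(-1.0, 1.0, h, device=x.device)
            xs = torch.linspace(-1.0, 1.0, w, device=x.device)
            grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
            coords = torch.stack([grid_x, grid_y])               # (2, H, W)
            coords = coords.unsqueeze(0).expand(n, -1, -1, -1)   # (N, 2, H, W)
            return torch.cat([x, coords], dim=1)                 # (N, C+2, H, W)

    # Usage: the convolution must accept the two extra input channels.
    pos = PositionalEncoding2d()
    conv = nn.Conv2d(64 + 2, 64, kernel_size=3, padding=1)
    features = torch.randn(1, 64, 248, 216)  # hypothetical pseudo-image from the pillar encoder
    out = conv(pos(features))                # -> (1, 64, 248, 216)

In the modified network, a step like this would precede each convolutional and deconvolutional layer of the 2D backbone, so that every layer can condition on where in the bird's-eye-view grid a feature lies.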
dc.identifier.uri: https://hdl.handle.net/10062/94000
dc.language.iso: eng
dc.publisher: University of Tartu
dc.rights: openAccess
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Object detection
dc.subject: 3D human detection
dc.subject: Positional encoding
dc.subject: Data augmentation
dc.subject.other: master's theses
dc.subject.other: informatics
dc.subject.other: information technology
dc.title: Real-time 3D Object Detection on Point Clouds
dc.type: Thesis

Files

Original bundle

Name: ozipek_computerscience_2020.pdf
Size: 13.67 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission