Human detection and distance estimation with monocular camera using YOLOv3 neural network

Sattar, Asif

Human detection and distance estimation with monocular camera using YOLOv3 neural network

dc.contributor.advisor	Kruusamäe, Karl, supervisor
dc.contributor.advisor	Tammeveski, Lauri, supervisor
dc.contributor.author	Sattar, Asif
dc.contributor.other	Tartu Ülikool. Loodus- ja täppisteaduste valdkond	et
dc.contributor.other	Tartu Ülikool. Tehnoloogiainstituut	et
dc.date.accessioned	2019-06-14T10:00:48Z
dc.date.available	2019-06-14T10:00:48Z
dc.date.issued	2019
dc.description	Inimestega vähemalt samal tasemel keskkonnast aru saamine masinate poolt oleks kasulik paljudes domeenides. Mitmed erinevad sensored aitavad selle ülesande juures, enim on kasutatud kaameraid. Objektide tuvastamine on tähtis osa keskkonnast aru saamisel. Selle täpsus on viimasel ajal palju paranenud tänu arenenud masinõppe meetoditele nimega konvolutsioonilised närvivõrgud (CNN), mida treenitakse kasutades märgendatud kaamerapilte. Monokulaarkaamerapilt sisaldab 2D infot, kuid ei sisalda sügavusinfot. Teisalt, sügavusinfo on tähtis näiteks isesõitvate autode domeenis. Inimeste ohutus tuleb tagada näiteks töötades autonoomsete masinate läheduses või kui jalakäija ületab teed autonoomse sõiduki eest. Antud töös uuritakse võimalust, kuidas tuvastada inimesi ning hinnata nende kaugusi samaaegselt, kasutades RGB kaamerat, eesmärgiga kasutada seda autonoomseks sõitmiseks maastikul. Selleks täiustatakse hetkel parimat objektide tuvastamise konvolutsioonilist närvivõrku YOLOv3 (ingl k. You Only Look Once). Selle töö väliselt on simulatsioonitarkvaradega AirSim ning Unreal Engine loodud lumine metsamaastik koos inimestega erinevates kehapoosides. YOLOv3 närvivõrgu treenimiseks võeti simulatsioonist välja vajalikud andmed, kasutades skripte. Lisaks muudeti närvivõrku, et lisaks inimese asukohta tuvastavale piirikastile väljastataks ka inimese kauguse ennustus. Antud töö tulemuseks on mudel, mille ruutkesmine viga RMSE (ingl k. Root Mean Square Error) on 2.99m objektidele kuni 50m kaugusel, säilitades samaaegselt originaalse närvivõrgu inimeste tuvastamise täpsuse. Võrreldavate meetodite RMSE veaks leiti 4.26m (teist andmestikku kasutades) ja 4.79m (selles töös kasutatud andmestikul), mis vastavalt kasutavad kahte eraldiseisvat närvivõrku ning LASSO meetodit. See näitab suurt parenemist võrreldes teiste meetoditega. Edasisteks eesmärkideks on meetodi treenimine ning testimine päris maailmast kogutud andmetega, et näha, kas see üldistub ka sellistele keskkondadele.	et
dc.description.abstract	Making machines perceive environment better or at least as well as humans would be beneficial in lots of domains. Different sensors aid in this, most widely used of which is monocular camera. Object detection is a major part of environment perception and its accuracy has greatly improved in the last few years thanks to advanced machine learning methods called convolutional neural networks (CNN) that are trained on many labelled images. Monocular camera image contains two dimensional information, but contains no depth information of the scene. On the other hand, depth information of objects is important in a lot of areas related to autonomous driving, e.g. working next to an automated machine, pedestrian crossing a road in front of an autonomous vehicle, etc. This thesis presents an approach to detect humans and to predict their distance from RGB camera for off-road autonomous driving. This is done by improving YOLO (You Only Look Once) v3[1], a state-of-the-art object detection CNN. Outside of this thesis, an off-road scene depicting a snowy forest with humans in different body poses was simulated using AirSim and Unreal Engine. Data for training YOLOv3 neural network was extracted from there using custom scripts. Also, network was modified to not only predict humans and their bounding boxes, but also their distance from camera. RMSE of 2.99m for objects with distances up to 50m was achieved, while maintaining similar detection accuracy to the original network. Comparable methods using two neural networks and a LASSO model gave 4.26m (in an alternative dataset) and 4.79m (with dataset used is this work) RMSE respectively, showing a huge improvement over the baselines. Future work includes experiments with real-world data to see if the proposed approach generalizes to other environments.	en
dc.identifier.uri	http://hdl.handle.net/10062/64352
dc.language.iso	eng	et
dc.publisher	Tartu Ülikool	et
dc.rights	openAccess	et
dc.rights	Autorile viitamine + Mitteäriline eesmärk + Tuletatud teoste keeld 3.0 Eesti	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/ee/	*
dc.subject	monocular camera	en
dc.subject	object detection	en
dc.subject	depth estimation	en
dc.subject	CNN	en
dc.subject	off-road	en
dc.subject	YOLOv3	en
dc.subject	simulation	en
dc.subject	kaamera	et
dc.subject	objektide tuvastamine	et
dc.subject	sügavusinfo ennustamine	et
dc.subject	kovolutsioonilised tehisnärvivõrgud	et
dc.subject	simulatsioon	et
dc.subject.other	magistritööd	et
dc.title	Human detection and distance estimation with monocular camera using YOLOv3 neural network	en
dc.title.alternative	Inimeste tuvastamine ning kauguse hindamine kasutades kaamerat ning YOLOv3 tehisnärvivõrku	et
dc.type	Thesis	et

Files

Original bundle

Now showing 1 - 1 of 1

Name:: Sattar_MSc2019.pdf
Size:: 2.2 MB
Format:: Adobe Portable Document Format
Description:

Download

License bundle

Now showing 1 - 1 of 1

Name:: license.txt
Size:: 1.71 KB
Format:: Item-specific license agreed upon to submission
Description:

Download

Collections

Robotics and Computer Engineering - Master's theses