Geospatial data harmonization and machine learning for large-scale water quality modelling

dc.contributor.advisor: Uuemaa, Evelyn, supervisor
dc.contributor.advisor: Kmoch, Alexander, supervisor
dc.contributor.author: Virro, Holger
dc.contributor.other [et]: Tartu Ülikool. Loodus- ja täppisteaduste valdkond
dc.date.accessioned: 2022-10-11T12:19:26Z
dc.date.available: 2022-10-11T12:19:26Z
dc.date.issued: 2022-10-11
dc.description [et]: The electronic version of the dissertation does not include the publications
dc.description.abstract [et]: Agricultural pollution continues to cause a worldwide decline in freshwater quality. Water quality modelling plays an important part in developing effective water management measures. Large-scale water quality modelling, however, requires input data with good spatial coverage. The aim of the thesis was to improve and harmonize the datasets needed for water quality modelling and to develop a machine learning framework that could be used for national-scale water quality modelling. One output of the thesis is the Estonian soil database EstSoil-EH. The EstSoil-EH attributes served as input to a machine learning model that I used to predict soil organic carbon content. The environmental conditions at the sampling locations turned out to affect the prediction accuracy of the model. To improve global water quality data, the Global River Water Quality Archive (GRQA) database was created from five datasets. Based on what was learned while building the soil carbon model, a framework for Estonia-wide water quality modelling was developed. The model predicted nutrient concentrations in 242 Estonian river catchments. The accuracy of the resulting models is comparable to that of models previously applied in the Baltic states. Model accuracy was affected by catchment size, as predictions were generally less accurate in smaller catchments. Fewer than half of the features were sufficient to achieve satisfactory accuracy, which shows that the descriptive power of features matters more than their number. The machine learning models are therefore applicable in regions where coverage of the input data needed for deriving features is limited.
dc.description.abstract [en]: Freshwater quality continues to deteriorate worldwide due to agricultural pollution. Water quality modeling can support more effective management of water resources. However, large-scale water quality models depend on input datasets with good spatial coverage. The aim of the thesis was to improve and harmonize datasets for water quality modeling and to create a machine learning framework for national-scale modeling. We created EstSoil-EH, a new numerical soil database for Estonia, by converting the text-based soil properties in the Estonian Soil Map to machine-readable values. We used it to predict soil organic carbon content with the random forest machine learning method and found that the conditions at sampling locations affected prediction accuracy. We improved the global coverage of water quality data by producing the Global River Water Quality Archive (GRQA), which was compiled from five existing large-scale datasets. The compilation involved harmonizing the corresponding metadata, flagging outliers, calculating time series characteristics and detecting duplicate observations. Based on lessons learnt from predicting soil carbon content, we developed a framework suitable for national-scale water quality modeling. We used 82 environmental variables, including soil properties from EstSoil-EH, as features to predict nutrient concentrations in 242 river catchments. The resulting models achieved accuracy comparable to that of models used previously in the Baltic region. Catchment size influenced accuracy: predictions were less accurate in smaller catchments. The models maintained reasonable accuracy even when the number of features was reduced by half, which shows that the relevance of features matters more than their number. This flexibility makes our models applicable in areas that lack the input data needed for extracting features.
dc.description.uri: https://www.ester.ee/record=b5520677
dc.identifier.isbn: 978-9916-27-033-2
dc.identifier.isbn: 978-9916-27-034-9 (pdf)
dc.identifier.issn: 1406-1295
dc.identifier.issn: 2806-2302 (pdf)
dc.identifier.uri: http://hdl.handle.net/10062/86022
dc.language.iso [et]: eng
dc.relation.ispartofseries: Dissertationes geographicae Universitatis Tartuensis;86
dc.rights [et]: openAccess
dc.rights [*]: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri [*]: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject [en]: spatial data
dc.subject [en]: automatic learning
dc.subject [en]: water quality
dc.subject [en]: environment simulation
dc.subject [en]: geographic information systems
dc.subject.other [et]: dissertatsioonid
dc.subject.other [et]: ETD
dc.subject.other [et]: dissertations
dc.subject.other [et]: väitekirjad
dc.subject.other [et]: ruumiandmed
dc.subject.other [et]: tehisõpe
dc.subject.other [et]: veekvaliteet
dc.subject.other [et]: keskkonna modelleerimine
dc.subject.other [et]: geoinfosüsteemid
dc.title [et]: Geospatial data harmonization and machine learning for large-scale water quality modelling
dc.title.alternative [et]: Ruumiandmete harmoniseerimine ja masinõpe veekvaliteedi modelleerimiseks
dc.type [et]: Thesis

Files

Original bundle

Name: virro_holger.pdf
Size: 21.54 MB
Format: Adobe Portable Document Format
