Masinõppe mudelite hindamine väheste märgenditega andmetel (Evaluating machine learning models on data with few labels)

dc.contributor.advisor: Laur, Sven, supervisor
dc.contributor.author: Aun, Mart-Mihkel
dc.contributor.other: Tartu Ülikool. Loodus- ja täppisteaduste valdkond [et]
dc.contributor.other: Tartu Ülikool. Arvutiteaduse instituut [et]
dc.date.accessioned: 2024-10-07T11:13:33Z
dc.date.available: 2024-10-07T11:13:33Z
dc.date.issued: 2023
dc.description.abstract: Machine learning models for classification tasks are evaluated using quality measures such as accuracy, precision, and recall. These measures, or their estimates, are calculated from the true class labels of the data points and the classifications the method assigns to them. Finding the true class labels requires manual review. Quality measures are often estimated on a finite sample, so the resulting estimates contain errors. This thesis derives the sample size needed to keep the estimation error within a given bound at a given confidence level. Computing accuracy, precision, or recall by definition on a sample also requires determining the labels of all data points in the sample. If another method exists alongside the method being evaluated, it can be used in the evaluation of the new method. In this case, the manual labeling work can be reduced by examining how much better the new method is than the old one, instead of calculating the quality measures of the new method directly. The thesis explores techniques that reduce the number of data points that must be labeled to evaluate the quality measures of the two classification methods.
dc.identifier.uri: https://hdl.handle.net/10062/105225
dc.language.iso: et
dc.publisher: Tartu Ülikool [et]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Estonia [en]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/ee/
dc.subject: machine learning
dc.subject: classification
dc.subject: probability theory
dc.subject: statistics
dc.subject: accuracy
dc.subject: precision
dc.subject: recall
dc.subject.other: bakalaureusetööd [et]
dc.subject.other: informaatika [et]
dc.subject.other: infotehnoloogia [et]
dc.subject.other: informatics [en]
dc.subject.other: infotechnology [en]
dc.title: Masinõppe mudelite hindamine väheste märgenditega andmetel
dc.type: Thesis
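
The sample-size question raised in the abstract can be illustrated with a minimal sketch. It assumes accuracy is estimated on an i.i.d. labeled sample and uses Hoeffding's inequality, which gives n >= ln(2/delta) / (2 * epsilon^2). The thesis may derive its bound differently; the function name and parameters below are hypothetical and do not come from the thesis or its appendices.

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n for which Hoeffding's inequality guarantees
    P(|estimated accuracy - true accuracy| >= epsilon) <= delta.

    Assumption: each labeled data point is an independent draw and
    contributes a 0/1 correctness indicator.
    """
    if not 0 < epsilon < 1:
        raise ValueError("epsilon must lie in (0, 1)")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    # 2 * exp(-2 * n * epsilon^2) <= delta  <=>  n >= ln(2 / delta) / (2 * epsilon^2)
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


# Error at most 0.05 with 95% confidence (delta = 0.05).
n_needed = hoeffding_sample_size(epsilon=0.05, delta=0.05)
print(n_needed)  # 738
```

Halving the allowed error roughly quadruples the number of data points that must be labeled. For precision or recall, the same bound would apply to the number of predicted positives or actual positives in the sample, not to the full sample size.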

Files

Original bundle

Name: Aun_informaatika_2023.pdf
Size: 539.86 KB
Format: Adobe Portable Document Format

Name: lisad.zip
Size: 297.07 KB
Format: Compressed ZIP