Training the Best Neural Machine Translation Model for the Estonian-English Language Pair

dc.contributor.advisor: Tättar, Andre, supervisor
dc.contributor.author: Kuningas, Kristiina
dc.contributor.other: Tartu Ülikool. Loodus- ja täppisteaduste valdkond [et]
dc.contributor.other: Tartu Ülikool. Arvutiteaduse instituut [et]
dc.date.accessioned: 2023-09-28T11:17:00Z
dc.date.available: 2023-09-28T11:17:00Z
dc.date.issued: 2021
dc.description.abstract: To date, many neural machine translation models have been developed that produce high-quality translations in many language directions, including Estonian-English. However, the models trained on this language pair are mostly multilingual or outdated and need improvement. This bachelor's thesis presents a bilingual approach that uses recent, effective techniques and the most current data available to improve on the previous best result for the Estonian-English language pair. It introduces a state-of-the-art bilingual neural machine translation system that outperforms the previous best result achieved for Estonian-English. The system proceeds in several steps: it trains baseline models on parallel data, generates additional synthetic data from available monolingual data using back-translation, combines the synthetic data with the initial parallel corpus, trains a new model on the augmented corpus, and finally uses ensembles of the already trained models. [et]
dc.identifier.uri: https://hdl.handle.net/10062/93213
dc.language.iso: eng [et]
dc.publisher: Tartu Ülikool [et]
dc.rights: openAccess [et]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: neural networks [et]
dc.subject: machine translation [et]
dc.subject: BLEU [et]
dc.subject: language technology [et]
dc.subject.other: bakalaureusetööd (bachelor's theses) [et]
dc.subject.other: informaatika [et]
dc.subject.other: infotehnoloogia [et]
dc.subject.other: informatics [et]
dc.subject.other: infotechnology [et]
dc.title: Training the Best Neural Machine Translation Model for the Estonian-English Language Pair [et]
dc.type: Thesis [et]

Files

Original bundle

Name:
Kuningas_Informaatika_2021.pdf
Size:
288.68 KB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission