Training the Best Neural Machine Translation Model for the Estonian-English Language Pair
dc.contributor.advisor | Tättar, Andre, supervisor | |
dc.contributor.author | Kuningas, Kristiina | |
dc.contributor.other | Tartu Ülikool. Loodus- ja täppisteaduste valdkond | et |
dc.contributor.other | Tartu Ülikool. Arvutiteaduse instituut | et |
dc.date.accessioned | 2023-09-28T11:17:00Z | |
dc.date.available | 2023-09-28T11:17:00Z | |
dc.date.issued | 2021 | |
dc.description.abstract | Many neural machine translation models have been developed that produce high-quality translations across many language directions, including Estonian-English. However, the models trained on this language pair are mostly multilingual or outdated and need improvement. This bachelor’s thesis presents a bilingual approach that uses recent, effective techniques and the most current data available to improve on the previous best result for the Estonian-English language pair. It introduces a state-of-the-art bilingual neural machine translation system that outperforms the previous best result achieved for Estonian-English. The system is built in several steps: it trains baseline models on parallel data, generates additional data from available monolingual data using back-translation, combines the synthetic data with the initial parallel corpus, trains a new model on the augmented corpus, and, in the final step, uses ensembles of the already trained models. | et |
dc.identifier.uri | https://hdl.handle.net/10062/93213 | |
dc.language.iso | eng | et |
dc.publisher | Tartu Ülikool | et |
dc.rights | openAccess | et |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | neural networks | et |
dc.subject | machine translation | et |
dc.subject | BLEU | et |
dc.subject | language technology | et |
dc.subject.other | bachelor's theses | et |
dc.subject.other | informaatika | et |
dc.subject.other | infotehnoloogia | et |
dc.subject.other | informatics | et |
dc.subject.other | infotechnology | et |
dc.title | Training the Best Neural Machine Translation Model for the Estonian-English Language Pair | et |
dc.type | Thesis | et |