Training the Best Neural Machine Translation Model for the Estonian-English Language Pair
Date
2021
Authors
Publisher
Tartu Ülikool
Abstract
Many neural machine translation models have been developed that produce
high-quality translations for many language directions, including
Estonian-English. However, the models trained on this language pair are mostly
multilingual or already outdated and need enhancing. This bachelor's thesis presents
a bilingual approach that uses recent, effective technologies and the most current data
available to improve on the previous best result for the Estonian-English language pair.
This paper introduces a state-of-the-art bilingual neural machine translation system
that outperforms the previous best result achieved for Estonian-English. The system
reaches this goal in several steps: it trains baseline models on parallel data,
generates additional data from available monolingual data using backtranslation, combines
the synthetic data with the initial parallel corpus, trains a new model on the augmented
corpus, and, in the final step, uses ensembles of the trained models.
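The data flow described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not name a toolkit, a translation direction for the synthetic data, or an ensembling method, so the function names (back_translate, augment, ensemble_average), the exact-duplicate filter, and the choice of probability averaging for the ensemble are all assumptions made for illustration. The reverse model is passed in as a plain callable so the sketch stays independent of any library.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    target_monolingual: Sequence[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Pair each genuine target sentence with a synthetic source sentence.

    reverse_translate is a hypothetical stand-in for a baseline model
    trained in the target-to-source direction.
    """
    return [(reverse_translate(t), t) for t in target_monolingual]


def augment(parallel: Sequence[Pair], synthetic: Sequence[Pair]) -> List[Pair]:
    """Concatenate genuine and synthetic pairs, dropping exact duplicates.

    The duplicate filter is an assumption, not a step named in the abstract.
    """
    seen = set()
    merged: List[Pair] = []
    for pair in list(parallel) + list(synthetic):
        if pair not in seen:
            seen.add(pair)
            merged.append(pair)
    return merged


def ensemble_average(distributions: Sequence[Sequence[float]]) -> List[float]:
    """Average the next-token probability distributions of several models.

    Averaging probabilities is one common way to ensemble; the abstract
    does not state which method the thesis uses.
    """
    n = len(distributions)
    return [sum(column) / n for column in zip(*distributions)]
```

In use, the output of augment would serve as the training corpus for the new model, and ensemble_average would be applied at each decoding step to the distributions produced by the individual trained models.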
Keywords
neural networks, machine translation, BLEU, language technology