Efficient Use of Pre-trained NMT Models Through Mixing and Matching
Date
2023
Authors
Publisher
Tartu Ülikool
Abstract
With an increasing number of pre-trained language models and neural machine
translation (NMT) models becoming available, it is important to investigate how to use
them when training new models to avoid expensive training from scratch. This thesis
investigates how to effectively use pre-trained models, focusing on combining encoders
and decoders of different independent pre-trained NMT models as modules. This is not
directly possible since the intermediate representations of any two independent NMT
models are different and cannot be combined without modification. To overcome this, first, a dimension adapter is added when the encoder and decoder have different embedding dimensionalities; second, extra encoder layers are added after the pre-trained encoder to align the intermediate representations. As a proof of concept,
this thesis looks at many-to-Estonian translation and combines a massively multilingual
encoder and a high-quality language-specific decoder. The results show significant
improvements in both translation quality and speed for many-to-one translation over the
baseline multilingual model. Furthermore, the ability to rapidly train a high-quality NMT
system is successfully demonstrated with Estonian-Ukrainian and Ukrainian-Estonian
translation, achieving competitive results compared to previous works. More broadly,
the thesis demonstrates that sentence representations of two independent NMT models
can be made compatible without changing the pre-trained components and without degrading translation quality.
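The abstract describes the coupling mechanism only at a high level. Below is a minimal PyTorch sketch of that idea, not the implementation used in the thesis: a frozen pre-trained encoder and decoder are bridged by a trainable linear dimension adapter and a few extra encoder layers. The module interfaces, dimension names, attention head count, and number of alignment layers are all assumptions for illustration.

```python
# Minimal sketch (assumed PyTorch interfaces, not the thesis code):
# freeze a pre-trained encoder and decoder, then train only a linear
# dimension adapter plus a few extra encoder layers that align the
# encoder's representations with what the decoder expects.
import torch.nn as nn

class MixAndMatchNMT(nn.Module):
    def __init__(self, encoder, decoder, enc_dim, dec_dim, n_align_layers=2):
        super().__init__()
        self.encoder = encoder  # frozen massively multilingual encoder (assumed module)
        self.decoder = decoder  # frozen language-specific decoder (assumed module)
        for p in self.encoder.parameters():
            p.requires_grad = False
        for p in self.decoder.parameters():
            p.requires_grad = False

        # Dimension adapter: only needed when the two models use
        # different embedding dimensionalities.
        self.dim_adapter = (
            nn.Linear(enc_dim, dec_dim) if enc_dim != dec_dim else nn.Identity()
        )

        # Extra trainable encoder layers appended after the pre-trained
        # encoder to align the intermediate representations.
        # (nhead=8 is an assumption; dec_dim must be divisible by it.)
        align_layer = nn.TransformerEncoderLayer(
            d_model=dec_dim, nhead=8, batch_first=True
        )
        self.align_layers = nn.TransformerEncoder(align_layer, num_layers=n_align_layers)

    def forward(self, src_tokens, tgt_tokens):
        enc_states = self.encoder(src_tokens)  # (batch, src_len, enc_dim), assumed shape
        aligned = self.align_layers(self.dim_adapter(enc_states))
        # The decoder cross-attends to the aligned states; call signature assumed.
        return self.decoder(tgt_tokens, aligned)
```

Because only the adapter and alignment layers receive gradients, training such a system is far cheaper than training a full NMT model from scratch, which is the efficiency argument the abstract makes.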
Keywords
natural language processing, neural machine translation, machine translation, multilingual machine translation, artificial neural networks