Metaphor Identification for Estonian
Date
2021
Publisher
Tartu Ülikool
Abstract
Metaphors are a common feature of written and spoken language. Humans identify
and interpret metaphors relatively easily, but machines struggle to match this
capability. Much research on metaphors has been carried out in recent decades,
but mainly for English, using approaches ranging from rule-based to deep
learning-based systems. At the time of writing, no research on computational
metaphor processing had been done for Estonian. This thesis applies methods
from the field of computational metaphor processing specifically to Estonian.
All of the implemented methods are unsupervised or semi-supervised, because no
metaphor resources exist for Estonian.
This thesis also explores incorporating contextualized embeddings from the BERT
language model into metaphor identification systems to improve their performance.
To evaluate the methods, a new evaluation dataset for Estonian was created. The
dataset contains 500 sentences: 232 contain a VERB-NOUN phrase in which the verb
is used metaphorically, and 268 contain one in which the verb is used literally.
The best results were obtained using BERT embeddings together with information
from the Estonian WordNet.
Keywords
Metaphors, clustering, natural language processing, unsupervised learning, semi-supervised learning, metaphor identification, BERT