The Danish Idiom Dataset: A collection of 1000 Danish idioms and fixed expressions

Date

2025-03

Publisher

University of Tartu Library

Abstract

Interpreting idiomatic expressions is a challenging task for language learners and large language models (LLMs) alike, because the meaning of an idiom cannot be deduced directly from its individual components and often reflects nuances specific to the language in question. This makes idiom interpretation an ideal task for assessing the linguistic proficiency of LLMs. To test how LLMs handle this task, we introduce a new dataset of 1000 Danish idiomatic expressions sourced from the Danish Dictionary DDO (ordnet.dk/ddo). The dataset is publicly available at sprogteknologi.dk. For each expression, the dataset includes the correct dictionary definition and three false definitions: a literal one, a figurative one, and a random one. In the paper, we also present three experiments that demonstrate diverse applications of the dataset and evaluate how well LLMs identify the correct meanings of idiomatic expressions.
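Because every expression comes with one correct and three false definitions, the dataset lends itself to a four-way multiple-choice evaluation. The Python sketch below shows one possible way to represent an entry and score a model's choices. The class name `IdiomEntry`, its field names, the seeded shuffling and the `accuracy` helper are hypothetical and are not the published schema or the paper's experimental setup. The example idiom is a real Danish expression, but its four definitions were invented for illustration and are not taken from the dataset.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class IdiomEntry:
    """One dataset row. Field names are hypothetical, not the published schema."""
    idiom: str
    correct: str           # the dictionary definition
    literal_false: str     # false definition based on a literal reading
    figurative_false: str  # false definition that is figurative but wrong
    random_false: str      # unrelated false definition

    def options(self, seed: int = 0) -> tuple[list[str], int]:
        """Return the four definitions in a seeded shuffle plus the index of the correct one."""
        choices = [self.correct, self.literal_false, self.figurative_false, self.random_false]
        random.Random(seed).shuffle(choices)
        return choices, choices.index(self.correct)


def accuracy(entries, predict, seed: int = 0) -> float:
    """Share of entries where predict(idiom, choices) returns the index of the correct definition."""
    if not entries:
        return 0.0
    hits = 0
    for entry in entries:
        choices, gold = entry.options(seed)
        if predict(entry.idiom, choices) == gold:
            hits += 1
    return hits / len(entries)


# Illustrative entry: the definitions are invented, not copied from the dataset.
example = IdiomEntry(
    idiom="slå to fluer med ét smæk",
    correct="achieve two goals with a single action",
    literal_false="kill two flies with one swat",
    figurative_false="be extraordinarily lucky",
    random_false="arrive late to a meeting",
)

# An oracle "model" that always finds the correct definition scores 1.0.
oracle = lambda idiom, choices: choices.index(example.correct)
print(accuracy([example], oracle))
```

Seeding the shuffle keeps the option order reproducible across runs, so that different models are scored on identical prompts. In practice, `predict` would wrap a call to the LLM under evaluation.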