Danoliteracy of Generative Large Language Models

Vejlgaard Holm, Søren; Hansen, Lars Kai; Nielsen, Martin Carsten

dc.contributor.author	Vejlgaard Holm, Søren
dc.contributor.author	Hansen, Lars Kai
dc.contributor.author	Nielsen, Martin Carsten
dc.contributor.editor	Johansson, Richard
dc.contributor.editor	Stymne, Sara
dc.coverage.spatial	Tallinn, Estonia
dc.date.accessioned	2025-02-19T09:09:22Z
dc.date.available	2025-02-19T09:09:22Z
dc.date.issued	2025-03
dc.description.abstract	The language technology moonshot moment of Generative Large Language Models (GLLMs) was not limited to English: these models brought a surge of technological applications, investments, and hype to low-resource languages as well. However, the capabilities of these models in languages such as Danish were, until recently, difficult to verify beyond qualitative demonstrations due to a lack of applicable evaluation corpora. We present a GLLM benchmark to evaluate Danoliteracy, a measure of Danish language and cultural competency across eight diverse scenarios such as Danish citizenship tests and abstractive social media question answering. This limited-size benchmark was found to produce a robust ranking that correlates with human feedback at $\rho \sim 0.8$, with GPT-4 and Claude Opus models achieving the highest rankings. Analyzing these model results across scenarios, we find one strong underlying factor explaining $95\%$ of scenario performance variance for GLLMs in Danish, suggesting a $g$ factor of model consistency in language adaptation.
dc.identifier.uri	https://hdl.handle.net/10062/107271
dc.language.iso	en
dc.publisher	University of Tartu Library
dc.relation.ispartofseries	NEALT Proceedings Series, No. 57
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title	Danoliteracy of Generative Large Language Models
dc.type	Article

Files

Original bundle

Name: 2025_nodalida_1_78.pdf
Size: 621.51 KB
Format: Adobe Portable Document Format

Collections

Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)