Danoliteracy of Generative Large Language Models

dc.contributor.author: Vejlgaard Holm, Søren
dc.contributor.author: Hansen, Lars Kai
dc.contributor.author: Nielsen, Martin Carsten
dc.contributor.editor: Johansson, Richard
dc.contributor.editor: Stymne, Sara
dc.coverage.spatial: Tallinn, Estonia
dc.date.accessioned: 2025-02-19T09:09:22Z
dc.date.available: 2025-02-19T09:09:22Z
dc.date.issued: 2025-03
dc.description.abstract: The language technology moonshot moment of Generative Large Language Models (GLLMs) was not limited to English: These models brought a surge of technological applications, investments, and hype to low-resource languages as well. However, the capabilities of these models in languages such as Danish were, until recently, difficult to verify beyond qualitative demonstrations due to a lack of applicable evaluation corpora. We present a GLLM benchmark to evaluate Danoliteracy, a measure of Danish language and cultural competency across eight diverse scenarios such as Danish citizenship tests and abstractive social media question answering. This limited-size benchmark was found to produce a robust ranking that correlates to human feedback at $\rho \sim 0.8$ with GPT-4 and Claude Opus models achieving the highest rankings. Analyzing these model results across scenarios, we find one strong underlying factor explaining $95\%$ of scenario performance variance for GLLMs in Danish, suggesting a $g$ factor of model consistency in language adaptation.
dc.identifier.uri: https://hdl.handle.net/10062/107271
dc.language.iso: en
dc.publisher: University of Tartu Library
dc.relation.ispartofseries: NEALT Proceedings Series, No. 57
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Danoliteracy of Generative Large Language Models
dc.type: Article

Files

Original bundle

Name: 2025_nodalida_1_78.pdf
Size: 621.51 KB
Format: Adobe Portable Document Format