Danoliteracy of Generative Large Language Models
dc.contributor.author | Vejlgaard Holm, Søren | |
dc.contributor.author | Hansen, Lars Kai | |
dc.contributor.author | Nielsen, Martin Carsten | |
dc.contributor.editor | Johansson, Richard | |
dc.contributor.editor | Stymne, Sara | |
dc.coverage.spatial | Tallinn, Estonia | |
dc.date.accessioned | 2025-02-19T09:09:22Z | |
dc.date.available | 2025-02-19T09:09:22Z | |
dc.date.issued | 2025-03 | |
dc.description.abstract | The language technology moonshot moment of Generative Large Language Models (GLLMs) was not limited to English: These models brought a surge of technological applications, investments, and hype to low-resource languages as well. However, the capabilities of these models in languages such as Danish were, until recently, difficult to verify beyond qualitative demonstrations due to a lack of applicable evaluation corpora. We present a GLLM benchmark to evaluate Danoliteracy, a measure of Danish language and cultural competency across eight diverse scenarios such as Danish citizenship tests and abstractive social media question answering. This limited-size benchmark was found to produce a robust ranking that correlates to human feedback at $\rho \sim 0.8$ with GPT-4 and Claude Opus models achieving the highest rankings. Analyzing these model results across scenarios, we find one strong underlying factor explaining $95\%$ of scenario performance variance for GLLMs in Danish, suggesting a $g$ factor of model consistency in language adaptation. | |
dc.identifier.uri | https://hdl.handle.net/10062/107271 | |
dc.language.iso | en | |
dc.publisher | University of Tartu Library | |
dc.relation.ispartofseries | NEALT Proceedings Series, No. 57 | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.title | Danoliteracy of Generative Large Language Models | |
dc.type | Article |