A Collection of Question Answering Datasets for Norwegian

dc.contributor.author: Mikhailov, Vladislav
dc.contributor.author: Mæhlum, Petter
dc.contributor.author: Langø, Victoria Ovedie Chruickshank
dc.contributor.author: Velldal, Erik
dc.contributor.author: Øvrelid, Lilja
dc.contributor.editor: Johansson, Richard
dc.contributor.editor: Stymne, Sara
dc.coverage.spatial: Tallinn, Estonia
dc.date.accessioned: 2025-02-18T13:48:23Z
dc.date.available: 2025-02-18T13:48:23Z
dc.date.issued: 2025-03
dc.description.abstract: This paper introduces a new suite of question answering datasets for Norwegian: NorOpenBookQA, NorCommonSenseQA, NorTruthfulQA, and NRK-Quiz-QA. The data covers a wide range of skills and knowledge domains, including world knowledge, commonsense reasoning, truthfulness, and knowledge about Norway. Covering both of the written standards of Norwegian – Bokmål and Nynorsk – our datasets comprise over 10k question-answer pairs, created by native speakers. We detail our dataset creation approach and present the results of evaluating 11 language models (LMs) in zero- and few-shot regimes. Most LMs perform better in Bokmål than Nynorsk, struggle most with commonsense reasoning, and are often untruthful in generating answers to questions. All our datasets and annotation materials are publicly available.
dc.identifier.uri: https://hdl.handle.net/10062/107235
dc.language.iso: en
dc.publisher: University of Tartu Library
dc.relation.ispartofseries: NEALT Proceedings Series, No. 57
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: A Collection of Question Answering Datasets for Norwegian
dc.type: Article

Files

Original bundle

Name: 2025_nodalida_1_43.pdf
Size: 191.48 KB
Format: Adobe Portable Document Format