WikiQA-IS: Assisted Benchmark Generation and Automated Evaluation of Icelandic Cultural Knowledge in LLMs
dc.contributor.author | Arnardóttir, Þórunn | |
dc.contributor.author | Einarsson, Elías Bjartur | |
dc.contributor.author | Ingvarsson Juto, Garðar | |
dc.contributor.author | Helgason, Þorvaldur Páll | |
dc.contributor.author | Einarsson, Hafsteinn | |
dc.contributor.editor | Tudor, Crina Madalina | |
dc.contributor.editor | Debess, Iben Nyholm | |
dc.contributor.editor | Bruton, Micaella | |
dc.contributor.editor | Scalvini, Barbara | |
dc.contributor.editor | Ilinykh, Nikolai | |
dc.contributor.editor | Holdt, Špela Arhar | |
dc.coverage.spatial | Tallinn, Estonia | |
dc.date.accessioned | 2025-02-14T10:11:47Z | |
dc.date.available | 2025-02-14T10:11:47Z | |
dc.date.issued | 2025-03 | |
dc.description.abstract | This paper presents WikiQA-IS, a novel question-answering dataset focusing on Icelandic culture and history, along with an automated pipeline for dataset generation and evaluation. Leveraging GPT-4 to create questions and answers based on Icelandic Wikipedia articles and news sources, we produced a high-quality corpus of 2,000 question-answer pairs. We introduce an automatic evaluation method using GPT-4o as a judge, which shows strong agreement with human evaluations. Our benchmark reveals varying performances across different language models, with closed-source models generally outperforming open-weights alternatives. This work contributes a resource for evaluating language models' knowledge of Icelandic culture and offers a replicable framework for creating similar datasets in other cultural contexts. | |
dc.identifier.uri | https://aclanthology.org/2025.resourceful-1.0/ | |
dc.identifier.uri | https://hdl.handle.net/10062/107117 | |
dc.language.iso | en | |
dc.publisher | University of Tartu Library | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.title | WikiQA-IS: Assisted Benchmark Generation and Automated Evaluation of Icelandic Cultural Knowledge in LLMs | |
dc.type | Article | |
Files
Original bundle
- Name: 2025_resourceful_1_13.pdf
- Size: 725.64 KB
- Format: Adobe Portable Document Format