SweClinEval: A Benchmark for Swedish Clinical Natural Language Processing

Vakili, Thomas; Hansson, Martin; Henriksson, Aron

SweClinEval: A Benchmark for Swedish Clinical Natural Language Processing

dc.contributor.author	Vakili, Thomas
dc.contributor.author	Hansson, Martin
dc.contributor.author	Henriksson, Aron
dc.contributor.editor	Johansson, Richard
dc.contributor.editor	Stymne, Sara
dc.coverage.spatial	Tallinn, Estonia
dc.date.accessioned	2025-02-19T09:06:17Z
dc.date.available	2025-02-19T09:06:17Z
dc.date.issued	2025-03
dc.description.abstract	The lack of benchmarks in certain domains and for certain languages makes it difficult to track progress regarding the state-of-the-art of NLP in those areas, potentially impeding progress in important, specialized domains. Here, we introduce the first Swedish benchmark for clinical NLP: _SweClinEval_. The first iteration of the benchmark consists of six clinical NLP tasks, encompassing both document-level classification and named entity recognition tasks, with real clinical data. We evaluate nine different encoder models, both Swedish and multilingual. The results show that domain-adapted models outperform generic models on sequence-level classification tasks, while certain larger generic models outperform the clinical models on named entity recognition tasks. We describe how the benchmark can be managed despite limited possibilities to share sensitive clinical data, and discuss plans for extending the benchmark in future iterations.
dc.identifier.uri	https://hdl.handle.net/10062/107269
dc.language.iso	en
dc.publisher	University of Tartu Library
dc.relation.ispartofseries	NEALT Proceedings Series, No. 57
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title	SweClinEval: A Benchmark for Swedish Clinical Natural Language Processing
dc.type	Article

Failid

Originaal pakett

Nüüd näidatakse 1 - 1 1

Nimi:: 2025_nodalida_1_76.pdf
Suurus:: 118.38 KB
Formaat:: Adobe Portable Document Format

Lae alla

Kollektsioonid

Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)