Multi-label Scandinavian Language Identification (SLIDE)

dc.contributor.author: Fedorova, Mariia
dc.contributor.author: Frydenberg, Jonas Sebulon
dc.contributor.author: Handford, Victoria
dc.contributor.author: Langø, Victoria Ovedie Chruickshank
dc.contributor.author: Willoch, Solveig Helene
dc.contributor.author: Midtgaard, Marthe Løken
dc.contributor.author: Scherrer, Yves
dc.contributor.author: Mæhlum, Petter
dc.contributor.author: Samuel, David
dc.contributor.editor: Tudor, Crina Madalina
dc.contributor.editor: Debess, Iben Nyholm
dc.contributor.editor: Bruton, Micaella
dc.contributor.editor: Scalvini, Barbara
dc.contributor.editor: Ilinykh, Nikolai
dc.contributor.editor: Holdt, Špela Arhar
dc.coverage.spatial: Tallinn, Estonia
dc.date.accessioned: 2025-02-14T10:48:13Z
dc.date.available: 2025-02-14T10:48:13Z
dc.date.issued: 2025-03
dc.description.abstract: Identifying closely related languages at sentence level is difficult, in particular because it is often impossible to assign a sentence to a single language. In this paper, we focus on multi-label sentence-level Scandinavian language identification (LID) for Danish, Norwegian Bokmål, Norwegian Nynorsk, and Swedish. We present the Scandinavian Language Identification and Evaluation, SLIDE, a manually curated multi-label evaluation dataset and a suite of LID models with varying speed–accuracy tradeoffs. We demonstrate that the ability to identify multiple languages simultaneously is necessary for any accurate LID method, and present a novel approach to training such multi-label LID models.
dc.description.uri: https://aclanthology.org/2025.resourceful-1.0/
dc.identifier.uri: https://hdl.handle.net/10062/107130
dc.language.iso: en
dc.publisher: University of Tartu Library
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Multi-label Scandinavian Language Identification (SLIDE)
dc.type: Article

Files

Original bundle

Name: 2025_resourceful_1_33.pdf
Size: 245.49 KB
Format: Adobe Portable Document Format