From Words to Action: A National Initiative to Overcome Data Scarcity for the Slovene LLM

dc.contributor.author: Holdt, Špela Arhar
dc.contributor.author: Antloga, Špela
dc.contributor.author: Munda, Tina
dc.contributor.author: Pori, Eva
dc.contributor.author: Krek, Simon
dc.contributor.editor: Tudor, Crina Madalina
dc.contributor.editor: Debess, Iben Nyholm
dc.contributor.editor: Bruton, Micaella
dc.contributor.editor: Scalvini, Barbara
dc.contributor.editor: Ilinykh, Nikolai
dc.contributor.editor: Holdt, Špela Arhar
dc.coverage.spatial: Tallinn, Estonia
dc.date.accessioned: 2025-02-14T10:36:30Z
dc.date.available: 2025-02-14T10:36:30Z
dc.date.issued: 2025-03
dc.description.abstract: Large Language Models (LLMs) have demonstrated significant potential in natural language processing, but they depend on vast, diverse datasets, creating challenges for languages with limited resources. The paper presents a national initiative that addresses these challenges for Slovene. We outline strategies for large-scale text collection, including the creation of an online platform to engage the broader public in contributing texts and a communication campaign promoting openly accessible and transparently developed LLMs.
dc.description.uri: https://aclanthology.org/2025.resourceful-1.0/
dc.identifier.uri: https://hdl.handle.net/10062/107125
dc.language.iso: en
dc.publisher: University of Tartu Library
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: From Words to Action: A National Initiative to Overcome Data Scarcity for the Slovene LLM
dc.type: Article

Files

Original bundle

Name: 2025_resourceful_1_27.pdf
Size: 176.27 KB
Format: Adobe Portable Document Format