From Words to Action: A National Initiative to Overcome Data Scarcity for the Slovene LLM

Kuupäev

2025-03

Ajakirja pealkiri

Ajakirja ISSN

Köite pealkiri

Kirjastaja

University of Tartu Library

Abstrakt

Large Language Models (LLMs) have demonstrated significant potential in natural language processing, but they depend on vast, diverse datasets, creating challenges for languages with limited resources. The paper presents a national initiative that addresses these challenges for Slovene. We outline strategies for large-scale text collection, including the creation of an online platform to engage the broader public in contributing texts and a communication campaign promoting openly accessible and transparently developed LLMs.

Kirjeldus

Märksõnad

Viide