The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective

dc.contributor.author: Rosa, Javier de la
dc.contributor.author: Mikhailov, Vladislav
dc.contributor.author: Zhang, Lemei
dc.contributor.author: Wetjen, Freddy
dc.contributor.author: Samuel, David
dc.contributor.author: Liu, Peng
dc.contributor.author: Braaten, Rolv-Arild
dc.contributor.author: Mæhlum, Petter
dc.contributor.author: Birkenes, Magnus Breder
dc.contributor.author: Kutuzov, Andrey
dc.contributor.author: Enstad, Tita
dc.contributor.author: Farsethås, Hans Christian
dc.contributor.author: Brygfjeld, Svein Arne
dc.contributor.author: Gulla, Jon Atle
dc.contributor.author: Oepen, Stephan
dc.contributor.author: Velldal, Erik
dc.contributor.author: Østgulen, Wilfred
dc.contributor.author: Øvrelid, Lilja
dc.contributor.author: Myhre, Aslak Sira
dc.contributor.editor: Johansson, Richard
dc.contributor.editor: Stymne, Sara
dc.coverage.spatial: Tallinn, Estonia
dc.date.accessioned: 2025-02-18T14:41:50Z
dc.date.available: 2025-02-18T14:41:50Z
dc.date.issued: 2025-03
dc.description.abstract: The use of copyrighted materials in training language models raises critical legal and ethical questions. This paper presents a framework for empirically assessing the impact of publisher-controlled copyrighted corpora on the performance of generative large language models (LLMs) for Norwegian, and reports the results of that assessment. Evaluating on a diverse set of tasks, we found that adding both books and newspapers to the data mixture of LLMs tends to improve their performance, while adding fiction works seems to be detrimental. Our experiments could inform the creation of a compensation scheme for authors whose works contribute to AI development.
dc.identifier.uri: https://hdl.handle.net/10062/107251
dc.language.iso: en
dc.publisher: University of Tartu Library
dc.relation.ispartofseries: NEALT Proceedings Series, No. 57
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective
dc.type: Article

Files

Original bundle

Name: 2025_nodalida_1_59.pdf
Size: 433.76 KB
Format: Adobe Portable Document Format