Mapping Faroese in the Multilingual Representation Space: Insights for ASR Model Optimization

dc.contributor.author: Lág, Dávid í
dc.contributor.author: Scalvini, Barbara
dc.contributor.author: Gudnason, Jon
dc.contributor.editor: Johansson, Richard
dc.contributor.editor: Stymne, Sara
dc.coverage.spatial: Tallinn, Estonia
dc.date.accessioned: 2025-02-18T09:37:59Z
dc.date.available: 2025-02-18T09:37:59Z
dc.date.issued: 2025-03
dc.description.abstract: ASR development for low-resource languages like Faroese faces significant challenges due to the scarcity of large, diverse datasets. While fine-tuning multilingual models using related languages is a common practice, there is no standardized method for selecting these auxiliary languages, leading to a computationally expensive trial-and-error process. By analyzing Faroese's positioning among other languages in wav2vec2's multilingual representation space, we find that Faroese's closest neighbors are influenced not only by linguistic similarity but also by historical, phonetic, and cultural factors. These findings open new avenues for auxiliary language selection to improve Faroese ASR and underscore the potential value of data-driven factors in ASR fine-tuning.
dc.identifier.uri: https://hdl.handle.net/10062/107229
dc.language.iso: en
dc.publisher: University of Tartu Library
dc.relation.ispartofseries: NEALT Proceedings Series, No. 57
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Mapping Faroese in the Multilingual Representation Space: Insights for ASR Model Optimization
dc.type: Article

Files

Original bundle

Name: 2025_nodalida_1_38.pdf
Size: 237.5 KB
Format: Adobe Portable Document Format