Fine-Tuning Cross-Lingual LLMs for POS Tagging in Code-Switched Contexts

Absar, Shayaan

dc.contributor.author	Absar, Shayaan
dc.contributor.editor	Tudor, Crina Madalina
dc.contributor.editor	Debess, Iben Nyholm
dc.contributor.editor	Bruton, Micaella
dc.contributor.editor	Scalvini, Barbara
dc.contributor.editor	Ilinykh, Nikolai
dc.contributor.editor	Holdt, Špela Arhar
dc.coverage.spatial	Tallinn, Estonia
dc.date.accessioned	2025-02-14T09:41:38Z
dc.date.available	2025-02-14T09:41:38Z
dc.date.issued	2025-03
dc.description.abstract	Code-switching (CS), in which speakers switch between two or more languages during conversation, is a common phenomenon in bilingual communities. Most NLP research has been devoted to monolingual language modelling; consequently, most models perform poorly on code-switched data. This paper investigates the effectiveness of fine-tuned cross-lingual large language models on part-of-speech (POS) tagging in code-switched contexts. The models are trained on code-switched combinations of Indian languages and English. The paper also investigates whether the fine-tuned models can generalise to POS tagging code-switched language combinations that were not part of the fine-tuning dataset. Additionally, it presents a new metric, the S-index (Switching-Index), for measuring the level of code-switching within an utterance.
dc.identifier.uri	https://aclanthology.org/2025.resourceful-1.0/
dc.identifier.uri	https://hdl.handle.net/10062/107110
dc.language.iso	en
dc.publisher	University of Tartu Library
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title	Fine-Tuning Cross-Lingual LLMs for POS Tagging in Code-Switched Contexts
dc.type	Article

Files

Original bundle

Name: 2025_resourceful_1_2.pdf
Size: 775.28 KB
Format: Adobe Portable Document Format

Collections

Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)