Interpretable Machine Learning for Societal Language Identification: Modeling English and German Influences on Portuguese Heritage Language

dc.contributor.author: Akef, Soroosh
dc.contributor.author: Meurers, Detmar
dc.contributor.author: Mendes, Amália
dc.contributor.author: Rebuschat, Patrick
dc.contributor.editor: Muñoz Sánchez, Ricardo
dc.contributor.editor: Alfter, David
dc.contributor.editor: Volodina, Elena
dc.contributor.editor: Kallas, Jelena
dc.coverage.spatial: Tallinn, Estonia
dc.date.accessioned: 2025-02-17T10:40:02Z
dc.date.available: 2025-02-17T10:40:02Z
dc.date.issued: 2025-03
dc.description.abstract: This study leverages interpretable machine learning to investigate how different societal languages (SLs) influence the written production of Portuguese heritage language (HL) learners. Using a corpus of learner texts from adolescents in Germany and the UK, we systematically control for topic and proficiency level to isolate the cross-linguistic effects that each SL may exert on the HL. We automatically extract a wide range of linguistic complexity measures, including lexical, morphological, syntactic, discursive, and grammatical measures, and apply clustering-based undersampling to ensure balanced and representative data. Utilizing an explainable boosting machine, a class of inherently interpretable machine learning models, our approach identifies predictive patterns that discriminate between English- and German-influenced HL texts. The findings highlight distinct lexical and morphosyntactic patterns associated with each SL, with some patterns in the HL mirroring the structures of the SL. These results support the role of the SL in characterizing HL output. Beyond offering empirical evidence of cross-linguistic influence, this work demonstrates how interpretable machine learning can serve as an empirical test bed for language acquisition research.
dc.identifier.uri: https://hdl.handle.net/10062/107169
dc.language.iso: en
dc.publisher: University of Tartu Library
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: Interpretable Machine Learning for Societal Language Identification: Modeling English and German Influences on Portuguese Heritage Language
dc.type: Article

Files

Original bundle

Name: 2025_nlp4call_1_4.pdf
Size: 608.06 KB
Format: Adobe Portable Document Format