Second language Korean Universal Dependency treebank v1.2: Focus on Data Augmentation and Annotation Scheme Refinement

dc.contributor.authorSung, Hakyung
dc.contributor.authorShin, Gyu-Ho
dc.contributor.editorTudor, Crina Madalina
dc.contributor.editorDebess, Iben Nyholm
dc.contributor.editorBruton, Micaella
dc.contributor.editorScalvini, Barbara
dc.contributor.editorIlinykh, Nikolai
dc.contributor.editorHoldt, Špela Arhar
dc.coverage.spatialTallinn, Estonia
dc.date.accessioned2025-02-14T09:45:33Z
dc.date.available2025-02-14T09:45:33Z
dc.date.issued2025-03
dc.description.abstractWe expand the second language (L2) Korean Universal Dependencies (UD) treebank with 5,454 manually annotated sentences. The annotation guidelines are also revised to better align with the UD framework. Using this enhanced treebank, we fine-tune three Korean language models—Stanza, spaCy, and Trankit—and evaluate their performance on in-domain and out-of-domain L2-Korean datasets. The results show that fine-tuning significantly improves their performance across various metrics, thus highlighting the importance of using well-tailored L2 datasets for fine-tuning first-language-based, general-purpose language models for the morphosyntactic analysis of L2 data.
dc.identifier.urihttps://aclanthology.org/2025.resourceful-1.0/
dc.identifier.urihttps://hdl.handle.net/10062/107111
dc.language.isoen
dc.publisherUniversity of Tartu Library
dc.rightsAttribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.urihttps://creativecommons.org/licenses/by-nc-nd/4.0/
dc.titleSecond language Korean Universal Dependency treebank v1.2: Focus on Data Augmentation and Annotation Scheme Refinement
dc.typeArticle

Failid

Originaal pakett

Nüüd näidatakse 1 - 1 1
Laen...
Pisipilt
Nimi:
2025_resourceful_1_4.pdf
Suurus:
1.61 MB
Formaat:
Adobe Portable Document Format