Beyond a Means to an End: A Case Study in Building Phonotactic Corpora for Central Australian Languages

dc.contributor.author: Muradoglu, Saliha
dc.contributor.author: Gray, James
dc.contributor.author: Simpson, Jane Helen
dc.contributor.author: Proctor, Michael
dc.contributor.author: Harvey, Mark
dc.contributor.editor: Tudor, Crina Madalina
dc.contributor.editor: Debess, Iben Nyholm
dc.contributor.editor: Bruton, Micaella
dc.contributor.editor: Scalvini, Barbara
dc.contributor.editor: Ilinykh, Nikolai
dc.contributor.editor: Holdt, Špela Arhar
dc.coverage.spatial: Tallinn, Estonia
dc.date.accessioned: 2025-02-14T09:49:17Z
dc.date.available: 2025-02-14T09:49:17Z
dc.date.issued: 2025-03
dc.description.abstract: Linguistic datasets are essential across fields: computational linguists use them for NLP development, theoretical linguists for statistical arguments supporting hypotheses about language, and documentary linguists for preserving examples and aiding grammatical descriptions. Transforming raw data (e.g., recordings or dictionaries) into structured forms (e.g., tables) requires non-trivial decisions within processing pipelines. This paper highlights the importance of these processes in understanding linguistic systems. Our contributions include: (1) an interactive dashboard for four central Australian languages with custom filters, and (2) demonstrating how data processing decisions influence measured outcomes.
dc.identifier.uri: https://aclanthology.org/2025.resourceful-1.0/
dc.identifier.uri: https://hdl.handle.net/10062/107113
dc.language.iso: en
dc.publisher: University of Tartu Library
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Beyond a Means to an End: A Case Study in Building Phonotactic Corpora for Central Australian Languages
dc.type: Article

Files

Original bundle

Name: 2025_resourceful_1_7.pdf
Size: 1.37 MB
Format: Adobe Portable Document Format