Error rate of automated part-of-speech tagging of Estonian academic learner English

dc.contributor.advisor: Klavan, Jane, supervisor
dc.contributor.author: Kaljuste, Karl August
dc.contributor.other: Tartu Ülikool. Humanitaarteaduste ja kunstide valdkond (University of Tartu, Faculty of Arts and Humanities)
dc.contributor.other: Tartu Ülikool. Anglistika osakond (University of Tartu, Department of English Studies)
dc.contributor.other: Tartu Ülikool. Maailma keelte ja kultuuride kolledž (University of Tartu, College of Foreign Languages and Cultures)
dc.date.accessioned: 2021-09-22T07:23:39Z
dc.date.available: 2021-09-22T07:23:39Z
dc.date.issued: 2021
dc.description.abstract: Corpora are a valuable tool for linguistic research and for improving learner language. At present, there is the Tartu Corpus of Estonian Learner English (TCELE); however, it is small and lacks academic learner English. Building a corpus of Estonian academic learner English (EALE) could fill this gap in TCELE and provide worthwhile information for students, teachers and researchers alike. Modern corpora include various types of annotation, the most common of which is tagging each word for its part of speech (POS), but manual tagging is an overwhelmingly long and difficult task. Automated taggers can make this process relatively fast and easy. However, while automated tagger performance has been evaluated on both native and learner writing, there is a lack of research on how automated taggers perform on academic learner writing. This paper aims to study the accuracy of automated POS tagging of EALE. To achieve this, a corpus of EALE was built and tagged using the Natural Language Toolkit (NLTK) POS tagger, and the results were compared against a sample of manually added tags.
dc.description.uri: https://www.ester.ee/record=b5460798*est
dc.identifier.uri: http://hdl.handle.net/10062/74209
dc.language.iso: eng
dc.publisher: Tartu Ülikool (University of Tartu)
dc.rights: openAccess
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: academic learner language
dc.subject: tagging
dc.subject.other: bachelor's theses
dc.subject.other: English language
dc.subject.other: corpora (linguistics)
dc.subject.other: linguistics
dc.subject.other: grammar
dc.subject.other: parts of speech
dc.subject.other: corpus linguistics
dc.title: Error rate of automated part-of-speech tagging of Estonian academic learner English
dc.type: Thesis
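The abstract describes tagging the corpus with the NLTK POS tagger and comparing the output against a manually tagged sample. It does not say how the comparison was scored. What follows is a minimal sketch of one common way to do it, under stated assumptions:

- The automatic and manual tags are aligned token by token as (token, tag) pairs. This is the list-of-tuples format `nltk.pos_tag` returns, with Penn Treebank tags by default.
- The error rate is the share of tokens whose automatic tag differs from the manual one.

The function names and the four-token toy sentence are hypothetical illustrations. They are not the thesis's code or data.

```python
from collections import Counter


def tagging_error_rate(auto_tagged, gold_tagged):
    """Share of tokens whose automatic tag differs from the manual (gold) tag.

    Assumes both arguments are lists of (token, tag) pairs aligned token by token.
    """
    if len(auto_tagged) != len(gold_tagged):
        raise ValueError("tagged sequences must have the same length")
    if not gold_tagged:
        raise ValueError("cannot compute an error rate over zero tokens")
    errors = 0
    for (auto_tok, auto_tag), (gold_tok, gold_tag) in zip(auto_tagged, gold_tagged):
        # Guard against misaligned samples: the same token must sit at each position.
        if auto_tok != gold_tok:
            raise ValueError(f"token mismatch: {auto_tok!r} vs {gold_tok!r}")
        if auto_tag != gold_tag:
            errors += 1
    return errors / len(gold_tagged)


def tag_confusions(auto_tagged, gold_tagged):
    """Count (gold_tag, auto_tag) pairs for every token the tagger got wrong."""
    return Counter(
        (gold_tag, auto_tag)
        for (_, auto_tag), (_, gold_tag) in zip(auto_tagged, gold_tagged)
        if auto_tag != gold_tag
    )


# Hypothetical toy data, not from the thesis: automatic output in Penn Treebank
# tags and a manual annotation of the same four tokens.
auto = [("The", "DT"), ("results", "NNS"), ("suggests", "NNS"), ("otherwise", "RB")]
gold = [("The", "DT"), ("results", "NNS"), ("suggests", "VBZ"), ("otherwise", "RB")]

print(f"error rate: {tagging_error_rate(auto, gold):.2%}")  # error rate: 25.00%
print(tag_confusions(auto, gold))  # Counter({('VBZ', 'NNS'): 1})
```

Accuracy is simply 1 minus the error rate. The confusion counts add a view of which tag pairs the tagger mixes up, for example a verb form tagged as a plural noun. That kind of pattern is often more informative for learner writing than a single overall figure.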

Files

Original bundle

Name: Kaljuste_BA_2021.pdf
Size: 412.91 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.67 KB
Format: Item-specific license agreed upon to submission