Automated cognitive distortion detection and classification of Reddit posts using machine learning
dc.contributor.advisor | Sirts, Kairit, supervisor | |
dc.contributor.author | Sochynskyi, Stanislav | |
dc.contributor.other | Tartu Ülikool. Loodus- ja täppisteaduste valdkond | et |
dc.contributor.other | Tartu Ülikool. Arvutiteaduse instituut | et |
dc.date.accessioned | 2023-09-26T09:33:17Z | |
dc.date.available | 2023-09-26T09:33:17Z | |
dc.date.issued | 2021 | |
dc.description.abstract | A vicious circle of exaggerated thinking patterns, also known as cognitive distortions, can lead a person to anxiety and major depression. Automatic detection and classification of cognitive distortions can benefit initial mental health screening, make better use of counselling time, and improve the accessibility of mental healthcare services. In this work, we apply logistic regression, Support Vector Machine (SVM), and fasttext classifiers to identify cognitive distortions in real-world data from Reddit. For binary classification, the best F-score of 0.71 was achieved with the fasttext classifier. For the multiclass classification task, the best F-score of 0.23 was achieved with an SVM using tf-idf vectorisation. However, the metrics of some classes do not exceed the random chance baseline. A possible explanation is that the created dataset is sufficient to build a binary classifier, but more accurate models require more data to distinguish a larger number of classes. Additionally, we experimented with unsupervised clustering and topic modelling algorithms and did not find evidence that unsupervised methods could extract the patterns of cognitive distortions from text. We developed an annotation guideline for manual annotation of cognitive distortions and applied it to annotate 2021 Reddit posts. We achieved a kappa score of 0.569 for the binary case and 0.424 for the multiclass case, indicating moderate agreement between annotators. A higher number of classes leads to poorer annotation agreement, mainly due to overlapping definitions of cognitive distortions. Consequently, automated methods cannot be expected to show high results in cognitive distortion classification. | et |
dc.identifier.uri | https://hdl.handle.net/10062/93136 | |
dc.language.iso | eng | et |
dc.publisher | Tartu Ülikool | et |
dc.rights | openAccess | et |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Machine learning | et |
dc.subject | mental health | et |
dc.subject | natural language processing | et |
dc.subject | cognitive distortions | et |
dc.subject | data annotation | et |
dc.subject.other | magistritööd | et |
dc.subject.other | informaatika | et |
dc.subject.other | infotehnoloogia | et |
dc.subject.other | informatics | et |
dc.subject.other | infotechnology | et |
dc.title | Automated cognitive distortion detection and classification of Reddit posts using machine learning | et |
dc.type | Thesis | et |
Files
Original bundle
- Name:
- Sochynskyi_InnovationAndTechnologyManagement_2021.pdf
- Size:
- 3.09 MB
- Format:
- Adobe Portable Document Format
- Description:
License bundle
- Name:
- license.txt
- Size:
- 1.71 KB
- Format:
- Item-specific license agreed upon to submission
- Description: