Predicting Cognitive Distortions from Reddit Posts by Using Supervised Machine Learning Methods
dc.contributor.advisor | Sirts, Kairit, supervisor | |
dc.contributor.author | Grents, Linda Katariina | |
dc.contributor.other | Tartu Ülikool. Loodus- ja täppisteaduste valdkond | et |
dc.contributor.other | Tartu Ülikool. Arvutiteaduse instituut | et |
dc.date.accessioned | 2023-08-25T06:14:18Z | |
dc.date.available | 2023-08-25T06:14:18Z | |
dc.date.issued | 2022 | |
dc.description.abstract | The importance of mental health has gained great attention in modern societies. People have become more open about discussing their thoughts publicly, especially online, and one platform they use for this is Reddit. The aim of this thesis is to predict cognitive distortions from texts retrieved from the Anxiety subreddit. Cognitive distortions are important to detect because they can have a negative impact on people’s lives. Predictions in this work are made using supervised machine learning methods, such as logistic regression, support vector machine, and fastText (also with pre-trained word vectors). In addition, inter-annotator agreement is assessed with Cohen’s Kappa and Krippendorff’s Alpha. The results show that predicting cognitive distortions from text is challenging, since the classifiers were not able to produce satisfactory results. This is consistent with related work, where predicting different types of distortions has not given very good results. It is assumed that it would be more reasonable to predict the existence of cognitive distortions in the text rather than their specific types, as this prediction shows better results. Predicting the existence of some distortion might be of more help to people suffering from anxiety or depression. It might also be useful to predict only the most prevalent distortions, as some distortions are probably more prevalent than others. It is important to note that a major constraint of this work is the dataset, which is relatively small and noisy. If there is a need to predict different types of cognitive distortions, a larger dataset of better quality is suggested. However, this remains a challenge in its own right in natural language processing and clinical psychology research. | et |
dc.identifier.uri | https://hdl.handle.net/10062/91751 | |
dc.language.iso | eng | et |
dc.publisher | Tartu Ülikool | et |
dc.rights | openAccess | et |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Cognitive distortions | et |
dc.subject | mental health | et |
dc.subject | AI | et |
dc.subject | NLP | et |
dc.subject.other | master's theses | et |
dc.subject.other | informaatika | et |
dc.subject.other | infotehnoloogia | et |
dc.subject.other | informatics | et |
dc.subject.other | infotechnology | et |
dc.title | Predicting Cognitive Distortions from Reddit Posts by Using Supervised Machine Learning Methods | et |
dc.type | Thesis | et |