Predicting Depression Symptoms Based on Reddit Posts
Date
2022
Publisher
Tartu Ülikool
Abstract
Using social media posts to predict mental health problems has become a popular topic
in Natural Language Processing (NLP). Machine learning has been used to detect either
a diagnosis or individual symptoms associated with depression. Because the clinical
picture of depression differs between individuals, it is preferable to detect symptoms
rather than a diagnosis from social media posts. In this work, depression symptoms are
predicted from posts in the subreddit r/depression using NLP methods and multi-label
classification. The work focuses on evaluating the quality of the annotations and
analysing whether such data can be used to train a predictive model. Each post is
annotated by three annotators, and the labels are aggregated in three ways to create
three datasets, which are used to train Transformer models. The results show that, on a
small dataset with lower annotation agreement, a majority vote over the annotations
gives the most reliable dataset and results. The RoBERTa model shows the best learning
and generalization ability in this work.
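
The majority-vote aggregation mentioned in the abstract can be illustrated with a minimal sketch. It assumes each annotator gives one binary vector per post, with one entry per symptom. The function names, the `min_votes` parameter, and the example data are hypothetical and not taken from the thesis. The abstract names only the majority vote, so the other two thresholds shown in the usage note are illustrative alternatives, not necessarily the aggregation schemes the author used.

```python
from typing import List


def aggregate_labels(annotations: List[List[int]], min_votes: int) -> List[int]:
    """Combine binary multi-label annotations for a single post.

    annotations: one binary vector per annotator (1 = symptom present).
    min_votes:   how many annotators must mark a symptom for it to be kept.
    """
    if not annotations:
        raise ValueError("at least one annotation is required")
    n_labels = len(annotations[0])
    if any(len(a) != n_labels for a in annotations):
        raise ValueError("all annotators must label the same set of symptoms")
    # Count the votes each symptom received across annotators.
    votes = [sum(column) for column in zip(*annotations)]
    return [1 if v >= min_votes else 0 for v in votes]


def majority_vote(annotations: List[List[int]]) -> List[int]:
    """Keep a symptom only if more than half of the annotators marked it."""
    return aggregate_labels(annotations, min_votes=len(annotations) // 2 + 1)


# Hypothetical example: three annotators, four symptoms.
post_annotations = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]
print(majority_vote(post_annotations))  # [1, 0, 1, 0]
```

With three annotators, `min_votes=1` keeps any symptom marked by at least one annotator and `min_votes=3` keeps only symptoms all annotators agree on. Varying the threshold in this way is one simple route to several datasets from the same annotations.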
Keywords
Multi-label classification, Transformers, symptom prediction, depression, social media