Enhanced Speech Emotion Recognition Using Averaged Valence Arousal Dominance Mapping and Deep Neural Networks

Rizhinashvili, Davit

dc.contributor.advisor	Anbarjafari, Gholamreza, supervisor
dc.contributor.advisor	Sham, Abdallah Hussein, supervisor
dc.contributor.author	Rizhinashvili, Davit
dc.contributor.other	Tartu Ülikool. Loodus- ja täppisteaduste valdkond	et
dc.contributor.other	Tartu Ülikool. Tehnoloogiainstituut	et
dc.date.accessioned	2024-06-18T07:52:45Z
dc.date.available	2024-06-18T07:52:45Z
dc.date.issued	2024
dc.description.abstract	This thesis delves into advancements in speech emotion recognition (SER) by establishing a novel approach for emotion mapping and prediction using the Valence-Arousal-Dominance (VAD) model. Central to this research is the creation of reliable emotion-to-VAD mappings, achieved by averaging outcomes from multiple pre-trained networks applied to the RAVDESS dataset. This approach adeptly resolves prior inconsistencies in emotion-to-VAD mappings and establishes a dependable framework for SER. The study also introduces a refined SER model, integrating the pre-trained Wav2Vec 2.0 with Long Short-Term Memory (LSTM) networks and linear layers, culminating in an output layer representing valence, arousal, and dominance. Notably, this model exhibits commendable accuracy across various datasets, such as RAVDESS, EMO-DB, CREMA-D, and TESS, thereby showcasing its robustness and adaptability, an improvement over earlier models susceptible to dataset-specific overfitting. The research further unveils a comprehensive speech analysis application, adept at denoising, segmenting, and profiling emotions in speech segments. This application features interactive emotion tracking and sentiment reports, illustrating its practicality in diverse applications. The study recognizes ongoing challenges in SER, especially in managing the subjective nature of emotion perception and integrating multimodal data. Although the research marks a progression in SER technology, it underscores the need for continuous research and careful consideration of ethical aspects in deploying such technologies. This thesis contributes to the SER domain by introducing a dependable method for emotion-to-VAD mapping, a robust model for emotion recognition, and a user-friendly application for practical implementations.
dc.identifier.uri	https://hdl.handle.net/10062/99920
dc.language.iso	en
dc.publisher	Tartu Ülikool	et
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Estonia	en
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/ee/
dc.subject	Speech Emotion Recognition
dc.subject	Deep Neural Networks
dc.subject	LSTM
dc.subject	Speech Analysis
dc.subject	Valence
dc.subject	Arousal
dc.subject	Dominance
dc.subject.other	magistritööd	et
dc.title	Enhanced Speech Emotion Recognition Using Averaged Valence Arousal Dominance Mapping and Deep Neural Networks
dc.title.alternative	Täiustatud kõne emotsioonide tuvastamine kasutades keskmistatud valentsuse ergutuse dominantsuse kaardistamist ja süvavõrgustikke
dc.type	Thesis	en
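
The abstract describes building emotion-to-VAD mappings by averaging the outputs of several pre-trained networks over labelled RAVDESS clips. The record does not give the procedure in detail, so the following is only a minimal sketch of one way such an averaging step could look. The function name `average_vad_mapping`, the input layout, and all numbers are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict
from statistics import fmean


def average_vad_mapping(predictions):
    """Average (valence, arousal, dominance) triples per emotion label.

    `predictions` is an iterable of (emotion_label, (v, a, d)) pairs,
    pooled over every network and every clip. This layout is an
    assumption made for illustration, not the thesis's data format.
    """
    grouped = defaultdict(list)
    for label, vad in predictions:
        grouped[label].append(vad)
    # zip(*vads) regroups the triples into three per-axis sequences.
    return {
        label: tuple(fmean(axis) for axis in zip(*vads))
        for label, vads in grouped.items()
    }


# Made-up values, for illustration only.
example = [
    ("happy", (0.8, 0.6, 0.6)),
    ("happy", (0.6, 0.8, 0.4)),
    ("sad", (0.2, 0.3, 0.3)),
    ("sad", (0.4, 0.1, 0.5)),
]
mapping = average_vad_mapping(example)
```

Here `mapping` holds one averaged VAD triple per emotion label. A real pipeline would obtain the triples by running each pre-trained network on the audio, which this sketch leaves out.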

Files

Original bundle

Name: DavitRizhinashvili_Bioeng.pdf
Size: 1.74 MB
Format: Adobe Portable Document Format

Collections

Biotehnoloogia magistritööd - Master's theses