Joint Embeddings for Voices and Their Textual Descriptions

Lastovko, Ivan

dc.contributor.advisor	Fishel, Mark, supervisor
dc.contributor.author	Lastovko, Ivan
dc.contributor.other	University of Tartu. Faculty of Science and Technology	et
dc.contributor.other	University of Tartu. Institute of Computer Science	et
dc.date.accessioned	2023-10-30T12:29:50Z
dc.date.available	2023-10-30T12:29:50Z
dc.date.issued	2023
dc.description.abstract	Embeddings are vector representations; they are a highly effective way to represent data in machine learning in a more meaningful and efficient manner. In this study, we aim to build vector representations of speakers’ voices and of their corresponding textual descriptions such that the cosine similarity between a voice and its description is maximized. In other words, we want to build a system that places each voice and the descriptions of that voice as close together as possible in a shared multidimensional space. Our data come from public datasets as well as manually annotated data. We used different training modes: standalone training, where the encoders are trained individually, and joint training, where the encoders are trained together so that their outputs adapt to each other. We then evaluated the models on a control sample extracted from the manually collected dataset and assessed the quality of our annotations. Finally, we investigated how the cosine similarity between the vector representations of speakers and of voice descriptions changes as annotation quality declines.	et
dc.identifier.uri	https://hdl.handle.net/10062/93841
dc.language.iso	eng	et
dc.publisher	University of Tartu	et
dc.rights	openAccess	et
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	text-to-speech	et
dc.subject	embeddings	et
dc.subject	cosine similarity	et
dc.subject	Wav2Vec2	et
dc.subject	sentence encoders	et
dc.subject.other	master's theses	et
dc.subject.other	informatics	et
dc.subject.other	information technology	et
dc.title	Joint Embeddings for Voices and Their Textual Descriptions	et
dc.type	Thesis	et
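
The abstract's central objective, maximizing the cosine similarity between a voice embedding and the embedding of that voice's textual description, can be illustrated with a small sketch. This is not the thesis's implementation: the functions `cosine_similarity` and `cosine_alignment_loss` are hypothetical names, the embeddings are plain Python lists standing in for encoder outputs (for example from Wav2Vec2 and a sentence encoder), and the mean (1 - cosine) loss over matched pairs is an assumed objective, since this record does not state the exact loss used.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def cosine_alignment_loss(
    voice_embs: List[Sequence[float]], text_embs: List[Sequence[float]]
) -> float:
    """Hypothetical objective (an assumption, not the thesis's stated loss):
    mean of (1 - cos) over matched voice/description pairs.

    Row i of voice_embs is assumed to describe the same speaker as row i of
    text_embs. Minimizing this value maximizes the average cosine similarity
    between each voice and its own description.
    """
    if len(voice_embs) != len(text_embs) or not voice_embs:
        raise ValueError("need the same non-zero number of voice and text embeddings")
    total = sum(1.0 - cosine_similarity(v, t) for v, t in zip(voice_embs, text_embs))
    return total / len(voice_embs)
```

For example, `cosine_alignment_loss([[1.0, 0.0]], [[2.0, 0.0]])` is 0.0 because the two vectors point in the same direction, regardless of their lengths. In a joint training setup of the kind the abstract describes, a loss like this would presumably be backpropagated into both encoders at once, whereas standalone training would train each encoder on its own.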

Files

Original bundle

Name:: Lastovko_MSc_computer_science_2023.pdf
Size:: 7.69 MB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 1.71 KB
Format:: Item-specific license agreed to upon submission

Collections

MTAT master's theses