Joint Embeddings for Voices and Their Textual Descriptions
dc.contributor.advisor | Fishel, Mark, supervisor | |
dc.contributor.author | Lastovko, Ivan | |
dc.contributor.other | University of Tartu. Faculty of Science and Technology | et |
dc.contributor.other | University of Tartu. Institute of Computer Science | et |
dc.date.accessioned | 2023-10-30T12:29:50Z | |
dc.date.available | 2023-10-30T12:29:50Z | |
dc.date.issued | 2023 | |
dc.description.abstract | Embeddings are vector representations of data and a highly effective machine learning technique for representing data in a more meaningful and efficient form. In this study, we aim to build vector representations of speakers' voices and of their corresponding textual descriptions such that the cosine similarity between a voice and its description is maximized. In other words, we want to build a system that places each voice and its description as close together as possible in a shared multidimensional space. For data, we used public datasets as well as manually annotated data. We compared different training modes: standalone training, where the encoders are trained individually, and joint training, where the encoders are trained together so that they learn to adapt their outputs to each other. We then evaluated the models on a control sample extracted from the manually collected dataset and assessed the quality of our annotations. We also investigated how the cosine similarity between the vector representations of speakers' voices and their descriptions changes as annotation quality declines. | et |
dc.identifier.uri | https://hdl.handle.net/10062/93841 | |
dc.language.iso | eng | et |
dc.publisher | University of Tartu | et |
dc.rights | openAccess | et |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | text-to-speech | et |
dc.subject | embeddings | et |
dc.subject | cosine similarity | et |
dc.subject | Wav2Vec2 | et |
dc.subject | sentence encoders | et |
dc.subject.other | master's theses | et |
dc.subject.other | informatics | et |
dc.subject.other | information technology | et |
dc.title | Joint Embeddings for Voices and Their Textual Descriptions | et |
dc.type | Thesis | et |
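The abstract's core objective, placing a voice and its textual description close together in a shared vector space by maximizing their cosine similarity, can be illustrated with a minimal sketch. It assumes both encoders (for example a Wav2Vec2-based voice encoder and a sentence encoder, as the subject keywords suggest) already output vectors of the same dimension; toy 2-D vectors stand in for real encoder outputs. The function names cosine_similarity, cosine_alignment_loss and best_description are hypothetical and not taken from the thesis, and the loss shown (the mean of 1 - cos over matched pairs) is only one simple way to express "maximizing cosine similarity", not necessarily the objective used in this work.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_alignment_loss(voice_embs: List[Vector], text_embs: List[Vector]) -> float:
    """Hypothetical loss: mean of (1 - cos) over matched (voice, description) pairs.

    Pair i is voice_embs[i] with text_embs[i]. Minimizing this value is the
    same as maximizing the average cosine similarity of matched pairs.
    """
    if not voice_embs or len(voice_embs) != len(text_embs):
        raise ValueError("need equally many (and at least one) voice and text embeddings")
    sims = [cosine_similarity(v, t) for v, t in zip(voice_embs, text_embs)]
    return sum(1.0 - s for s in sims) / len(sims)


def best_description(voice_emb: Vector, text_embs: List[Vector]) -> int:
    """Index of the description embedding most cosine-similar to a voice."""
    scores = [cosine_similarity(voice_emb, t) for t in text_embs]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real encoder outputs (assumption).
    voices = [[1.0, 0.0], [0.0, 2.0]]
    matched = [[2.0, 0.0], [0.0, 1.0]]  # same directions as their voices
    swapped = [[0.0, 1.0], [1.0, 0.0]]  # orthogonal to their voices
    print(cosine_alignment_loss(voices, matched))  # 0.0
    print(cosine_alignment_loss(voices, swapped))  # 1.0
    print(best_description([0.0, 3.0], swapped))  # 0
```

Running the file prints 0.0 for the matched toy pairs and 1.0 for the swapped ones, since cosine similarity ignores vector length and only compares direction. In a joint training setup, a loss of this kind would be back-propagated through both encoders at once, whereas standalone training would fit each encoder separately. A purely pairwise cosine loss does not push mismatched pairs apart; contrastive objectives such as the one used in CLIP additionally penalize similarity between non-matching voices and descriptions.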
Files
Original bundle
- Name: Lastovko_MSc_computer_science_2023.pdf
- Size: 7.69 MB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed to upon submission