dc.contributor.author | Akhundova, Natavan | |
dc.date.accessioned | 2023-10-16T11:25:39Z | |
dc.date.available | 2023-10-16T11:25:39Z | |
dc.date.issued | 2022-04 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12181/727 | |
dc.description.abstract | Speaker identification is the process of identifying the person who is speaking, and it is useful in applications ranging from customer service to criminal investigations and the reporting of forensic evidence. This study investigates how x-vectors, the current state-of-the-art technique in speaker recognition, relate to the text uttered in audio signals and to the duration of those signals. To this end, three datasets are used: two relatively small digit datasets, one in English and one in Azerbaijani, and a larger Azerbaijani dataset of digits and commands. The hypotheses tested in this research are as follows: 1) x-vectors encode information about the text spoken in an audio recording, so model accuracy changes when the text changes; 2) x-vector models achieve higher accuracy on longer audio recordings than on shorter ones. Models were trained on all three datasets to test the first hypothesis, and the findings show that accuracy decreases drastically when a model is given audio samples in which new, unseen text is uttered. The larger Azerbaijani dataset of digits and commands was used to test the second hypothesis. The results confirm that x-vectors are data-hungry: more speech samples combined with longer recordings gave the best results. Although most of the experiments were conducted in Azerbaijani, the results are not believed to be language-specific; testing these hypotheses on a dataset in another language is expected to yield similar results, as the experiments with the English dataset in this study indicate. | en_US |
dc.language.iso | en | en_US |
dc.publisher | ADA University | en_US |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
dc.subject | Speaker recognition. | en_US |
dc.subject | Speaker identification. | en_US |
dc.subject | TDNN. | en_US |
dc.title | Text-Dependent Speaker Identification | en_US |
dc.type | Thesis | en_US |