Text to speech system for Azerbaijani language

Aghalarli, Yusif

dc.contributor.author	Aghalarli, Yusif
dc.date.accessioned	2025-02-10T06:29:58Z
dc.date.available	2025-02-10T06:29:58Z
dc.date.issued	2023-04
dc.identifier.uri	http://hdl.handle.net/20.500.12181/941
dc.description.abstract	This Master's thesis focuses on the development of a Text-to-Speech (TTS) system for the Azerbaijani language. TTS technology has been gaining popularity due to its ability to generate human-like speech from written text, making it beneficial for people with disabilities, language learners, and those who prefer auditory learning. The thesis starts with an introduction to TTS, its significance, and its history. The literature review provides an overview of related studies, including recent advancements in TTS systems. It covers the techniques and models used in TTS systems, the evaluation metrics used to assess their performance, and the challenges and limitations of developing TTS systems for low-resource languages. The main focus of the study is the Tacotron-2 architecture, which is known for its high-quality, natural-sounding speech. This architecture consists of two parts: a mel spectrogram generator and a neural vocoder. The mel spectrogram is a representation of the speech signal that captures its spectral information, while the neural vocoder generates the actual speech waveform from it. The study also explains the data collection process, which is a crucial component of developing a TTS system. The first data collection attempt produced poor-quality data, which prompted the researchers to refine the process by using an audiobook with speech alignment. This process resulted in approximately 19 hours of high-quality data, which was used to train the Tacotron-2 architecture. To evaluate the performance of the TTS system, a survey was conducted in which participants rated the system using the Mean Opinion Score (MOS). The results showed that the system received a MOS of 3.3, indicating that it produced acceptable speech quality.
In conclusion, this Master's thesis provides a comprehensive overview of developing a TTS system for the Azerbaijani language using the Tacotron-2 architecture. The study presents the different components of the TTS system, the data collection process, and the evaluation metrics used to assess the system's performance. It also highlights the challenges and limitations of developing TTS systems for low-resource languages and suggests future directions for improving the system's performance.	en_US
dc.language.iso	en	en_US
dc.publisher	ADA University	en_US
dc.relation	School of IT and Engineering	en_US
dc.relation	Graduate program	en_US
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
dc.subject	Speech synthesis -- Neural networks	en_US
dc.subject	Text-to-speech systems -- Azerbaijani language	en_US
dc.subject	Tacotron-2 architecture -- Applications in speech generation	en_US
dc.subject	Data collection -- Speech data	en_US
dc.subject	Speech quality evaluation -- Mean Opinion Score (MOS)	en_US
dc.subject	Low-resource languages -- Speech synthesis	en_US
dc.subject	Language technology -- Applications for disabilities	en_US
dc.subject	IT and Engineering	en_US
dc.title	Text to speech system for Azerbaijani language	en_US
dc.type	Thesis	en_US