ADA Library Digital Repository

Automatic speech recognition for numeric data in Azerbaijani language


dc.contributor.author Aslanli, Ulvi
dc.date.accessioned 2023-10-01T15:34:41Z
dc.date.available 2023-10-01T15:34:41Z
dc.date.issued 2023-04
dc.identifier.uri http://hdl.handle.net/20.500.12181/655
dc.description.abstract Automatic Speech Recognition (ASR) technology is essential in a variety of applications, such as voice search, virtual assistants, transcription services, and subtitling for people with hearing impairments. Despite its numerous applications, developing ASR systems for low-resource languages like Azerbaijani presents significant challenges due to the scarcity of available data, linguistic variations, and the unique phonetic properties of the language. This thesis specifically addresses the development of an ASR system for recognizing numeric data in Azerbaijani, a Turkic language spoken by approximately 50 million people worldwide. Numeric data recognition has critical practical applications in industries such as finance and transportation, where accurate and reliable recognition of numbers is essential. One of the primary challenges in developing an ASR system for numeric data is the inherent lack of context available to help disambiguate similar-sounding numbers. Unlike general speech recognition, numeric data often appears in isolation or with limited accompanying information, making it more difficult to accurately recognize spoken numbers. This challenge is further exacerbated in low-resource languages like Azerbaijani. The objective of this master’s thesis is to develop an ASR system for numeric data in Azerbaijani by exploring various techniques and methodologies. We investigate the phonetic and linguistic properties of Azerbaijani relevant to numeric data recognition and analyze the existing resources for developing an ASR system. The study proposes a framework for ASR system development, experimenting with different feature extraction and modeling techniques, and evaluating the performance of the system using appropriate metrics. In this research, we developed an ASR system for the Azerbaijani language using the Kaldi toolkit. 
The ASR model was trained using the classic Hidden Markov Model - Gaussian Mixture Model (HMM-GMM) architecture, employing both monophone and triphone models along with various feature extraction techniques such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC), and Cepstral Mean and Variance Normalization (CMVN). The experimental results showed that the triphone models generally outperformed monophone models, and the combination of MFCC, LPC, and CMVN features provided the best performance among the tested feature extraction techniques. While performance varied across different datasets, our ASR system demonstrated promising potential for further improvements and adaptation to specific challenges presented by each dataset. This thesis contributes to the development of ASR technology for low-resource languages, specifically Azerbaijani, in the domain of numeric data recognition. The results of this research have practical implications for industries that rely on accurate and reliable recognition of numeric data, such as financial services and transportation. As the dataset and ASR system improve, we anticipate that the impact on various applications, including voice assistants, transcription services, and speech analytics in Azerbaijani, will be significant. This study lays the foundation for further research and development of ASR systems for the Azerbaijani language, paving the way for improved and more robust ASR solutions. en_US
dc.language.iso en en_US
dc.publisher ADA University en_US
dc.rights Attribution-NonCommercial-NoDerivs 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/us/ *
dc.subject.lcsh Automatic Speech Recognition. en
dc.subject.lcsh Computer algorithms. en
dc.subject.lcsh Numeric data. en
dc.title Automatic speech recognition for numeric data in Azerbaijani language en_US
dc.type Thesis en_US


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States
