Automatic speech recognition for numeric data in Azerbaijani language

Aslanli, Ulvi

ADA University, School of Information Technologies and Engineering

URI: http://hdl.handle.net/20.500.12181/655

Date: 2023-04

Abstract:

Automatic Speech Recognition (ASR) technology is essential in a variety of applications, such as voice search, virtual assistants, transcription services, and subtitling for people with hearing impairments. Despite these applications, developing ASR systems for low-resource languages like Azerbaijani presents significant challenges due to the scarcity of available data, linguistic variation, and the unique phonetic properties of the language. This thesis specifically addresses the development of an ASR system for recognizing numeric data in Azerbaijani, a Turkic language spoken by approximately 50 million people worldwide. Numeric data recognition has critical practical applications in industries such as finance and transportation, where accurate and reliable recognition of numbers is essential.

One of the primary challenges in developing an ASR system for numeric data is the inherent lack of context available to help disambiguate similar-sounding numbers. Unlike general speech, numeric data often appears in isolation or with limited accompanying information, which makes spoken numbers more difficult to recognize accurately. This challenge is further exacerbated in low-resource languages like Azerbaijani.

The objective of this master’s thesis is to develop an ASR system for numeric data in Azerbaijani by exploring various techniques and methodologies. We investigate the phonetic and linguistic properties of Azerbaijani relevant to numeric data recognition and analyze the existing resources for developing an ASR system. The study proposes a framework for ASR system development, experiments with different feature extraction and modeling techniques, and evaluates the performance of the system using appropriate metrics.

In this research, we developed an ASR system for the Azerbaijani language using the Kaldi toolkit. The ASR model was trained using the classic Hidden Markov Model - Gaussian Mixture Model (HMM-GMM) architecture, employing both monophone and triphone models along with feature extraction and normalization techniques such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC), and Cepstral Mean and Variance Normalization (CMVN). The experimental results showed that the triphone models generally outperformed the monophone models, and that the combination of MFCC, LPC, and CMVN provided the best performance among the tested feature configurations. While performance varied across datasets, our ASR system showed potential for further improvement and for adaptation to the specific challenges presented by each dataset.

This thesis contributes to the development of ASR technology for low-resource languages, specifically Azerbaijani, in the domain of numeric data recognition. The results have practical implications for industries that rely on accurate and reliable recognition of numeric data, such as financial services and transportation. As the dataset and ASR system improve, we anticipate a significant impact on applications including voice assistants, transcription services, and speech analytics in Azerbaijani. This study lays the foundation for further research and development of ASR systems for the Azerbaijani language, paving the way for more robust ASR solutions.
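
As an illustration of one of the techniques named in the abstract, the following is a minimal sketch of per-utterance Cepstral Mean and Variance Normalization (CMVN). It is not the thesis's code and not Kaldi's implementation; the function name `cmvn`, the `eps` floor, and the list-of-lists frame layout are assumptions made for this example. The idea is that each feature coefficient (for instance each MFCC dimension) is shifted to zero mean and scaled to unit variance across the frames of an utterance.

```python
from math import sqrt


def cmvn(frames, eps=1e-10):
    """Per-utterance cepstral mean and variance normalization (illustrative).

    frames: list of equal-length lists of floats, one feature vector per frame.
    eps: hypothetical floor on the standard deviation to avoid division by zero.
    Returns a new list of frames; the input is not modified.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Mean of each coefficient over all frames of the utterance.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Population variance of each coefficient over the utterance.
    variances = [
        sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dim)
    ]
    # Floor the standard deviation so constant coefficients map to 0.0.
    stds = [max(sqrt(v), eps) for v in variances]
    return [
        [(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames
    ]


if __name__ == "__main__":
    # Three toy frames with two coefficients each (made-up values).
    utterance = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
    for frame in cmvn(utterance):
        print(frame)
```

In this sketch the statistics are computed over a single utterance; computing them per speaker or over a sliding window are common alternatives, and the abstract does not say which variant was used.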
