Abstract:
In natural language processing (NLP), spelling correction is an essential task that seeks to automatically
correct misspelled words in text documents. This thesis focuses on Azerbaijani language spelling
correction, which presents unique challenges due to its rich morphology and complex orthographic norms.
The thesis begins with a comprehensive literature review of existing approaches and techniques
for spelling correction in various languages. We identify the limitations of these methods and
propose a novel approach for Azerbaijani spelling correction based on a sequence-to-sequence
(seq2seq) deep neural network.
Our proposed method uses seq2seq models, which have demonstrated great success in a
variety of NLP tasks, to learn the mapping between misspelled words and their correct counterparts.
In addition, we introduce techniques for generating artificial noise to augment the training data and
enhance the model’s ability to manage various types of misspellings.
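The noise-generation idea can be illustrated with a minimal sketch. It assumes four character-level edit types (deletion, insertion, substitution, transposition) applied uniformly at random. The thesis does not specify its noise operations here, so the function names `add_noise` and `make_pairs`, the choice of operations, and the sampling scheme are all hypothetical.

```python
import random

# Lowercase Azerbaijani Latin alphabet (32 letters).
AZ_ALPHABET = "abcçdeəfgğhxıijkqlmnoöprsştuüvyz"


def add_noise(word, rng, alphabet=AZ_ALPHABET):
    """Apply one random character-level edit to `word` (hypothetical scheme).

    The result may equal the input, e.g. when a substitution draws the
    same letter or the word is too short for the chosen operation.
    """
    if not word:
        return word
    op = rng.choice(["delete", "insert", "substitute", "transpose"])
    i = rng.randrange(len(word))
    if op == "delete" and len(word) > 1:
        return word[:i] + word[i + 1:]
    if op == "insert":
        return word[:i] + rng.choice(alphabet) + word[i:]
    if op == "substitute":
        return word[:i] + rng.choice(alphabet) + word[i + 1:]
    if op == "transpose" and len(word) > 1:
        j = min(i, len(word) - 2)  # keep j + 1 inside the word
        return word[:j] + word[j + 1] + word[j] + word[j + 2:]
    return word


def make_pairs(words, n_per_word=3, seed=0):
    """Build (noisy, clean) training pairs from a list of correct words."""
    rng = random.Random(seed)
    return [(add_noise(w, rng), w) for w in words for _ in range(n_per_word)]
```

Calling `make_pairs(["kitab", "məktəb"])` yields six (misspelled, correct) pairs that could serve as source and target sequences for a seq2seq model. A more realistic generator would likely weight the operations by how often each error type occurs in real text.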
We conduct extensive experiments on a large corpus of Azerbaijani text data to evaluate
the performance of our approach. We measure character error rate, word error rate, and sequence
error rate, and compare our method with several other methods. Our experiments show that our
seq2seq-based approach achieves a 5.3% character error rate and a 25.77% word error rate on
news text, which indicates its potential to improve the accuracy of Azerbaijani spelling
correction.
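For reference, the three metrics can be computed as in the sketch below. It assumes the common definitions: character and word error rates are the Levenshtein distance divided by the reference length, and sequence error rate is the fraction of outputs that do not match their reference exactly. The thesis may define or normalize these differently, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits per reference character."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref, hyp):
    """Word error rate: word edits per reference word (whitespace tokens)."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)


def ser(refs, hyps):
    """Sequence error rate: share of outputs that differ from their reference."""
    return sum(r != h for r, h in zip(refs, hyps)) / max(len(refs), 1)
```

Under these definitions a single wrong character in a word counts as one character error but a full word error, which is one reason a word error rate can be several times larger than the character error rate on the same output.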
We also analyze the effect of artificial noise generation techniques on the performance of
our model and provide insights into how effectively they handle various misspelling types.
Finally, we discuss the limitations of our methodology and possible directions for further
development.
In summary, this thesis presents a novel approach to Azerbaijani spelling correction using deep
neural networks, specifically the seq2seq model, along with artificial noise generation techniques.
The experimental results demonstrate the viability of our method for improving the spelling
accuracy of Azerbaijani text documents in real-world settings.