ADA Library Digital Repository

Text Summarization for Azerbaijani Documents Using Hybrid Neural Networks

dc.contributor.author Pashayev, Mir Amir
dc.date.accessioned 2025-08-05T05:53:56Z
dc.date.available 2025-08-05T05:53:56Z
dc.date.issued 2025-04
dc.identifier.uri http://hdl.handle.net/20.500.12181/1431
dc.description.abstract This thesis investigates efficient text summarization techniques for Azerbaijani documents using hybrid neural approaches that combine extractive and abstractive methods. Because Azerbaijani is a low-resource language, developing reliable summarization systems for it poses significant challenges. To address this, an Azerbaijani-specific dataset of documents paired with human-written summaries was prepared, and both extractive and abstractive summarization models were developed and evaluated. For extractive summarization, sentence embeddings were used to construct similarity matrices, and a TextRank-based algorithm then ranked and selected key sentences. Evaluation with ROUGE metrics showed strong results: ROUGE-1 recall, precision, and F1 scores of approximately 0.47, 0.52, and 0.49, respectively; ROUGE-2 scores of around 0.44, 0.47, and 0.45; and ROUGE-L scores comparable to ROUGE-1. These results indicate strong alignment between the extracted summaries and the human references. For abstractive summarization, the multilingual pre-trained mT5-base model was fine-tuned on the Azerbaijani dataset. The baseline (zero-shot) mT5 model achieved ROUGE-1, ROUGE-2, and ROUGE-L scores of approximately 45%, 25%, and 40%, respectively, with BLEU and METEOR scores of around 35% and 42%. After fine-tuning, the model achieved ROUGE-1, ROUGE-2, and ROUGE-L F1 scores of approximately 64%, 47%, and 57%, and its METEOR score rose to about 50%, while its BLEU score of about 32% was slightly below the reported baseline value. Fine-tuning therefore substantially improved ROUGE and METEOR performance over the baseline. Visualizations of evaluation metrics, dataset length distributions, and comparative analyses are provided to aid interpretation of model performance. Both the extractive and abstractive systems show significant promise for Azerbaijani text summarization despite challenges related to data scarcity and linguistic complexity.
This work demonstrates that adapting multilingual pre-trained models and combining them with classical graph-based extractive methods can yield highly effective summarization systems for low-resource languages. Future research directions include expanding the dataset, exploring reinforcement learning techniques, and further optimizing model architectures for improved generalization across diverse Azerbaijani text domains. en_US
dc.language.iso en en_US
dc.publisher ADA University en_US
dc.rights Attribution-NonCommercial-NoDerivs 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/us/ *
dc.subject Natural language processing (Computer science) en_US
dc.subject Text summarization -- Data processing en_US
dc.subject Computational linguistics en_US
dc.subject Low-resource languages -- Natural language processing en_US
dc.subject Language processing -- Evaluation en_US
dc.subject Azerbaijan -- Languages -- Computational analysis en_US
dc.subject Azerbaijan -- Data processing en_US
dc.title Text Summarization for Azerbaijani Documents Using Hybrid Neural Networks en_US
dc.type Thesis en_US
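The extractive step described in the abstract (sentence embeddings, a similarity matrix, TextRank ranking, then ROUGE scoring against a human summary) can be illustrated with a minimal, dependency-free sketch. It is not the thesis's implementation, and several parts are assumptions: `embed` is a hypothetical bag-of-words stand-in for the neural sentence encoder, the damping factor d = 0.85 is the value commonly used for TextRank rather than one taken from the thesis, `k` is a hypothetical sentence budget, and `rouge_n` is a simplified ROUGE-N without the stemming and tokenization rules of reference ROUGE toolkits.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # \w is Unicode-aware in Python 3, so letters such as ə, ş and ğ stay inside words.
    return re.findall(r"\w+", text.lower())


def embed(sentence: str) -> Counter:
    # HYPOTHETICAL stand-in for a neural sentence encoder: a bag-of-words count vector.
    return Counter(tokenize(sentence))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(sentences: list[str]) -> list[list[float]]:
    vecs = [embed(s) for s in sentences]
    n = len(vecs)
    # Zero diagonal: a sentence does not vote for itself.
    return [[0.0 if i == j else cosine(vecs[i], vecs[j]) for j in range(n)]
            for i in range(n)]


def textrank(sim: list[list[float]], d: float = 0.85,
             tol: float = 1e-6, max_iter: int = 100) -> list[float]:
    # Weighted TextRank: S_i = (1 - d) + d * sum_j (w_ji / sum_k w_jk) * S_j
    n = len(sim)
    if n == 0:
        return []
    out_weight = [sum(row) for row in sim]  # total edge weight leaving each sentence
    scores = [1.0] * n
    for _ in range(max_iter):
        new = [(1 - d) + d * sum(sim[j][i] / out_weight[j] * scores[j]
                                 for j in range(n) if out_weight[j] > 0)
               for i in range(n)]
        converged = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def extract_summary(sentences: list[str], k: int = 2) -> list[str]:
    scores = textrank(similarity_matrix(sentences))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # keep original document order


def rouge_n(candidate: str, reference: str, n: int = 1) -> tuple[float, float, float]:
    # Simplified ROUGE-N: clipped n-gram overlap -> (recall, precision, F1).
    def ngrams(tokens: list[str]) -> Counter:
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(tokenize(candidate)), ngrams(tokenize(reference))
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1


if __name__ == "__main__":
    doc = [
        "Bakı Azərbaycanın paytaxtıdır.",
        "Bakı Xəzər dənizinin sahilində yerləşir.",
        "Şəhərdə çoxlu muzey var.",
    ]
    reference = "Bakı Azərbaycanın paytaxtıdır və Xəzər dənizinin sahilində yerləşir."
    summary = " ".join(extract_summary(doc, k=2))
    print(summary)
    print(rouge_n(summary, reference, n=1))
```

Ranking by the stationary TextRank score favors sentences that are similar to many other sentences, which is why a graph over pairwise similarities works as an extractive selector; returning the chosen sentences in document order keeps the extract readable. Swapping `embed` for a real sentence encoder changes only the similarity weights, not the ranking or scoring logic.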

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States.