Text Summarization for Azerbaijani Documents Using Hybrid Neural Networks

Pashayev, Mir Amir

dc.contributor.author	Pashayev, Mir Amir
dc.date.accessioned	2025-08-05T05:53:56Z
dc.date.available	2025-08-05T05:53:56Z
dc.date.issued	2025-04
dc.identifier.uri	http://hdl.handle.net/20.500.12181/1431
dc.description.abstract	This thesis investigates efficient text summarization techniques for Azerbaijani documents through hybrid neural approaches, combining extractive and abstractive methods. Due to the low-resource nature of the Azerbaijani language, significant challenges arise in developing reliable summarization systems. To address this, an Azerbaijani-specific dataset was prepared, consisting of documents paired with human-written summaries, and both extractive and abstractive summarization models were developed and evaluated. In the extractive summarization part, sentence embeddings were used to construct similarity matrices, followed by a TextRank-based algorithm to rank and select key sentences. Evaluation using ROUGE metrics demonstrated strong results, achieving ROUGE-1 recall, precision, and F1 scores of approximately 0.47, 0.52, and 0.49, respectively, ROUGE-2 scores around 0.44, 0.47, and 0.45, and ROUGE-L scores comparable to ROUGE-1. These results indicated a strong alignment between the extracted summaries and human references. For the abstractive summarization task, the multilingual pre-trained mT5-base model was fine-tuned on the Azerbaijani dataset. Fine-tuning significantly improved the ROUGE and METEOR scores over the baseline. The baseline (zero-shot) mT5 model achieved ROUGE-1, ROUGE-2, and ROUGE-L scores of approximately 45%, 25%, and 40%, respectively, with BLEU and METEOR scores around 35% and 42%. After fine-tuning, the model achieved ROUGE-1, ROUGE-2, and ROUGE-L F1 scores of approximately 64%, 47%, and 57%, with BLEU and METEOR scores of about 32% and 50%, respectively. Visualizations of evaluation metrics, dataset length distributions, and comparative analysis were provided to better interpret model performance. Both extractive and abstractive systems showed significant promise for Azerbaijani text summarization, overcoming challenges related to data scarcity and linguistic complexity.
This work demonstrates that adapting multilingual pre-trained models and combining them with classical graph-based extractive methods can yield highly effective summarization systems for low-resource languages. Future research directions include expanding the dataset, exploring reinforcement learning techniques, and further optimizing model architectures for improved generalization across diverse Azerbaijani text domains.	en_US
dc.language.iso	en	en_US
dc.publisher	ADA University	en_US
dc.relation	School of IT and Engineering	en_US
dc.relation	Graduate program	en_US
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
dc.subject	Natural language processing (Computer science)	en_US
dc.subject	Text summarization -- Data processing	en_US
dc.subject	Computational linguistics	en_US
dc.subject	Low-resource languages -- Natural language processing	en_US
dc.subject	Language processing -- Evaluation	en_US
dc.subject	Azerbaijan -- Languages -- Computational analysis	en_US
dc.subject	Azerbaijan -- Data processing	en_US
dc.subject	IT and Engineering	en_US
dc.title	Text Summarization for Azerbaijani Documents Using Hybrid Neural Networks	en_US
dc.type	Thesis	en_US
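
The extractive pipeline described in the abstract (sentence embeddings, a similarity matrix, then TextRank-style ranking) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `textrank_select`, the damping factor of 0.85, the use of cosine similarity, and the toy two-dimensional "embeddings" are all assumptions made for illustration. In practice the vectors would come from whatever sentence-embedding model is used.

```python
import numpy as np

def textrank_select(embeddings, top_k=2, damping=0.85, iters=100, tol=1e-6):
    """Rank sentences by PageRank over a cosine-similarity graph (hypothetical helper)."""
    emb = np.asarray(embeddings, dtype=float)
    n = emb.shape[0]

    # Cosine similarity between every pair of sentence vectors.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)     # no self-loops
    sim = np.clip(sim, 0.0, None)  # ignore negative similarities

    # Row-normalise into a transition matrix; rows with no edges become uniform.
    row_sums = sim.sum(axis=1, keepdims=True)
    trans = np.where(row_sums > 0, sim / np.clip(row_sums, 1e-12, None), 1.0 / n)

    # Power iteration for the damped PageRank scores.
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        new = (1.0 - damping) / n + damping * (trans.T @ scores)
        converged = np.abs(new - scores).sum() < tol
        scores = new
        if converged:
            break

    # Keep the top_k sentences, returned in original document order.
    top = np.argsort(-scores)[:top_k]
    return sorted(top.tolist()), scores

# Toy vectors (assumed): three related sentences and one off-topic sentence.
toy = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
selected, scores = textrank_select(toy, top_k=2)
print(selected, np.round(scores, 3))
```

Returning the selected indices in document order keeps the extracted summary readable; the off-topic sentence receives the lowest score because it is only weakly connected to the rest of the graph.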