Abstract:
This research addresses the question of how to achieve text summarization in the
Azerbaijani language. There are two types of text summarization: extractive and
abstractive. This paper focuses on abstractive text summarization, since it requires
considerably more complex approaches. Throughout the research, the problems of
models being trained primarily for English are highlighted, and the difficulties of adapting
them to the Azerbaijani language are discussed. The Azerbaijani alphabet contains 32
letters, meaning it has additional language-specific characters compared to English. It was
shown that tokenization plays a vital role in the success of the model, and that
changing the default normalization parameters of the tokenizer turned out to
be extremely helpful for the results. Three tokenizers were considered for this task:
WordPiece, SentencePiece Byte-Pair Encoding, and Byte-Level Byte-Pair Encoding.
The results of all three tokenizers were analyzed, and it was determined that the WordPiece tokenizer
gives better results and is much more space efficient by design.
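For illustration, the following is a minimal sketch (using the Hugging Face tokenizers library, with a hypothetical corpus file name and vocabulary size) of how a WordPiece tokenizer's normalization settings can be adjusted so that the Azerbaijani-specific letters survive training instead of being stripped by the default accent handling; it is not the exact configuration used in the paper.

```python
from tokenizers import Tokenizer, models, normalizers, pre_tokenizers, trainers

# Build a WordPiece tokenizer whose normalizer keeps Azerbaijani-specific
# letters (ə, ğ, ı, ö, ş, ç, ü) intact. With accent stripping enabled,
# e.g. "ö" would collapse to "o" and "ş" to "s".
tokenizer = Tokenizer(models.WordPiece(unk_token="[UNK]"))
tokenizer.normalizer = normalizers.BertNormalizer(
    clean_text=True,
    lowercase=True,
    strip_accents=False,  # keep the diacritics of ğ, ö, ş, ç, ü
)
tokenizer.pre_tokenizer = pre_tokenizers.BertPreTokenizer()

trainer = trainers.WordPieceTrainer(
    vocab_size=30000,  # placeholder vocabulary size
    special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"],
)
# "az_news_corpus.txt" is a placeholder for the Azerbaijani news corpus.
tokenizer.train(files=["az_news_corpus.txt"], trainer=trainer)

print(tokenizer.encode("Azərbaycan dili gözəldir").tokens)
```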
Apart from that, different network architectures were considered for the
summarization task. The advantages and disadvantages of RNNs, LSTMs, and CNNs augmented with an
attention mechanism were listed, and the advantages of Transformers over these
three architectures were highlighted.
An Azerbaijani news dataset was used to build the vocabulary and train the model.
The BERT model was chosen as the basis of the Azerbaijani text summarization model. It was observed
that BERT can achieve feasible results even with moderately small amounts of data. Pre-trained
multilingual models such as mBERT did not prove worthwhile, since the amount of Azerbaijani
data they are trained on is very small.
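As a rough illustration of how a BERT model can be repurposed for abstractive summarization, the sketch below ties two BERT instances into an encoder-decoder using the Hugging Face transformers library. The mBERT checkpoint name, the sample sentence, and the generation settings are placeholders rather than the paper's actual configuration, and the resulting model would still need fine-tuning on the Azerbaijani news dataset before producing useful summaries.

```python
from transformers import BertTokenizerFast, EncoderDecoderModel

# Wire a BERT encoder and a BERT decoder into a sequence-to-sequence model.
# "bert-base-multilingual-cased" stands in for whatever checkpoint or
# from-scratch configuration was actually used.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-cased", "bert-base-multilingual-cased"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.vocab_size = model.config.encoder.vocab_size

# Example input article (Azerbaijani); without fine-tuning the output
# summary will not be meaningful.
article = "Bakıda yeni metro stansiyası istifadəyə verilib."
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(inputs.input_ids, max_length=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```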
Additionally, the T5 and RoBERTa models were also tested for this task, and they did not
achieve acceptable results compared to BERT itself. T5 was computationally demanding and
required more training to produce a result that could be considered successful. RoBERTa, however,
was not suitable for the summarization task at all.