dc.contributor.author | Isbarov, Jafar | |
dc.date.accessioned | 2025-09-24T07:17:01Z | |
dc.date.available | 2025-09-24T07:17:01Z | |
dc.date.issued | 2024-04 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12181/1481 | |
dc.description.abstract | MMLU is one of the most widely used and cited benchmarks for LLM performance [19]. It consists of multiple-choice questions on elementary, high-school, college, and professional topics. The MMLU benchmark was designed with English in mind, but recent years have seen a rise of multilingual benchmarks alongside multilingual models. One example is MMMLU, which now has multiple alternatives. Such benchmarks allow researchers to compare the performance of LLMs across languages, speeding up development in this area. However, multilingual benchmarks tend to have a global scope, so low-resource languages are left out of the spotlight yet again. This applies to Turkic languages as well: with the exception of Turkish, Turkic languages have been left out of most such benchmarks. In this work, we concentrate on the evaluation of Turkic languages, including extremely low-resource languages such as Tatar and Crimean Tatar. We have prepared a benchmark consisting of high-school-level multiple-choice questions, which has allowed us to evaluate the NLU capabilities of state-of-the-art (SOTA) LLMs in Turkic languages. Turkish is the only Turkic language with a native MMLU benchmark: TurkishMMLU was released a few months ago [65], and we have collaborated with its team on this project. There have been attempts to create MMLU benchmarks for Turkic (and other) languages through synthetic generation or machine translation. However, these approaches are known to produce potentially erroneous text and do not account for the linguistic and cultural nuances of the target language. Since MMLU was designed for the English language, its contents are biased towards an Anglocentric worldview. Therefore, a successful benchmark should be created from native sources rather than from direct or indirect translations. 
Since TurkishMMLU already exists, our project expanded its coverage to seven more Turkic languages – Azerbaijani, Kazakh, Uzbek, Tatar, Crimean Tatar, Uyghur, and Karakalpak. In this project, we created a unified, native language-understanding benchmark for these languages, which we then used to evaluate state-of-the-art LLMs with multilingual capabilities. We also evaluated the effect of the writing system on model performance. Finally, we analysed the output of LLMs in low-resource languages and found that LLMs can respond to questions posed in a low-resource language using a high-resource language, without being prompted to do so. | en_US |
dc.language.iso | az | en_US |
dc.publisher | ADA University | en_US |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
dc.subject | Natural language processing. | en_US |
dc.subject | Language resources. | en_US |
dc.subject | Benchmarking (Computing). | en_US |
dc.subject | Large language models -- Artificial intelligence. | en_US |
dc.subject | Low-resource languages. | en_US |
dc.title | Evaluating Language Understanding and World Knowledge of Large Language Models in Turkic Languages | en_US |
dc.type | Thesis | en_US |