dc.contributor.author | Isbarov, Jafar | |
dc.date.accessioned | 2025-09-24T07:17:01Z | |
dc.date.available | 2025-09-24T07:17:01Z | |
dc.date.issued | 2024-04 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12181/1481 | |
dc.description.abstract | MMLU is one of the most widely used and cited benchmarks for LLM performance [19]. It consists of multiple-choice questions on elementary, high-school, college, and professional topics. The MMLU benchmark was designed with English in mind, but recent years have seen a rise of multilingual benchmarks alongside multilingual models. One example is MMMLU, which now has multiple alternatives. Such benchmarks allow researchers to compare the performance of LLMs across languages, speeding up development in this area. However, multilingual benchmarks tend to have a global scope, so low-resource languages are left out of the spotlight yet again. This applies to Turkic languages as well: with the exception of Turkish, Turkic languages have been left out of most such benchmarks. In this work, we concentrate on the evaluation of Turkic languages, including extremely low-resource languages such as Tatar and Crimean Tatar. We have prepared a benchmark consisting of high-school-level multiple-choice questions, which has allowed us to evaluate the NLU capabilities of state-of-the-art (SOTA) LLMs in Turkic languages. Turkish is the only Turkic language with a native MMLU benchmark: TurkishMMLU was released a few months ago [65], and we have collaborated with its team on this project. There have been attempts to create MMLU benchmarks for Turkic (and other) languages through synthetic generation or machine translation. However, these approaches are known to produce potentially erroneous text and do not account for the linguistic and cultural nuances of the target language. Since MMLU was designed for the English language, its contents are biased towards an Anglocentric worldview. Therefore, a successful benchmark should be created from native sources rather than from direct or indirect translations. 
Since TurkishMMLU already exists, our project expanded its coverage to seven more Turkic languages – Azerbaijani, Kazakh, Uzbek, Tatar, Crimean Tatar, Uyghur, and Karakalpak. In this project, we created a unified, native language-understanding benchmark for these languages, which we then used to evaluate state-of-the-art LLMs with multilingual capabilities. We also evaluated the effect of the writing system on model performance. Finally, we analysed the output of LLMs in low-resource languages and found that LLMs can respond to questions posed in a low-resource language using a high-resource language, without being prompted to do so. | en_US |
dc.language.iso | az | en_US |
dc.publisher | ADA University | en_US |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
dc.subject | Natural language processing. | en_US |
dc.subject | Language resources. | en_US |
dc.subject | Benchmarking (Computing). | en_US |
dc.subject | Large language models -- Artificial intelligence. | en_US |
dc.subject | Low-resource languages. | en_US |
dc.title | Evaluating Language Understanding and World Knowledge of Large Language Models in Turkic Languages | en_US |
dc.type | Thesis | en_US |