Development of Large-scale National Azerbaijan Corpus with UI and Functionality

Alizada, Emin

URI: http://hdl.handle.net/20.500.12181/932

Date: 2023-04

Abstract:

This paper presents the creation of a large-scale Azerbaijani language corpus of more than 50 million tokens and the development of several functionalities for language analysis and corpus linguistics: Word Frequency, Ngrams, Concordance, Thesaurus, and Word Sketch. The corpus was collected from various sources, including Azerbaijani books, articles, and websites, and is stored in a relational database. The paper describes the corpus creation process and the database schema in detail, then explains how each functionality was built and what insights it can provide. It then reviews existing corpus applications, examining their interfaces and user experience, before introducing the online application that makes the Azerbaijani corpus and its functionalities available to linguists, researchers, and language learners. The functionalities were implemented in Python, and the user interface was built with Next.js. The final product is a web application that gives users easy access to all of the corpus functionalities.
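To make three of the named functionalities concrete, here is a minimal Python sketch of word frequency, n-grams, and concordance (keyword in context) over a tiny in-memory sample. It is an illustration only, not the thesis implementation: the function names, the regex tokenizer, and the sample sentences are assumptions, and the actual system reads from a relational database whose schema is not given in this abstract.

```python
import re
from collections import Counter

# Hypothetical sample text; the real corpus holds more than 50 million tokens.
SAMPLE = "mən kitab oxuyuram. sən kitab alırsan. o kitab yazır."


def tokenize(text):
    # Assumed tokenizer: runs of word characters. \w is Unicode-aware in
    # Python 3, so letters such as ə, ş, ğ and ı are kept inside tokens.
    # Note: str.lower() is not locale-aware, so Azerbaijani I/ı and İ/i
    # would need special handling on real, mixed-case text.
    return re.findall(r"\w+", text.lower())


def word_frequency(tokens):
    # Map each token to the number of times it occurs.
    return Counter(tokens)


def ngrams(tokens, n):
    # Slide a window of length n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def concordance(tokens, keyword, window=2):
    # Return (left context, keyword, right context) for every hit.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


tokens = tokenize(SAMPLE)
print(word_frequency(tokens).most_common(1))
print(ngrams(tokens, 2)[:3])
print(concordance(tokens, "kitab", window=1))
```

At corpus scale these computations would more plausibly be pushed into database queries or precomputed indexes rather than run over Python lists; the sketch only shows what each functionality returns.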
