Abstract:
As technology improves day by day, access to different resources becomes easier. Today, several technologies aim to solve problems faced by disabled people. One remaining obstacle is that people from deaf/mute communities have difficulty establishing healthy communication with others, especially with those outside their community. The technology that addresses such problems is known as a Sign Language Recognition (SLR) system.
There are approximately 50,000 deaf/mute people in Azerbaijan. They have their own sign language, called Azerbaijani Sign Language (AzSL). Apart from sign language interpreters, people outside deaf/mute communities generally do not know AzSL. AzSL has 32 letters. 24 of them are static: a letter is expressed by holding the hand in a specific shape and orientation, so it can be captured in a single frame. The remaining 8 letters are dynamic: like static letters they use hand shapes, but the hands must also move, for example up and down or in rotation. Such letters cannot be captured in a single frame; they consist of a sequence of frames, similar to a video. Besides these 8 letters, all words are also dynamic. Our goal in this paper is to build an SLR system that reads video from a live camera and converts it into text in real time.
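In data terms, the distinction is simply between a single frame and an ordered stack of frames. The following is a minimal sketch of the two representations; all dimensions are illustrative assumptions, not values from the paper.

```python
import numpy as np

# One static letter: a single RGB frame (height, width, channels illustrative).
static_sample = np.zeros((480, 640, 3), dtype=np.uint8)     # shape: (H, W, 3)

# One dynamic letter: an ordered sequence of T such frames, like a short video.
T = 30                                                      # frame count is an assumption
dynamic_sample = np.zeros((T, 480, 640, 3), dtype=np.uint8) # shape: (T, H, W, 3)
```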
AzSL differs from other well-known sign languages such as American, German, French, and Russian. Consequently, no existing dataset contains the letters and words of AzSL. Therefore, the first task was to collect a dataset of sufficient quality and quantity. For that purpose, we created a Telegram bot through which volunteer users, mostly students of ADA University, could capture pictures (for static letters) and videos (for dynamic letters) according to the samples provided and upload them to our servers. In total, approximately 14,000 pictures and 3,000 videos were collected. Data for words, which are all dynamic, is still being collected for further research and applications. In this paper, the scope is limited to developing a recognition system for static letters only.
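The paper does not describe the bot's implementation; the sketch below only illustrates how such a collection bot could look, assuming the python-telegram-bot library (v13-style synchronous API). The token placeholder and storage paths are hypothetical.

```python
# Minimal sketch of a data-collection bot, assuming python-telegram-bot v13.
# "BOT_TOKEN" and the directory names are hypothetical placeholders.
from telegram.ext import Updater, MessageHandler, Filters

def save_photo(update, context):
    # Highest-resolution version of an uploaded picture (static letters).
    tg_file = update.message.photo[-1].get_file()
    tg_file.download(f"static_letters/{tg_file.file_id}.jpg")

def save_video(update, context):
    # Uploaded video clips correspond to dynamic letters.
    tg_file = update.message.video.get_file()
    tg_file.download(f"dynamic_letters/{tg_file.file_id}.mp4")

updater = Updater("BOT_TOKEN")
updater.dispatcher.add_handler(MessageHandler(Filters.photo, save_photo))
updater.dispatcher.add_handler(MessageHandler(Filters.video, save_video))
updater.start_polling()
updater.idle()
```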
After reviewing a sufficient number of related papers, we found that using MediaPipe for feature extraction is the best option for our dataset. MediaPipe is an open-source framework that extracts important landmarks from human body parts. In our project we use only hand landmarks, since body pose and facial expression have no effect in AzSL. MediaPipe extracts 21 hand joints for a single hand, each with 3 coordinates, so the input size is 63 (21×3) for one hand, or 126 (2×21×3) when both hands are present.
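As an illustration, this feature extraction step could look like the following minimal sketch using MediaPipe's Python Hands solution; the file name is hypothetical, and this is not the paper's exact code.

```python
import cv2
import mediapipe as mp
import numpy as np

# Static-image mode, up to two hands, matching the 63/126 feature sizes above.
hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2)

def landmark_features(image_path):
    """Flatten the (x, y, z) coordinates of the 21 joints of each detected hand."""
    bgr = cv2.imread(image_path)
    result = hands.process(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
    if not result.multi_hand_landmarks:
        return None                              # no hand detected in this frame
    features = []
    for hand in result.multi_hand_landmarks:     # one entry per detected hand
        for lm in hand.landmark:                 # 21 joints per hand
            features.extend([lm.x, lm.y, lm.z])
    return np.asarray(features)                  # length 63 (one hand) or 126 (both)

vec = landmark_features("sample_static_letter.jpg")  # hypothetical file name
```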
Another approach was to train a Convolutional Neural Network (CNN) on the raw images with various parameter settings. However, because of the small number of samples and limited computational power, none of the CNN experiments reached the desired level of performance. Returning to the MediaPipe features, we trained them with different classifiers, including Logistic Regression, Multilayer Perceptrons, Deep Neural Networks (DNNs), and others. Some letters are similar enough that a single model cannot generalize over them well. For this reason, a two-level DNN architecture was designed in which similar letters are grouped and trained separately; this can also be viewed as a form of clustering. This architecture gave the best result, with 94% test accuracy.
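The paper's exact architecture is not reproduced here; the sketch below, assuming Keras, only illustrates the two-level idea: a first-level DNN assigns the landmark vector to a group of similar letters, and a dedicated second-level DNN then picks the letter within that group. Layer sizes, the number of groups, and the letters per group are assumptions.

```python
import numpy as np
from tensorflow import keras

def make_dnn(num_classes, input_dim=63):
    # Small fully connected classifier over the landmark feature vector;
    # layer widths are illustrative, not the paper's values.
    return keras.Sequential([
        keras.layers.Input(shape=(input_dim,)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])

# Level 1: map the landmark vector to one of N groups of similar letters.
group_model = make_dnn(num_classes=5)                  # 5 groups is an assumption
# Level 2: one small DNN per group, covering only that group's letters.
letter_models = {g: make_dnn(num_classes=6) for g in range(5)}

def predict_letter(features):
    x = features.reshape(1, -1)
    group = int(np.argmax(group_model.predict(x, verbose=0)))
    letter = int(np.argmax(letter_models[group].predict(x, verbose=0)))
    return group, letter
```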