dc.contributor.author | Dadash-zada, Samir | |
dc.date.accessioned | 2023-10-16T09:51:39Z | |
dc.date.available | 2023-10-16T09:51:39Z | |
dc.date.issued | 2022-04 | |
dc.identifier.uri | http://hdl.handle.net/20.500.12181/721 | |
dc.description.abstract | This thesis discusses and presents a range of solutions for the data collection, augmentation, and generation stages of an automatic sign-language-to-text translation system. Although these stages were implemented for an Azerbaijani Sign Language (AzSL) dataset, all of them can be applied to a different set of sign language images and videos with minimal adjustment to the parameters of the corresponding models. To reach as many subscribers as possible and collect an original set of images and videos from the public, a dedicated Telegram bot was developed and run over the course of seven months. It was shared on various social networks to maximize the variance of features for each individual sign. Multiple fixes and updates were applied to the bot to guide subscribers toward uploading images and videos of under-represented signs. The original dataset was validated by running several clustering algorithms. The clustering results revealed many similar signs, some anomalies, and feature patterns in the dataset, and led to several important takeaways. Because the dataset is intended mainly as input for machine learning models, the number of original images and videos was insufficient, especially after invalid media files were discarded. This was the main motivation for the data augmentation and generation stages. The image and video augmentation techniques used in the thesis were selected primarily for their feature preservation and speed. In addition to these augmentation techniques, a state-of-the-art approach to generating synthetic images, the Generative Adversarial Network (GAN), is also used. StyleGAN, a GAN variant designed for generating photo-realistic, high-quality images, is slightly customized to the dataset. | en_US |
dc.language.iso | en | en_US |
dc.publisher | ADA University | en_US |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
dc.subject | Data collection | en_US |
dc.subject | Azerbaijani Sign Language | en_US |
dc.subject | Generative Adversarial Network | en_US |
dc.title | Data collection, augmentation, classification, and generation for a sign language | en_US |
dc.type | Thesis | en_US |