ADA Library Digital Repository

Autoencoder-Based Efficient Feature Extraction Method for Position Invariant Image


dc.contributor.author Naghizade, Elshan
dc.date.accessioned 2025-11-05T07:38:27Z
dc.date.available 2025-11-05T07:38:27Z
dc.date.issued 2024
dc.identifier.uri http://hdl.handle.net/20.500.12181/1529
dc.description.abstract This master’s thesis explores an efficient autoencoder-based method for extracting position-invariant features from images. The focus is on using Convolutional Autoencoders (CAEs) and Variational Autoencoders (VAEs) to process two specific types of data: the MNIST dataset, which contains images of handwritten digits, and the Bristol-Myers Squibb – Molecular Translation dataset, which contains images of hand-drawn chemical structures. The approach involves training both CAE and VAE models on these datasets to produce latent vectors, compressed representations that summarize the key aspects of each image. For the MNIST dataset, these vectors are used to train deep neural networks (DNNs) that classify the digits. The results show high accuracy, with CAE-based DNNs achieving 96% and VAE-based DNNs achieving 91% accuracy on the training set. Performance is further analyzed per digit, using bar graphs and confusion matrices to show that CAEs provide more accurate and consistent classifications. For the chemical-structure images in the Bristol-Myers Squibb dataset, the thesis tests three setups for converting images into textual formulas: a combination of EfficientNet, a Vision Transformer (ViT), and a standard Transformer, as well as pipelines that pair VAE and CAE features with the ViT and Transformer. The effectiveness of these models is measured using the Levenshtein distance, a metric that quantifies the difference between the predicted text and the actual formula. The CAE-based model outperforms the others, showing that it translates images into text more accurately. The study demonstrates the value of features tailored to the data being analyzed: CAE and VAE models trained on specific datasets such as Bristol-Myers Squibb capture essential details better than more general models. The thesis concludes by suggesting directions for future research, including more advanced data augmentation, transfer learning from specialized domains, alternative neural network architectures, evaluation on additional datasets, and more efficient training and inference. (A minimal illustrative sketch of the CAE feature-extraction pipeline follows this record.) en_US
dc.language.iso en en_US
dc.publisher ADA University en_US
dc.rights Attribution-NonCommercial-NoDerivs 3.0 United States
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/us/
dc.subject Convolutional autoencoders (Artificial intelligence) en_US
dc.subject Bristol-Myers Squibb dataset en_US
dc.subject MNIST dataset en_US
dc.subject Machine learning en_US
dc.subject Image recognition (Computer science) en_US
dc.title Autoencoder-Based Efficient Feature Extraction Method for Position Invariant Image en_US
dc.type Thesis en_US
dcterms.accessRights Absolute Embargo: Only Bibliographic Record and Abstract
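
Illustrative sketch of the pipeline described in the abstract. This code is not taken from the thesis; the PyTorch framework, the 32-dimensional latent width, the layer sizes, and the training settings are all assumptions made for illustration. It shows the two stages the abstract describes for MNIST: training a convolutional autoencoder to reconstruct the digit images, then freezing its encoder and training a small dense classifier on the resulting latent vectors.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=32):  # latent width is an assumption
        super().__init__()
        # Encoder: 28x28x1 -> 14x14x16 -> 7x7x32 -> latent vector
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, latent_dim),
        )
        # Decoder mirrors the encoder back to a 28x28 reconstruction.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 7 * 7), nn.ReLU(),
            nn.Unflatten(1, (32, 7, 7)),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

train_data = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
loader = DataLoader(train_data, batch_size=128, shuffle=True)

# Stage 1: train the autoencoder to reconstruct the input images.
cae = ConvAutoencoder().to(device)
opt = torch.optim.Adam(cae.parameters(), lr=1e-3)
for epoch in range(5):  # epoch count is an assumption
    for x, _ in loader:
        x = x.to(device)
        recon, _ = cae(x)
        loss = nn.functional.mse_loss(recon, x)
        opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: freeze the encoder and train a DNN classifier on the latent vectors.
classifier = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
clf_opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
for epoch in range(5):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            z = cae.encoder(x)  # compressed representation of the digit
        logits = classifier(z)
        loss = nn.functional.cross_entropy(logits, y)
        clf_opt.zero_grad(); loss.backward(); clf_opt.step()

The same encoder-as-feature-extractor idea underlies the chemical-structure experiments described in the abstract, where the CAE or VAE features feed a ViT/Transformer image-to-text pipeline instead of a digit classifier.
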


Files in this item


There are no files associated with this item.


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States.
