| dc.contributor.author | Naghizade, Elshan | |
| dc.date.accessioned | 2025-11-05T07:38:27Z | |
| dc.date.available | 2025-11-05T07:38:27Z | |
| dc.date.issued | 2024 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.12181/1529 | |
| dc.description.abstract | This master’s thesis explores an efficient autoencoder-based method for extracting position-invariant features from images. The focus is on using Convolutional Autoencoders (CAEs) and Variational Autoencoders (VAEs) to process two datasets: MNIST, which contains images of handwritten digits, and the Bristol-Myers Squibb – Molecular Translation dataset, which contains images of hand-drawn chemical structures. The approach involves training both CAE and VAE models on these datasets to produce latent vectors, compressed representations that summarize key aspects of the images. For MNIST, these vectors are used to train deep neural networks (DNNs) to classify the digits. The results show high accuracy, with CAE-based DNNs achieving 96% and VAE-based DNNs achieving 91% accuracy on the training set. Per-digit performance is further analyzed with bar graphs and confusion matrices, which show that CAEs provide more accurate and consistent classifications. For the chemical structure images in the Bristol-Myers Squibb dataset, the thesis tests three setups for converting images into textual formulas: a combination of EfficientNet, a Vision Transformer (ViT), and a traditional Transformer model, as well as pipelines that pair VAE and CAE features with the ViT and Transformer. The effectiveness of these models is measured with the Levenshtein distance, a metric that quantifies the difference between the predicted text and the actual formula. The CAE-based model outperforms the others, showing that it can translate images into text more accurately. The study demonstrates the value of features tailored to the data being analyzed: CAE and VAE models trained on a specific dataset such as the Bristol-Myers Squibb collection capture essential details better than more general models. The thesis concludes by suggesting directions for future research, such as more advanced data augmentation techniques, transfer learning from specialized domains, alternative neural network architectures, evaluation on additional datasets, and ways to make training and inference more efficient. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | ADA University | en_US |
| dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
| dc.subject | Convolutional autoencoders (Artificial intelligence) | en_US |
| dc.subject | Bristol-Myers Squibb dataset | en_US |
| dc.subject | MNIST dataset | en_US |
| dc.subject | Machine learning | en_US |
| dc.subject | Image recognition (Computer science) | en_US |
| dc.title | Autoencoder-Based Efficient Feature Extraction Method for Position Invariant Image | en_US |
| dc.type | Thesis | en_US |
| dcterms.accessRights | Absolute Embargo (Only Bibliographic Record and Abstract) | |
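
The abstract above describes an autoencoder pipeline in which an encoder compresses each image into a latent vector and a small classifier is trained on those vectors. As a rough illustration only, a minimal sketch of such a convolutional autoencoder and latent-space classifier is given below; the layer sizes, the 64-dimensional latent space, and the use of PyTorch are assumptions made for this example and are not taken from the thesis itself.

```python
# Illustrative sketch of a convolutional autoencoder (CAE) plus a classifier
# trained on its latent vectors. Architecture details are assumptions, not the
# thesis implementation.
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        # Encoder: 1x28x28 MNIST-sized image -> latent vector
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # -> 16x14x14
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # -> 32x7x7
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, latent_dim),
        )
        # Decoder: latent vector -> reconstructed 1x28x28 image
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 7 * 7),
            nn.ReLU(),
            nn.Unflatten(1, (32, 7, 7)),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),  # -> 16x14x14
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),   # -> 1x28x28
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)              # compressed latent representation
        return self.decoder(z), z

class LatentClassifier(nn.Module):
    """Small DNN trained on frozen latent vectors, as in the MNIST experiments."""
    def __init__(self, latent_dim: int = 64, num_classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, z):
        return self.net(z)

if __name__ == "__main__":
    cae, clf = ConvAutoencoder(), LatentClassifier()
    x = torch.randn(8, 1, 28, 28)                 # dummy batch of MNIST-sized images
    recon, z = cae(x)
    logits = clf(z)
    print(recon.shape, z.shape, logits.shape)     # (8,1,28,28) (8,64) (8,10)
```

The Levenshtein distance used to score the image-to-text models is the standard edit distance between the predicted string and the ground-truth formula. A plain dynamic-programming version is sketched below; the example strings are hypothetical and only show how the metric would be applied.

```python
# Standard Levenshtein (edit) distance; not the thesis code.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Example: distance between a predicted formula string and the ground truth.
print(levenshtein("InChI=1S/C2H6O", "InChI=1S/C2H6O2"))  # -> 1
```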