| dc.contributor.author | Naghizade, Elshan | |
| dc.date.accessioned | 2025-11-05T07:38:27Z | |
| dc.date.available | 2025-11-05T07:38:27Z | |
| dc.date.issued | 2024 | |
| dc.identifier.uri | http://hdl.handle.net/20.500.12181/1529 | |
| dc.description.abstract | This master’s thesis explores an efficient autoencoder-based method for extracting position-invariant features from images. The focus is on using Convolutional Autoencoders (CAEs) and Variational Autoencoders (VAEs) to process two datasets: MNIST, which contains images of handwritten digits, and the Bristol-Myers Squibb – Molecular Translation dataset, which contains images of hand-drawn chemical structures. The approach involves training both CAE and VAE models on these datasets to produce latent vectors, compressed representations that summarize key aspects of the images. For MNIST, these vectors are used to train deep neural networks (DNNs) to classify the digits. The results show high accuracy, with CAE-based DNNs achieving 96% and VAE-based DNNs achieving 91% accuracy on the training set. Per-digit performance is further analyzed with bar graphs and confusion matrices, which show that CAEs provide more accurate and consistent classifications. For the chemical structure images in the Bristol-Myers Squibb dataset, the thesis tests three setups for converting images into textual formulas: a combination of EfficientNet, a Vision Transformer (ViT), and a traditional Transformer model, as well as pipelines that pair VAE and CAE features with the ViT and Transformer. The effectiveness of these models is measured with the Levenshtein distance, a metric that quantifies the difference between the predicted text and the actual formula. The CAE-based model outperforms the others, showing that it can translate images into text more accurately. The study demonstrates the value of features tailored to the data being analyzed: CAE and VAE models trained on a specific dataset such as the Bristol-Myers Squibb collection capture essential details better than more general models. The thesis concludes by suggesting directions for future research, such as more advanced data augmentation techniques, transfer learning from specialized domains, alternative neural network architectures, evaluation on additional datasets, and ways to make training and inference more efficient. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | ADA University | en_US |
| dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
| dc.subject | Convolutional autoencoders (Artificial intelligence) | en_US |
| dc.subject | Bristol-Myers Squibb dataset | en_US |
| dc.subject | MNIST dataset | en_US |
| dc.subject | Machine learning | en_US |
| dc.subject | Image recognition (Computer science) | en_US |
| dc.title | Autoencoder-Based Efficient Feature Extraction Method for Position Invariant Image | en_US |
| dc.type | Thesis | en_US |
| dcterms.accessRights | Absolute Embargo (Only Bibliographic Record and Abstract) | |
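
The abstract above describes an autoencoder pipeline in which an encoder compresses each image into a latent vector and a small classifier is trained on those vectors. As a rough illustration only, a minimal sketch of such a convolutional autoencoder and latent-space classifier is given below; the layer sizes, the 64-dimensional latent space, and the use of PyTorch are assumptions made for this example and are not taken from the thesis itself.

```python
# Illustrative sketch of a convolutional autoencoder (CAE) plus a classifier
# trained on its latent vectors. Architecture details are assumptions, not the
# thesis implementation.
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        # Encoder: 1x28x28 MNIST-sized image -> latent vector
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # -> 16x14x14
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # -> 32x7x7
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, latent_dim),
        )
        # Decoder: latent vector -> reconstructed 1x28x28 image
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 7 * 7),
            nn.ReLU(),
            nn.Unflatten(1, (32, 7, 7)),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),  # -> 16x14x14
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),   # -> 1x28x28
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)              # compressed latent representation
        return self.decoder(z), z

class LatentClassifier(nn.Module):
    """Small DNN trained on frozen latent vectors, as in the MNIST experiments."""
    def __init__(self, latent_dim: int = 64, num_classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, z):
        return self.net(z)

if __name__ == "__main__":
    cae, clf = ConvAutoencoder(), LatentClassifier()
    x = torch.randn(8, 1, 28, 28)                 # dummy batch of MNIST-sized images
    recon, z = cae(x)
    logits = clf(z)
    print(recon.shape, z.shape, logits.shape)     # (8,1,28,28) (8,64) (8,10)
```

The Levenshtein distance used to score the image-to-text models is the standard edit distance between the predicted string and the ground-truth formula. A plain dynamic-programming version is sketched below; the example strings are hypothetical and only show how the metric would be applied.

```python
# Standard Levenshtein (edit) distance; not the thesis code.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Example: distance between a predicted formula string and the ground truth.
print(levenshtein("InChI=1S/C2H6O", "InChI=1S/C2H6O2"))  # -> 1
```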