Abstract:
Capsule Networks (CapsNets) are a relatively recent deep learning architecture developed to alleviate distinct disadvantages of standard Convolutional Neural Networks (CNNs): their inability to effectively model spatial hierarchies and to account for pose relationships between features. By capturing part-whole relationships and encoding spatial information, CapsNets replace scalar-output neurons with vector-based capsules and use a dynamic routing-by-agreement mechanism. In this thesis we
comprehensively analyse the architectural principles and capabilities of CapsNets, starting with their use in image classification tasks. We present a benchmark analysis of CapsNet performance on a series of datasets, such as MNIST and smallNORB, where they achieve performance comparable and often superior to traditional CNNs with significantly fewer parameters, whilst also demonstrating robustness to affine transformations and object occlusion.
Beyond computer vision, we also extend CapsNets to Natural Language Processing (NLP) and explore their suitability for end-to-end text classification tasks. Specifically, we implement a CapsNet-based text classifier for a sentiment analysis project in Azerbaijani: a morphologically rich and under-resourced language. Using a dataset of around 160,000 user reviews (Hajili's Azerbaijani Review Sentiment Classification dataset), we build a CapsNet-based text classifier and contrast it with baseline CNN and LSTM architectures. With the experiments implemented end-to-end using available Python-based deep learning frameworks, we report that the CapsNet model yields preliminary results that are marginally superior in accuracy and F1 score, along with indications that it can also model some form of semantic hierarchy in language.
In the literature review section we conduct a historical and contemporary survey of CapsNet research, including advances such as Matrix Capsules with EM Routing and more recent routing algorithms that employ attention-based mechanisms. We offer our perspective on the architectural trade-offs, including the computational expense and training instability that hinder large-scale deployment, whilst noting the desirable properties of capsule-inspired representations in both visual and language tasks.
To summarize, we position CapsNets as a recent conceptual architectural innovation that helps bridge the gap between spatial and sequential data processing. With that in mind, we also propose future research directions, in particular coupling CapsNets with Transformer-based models to create hybrid architectures with the potential for even stronger performance on low-resource NLP tasks. Both our empirical results and the architectural concepts we propose should encourage the broader take-up of capsule-based models in cross-domain learning.