ADA Library Digital Repository

Capsule Networks: Architecture, Visual Recognition and NLP Integration


dc.contributor.author Hasanov, Javid
dc.date.accessioned 2025-08-06T12:20:37Z
dc.date.available 2025-08-06T12:20:37Z
dc.date.issued 2025
dc.identifier.uri http://hdl.handle.net/20.500.12181/1433
dc.description.abstract Capsule Networks (CapsNets) are a relatively recent deep learning architecture developed to address specific weaknesses of standard Convolutional Neural Networks (CNNs): their inability to model spatial hierarchies effectively and to account for pose relationships between features. To capture part-whole relationships and encode spatial information, CapsNets replace scalar-output neurons with vector-based capsules and use a dynamic routing-by-agreement mechanism. In this thesis we comprehensively analyse the architectural principles and the range of CapsNet capabilities, starting with their use in image classification tasks. We present a benchmark analysis of CapsNet performance on a series of datasets, such as MNIST and smallNORB, where they demonstrate performance comparable and often superior to traditional CNNs while using significantly fewer parameters, and also show robustness to affine transformations and occlusion of objects. Beyond computer vision, we extend CapsNets to Natural Language Processing (NLP) and explore their suitability for end-to-end text classification tasks. Specifically, we implement a CapsNet-based text classifier for sentiment analysis in Azerbaijani, a morphologically rich and under-resourced language. Using a dataset of around 160,000 user reviews (Hajili's Azerbaijani Review Sentiment Classification dataset), we build a CapsNet-based text classifier and compare it with baseline CNN and LSTM architectures. With the experiments implemented end-to-end using available Python-based deep learning frameworks, we report that the CapsNet model gives unstandardized results that are marginally superior in accuracy and F1 score, as well as an indication that it is also capable of modeling some form of semantic hierarchy in language.
In the literature review section we conduct a historical and contemporary survey of CapsNet research, including advances such as Matrix Capsules with EM Routing and more recent routing algorithms that use attention-based mechanisms. We give our view of the architectural trade-offs, including the computational expense and the unstable convergence of weights during training that hinder large-scale deployment, while noting the desirable aspects of capsule-inspired representations in both visual and language tasks. To summarize, we position CapsNets as a recent conceptual architectural innovation that bridges spatial and sequential data processing. With that in mind, we also propose future research directions, in particular coupling CapsNets with Transformer-based models to create hybrid architectures with the potential for further performance gains on low-resource NLP tasks. Both our empirical results and the architectural concepts we propose should encourage broader adoption of capsule-based models in cross-domain learning. en_US
dc.language.iso en en_US
dc.publisher ADA University en_US
dc.rights Attribution-NonCommercial-NoDerivs 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/us/ *
dc.subject Capsule networks (Computer architecture) -- Applications -- Image classification en_US
dc.subject Sentiment analysis -- Text classification en_US
dc.subject Machine learning algorithms -- Performance evaluation en_US
dc.subject Natural language processing -- Low-resource languages -- Model development en_US
dc.subject Morphologically rich languages -- Natural language processing en_US
dc.title Capsule Networks: Architecture, Visual Recognition and NLP Integration en_US
dc.type Thesis en_US
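The abstract refers to vector-based capsules and dynamic routing-by-agreement without showing the mechanism. The following is a minimal pure-Python sketch of routing-by-agreement in the commonly described form (softmax coupling coefficients, a squash nonlinearity, and agreement measured by dot product). It is an illustration only, not the thesis's implementation: the function names, the toy prediction vectors in `u_hat`, and the choice of three iterations are all assumptions made for this example.

```python
import math

def squash(s):
    """Keep the direction of s but map its length into [0, 1)."""
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0 for _ in s]
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in s]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def length(v):
    """Euclidean length, read here as the capsule's activation."""
    return math.sqrt(sum(x * x for x in v))

def route(u_hat, iterations=3):
    """Routing-by-agreement sketch.

    u_hat[i][j] is the prediction vector that lower capsule i makes
    for higher capsule j. Returns the higher-capsule outputs and the
    coupling coefficients used in the last iteration.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits start at zero
    for _ in range(iterations):
        # Each lower capsule distributes its output across higher capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            v.append(squash(s))
        # Raise the logit where a prediction agrees with the output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c

# Toy input (hypothetical): three lower capsules, two higher capsules, 2-D poses.
# The predictions for higher capsule 0 agree; those for capsule 1 conflict.
u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.9, 0.1], [-1.0, 0.0]],
    [[1.0, 0.05], [0.0, 1.0]],
]
v, c = route(u_hat)
```

In this toy case the higher capsule whose incoming predictions agree ends up with the longer output vector, which is the behaviour the term "routing-by-agreement" describes.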



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States.
