Abstract:
Interpretability of deep learning models has emerged as a major and growing
concern, especially in high-stakes settings such as medical diagnostics. This thesis addresses
the problem of designing post-hoc modifiable Deep Neural Networks (DNNs)
that match or exceed state-of-the-art performance while also providing greater
transparency into how their predictions are reached.
Existing interpretability techniques concentrate mostly on inspecting neuron
activations as they are. Here, we study controlled adjustments of neuron activations during inference
and examine whether these adjustments can help improve the explainability and
generalization of Fully Connected Neural Networks (FCNNs) without retraining.
The dataset used in our research is a publicly available benchmark brain tumor
classification dataset comprising four classes: glioma, meningioma,
pituitary tumor, and no tumor. A baseline FCNN was constructed and assessed, and an
interpretability framework was created in which activation patterns across the layers were
visualized. We then studied activation dynamics further through experiments with partial
network connectivity, underfitting, and overfitting, in order to investigate the relationship
among sparsity, generalization, and interpretability of the network.
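As an illustration of the kind of layer-wise activation visualization described above, the following minimal sketch captures per-layer activations of a small fully connected network with PyTorch forward hooks and renders them as heatmaps. The architecture, layer names, input size, and plotting choices are illustrative assumptions, not the thesis's actual setup.

```python
# Minimal sketch: capture per-layer activations of a small FCNN with forward
# hooks and visualize them as heatmaps. Architecture, input size, and class
# count are illustrative assumptions, not the thesis's exact model.
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 64, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 4),  # 4 classes: glioma, meningioma, pituitary tumor, no tumor
)

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Register hooks on the ReLU layers so we record post-activation values.
for idx, layer in enumerate(model):
    if isinstance(layer, nn.ReLU):
        layer.register_forward_hook(make_hook(f"relu_{idx}"))

x = torch.rand(1, 1, 64, 64)  # placeholder grayscale image-sized input
with torch.no_grad():
    _ = model(x)

# Plot each layer's activation vector as one heatmap row per hidden layer.
fig, axes = plt.subplots(len(activations), 1, figsize=(8, 4))
for ax, (name, act) in zip(axes, activations.items()):
    ax.imshow(act.cpu().numpy(), aspect="auto", cmap="viridis")
    ax.set_ylabel(name)
    ax.set_yticks([])
plt.xlabel("neuron index")
plt.tight_layout()
plt.show()
```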
Based on these results, the study introduces three activation adaptation strategies: a
thresholding method based on qualitative analysis of activation heatmaps, a robust method based on linear moments (L-moments), and a probabilistic threshold-determination method
based on Gaussian Mixture Models (GMMs). All three systematically adjust neuron activations according to
their individual magnitudes, which tends to make the latent feature representation more
salient during inference. Experimental results show significant gains in classification
accuracy, both on previously misclassified samples and in overall model performance, with
improvements of up to 14% without retraining. The proposed approaches provide realistic post-deployment methodologies for enhancing model
performance without significant computational overhead or regulatory liability. Furthermore,
the improvements in interpretability achieved by activation visualization and correction
provide useful information regarding how deep neural networks arrive at their decisions,
building trust with users. The thesis presents this “post-hoc” activation manipulation as a
promising, scalable path to improving the interpretability and usability of Deep Learning
(DL) models, especially in sensitive domains where not only performance but also
transparent decision-making is paramount.
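To make the third strategy concrete, the sketch below fits a two-component Gaussian Mixture Model to the activation magnitudes of a layer, derives a threshold from the point where the high-mean component becomes more probable than the low-mean one, and suppresses sub-threshold activations at inference time. The two-component assumption, the grid-based threshold search, and the zeroing of weak activations are illustrative choices, not necessarily the thesis's exact implementation.

```python
# Minimal sketch of GMM-based threshold determination for neuron activations.
# Assumptions (not from the thesis): two mixture components, a grid search for
# the responsibility crossing, and zeroing of sub-threshold activations.
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_threshold(activations: np.ndarray, n_components: int = 2) -> float:
    """Fit a GMM to activation magnitudes and return the point where the
    high-mean component becomes more probable than the low-mean one."""
    values = np.abs(activations).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(values)
    means = gmm.means_.ravel()
    low, high = np.argsort(means)[[0, -1]]

    # Scan a fine grid between the two component means and take the first
    # point where the high-mean component's responsibility dominates.
    grid = np.linspace(means[low], means[high], 1000)
    resp = gmm.predict_proba(grid.reshape(-1, 1))
    crossing = np.argmax(resp[:, high] >= resp[:, low])
    return float(grid[crossing])

def apply_threshold(activations: np.ndarray, threshold: float) -> np.ndarray:
    """Suppress weak activations; keep those at or above the threshold."""
    return np.where(np.abs(activations) >= threshold, activations, 0.0)

# Usage with synthetic activations standing in for one hidden layer's output.
rng = np.random.default_rng(0)
layer_acts = np.concatenate([rng.normal(0.05, 0.02, 800),   # weak "noise" neurons
                             rng.normal(0.90, 0.15, 200)])  # strongly active neurons
t = gmm_threshold(layer_acts)
cleaned = apply_threshold(layer_acts, t)
print(f"threshold={t:.3f}, kept {np.count_nonzero(cleaned)} of {layer_acts.size} activations")
```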