ADA Library Digital Repository

Refining Neural Network Interpretability through Activation Modification Techniques: An Exploration of Threshold-Based Approaches

Show simple item record

dc.contributor.author Mammadova, Nigar
dc.date.accessioned 2025-08-26T08:01:20Z
dc.date.available 2025-08-26T08:01:20Z
dc.date.issued 2025-04
dc.identifier.uri http://hdl.handle.net/20.500.12181/1437
dc.description.abstract Interpretability in deep learning models has recently emerged as a major and growing concern, especially in high-stakes settings such as medical diagnostics. This thesis addresses the problem of designing genuinely post-hoc modifiable Deep Neural Networks (DNNs) that can match or exceed state-of-the-art performance while providing the increased transparency needed to understand how their predictions are reached. Existing interpretability techniques mostly concentrate on inspecting neuron activations as they are. Here, we study controlled adjustments of neuron activations during inference and examine whether these adjustments can improve the explainability and generalization of Fully Connected Neural Networks (FCNNs) without retraining. Our research uses a publicly available benchmark brain tumor classification dataset divided into four classes: glioma, meningioma, pituitary tumor, and no tumor. A baseline FCNN was constructed and assessed, and an interpretable framework was created in which activation patterns along the layers were visualized. We then further studied the activation dynamics through experiments with partial network connections, underfitting, and overfitting, in order to investigate the relationship among sparsity, generalization, and interpretability of the network. Based on these results, the study introduces three activation adaptation strategies: a thresholding method based on qualitative analysis of heatmaps, a robust Linear moments (L-moments) method, and a probabilistic Gaussian Mixture Model (GMM) threshold determination method. All of them systematically adjust neuron activations according to individual activation magnitude, which tends to make the latent feature representation more significant in the inference phase.
Experimental results show that classification accuracy can improve significantly, both on misclassified samples and in overall model performance, with gains of up to 14% achieved without retraining. The proposed approaches provide realistic post-deployment methodologies for enhancing model performance without significant computational overhead or regulatory liability. Furthermore, the improvements in interpretability achieved through activation visualization and correction provide useful information about how deep neural networks arrive at their decisions, building trust with users. The thesis presents this “post-hoc” activation manipulation as a promising, scalable path toward improving the interpretability and usability of Deep Learning (DL) models, especially in sensitive domains where not only performance but also transparent decision-making is paramount. en_US
dc.language.iso az en_US
dc.publisher ADA University en_US
dc.rights Attribution-NonCommercial-NoDerivs 3.0 United States *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/us/ *
dc.subject Deep learning (Machine learning) -- Interpretability en_US
dc.subject Neural networks (Computer science) -- Transparency en_US
dc.subject Artificial intelligence -- Medical applications en_US
dc.subject Medical imaging -- Data processing en_US
dc.subject Brain -- Tumors -- Diagnosis -- Data processing en_US
dc.title Refining Neural Network Interpretability through Activation Modification Techniques: An Exploration of Threshold-Based Approaches en_US
dc.type Thesis en_US
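The GMM-based threshold determination mentioned in the abstract can be sketched roughly as follows. This is a hypothetical illustration of the general idea (fit a mixture to a layer's activation magnitudes, derive a threshold separating weak from strong units, and suppress sub-threshold activations at inference), not the thesis's actual implementation; the function names, the midpoint thresholding rule, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_threshold(activations, seed=0):
    """Fit a 2-component GMM to activation magnitudes and return the
    midpoint between the two component means as a threshold separating
    'weak' from 'strong' activations. (Midpoint rule is an assumption.)"""
    a = np.abs(np.asarray(activations)).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=seed).fit(a)
    means = np.sort(gmm.means_.ravel())
    return float(means.mean())

def adjust_activations(activations, threshold):
    """Zero out activations below the threshold -- a post-hoc
    adjustment applied at inference, with no retraining."""
    out = np.asarray(activations).copy()
    out[np.abs(out) < threshold] = 0.0
    return out

# Synthetic layer activations: mostly weak noise plus a strong cluster.
rng = np.random.default_rng(0)
acts = np.concatenate([rng.normal(0.05, 0.02, 900),
                       rng.normal(0.80, 0.10, 100)])
t = gmm_threshold(acts)
sparse = adjust_activations(acts, t)
```

In this toy setting the GMM separates the two activation regimes, so the adjusted layer keeps only the roughly 100 strongly activated units, yielding the sparser, more interpretable latent representation the abstract refers to.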




