Abstract:
Interpretability of deep learning models has emerged as a major and growing
concern, especially in high-stakes settings such as medical diagnostics. This thesis addresses
the problem of designing post-hoc modifiable Deep Neural Networks (DNNs)
that match or exceed state-of-the-art performance while also providing greater
transparency into how their predictions are reached.
Existing interpretability techniques concentrate mostly on inspecting neuron
activations as they are. Here, we study controlled adjustments of neuron activations during inference
and examine whether these adjustments can help improve the explainability and
generalization of Fully Connected Neural Networks (FCNNs) without retraining.
The dataset used in our research is a publicly available benchmark brain tumor
classification dataset comprising four classes: glioma, meningioma,
pituitary tumor, and no tumor. A baseline FCNN was constructed and assessed, and an
interpretability framework was created in which activation patterns across the layers were
visualized. We then studied activation dynamics further through experiments with partial
network connectivity, underfitting, and overfitting, in order to investigate the relationship
among sparsity, generalization, and interpretability of the network.
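As an illustration of the kind of layer-wise activation visualization described above, the following minimal sketch captures per-layer activations of a small fully connected network with PyTorch forward hooks and renders them as heatmaps. The architecture, layer names, input size, and plotting choices are illustrative assumptions, not the thesis's actual setup.

```python
# Minimal sketch: capture per-layer activations of a small FCNN with forward
# hooks and visualize them as heatmaps. Architecture, input size, and class
# count are illustrative assumptions, not the thesis's exact model.
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(64 * 64, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 4),  # 4 classes: glioma, meningioma, pituitary tumor, no tumor
)

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Register hooks on the ReLU layers so we record post-activation values.
for idx, layer in enumerate(model):
    if isinstance(layer, nn.ReLU):
        layer.register_forward_hook(make_hook(f"relu_{idx}"))

x = torch.rand(1, 1, 64, 64)  # placeholder grayscale image-sized input
with torch.no_grad():
    _ = model(x)

# Plot each layer's activation vector as one heatmap row per hidden layer.
fig, axes = plt.subplots(len(activations), 1, figsize=(8, 4))
for ax, (name, act) in zip(axes, activations.items()):
    ax.imshow(act.cpu().numpy(), aspect="auto", cmap="viridis")
    ax.set_ylabel(name)
    ax.set_yticks([])
plt.xlabel("neuron index")
plt.tight_layout()
plt.show()
```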
Based on these results, the study introduces three activation adaptation strategies: a
thresholding method based on qualitative analysis of activation heatmaps, a robust method based on linear moments (L-moments), and a probabilistic threshold-determination method
based on Gaussian Mixture Models (GMMs). All three systematically adjust neuron activations according to
their individual magnitudes, which tends to make the latent feature representation more
salient during inference. Experimental results show significant gains in classification
accuracy, both on previously misclassified samples and in overall model performance, with
improvements of up to 14% without retraining. The proposed approaches provide realistic post-deployment methodologies for enhancing model
performance without significant computational overhead or regulatory liability. Furthermore,
the improvements in interpretability achieved by activation visualization and correction
provide useful information regarding how deep neural networks arrive at their decisions,
building trust with users. The thesis presents this “post-hoc” activation manipulation as a
promising, scalable path to improving the interpretability and usability of Deep Learning
(DL) models, especially in sensitive domains where not only performance but also
transparent decision-making is paramount.
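To make the third strategy concrete, the sketch below fits a two-component Gaussian Mixture Model to the activation magnitudes of a layer, derives a threshold from the point where the high-mean component becomes more probable than the low-mean one, and suppresses sub-threshold activations at inference time. The two-component assumption, the grid-based threshold search, and the zeroing of weak activations are illustrative choices, not necessarily the thesis's exact implementation.

```python
# Minimal sketch of GMM-based threshold determination for neuron activations.
# Assumptions (not from the thesis): two mixture components, a grid search for
# the responsibility crossing, and zeroing of sub-threshold activations.
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_threshold(activations: np.ndarray, n_components: int = 2) -> float:
    """Fit a GMM to activation magnitudes and return the point where the
    high-mean component becomes more probable than the low-mean one."""
    values = np.abs(activations).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(values)
    means = gmm.means_.ravel()
    low, high = np.argsort(means)[[0, -1]]

    # Scan a fine grid between the two component means and take the first
    # point where the high-mean component's responsibility dominates.
    grid = np.linspace(means[low], means[high], 1000)
    resp = gmm.predict_proba(grid.reshape(-1, 1))
    crossing = np.argmax(resp[:, high] >= resp[:, low])
    return float(grid[crossing])

def apply_threshold(activations: np.ndarray, threshold: float) -> np.ndarray:
    """Suppress weak activations; keep those at or above the threshold."""
    return np.where(np.abs(activations) >= threshold, activations, 0.0)

# Usage with synthetic activations standing in for one hidden layer's output.
rng = np.random.default_rng(0)
layer_acts = np.concatenate([rng.normal(0.05, 0.02, 800),   # weak "noise" neurons
                             rng.normal(0.90, 0.15, 200)])  # strongly active neurons
t = gmm_threshold(layer_acts)
cleaned = apply_threshold(layer_acts, t)
print(f"threshold={t:.3f}, kept {np.count_nonzero(cleaned)} of {layer_acts.size} activations")
```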