An analysis of data leakage and generalizability in MRI based classification of Parkinson’s Disease using Explainable 2D Convolutional Neural Networks

Publication Type : Journal Article

Source : Digital Signal Processing (2024): 104407

Url : https://www.sciencedirect.com/science/article/abs/pii/S1051200424000320

Campus : Coimbatore

School : School of Artificial Intelligence

Year : 2024

Abstract : Background and Objective: Parkinson's Disease (PD) is a progressive neurological disorder caused by the death of dopamine-producing neurons. Neuroimaging techniques such as Magnetic Resonance Imaging (MRI) allow visualization of the structural changes that PD causes in the brain. Advances in computer vision have opened a new area of research that applies deep learning (DL) tools such as Convolutional Neural Networks (CNNs) to detect PD from MRI. Despite promising results, the clinical integration of DL models is held back by questions of bias, generalizability and explainability.

Methods: In the present work, bias propagation is identified through an analysis of data leakage and generalizability in CNN models driven by T1-weighted MRI data. To this end, 12 diverse pre-trained CNN models were trained on T1-weighted MRI from the PPMI dataset. The top 3 of these models were then tested on three different datasets under three simulated data-splitting cases: subject-wise split, slice-wise split and longitudinal split. A Grad-CAM-based visualization was implemented to explain the output of the CNN trained without data leakage and to identify regions of importance (ROIs) in the brain.

Results: The data leakage simulation revealed that slice-level and longitudinal data leakage can inflate the accuracy score on hold-out test sets by over 67% and 30%, respectively. Testing the generalizability of the CNN models on external patient cohorts captured the implicit bias due to data leakage and enabled the selection of the most robust CNN architecture. The VGG19 model performed consistently when tested both within the PPMI dataset and on the external datasets. The explainable artificial intelligence analysis showed that the identified ROIs were consistent with the expected disease progression, validating the proposed method.
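
The three splitting cases differ only in which unit is kept together when the data are divided into training and test sets. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline: the toy dataset, the record layout (subject_id, visit_id, slice_index) and the helper names make_slices, grouped_split and leakage_report are all hypothetical. Grouping by slice lets slices of the same scan land on both sides (slice-level leakage); grouping by scan keeps each scan intact but lets other visits of the same subject cross over (one plausible way to simulate longitudinal leakage); grouping by subject keeps every slice from every visit of a patient in a single set.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical record layout: (subject_id, visit_id, slice_index)
Record = Tuple[str, str, int]


def make_slices(n_subjects: int, visits: int, slices_per_scan: int) -> List[Record]:
    """Toy dataset: each subject has several visits, each visit is a stack of 2D slices."""
    return [
        (f"sub{s:02d}", f"v{v}", k)
        for s in range(n_subjects)
        for v in range(visits)
        for k in range(slices_per_scan)
    ]


def grouped_split(
    data: List[Record],
    key: Callable[[Record], Hashable],
    test_frac: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Record], List[Record]]:
    """Send whole groups (as defined by `key`) to either the train or the test set."""
    rng = random.Random(seed)
    groups = sorted({key(r) for r in data})  # sort first so the shuffle is reproducible
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_frac))
    test_groups = set(groups[:n_test])
    train = [r for r in data if key(r) not in test_groups]
    test = [r for r in data if key(r) in test_groups]
    return train, test


def leakage_report(train: List[Record], test: List[Record]) -> Dict[str, int]:
    """Count subjects and scans (subject, visit) that appear in both sets."""
    def subjects(rs): return {r[0] for r in rs}
    def scans(rs): return {(r[0], r[1]) for r in rs}
    return {
        "shared_subjects": len(subjects(train) & subjects(test)),
        "shared_scans": len(scans(train) & scans(test)),
    }


# Three choices of the unit that is kept together during the split
SPLITS: Dict[str, Callable[[Record], Hashable]] = {
    "slice-wise": lambda r: r,               # every slice is its own group (leaky)
    "longitudinal": lambda r: (r[0], r[1]),  # every scan is a group; visits of one subject can cross (leaky)
    "subject-wise": lambda r: r[0],          # every subject is a group (no leakage)
}

if __name__ == "__main__":
    data = make_slices(n_subjects=20, visits=3, slices_per_scan=10)  # 600 slices
    for name, key in SPLITS.items():
        train, test = grouped_split(data, key, test_frac=0.2, seed=42)
        print(f"{name:>13}: test={len(test)} {leakage_report(train, test)}")
```

All three splits hold out the same number of slices, so the test sets look equally sized; only the subject-wise split guarantees that no patient contributes to both sets. In practice, scikit-learn's GroupShuffleSplit or GroupKFold provide the same subject-level grouping when the subject identifier is passed as `groups`.
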
Conclusions: The study presents possible avenues of bias propagation in MRI-driven classification with CNN models by simulating data leakage and by testing the generalizability of the models. It highlights the need for generalizability and the importance of testing on heterogeneous populations, both to ensure the robustness of the developed models and to catch data-handling oversights and the associated bias. The results suggest that the pre-trained VGG19 model can be used to build a generalizable and explainable model to aid in the detection of PD from T1-weighted MRI.
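
For reference, the explainability step relies on Grad-CAM, which is usually computed with the standard formulation below. The abstract does not state which convolutional layer or implementation was used, so the symbols are generic rather than taken from the study. For a target class c with pre-softmax score y^c and the k-th feature map A^k of a chosen convolutional layer (Z spatial positions indexed by i, j), each feature map is weighted by its spatially averaged gradient, and the weighted sum is passed through a ReLU so that only regions increasing the class score remain:

```latex
\alpha_k^{c} = \frac{1}{Z}\sum_{i}\sum_{j} \frac{\partial y^{c}}{\partial A^{k}_{ij}},
\qquad
L^{c}_{\text{Grad-CAM}} = \operatorname{ReLU}\!\left(\sum_{k} \alpha_k^{c} A^{k}\right)
```

The resulting map has the spatial resolution of the chosen layer and is typically upsampled to the size of the input slice before being overlaid on the MRI.
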

Cite this Research Publication : Veetil, Iswarya Kannoth, Divi Eswar Chowdary, Paleti Nikhil Chowdary, V. Sowmya, and E. A. Gopalakrishnan. "An analysis of data leakage and generalizability in MRI based classification of Parkinson's Disease using Explainable 2D Convolutional Neural Networks." Digital Signal Processing (2024): 104407
