Applying Machine Learning to High-Dimensional Proteomics Datasets for Biomarker Discovery in Neurodegenerative Disorders

Abstract

Identifying biomarkers for Alzheimer’s disease (AD), a neurodegenerative disorder characterized by progressive cognitive decline, is crucial for early diagnosis and treatment. This thesis explores proteomic abundances along the AD continuum using lumbar and ventricular cerebrospinal fluid (CSF) samples from patients with idiopathic normal pressure hydrocephalus (iNPH) to identify potential new biomarkers. Our study emphasizes the necessity of treating lumbar and ventricular CSF samples as separate datasets because of their distinct proteomic profiles. Challenges such as high-dimensional data with missing values, small sample sizes and class imbalances were addressed through imputation, oversampling and k-fold cross-validation. We discuss the presence and consequences of batch effects, a remnant of the tandem mass tag mass spectrometry technique. A comparative analysis based on staging with existing biomarkers highlights the uniqueness of the dataset provided by Sahlgrenska University Hospital. Using machine learning and feature selection techniques, we propose eight protein and nine peptide biomarkers for distinguishing iNPH patients along the pathological AD spectrum. One of these biomarkers shows relevance in both lumbar and ventricular CSF. Despite the study’s limited cohort size, our findings contribute insights into the proteomic analysis of neurodegenerative disorders.

Keywords

Alzheimer’s disease, neurodegenerative disorder, proteomics, mass spectrometry, high-dimensional data, biomarkers, machine learning, feature selection, staging
