Determining linguistic predictor for the classification of subjective cognitive impairment and mild cognitive impairment using machine learning
Introduction
Mild Cognitive Impairment (MCI) is a neurological condition characterized by cognitive decline greater than expected for an individual's age and education level. Subjective Cognitive Impairment (SCI) is a self-reported decline in cognitive abilities that is not clinically identified as MCI. Individuals with MCI remain functional in their daily activities (Petersen et al., 1999) and show different deterioration rates depending on the evaluation methods employed. More than 50% of these individuals will develop Alzheimer’s Disease (AD) within the following five years; however, several will remain stable and never develop AD (Gauthier et al., 2006; Petersen et al., 1999; Petersen et al., 2017). Although there is no cure for AD, the early identification of individuals with MCI can enable treatments that delay the progression of the condition (Zucchella et al., 2018). It is therefore of paramount importance to develop reliable, objective diagnostic methods for cognitive impairment that can be conducted at primary care centers and memory clinics to determine whether an individual should seek further professional advice.

Methodology
Ninety individuals participated in the study: 23 SCI patients, 31 MCI patients, and 36 healthy controls (HC). All participants were between 50 and 79 years old; had Swedish as their first and only language before the age of 5; had a similar length of education; had no history of stroke or brain tumor; and had recent neuropsychological test results available for assessment. Connected speech data were elicited with the Cookie Theft picture description task (Goodglass & Kaplan, 1983), a standardized test employed in language therapy and evaluation sessions. Participants were recorded, and the recordings were manually transcribed into text.
The study refined the transcriptions of the recordings, defined several linguistic features, and employed two annotation tools (Sparv and Parsey Universal) and two statistical measures (accuracy and area under the receiver operating characteristic curve, ROC AUC) to select the best feature set for each classification task. As a by-product, an open-source Swedish text annotation tool was deployed to benefit the linguistic research community. A novel feature engineering approach called SVC-Randomized Recursive Feature Elimination (SVC-RRFE) was introduced to select the best features using a Support Vector Machine, binary search, and group k-fold cross-validation. Finally, the 160, 150 and 98 selected features were applied and evaluated in feed-forward neural networks using group 10-fold cross-validation.

Results
Using feed-forward neural networks (NN) with group 10-fold cross-validation, we reached 76% mean accuracy, 73% mean ROC AUC, and a mean Matthews correlation coefficient of 0.47 for MCI detection; 71% mean accuracy, 71% mean ROC AUC, and a mean Matthews correlation coefficient of 0.4 for SCI detection; and 75% mean accuracy, 71% mean ROC AUC, and a mean Matthews correlation coefficient of 0.39 for differentiating MCI speakers from SCI speakers. The highest validation accuracies for the three models were 83%, 79% and 84%, respectively. The best features for classifying MCI individuals and HC were mean length of word, words beginning with [mɐ], and words with [ɪp] at the second and third positions; the three most important features for identifying SCI individuals and HC were words with [ɑːɡ] at the second and third positions, words beginning with [mɑː], and words beginning with [jøː]; and the most important features for differentiating MCI and SCI individuals were mean length of utterance (MLU), words beginning with [dɛ], and words with [ɑːd] at the second and third positions.

Discussions
Phonology was impaired in patients with MCI and in those with SCI.
Specifically, during the picture description task, individuals with MCI showed more self-interruptions, produced more long vowels than those with SCI, and produced more unrounded vowels than rounded ones and more stops followed by back vowels. Individuals with SCI tended to produce longer utterances than HC and individuals with MCI, and more nasal consonants followed by close front vowels. Sparv-annotated data performed better during feature selection, whereas data analyzed with Parsey Universal reached better results with neural networks. These results show that feed-forward neural networks can be used to build models that identify people with MCI and people with SCI. By employing phonological features, this study improved the classification of individuals with MCI and provided additional objective markers that can be employed to identify these individuals for treatment.
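
For readers unfamiliar with the evaluation protocol named in this abstract, the following is a minimal, standard-library sketch of group k-fold splitting together with accuracy and the Matthews correlation coefficient. It is an illustration only and not the study's implementation: the function names, the toy labels, and the assumption that a "group" corresponds to one speaker are all hypothetical.

```python
import math
from collections import defaultdict


def group_k_fold(groups, k):
    """Yield (train_idx, test_idx) pairs so that no group spans both sets.

    `groups` holds one group label per sample. Treating a group as one
    speaker is an assumption made for this sketch.
    """
    by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        by_group[label].append(idx)
    # Greedy balancing: place the largest groups first into the smallest fold.
    folds = [[] for _ in range(k)]
    ordered = sorted(by_group, key=lambda g: (-len(by_group[g]), str(g)))
    for label in ordered:
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[label])
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy data: six samples from three hypothetical speakers.
    speakers = ["s1", "s1", "s2", "s2", "s3", "s3"]
    for train_idx, test_idx in group_k_fold(speakers, k=3):
        print("train:", train_idx, "test:", test_idx)
    print("accuracy:", accuracy([1, 1, 0, 0], [1, 0, 0, 0]))
    print("MCC:", matthews_corrcoef([1, 1, 0, 0], [1, 0, 0, 0]))
```

Keeping every sample from one group in the same fold prevents a model from being validated on data from a speaker it has already seen during training. The Matthews correlation coefficient complements accuracy because it accounts for all four cells of the confusion matrix, which matters when class sizes differ.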