Machine learning for molecular property prediction and drug safety
Abstract
Predicting acid dissociation constant (pKa) values of compounds with machine learning is of great significance, as pKa is an important parameter that is frequently optimized in drug discovery. Accurate pKa predictions can also provide insight into other molecular properties and thereby support compound design. To extend the scope of pKa prediction, we built several machine learning models using internal AstraZeneca data. We explored both classical ML approaches with different molecular descriptors and deep learning methods. Graph neural network based models outperformed tree-based methods and yielded reasonable predictions for both acidic and basic pKa values. Using several data splitting strategies, we showed that the models have the potential to generalize well to novel compounds and to outperform state-of-the-art methods. Besides evaluating the models on different splits of the internal data, we also assessed their performance on public datasets. Accuracy there was comparatively lower, which can be attributed to the collation of data from diverse sources and the high experimental variability of the publicly available data.
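The abstract does not name the data splitting strategies that were used. As an illustration only, one common way to probe generalization to novel compounds is a group-based split, where structurally related compounds (for example, compounds sharing a scaffold or cluster) are kept on the same side of the train/test boundary. The sketch below assumes each record already carries a group label; the function name `group_split`, the toy records, and the group labels are all hypothetical and are not taken from the thesis.

```python
import random
from collections import defaultdict

def group_split(records, group_key, test_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and test.

    Hypothetical helper: `group_key` maps a record to its group label
    (e.g. a precomputed scaffold or cluster id).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[group_key(rec)].append(rec)

    keys = sorted(groups)               # fixed order before the seeded shuffle
    random.Random(seed).shuffle(keys)

    target = test_fraction * len(records)
    train, test = [], []
    for k in keys:
        # Assign whole groups to the test set until it reaches the target size.
        if len(test) < target:
            test.extend(groups[k])
        else:
            train.extend(groups[k])
    return train, test

# Hypothetical toy data: (compound id, group label, measured pKa).
data = [
    ("c1", "A", 4.2), ("c2", "A", 4.5), ("c3", "B", 9.1),
    ("c4", "B", 8.7), ("c5", "C", 6.3), ("c6", "D", 10.0),
]
train, test = group_split(data, group_key=lambda r: r[1], test_fraction=0.3)
train_groups = {r[1] for r in train}
test_groups = {r[1] for r in test}
```

Because whole groups are assigned at once, the test set can end up somewhat larger than the requested fraction. A random split, by contrast, can place near-identical compounds on both sides, which tends to give more optimistic error estimates than a group-based split.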
Degree
Student essay
Date
2023-10-23
Author
Jenei, Kinga
Keywords
Molecular property prediction
Acid dissociation constant
pKa
Machine learning
Graph Neural Networks
Molecular descriptors
Drug Discovery
Language
eng