LAPSE:2023.4505
Published Article

MRlogP: Transfer Learning Enables Accurate logP Prediction Using Small Experimental Training Datasets
February 23, 2023
Abstract
Small molecule lipophilicity is often included in generalized rules for medicinal chemistry. These rules aim to reduce time, effort, costs, and attrition rates in drug discovery, allowing the rejection or prioritization of compounds without the need for synthesis and testing. The availability of high-quality, abundant training data for machine learning methods can be a major limiting factor in building effective property predictors. We use transfer learning to get around this problem: the model first learns from a large set of low-accuracy predicted logP values and is then fine-tuned on a small, accurate dataset of 244 druglike compounds. The result is MRlogP, a neural network-based predictor of logP capable of outperforming state-of-the-art freely available logP prediction methods for druglike small molecules. MRlogP achieves average root mean squared errors of 0.988 and 0.715 against druglike molecules from Reaxys and PHYSPROP, respectively. We have made the trained neural network predictor and all associated code for descriptor generation freely available. In addition, MRlogP may be used online via a web interface.
Record ID
Keywords
lipophilicity prediction, logP prediction, physicochemical property prediction, transfer learning
Suggested Citation
Chen YK, Shave S, Auer M. MRlogP: Transfer Learning Enables Accurate logP Prediction Using Small Experimental Training Datasets. (2023). LAPSE:2023.4505
Author Affiliations
Chen YK: School of Biological Sciences, University of Edinburgh, The King’s Buildings, Edinburgh EH9 3BF, Scotland, UK [ORCID]
Shave S: School of Biological Sciences, University of Edinburgh, The King’s Buildings, Edinburgh EH9 3BF, Scotland, UK [ORCID]
Auer M: School of Biological Sciences, University of Edinburgh, The King’s Buildings, Edinburgh EH9 3BF, Scotland, UK [ORCID]
Journal Name
Processes
Volume
9
Issue
11
First Page
2029
Year
2021
Publication Date
2021-11-13
ISSN
2227-9717
Version Comments
Original Submission
Other Meta
PII: pr9112029, Publication Type: Journal Article
External Link

https://doi.org/10.3390/pr9112029
Publisher Version
Version History
[v1] (Original Submission)
Feb 23, 2023
Verified by curator on
Feb 23, 2023
This Version Number
v1
URL Here
https://psecommunity.org/LAPSE:2023.4505
Record Owner
Auto Uploader for LAPSE