LAPSE:2024.1045
Published Article

Utilizing Machine Learning Models with Molecular Fingerprints and Chemical Structures to Predict the Sulfate Radical Rate Constants of Water Contaminants
June 7, 2024
Abstract
Sulfate radicals are increasingly recognized for their potent oxidative capabilities, which make them highly effective at degrading persistent organic pollutants (POPs) in aqueous environments. These radicals excel at breaking down complex organic molecules that resist traditional treatment methods, addressing the challenges posed by POPs, which are known for their persistence, bioaccumulation, and potential health impacts. Predicting the interactions between sulfate radicals and diverse organic contaminants is complex and remains a notable challenge in advancing water treatment technologies. This study addresses this gap by employing a range of machine learning (ML) models, including random forest (RF), decision tree (DT), support vector machine (SVM), XGBoost (XGB), gradient boosting (GB), and Bayesian ridge regression (BR) models. Predictive performance was evaluated using R², RMSE, and MAE, and residual plots are presented. The models varied in their ability to handle complex relationships and large datasets. With Morgan fingerprints as descriptors, the SVM model demonstrated the best predictive performance, achieving the highest R² and the lowest MAE on the test set. With chemical descriptors as features, the GB model performed best. Boosting models generally outperformed single models. The ten most important features were identified via SHAP analysis. By analyzing the performance of these models, this research not only enhances our understanding of chemical reactions involving sulfate radicals, but also showcases the potential of machine learning in environmental chemistry, combining the strengths of ML with chemical kinetics to address the challenges of water treatment and contaminant analysis.
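The three evaluation metrics named in the abstract (R², RMSE, and MAE) can be illustrated with a minimal sketch. This is not the authors' code: the function name `regression_metrics` and the sample values are hypothetical, chosen only to show how the metrics are computed from observed and predicted values such as log-transformed rate constants.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Hypothetical observed and predicted values (illustrative only, not data from the study).
observed = [8.0, 9.0, 10.0]
predicted = [8.5, 9.0, 9.5]

r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")
```

A higher R² and lower RMSE and MAE on a held-out test set indicate better predictive performance, which is the sense in which the abstract ranks the models.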
Record ID
LAPSE:2024.1045
Keywords
advanced oxidation, boosting models, emerging contaminants, machine learning models, SHAP analysis, sulfate radicals
Subject
Suggested Citation
Tang T, Song D, Chen J, Chen Z, Du Y, Dang Z, Lu G. Utilizing Machine Learning Models with Molecular Fingerprints and Chemical Structures to Predict the Sulfate Radical Rate Constants of Water Contaminants. (2024). LAPSE:2024.1045
Author Affiliations
Tang T: School of Environment and Energy, South China University of Technology, Guangzhou 510006, China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou 510006, China
Song D: School of Environment and Energy, South China University of Technology, Guangzhou 510006, China
Chen J: School of Environment and Energy, South China University of Technology, Guangzhou 510006, China
Chen Z: SCNU (NAN’AN) Green and Low-Carbon Innovation Center, Guangdong Provincial Engineering Research Center of Intelligent Low-Carbon Pollution Prevention and Digital Technology, South China Normal University, Guangzhou 510006, China
Du Y: School of Environment and Energy, South China University of Technology, Guangzhou 510006, China
Dang Z: School of Environment and Energy, South China University of Technology, Guangzhou 510006, China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou 510006, China
Lu G: School of Environment and Energy, South China University of Technology, Guangzhou 510006, China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou 510006, China
Journal Name
Processes
Volume
12
Issue
2
First Page
384
Year
2024
Publication Date
2024-02-14
ISSN
2227-9717
Version Comments
Original Submission
Other Meta
PII: pr12020384, Publication Type: Journal Article
External Link

https://doi.org/10.3390/pr12020384
Publisher Version
Version History
[v1] (Original Submission)
Jun 7, 2024
Verified by curator on
Jun 7, 2024
This Version Number
v1
Record URL
http://psecommunity.org/LAPSE:2024.1045
Record Owner
Auto Uploader for LAPSE