LAPSE:2026.0408
Published Article

An End-to-End Pure Component Property Prediction Framework Based on a Hierarchical Molecular Fragmentation Method
June 12, 2026
Abstract
Accurate prediction of pure component properties is a long-standing problem in chemical engineering, biomedicine, and environmental science. In recent years, end-to-end deep learning methods have improved significantly on traditional machine learning approaches because they automatically learn task-relevant representations from raw molecular data. Beyond accurate property prediction, researchers have increasingly focused on how specific fragment structures influence molecular properties. However, existing fragmentation methods rely on predefined rules and group libraries and struggle to capture novel molecular structures, which hampers the development of new materials and drugs. To address these challenges, this work proposes a hierarchical molecular fragmentation method that automatically segments molecules into multiple fragments containing key functional groups. A three-branch graph attention network is then constructed to achieve multi-level representation. Finally, a multi-layer perceptron is employed to map molecular features to physical property values. Twenty datasets were used for validation, grouped into four categories: thermodynamic properties, pharmacokinetics, toxicological properties, and industrial safety. The proposed framework achieves the best performance, with the average error reduced by 6.8% compared to existing research.
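The abstract does not give the fragmentation rules or the network layout, so the following is only a minimal sketch of the general idea of fragment-based, multi-level features, not the authors' method. It assumes that the bonds to cut are supplied by hand (the paper selects them automatically), that the three levels are atom, fragment, and molecule, and it swaps plain mean pooling in for the graph attention branches. The names `fragment`, `mean`, and `multilevel_features` are hypothetical.

```python
from collections import defaultdict


def fragment(num_atoms, bonds, cut_bonds):
    """Split a molecular graph into fragments.

    Fragments are the connected components that remain after the bonds
    listed in cut_bonds are removed. Assumption: cut_bonds is given by
    the caller; the paper chooses the cut points automatically.
    """
    cut = {frozenset(b) for b in cut_bonds}
    adj = defaultdict(list)
    for a, b in bonds:
        if frozenset((a, b)) not in cut:
            adj[a].append(b)
            adj[b].append(a)

    seen, frags = set(), []
    for start in range(num_atoms):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        frags.append(sorted(comp))
    return frags


def mean(vectors):
    """Element-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def multilevel_features(atom_feats, frags):
    """Build atom-, fragment-, and molecule-level features.

    Mean pooling is a stand-in here for the attention-based
    aggregation described in the abstract.
    """
    frag_feats = [mean([atom_feats[i] for i in f]) for f in frags]
    return {
        "atom": atom_feats,
        "fragment": frag_feats,
        "molecule": mean(frag_feats),
    }


# Toy example: heavy atoms of ethanol, C0-C1-O2, with one-hot [is_C, is_O].
bonds = [(0, 1), (1, 2)]
atom_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
frags = fragment(3, bonds, cut_bonds=[(0, 1)])  # methyl | hydroxymethyl
levels = multilevel_features(atom_feats, frags)
print(frags)
print(levels["fragment"], levels["molecule"])
```

In a full model, the three levels would each be encoded by a learned branch and the combined vector passed to a multi-layer perceptron; here the pooled vectors only show how one molecule yields features at several granularities.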
Suggested Citation
Jiao J, Li J. An End-to-End Pure Component Property Prediction Framework Based on a Hierarchical Molecular Fragmentation Method. Systems and Control Transactions 5:1634-1642 (2026) https://doi.org/10.69997/sct.100427
Journal Name
Systems and Control Transactions
Volume
5
First Page
1634
Last Page
1642
Year
2026
Publication Date
2026-06-12
Version Comments
Original Submission
Other Meta
PII: 1634-1642-227-SCT-5-2026, Publication Type: Journal Article
Version History
[v1] (Original Submission)
Jun 12, 2026
Verified by curator on
Jun 12, 2026
This Version Number
v1
URL Here
https://psecommunity.org/LAPSE:2026.0408
Record Owner
PSE Press

