LAPSE:2026.0286
Published Article

Molecular Similarity Coefficient in Chemical Design and Analysis
June 12, 2026
Abstract
Computer-aided molecular design (CAMD) is an efficient product design method that is attracting growing attention. It mainly uses data mining techniques to extract information from existing chemical molecular data and uses this information to generate promising candidate molecules. However, whether CAMD can provide truly accurate and reliable results depends on how efficiently it uses chemical data. In this paper, a series of chemical data analysis methods based on molecular similarity is proposed to improve the data utilization efficiency of CAMD, covering three applications: adaptive modeling, reliability assessment, and advanced data preprocessing (molecular recommendation, data consistency testing, and data augmentation). We propose a specific methodology for each application and verify its effect on multiple case studies. The results show that molecular similarity can help improve the accuracy of property prediction at the data level, quantify the reliability of property predictions, recommend promising molecules with similar structures, locate data errors, and perform reliable data augmentation, ultimately enhancing the data utilization efficiency of CAMD.
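The abstract names the uses of the similarity coefficient but does not give its formula or the algorithms built on it. The Python sketch below shows one way such a coefficient could support three of those uses: property prediction with a reliability score, molecular recommendation, and a data consistency test. Everything in it is an assumption for illustration, not the authors' method. Molecules are represented as hypothetical functional-group count dictionaries, the MinMax (generalized Jaccard) coefficient stands in for the paper's similarity coefficient, prediction is a similarity-weighted k-nearest-neighbour average, and the function names and thresholds (`k`, `sim_min`, `max_rel_diff`) are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical molecule representation: functional-group name -> occurrence count.
Molecule = Dict[str, int]


def similarity(a: Molecule, b: Molecule) -> float:
    """MinMax (generalized Jaccard) similarity of two group-count vectors.

    Returns 1.0 for identical group compositions and 0.0 when no group is
    shared. Chosen for illustration; not the coefficient defined in the paper.
    """
    groups = set(a) | set(b)
    shared = sum(min(a.get(g, 0), b.get(g, 0)) for g in groups)
    union = sum(max(a.get(g, 0), b.get(g, 0)) for g in groups)
    return shared / union if union else 0.0


def predict_with_reliability(
    query: Molecule, data: List[Tuple[Molecule, float]], k: int = 3
) -> Tuple[float, float]:
    """Similarity-weighted k-nearest-neighbour estimate and a reliability score.

    The reliability score is the mean similarity of the k neighbours used, so a
    query that resembles no reference molecule receives a low score.
    """
    neighbours = sorted(
        ((similarity(query, mol), value) for mol, value in data),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        raise ValueError("query shares no groups with the reference data")
    estimate = sum(sim * value for sim, value in neighbours) / total
    return estimate, total / len(neighbours)


def recommend(reference: Molecule, candidates: Dict[str, Molecule], top_n: int = 2) -> List[str]:
    """Rank named candidates by similarity to a known well-performing molecule."""
    ranked = sorted(
        candidates, key=lambda name: similarity(reference, candidates[name]), reverse=True
    )
    return ranked[:top_n]


def inconsistent_pairs(
    data: List[Tuple[Molecule, float]], sim_min: float = 0.8, max_rel_diff: float = 0.3
) -> List[Tuple[int, int]]:
    """Return index pairs that are structurally similar but disagree strongly in value."""
    flagged = []
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            (mol_i, val_i), (mol_j, val_j) = data[i], data[j]
            if similarity(mol_i, mol_j) >= sim_min:
                rel_diff = abs(val_i - val_j) / max(abs(val_i), abs(val_j), 1e-12)
                if rel_diff > max_rel_diff:
                    flagged.append((i, j))
    return flagged
```

As an illustration with made-up values, for reference molecules with group counts {CH3: 2, CH2: 2}, {CH3: 2, CH2: 3} and {CH3: 2, CH2: 4}, a query identical to the second has similarities 0.8, 1.0 and 5/6 to them respectively, so its estimate is dominated by its exact match and its reliability score is about 0.88. Adding a duplicate of the second molecule with a very different value makes `inconsistent_pairs` flag that entry against every close neighbour, which is the kind of data mistake a similarity-based consistency test is meant to locate.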
Keywords
Data preprocessing, Molecular design, Property prediction, Reliability quantification, Similarity
Suggested Citation
Xu Y, Shao Z, Alshehri AS, Alhoshan MS, Tula AK. Molecular Similarity Coefficient in Chemical Design and Analysis. Systems and Control Transactions 5:681-691 (2026) https://doi.org/10.69997/sct.146496
Author Affiliations
Xu Y: Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
Shao Z: Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
Alshehri AS: Chemical Engineering Department, College of Engineering, King Saud University, Riyadh 11421, KSA
Alhoshan MS: Chemical Engineering Department, College of Engineering, King Saud University, Riyadh 11421, KSA
Tula AK: Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
Journal Name
Systems and Control Transactions
Volume
5
First Page
681
Last Page
691
Year
2026
Publication Date
2026-06-12
Version Comments
Original Submission
Other Meta
PII: 0681-0691-360-SCT-5-2026, Publication Type: Journal Article
Version History
[v1] (Original Submission)
Jun 12, 2026
Verified by curator on
Jun 12, 2026
This Version Number
v1
Record URL
https://psecommunity.org/LAPSE:2026.0286
Record Owner
PSE Press
Links to Related Works
References Cited
- Hukkerikar AS, Sin G, Abildskov J, Sarup B, Gani R. Development of pure component property models for chemical product-process design and analysis. (2017)
- Alshehri AS, Tula AK, You F, Gani R. Next generation pure component property estimation models: with and without machine learning techniques. AIChE Journal 68: (2021) https://doi.org/10.1002/aic.17469
- Zhang J, Wang Q, Lei Y, Shen W. An interpretable 3D multi-hierarchical representation-based deep neural network for environmental, health and safety properties prediction of organic solvents. Green Chem. 26:4181-4191 (2024) https://doi.org/10.1039/d3gc04801b
- Zhang L, Mao H, Liu L, Du J, Gani R. A machine learning based computer-aided molecular design/screening methodology for fragrance molecules. Computers & Chemical Engineering 115:295-308 (2018) https://doi.org/10.1016/j.compchemeng.2018.04.018
- Jonuzaj S, Akula PT, Kleniati P, Adjiman CS. The formulation of optimal mixtures with generalized disjunctive programming: a solvent design case study. AIChE Journal 62:1616-1633 (2016) https://doi.org/10.1002/aic.15122
- Sahinidis NV, Tawarmalani M, Yu M. Design of alternative refrigerants via global optimization. AIChE Journal 49:1761-1775 (2004) https://doi.org/10.1002/aic.690490714
- Al-Shar'i NA. The design of TOPK inhibitors using similarity search, molecular docking, and MD simulations. Journal of Biomolecular Structure and Dynamics 43:8456-8467 (2024) https://doi.org/10.1080/07391102.2024.2319107
- Gurung AB, Ali MA, Lee J, Farah MA, Al-Anazi KM. An updated review of computer-aided drug design and its application to COVID-19. BioMed Research International 2021: (2021) https://doi.org/10.1155/2021/8853056
- Fu T, Xiao C, Li X, Glass LM, Sun J. MIMOSA: multi-constraint molecule sampling for molecule optimization. AAAI 35:125-133 (2021) https://doi.org/10.1609/aaai.v35i1.16085
- Joback KG, Reid RC. Estimation of pure-component properties from group-contributions. Chemical Engineering Communications 57:233-243 (2007) https://doi.org/10.1080/00986448708960487
- Xu Y, Shao Z, Tula AK. Accurate property predictions and reliability quantification in molecular design based on molecular similarity. Computers & Chemical Engineering 201:109241 (2025) https://doi.org/10.1016/j.compchemeng.2025.109241
- Gani R, Hytoft G, Jaksland C, Jensen AK. An integrated computer aided system for integrated design of chemical processes. Computers & Chemical Engineering 21:1135-1146 (1997) https://doi.org/10.1016/s0098-1354(96)00324-9