Proceedings of ESCAPE 36
ISSN: 2818-4734
Volume: 5 (2026)
LAPSE:2026.0286
Published Article
Molecular Similarity Coefficient in Chemical Design and Analysis
Youquan Xu, Zhijiang Shao, Abdulelah S. Alshehri, Mansour S. Alhoshan, Anjan K. Tula
June 12, 2026
Abstract
Computer-aided molecular design (CAMD) is an efficient product design method that is attracting growing attention. It uses data mining to extract information from existing molecular data and then uses that information to generate promising candidate molecules. Whether CAMD delivers accurate and reliable results, however, depends on how efficiently the chemical data are used. This paper proposes a series of chemical data analysis methods based on molecular similarity to improve the data utilization efficiency of CAMD. The methods cover three applications: adaptive modeling, reliability assessment, and advanced data preprocessing, which comprises molecular recommendation, data consistency testing, and data augmentation. We propose a specific methodology for each application and verify its effect on multiple cases. The results show that molecular similarity helps improve the accuracy of property prediction at the data level, quantifies the reliability of property predictions, recommends promising molecules with similar structures, locates data errors, and supports reliable data augmentation, thereby improving the data utilization efficiency of CAMD.
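To make the idea of similarity-driven data use concrete, the sketch below ranks library molecules by similarity to a target and averages the top scores. It is an illustration only: this record does not define the paper's similarity coefficient, so a generalized Tanimoto coefficient over functional-group counts stands in for it. The function names `tanimoto` and `nearest`, the group counts, and the use of mean neighbor similarity as a reliability proxy are all assumptions made for the example, not the authors' method.

```python
def tanimoto(a, b):
    """Generalized Tanimoto similarity between two group-count maps (assumed stand-in)."""
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    # Two empty descriptions are treated as identical.
    return shared / total if total else 1.0


def nearest(target, library, k=2):
    """Return the k library entries most similar to target, best first."""
    scored = [(name, tanimoto(target, groups)) for name, groups in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical first-order group counts, for illustration only.
library = {
    "1-propanol": {"CH3": 1, "CH2": 2, "OH": 1},
    "1-butanol": {"CH3": 1, "CH2": 3, "OH": 1},
    "n-hexane": {"CH3": 2, "CH2": 4},
}
ethanol = {"CH3": 1, "CH2": 1, "OH": 1}

# Similar molecules could serve as recommendations or as a local training set.
neighbors = nearest(ethanol, library, k=2)

# Mean neighbor similarity as a crude, assumed proxy for prediction reliability.
reliability = sum(score for _, score in neighbors) / len(neighbors)
print(neighbors, round(reliability, 3))
```

In this toy library the two alcohols rank above n-hexane for an ethanol target, which is the behavior one would want from a similarity measure used to pick training data or to flag predictions made far from known molecules.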
Keywords
Data preprocessing, Molecular design, Property prediction, Reliability quantification, Similarity
Suggested Citation
Xu Y, Shao Z, Alshehri AS, Alhoshan MS, Tula AK. Molecular Similarity Coefficient in Chemical Design and Analysis. Systems and Control Transactions 5:681-691 (2026) https://doi.org/10.69997/sct.146496
Author Affiliations
Xu Y: Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
Shao Z: Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
Alshehri AS: Chemical Engineering Department, College of Engineering, King Saud University, Riyadh 11421, Saudi Arabia
Alhoshan MS: Chemical Engineering Department, College of Engineering, King Saud University, Riyadh 11421, Saudi Arabia
Tula AK: Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
Journal Name
Systems and Control Transactions
Volume
5
First Page
681
Last Page
691
Year
2026
Publication Date
2026-06-12
Version Comments
Original Submission
Other Meta
PII: 0681-0691-360-SCT-5-2026, Publication Type: Journal Article
License
CC BY-SA 4.0
Version History
[v1] (Original Submission)
Jun 12, 2026
 
Verified by curator on
Jun 12, 2026
This Version Number
v1
URL Here
https://psecommunity.org/LAPSE:2026.0286
 
Record Owner
PSE Press