LAPSE:2026.0284
Published Article

Development of Symbolic Regression-Based ATR-FTIR Calibration Models
June 12, 2026
Abstract
Accurate calibration of spectroscopic measurements is essential for reliable real-time monitoring and control of crystallization processes. In this work, calibration strategies for Attenuated Total Reflectance Fourier Transform Infrared (ATR-FTIR) spectroscopy were systematically evaluated for concentration monitoring in batch cooling crystallization of paracetamol in ethanol. Linear regression (LR), Partial Least Squares Regression (PLSR), Principal Component Regression (PCR), and symbolic regression (SR) were compared using both peak-based features and full spectral representations. Peak-based models provided a transparent baseline, with peak-area-based models consistently outperforming peak-height-based models. For LR, incorporating multiple absorption bands reduced the mean squared error (MSE) by nearly one order of magnitude compared to single-peak models. Using the same peak-based inputs, SR further improved performance, reducing prediction bias at high concentrations and yielding higher coefficients of determination (R² > 0.99) compared to LR. A substantial improvement was achieved when full spectral information was used. Among all evaluated approaches, SR with unprocessed spectra yielded the best overall performance, achieving an R² of 0.996 and an MSE of 1.4 × 10⁻⁶ on the validation dataset. This model also demonstrated strong generalization on an independent solubility test dataset, closely reproducing the reference solubility curve over the full temperature range with minimal deviation. In contrast, PCR and PLSR models showed increased sensitivity to preprocessing choices and exhibited larger errors on the test dataset. SR provided an accurate, robust, and interpretable calibration framework for ATR-FTIR, with reduced reliance on spectral preprocessing and potential for real-time process analytical technology and control applications.
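The peak-based linear baseline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use their data: the spectra are synthetic, and the band positions, integration windows, noise levels, and helper names (`peak_area`, `fit_linear`, `predict`) are assumptions chosen only for illustration. It compares a single-peak and a two-peak linear calibration by MSE and R² on a held-out split.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for ATR-FTIR spectra (hypothetical, NOT the paper's data).
wavenumbers = np.arange(1000.0, 1800.0, 2.0)   # cm^-1 grid, assumed
n_samples = 80
conc = rng.uniform(0.10, 0.30, n_samples)      # assumed solute concentration
offset = rng.normal(0.0, 0.01, n_samples)      # assumed baseline drift per spectrum


def gaussian_band(center, width):
    """Unit-height Gaussian absorption band on the wavenumber grid."""
    return np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)


# Solute band scales with concentration; solvent band is constant (assumed).
spectra = (
    conc[:, None] * gaussian_band(1515.0, 10.0)
    + 0.5 * gaussian_band(1050.0, 10.0)
    + offset[:, None]
    + rng.normal(0.0, 0.002, (n_samples, wavenumbers.size))
)


def peak_area(absorbance, lo, hi):
    """Trapezoidal area of each spectrum between wavenumbers lo and hi."""
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)
    x, y = wavenumbers[mask], absorbance[:, mask]
    return np.sum(0.5 * (y[:, 1:] + y[:, :-1]) * np.diff(x), axis=1)


def fit_linear(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))


def r2(y, y_hat):
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))


# Peak-area features: column 0 = solute band, column 1 = solvent band.
areas = np.column_stack([
    peak_area(spectra, 1485.0, 1545.0),
    peak_area(spectra, 1020.0, 1080.0),
])

train, valid = slice(0, 60), slice(60, None)   # simple hold-out split

coef_single = fit_linear(areas[train, :1], conc[train])
coef_multi = fit_linear(areas[train], conc[train])

pred_single = predict(coef_single, areas[valid, :1])
pred_multi = predict(coef_multi, areas[valid])

mse_single, mse_multi = mse(conc[valid], pred_single), mse(conc[valid], pred_multi)
r2_single, r2_multi = r2(conc[valid], pred_single), r2(conc[valid], pred_multi)

print(f"single-peak LR: MSE={mse_single:.2e}, R2={r2_single:.4f}")
print(f"two-peak LR:    MSE={mse_multi:.2e}, R2={r2_multi:.4f}")
```

In this synthetic setup the second band carries the same baseline drift as the first, so including it lets the linear model cancel that drift. This is one plausible mechanism by which a multi-band model can lower the error; the sketch makes no claim about why the reported models improved.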
Record ID
Keywords
Crystallization, PAT, PLSR, Preprocessing, Process monitoring
Subject
Suggested Citation
Lima FARD, Nordhus IS, Moraes MGFD, Leblebici ME, Secchi AR, Souza MBD Jr, Nogueira I. Development of Symbolic Regression-Based ATR-FTIR Calibration Models. Systems and Control Transactions 5:655-663 (2026) https://doi.org/10.69997/sct.169416
Author Affiliations
Lima FARD: Chemical Engineering Department, Norwegian University of Science and Technology, Trondheim, 793101, Norway. EPQB, School of Chemistry, Universidade Federal do Rio de Janeiro, Av. Horácio Macedo, 2030, CT, Bloco E, 21941-914, Rio de Janeiro, RJ - Brazil.
Nordhus IS: Chemical Engineering Department, Norwegian University of Science and Technology, Trondheim, 793101, Norway
Moraes MGFD: Chemical Engineering Program, PEQ/COPPE - Universidade Federal do Rio de Janeiro, Av. Horácio Macedo, 2030, CT, Bloco G, G115, 21941-914, Rio de Janeiro, RJ - Brazil
Leblebici ME: KU Leuven, Center for Industrial Process Technology, Diepenbeek, Belgium
Secchi AR: EPQB, School of Chemistry, Universidade Federal do Rio de Janeiro, Av. Horácio Macedo, 2030, CT, Bloco E, 21941-914, Rio de Janeiro, RJ - Brazil. Chemical Engineering Program, PEQ/COPPE - Universidade Federal do Rio de Janeiro, Av. Horácio Macedo, 2030,
Souza MBD Jr: EPQB, School of Chemistry, Universidade Federal do Rio de Janeiro, Av. Horácio Macedo, 2030, CT, Bloco E, 21941-914, Rio de Janeiro, RJ - Brazil. Chemical Engineering Program, PEQ/COPPE - Universidade Federal do Rio de Janeiro, Av. Horácio Macedo, 2030,
Nogueira I: Chemical Engineering Department, Norwegian University of Science and Technology, Trondheim, 793101, Norway
Journal Name
Systems and Control Transactions
Volume
5
First Page
655
Last Page
663
Year
2026
Publication Date
2026-06-12
Version Comments
Original Submission
Other Meta
PII: 0655-0663-188-SCT-5-2026, Publication Type: Journal Article
Meta
Version History
[v1] (Original Submission)
Jun 12, 2026
Verified by curator on
Jun 12, 2026
This Version Number
v1
Citations
Most Recent
This Version
URL Here
https://psecommunity.org/LAPSE:2026.0284
Record Owner
PSE Press
Links to Related Works
References Cited
- Gao Z, Rohani S, Gong J, Wang J. Recent developments in the crystallization process: toward the pharmaceutical industry. Engineering 3:343-353 (2017) https://doi.org/10.1016/j.eng.2017.03.022
- Braatz RD. Advanced control of crystallization processes. Annual Reviews in Control 26:87-99 (2002) https://doi.org/10.1016/s1367-5788(02)80016-5
- Nagy ZK, Braatz RD. Advances and new directions in crystallization control. Annu. Rev. Chem. Biomol. Eng. 3:55-75 (2012) https://doi.org/10.1146/annurev-chembioeng-062011-081043
- Lima FARD, de Moraes MGF, Barreto AG Jr, Secchi AR, Grover MA, de Souza MB Jr. Applications of machine learning for modeling and advanced control of crystallization processes: developments and perspectives. Digital Chemical Engineering 14:100208 (2025) https://doi.org/10.1016/j.dche.2024.100208
- Simon LL, et al. Assessment of recent process analytical technology (PAT) trends: a multiauthor review. Org Process Res Dev 19:3-62 (2015) https://doi.org/10.1021/op500261y
- Xiouras C, Cameli F, Quilló GL, Kavousanakis ME, Vlachos DG, Stefanidis GD. Applications of artificial intelligence and machine learning algorithms to crystallization. Chem. Rev. 122:13006-13042 (2022) https://doi.org/10.1021/acs.chemrev.2c00141
- Dias Lima FAR, Fernandes de Moraes MG, Resende Secchi A, de Souza MB Jr, Grover MA. Experimental nonlinear model predictive control of crystal size and yield in batch cooling crystallization enabled by soft sensor and symbolic-based calibration model. Ind. Eng. Chem. Res. 64:23582-23600 (2025) https://doi.org/10.1021/acs.iecr.5c03894
- Zhang F, Du K, Guo L, Huo Y, He K, Shan B. Progress, problems, and potential of technology for measuring solution concentration in crystallization processes. Measurement 187:110328 (2022) https://doi.org/10.1016/j.measurement.2021.110328
- Swinehart DF. The beer-lambert law. J. Chem. Educ. 39:333 (1962) https://doi.org/10.1021/ed039p333
- Lindenberg C, Krättli M, Cornel J, Mazzotti M, Brozio J. Design and optimization of a combined cooling/antisolvent crystallization process. Crystal Growth & Design 9:1124-1136 (2008) https://doi.org/10.1021/cg800934h
- Trampuž M, Teslić D, Likozar B. Process analytical technology-based (PAT) model simulations of a combined cooling, seeded and antisolvent crystallization of an active pharmaceutical ingredient (API). Powder Technology 366:873-890 (2020) https://doi.org/10.1016/j.powtec.2020.03.027
- Zhang F, Liu T, Wang XZ, Liu J, Jiang X. Comparative study on ATR-FTIR calibration models for monitoring solution concentration in cooling crystallization. Journal of Crystal Growth 459:50-55 (2017) https://doi.org/10.1016/j.jcrysgro.2016.11.064
- Lu M, Rao S, Yue H, Han J, Wang J. Recent advances in the application of machine learning to crystal behavior and crystallization process control. Crystal Growth & Design 24:5374-5396 (2024) https://doi.org/10.1021/acs.cgd.3c01251
- Lima FARD, de Moraes MGF, Rebello CM, Barreto AG Jr, Secchi AR, de Souza MB Jr, Nogueira IBR. Interpretable and uncertainty-aware machine learning for trustworthy prediction in batch crystallization. Chemical Engineering and Processing - Process Intensification 215:110350 (2025) https://doi.org/10.1016/j.cep.2025.110350
- Rebello CM, Costa EA, Fontana M, Schnitman L, Nogueira IBR. Interpretable scientific machine learning approach for correcting phenomenological models: methodology validation on an ESP prototype. Ind. Eng. Chem. Res. 63:19030-19050 (2024) https://doi.org/10.1021/acs.iecr.4c02104
- Santana VV, Costa E, Rebello CM, Ribeiro AM, Rackauckas C, Nogueira IBR. Efficient hybrid modeling and sorption model discovery for non-linear advection-diffusion-sorption systems: a systematic scientific machine learning approach. Chemical Engineering Science 282:119223 (2023) https://doi.org/10.1016/j.ces.2023.119223
- Griffin DJ, Kawajiri Y, Rousseau RW, Grover MA. Using MC plots for control of paracetamol crystallization. Chemical Engineering Science 164:344-360 (2017) https://doi.org/10.1016/j.ces.2017.01.065
- Virtanen P, Gommers R, Oliphant TE, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17:261-272 (2020) https://doi.org/10.1038/s41592-019-0686-2
- Mehmood T, Liland KH, Snipen L, Sæbø S. A review of variable selection methods in partial least squares regression. Chemometrics and Intelligent Laboratory Systems 118:62-69 (2012) https://doi.org/10.1016/j.chemolab.2012.07.010
- Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825-2830 (2011)
- Cranmer K. Interpretable machine learning for science with PySR and symbolic regression. Mach Learn Sci Technol 4:015018 (2023) https://doi.org/10.1088/2632-2153/ac9f7c

