LAPSE:2026.0309
Published Article

A Data-Efficient Symbolic Regression Framework for Automated Interpretable Bioprocess Modelling
June 12, 2026
Abstract
Bioprocess modelling, optimisation and scale-up are central to improving sustainable manufacturing in the pharmaceutical and chemical industries. However, developing accurate bioprocess digital twins remains challenging. Conventional mechanistic models are difficult to construct because of limited mechanistic understanding and the high complexity of cellular metabolism. While data-driven models have gained popularity, they often require large amounts of experimental data that are time-consuming to obtain, and they lack any quantitative description of the process. Hybrid modelling methods have emerged as promising alternatives; however, they fail to provide physical insight into the root cause of model error. This work therefore presents a data-efficient symbolic regression (SR) based framework that enables the automated discovery of interpretable bioprocess models. A universal kinetic model backbone was used to capture overall process behaviour, while SR was applied to uncover the structures of critical kinetic terms within the backbone. Two frameworks, one embedding SR directly into the kinetic model backbone and the other identifying time-varying parameter profiles prior to SR, were benchmarked using an in-silico yeast fermentation case study. The results demonstrated that independently identifying individual kinetic terms was crucial for recovering the ground-truth model, while refining SR-generated candidates through a novel local iterative structural correction strategy significantly improved convergence to the true kinetic expressions, surpassing model-based design of experiments in data efficiency. This study therefore enables automated yet interpretable model construction for small-data bioprocess applications, paving the way towards augmented-intelligence-driven bioprocess modelling and accelerating digital twin development for process optimisation and control.
Record ID
Keywords
Augmented intelligence, Biochemical reaction kinetics, Data intelligence, Machine learning, Symbolic Regression
Subject
Suggested Citation
Riezzo L, Rogers A, Kay H, Zhang D. A Data-Efficient Symbolic Regression Framework for Automated Interpretable Bioprocess Modelling. Systems and Control Transactions 5:855-860 (2026) https://doi.org/10.69997/sct.191227
Author Affiliations
Riezzo L: Department of Chemical Engineering, The University of Manchester, Manchester, M13 9PL, UK [ORCID]
Rogers A: Department of Chemical Engineering, The University of Manchester, Manchester, M13 9PL, UK [ORCID]
Kay H: Department of Chemical Engineering, The University of Manchester, Manchester, M13 9PL, UK [ORCID]
Zhang D: [ORCID]
Journal Name
Systems and Control Transactions
Volume
5
First Page
855
Last Page
860
Year
2026
Publication Date
2026-06-12
Version Comments
Original Submission
Other Meta
PII: 0855-0860-25-SCT-5-2026, Publication Type: Journal Article
Version History
[v1] (Original Submission)
Jun 12, 2026
Verified by curator on
Jun 12, 2026
This Version Number
v1
Citations
Most Recent
This Version
URL Here
https://psecommunity.org/LAPSE:2026.0309
Record Owner
PSE Press
Links to Related Works
References Cited
- Park S, Kim S, Park C, Kim J, Lee D. Data-driven prediction models for forecasting multistep ahead profiles of mammalian cell culture toward bioprocess digital twins. Biotech & Bioengineering 120:2494-2508 (2023) https://doi.org/10.1002/bit.28405
- Pennington O, Ríos SE, Sebastian MT, Dickson A, Zhang D. Dynamic multiscale hybrid modelling of a CHO cell system for recombinant protein production. IFAC-PapersOnLine 58:133-138 (2024) https://doi.org/10.1016/j.ifacol.2024.08.326
- Riezzo L, Kay H, Feng Y, Jing K, Zhang D. Accelerating bioprocess digital twin development by integrating hybrid modelling with transfer learning. Chemical Engineering Journal 511:162018 (2025) https://doi.org/10.1016/j.cej.2025.162018
- Brunton SL, Proctor JL, Kutz JN. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences of the United States of America 113:3932-3937 (2016) https://doi.org/10.48550/arXiv.1509.03580
- Jul-Rasmussen P, Chakraborty A, Venkatasubramanian V, Liang X, Huusom JK. Hybrid AI modeling techniques for pilot scale bubble column aeration: a comparative study. Computers & Chemical Engineering 185:108655 (2024) https://doi.org/10.1016/j.compchemeng.2024.108655
- Forster T, Vázquez D, Müller C, Guillén-Gosálbez G. Machine learning uncovers analytical kinetic models of bioprocesses. Chemical Engineering Science 300:120606 (2024) https://doi.org/10.1016/j.ces.2024.120606
- Vega-Ramon F, Zhu X, Savage TR, Petsagkourakis P, Jing K, Zhang D. Kinetic and hybrid modeling for yeast astaxanthin production under uncertainty. Biotech & Bioengineering 118:4854-4866 (2021) https://doi.org/10.1002/bit.27950

