Proceedings of ESCAPE 36
ISSN: 2818-4734
Volume: 5 (2026)
LAPSE:2026.0323
Published Article
Data Transformation Techniques and its Influence in Hybrid Model Performance
June 12, 2026
Abstract
The global transition toward sustainable energy has intensified research into biofuels, with bioprocess optimization playing a central role in achieving decarbonization goals. Biobutanol, in particular, is a high-value molecule for sustainable fuel applications because of its superior energy density and compatibility with existing infrastructure. However, model-based optimization of its production is hindered by traditional semi-structured kinetic models, which often suffer from limited predictive robustness. To address this challenge, we developed a hybrid modeling framework for Clostridium saccharoperbutylacetonicum that integrates mechanistic mass-balance equations with Gaussian Processes (GPs) to describe the biobutanol formation rate. We investigate the effect of data normalization on the hybrid models' prediction capabilities by comparing min-max normalization, z-score normalization, and no transformation. For each data treatment strategy, 8,000 hybrid models were trained on experimental fermentation datasets, and ensemble predictors were constructed to mitigate variability and enhance generalization. The results show that hybrid models significantly outperform parametric baselines, and that the ensemble approach levels the performance differences across all pretreatment techniques. Ensembles achieved validation RMSE reductions of approximately 10.1% to 10.4%, about twice the reduction obtained by the best individual models, regardless of the scaling method used. Despite similar RMSE reductions, prediction envelopes varied across data transformations, revealing tradeoffs between uncertainty and coverage. Notably, targeted hybridization of the butanol reaction rate propagated performance improvements to other states, such as glucose and biomass. These findings suggest that, for hybrid modeling, ensemble design provides greater benefits for robustness than any data normalization technique.
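For readers unfamiliar with the two scaling schemes and the ensemble step named in the abstract, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names (min_max_scale, z_score_scale, ensemble_mean, rmse) and all numbers are illustrative assumptions, and the GP-based hybrid models themselves are not reproduced. It only shows how each transformation rescales a data column and how averaging member predictions forms a simple ensemble predictor.

```python
import numpy as np


def min_max_scale(x):
    """Rescale each column to [0, 1] using that column's min and max."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (x - lo) / span


def z_score_scale(x):
    """Center each column to zero mean and scale to unit (population) std."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(axis=0), x.std(axis=0)
    sigma = np.where(sigma > 0, sigma, 1.0)  # guard against constant columns
    return (x - mu) / sigma


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally shaped arrays."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def ensemble_mean(predictions):
    """Average member predictions; input shape is (n_models, n_points)."""
    return np.mean(np.asarray(predictions, dtype=float), axis=0)


# Hypothetical measurements (rows: samples, columns: two state variables).
data = np.array([[0.0, 10.0],
                 [5.0, 20.0],
                 [10.0, 30.0]])
scaled_mm = min_max_scale(data)
scaled_z = z_score_scale(data)

# Hypothetical predictions from two member models for three time points.
y_true = np.array([1.0, 2.0, 3.0])
member_preds = np.array([[1.2, 2.1, 2.7],
                         [0.8, 1.9, 3.3]])
y_ens = ensemble_mean(member_preds)
member_errors = [rmse(y_true, p) for p in member_preds]
ensemble_error = rmse(y_true, y_ens)
```

In this made-up example the member errors have opposite signs, so averaging cancels them; in general an averaged ensemble is not guaranteed to beat its best member, and the improvement reported in the abstract is an empirical result of the study.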
Keywords
Biofuels, Butanol, Ensemble Learning, Hybrid Modeling, Industry 4.0
Suggested Citation
Herrera-Ruiz JF, Robles-Rodriguez CE, Aceves-Lara CA, Fontalvo J, Prado-Rubio OA. Data Transformation Techniques and its Influence in Hybrid Model Performance. Systems and Control Transactions 5:964-971 (2026) https://doi.org/10.69997/sct.184765
Author Affiliations
Herrera-Ruiz JF: Grupo de Investigación en Aplicación de Nuevas Tecnologías (GIANT), Departamento de Ingeniería Química, Universidad Nacional de Colombia sede Manizales, Campus La Nubia, Manizales, 170003, Colombia
Robles-Rodriguez CE: Toulouse Biotechnology Institute, Toulouse, France
Aceves-Lara CA: Toulouse Biotechnology Institute, Toulouse, France
Fontalvo J: Grupo de Investigación en Aplicación de Nuevas Tecnologías (GIANT), Departamento de Ingeniería Química, Universidad Nacional de Colombia sede Manizales, Campus La Nubia, Manizales, 170003, Colombia
Prado-Rubio OA: Grupo de Investigación en Aplicación de Nuevas Tecnologías (GIANT), Departamento de Ingeniería Química, Universidad Nacional de Colombia sede Manizales, Campus La Nubia, Manizales, 170003, Colombia
Journal Name
Systems and Control Transactions
Volume
5
First Page
964
Last Page
971
Year
2026
Publication Date
2026-06-12
Version Comments
Original Submission
Other Meta
PII: 0964-0971-95-SCT-5-2026, Publication Type: Journal Article
License
CC BY-SA 4.0
Version History
[v1] (Original Submission)
Jun 12, 2026
 
Verified by curator on
Jun 12, 2026
This Version Number
v1
URL Here
https://psecommunity.org/LAPSE:2026.0323
 
Record Owner
PSE Press