LAPSE:2025.0342
Published Article

A Subset Selection Strategy for Gaussian Process Q-Learning of Process Optimization and Control
June 27, 2025
Abstract
This work addresses a practical challenge in batch process optimization: the need for sample-efficient learning methods, given the high cost and time-intensive nature of running physical batch processes. While reinforcement learning (RL) offers a promising framework for optimizing batch processes, traditional approaches require numerous experimental runs to converge to optimal policies. A novel sample-efficient RL method is proposed that leverages Gaussian processes (GPs) to accelerate learning from limited batch data. However, the direct application of GPs becomes computationally intractable as data accumulates from batch to batch, and GP performance degrades when the training distribution shifts during policy improvement. To address these challenges, an integrated framework that combines Q-learning with GPs is developed, and a subset selection mechanism based on determinantal point processes is introduced to maintain computational efficiency while preserving diverse, high-performing samples. The method exploits problem structure and backward induction to further improve sample efficiency and incorporates both aleatoric and epistemic uncertainty for robust policy improvement. The approach is demonstrated on a non-isothermal semi-batch reactor case study, showing significantly improved learning efficiency compared with exact GP strategies while maintaining computational tractability. The results highlight the method's practical applicability to industrial batch process optimization, where experimental data are limited and costly to obtain.
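The subset selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not specify the kernel, the quality score, or the sampling procedure, so the code assumes a quality-weighted RBF kernel L_ij = q_i k(x_i, x_j) q_j and uses greedy maximization of the determinant (a common MAP approximation for determinantal point processes) instead of exact sampling. The names `rbf`, `greedy_dpp_subset`, `quality`, and `lengthscale` are hypothetical.

```python
import math


def rbf(x, y, lengthscale=1.0):
    """Squared-exponential similarity between two points (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)


def greedy_dpp_subset(points, quality, k, lengthscale=1.0):
    """Greedily pick up to k indices that approximately maximize det(L_S).

    L_ij = quality[i] * rbf(points[i], points[j]) * quality[j], so a sample
    is favoured when it scores well (large quality) and is dissimilar to the
    samples already chosen (diversity).
    """
    n = len(points)

    def kernel(i, j):
        return quality[i] * rbf(points[i], points[j], lengthscale) * quality[j]

    # gain[i] is the factor by which det(L_S) grows if i is added to S.
    gain = [kernel(i, i) for i in range(n)]
    # chol[i] holds the partial Cholesky row of item i against the chosen set.
    chol = [[] for _ in range(n)]
    selected = []
    remaining = set(range(n))

    while remaining and len(selected) < k:
        j = max(sorted(remaining), key=lambda i: gain[i])
        if gain[j] <= 1e-12:  # nothing left that adds volume
            break
        selected.append(j)
        remaining.remove(j)
        d_j = math.sqrt(gain[j])
        for i in remaining:
            dot = sum(a * b for a, b in zip(chol[j], chol[i]))
            e = (kernel(j, i) - dot) / d_j
            chol[i].append(e)
            gain[i] -= e * e
    return selected


# Hypothetical data: two near-duplicate pairs and one lower-quality outlier.
points = [(0.0,), (0.01,), (5.0,), (5.01,), (10.0,)]
quality = [1.0, 0.9, 1.0, 0.9, 0.5]
subset = greedy_dpp_subset(points, quality, k=3)  # [0, 2, 4]
```

In this toy example the near-duplicates at 0.01 and 5.01 are skipped in favour of the lower-quality but distinct point at 10.0, which is the diversity-versus-performance trade-off the abstract describes. A GP trained only on the selected subset scales with k rather than with the full accumulated data set.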
Keywords
Batch Process Control, Gaussian Processes, Reinforcement Learning
Suggested Citation
Bloor M, Savage T, Tsay C, Chanona EADR, Mowbray M. A Subset Selection Strategy for Gaussian Process Q-Learning of Process Optimization and Control. Systems and Control Transactions 4:1181-1186 (2025) https://doi.org/10.69997/sct.126649
Author Affiliations
Bloor M: Imperial College London, Department of Chemical Engineering, United Kingdom
Savage T: Imperial College London, Department of Chemical Engineering, United Kingdom
Tsay C: Imperial College London, Department of Computing, United Kingdom
Chanona EADR: Imperial College London, Department of Chemical Engineering, United Kingdom
Mowbray M: Imperial College London, Department of Chemical Engineering, United Kingdom
Journal Name
Systems and Control Transactions
Volume
4
First Page
1181
Last Page
1186
Year
2025
Publication Date
2025-07-01
Version Comments
Original Submission
Other Meta
PII: 1181-1186-1601-SCT-4-2025, Publication Type: Journal Article
Version History
[v1] (Original Submission)
Jun 27, 2025
Verified by curator on
Jun 27, 2025
This Version Number
v1
URL Here
https://psecommunity.org/LAPSE:2025.0342
Record Owner
PSE Press
Links to Related Works
References Cited
- A. Mesbah, 'Stochastic Model Predictive Control: An Overview and Perspectives for Future Research', IEEE Control Syst., vol. 36, no. 6, pp. 30-44, Dec. 2016, https://doi.org/10.1109/MCS.2016.2602087
- E. Bradford, L. Imsland, D. Zhang, and E. A. Del Rio Chanona, 'Stochastic data-driven model predictive control using gaussian processes', Comput. Chem. Eng., vol. 139, p. 106844, Aug. 2020, https://doi.org/10.1016/j.compchemeng.2020.106844
- P. Petsagkourakis, I. O. Sandoval, E. Bradford, F. Galvanin, D. Zhang, and E. A. D. Rio-Chanona, 'Chance constrained policy optimization for process control and optimization', J. Process Control, vol. 111, pp. 35-45, Mar. 2022, https://doi.org/10.1016/j.jprocont.2022.01.003
- M. Mowbray, P. Petsagkourakis, E. A. Del Rio-Chanona, and D. Zhang, 'Safe chance constrained reinforcement learning for batch process control', Comput. Chem. Eng., vol. 157, p. 107630, Jan. 2022, https://doi.org/10.1016/j.compchemeng.2021.107630
- H.-E. Byun, B. Kim, and J. H. Lee, 'Multi-step lookahead Bayesian optimization with active learning using reinforcement learning and its application to data-driven batch-to-batch optimization', Comput. Chem. Eng., vol. 167, p. 107987, Nov. 2022, https://doi.org/10.1016/j.compchemeng.2022.107987
- J. Kocijan, R. Murray-Smith, C. E. Rasmussen, and A. Girard, 'Gaussian process model based predictive control', in Proceedings of the 2004 American Control Conference, Boston, MA, USA: IEEE, 2004. https://doi.org/10.23919/ACC.2004.1383790
- Y. Engel, S. Mannor, and R. Meir, 'Reinforcement learning with Gaussian processes', in Proceedings of the 22nd international conference on Machine learning - ICML '05, Bonn, Germany: ACM Press, 2005, pp. 201-208. https://doi.org/10.1145/1102351.1102377
- K. Azizzadenesheli and A. Anandkumar, 'Efficient Exploration through Bayesian Deep Q-Networks', Sep. 06, 2019, arXiv:1802.04412.
- J. T. Wilson, V. Borovitskiy, A. Terenin, P. Mostowsky, and M. P. Deisenroth, 'Efficiently Sampling Functions from Gaussian Process Posteriors', Aug. 16, 2020, arXiv:2002.09309.
- A. Kulesza and B. Taskar, 'Determinantal point processes for machine learning', Found. Trends® Mach. Learn., vol. 5, no. 2-3, pp. 123-286, 2012, https://doi.org/10.1561/2200000044
- E. Bradford and L. Imsland, 'Economic Stochastic Model Predictive Control Using the Unscented Kalman Filter', IFAC-Pap., vol. 51, no. 18, pp. 417-422, 2018, https://doi.org/10.1016/j.ifacol.2018.09.336
- M. Bloor et al., 'PC-Gym: Benchmark Environments For Process Control Problems', Dec. 05, 2024, arXiv:2410.22093.
- T. Savage, D. Zhang, M. Mowbray, and E. A. D. Río Chanona, 'Model-free safe reinforcement learning for chemical processes using Gaussian processes', IFAC-Pap., vol. 54, no. 3, pp. 504-509, 2021, https://doi.org/10.1016/j.ifacol.2021.08.292

