
Data Augmentation Scheme for Raman Spectra with Highly Correlated Annotations

Authors: Christoph Lange, Isabel Thiele, Lara Santolin, Sebastian L. Riedel, Maxim Borisyak, Peter Neubauer, M. Nicolas Cruz Bournazou

Abstract: In biotechnology, Raman spectroscopy is rapidly gaining popularity as a process analytical technology (PAT) that measures cell densities as well as substrate and product concentrations. As it records vibrational modes of molecules, it provides that information non-invasively in a single spectrum. Typically, partial least squares (PLS) is the model of choice to infer information about variables of interest from the spectra. However, biological processes are known for their complexity, where convolutional neural networks (CNN) present a powerful alternative. They can handle non-Gaussian noise and account for beam misalignment, pixel malfunctions or the presence of additional substances. However, they require a lot of data during model training, and they pick up non-linear dependencies in the process variables. In this work, we exploit the additive nature of spectra in order to generate additional data points from a given dataset that have statistically independent labels so that a network trained on such data exhibits low correlations between the model predictions. We show that training a CNN on these generated data points improves the performance on datasets where the annotations do not bear the same correlation as the dataset that was used for model training. This data augmentation technique enables us to reuse spectra as training data for new contexts that exhibit different correlations. The additional data allows for building a better and more robust model.
This is of interest in scenarios where large amounts of historical data are available but are currently not used for model training. We demonstrate the capabilities of the proposed method using synthetic spectra of Ralstonia eutropha batch cultivations to monitor substrate, biomass and polyhydroxyalkanoate (PHA) biopolymer concentrations during the experiments.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2312.03427 [pdf, other]

Latent State Space Extension for interpretable hybrid mechanistic models

Authors: Judit Aizpuru, Maxim Borisyak, Peter Neubauer, M. Nicolas Cruz Bournazou

Abstract: Mechanistic growth models play a major role in bioprocess engineering, design, and control. Their reasonable predictive power and their high level of interpretability make them an essential tool for computer aided engineering methods. Additionally, since they contain knowledge about cell physiology, the parameter estimates provide meaningful insights into the metabolism of the microorganism under study. However, the assumption of time invariance of the model parameters is often violated in real experiments, limiting their capacity to fully explain the observed dynamics. In this work, we propose a framework for identifying such violations and producing insights into misspecified mechanisms. The framework achieves this by allowing kinetic and process parameters to vary in time. We demonstrate the framework's capabilities by fitting a hybrid model based on a simple mechanistic growth model for E. coli with data generated in-silico by a much more complex one and identifying missing kinetics.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.00847 [pdf, other]

Handling nonlinearities and uncertainties of fed-batch cultivations with difference of convex functions tube MPC

Authors: Niels Krausch, Martin Doff-Sotta, Mark Canon, Peter Neubauer, Mariano Nicolas Cruz Bournazou

Abstract: Bioprocesses are often characterized by nonlinear and uncertain dynamics. This poses particular challenges in the context of model predictive control (MPC). Several approaches have been proposed to solve this problem, such as robust or stochastic MPC, but they can be computationally expensive when the system is nonlinear. Recent advances in optimal control theory have shown that concepts from convex optimization, tube-based MPC, and difference of convex functions (DC) enable stable and robust online process control. The approach is based on systematic DC decompositions of the dynamics and successive linearizations around feasible trajectories. By convexity, the linearization errors can be bounded tightly and treated as bounded disturbances in a robust tube-based MPC framework. However, finding the DC composition can be a difficult task. To overcome this problem, we used a neural network with special convex structure to learn the dynamics in DC form and express the uncertainty sets using simplices to maximize the product formation rate of a cultivation with uncertain substrate concentration in the feed. The results show that this is a promising approach for computationally tractable data-driven robust MPC of bioprocesses.

Submitted 7 December, 2023; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: Corrected typos in equation

arXiv:2203.07211 [pdf, other]

Model predictive control and moving horizon estimation for adaptive optimal bolus feeding in high-throughput cultivation of E. coli

Authors: Jong Woo Kim, Niels Krausch, Judit Aizpuru, Tilman Barz, Sergio Lucia, Peter Neubauer, Mariano Nicolas Cruz Bournazou

Abstract: We discuss the application of a nonlinear model predictive control (MPC) and a moving horizon estimation (MHE) to achieve an optimal operation of E. coli fed-batch cultivations with intermittent bolus feeding. 24 parallel experiments were considered in a high-throughput microbioreactor platform at a 10 mL scale. The robotic island in question can run up to 48 fed-batch processes in parallel with automated liquid handling and online and at-line analytics. The implementation of the model-based monitoring and control framework reveals that there are mainly three challenges that need to be addressed: first, the inputs are given in an instantaneous pulsed form by bolus injections; second, online and at-line measurement frequencies are severely imbalanced; and third, optimization for the distinctive multiple reactors can be either parallelized or integrated. We address these challenges by incorporating the concept of impulsive control systems, formulating multi-rate MHE with identifiability analysis, and suggesting criteria for deciding the reactor configuration. In this study, we present the key elements and background theory of the implementation with in silico simulations for bacterial fed-batch cultivation.

Submitted 6 February, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

arXiv:2112.13283 [pdf, other]

Fitting nonlinear models to continuous oxygen data with oscillatory signal variations via a loss based on Dynamic Time Warping

Authors: Judit Aizpuru, Annina Karolin Kemmer, Jong Woo Kim, Stefan Born, Peter Neubauer, Mariano N. Cruz Bournazou, Tilman Barz

Abstract: High throughput experimental systems play an important role in bioprocess development, as they provide an efficient way of analysing different experimental conditions and performing strain discrimination in phases prior to industrial-scale production. In the millilitre scale, these systems are combinations of parallel mini-bioreactors, liquid handling robots and automated workflows for data handling and model based operation. For successfully monitoring cultivation conditions and improving the overall process quality by model-based approaches, a proper model identification is crucial. However, the quality and amount of measurements make this task challenging considering the complexity of the bioprocesses. The Dissolved Oxygen Tension is often the only measurement which is available online, and therefore, a good understanding of the errors in this signal is important for performing a robust estimation. Some of the expected errors will provoke uncertainties in the time-domain of the measurement, and in those cases, the common Weighted Least Squares estimation procedure can fail to provide good results. Moreover, these errors will have an even larger effect in the fed-batch phase where bolus feeding is applied, as this generates fast dynamic responses in the signal. In the present work, the performance of the Weighted Least Squares estimator is analysed in an in silico study when the expected time-uncertainties are present in the oxygen signal. As an alternative, a loss based on the Dynamic Time Warping measure is proposed.
The results show how this latter procedure outperforms the former in reconstructing the oxygen signal and, in addition, returns less biased parameter estimates.
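
To illustrate why a time-warping loss can tolerate timing errors that a pointwise squared-error loss penalizes, here is a minimal sketch of the textbook Dynamic Time Warping distance. It is not the loss used in the paper, whose exact form the abstract does not give; the function names and the toy signals are assumptions made for illustration only.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def squared_error(a, b):
    """Pointwise sum of squared differences (unweighted least squares)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical toy signals: the same pulse, delayed by one sample.
signal = [0, 0, 1, 2, 1, 0, 0]
shifted = [0, 0, 0, 1, 2, 1, 0]

dtw_value = dtw_distance(signal, shifted)
sse_value = squared_error(signal, shifted)
```

In this toy case the warping path absorbs the one-sample delay, while the pointwise error treats the delay as a mismatch in amplitude.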

Submitted 25 December, 2021; originally announced December 2021.

arXiv:2112.10548 [pdf, other]

Model predictive control guided with optimal experimental design for pulse-based parallel cultivation

Authors: Jong Woo Kim, Niels Krausch, Judit Aizpuru, Tilman Barz, Sergio Lucia, Ernesto C. Martínez, Peter Neubauer, Mariano Nicolas Cruz Bournazou

Abstract: Optimal experimental design for parameter precision attempts to maximize the information content in experimental data for the most effective identification of a parametric model. With the recent developments in miniaturization and parallelization of cultivation platforms for high-throughput screening of optimal growth conditions, massive amounts of informative data can be generated with few experiments. Increasing the quantity of the data means increasing the number of parameters and experimental design variables, which might deteriorate the identifiability and hamper the online computation of optimal inputs. To reduce the problem complexity, in this work, we introduce an auxiliary controller at a lower level that tracks the optimal feeding strategy computed by a high-level optimizer in an online fashion. The hierarchical framework is especially interesting for the operation under constraints. The key aspects of this method are discussed together with an in silico study considering parallel glucose limited bacterial fed batch cultivations.

Submitted 20 December, 2021; originally announced December 2021.

Comments: 6 pages, 4 figures, submitted to IFAC Conference

arXiv:2011.13863 [pdf, other]

doi 10.1002/bit.27907

Knowledge transfer across cell lines using Hybrid Gaussian Process models with entity embedding vectors

Authors: Clemens Hutter, Moritz von Stosch, Mariano Nicolas Cruz Bournazou, Alessandro Butté

Abstract: To date, a large number of experiments are performed to develop a biochemical process. The generated data is used only once, to take decisions for development. If we could exploit data of already developed processes to make predictions for a novel process, we could significantly reduce the number of experiments needed. Processes for different products exhibit differences in behaviour; typically only a subset behaves similarly. Therefore, effective learning on process data spanning multiple products requires a sensible representation of the product identity. We propose to represent the product identity (a categorical feature) by embedding vectors that serve as input to a Gaussian Process regression model. We demonstrate how the embedding vectors can be learned from process data and show that they capture an interpretable notion of product similarity. The improvement in performance is compared to traditional one-hot encoding on a simulated cross-product learning task. All in all, the proposed method could enable significant reductions in wet-lab experiments.
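
The contrast between one-hot encoding and embedding vectors as Gaussian Process inputs can be sketched as follows. This is only an illustration under stated assumptions: the RBF kernel, the product names, the 2-D embedding values and the helper `gp_input` are all hypothetical, and the embeddings are fixed by hand here rather than learned from process data as the abstract describes.

```python
import math


def rbf_kernel(x, y, length_scale=1.0):
    """Squared-exponential kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


# Hypothetical product encodings (assumed values, not taken from the paper).
one_hot = {"A": [1, 0, 0], "B": [0, 1, 0], "C": [0, 0, 1]}
embeddings = {"A": [0.0, 0.1], "B": [0.1, 0.0], "C": [2.0, 2.0]}


def gp_input(process_features, product, table):
    """Concatenate process features with the encoding of the product identity."""
    return list(process_features) + list(table[product])


x = [0.5]  # one shared process feature, e.g. a scaled feed rate (assumption)

# One-hot: every pair of distinct products is equally dissimilar.
k_ab_onehot = rbf_kernel(gp_input(x, "A", one_hot), gp_input(x, "B", one_hot))
k_ac_onehot = rbf_kernel(gp_input(x, "A", one_hot), gp_input(x, "C", one_hot))

# Embeddings: products A and B sit close together, C lies far away.
k_ab_embed = rbf_kernel(gp_input(x, "A", embeddings), gp_input(x, "B", embeddings))
k_ac_embed = rbf_kernel(gp_input(x, "A", embeddings), gp_input(x, "C", embeddings))
```

With one-hot inputs the kernel cannot express that two products behave alike, whereas nearby embedding vectors yield a higher covariance and so allow data from one product to inform predictions for the other.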

Submitted 27 November, 2020; originally announced November 2020.

Comments: 14 pages, 5 figures

Showing 1–7 of 7 results for author: Bournazou, M N C