Model | WageningenUR

Model Overview

Models are essential tools for hypothesis testing, data interpretation, and gaining insights into various biological processes. In order to demonstrate the complex behavior of our engineered strain of Pseudomonas fluorescens in the soil and its interactions with plants, we built three models. The agent-based models and the genetic circuit model using ordinary differential equations (ODE) were made to deeply explore the growth and metabolism of our engineered P. fluorescens at both population and cell levels. A machine learning model was developed to predict plant responses to weather changes to assure a more precise application of our product.

Why Agent-Based Models?

Predicting how genetically engineered P. fluorescens behaves in soil is complicated due to intricate microbial interactions and fluctuating environments. This is not an easy problem to solve for deterministic models, as the actions of a single bacteria/colony might lead to unknown collective behaviors. With several simple rules for bacteria to follow, agent-based models can explore complex biological systems on multiple levels [1], including entity behaviors, microbial interactions and environmental effects (Figure 1).

Figure 1. General scheme of agent-based modelling. Inputs from internal or external factors and stochastic choices made by agents eventually form the outputs of the model.

Aim of the Model

To predict more precisely the behaviors of our genetically engineered P. fluorescens in the soil, we have built two sub-models: a Microbial Community model to simulate how our bacteria move and colonise on the root; and a Cell Level model to explore how the protein secretion system would work in cells with genetic circuits. Addtionally, we intertwined two sub-models to explore cell-level effects on antiflorigen protein secretion across populations, in pursuit of the ideal product application. Eventually, we are trying to give suggestions for both experimental design and product development. Here, we are using NetLogo [2] as an agent-based modeling platform to build our model and conduct simulations; this also allows us to even track the spatial movements of the entities during the simulations.

What We Learned?

From our agent-based models, we got new insights into the behaviours of our bacteria in the soil. This helps us in both experiments and product development. Moreover, we also integrated the results of our agent-based models to the Applied Design, giving suggestions for the product design and precisely application methods. Furthermore, through the analysis of large amounts of data from the agent-based models, we successfully captured a series of real-life biological patterns. This proves that agent-based modelling frameworks can be potentially powerful tools for the complex biological systems.

Microbial Community Model

In the Microbial Community model, we explore the bacteria behaviours at population level in the soil. The agents in this sub-model are natural P. fluorescens and our modified P. fluorescens. Those agents will follow several simple rules (see below) to act in a 20 x 20 cm closed field in the soil. The output of this sub-model is colonisation efficiency of our modified P. fluorescens on roots (i.e., the percentage of population of our bacteria on the root).

Rules and kinetic models for Microbial Community model

Rules:

1. The soil contains a carbon source [3,4], which can be used by bacteria in the soil by the Monod equation.

2. The bacteria slowly move with water and are attracted by root exudates [5], where the root exudates have a horizontal concentration gradient from the root region to the proximate soil field [6].

3. The changing of water content of the soil will change the death rate of bacteria in the soil [7,8].

4. An attachment region (gray color) will be set for bacteria to determine whether they can colonise on the root or not, based on whether the root surface is fully occupied or not.

5. When the colonies of natural P. fluorescens and modified P. fluorescens meet close enough, they will start to compete through type VI secretion systems [9].

6. Plant roots constitutively secrete root exudates, which can be used by bacteria. Thus, we assume the concentration of carbon source is constant and big enough. The behavior of bacteria will not be affected by carbon sources when they colonise on the root.

Kinetic models:

Carbon source flux in the soil:

$katex$\dfrac{dS}{dt} = k- \dfrac{k_{s}S}{1+S} $katex$

where k is the carbon source accumulation rate in the simulated field, calculated from annual carbon source accumulation in the soil (Midwood et al., 2021). ks is carbon source uptake rate of P. fluorescens. S represents the concentration of carbon source.

Bacteria population growth:

$katex$\dfrac{dN}{dt} = \mu (1-\dfrac{N}{K})N - k_{d}N $katex$

where N is the population of bacteria. μ is the cell growth rate. K is carrying capacity of cell populations, which means the largest number of populations that a colony can reach. kd is death rate of bacteria in the soil.

Bacteria growth rate:

$katex$\mu = \mu_{max} (\dfrac{\phantom{bb}S}{1+S})$katex$

where μmax is the maximum cell growth rate of the P. fluorescens.

Microbial competition:

$katex$\dfrac{dNP}{\phantom{b}dt}=-updating\_competition\_deathrate \cdot NP$katex$

$katex$\dfrac{dMP}{\phantom{b}dt}=-normal\_competition\_deathrate \cdot MP$katex$

where NP is the population of natural P. fluorescens, while MP is the population of our genetically modified P. fluorescens.

You can also enjoy the non-embedded Microbial Community model here in a separate page.

Run the Microbial Community model below!

For smaller screens this content is not embedded. Instead go and enjoy the microbial community model here.

Model Validation

graph modified P. fluorescens colonisation distribution on roots

Figure 2. Spatial distribution of colonisation of modified P. fluorescens on roots. (a) Colonisation distribution of our bacteria on roots with smaller root size. (b) Colonisation distribution of our bacteria on roots with bigger root size. (c) Part of experimental results from previous research on root colonisation of P. fluorescens [11]. (d) Our experimental measurement of spatial distribution of colonisation of modified P. fluorescens on the root of Arabidopsis thaliana (Discover more in our WetLab page).

We modelled the spatial distribution of the modified P. fluorescens on the root by adding the population of our bacteria for each root region from 30 simulations; the darker colours are where more colonies are likely to be found (Figure 2a and 2b). Given the comparable root lengths between wheat and cherry trees, we utilized wheat data for a comprehensive comparison (Figure 2c). We can observe that our bacteria will colonise mostly in the first 6 cm underground on root, and the colonisations are showing a decreasing gradient from top layer to deeper layer of the roots in the soil (Figure 2a and 2b), as shown in the wheat root (Figure 2c). Moreover, we experimentally measured the colonisation patterns of our bacteria on the root of Arabidopsis thaliana. Our model's findings corresponded with the observed data from the wet lab (Figure 2c). This means the model can capture the essential aspects of the real-world system.

Colonisation Dynamics

Figure 3. Example plots of colonisation dynamics of P. fluorescens on the root. 4 graphs show 4 different parameter conditions to the system. (a) Water-content-of-soil: 14%; root-cc: 200000; competition-death-rate: 0.014 /h; max-cell-growth-rate: 0.4 /h. (b) Water-content-of-soil: 30%; root-cc: 200000; competition-death-rate: 0.014 /h; max-cell-growth-rate: 0.5 /h. (c) Water-content-of-soil: 22%; root-cc: 400000; competition-death-rate: 0.25 /h; max-cell-growth-rate: 0.5 /h. (d) Water-content-of-soil: 22%; root-cc: 500000; competition-death-rate: 0.014 /h; max-cell-growth-rate: 0.5 /h. The blue line shows the population dynamics of the natural P. fluorescens with error bars. The orange line shows the population dynamics of the modified P. fluorescens with error bars. The red line is the standard line of 50% of colonisation efficiency for comparing the dynamic patterns.

We conducted a parameter sweep, exploring combinations of parameter values through an extensive series of simulations. Here we show four example plots from different parameter conditions with 30 repetitions for each condition, which leads to different colonisation dynamic patterns (Figure 3). Within the simulation time, the population of natural P. fluorescens and modified P. fluorescens will reach steady states, or constant levels, after around four to nine days. This means if we want to apply our product to the orchard, it would be better to apply it at least ten days earlier to make sure our product works for the plants (see Model Suggestions below).

Predictability of the Microbial Community Model

graph Random Forest regression for Microbial community model

Figure 4. Random Forest regression for Microbial Community model. The blue dots are the output values (i.e., colonisation efficiency) from both trained data and predicted data. The red line is the regression line of the data points.

Subsequently, we used random forest regression to test the predictability of the Microbial Community model. Despite being a stochastic model, the mean squared error is low and the R-squared score is over 0.99 (Figure 4), which means the behavior of our bacteria would be relatively controlable at the population level in the soil.

Cell Level Model

graph inducer systems of modified P. fluorescens

Figure 5. General scheme of the genetic circuits.

The biosensor system comprises two genetic circuits: one activated by a specific root exudate – salicylic acid (click here to find out the details of screening of root exudates in weblab page) and the other by N-Acyl-l-homoserine lactone (AHL) (Figure 5). AHL is continually produced but functional to the cells only above a certain concentration threshold. Consequently, the target protein is synthesized only in the presence of salicylic acid and a sufficiently bacterial population.

In the Cell Level model, we simplify several genetic circuits into only one, and the production of trigger RNAs and AHL is expressed in simpler ways. In this sub-model, we assume that AHL concentration is always high enough to activate the protein secretion. The agents of this sub-model are the molecules in the cell (e.g., RNA polymerase, proteins, etc.). The setting of the strength of promoter, ribosome binding site, and protein degradation rate are mostly set to default following an example model in NetLogo [12]. This sub-model will follow several rules (see below) to perform transcription, translation, etc., exploring the protein secretion of cells with artificial genetic circuits.

Rules for Cell Level model:

1. The promoter strength will affect the ability of attracting RNA polymerase to the DNA.

2. The ribosome binding site strength will affect the number of proteins that each transcription and translation can produce.

3. Only when trigger RNA binds to the RNA switch, the DNA will start transcription and translation.

4. When target gene is not activated, there will be a leaky expression of the desired protein.

You can also enjoy the non-embedded Cell Level model here in a separate page.

Run the Cell Level model below!.

For smaller screens this content is not embedded. Instead go and enjoy the cell level model here.

Model Validation

Figure 6. Comparison of Cell Level model’s results and lab data. “R + AHL” represents the condition that both root exudate salicylic acid and AHL occur in the system. “Leaky Expression” represents the condition that the target gene is not activated in the system. The blue and red lines are experimental measured data. The orange and black lines are model simulations with error bars.

We compared our model with our own generated lab data; the model’s results were collected after 30 times repetitions of simulation (orange and black regions) and shows that the results from the Cell Level model fits the data generated from lab (blue and red lines) (Figure 6). This means the Cell Level model can explain the protein secretion system in the cells.

Predictability of the Cell Level Model

graph Random Forest regression for Cell Level model

Figure 7. Random Forest regression for Cell Level model. The blue dots are the output values from both trained data and predicted data. The red line is the regression line of the data points.

Unlike the Microbial Communities model, the Cell Level model shows more variability due to random movements of molecules in the cells, making the protein secretion more unpredictable based on the parameters. This is reflected by the larger mean squared error and lower R-squared value (Figure 7). We hypothesise that this is because the dynamics of protein production could be different even though the basic trend of protein secretion is like the real data.

Model Interaction

We used a built-in tool LevelSpace [13] in NetLogo to conduct model interactions. The output of the interactive model is the total protein secretion level in the system.

Rule and kinetic model for the interactive model:

Rule:

The cell level model will be activated only when the AHL concentration for colonies are above the activation threshold [14].

Kinetic model:

AHL production model follows the equation from previous research [15]:

$katex$\dfrac{dA}{\phantom{.}dt}= - \gamma A + \dfrac{\alpha + \beta A^{n}}{A_{threshold}+A^{n}}\cdot{} N(t)$katex$

where α is constitutive production rate of AHL. β is induced production rate of AHL. n is the degree of polymerization. Athreshold represents the threshold of AHL between low and increased activity. γ is the degradation rate of AHL. N(t) is the population of the bacteria, changing with time.

Watch the video to see how the interactive model works:

Video 1: An example simulation of the interactive model. The built-in tool - LevelSpace does not support to run in the browers. So, here we show a video of the model interaction.

If the video is not loading, watch it here.

Predictability of the Interactive Model

graph Random Forest regression for the interactive model

Figure 8. Random Forest regression of the interactive model. The blue dots are the output values from both trained data and predicted data. The red line is the regression line of the data points.

We integrated the protein secretion level of individual cells with the population of colonised genetically modified P. fluorescens, thereby encompassing both colonisation efficiency and protein secretion magnitude. Perturbations from the Cell Level model affects the predictability of the interactive model, which has R-squared score of 0.46 (Figure 8) compared to 0.99 (Figure 4) before. Most of the errors likely come from stochasticity of the sub-models.

Sensitivity Analysis

Sensitivity analysis is crucial as it reveals the impact of parameter variations on mathematical models. It identifies key parameters, guiding experimental focus and resource allocation. By pinpointing influential factors, it enhances model accuracy, aiding in decision-making and advancing scientific understanding.

SHapley Additive exPlanations (SHAP)

SHAP is a data analysis method that explains how each feature impacts the model's output. It compares the model’s output for all possible variable combinations both with and without one of the variables [16]. Here, we use SHAP values to conduct sensitivity analysis at both population and cell levels.

graph Sensitivity analysis of ABM with SHAP

Figure 9. Sensitivity analysis for the agent-based model. (a) and (c) demonstrate the SHAP analysis of variables in population level and cell level, respectively. (b) and (d) demonstrate the feature importance of each variable by calculating the mean absolute SHAP values.

Root size (“root-cc”) is the most sensitive parameter in the interactive model. This is likely because bigger roots have larger carrying capacity for bacteria. The second most sensitive parameter is competition death rate, which significantly influences total protein secretion. Therefore, enhancing bacterial competition ability isn't necessary, allowing us to control costs effectively. We also discovered that properly increasing maximum cell growth rate and water content of soil would also increase the total protein secretion level (Figure 9a and 9b). Those findings provide deeper insights into product development and application strategies as will be discussed in model suggestions.

The protein degradation rate is the most sensitive parameter in the cell level model, which means that the stability of our target protein is vital for the protein secretion (Figure 9c and 9d). Our results suggest that it is advantageous to perform laboratory evolution for the enzyme of our bacteria to increase the stability of the desired protein in the future.

Model Suggestions

Using a reported standard curve of relationship between sGFP fluorescence and sGFP protein concentration [17], we replot a calibration curve to estimate the range of total protein concentration of the final system (Figure 10).

Figure 10. sGFP standard curve. The blue dots are the values of sGFP fluorescence related to the sGFP concentrations [17]. The orange line is linear regression curve of these data points.

Based on the data generated from different combinations of parameter values (Figure 11), the result shows that the range of total protein concentration after 2 weeks’ simulation can be at least 33.3 mg/ml, and up to 6510 mg/ml in optimized condition.

graph all data points of interactive model's output

Figure 11. Visualization of all data points of the interactive model’s output. The green dot is the model output with the highest protein secretion level. The red dot is the model output with the lowest protein secretion level. The orange and purple dots are the model outputs with fixed condition, except 14% water content of the soil and 30% water content of the soil respectively. The blue dots are other data points of the model outputs.

The introduction of florigen in Arabidopsis necessitates concentrations ranging from tens to hundreds of micromoles [18]. In contrast, the maintenance of flowering balance typically requires relatively small amounts of antiflorigen factors [19]. So, we provide far more than enough antiflorigen protein to restore the flowering of plants. Therefore, we can have a lower proportion of our bacteria in the product to control the cost.

The outcome demonstrates our product's effectiveness in controlling plant flowering. If necessary, enhancing total protein secretion can be achieved by improving bacterial growth or protein stability. Additionally, maintaining well-hydrated soil is advised for better product performance in agricultural applications. Although our model says we need to know temperature four days in advance, given we are overproducing antiflorigen protein, we believe that earlier application would also be beneficial.

Parameters	Value (range)	Unit	Description	Reference
root-cc	100000~500000	CFU/(g soil)	The largest population of bacteria that a root can contain (root carrying capacity).	(Soil Biology, n.d.)
max-cell-growth-rate (μ_max)	0.35 ~ 0.7	h-1	The maximum growth rate of bacteria.	(Caldwell & Lawrence, 1986)
updating-competition-death-rate	0.014 or 0.25	h-1	Death rate of natural P. fluorescens introduced by competition from modified P. fluorescens	(Decoin et al., 2015)
normal-competition-death-rate	0.014	h-1	Death rate of modified P. fluorescens introduced by competition from natural P. fluorescens	(Decoin et al., 2015)
water-content-of-soil	14 ~ 30	%	The percentage of water content in a field of soil	(Liao et al., 2019)
max-initial-population-per-colony	500 ~ 5000	CFU/(g soil)	The maximum population that one colony can have in the beginning. This will affect the initial total population of our bacteria in the soil	Assumption
ahl-gene-expression-level	1 ~ 5	-	The gene expression level of ahl gene	Assumption
promoter-strength	“weak”, “normal”, “strong”	-	The strength of promoter of the DNA	(Dabholkar et al., n.d.)
rbs-strength	“weak”, “normal”, “strong”	-	The strength of ribosome binding site of the DNA	(Dabholkar et al., n.d.)
trigger-rna-expression-level	“low”, “middle”, “high”	-	The expression level of trigger RNA	Assumption
trigger-rna-bond-leakage	0 ~ 1	-	The rate that the trigger RNA dissociates from the binding site (e.g., RNA switch)	(Dabholkar et al., n.d.)
target-protein-degradation-chance	0 ~ 1	-	The degradation chance of the secreted proteins	(Dabholkar et al., n.d.)
k	0.618	mM/h	Constitutive carbon source accumulation rate	(Midwood et al., 2021)
ks	8.8	h-1	Carbon source uptake rate of P. fluorescens	(Roca & Olsson, 2001)
kd	0.013 ~ 0.025	h-1	Cell death rate	(Liao et al., 2019)
α	2.3 x 10-19	mol per cell h-1	Constitutive production rate of AHL	(Fekete et al., 2010)
β	2.3 x 10-18	mol per cell h-1	Induced production rate of AHL	(Fekete et al., 2010)
n	2.5	-	Degree of polymerization	(Fekete et al., 2010)
A_threshold	70	nM	Threshold of AHL between low and increased activity	(Fekete et al., 2010)
γ	0.005545	h-1	Degradation rate of AHL	(Fekete et al., 2010)