Why Agent-Based Models?
Predicting how genetically engineered P. fluorescens behaves in soil is difficult because of intricate microbial interactions and fluctuating environments. Deterministic models struggle with this problem, as the actions of a single bacterium or colony can lead to unexpected collective behaviours. By giving each bacterium a few simple rules to follow, agent-based models can explore complex biological systems on multiple levels [1], including entity behaviours, microbial interactions and environmental effects (Figure 1).

Figure 1. General scheme of agent-based modelling. Inputs from internal or external factors and stochastic choices made by agents eventually form the outputs of the model.
Aim of the Model
To predict the behaviour of our genetically engineered P. fluorescens in the soil more precisely, we built two sub-models: a Microbial Community model to simulate how our bacteria move and colonise the root, and a Cell Level model to explore how the protein secretion system works in cells carrying the genetic circuits. Additionally, we coupled the two sub-models to explore cell-level effects on antiflorigen protein secretion across populations, in pursuit of the ideal product application. Ultimately, we aim to give suggestions for both experimental design and product development. We use NetLogo [2] as the agent-based modelling platform to build our models and run simulations; it also lets us track the spatial movements of the entities during the simulations.
What We Learned
From our agent-based models, we gained new insights into the behaviour of our bacteria in the soil, which help us in both experiments and product development. We also integrated the results of our agent-based models into the Applied Design, giving suggestions for the product design and precise application methods. Furthermore, by analysing large amounts of data from the agent-based models, we captured a series of real-life biological patterns. This suggests that agent-based modelling frameworks can be powerful tools for studying complex biological systems.
Microbial Community Model
In the Microbial Community model, we explore bacterial behaviour at the population level in the soil. The agents in this sub-model are natural P. fluorescens and our modified P. fluorescens. These agents follow several simple rules (see below) within a 20 x 20 cm closed field in the soil. The output of this sub-model is the colonisation efficiency of our modified P. fluorescens on roots (i.e., the percentage of population of our bacteria on the root).
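The output metric can be written as a one-line calculation. The sketch below is only an illustration: it assumes that colonisation efficiency is the share of modified cells among all P. fluorescens (natural plus modified) found on the root, which is one reading of the definition above. The function and argument names are hypothetical and are not taken from the NetLogo model.

```python
def colonisation_efficiency(modified_on_root: int, natural_on_root: int) -> float:
    """Percentage of the root-associated population that is the modified strain.

    Assumption: the denominator is the total root population (natural + modified).
    """
    total = modified_on_root + natural_on_root
    if total == 0:
        # No bacteria on the root at all: report 0 % rather than dividing by zero.
        return 0.0
    return 100.0 * modified_on_root / total


# Made-up counts, for illustration only.
example = colonisation_efficiency(modified_on_root=150, natural_on_root=50)
print(example)  # 75.0
```

Under this reading, a value of 50% means the modified and natural strains share the root equally.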
You can also enjoy the non-embedded Microbial Community model here in a separate page.
Model Validation

Figure 2. Spatial distribution of colonisation of modified P. fluorescens on roots. (a) Colonisation distribution of our bacteria on roots with smaller root size. (b) Colonisation distribution of our bacteria on roots with bigger root size. (c) Part of experimental results from previous research on root colonisation of P. fluorescens [11]. (d) Our experimental measurement of spatial distribution of colonisation of modified P. fluorescens on the root of Arabidopsis thaliana (Discover more in our WetLab page).
We modelled the spatial distribution of the modified P. fluorescens on the root by summing the population of our bacteria in each root region over 30 simulations; darker colours mark where more colonies are likely to be found (Figure 2a and 2b). Given the comparable root lengths of wheat and cherry trees, we used wheat data for comparison (Figure 2c). Our bacteria colonise mostly the first 6 cm of root below ground, and colonisation decreases from the top layer to the deeper layers of the root in the soil (Figure 2a and 2b), as also seen in the wheat root (Figure 2c). Moreover, we experimentally measured the colonisation patterns of our bacteria on the root of Arabidopsis thaliana, and the model's findings corresponded with these wet-lab observations (Figure 2d). This suggests the model captures the essential aspects of the real-world system.
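The aggregation step described above, adding up the population in each root region across repeated simulations, can be sketched as follows. This is a minimal illustration, assuming each simulation reports one count per root region in a fixed top-to-bottom order. The function name and the toy counts are hypothetical and do not come from the model output.

```python
from typing import List


def aggregate_root_counts(runs: List[List[int]]) -> List[int]:
    """Sum the per-region population counts over all simulation runs."""
    if not runs:
        return []
    n_regions = len(runs[0])
    if any(len(run) != n_regions for run in runs):
        raise ValueError("every run must report the same number of root regions")
    # zip(*runs) groups the counts for region 0, region 1, ... across runs.
    return [sum(region_counts) for region_counts in zip(*runs)]


# Three toy runs with four root regions each (top of the root first).
runs = [
    [5, 3, 1, 0],
    [4, 2, 2, 1],
    [6, 3, 0, 0],
]
totals = aggregate_root_counts(runs)
print(totals)  # [15, 8, 3, 1]
```

The summed counts can then be mapped to a colour scale, with larger totals drawn darker.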
Colonisation Dynamics

Figure 3. Example plots of colonisation dynamics of P. fluorescens on the root. The four graphs show four different parameter conditions. (a) Water-content-of-soil: 14%; root-cc: 200000; competition-death-rate: 0.014 /h; max-cell-growth-rate: 0.4 /h. (b) Water-content-of-soil: 30%; root-cc: 200000; competition-death-rate: 0.014 /h; max-cell-growth-rate: 0.5 /h. (c) Water-content-of-soil: 22%; root-cc: 400000; competition-death-rate: 0.25 /h; max-cell-growth-rate: 0.5 /h. (d) Water-content-of-soil: 22%; root-cc: 500000; competition-death-rate: 0.014 /h; max-cell-growth-rate: 0.5 /h. The blue line shows the population dynamics of the natural P. fluorescens and the orange line those of the modified P. fluorescens, both with error bars. The red line marks 50% colonisation efficiency as a reference for comparing the dynamic patterns.
We conducted a parameter sweep, exploring combinations of parameter values through an extensive series of simulations. Here we show four example plots from different parameter conditions, with 30 repetitions for each condition, which lead to different colonisation dynamics (Figure 3). Within the simulation time, the populations of natural and modified P. fluorescens reach steady states, or constant levels, after around four to nine days. This means that if we want to apply our product in the orchard, it would be better to apply it at least ten days before its effect is needed, so that the population has time to establish on the roots (see Model Suggestions below).
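The structure of such a sweep, every combination of parameter values run 30 times and then summarised, can be sketched in a few lines. This is not the team's NetLogo setup. `placeholder_run` is a hypothetical stand-in that ignores its parameters and returns noise, and the grid simply reuses the values quoted in the Figure 3 caption, since the actual sweep ranges are not stated here.

```python
import itertools
import random
import statistics


def sweep(grid, run, repetitions=30):
    """Call run(params, seed) for every combination in grid, `repetitions` times each.

    Returns {tuple of parameter values: (mean output, sample standard deviation)}.
    """
    names = list(grid)
    results = {}
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        outputs = [run(params, seed) for seed in range(repetitions)]
        results[values] = (statistics.mean(outputs), statistics.stdev(outputs))
    return results


def placeholder_run(params, seed):
    """Hypothetical stand-in for one simulation run; NOT the real model.

    It ignores `params` and returns random noise around 50 (percent).
    """
    rng = random.Random(seed)
    return 50.0 + rng.gauss(0.0, 1.0)


# Illustrative grid only: values taken from the Figure 3 caption.
grid = {
    "water-content-of-soil": [14, 22, 30],
    "root-cc": [200000, 400000, 500000],
    "competition-death-rate": [0.014, 0.25],
    "max-cell-growth-rate": [0.4, 0.5],
}
results = sweep(grid, placeholder_run)
print(len(results))  # 36 parameter combinations
```

Seeding each repetition separately keeps the runs reproducible while still sampling the stochastic behaviour, and the per-condition mean and standard deviation are the kind of summary that error bars are usually drawn from.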
Predictability of the Microbial Community Model

Figure 4. Random Forest regression for Microbial Community model. The blue dots are the output values (i.e., colonisation efficiency) from both training data and predicted data. The red line is the regression line of the data points.
Subsequently, we used random forest regression to test the predictability of the Microbial Community model. Although the model is stochastic, the mean squared error is low and the R-squared score is over 0.99 (Figure 4), which suggests that the behaviour of our bacteria would be relatively controllable at the population level in the soil.
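For readers unfamiliar with the two scores, the sketch below shows the standard definitions of mean squared error and R-squared, computed from simulated outputs and the values a regressor predicts for them. It does not reproduce the random forest itself, and the numbers are made up purely to show the arithmetic.

```python
from typing import Sequence


def mean_squared_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average squared difference between observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up colonisation efficiencies (%) and predictions, for illustration only.
observed = [40.0, 50.0, 60.0, 70.0]
predicted = [41.0, 49.0, 61.0, 69.0]
mse = mean_squared_error(observed, predicted)
r2 = r_squared(observed, predicted)
print(mse, round(r2, 3))  # 1.0 0.992
```

An R-squared close to 1 means the predictions account for almost all of the variation in the simulated output, which is the sense in which the model is described as predictable.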
Cell Level Model

Figure 5. General scheme of the genetic circuits.
The biosensor system comprises two genetic circuits: one activated by a specific root exudate, salicylic acid (see the WetLab page for details of the root exudate screening), and the other by N-acyl-L-homoserine lactone (AHL) (Figure 5). AHL is continually produced but only acts on the cells above a certain concentration threshold. Consequently, the target protein is synthesized only in the presence of salicylic acid and a sufficiently large bacterial population.
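The resulting logic is an AND gate: both inputs must be present for the target protein to be produced. The sketch below captures only that logic, not the kinetics of the circuits. The function name, the arbitrary concentration units and the threshold value are hypothetical.

```python
def target_protein_expressed(
    salicylic_acid_present: bool,
    ahl_concentration: float,
    ahl_threshold: float,
) -> bool:
    """AND gate: expression needs salicylic acid AND AHL above its threshold."""
    quorum_reached = ahl_concentration > ahl_threshold
    return salicylic_acid_present and quorum_reached


# Hypothetical threshold in arbitrary units, for illustration only.
THRESHOLD = 1.0

print(target_protein_expressed(True, 2.0, THRESHOLD))   # True: both signals present
print(target_protein_expressed(True, 0.5, THRESHOLD))   # False: population too small
print(target_protein_expressed(False, 2.0, THRESHOLD))  # False: no root exudate
```

Because AHL accumulates with cell density, the second input effectively acts as a check that enough bacteria are present near the root.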
In the Cell Level model, we simplify the several genetic circuits into one, and the production of trigger RNAs and AHL is represented in a simpler way. In this sub-model, we assume that the AHL concentration is always high enough to activate protein secretion. The agents of this sub-model are the molecules in the cell (e.g., RNA polymerase, proteins, etc.). The settings for promoter strength, ribosome binding site and protein degradation rate are mostly kept at the default values of an example model in NetLogo [12]. This sub-model follows several rules (see below) to perform transcription, translation, etc., exploring protein secretion in cells with artificial genetic circuits.
You can also enjoy the non-embedded Cell Level model here in a separate page.
Model Validation

Figure 6. Comparison of Cell Level model’s results and lab data. “R + AHL” represents the condition in which both the root exudate salicylic acid and AHL are present in the system. “Leaky Expression” represents the condition in which the target gene is not activated in the system. The blue and red lines are experimentally measured data. The orange and black lines are model simulations with error bars.
We compared our model with our own lab data. The model’s results were collected from 30 repeated simulations (orange and black regions) and fit the lab data (blue and red lines) (Figure 6). This suggests the Cell Level model can explain the protein secretion system in the cells.
Predictability of the Cell Level Model

Figure 7. Random Forest regression for Cell Level model. The blue dots are the output values from both training data and predicted data. The red line is the regression line of the data points.
Unlike the Microbial Community model, the Cell Level model shows more variability because of the random movements of molecules in the cells, which makes protein secretion harder to predict from the parameters. This is reflected in the larger mean squared error and lower R-squared value (Figure 7). We hypothesise that this is because the dynamics of protein production can differ between runs even though the overall trend of protein secretion resembles the real data.
Model Interaction
We used LevelSpace [13], a built-in NetLogo tool, to make the two models interact. The output of the combined model is the total protein secretion level in the system.