Introduction to modeling
CELLECT is a complex system with many interacting parts which presents a opportunity to describe it via a mathematical model.
This would help us in understanding the dynamics of the underlying mechanisms and even possibly predict the influence of varying parameters on the setup.
There was a continuous exchange of information between the experimental setup and the theoretical model. New data as well as observations from experiments were analyzed and processed in silico, gaining valuable information about the underlying processes.
We decided to pursue two distinct modeling approaches. On one hand we decided to track concentrations of the individual components and their dynamics over time, describing our system through ordinary differential equations (ODE). On the other hand we investigated the influence of a fusion between two BluB monomers on stability and production in Molecular Dynamics Simulations (MDS).
ODE-Modeling
Key Findings

Motivation
Many complex biological processes can be described by simple equations.
As computational processing power rises and the availability of high throughput methods such as single cell RNA sequencing increases, modeling is becoming more and more important in biology.
It opens up a whole new range of fields such as Computational Synthetic Biology which applies engineering design principles to create novel biological systems.
Biological systems are characterized by a multitude of interconnected components and complex interactions.
Modeling offers a powerful framework to describe and study the behavior of living organisms By simplifying the intricate complexities of biological systems into mathematical representations, we can gain valuable insights into their underlying mechanisms and make predictions about their behavior.
This helps us to design our system in a better way and in addition to that saves time and resources.
CELLECT describes a novel self-regulatory system with each part interacting with another part of the system. On the one hand we couple the production of AdoCbl to a toxic protein (MazF), and on the other hand, we have the neutralization of the toxin by an Antitoxin (MazE).
To model CELLECT, we utilized conceptual models (CM).
A conceptual model is a graphical representation of the components we want to track.
Complex reaction networks can be well described by coupled Ordinary Differential Equations (ODE) since they can capture the dynamic behavior over time and the integration of quantitative data [1].
ODEs in Biology
Biology relies on ODEs to study population dynamics, enzyme kinetics and the behavior of biochemical reactions within cells [2], helping us understand complex biological processes.
These equations establish a connection between a function and its derivatives, revealing how concentrations change over time.
Negative values signify a decrease, whereas positive values indicate an increase.
Differential equations are broadly categorized into two types: ordinary differential equations (ODEs) and partial differential equations (PDEs).
ODEs describe the behavior as a function of one independent variable while PDEs consider multiple independent variables.
Since in our case, only time is considered as an independent variable, we focus on ODEs.
During our modeling, we are focusing on microscopic processes, meaning we track the concentrations of intracellular molecules and proteins on the basis of one individual bacteria rather than looking on a populational level.
Yet we are not entirely limited to it, since these values could still represent the average of the population.
In its simplest form protein dynamics can be described with ODEs using a constant production rate $k_P$ and a degradation rate $\beta _P$.
The production rate describes how fast the protein is produced based on its influence by translation rate, length of the protein and multiple other factors.
The resulting equation is the protein concentration as a function of time.
The gene gets transcribed and then translated, resulting in a constant stream of new protein, we therefore assume the protein production rate to be a constant.
For this variable we are therefore assuming that the cell has unlimited resources which we provide as given in our model.
Likewise, the degradation rate describes how fast the protein will be degraded, which can be influenced by the exact amino acid composition, overall metabolism of the cell and multible other factors.
In contrast to the production rate, the degradation rate is dependent on the concentration of the protein.
Logically, the more protein present in the cell, the more protein will get degraded until the reaction reaches saturation.
Therefore,the degradation rate has to be multiplied with the current protein concentration $P$ resulting in the following equation:
\begin{equation}
\dot {P} = k_P - \beta _P P
\end{equation}
We would expect to see an initial phase of exponential growth followed by a plateau.
Limited exponential growth is common in biological systems,
and this is indeed backed up by plotts based on this equation in silico (Figure 1).
If we assume the starting concentration of the protein to be 0, there will be no protein degradation, but a constant increase of protein due to the production term of the equation.
This leads to linear growth in the beginning.
As more and more protein gets produced the degradation rate increases and results in a slower accumulation of protein, which eventually reaches equilibrium.
From this point and onwards, the production term and degradation term have the same value resulting in a net change of 0.
Following graph results:

However, more complex biological processes can also be described by implementing more terms in the ODE.
When looking at a typical chemical reaction, we consider the law of mass action generally expressed as follows:
\begin{equation}
aA + bB \rightleftharpoons cC +dD
\end{equation}
Lower case letters describe the stoichiometry factors for the educts $A$ and $B$ and the products $C$ and $D$.
The law of mass action describes how the probability of reactants colliding by chance increases when more of the educts are present.
This results in an exponential increase of reaction speed with increasing concentration.
When the reaction requires more than two reactants, the probability of the reactants colliding decreases exponentially as well.
This is described in the equation with an exponent.
Therefore we derive the equilibrium constant K as follows:
\begin{equation}
K= \frac{\text{[}A\text{]}^a\text{[}B\text{]}^b }{\text{[}C\text{]}^c\text{[}D\text{]}^d }
\end{equation}
However, to assume that $K$ is constant, we have to assume that temperature and pressure remain the same.
The bracketed $A$, $B$, $C$ and $D$ are the concentrations of each educt and product with their exponents describing the stoichiometry coefficients as seen in equation (2).
An equilibrium constant of $K < 1$ describes the situation where the reaction of the equation is favored towards the product, on the contrary an equilibrium constant $K>1$ means an equilibrium on the side of the educts.
This is the basis to model our system to better understand the different parts making up CELLECT.
Bioproduction
To accurately describe the bioproduction aspect of CELLECT we chose to model and track three Variables: BluB, Adenosylcobalamin (AdoCbl) and Cobinamide (Cbi).
BluB is the enzyme we introduce into the organism to increase AdoCbl production in E.coli.
AdoCbl is our desired compound with Cbi as its substrate, which E.coli cannot synthesize itself and therefore must be taken up from the environment [3].
AdoCbl serves as an important cofactor for multiple enzymes.
The corrin center molecule of AdoCbI however demands a complex pathway which is only present in some archaea and prokaryotes [4].
Thats why many organisms have a salvage pathway to produce AdoCbl.
This means that E.coli take up intermediates, which already contain the corrin center, from the environment and use it for further synthesis of Cobalamin.
However, E.coli can also directly take up B12 from the environment through active transport.
Besides Cobinamide, 5,6-dimethylbenzimidazole (DMB) is needed for the synthesis of AdoCbl (Figure 2).
The BluB enzyme has been shown to utilize reduced Flavin mononucleotide ($FMNH_2$) to produce DMB.
DMB is already present in E.coli in low concentrations [5], and upon overproduction by introduction of an additional bluB gene, it can support the efficient production of B12 in E.coli, if supplemented with Cobinamide [6].
As shown in the previous example, we assume a continuous production and degradation of bluB, vitamin B12 and Cobinamide.
The production and degradation of these compounds can be described by a constant production term and a negative proportional degradation term in the individual components of the ODE.

The equation describing the rate of change for the enzyme BluB (B) is given by:
\begin{equation}
\dot {B} = k_B - \beta _B B
\end{equation}
Where:
\begin{align}
k_B &= \text{Production rate of BluB after induction}\nonumber\\
\beta_B &= \text{degradation rate of BluB}\nonumber\\
B &= \text{Concentration of BluB}\nonumber\\
\end{align}
We assume a constant concentration of inducer in our system resulting in a constant production of BluB after induction.
The degradation rate can be obtained from the half life of the protein, as well as other factors influencing the ability to produce DMB from the substrate FMN.
This results in the following graph:

This falls in the range of typical overexpression for non toxic proteins [7] and can be expected for our system.
The equation describing the rate of change of the Cobinamide $C$ is given by:
\begin{equation}
\dot{C} = k_{0C}+k_C\frac{1}{1+\left(\alpha_{CV}V\right)^n} - k_V B C - \beta_C C
\end{equation}
Where:
\begin{align}
k_{0C} &= \text{uptake rate Cbi without AdoCbl inhibiton}\nonumber\\
k_C &= \text{uptake rate Cbi }\nonumber\\
\beta_C &= \text{degradation rate Cbi}\nonumber\\
\alpha_{CV} &= \text{binding affinity of AdoCbl to the riboswitch}\nonumber\\
V &= \text{Concentration of AdoCbl}\nonumber\\
C &= \text{concentration of the Cbi}\nonumber\\
B &= \text{Concentration of BluB}\nonumber\\
n &= \text{Hill coefficient}
\end{align}
We assume the uptake of Cbi $k_C$ to be a constant rate.
However it has been shown that the corrin ring transport protein BtuB which transports corrin ring structures from the environment into the periplasmic space, is additionally responsible for the uptake of not only AdoCbl but also Cbi.[8].
This Transporter is both transcriptionally and translationally regulated by a Riboswitch located upstream of the btuB gene [6][8].Upon binding of AdoCbl and its derivatives to the 5´ UTR RNA, the region undergoes structural changes resulting in preliminary transcription termination as well as inhibition of ribosome binding to the ribosome binding site(RBS) [6, 8].
This leads to a decrease in Cbi uptake in the presence of intracellular AdoCbl.
This dynamic will be described with a modified hill equation $\frac{1}{1+\left(\alpha_{CV}V\right)^n}$ commonly used to describe repressor dynamics.
Since we have no change in binding affinity upon AdoCbl binding to the riboswitch, because only one molecule binds the riboswitch, n in the modified hill equation is 1 and therefore not mentioned further here.
AdoCbl $V$ acts as the repressor here with $\alpha _{CV}$ being the factor describing repression strength.
From literature research we obtained that the maximum repression by the riboswitch sets in at an AdoCbl concentration of around 5 nM [6].
Experimental results showed a saturation of the riboswitch at around 100 nM (Riboswitch Results).
This however refers to extracellular AdoCbl concentration and does not necessarily correspond to intracellular concentrations.
Since experimental data suggests that Cbi is taken up even after AdoCbl is in riboswitch saturating concentrations (>5 nM intracellular AdoCbl), we also include another parameter to describe it more accuratly.
$k_{0C}$ tries to capture the Cbi uptake due to low diffusional pressures.
The third term of equation 5 describes the conversion of Cbi to AdoCbl, where we assume a one to one conversion.
The resulting term is negative since Cbi is metabolized.
Under degradation we consolidate factors which lead to a decrease of Cbi available to AdoCbl production, for example it being metabolized or efflux from the cell.
Modeling the Cbi uptake in the absence of AdoCbl, with experimentally acquired parameters, as well as literature suggestions [9], results in the following graph:

We can observe that most of the Cbi gets taken up in approximately the first 10 minutes.
We see relatively low concentration of Cbi in the cell (Figure 4) as compared to concentration of BluB (Figure 3), which is 20,000 times greater.
The Equation describing the overall change of AdoCbl $V$ in the context of our system is given by:
\begin{equation}
\dot{V} = k_{0}+k_V B C - \beta_V V
\end{equation}
Where:
\begin{align}
k_0 &= \text{basal production rate of AdoCbl}\nonumber\\
k_V &= \text{Production rate of AdoCbl after induction}\nonumber\\
\beta_V &= \text{degradation rate AdoCbl}\nonumber\\
V &= \text{Concentration of AdoCbl}\nonumber\\
C &= \text{concentration of the Cbi}\nonumber\\
B &= \text{Concentration of BluB}\nonumber\\
\end{align}
After induction, the concentration of BluB $B$ rises resulting in increasing concentrations of DMB inside the cell.
This leads to an increase of AdoCbl $V$ which is dependent on the current concentrations of Cbi $C$ and BluB $B$.
Since the third term of equation 5 describes the conversion of Cbi to AdoCbl it is implemented in the equation as well, but as a positive term since AdoCbl is produced.
Under the degradation rate $\beta_V$, we assume processes that remove AdoCbl from the pool of available AdoCbl to the Riboswitch.
For example incorporation in AdoCbl dependent enzymes as a cofactor.
Experiments have shown that E.
coli are able to produce low levels of AdoCbl when supplemented with Cobinamide [6].
Therefore we include a basal production rate $k_0$ of AdoCbI which would also occur in cells not expressing BluB.
From combining all three tracked Variables for bioproduction we obtain following dynamic:

As soon as AdoCbl is produced, the Cbi concentration drops.
This can be attributed to two effects: firstly: AdoCbl represses the prodution of the corrin ring transporter BtuB, [6] and secondly: AdoCbl production requires Cbi as a substrate, therefore metabolizes it.
So most of the available Cbi gets converted to AdoCbl which then represses the uptake, this further leads to stagnation of AdoCbl production, which at this point can only use the Cbi entering the cell via the diffusional pressure.
Furthermore, we see that most of the AdoCbl is produced within the first 3 hours (Figure 5).
The relevant AdoCbl concentration of 5 nM is reached around 9 minutes after induction.
Toxin-Antitoxin Systems
Toxin-antitoxin systems (TAS) are widespread in bacterial genomes, many consisting of a toxin which is produced together with a more labile and actively degraded antitoxin.
Under given conditions the toxin accumulates, thus leading to growth inhibition or cell death.
Under regular circumstances however, antitoxin is produced which neutralizes the toxin by forming a complex (Figure 6), therefore resulting in the survival of the cell [10].
Stress factors like nutrient deficiency or heat, inhibit the production of antitoxin [11].
Since the antitoxin degrades faster compred to the toxin, free toxin accumulates in the cell leading to cell death.

CELLECT exploits this system to ensure the production of a desired compound, in this case Vitamin B12 (AdoCbl).
Since we do not know how the system behaves and wether we need to implement an antitoxin component in CELLECT, we designed the equations with independence in mind, meaning that both toxin and antitoxin can be expressed under different promoters.
To mathematically describe the above explained processes, the following ODEs can be established.
The equations describe changes of antitoxin $A$ and toxin $T$
\begin{equation}
\dot {T} = k_T - \beta _T T
\end{equation}
\begin{equation}
\dot {A} = k_A - \beta _A A
\end{equation}
Where:
\begin{align}
k_A &= \text{production rate Antitoxin}\nonumber\\
k_T &= \text{production rate Toxin}\nonumber\\
\beta_A &= \text{degradation Antitoxin}\nonumber\\
\beta_T &= \text{degradation Toxin}\nonumber\\
T &= \text{concentration of the Toxin}\nonumber\\
A &= \text{concentration of the Antitoxin}\nonumber
\end{align}
As already explained in ODEs in biology, proteins like the toxin and antitoxin have constant production rates $k_x$ and degradation rates $\beta_x$.
We assume that almost all free toxin is immediately bound to antitoxin as long as antitoxin is available.
Therefore lethal amounts of toxin can be determined by comparing absolute toxin with antitoxin levels.
The native MazF-MazE toxin-antitoxin system in E.coli has already been well studied.
The production $k_x$ and degradation rates $\beta_x$ were determined at $k_A$ = 1.12 nM/s and $k_T$ = 0.33 nM/s [12], as well as $\beta_A$ = 3.85E-04 1/s and $\beta_T$ = 5.78E-05 1/s [13], resulting in following dynamic:

We see that the system approaches an equilibrium of around 5671 nM for the toxin and around 2909 nM for the antitoxin.
Resulting in a factor of around 1.95 toxins per antitoxin in the cell which coincides rather well with the theoretical necessity of 4 toxins being bound by 2 antitoxins (Figure 6), further validating the model.
This is in accordance with findings from Li at al.
2014 [12], suggesting that bacteria keep a fine-tuned balance for proteins close to their stoichiometric factors.
Since dividing bacteria would always start with higher initial amounts, the initial values of the ODE will have to be adjusted when modeling different scenarios involving the toxin-antitoxin system.
This model allows us to calculate the threshold amount of toxin exposure over a certain time frame needed to kill the cell.
From literature research we have found that after 10 minutes of stress inducing conditions the survival rate drops to effectively zero, but when antitoxin is introduced back into the cell (via induced plasmid carrying the antitoxin gene) the survival rate rises as compared to WT E.Coli.
This leads to a so-called “point of no return” which is found to be at around 180 minutes after the stressful condition, where even with antitoxin introduction cell survival is also effectively zero [14].
In order to determine the threshold exposure $\Theta$, we take the system in equilibrium and introduce additional stress, thus effectively stopping production of the two proteins.
At the time point $t_1$, when toxin can not be neutralized by the antitoxin anymore eg.
$T$ > $2A$, free toxin starts to affect the cell.
After $\Delta t$ = 180 minutes certain cell-death sets in as mentioned previously [14].
The amount of free toxin during this time period can be calculated by an integral given by:
\begin{equation}
\Theta = \int\limits_{t_1}^{t_1+\Delta t} (T-2A)dt.
\end{equation}
Resulting in the following graph:

The resulting lethal toxin amount according to our model is: 8516 nM*hours.
This value will help us to determine when cells are going to die in our model of CELLECT.
Further, it is important to emphasize that we can only use construct independent parameters thus the degradation rates for toxin and antitoxin for the subsequent modeling of CELLECT.
The production rates of toxin and antitoxin are dependent on the construct their in, since both toxin and antitoxin have different promoters as well as different RBS's in our system when compared to the native system in E.coli, we can not use the production paramters $k_x$.
In the next step we look at the protein dynamics as we would expect them in our system.
Three different setups of our system will be tested.
Starting with a system without any antitoxin.
For this the following is observed:

As shown in Figure 9 cell lethality sets in rather quickly at around 17 minutes after induction, suggesting that the cells would not be able to handle the amount of free toxin and therefore die.
This is in accordance with findings from the toxin results.
Furthermore we do see that cell would be stressed by the free toxin from the point of induction onwards.
And even before induction due to a leaky promoter as seen in the production experiements, further suggesting the need for a toxin buffer e.g an antitoxin.
When introducing the antitoxin to the system under the same promoter as the toxin under the assumption that both production rates are the same, we observe the following:

We observe that cell death would set in after around 151 minutes after induction.
This is in stark contrast to a system without an antitoxin (Figure 9), where the bacterial fate is sealed after 17 minutes.
But this increase of stress free time after induction comes at a cost: a lot of toxin and antitoxin has to be produced before cell stress, to still ensure eventual cell death (Figure 6).
This could potentially be a huge burden for the cell.
When thinking about applicability to CELLECT, we see that the huge amount of antitoxin would directly compete with BluB for resources in its synthesis potentially leading to lower BluB concentrations and therefore lower AdoCbl levels.
Thus defeating the purpose of CELLECT.
To deal with this problem we thought about implementing the antitoxin under a low constitutive promoter, resulting in an antitoxin level already at equilibrium at the point of induction.
This is modeled under the assumption that a constitutive promoter leads to a lower production rate compared to an inducible promoter like the TetR promoter.
This results in following dynamic:

With the constitutively expressed antitoxin we see the cells die after 52 min (Figure 11) in contrast to 17 min without the antitoxin (Figure 9) and 151 minutes for the overexpression of antitoxin (Figure 10).
From figure 9 we concluded that the relevant AdoCbl concentration for inhibition with the riboswitch is reached after around 9 minutes.
Therefore a system without an antitoxin could not adequately protect the cell from cell stress during the first 9 minutes after induction.
In contrast to that we saw from figure 10, that cell stress would set in at 90 minutes; this would be plenty of time for the riboswitch to down regulate the toxin before reaching the critical level.
But the cost for this approach was a higher antitoxin production and thus greater metabolic burden for the cell.
That's why this approach also should not be chosen as well.
Lastly we saw that stress sets in after around 24 minutes with the “point of no return”, 52 minutes after induction.
This is well over the 9 minutes needed for reaching saturating AdoCbl levels and promise a better fit for our system.
That's why we conclude the necessity for a low constitutively expressed antitoxin as a toxin buffer after induction.
CELLECT
In CELLECT the production of AdoCbI is coupled to the survival of the cell.
Therefore it is advantagous to control the toxin with the same riboswitch, which inhibits the transporter protein BtuB in WT E.
coli.
To describe the resulting system in an ODE, we combine the first term of equation (8) with the modified hill term as seen in equation (5) and obtain
\begin{equation}
\dot{T} = k_T\frac{1}{1+\left(\alpha_{CV}V\right)^n} - \beta_T T
\end{equation}
Where
\begin{align}
k_T &= \text{production rate Toxin}\nonumber\\
\alpha_{CV} &= \text{binding affinity of AdoCbl to the Riboswitch}\nonumber\\
\beta_T &= \text{degradation Toxin}\nonumber\\
V &= \text{Concentration of AdoCbl}\nonumber\\
T &= \text{concentration of the Toxin}\nonumber\\
n &= \text{Hill coefficient}.
\end{align}
The Hill-coefficient modulates the production term in a way that an increase in AdoCbl concentration will result in a decrease of the total production rate.
When combining all the aforementioned variables in our system with a low constitutively expressed antitoxin we obtain the following plots:

We can observe a less than lethal amount of toxin produced as a result of the down regulation by AdoCbl binding to the Riboswitch (Figure 12).
Further we can observe that for a final AdoCbl concentration of around 26 nM, the corresponding toxin concentration is quite high at around 65 µM.
This suggests that higher AdoCbl concentrations in equilibrium would lead to less toxin production and therefore less metabolic burden to the cell.
When looking at a scenario where AdoCbl prodution does not reach desired levels we observe the following:

Cell death sets in at around 150 minutes after induction due to rising toxin levels (Figure 13).
However, it only does so after the AdoCbl equilibrium has already been reached.
The toxin production decreases with rising AdoCbl concentration as expected resulting in insufficient toxin concentrations inside the cell to induce cell (Figure 12).
But when AdoCbl production is decreased, we see an increase of toxin production as expected (Figure 13).
This is CELLECT.
In order to determine the minimal AdoCbl concentration necessary for toxin stress free conditions we look at the following plot which is derived from the two preceding figures (Figure 12 & 13):

Here we can observe how AdoCbl concentration in equilibrium affects toxin concentrations in equilibrium.
Cell stress due to free toxin sets in at T>2A which is equivalent to a toxin concentration of 86.232 µM.
And an AdoCbl concentration in equilibrium of around 20 nM suggesting that this is the minimal AdoCbl concentration at equilibrium necessary for toxin stress-free cell life (Figure 14).
The difference in AdoCbl concentration leading to cell stress and leading to cell death is negligible which means that there is only a small range of AdoCbl levels resulting in cell stress rather then cell death.
More importantly is that we see a decrease of toxin concentartion with rising AdoCbl concentrations (Figure 14).
This would lead to less metabolic burden for the cell when producing more AdoCbl, inturn giving high producing cells an evolutionary advantage over cells producing just the minimum concentration necessary required for survival from the toxin.
This could be seen as a from of directed evolution.
DISCLAIMER
All parameter estimations are based on literature values or experimental observations.
Due to very newly obtained experimental results which replace old data the modeling will have to be reevaluated.
While this impact remains to be explored, every conclusion drawn so far is aligned with the expected outcomes of CELLECT.
Parameter | Used for | Determined Value | Source |
---|---|---|---|
$k_T$ | Production rate of the toxin | 66 nM/s | Estimated by looking at overexpression of Luciferase with the TetR promoter [15]. |
$k_A$ | Production rate of the antitoxin | 16.6 1/s | Under the assumption that the low constitutive Amp promotor enables around a quarter of the expression as seen with the TetR promoter |
$\beta _T$ | Degradation rate of the toxin | 5.78E-05 1/s | [13]. |
$\beta _A$ | Degradation rate of the antitoxin | 3.85E-04 1/s | [13]. |
$k_B$ | Production rate of the BluB | 66 nM/s | Estimated by looking at overexpression of Luciferase with the TetR promotor [15]. |
$\beta _B$ | Degradation rate of the Blub | 6.66E-4 1/s | Estimated by solving equation (4) for $\beta _B$ with $k_B$ from [15] and an equilibrium concentration of 100 µM [16] [17] |
$k_C$ | Uptake rate for Cbi | 3.055 nM/s | Estimated by curve fitting from MS data (Riboswitch results) (B12 results) |
$\beta _C$ | Degradation rate of Cbi | 0.597 1/s | Estimated by curve fitting from MS data (Riboswitch results) (B12 results) |
$k_V$ | Production rate of AdoCbl | 1E-6 nM/s | Estimated from MS data (Riboswitch results) (B12 results) |
$\beta _V$ | Degradation rate of AdoCbl | 3E-4 1/s | Estimated from MS data (Riboswitch results) (B12 results) |
$k_0$ | Basal AdoCbl production with 500 nM Cbi | ≈0 nM/s. | Measured by MS with basically no AdoCbl increase after 24 hours incubation in 500 nM Cbi as well as supported by experiment in Ethanolamine (Riboswitch results) (B12 results) |
$\alpha _{CV}$ | Binding affinity of AdoCbl to the riboswitch | 0.01 1/nM | Estimated by fitting MS data (Riboswitch results) (B12 results) |
$k_{0C}$ | Cobinamide uptake due to diffusion at 500 nM Cbi | 0.01 nM/s | Estimated by fitting MS data (Riboswitch results) (B12 results) |
Molecular Dynamics
Pushing the boundaries of B12 production?
Investigating the microscopic world - visual in silico approach
Key Findings

Conceptualizing the atomic world has always posed a difficult challenge to humanity.
Over the decades, however, more and more precise methods of molecular visualization have been discovered.
Nowadays we are able to accurately display complex objects ranging from simple chemicals over amino acids, proteins, viruses to even entire cells.
With the expanded knowledge gained from comprehensive imaging and modeling approaches comes now unimaginable potential.
Potential that pushed drug development a huge step forward towards a more aimed approach [18].
By gathering additional information about surface structure, orientation, general conformation, interaction and forces scientists are able to identify potential drug targets more specifically than ever before rather than testing non-ending possibilities by trial and error.
Investigating structures and dynamics of proteins and biomolecules is a major concern of modern science and indispensable when it comes to finding appropriate solutions to current biological and medical problems.
Current crystallography analysis methods can create high-resolution pictures of molecules using demanding methods such as X-ray crystallography or NMR.
These and similar methods show only a fraction of reality since they capture a static snapshot of possible states.
However, these methods are limited when it comes to showing dynamics and interactions.
As a result, we often lack the spatio-temporal resolution in complex environments that would allow us to better infer and deduce the function of the participants in such environments.
AlphaFold [19] is a powerful tool that, when given the amino acid sequence, is able to precisely predict the structure and configuration of proteins.
But even this is limited when it comes to understanding the complexity of protein folding and interaction with a substrate or other proteins.
Furthermore, it is necessary to gain insight on how the molecule would behave over the course of time when exposed to external and internal forces.
Here molecular dynamics simulations open the door to discover the dynamics of biomolecular processes in silico by using computational analytical capabilities.
With ever-advancing software and data, such as force field parameters it is possible to simulate up to hundreds of millions of atoms and their response to the artificial environment one imposes on them.
A force field is a set of mechanical forces summed up in a code to use in molecular dynamics simulation.
This is useful to replicate real life forces and their effect on the observed molecules by applying Newtonian equations to each atom [20].
Molecular dynamics provides a suitable approach that has evolved into a powerful tool to address complex questions in synthetic biology without the need of touching a pipette.
Some of the better known simulation programs and provided force fields for example are the biomolecular simulation programs suite Amber [21], Chemistry at HARvard Molecular Mechanics Charmm [22] or the GROningen MAchine for Chemical Simulations Gromacs [23].
Molecular Dynamics Simulation of BluB monomer and BluB fusion construct
Molecular Dynamics Simulation in OpenMM using Amber ff14 force fields and TIP3FB water model. BluB monomer and a novel BluB fusion protein consisting of two BluB molecules connected by a flexible linker derived from piG_01b sequence and predicted in AlphaFold. Simulation length over 30 ns. Video shows a fraction of entire simulation. Every frame represents 50 ps.
In this year’s iGEM project the enzyme BluB plays a crucial role in boosting AdoCbl synthesis in E. coli.
It is therefore important to be able to make accurate predictions about its protein structure and configuration and furthermore be able to design and improve our system in a customized manner to maximize the yield in bioproduction.
During our research, we discovered that BluB is expressed as a monomer but is most probably active as a dimer.
This assumption of this dimeric form is based on the fact that to our knowledge, only crystal structures that are in dimeric form and bound to a substrate are published in the protein data bank RCSB.org [24], as for example by Taga et.
al.
(pdb ID: 2ISJ) [25].
Due to the fact that hardly any data exist on whether the activity of BluB differs between the monomeric and dimeric form, we decided to investigate this using biological structural analysis and ultimately molecular dynamics simulations.
To reach our goal of making bioproduction as efficient as possible, not only in terms of gene stability, but also in the broader sense, we modeled and evaluated binding capabilities of the enzyme BluB in dimeric and monomeric configuration to its substrate (FMN).
We noticed that the assumption made by AlphaFold was that the enzyme would fold as a monomer in the same way as a dimer.
This conclusion is most likely based on the crystallographic training data for BluB, consisting only of published PDB BluB dimer structures.
However, we suspect that this monomeric structure prediction is flawed.
If this is the case, it could also have an impact on our self-regulatory system, since a higher concentration of B12 increases the survivability of our cell by binding to a riboswitch.
To test this, we simulated BluB with molecular dynamics simulations to see if it folds differently than expected by us.
In the simulations, we took a closer look at the enzymatic binding pocket of BluB.
BluB binds to flavin mononucleotide (FMN), a molecule that is cleaved to 5,6-dimethylbenzimidazole (DMB) and is a direct precursor of B12.
We showed that it’s likely that the BluB dimers are essential for proper binding of FMN.
To support our findings, we created and simulated a novel fusion protein consisting of two BluB molecules connected by a flexible linker to ensure enzymatic stability.
An optimized fusion BluB dimer could significantly increase the production and yield of commercial B12 production.
These results could be of great importance for science and industry.
Conclusion
Our production MD simulation is in total 38.5 ns for protein BluB monomer (BluB mono) and 32.5 ns for protein BluB dimer linked via 18 amino acid GS linker (BluB fusion).
To visually illustrate the simulation and get a grasp of protein behavior, a section of the production MD is shown in Figure 1 as a video.
Only protein structure is displayed and water molecules and counter ions have been deleted from the solvation box.
Every frame of the video translates to 50 ps of simulation.
As one can notice, the overall protein structure tends to be stable without any major structure shifts, unfoldings and dissociations.
Furthermore, BluB mono (violet) is keeping its core structure with only its C-terminal tail remaining unstructured with high fluctuation.
In the structure of the BluB fusion (BluB chains in violet, magenta and artificial linker in cyan) however, no free C-terminal tail is visible.
This is because the tails are stabilized by the neighboring BluB chain.
We conclude that the dimer configuration improves chain and structure stability and our fusion product is at first glance stable and resembles the structure of a native BluB dimer.
As we show later this C-terminal tail is important for proper enzymatic pocket folding.

In the Root Mean Squared Deviation (RMSD) graphs in Figures 15A and 15B, it can be observed that the deviation reaches its limit indicating a stable protein conformation state.
Dips at frame 400 for BluB fusion and frame 168 for BluB mono are a result of interruption and new solvation, minimization and equilibration of the simulation to decrease and optimize computational resources.
That resulted in a shift of values but not in the trend and thus we continued with it.
To support our assumptions, we calculated the Radius of Gyration (Rg) for both proteins over the trajectory.
We noticed no substantial increase or decrease (BluB mono: Rg first frame 2.158, Rg middle frame 1.970, Rg last frame 2.084 and Rg average 1.983; BluB fusion: Rg first frame 2.204, Rg middle frame 2.143, Rg last frame 2.154 and Rg average 2.152).
With both findings we concluded that a sufficient simulation length had been achieved.
Figure 15C and 15D plotted the Root Mean Squared Fluctuation (RMSF).
While a consistently low RMSF can be observed for both monomeric and fusion BluB, a high fluctuation can be observed at the N/C-terminus of both proteins.
This fluctuation for BluB mono is considerably higher when compared to the BluB fusion.
This is consistent with our observation in Figure 1, in which the BluB mono C-terminal tail is mobile and the BluB fusion tail is stabilized by the neighboring BluB.
Our artificial linker has only a small peak at residue 3000 to 4000 in BluB fusion (Figure 15C) but does not particularly stand out from the overall trend.
Apart from their N/C-terminus, both proteins present a low and comparable fluctuation indicating overall stability.

Regarding BluB fusion simulation we were not only interested in overall protein stability but also in proper enzymatic pocket formation.
This is important because only correct protein folding can ensure proper substrate uptake and enzymatic activity in bioproduction.
We inspected the published crystal structure of a BluB dimer bound to FMN obtained from PDB ID: 2ISJ (Figure 16).
In highlighting polar contacts (yellow lines) and associated amino acids (gray) we can see that the substrate FMN contacts not only one BluB chain (purple), but also the amino acid serine 59 (S59) of the opposite BluB chain (magenta).
Corresponding to that, we assume that S59 is essential for proper docking of FMN to BluB.
Only by dimerization of two BluB proteins, the S59 of one BluB stabilizes FMN in the enzymatic binding pocket of the other BluB.
Vice versa, it can be concluded that FMN binding would be significantly less stabilized if a second BluB were not present in the complex (Figure 16).
The sequence from PDB 2ISJ does not completely match our sequence of BluB from our plasmid piG_01.
The predicted BluB structure by AlphaFold, however, has very high resemblance with it.
PDB 2ISJ and BluB fusion were visually inspected and aligned using PyMol prior and after the simulation and it was confirmed that the alignment was of high similarity between both protein complexes.

Our findings in Figure 16 lead us straight to the analysis in Figure 17.
Here we chose two sets of opposing amino acids that form the enzymatic pocket (gray amino acids in Figure 16) and calculated the distances over time in the trajectory of this amino acids alpha carbon atom (C-alpha).
Amino acids opposite to each other were arginine 21 (R 21, C-alpha with atomic ID 327) and serine 156 (S 156, C-alpha with atomic ID 2443) as well as arginine 25 (R 25, C-alpha with atomic ID 404) and serine 287 (S 287, C-alpha with atomic ID 4334).
Distance between R 21 and S 156 was indicated with a dark orange line and distance between R 25 and S 287 with a light orange line (Figure 17A).
Since BluB mono is only a monomer, there is no serine 287 (corresponds to S 59 in Figure 16) in the neighboring chain and only R 21 with S 156 was plotted.
In Figure 17B, we can clearly see that the distance in BluB mono stays consistent over the course of the simulation but has a variation of around 0.35 nm.
In BluB fusion (Figure 17C and 17D) we can see a shift at frame 400.
The renewed solvation and equilibration must have shifted atomic coordinates and introduced an offset to the calculation.
Nonetheless the trends stay the same.
The amino acids R 25 and S 287 increase their distance over the entire simulation time.
Even when neglecting the shift, an increase for about 0.25 nm is still observable.
In contrast, the amino acids R 21 and S 156 in Figure 17D rapidly increase and then steadily decline their overall distances.
Considering the shift the downwards trend still includes around 0.25 nm of reduced distance but lands on the same level as the initial predicted protein structure data.
These changes in BluB fusion most likely occur because based on the training data for this predicted structure only the presence of FMN in the enzymatic pocket is known.
Without it there is more free space for relaxation of the protein chains.
However, in these relaxation movements we can not observe rapid drifts that would suggest protein unfolding or complex dissociation.
When combined with previous data, the BluB fusion seems to be overall stable and functional as a protein complex.
We conclude that our BluB fusion protein could be a perfect candidate for experimental testing in bioproduction.
It embodies a good balance between having the right length of linker and guaranteed proper complex folding, while in the meantime, not obstructing anything.
Outlook
As shown above we can see that a BluB dimer is most likely necessary for the stabilization of the substrate FMN in the binding pocket and therefore indispensable regarding adequate DMB production.
A fusion protein of two BluB molecules does not only ensure the association, but also the long-term stability of the enzymatic complex.
It could lower metabolic burden by decreasing the amount of unbound BluB monomers.
The implementation of this change would be efficient yet offering increased benefit.
The next step in our research is to implement our improved protein in our self-regulated system by testing a BluB fusion protein in-vitro, to assess the value of it.
We plan on cloning this fusion protein and are thrilled to see if we can achieve experimental approval of this assumption by reaching higher B12 yields, further improving CELLECT.
Methods
BluB input files:
BluB PDB files for simulation were predicted with AlphaFold with BluB monomer sequence taken from plasmid piG_01. BluB fusion has the same sequence as the BluB monomer but is doubled and fused via a 18 amino acid GS linker. The length of the linker was visually estimated in Visualization for Molecular Dynamics software VMD [26]. Predicted BluB structures were inspected and aligned with crystallographic data of the Protein Data Bank (pdb ID: 2ISJ) in PyMol version 2.5.3 [27].
System preparation:
For simulation Amber force fields ff14SB [28] with TIP3P-FB water model [29] were employed for this study. Input files, topologies and parameter files were loaded into the high performance toolkit for molecular dynamics simulation OpenMM [30]. With OpenMM missing atoms and hydrogens to a corresponding pH=7 were added. To further mimic a most cell like environment the system was solvated in an pre-equilibrated cubic water box with a minimum padding of 1 nm including sodium and chloride counterions to achieve suitable ionic strength.
Minimization and Equilibration:
All systems were minimized and equilibrated following standard protocol to relieve steric clashes and normalize geometry and built up internal forces. Equilibration was executed for 1000 steps. Step size was set to 2 fs and the system was heated up to a temperature of 310 Kelvin with a friction coefficient of 1/ps.
Production MD run:
The simulations were run with CUDA platform support on either two RTX 2080 TI server GPUs or one Nvidia RTX 3060 mobile GPU. After initial MD production simulation for a minimal duration of 8.4 ns the simulation was stopped and protein coordinates extracted. To optimize computational resources the systems were then solvated and equilibrated again to achieve smaller solvation box sizes. Settings were kept as described above. Total duration of simulation was 38.5 ns for BluB mono and 32.5 ns for BluB fusion and confirmed by a sufficiently consistent RMSD.
Analysis:
All calculations and graph plotting were done with Python Version 3.8.8. The MD analysis library for Python MDTraj [31] was used to calculate RMSD, RMSF, Radius of Gyration and atomic distances. Additionally it was used to concatenate production MD trajectories. Atomic coordinates for amino acid alpha carbons (C-alpha) were taken directly from PDB files. Inspection, alignment, protein pictures and trajectory videos were created using PyMol.
References
- [1] Motta S, Pappalardo F. Mathematical modeling of biological systems. Briefings in Bioinformatics [Internet]. 2012 Oct 14;14(4):411–22. Available from: https://doi.org/10.1093/bib/bbs061
- [2] Janson NB. Non-linear dynamics of biological systems. Contemporary Physics [Internet]. 2012 Mar 1;53(2):137–68. Available from: https://doi.org/10.1080/00107514.2011.644441
- [3] Równicki M, Dąbrowska Z, Wojciechowska M, Wierzba AJ, Maximova K, Gryko D, et al. Inhibition of Escherichia coli Growth by Vitamin B12–Peptide Nucleic Acid Conjugates. ACS Omega [Internet]. 2019 Jan 10;4(1):819–24. Available from: https://doi.org/10.1021/acsomega.8b03139
- [4] Fang H, Kang J, Zhang D. Microbial production of vitamin B12: a review and future perspectives. Microbial Cell Factories [Internet]. 2017 Jan 30;16(1). Available from: https://doi.org/10.1186/s12934-017-0631-y
- [5] Taga ME, Larsen N, Howard‐Jones AR, Walsh CT, Walker GC. BluB cannibalizes flavin to form the lower ligand of vitamin B12. Nature [Internet]. 2007 Mar 1;446(7134):449–53. Available from: https://doi.org/10.1038/nature05611
- [6] Fowler CC, Brown ED, Li Y. Using a Riboswitch Sensor to Examine Coenzyme B12 Metabolism and Transport in E. coli. Chemistry & Biology [Internet]. 2010 Jul 1;17(7):756–65. Available from: https://doi.org/10.1016/j.chembiol.2010.05.025
- [7] Bolognesi B, Lehner B. Reaching the limit. eLife [Internet]. 2018 Aug 10;7. Available from: https://doi.org/10.7554/elife.39804
- [8] Escalante‐Semerena JC. Conversion of Cobinamide into Adenosylcobamide in Bacteria and Archaea. Journal of Bacteriology [Internet]. 2007 Jul 1;189(13):4555–60. Available from: https://doi.org/10.1128/jb.00503-07
- [9] Pm DG, Bradbeer C. Transport of Vitamin B 12 in Escherichia coli. Journal of Bacteriology [Internet]. 1971 Jun 1;106(3):745–50. Available from: https://doi.org/10.1128/jb.106.3.745-750.1971
- [10]Jurėnas D, Fraikin N, Goormaghtigh F, Van Melderen L. Biology and evolution of bacterial toxin–antitoxin systems. Nature Reviews Microbiology [Internet]. 2022 Jan 2;20(6):335–50. Available from: https://doi.org/10.1038/s41579-021-00661-1
- [11] Engelberg‐Kulka H, Hazan R, Amitai S. mazEF: a chromosomal toxin-antitoxin module that triggers programmed cell death in bacteria. Journal of Cell Science [Internet]. 2005 Oct 1;118(19):4327–32. Available from: https://doi.org/10.1242/jcs.02619
- [12] Li GW, Burkhardt DH, Gross CA, Weissman JS. Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell [Internet]. 2014 Apr 1;157(3):624–35. Available from: https://doi.org/10.1016/j.cell.2014.02.033
- [13] Aizenman E, Engelberg‐Kulka H, Glaser G. An Escherichia coli chromosomal “addiction module” regulated by guanosine [corrected] 3’,5’-bispyrophosphate: a model for programmed bacterial cell death. Proceedings of the National Academy of Sciences of the United States of America [Internet]. 1996 Jun 11;93(12):6059–63. Available from: https://doi.org/10.1073/pnas.93.12.6059
- [14] Kolodkin‐Gal I, Engelberg‐Kulka H. Induction of Escherichia coli Chromosomal mazEF by Stressful Conditions Causes an Irreversible Loss of Viability. Journal of Bacteriology [Internet]. 2006 May 1;188(9):3420–3. Available from: https://doi.org/10.1128/jb.188.9.3420-3423.2006
- [15] Lutz R, Bujard H. Independent and tight regulation of transcriptional units in Escherichia coli via the LacR/O, the TetR/O and AraC/I1-I2 regulatory elements. Nucleic Acids Research [Internet]. 1997 Mar 15;25(6):1203–10. Available from: https://doi.org/10.1093/nar/25.6.1203
- [16] Mori Y, Yoshida Y, Satoh A, Moriya H. Development of an experimental method of systematically estimating protein expression limits in HEK293 cells. Scientific Reports [Internet]. 2020 Mar 16;10(1). Available from: https://doi.org/10.1038/s41598-020-61646-3
- [17] Sundararaj S, Guo A, Habibi‐Nazhad B, Rouani M, Stothard P, Ellison M, Wishart DS. The CyberCell Database (CCDB): a comprehensive, self‐updating, relational database to coordinate and facilitate in silico modeling of Escherichia coli. Nucleic acids research. 2004 Jan 1;32(suppl_1):D293-5.
- [18] Blundell TL, Jhoti H, Abell C. High-throughput crystallography for lead discovery in drug design. Nature Reviews Drug Discovery. 2002 Jan 1;1(1):45-54.
- [19] Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature [Internet]. 2021 Jul 15;596(7873):583–9. Available from: https://doi.org/10.1038/s41586-021-03819-2
- [20] Stevens JA, Grünewald F, van Tilburg PM, König M, Gilbert BR, Brier TA, Thornburg ZR, Luthey-Schulten Z, Marrink SJ. Molecular dynamics simulation of an entire cell. Frontiers in Chemistry. 2023 Jan 18;11:1106495.
- [21] Salomon‐Ferrer R, Case DA, Walker RC. An overview of the Amber biomolecular simulation package. Wiley Interdisciplinary Reviews: Computational Molecular Science. 2013 Mar;3(2):198-210.
- [22] Brooks BR, Brooks III CL, Mackerell Jr AD, Nilsson L, Petrella RJ, Roux B, Won Y, Archontis G, Bartels C, Boresch S, Caflisch A. CHARMM: the biomolecular simulation program. Journal of computational chemistry. 2009 Jul 30;30(10):1545-614.
- [23] Berendsen HJ, van der Spoel D, van Drunen R. GROMACS: A message-passing parallel molecular dynamics implementation. Computer physics communications. 1995 Sep 2;91(1-3):43-56.
- [24] Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The protein data bank. Nucleic acids research. 2000 Jan 1;28(1):235-42.
- [25] Taga ME, Larsen NA, Howard-Jones AR, Walsh CT, Walker GC. BluB cannibalizes flavin to form the lower ligand of vitamin B12. Nature. 2007 Mar 22;446(7134):449-53.
- [26] Humphrey W, Dalke A, Schulten K. VMD: Visual molecular dynamics. Journal of Molecular Graphics [Internet]. 1996 Feb 1;14(1):33–8. Available from: https://doi.org/10.1016/0263-7855(96)00018-5
- [27] The PyMOL Molecular Graphics System, Version 2.5.3 Schrödinger, LLC.
- [28] Maier JA, Martinez C, Kasavajhala K, Wickstrom L, Hauser KE, Simmerling C. ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. Journal of chemical theory and computation. 2015 Aug 11;11(8):3696-713.
- [29] Wang LP, Martinez TJ, Pande VS. Building force fields: An automatic, systematic, and reproducible approach. The journal of physical chemistry letters. 2014 Jun 5;5(11):1885-91.
- [30] Eastman P, Swails J, Chodera JD, McGibbon RT, Zhao Y, Beauchamp KA, et al. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLOS Computational Biology [Internet]. 2017 Jul 26;13(7):e1005659. Available from: https://doi.org/10.1371/journal.pcbi.1005659
- [31] McGibbon RT, Beauchamp KA, Harrigan MP, Klein C, Swails J, Hernández CX, et al. MDTRAJ: A modern open library for the analysis of molecular dynamics Trajectories. Biophysical Journal [Internet]. 2015 Oct 1;109(8):1528–32. Available from: https://doi.org/10.1016/j.bpj.2015.08.015
- [32] Sarzynska J, Popenda M, Antczak M, Szachniuk M. RNA tertiary structure prediction using RNAComposer in CASP15. Proteins: Structure, Function, and Bioinformatics. 2023 Aug 24. Available from: https://doi.org/10.1002/prot.26578
NOTE
PDB file of spinning riboswitch molecule on the home was created using RNAcomposer [32], animation was done using molstar.org, PDB file of spinning BluB molecule on the same page was taken from rcsb.org (PDB ID: 2ISJ) and also animated using molstar.org.