Model

HPP Model

Our main goal is to filter out a promoter sequence with the best distinction under conditions with or without environmental oxygen, so that the survival of E.coli differs most in the tumor and the normal homeoenvironment. To achieve this goal, we develop Hybrid Promoter Predicting (HPP) model. We divide the whole task into the following 3 steps:

1. Predict the relative intensity of a set of pPept parts in promoters in aerobic environments (Aerobic Intensity Prediction - Linear Model)

2. Predict the promoter intensity in hypoxia environment focusing primarily on the overall gene circuit, without taking into account the details such as mutant pPept components. (AND-Gate Model)

3. Predict the overall relative intensity by multiplying the 2 above (Integrate Model)

Aerobic Intensity Prediction - Linear Model

We calculate the transcription rate of the naked DNA strings, namely the intensity of the promoter under aerobic conditions, simulating the bacteria growth in normal tissues. Then, we verify the result by comparing the outcome of the in-silico experiment with the proportion of fluorescence to OD600.

Introduction

Inspiration from the paper LaFleur, T.L., Hossain, A. & Salis, H.M. Automated model-predictive design of synthetic promoters to control transcriptional profiles in bacteria. Nat Commun 13, 5159 (2022).. Through machine learning, the author used the dataset based on data from NCBI and the supplementary data the author provided to train a prediction model by determining the parameters needed. These interactions occur at DNA sites known by their canonical positions, including (i) an upstream 6-nucleotide site called the -35 motif; (ii) a 20-nucleotide region that appears upstream of the -35 motif, called the Up element; (iii) a downstream 6-nucleotide site called the -10 motif; (iv) a spacer region that separates the -10 and -35 motifs; (v) a 4-nucleotide site upstream of the -10 motif, called the -10 extended motif (-10 ext); (vi) a typically 6-nucleotide region in between the -10 motif and TSS, called the discriminator (Disc); and (vii) the first 20 transcribed nucleotides, called the initial transcribed region (ITR). The dataset includes the parts mentioned above. Parameters are assigned according to model aims to find the transcription starting site and its corresponding free energy change as well as transcription rate, showing good predicting ability in testing datasets. Therefore, our team decided to adapt this model to help the prediction of our known promoter sequences, again certifying our in-vitro experiment outcome, further amplifying our confidence in its in-vivo effect as a cancer-targeting drug.

Method

statistical thermodynamic model

In the article (Travis L. et al), the author formulated a statistical thermodynamic model, which indicates that we can decompose how a promoter's sequence controls the interaction energies into a sum of free energies that can be related to the transcription initiation rate (TX), according to:

ΔG_{\text{total}} = ΔG_{\text{UP}} + ΔG_{35} + ΔG_{\text{spacer}} + ΔG_{10\text{ext}} + ΔG_{10} + ΔG_{\text{disc}} + ΔG_{\text{ITR}}

\log\frac{TX}{TX_{\text{ref}^{\prime}}} = β(\Delta G_{\text{total}} - \Delta G_{\text{total,ref}})

Overall, a rigid regression model with 346 fitted coefficients was found. To perform our analysis with cofficients in free_energy_coeffs.npy, we processed our promoter sequence in the following way:

Sequence_Name	UP	h35	spacer
BFT	tctagagGGATAAAATTGATCTGA	ATCAAT	ATTTGTCTTTTCTTGCTT
BFW	tctagagGGATAAAATTGATCTGA	ATCAAT	ATTTGTCTTTTCTTGCTT
BWT	tctagagGGATAAAAGTGACCTGA	CGCAAT	ATTTGTCTTTTCTTGCTT
BWW	tctagagGGATAAAAGTGACCTGA	CGCAAT	ATTTGTCTTTTCTTGCTT
JFT	tctagagGGATAAAATTGATCTGA	ATCAAT	ATTTGTCTTTTCTTGCTT
JFW	tctagagGGATAAAATTGATCTGA	ATCAAT	ATTTGTCTTTTCTTGCTT
JWT	tctagagGGATAAAAGTGACCTGA	CGCAAT	ATTTGTCTTTTCTTGCTT
JWW	tctagagGGATAAAAGTGACCTGA	CGCAAT	ATTTGTCTTTTCTTGCTT

Cont.

Sequence_Name	h10	Disc	Itr
BFT	TATAAT	GTTGTCAtcta	GAGaaagaggagaaaTACTA
BFW	AATAAT	GTTGTCATCTA	GAGaaagaggagaaaTACTA
BWT	tATAAT	GTTGTCATCTA	GAGaaagaggagaaaTACTA
BWW	AATAAT	GTTGTCATCTA	GAGaaagaggagaaaTACTA
JFT	TATAAT	GTTGTCAtcta	gaGAAAGACAGGACCCacta
JFW	AATAAT	GTTGTCATCTA	gaGAAAGACAGGACCCacta
JWT	tATAAT	GTTGTCATCTA	gaGAAAGACAGGACCCacta
JWW	AATAAT	GTTGTCATCTA	gaGAAAGACAGGACCCacta

*BWW is the wild type promoter

Table 1 Promoter Sequence Parts

For sequence name, B/J represents the ribosome binding site; F/W represent the FNR-box; T/W represent the -10 site.

Note that we determine the parts by choosing the -10 site as standard. Then, a function analyze_excel is defined, taking the Excel as input, calculating the respective free energy change and corresponding TX rate and visualizing them in the histogram.

Result

Promoter Type	Sequence_Name	dG_total	TX_rate	relative TX
WA(wild type)	BWW	-1.35921	388.241	1
WA	JWW	-1.3445	379.012	0.97623
WT	BWT	-1.9255	980.648	2.52588
WT	JWT	-1.9108	957.338	2.46584
FA	BFW	-1.74074	724.798	1.86688
FA	JFW	-1.72603	707.569	1.8225
FT	BFT	-2.30704	1830.75	4.7155
FT	JFT	-2.29233	1787.23	4.60341

Table 2 Prediction result

With the linear prediction model, we acquire the relative free energy change, transcription rate and their relative transcription rate using the wild type promoter(BWW) as base intensity.

Figure 1: In-silico prediction of transcription rate

Figure 2, 3: In vitro GFP standard expression

Figure 4: Linear fit with error bars

Since the culture environment varies with batch, which results in huge differences between the multitude of standard fluorescence, we choose the 090123 batch since the bacteria solution had been renewed, leading to higher bacteria activity. From the graph, it’s easy to see that the results of in-silico and in-vitro outcomes match well. Consequently. It offers a reference for the next step of hypoxia prediction.

Discussion

· Free energy calculation may vary based on how we process our known sequences into parts by choosing different variables. We choose 11 for the length of Disc, and 20 for the length of Itr. For different lengths chosen, slightly different results may be reached.

· FNR binds to the -41.5 site, hindering the contribution of the -35 site by covering on the deoxyriboses. Therefore, the role -35 site plays may assume less importance in the actual scenario.

And-Gate Model

1.Model Construction

We constructed the model based on the molecular mechanism of gene circuit of AND-gate, and gave the predicted transcription rate at arbitrarily given lactate and oxygen concentration. Thus, we theoretically demonstrated the selectivity and specificity of our engineered E.coli for tumor microenvironment. Also, results of modeling gave clues about methods for gene circuit optimization.

1.1 Landscape of the Circuit

Gene circuit of our project consists of three main parts:

(1) Element targeting lack of oxygen:

FNR which is an activator, binds specific sequence at the upstream site of pPepT, and helps RNA polymerase bindng onto the promoter. FNR protein is an activator when it exist as a dimer, and oxygen molecular hinders the polymerization of two FNR monomers.

(2) Element targeting high concentration of lactate:

LldR which is a repressor, binds onto O1 and O2 sequence and subsequently polymerize into a dimer. The dimer forms a DNA loop and hinder the expression of genes between O1 and O2.

(3) Element expressing LldR:

With LldR overexpressing, pPepT is locked by DNA loop and will not be turned on.

Element (1) and (2) make up the and-gate structure. Promoter pPepT is turned on only in condition with high concentration of lactate and lack of oxygen, when DNA loop is open and FNR takes effect. Element (3) is designed to reduce constitutive expression.

Figure 5: Landscape of Engineered E.coli's Gene Circuit

1.2 Basic assumptions

(1) The thermodynamic system consists of a single DNA strand containing fragment of relative genes, large amount of proteins including RNA polymerases, LldR and FNR (monomers and dimers), and small moleculars including lactate and oxygen.

(2) There are enumerous binding sites for RNA polymerases on the DNA strand but only one site is specific for RNA polymerase. This assumption also holds for other activators (FNR protein) and repressors (LldR protein).

(3) The probability of systems' states are given by energy consuming during DNA-protein interaction according to Boltzmann distribution. For a given state of a thermal system, the Boltzmann factor is the exponential of minus its energy, measured in units of $k_b \times T$ . (4) The strength of promoter expression are directedly decided by probabilities of states where RNA polymerase are bound onto specific site on pPepT. For each state $i$ , $r_i$ labels the transcription rate of state $i$ . When RNA polymerazes are bounded onto non-specific sites, $r_i$ is assumed to be zero. Thus, the whole transcription rate is given by:

Transcription \ rate = \sum{r_i}{p_i}

1.3 System States and Corresponding Probability

According to the gene circuits of our AND-gate, we abstracted 13 discrete states based on DNA-protein interaction, and gave the probability of each based on Boltzmann distribution. When LldR are not bounded onto O1 and O2, there are four possble states: neither FNR nor RNA polymerase bounded on their specific sites (state 1); RNA polymerase bounded on its specific site but FNR does not (state 2);FNR bounded on its specific site but RNA polymerase does not (state 3); both FNR and RNA polymerase bounded on their specific sites (state 4). These four situations also occur in other two major categories: LldR binding to O1 (states 5-8), LldR binding to O1, and O2 (states 9-12). In state 13, two LldR proteins polymerize and form the DNA loop.

Table 3. Pictures of all States of the system and their corresponding probabilities.

State Number	Statistical Weight	Variables
1	1	$p_{none}$
2	${{RNAP} \over {N_{NS}}}{e^{-{{\Delta e_{Rd}} \over {{k_b}T}}}}$	$p_{RNAP}$
3	${{FNR_{dim}} \over {N_{NS}}}{e^{-{{\Delta e_{Fd}} \over {{k_b}T}}}}$	$p_{FNR}$
4	${{RNAP} \over {N_{NS}}}{{FNR_{dim}} \over {N_{NS}}}{e^{-{{{\Delta e_{Rd}}+{\Delta e_{Fd}}+{\Delta e_{FR}}} \over {{k_b}T}}}}$	$p_{FNR,RNAP}$
5	${{LldR} \over {N_{NS}}} {e^{-{{{\Delta e_{Ld}}\over {k_b T}}}}}$	$p_{none,O1}$
6	${{LldR} \over {N_{NS}}} {e^{-{{{\Delta e_{Ld}}\over {k_b T}}}}} \times {{RNAP} \over {N_{NS}}}{e^{-{{\Delta e_{Rd}} \over {{k_b}T}}}}$	$p_{RNAP,O1}$
7	${{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{FNR_{dim}} \over {N_{NS}}}{e^{-{{\Delta e_{Fd}} \over {{k_b}T}}}}$	$p_{FNR,O1}$
8	${{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{RNAP} \over {N_{NS}}}{{FNR_{dim}} \over {N_{NS}}}{e^{-{{{\Delta e_{Rd}}+{\Delta e_{Fd}}+{\Delta e_{FR}}} \over {{k_b}T}}}}$	$p_{FNR,RNAP,O1}$
9	${{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}}$	$p_{none,O12}$
10	${{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{RNAP} \over {N_{NS}}}{e^{-{{\Delta e_{Rd}} \over {{k_b}T}}}}$	$p_{RNAP,O12}$
11	${{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{FNR_{dim}} \over {N_{NS}}}{e^{-{{\Delta e_{Fd}} \over {{k_b}T}}}}$	$p_{FNR,O12}$
12	${{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{RNAP} \over {N_{NS}}}{{FNR_{dim}} \over {N_{NS}}}{e^{-{{{\Delta e_{Rd}}+{\Delta e_{Fd}}+{\Delta e_{FR}}} \over {{k_b}T}}}}$	$p_{FNR,RNAP,O12}$
13	${({{LldR} \over {N_{NS}}})^2}{e^{-{{2 \Delta e_{Rd}+{F_{loop}}+{\Delta e_{LL}}} \over {{k_b}T}}}}$	$p_{ring}$

Given statistical weights for all states, the probability of each state can be calculated by:

p_i = { {statistic \ weight_i} \over {\sum\limits_{j = 1}^{13} {statistic \ weight_j}} }

1.4 Parameter Determination

Table 4. All parameters and their values, units and references.

Parameter	Value	Unit	Significance	Reference
$N_{NS}$	$5e6$	-	Number of non-spoecific binding sites on the DNA strand.	10.1016/j.gde.2005.02.006.
$RNAP$	$1000$	-	Number of RNA polymerases	10.1016/j.gde.2005.02.006.
$LldR$	$500$	-	Number of LldR molecules	10.1016/j.gde.2005.02.006.
$FNR$	$1000$	-	Total number of FNR molecules, calculated by its monomers.	Manually setting
$\Delta e_{Rd}$	$-8$	$k_b T$	Energy difference between situation of RNA polymerase bounded onto its specific binding site and non-specific binding site.	Manually setting refering to 10.1016/j.gde.2005.02.006.
$\Delta e_{Fd}$	$-4$	$k_b T$	Energy difference between situation of FNR dimer bounded onto its specific binding site and non-specific binding site.	Manually setting refering to 10.1016/j.gde.2005.02.006.
$\Delta e_{FR}$	$-8$	$k_b T$	Energy difference between situation of RNA polymerase bounded with FNR and unbounded with FNR.	Manually setting refering to 10.1016/j.gde.2005.02.006.
$\Delta e_{Ld}$	$-10.3$	$k_b T$	Energy difference between situation of LldR bounded onto its specific binding site and non-specific binding site.	Manually setting refering to 10.1016/j.gde.2005.02.006.
$\Delta e_{LL}$	$-9.5$	$k_b T$	Energy difference between situation of LldR bounded with one another and unbounded with one another.	Manually setting refering to 10.1016/j.gde.2005.02.006.
$F_{Loop}$	$+3$	$k_b T$	Energy difference between situation of DNA loop restricted by LldR binding and free DNA strand.	Manually setting refering to 10.1016/j.gde.2005.02.006.
$Rate_R$	$1$	$s^{-1}$	Transcription rate of states where RNA polymerase bounded onto pPepT without assistance of FNR.	doi: 10.1371/journal.pcbi.1008572.
$Rate_{FR}$	$100$	$s^{-1}$	Transcription rate of states where RNA polymerase bounded onto pPepT with assistance of FNR.	doi: 10.1371/journal.pcbi.1008572.
$const_{oxy,a}$	$50$	-	The specified parameter in the function of $K_{FNR}$ about concentration of oxygen.	Manually setting
$const_{oxy,b}$	$30$	-	The specified parameter in the function of $K_{FNR}$ about concentration of oxygen.	Manually setting
$const_{lac}$	$0.3$	${k_b T} \over {mM}$	The specified parameter in the function of $\Delta e_{Ld}$ about concentration of lactate and that of $\Delta e_{LL}$ about concentration of lactate.	Manually setting

2. Effect of Oxygen

2.1 Mechanism of FNR Dimer Formation

FNR activates RNA polymerase binding only when it exist as a dimer. In FNR dimer, ${Fe_4 S_4^{2+}}$ holds the two monomers together. Encountered by oxygen, the iron sulfur cluster are oxidized and converted to two ${Fe_2 S_2^{2+}}$ , which makes FNR protein tend to dissociate (Figure 2). Thus, in high concentration of oxygen, FNR fails to activate RNA polymerases binding.

Figure 6. Mechanism of depression of oxygen on FNR polymerization.

2.2 Modeling of oxygen effect

For FNR dimerization process, the chemical reaction equation and expression of reaction equilibrium constant are as followed:

2FNR_{mon} \rightleftharpoons FNR_{dim},

{K_{FNR}}={{{[FNR]_{mon}}^2} \over {[FNR]_{dim}}}

And we know the total number of FNR:

[FNR]_{total} = 2 \times [FNR]_{dim}+ [FNR]_{mon}

By combining the above two equations, it can be concluded that:

[FNR]_{dim} = 0.125 \cdot (K_{\text{FNR}} + 4 \cdot \text{FNR}_{\text{total}} - (K_{\text{FNR}} \cdot (K_{\text{FNR}} + 8 \cdot \text{FNR}_{\text{total}}))^{0.5})

[FNR]_{mon} = 0.25 \cdot (-K_{\text{FNR}} + (K_{\text{FNR}} \cdot (K_{\text{FNR}} + 8 \cdot \text{FNR}_{\text{total}}))^{0.5}),

which means number of $FNR_{dim}$ and $FNR_{mon}$ can be settled by equilibrium $K_{FNR}$ and $FNR_{total}$ . Then, given a value of $K_{FNR}$ , $FNR_{dim}$ can be directly calculated (Figure 3, left).

Figure 7. Left: Relation between $[FNR]_{dim}$ and $K_{FNR}$ . Right: Relation between $[FNR]_{dim}$ and $oxygen$ .

Based on mechanism in (2.1), we simply assumed an exponential relationship between the equilibrium constant and oxygen concentration: $K_{FNR} = const_{oxy,a} e^{const_{oxy,b} \times oxygen}$ . According to molecular mechanism of the ciucuit, as concentration of oxygen increasing, equilibrium of reaction gets enlarged and FNR has a bigger tendency to exist as monomer.

Based on the pre experimental data of hypoxia characterization, we found that the increase in OD600 value was significant during the stage of oxygen concentration increasing from 0 to 20%. This indicates that during the stage where the oxygen concentration increases from 0 to 20%, the increase in FNR dimer concentration and the increase in the probability of the occurrence of FNR and RNAP binding DNA together are significant. Based on the range of experimental data from the pre experiment, we adjust the values of two parameters $const_{oxy,a}$ and $const_{oxy,b}$ to determine the functional relationship between $K_{FNR}$ and $oxygen$ (Figure 3, right):

K_{FNR} = const_{oxy,a} e^{const_{oxy,b} \times oxygen},

const_{oxy,a} = 50, \ \ const_{oxy,b} = 30, \ \ oxygen \in [0, \ 0.3]

Figure 8. Left: Probabilities of the basic four states (none, FNR, RNAP, FNR_RNAP) under condition of different $[FNR]_{dim}$ . Purple: None of proteins bounded onto DNA. Orange: RNAP bounded onto DNA. Yellow: FNR and RNAP bounded onto DNA together. Right: The probability distribution in the left figure changes as the number of FNR dimers increases. As the number of FNR dimers increases, the probability of RNAP binding state and FNR and RNAP co binding state increases, while the probability of no protein binding state decreases.

3. Effect of Lactate

3.1 Mechanism of LldR Polymerization

The LldR protein monomer first forms a dimer, which can bind to O1 and O2. When both O1 and O2 bind to the LldR protein, the two dimers combine to form the LldR tetramer, forming a DNA loop that inhibits DNA expression in the loop. There is a lactate binding site on each LldR monomer, and when lactate binds to a protein monomer, it can lead to a change in the monomer's concept, tending to form a protein monomer rather than a polymerized state (Figure 5). Therefore, we assume that the concentration of lactic acid molecules affects the binding energy of LldR protein to DNA and the binding energy of LldR dimers in a simple linear manner, reducing the energy reduction in their binding process.

Figure 9. Structure of LldR and the process of LldR polymerazation and dissociation.

3.2 Modeling of lactate effect

\Delta {e_{ll}^{'}} = \Delta e_{ll} + const_{lac} \times lactate

\Delta {e_{ld}^{'}} = \Delta e_{ld} + const_{lac} \times lactate

const_{lac} = 0.3,\ \ \ lactate \in [0, \ 30]

4. Effect of O1, O2 and pPepT Location

In wet lab of our project, we constructed 6 kinds of composite promoters: $S3$ , $S2$ , $S1$ , $NS$ , $L2$ , $L1$ . These six composite promoters bears different base number between O1-pPepT and O1-O2 (Table3).

Table 5. Base number of O1-pPepT and O1-O2 for 6 constructed composite promoters.

Name of promoter	O1-pPepT (bp)	O1-O2 (bp)	$\lambda$	$F_{loop}$
$S3$	50	134	0.373	32.45
$S2$	61	143	0.427	39.84
$S1$	52	143	0.364	28.95
$NS$	71	162	0.438	37.01
$L2$	62	153	0.405	33.50
$L1$	81	172	0.471	40.31

In the original model, parameter $F_{loop}$ bears the significance of the energy difference between states of DNA loop restricted by LldR and free DNA strand. With the distance of O1-pPepT and O1-O2 changed, energy difference between the two states changed. Considering both the length of composite promoter (base number between O1 and O2) and the location of pPepT (base number between O1 and pPepT) affect the configuration of DNA loop, we introduced a variable $\lambda$ to quantify this affect:

\lambda = {{x_{O1,pPepT}} \over {{x_{O1,O2}}}}

F_{loop} = F_{loop, standard} \times {({{\lambda}/0.16})^2 \over {length}/160}

F_{loop,standard} = 5

Figure 10. Left: Fixing $\lambda$ , the functional relationship between $F_{loop}$ and base number between O1 and O2. Middle: Fixing base number between O1 and O2, the functional relationship between $F_{loop}$ and $\lambda$ . Right: The surface of $F_{loop}$ with respect to changes in $\lambda$ and base number between O1 and O2.

We changed the parameter $F_{loop}$ according to Table 3, and simulated the transcription rate distribution again (Figure 11). Under the influence of the length and pPepT position of the composite promoter, $F_{loop}$ changing result in different probabilities of circular DNA appearance, thereby affecting the sensitivity of the composite promoter to lactate. When the position of pPepT is closer to the position of O1, $\lambda$ is smaller, and $F_{loop}$ is smaller, and the circular DNA state appears as a larger probability. At these situations, the sudden changes in composite promoters' transcription rate caused by lactate concentration are more significant, as shown in the S3, S1, and L2 promoters in Figure 11. Compared to the initial value of $F_{loop}$ , the order of magnitude of this parameter (about 30-40 $k_bT$ ) is slightly greater than the value of the original setting of $F_{loop}$ (about 3-5 $k_bT$ ). This change reduces the probability of the appearance of a circular DNA state, regardless of the oxygen and lactate concentrations. Compared with the initial parameter setting of $F_{loop}$ , the current parameter value of $F_{loop}$ is in better agreement with the promoter characterization results in the wet experiment.

Figure 11. Transcription rate distribution of 6 different composite promoters, and the relationship between transcription rate and concentration of lactate, fixing oxygen concentration at 0 and 0.2.

5 Effect of $\Delta e_{FR}$

According to the distance between the FNR protein binding site and the RNA polymerase binding site, the binding of the two proteins adopts different molecular mechanisms (Figure 8). When the two binding sites are 26.5 bases apart, only the C-terminal of RNA polymerase binds to FNR; When two binding sites are 6.5 bases apart, the C-terminus and N-terminus of RNA polymerase jointly bind to FNR. Based on this, we speculate that when the latter mechanism is combined, the energy reduction of the system is more significant, i.e. $\Delta e_ {FR}$ has a larger absolute value.

Figure 12. Molecular Mechanism of the Interaction between FNR Protein and RNA Polymerase

Based on this, we fixed other parameters in the model and adjust he value of $\Delta e_{FR}$ . The value of this parameter in the original model is -8 ${k_b}T$ , and now observe the distribution of transcription rate and experimental data prediction when the energy term is taken as -6, -7, -9, -10, -11, -12 ${k_b}T$ units (Figure 9). We found that as the absolute binding energy between FNR and RNA polymerase increased, the overall transcription rate increased, but the targeting ability for the hypoxic and high lactate conditions corresponding to tumors $(lactate>10mM, oxygen<0.2)$ decreased. We found that as the absolute binding energy between FNR and RNA polymerase increased, the overall transcription rate increased, but the targeting ability for the hypoxic and high lactate conditions corresponding to tumors $(lactate>10mM, oxygen<0.2)$ decreased. The reason for this phenomenon is that under aerobic conditions, although most DNA strands exist in the form of loops, when the absolute value of binding energy is high, DNA strands that do not exist in the form of loops are more in the form of FNR bound RNA polymerase, resulting in higher compositional expression of pPepT.

Figure 13. Distribution of transcription rate with $\Delta e_{FR}$ takes the value of -6, -7, -9, -10, -11, -12 ${k_b}T$ units.

6. Result

6.1 Probability Distribution of All States as Lactate Concentration Changing

Applying our AND-gate model, we presented the probability of different states with fixed oxygen concentration of $0.1$ and lactate concentration ranging from 0.1mM, 1mM, 10mM, 30mM (Figure 5). We can see from the probability distribution diagram that when the concentration of lactic acid is low (0.1mM, 1mM), the system mainly exists in the form of DNA-loop. When the concentration of lactic acid is high (10mM, 30mM), the probability of DNA-loop appearing is almost zero, indicating good lactate concentration targeting. And under high lactate concentration conditions, the proportion of the four basic states (none, FNR, RNAP, FNR_RNAP) remains unchanged, and the probability of FNR and RNAP binding together is higher than the other three states. When the concentration of lactic acid increases to 30mM, the probability of FNR_RNAP state occurrence is much higher than other states, although when concentration of oxygen is not that low. In addition, when the lactate concentration is near the 10mM level, states of one LldR binding occurs, while when the lactate concentration is too low and too high, these states bear extreme low probabilities.

Figure 14. Probability distribution of all states at different conditions.

6.2 Transcription Rate Prediction

As the oxygen concentration decreases and the lactate concentration increases, the transcription rate of pPepT increases. The sudden jump in transcription rate accompanied by an increase in lactate concentration occurs around 7mM, while the sudden jump in transcription rate accompanied by an increase in oxygen concentration occurs around 0.18. It can be seen that the transcription rate has a greater sudden change in lactate concentration. (Figure 6)

Considering that the oxygen concentration in normal tissues is about 0.2 and the lactate concentration is below 10mM, this result theoretically demonstrates the targeting of engineered Escherichia coli in low oxygen and high lactate environments corresponding to cancer.

Figure 15. Heatmap (left) and 3D (right)picture of predicted transcription rate of pPepT at all probable conditions, $oxygen \in [0, \ 0.3]$ , $lactate \in [0 mM, 30 mM]$ .

Next, we use this model to match the experimental data (Figure 7). Firstly, it is assumed that there is a simple linear relationship between od value and transcription rate. Considering that the lactate concentrations during the experiment were 0.1mM, 1mM, and 10mM, we took the logarithm of lactate concentration and created a numerical image of transcription rate lactate concentration (Figure 7, right). In areas with lactate concentrations below 10mM, both experimental data and predicted images exhibit approximately linear changes in transcription rate with respect to lactate concentration logarithms. However, in the region around 10mM, theoretical predictions indicate a sudden jump in transcription rate. And the experimental data does not match it.

Figure 16.Left: Relationship of transcription rate and lactate, normal oxygen condition is shown in blue and low oxygen condition is shown in orange. Right: Relationship of transcription rate and log(lactate).

Integrate Model

1. Result

Aerobic intensity prediction model (step 1) provides relative transcription rates for different pPepT sequences, while AND-Gate model (step 2) provides relative transcription rates for different oxygen concentrations, lactate concentrations, and O1, pPepT, and O2 relative positions. For our initial goal of finding the most suitable composite promoter for targeting hypoxic and high lactate environments, we integrated the two models and presented the transcriptional rates of different promoters with different pPepT sequences as a function of oxygen and lactate conditions.

6 promoters with different relative positions of O1, O2, and pPepT (s3, s2, s1, ns, l2, l1) were nested with 8 possible pPepT sequences, and 48 composite promoters were obtained by permutation and combination. We assume that there is no mutual influence between the sequence targeting pPepT in the first step and the predictions targeting conditions, promoter length, and pPepT position in the second step. According to the multiplication principle of probability, the final prediction of hybrid promoter can be calculated by:

Rate (oxygen, lactate, x_{O1,O2}, x_{O1,pPepT}, \alpha)

= Rate_1(\alpha) \times Rate_2(oxygen, lactate, x_{O1,O2}, x_{O1,pPepT})

Table 6. Predicted $Rate_2(\alpha)$ by step 1.

$\alpha$	WA	WA	WT	WT	FA	FA	FT	FT
$Name \ of \ pPepT$	BWW	JWW	BWT	JWT	BFW	JFW	BFT	JFT
$Rate_2(\alpha)$	388.241	379.012	980.648	957.338	724.798	707.569	1830.75	1787.23

Given the two base number, lactate and oxygen conditions, $Rate_1$ can be calculated based on the AND-Gate model, and then give the pPepT sequence to obtain $Rate_2$ . Multiplying the two rates, we obtain the predicted results of the integrated model.

The results (Figure 12) indicate that all probable promoters have selectivity for lactate and oxygen conditions, but the basal expression levels of different promoters are different. S2, NS, and L1 bear higher basal expression levels. The basal expression levels of different promoters can be seen from the relative expression rate under low lactate concentration and normoxic conditions.

In addition, regardless of the form of the composite promoter, the relative transcription rate ratio between different pPepT sequences is certain. This is determined by the model itself, as we assume that the composite promoter has no effect on the prediction of the pPepT sequence, i.e. the two rate variables $Rate_1$ and $Rate_2$ are independent of each other.

Figure 17. The relative transcription rates predicted by the integrated model for 48 possible promoters under hypoxic/aerobic conditions and different lactate concentrations.

We then selected the composite promoters that were successfully characterized in the wet experiment and standardized the data in the same way as wet experiments. Due to the high background expression intensity of the S2, NS, and L1 promoters, after standardization, these three promoters showed lower selectivity for hypoxic and high lactate environments. The other three promoters S3, S1, and L2 have higher selectivity for hypoxic and high lactate environments. The transcription rates of different pPepT promoters maintain the proportion predicted by step1. After standardization, the differences in transcription rates among different pPepT sequences were eliminated. Because the relative transcription rate of each pPepT sequence under different conditions is standardized using data corresponding to its own sequence.

Figure 18. Left: The relative transcription rates of all promoters successfully characterized in wet experiments under different conditions. Right: Standardized presentation of the data in the left image. The standardization method is to divide each row of data by the first column of that row.

The high selectivity of S3, S1, and L2 towards high lactate and low oxygen environments is consistent with actual experimental results.

Figure 19. Experimental results (left) and predicted results (right) of standardized relative transcription rate.

The simulation results indicate that among the many successfully characterized promoters, L2-WT, S3-WT, and S3-FA are more in line with the model's predicted target for high lactate and low oxygen environments, indicating that the construction of these sequences is more successful.

2. Discussion

2.1 Comparison between wet lab and dry lab results.

The fitting results of the dry experiment indicate that regardless of how the binding energy of protein DNA is adjusted, there is always a sudden jump region in the transcription rate regarding lactate concentration. Before the sudden concentration of lactate appears, the transcription rate remains at a low level. However, in actual measurements, the od value varies approximately linearly with the concentration of lactic acid. Possible reasons include: Firstly, the model may not include a pathway for the constitutive expression of some pPepT. Secondly, when the concentration of lactic acid is high, it is difficult for Escherichia coli to survive, and under existing experimental conditions, we cannot obtain od values when the concentration of lactic acid is greater than 10mM. Considering the impact of increased lactate concentration on mortality, the sudden jump phenomenon will be weakened or eliminated.

2.2 Noise.

Biological processes, including gene transcription and expression processes, can be considered as random processes. Therefore, considering the gene expression process of random noise is more in line with biological reality.

2.3 Parameters' Physical Meaning.

When considering the interaction function of oxygen and lactate small molecule concentration, we referred to the specific molecular mechanisms of two small molecules participating in the pathway and proposed a conjecture of the interaction function based on the molecular mechanisms. However, due to the lack of direct measurement of the interactions between various proteins and DNA, we are unable to accurately calibrate $const_{oxy, a}$ , $const_{oxy, b}$ and $const_{lac}$ The specific values of the three parameters lac under different conditions do not currently have specific physical meanings, although they are physical quantities with certain units. This is the part where the model itself can be improved.

Code

MC Model

Introduction

Humans are complex systems which can be simplified into interconnected compartments. When a drug, especially one based on bacteria, is administered, it doesn't remain static. It moves, interacts, and gets metabolized in various compartments of the body. To encapsulate this dynamic, a multi-compartment (MC) model that divides the body into several compartments is proposed, each representing a significant region where the drug might interact or get metabolized.

Methods

Compartments Designation

For this model, the body is represented by the following compartments:

Central Compartment (C): This is the primary region where the drug is administered and from where it gets distributed to other parts of the body.
Tumor Compartment (T): A specific target for many drugs, this compartment represents the tumor region in the body.
Liver Compartment (L): The liver, a major organ for drug metabolism, has its own compartment.
Peripheral Compartment (P): This represents other parts of the body not explicitly modeled.
Phagocyte Compartment (PC): Representing the cells that engulf and destroy bacteria, this compartment is crucial for a bacteria-based drug.

Parameters and Their Significance

The model is governed by a plethora of parameters, each adding a layer of depth to the interactions:

Cx: The concentration of drug in the x-compartment.

$k_{cij{\text{max}}}$ and $K_{ij}$ determine the flow from the i-compartment to the j-compartment.
r: The intrinsic growth rate of the bacteria in the Tumor Compartment.
K: The carrying capacity, or the maximum concentration that the Tumor Compartment can sustain.

Mathematical Core

The heart of this model lies in its differential equations. These equations capture the rate of change of the drug concentration in each compartment over time. They consider various factors such as:

Transfer of drug between compartments.
Metabolism or elimination of the drug.
Growth or decay of bacterial concentration, especially in the tumor region.

For the differential equations, we assume that the gradient of concentration is respectively linear to the concentrations of itself and the compartments connected to it, with positive representing inflow and negative representing outflow. For the symbols, initial letters are adopted as the abbreviation of each compartment.

Following this logic, we get the ODEs:

Central Compartment (C):

\frac{dCc}{dt} = -k_{ct} \cdot Cc + k_{tc} \cdot Ct - k_{cl} \cdot Cc + k_{lc} \cdot Cl + k_{pcl} \cdot Cpc - k_{cp} \cdot Cc + k_{pc} \cdot Cp - k_{e_c} \cdot Cc

Tumor Compartment (T):

\frac{dCt}{dt} = k_{ct} \cdot Cc - k_{tc} \cdot Ct + r \cdot Ct \cdot \left(1 - \frac{Ct}{K}\right) - k_{e_t} \cdot Ct

Liver Compartment(L):

\frac{dCl}{dt} = k_{cl} \cdot Cc - k_{lc} \cdot Cl - k_{lpc} \cdot Cl + k_{pcl} \cdot Cpc - k_{e_l} \cdot Cl

Peripheral Compartment (p):

\frac{dCp}{dt} = k_{cp} \cdot Cc - k_{pc} \cdot Cp - k_{e_p} \cdot C_p

phagecyto Compartment (pc):

\frac{dCpc}{dt} = k_{lpc} \cdot Cl - k_{pcl} \cdot Cpc - k_{e_p} \cdot C_pc

For better simulation of the real scenarios, we assume that k is a non-linear function of the concentration. So we get:

k_{ct} = k_{ct\_max} \cdot \frac{Cc}{K_{ct} + Cc}

k_{tc} = k_{tc\_max} \cdot \frac{Ct}{K_{tc} + Ct}

k_{cl} = k_{cl\_max} \cdot \frac{Cc}{K_{cl} + Cc}

k_{lc} = k_{lc\_max} \cdot \frac{Cl}{K_{lc} + Cl}

k_{lpc} = k_{lpc\_max} \cdot \frac{Cl}{K_{lpc} + Cl}

k_{pcl} = k_{pcl\_max} \cdot \frac{Cpc}{K_{pcl} + Cpc}

Simulating the Dynamics

With the model and its parameters in place, I set forth to simulate the bacterial concentration dynamics over a span of 10 time units. To add more depth, I varied the elimination rates in the Central compartment, observing how different rates affected the overall distribution of the drug.

Visualizing the Journey of Bacteria

Each simulation painted a vivid picture, showing how the bacterial concentration evolved in each compartment over time. The plots revealed fascinating patterns, indicating the balance between administration, distribution, metabolism, and elimination.

Results

Outcome Analysis

From the graph, we can see that drug concentration in the tumor compartments maintains a relatively high numerical value. The relative order of final concentration is as follows: tumor > central compartment > liver > peripheral compartment > phagocyte. This is quite a reasonable outcome since the hypoxia and high-lactate environment boost bacteria growth. As the only compartment connected to the tumor, the central compartment has the second highest concentration. Due to the comprehensive effect of a relatively larger decaying rate and moving-in rate, the order is followed by the peripheral and liver compartment. Phagocyte displays a constantly low concentration since it's where the bacteria are deliberately killed.

To simulate the changing trend of compartments when the bacteria shows different elimination rate which can be achieved through our envisioned programmable CAP system (Harimoto et al) by further genetic editing aiming at surface modulation, we change the k_e in the range from 0.05 to 1. This offers a reference for future modification of the bacteria to maintain a moderate drug concentration through CAP system adjustment.

Figure 16-25 | Concentration trend by different elimination rate

Numerical Sensitivity Analysis

We explore the impacts of varying each model parameter up and down by 1%, 5%, and 10%. By investigating these changes, we can observe how sensitive each compartment (Cc, Ct, Cl, Cpc, Cp) is to changes in each parameter. For each perturbation, the model's differential equations are solved, and the final concentrations in each compartment are recorded. The analysis produces a total of 15 plots, each showing the impact of parameter changes on a single compartment for a specific perturbation percentage.

Sensitivity Bidirectional