M

o

d

e

l

HPP Model


Our main goal is to filter out a promoter sequence with the best distinction under conditions with or without environmental oxygen, so that the survival of E.coli differs most in the tumor and the normal homeoenvironment. To achieve this goal, we develop Hybrid Promoter Predicting (HPP) model. We divide the whole task into the following 3 steps:


1. Predict the relative intensity of a set of pPept parts in promoters in aerobic environments (Aerobic Intensity Prediction - Linear Model)
2. Predict the promoter intensity in hypoxia environment focusing primarily on the overall gene circuit, without taking into account the details such as mutant pPept components. (AND-Gate Model)
3. Predict the overall relative intensity by multiplying the 2 above (Integrate Model)



Aerobic Intensity Prediction - Linear Model


We calculate the transcription rate of the naked DNA strings, namely the intensity of the promoter under aerobic conditions, simulating the bacteria growth in normal tissues. Then, we verify the result by comparing the outcome of the in-silico experiment with the proportion of fluorescence to OD600.


Introduction


Inspiration from the paper LaFleur, T.L., Hossain, A. & Salis, H.M. Automated model-predictive design of synthetic promoters to control transcriptional profiles in bacteria. Nat Commun 13, 5159 (2022).. Through machine learning, the author used the dataset based on data from NCBI and the supplementary data the author provided to train a prediction model by determining the parameters needed. These interactions occur at DNA sites known by their canonical positions, including (i) an upstream 6-nucleotide site called the -35 motif; (ii) a 20-nucleotide region that appears upstream of the -35 motif, called the Up element; (iii) a downstream 6-nucleotide site called the -10 motif; (iv) a spacer region that separates the -10 and -35 motifs; (v) a 4-nucleotide site upstream of the -10 motif, called the -10 extended motif (-10 ext); (vi) a typically 6-nucleotide region in between the -10 motif and TSS, called the discriminator (Disc); and (vii) the first 20 transcribed nucleotides, called the initial transcribed region (ITR). The dataset includes the parts mentioned above. Parameters are assigned according to model aims to find the transcription starting site and its corresponding free energy change as well as transcription rate, showing good predicting ability in testing datasets. Therefore, our team decided to adapt this model to help the prediction of our known promoter sequences, again certifying our in-vitro experiment outcome, further amplifying our confidence in its in-vivo effect as a cancer-targeting drug.


Method


statistical thermodynamic model

In the article (Travis L. et al), the author formulated a statistical thermodynamic model, which indicates that we can decompose how a promoter's sequence controls the interaction energies into a sum of free energies that can be related to the transcription initiation rate (TX), according to:

ΔGtotal=ΔGUP+ΔG35+ΔGspacer+ΔG10ext+ΔG10+ΔGdisc+ΔGITRΔG_{\text{total}} = ΔG_{\text{UP}} + ΔG_{35} + ΔG_{\text{spacer}} + ΔG_{10\text{ext}} + ΔG_{10} + ΔG_{\text{disc}} + ΔG_{\text{ITR}}
logTXTXref=β(ΔGtotalΔGtotal,ref)\log\frac{TX}{TX_{\text{ref}^{\prime}}} = β(\Delta G_{\text{total}} - \Delta G_{\text{total,ref}})

Overall, a rigid regression model with 346 fitted coefficients was found. To perform our analysis with cofficients in free_energy_coeffs.npy, we processed our promoter sequence in the following way:

Sequence_NameUPh35spacer
BFTtctagagGGATAAAATTGATCTGAATCAATATTTGTCTTTTCTTGCTT
BFWtctagagGGATAAAATTGATCTGAATCAATATTTGTCTTTTCTTGCTT
BWTtctagagGGATAAAAGTGACCTGACGCAATATTTGTCTTTTCTTGCTT
BWWtctagagGGATAAAAGTGACCTGACGCAATATTTGTCTTTTCTTGCTT
JFTtctagagGGATAAAATTGATCTGAATCAATATTTGTCTTTTCTTGCTT
JFWtctagagGGATAAAATTGATCTGAATCAATATTTGTCTTTTCTTGCTT
JWTtctagagGGATAAAAGTGACCTGACGCAATATTTGTCTTTTCTTGCTT
JWWtctagagGGATAAAAGTGACCTGACGCAATATTTGTCTTTTCTTGCTT

Cont.

Sequence_Nameh10DiscItr
BFTTATAATGTTGTCAtctaGAGaaagaggagaaaTACTA
BFWAATAATGTTGTCATCTAGAGaaagaggagaaaTACTA
BWTtATAATGTTGTCATCTAGAGaaagaggagaaaTACTA
BWWAATAATGTTGTCATCTAGAGaaagaggagaaaTACTA
JFTTATAATGTTGTCAtctagaGAAAGACAGGACCCacta
JFWAATAATGTTGTCATCTAgaGAAAGACAGGACCCacta
JWTtATAATGTTGTCATCTAgaGAAAGACAGGACCCacta
JWWAATAATGTTGTCATCTAgaGAAAGACAGGACCCacta

*BWW is the wild type promoter

Table 1 Promoter Sequence Parts


For sequence name, B/J represents the ribosome binding site; F/W represent the FNR-box; T/W represent the -10 site.

Note that we determine the parts by choosing the -10 site as standard. Then, a function analyze_excel is defined, taking the Excel as input, calculating the respective free energy change and corresponding TX rate and visualizing them in the histogram.


Result

Promoter TypeSequence_NamedG_totalTX_raterelative TX
WA(wild type)BWW-1.35921388.2411
WAJWW-1.3445379.0120.97623
WTBWT-1.9255980.6482.52588
WTJWT-1.9108957.3382.46584
FABFW-1.74074724.7981.86688
FAJFW-1.72603707.5691.8225
FTBFT-2.307041830.754.7155
FTJFT-2.292331787.234.60341

Table 2 Prediction result


With the linear prediction model, we acquire the relative free energy change, transcription rate and their relative transcription rate using the wild type promoter(BWW) as base intensity.


Figure 1: In-silico prediction of transcription rate

in-silico prediction of transcription rate


Figure 2, 3: In vitro GFP standard expression


Figure 4: Linear fit with error bars

linear fit


Since the culture environment varies with batch, which results in huge differences between the multitude of standard fluorescence, we choose the 090123 batch since the bacteria solution had been renewed, leading to higher bacteria activity. From the graph, it’s easy to see that the results of in-silico and in-vitro outcomes match well. Consequently. It offers a reference for the next step of hypoxia prediction.


Discussion


  • · Free energy calculation may vary based on how we process our known sequences into parts by choosing different variables. We choose 11 for the length of Disc, and 20 for the length of Itr. For different lengths chosen, slightly different results may be reached.

  • · FNR binds to the -41.5 site, hindering the contribution of the -35 site by covering on the deoxyriboses. Therefore, the role -35 site plays may assume less importance in the actual scenario.



And-Gate Model


1.Model Construction


We constructed the model based on the molecular mechanism of gene circuit of AND-gate, and gave the predicted transcription rate at arbitrarily given lactate and oxygen concentration. Thus, we theoretically demonstrated the selectivity and specificity of our engineered E.coli for tumor microenvironment. Also, results of modeling gave clues about methods for gene circuit optimization.


1.1 Landscape of the Circuit

Gene circuit of our project consists of three main parts:


(1) Element targeting lack of oxygen:

FNR which is an activator, binds specific sequence at the upstream site of pPepT, and helps RNA polymerase bindng onto the promoter. FNR protein is an activator when it exist as a dimer, and oxygen molecular hinders the polymerization of two FNR monomers.


(2) Element targeting high concentration of lactate:

LldR which is a repressor, binds onto O1 and O2 sequence and subsequently polymerize into a dimer. The dimer forms a DNA loop and hinder the expression of genes between O1 and O2.


(3) Element expressing LldR:

With LldR overexpressing, pPepT is locked by DNA loop and will not be turned on.

Element (1) and (2) make up the and-gate structure. Promoter pPepT is turned on only in condition with high concentration of lactate and lack of oxygen, when DNA loop is open and FNR takes effect. Element (3) is designed to reduce constitutive expression.

Image 1
Figure 5: Landscape of Engineered E.coli's Gene Circuit

1.2 Basic assumptions

(1) The thermodynamic system consists of a single DNA strand containing fragment of relative genes, large amount of proteins including RNA polymerases, LldR and FNR (monomers and dimers), and small moleculars including lactate and oxygen.


(2) There are enumerous binding sites for RNA polymerases on the DNA strand but only one site is specific for RNA polymerase. This assumption also holds for other activators (FNR protein) and repressors (LldR protein).


(3) The probability of systems' states are given by energy consuming during DNA-protein interaction according to Boltzmann distribution. For a given state of a thermal system, the Boltzmann factor is the exponential of minus its energy, measured in units of kb×Tk_b \times T. (4) The strength of promoter expression are directedly decided by probabilities of states where RNA polymerase are bound onto specific site on pPepT. For each state ii, rir_i labels the transcription rate of state ii. When RNA polymerazes are bounded onto non-specific sites, rir_i is assumed to be zero. Thus, the whole transcription rate is given by:

Transcription rate=ripiTranscription \ rate = \sum{r_i}{p_i}

1.3 System States and Corresponding Probability

According to the gene circuits of our AND-gate, we abstracted 13 discrete states based on DNA-protein interaction, and gave the probability of each based on Boltzmann distribution. When LldR are not bounded onto O1 and O2, there are four possble states: neither FNR nor RNA polymerase bounded on their specific sites (state 1); RNA polymerase bounded on its specific site but FNR does not (state 2);FNR bounded on its specific site but RNA polymerase does not (state 3); both FNR and RNA polymerase bounded on their specific sites (state 4). These four situations also occur in other two major categories: LldR binding to O1 (states 5-8), LldR binding to O1, and O2 (states 9-12). In state 13, two LldR proteins polymerize and form the DNA loop.


Table 3. Pictures of all States of the system and their corresponding probabilities.
State NumberState PictureStatistical WeightVariables
1
Img
1pnonep_{none}
2
Img
RNAPNNSeΔeRdkbT{{RNAP} \over {N_{NS}}}{e^{-{{\Delta e_{Rd}} \over {{k_b}T}}}}pRNAPp_{RNAP}
3
Img
FNRdimNNSeΔeFdkbT{{FNR_{dim}} \over {N_{NS}}}{e^{-{{\Delta e_{Fd}} \over {{k_b}T}}}}pFNRp_{FNR}
4
Img
RNAPNNSFNRdimNNSeΔeRd+ΔeFd+ΔeFRkbT{{RNAP} \over {N_{NS}}}{{FNR_{dim}} \over {N_{NS}}}{e^{-{{{\Delta e_{Rd}}+{\Delta e_{Fd}}+{\Delta e_{FR}}} \over {{k_b}T}}}}pFNR,RNAPp_{FNR,RNAP}
5
Img
LldRNNSeΔeLdkbT{{LldR} \over {N_{NS}}} {e^{-{{{\Delta e_{Ld}}\over {k_b T}}}}}pnone,O1p_{none,O1}
6
Img
LldRNNSeΔeLdkbT×RNAPNNSeΔeRdkbT{{LldR} \over {N_{NS}}} {e^{-{{{\Delta e_{Ld}}\over {k_b T}}}}} \times {{RNAP} \over {N_{NS}}}{e^{-{{\Delta e_{Rd}} \over {{k_b}T}}}}pRNAP,O1p_{RNAP,O1}
7
Img
LldRNNSeΔeLdkbT×FNRdimNNSeΔeFdkbT{{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{FNR_{dim}} \over {N_{NS}}}{e^{-{{\Delta e_{Fd}} \over {{k_b}T}}}}pFNR,O1p_{FNR,O1}
8
Img
LldRNNSeΔeLdkbT×RNAPNNSFNRdimNNSeΔeRd+ΔeFd+ΔeFRkbT{{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{RNAP} \over {N_{NS}}}{{FNR_{dim}} \over {N_{NS}}}{e^{-{{{\Delta e_{Rd}}+{\Delta e_{Fd}}+{\Delta e_{FR}}} \over {{k_b}T}}}}pFNR,RNAP,O1p_{FNR,RNAP,O1}
9
Img
LldRNNSeΔeLdkbT×LldRNNSeΔeLdkbT{{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}}pnone,O12p_{none,O12}
10
Img
LldRNNSeΔeLdkbT×LldRNNSeΔeLdkbT×RNAPNNSeΔeRdkbT{{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{RNAP} \over {N_{NS}}}{e^{-{{\Delta e_{Rd}} \over {{k_b}T}}}}pRNAP,O12p_{RNAP,O12}
11
Img
LldRNNSeΔeLdkbT×LldRNNSeΔeLdkbT×FNRdimNNSeΔeFdkbT{{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{FNR_{dim}} \over {N_{NS}}}{e^{-{{\Delta e_{Fd}} \over {{k_b}T}}}}pFNR,O12p_{FNR,O12}
12
Img
LldRNNSeΔeLdkbT×LldRNNSeΔeLdkbT×RNAPNNSFNRdimNNSeΔeRd+ΔeFd+ΔeFRkbT{{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{LldR} \over {N_{NS}}} {e^{-{{\Delta e_{Ld}}\over {k_b T}}}} \times {{RNAP} \over {N_{NS}}}{{FNR_{dim}} \over {N_{NS}}}{e^{-{{{\Delta e_{Rd}}+{\Delta e_{Fd}}+{\Delta e_{FR}}} \over {{k_b}T}}}}pFNR,RNAP,O12p_{FNR,RNAP,O12}
13
Img
(LldRNNS)2e2ΔeRd+Floop+ΔeLLkbT{({{LldR} \over {N_{NS}}})^2}{e^{-{{2 \Delta e_{Rd}+{F_{loop}}+{\Delta e_{LL}}} \over {{k_b}T}}}}pringp_{ring}

Given statistical weights for all states, the probability of each state can be calculated by:


pi=statistic weightij=113statistic weightjp_i = { {statistic \ weight_i} \over {\sum\limits_{j = 1}^{13} {statistic \ weight_j}} }

1.4 Parameter Determination

Table 4. All parameters and their values, units and references.
ParameterValueUnitSignificanceReference
NNSN_{NS}5e65e6-Number of non-spoecific binding sites on the DNA strand.10.1016/j.gde.2005.02.006.
RNAPRNAP10001000-Number of RNA polymerases10.1016/j.gde.2005.02.006.
LldRLldR500500-Number of LldR molecules10.1016/j.gde.2005.02.006.
FNRFNR10001000-Total number of FNR molecules, calculated by its monomers.Manually setting
ΔeRd\Delta e_{Rd}8-8kbTk_b TEnergy difference between situation of RNA polymerase bounded onto its specific binding site and non-specific binding site.Manually setting refering to 10.1016/j.gde.2005.02.006.
ΔeFd\Delta e_{Fd}4-4kbTk_b TEnergy difference between situation of FNR dimer bounded onto its specific binding site and non-specific binding site.Manually setting refering to 10.1016/j.gde.2005.02.006.
ΔeFR\Delta e_{FR}8-8kbTk_b TEnergy difference between situation of RNA polymerase bounded with FNR and unbounded with FNR.Manually setting refering to 10.1016/j.gde.2005.02.006.
ΔeLd\Delta e_{Ld}10.3-10.3kbTk_b TEnergy difference between situation of LldR bounded onto its specific binding site and non-specific binding site.Manually setting refering to 10.1016/j.gde.2005.02.006.
ΔeLL\Delta e_{LL}9.5-9.5kbTk_b TEnergy difference between situation of LldR bounded with one another and unbounded with one another.Manually setting refering to 10.1016/j.gde.2005.02.006.
FLoopF_{Loop}+3+3kbTk_b TEnergy difference between situation of DNA loop restricted by LldR binding and free DNA strand.Manually setting refering to 10.1016/j.gde.2005.02.006.
RateRRate_R11s1s^{-1}Transcription rate of states where RNA polymerase bounded onto pPepT without assistance of FNR.doi: 10.1371/journal.pcbi.1008572.
RateFRRate_{FR}100100s1s^{-1}Transcription rate of states where RNA polymerase bounded onto pPepT with assistance of FNR.doi: 10.1371/journal.pcbi.1008572.
constoxy,aconst_{oxy,a}5050-The specified parameter in the function of KFNRK_{FNR} about concentration of oxygen.Manually setting
constoxy,bconst_{oxy,b}3030-The specified parameter in the function of KFNRK_{FNR} about concentration of oxygen.Manually setting
constlacconst_{lac}0.30.3kbTmM{k_b T} \over {mM}The specified parameter in the function of ΔeLd\Delta e_{Ld} about concentration of lactate and that of ΔeLL\Delta e_{LL} about concentration of lactate.Manually setting

2. Effect of Oxygen


2.1 Mechanism of FNR Dimer Formation

FNR activates RNA polymerase binding only when it exist as a dimer. In FNR dimer, Fe4S42+{Fe_4 S_4^{2+}} holds the two monomers together. Encountered by oxygen, the iron sulfur cluster are oxidized and converted to two Fe2S22+{Fe_2 S_2^{2+}}, which makes FNR protein tend to dissociate (Figure 2). Thus, in high concentration of oxygen, FNR fails to activate RNA polymerases binding.


Image 1
Figure 6. Mechanism of depression of oxygen on FNR polymerization.

2.2 Modeling of oxygen effect

For FNR dimerization process, the chemical reaction equation and expression of reaction equilibrium constant are as followed:

2FNRmonFNRdim,2FNR_{mon} \rightleftharpoons FNR_{dim},
KFNR=[FNR]mon2[FNR]dim{K_{FNR}}={{{[FNR]_{mon}}^2} \over {[FNR]_{dim}}}

And we know the total number of FNR:

[FNR]total=2×[FNR]dim+[FNR]mon[FNR]_{total} = 2 \times [FNR]_{dim}+ [FNR]_{mon}

By combining the above two equations, it can be concluded that:

[FNR]dim=0.125(KFNR+4FNRtotal(KFNR(KFNR+8FNRtotal))0.5)[FNR]_{dim} = 0.125 \cdot (K_{\text{FNR}} + 4 \cdot \text{FNR}_{\text{total}} - (K_{\text{FNR}} \cdot (K_{\text{FNR}} + 8 \cdot \text{FNR}_{\text{total}}))^{0.5})
[FNR]mon=0.25(KFNR+(KFNR(KFNR+8FNRtotal))0.5),[FNR]_{mon} = 0.25 \cdot (-K_{\text{FNR}} + (K_{\text{FNR}} \cdot (K_{\text{FNR}} + 8 \cdot \text{FNR}_{\text{total}}))^{0.5}),

which means number of FNRdimFNR_{dim} and FNRmonFNR_{mon} can be settled by equilibrium KFNRK_{FNR} and FNRtotalFNR_{total}. Then, given a value of KFNRK_{FNR}, FNRdimFNR_{dim} can be directly calculated (Figure 3, left).


Image 1

Figure 7. Left: Relation between [FNR]dim[FNR]_{dim} and KFNRK_{FNR}. Right: Relation between [FNR]dim[FNR]_{dim} and oxygenoxygen.

Based on mechanism in (2.1), we simply assumed an exponential relationship between the equilibrium constant and oxygen concentration: KFNR=constoxy,aeconstoxy,b×oxygenK_{FNR} = const_{oxy,a} e^{const_{oxy,b} \times oxygen}. According to molecular mechanism of the ciucuit, as concentration of oxygen increasing, equilibrium of reaction gets enlarged and FNR has a bigger tendency to exist as monomer.

Based on the pre experimental data of hypoxia characterization, we found that the increase in OD600 value was significant during the stage of oxygen concentration increasing from 0 to 20%. This indicates that during the stage where the oxygen concentration increases from 0 to 20%, the increase in FNR dimer concentration and the increase in the probability of the occurrence of FNR and RNAP binding DNA together are significant. Based on the range of experimental data from the pre experiment, we adjust the values of two parameters constoxy,aconst_{oxy,a} and constoxy,bconst_{oxy,b} to determine the functional relationship between KFNRK_{FNR} and oxygenoxygen (Figure 3, right):

KFNR=constoxy,aeconstoxy,b×oxygen,K_{FNR} = const_{oxy,a} e^{const_{oxy,b} \times oxygen},
constoxy,a=50,  constoxy,b=30,  oxygen[0, 0.3]const_{oxy,a} = 50, \ \ const_{oxy,b} = 30, \ \ oxygen \in [0, \ 0.3]
Image 1
Image 2

Figure 8. Left: Probabilities of the basic four states (none, FNR, RNAP, FNR_RNAP) under condition of different [FNR]dim[FNR]_{dim}. Purple: None of proteins bounded onto DNA. Orange: RNAP bounded onto DNA. Yellow: FNR and RNAP bounded onto DNA together. Right: The probability distribution in the left figure changes as the number of FNR dimers increases. As the number of FNR dimers increases, the probability of RNAP binding state and FNR and RNAP co binding state increases, while the probability of no protein binding state decreases.

3. Effect of Lactate


3.1 Mechanism of LldR Polymerization

The LldR protein monomer first forms a dimer, which can bind to O1 and O2. When both O1 and O2 bind to the LldR protein, the two dimers combine to form the LldR tetramer, forming a DNA loop that inhibits DNA expression in the loop. There is a lactate binding site on each LldR monomer, and when lactate binds to a protein monomer, it can lead to a change in the monomer's concept, tending to form a protein monomer rather than a polymerized state (Figure 5). Therefore, we assume that the concentration of lactic acid molecules affects the binding energy of LldR protein to DNA and the binding energy of LldR dimers in a simple linear manner, reducing the energy reduction in their binding process.


Image 1
Image 1
Figure 9. Structure of LldR and the process of LldR polymerazation and dissociation.

3.2 Modeling of lactate effect

Δell=Δell+constlac×lactate\Delta {e_{ll}^{'}} = \Delta e_{ll} + const_{lac} \times lactate
Δeld=Δeld+constlac×lactate\Delta {e_{ld}^{'}} = \Delta e_{ld} + const_{lac} \times lactate
constlac=0.3,   lactate[0, 30]const_{lac} = 0.3,\ \ \ lactate \in [0, \ 30]

4. Effect of O1, O2 and pPepT Location


In wet lab of our project, we constructed 6 kinds of composite promoters: S3S3, S2S2, S1S1, NSNS, L2L2, L1L1. These six composite promoters bears different base number between O1-pPepT and O1-O2 (Table3).


Table 5. Base number of O1-pPepT and O1-O2 for 6 constructed composite promoters.
Name of promoterO1-pPepT (bp)O1-O2 (bp)λ\lambdaFloopF_{loop}
S3S3501340.37332.45
S2S2611430.42739.84
S1S1521430.36428.95
NSNS711620.43837.01
L2L2621530.40533.50
L1L1811720.47140.31

In the original model, parameter FloopF_{loop} bears the significance of the energy difference between states of DNA loop restricted by LldR and free DNA strand. With the distance of O1-pPepT and O1-O2 changed, energy difference between the two states changed. Considering both the length of composite promoter (base number between O1 and O2) and the location of pPepT (base number between O1 and pPepT) affect the configuration of DNA loop, we introduced a variable λ\lambda to quantify this affect:

λ=xO1,pPepTxO1,O2\lambda = {{x_{O1,pPepT}} \over {{x_{O1,O2}}}}
Floop=Floop,standard×(λ/0.16)2length/160F_{loop} = F_{loop, standard} \times {({{\lambda}/0.16})^2 \over {length}/160}
Floop,standard=5F_{loop,standard} = 5
Image 1
Image 2
Figure 10. Left: Fixing λ\lambda, the functional relationship between FloopF_{loop} and base number between O1 and O2. Middle: Fixing base number between O1 and O2, the functional relationship between FloopF_{loop} and λ\lambda. Right: The surface of FloopF_{loop} with respect to changes in λ\lambda and base number between O1 and O2.

We changed the parameter FloopF_{loop} according to Table 3, and simulated the transcription rate distribution again (Figure 11). Under the influence of the length and pPepT position of the composite promoter, FloopF_{loop} changing result in different probabilities of circular DNA appearance, thereby affecting the sensitivity of the composite promoter to lactate. When the position of pPepT is closer to the position of O1, λ\lambda is smaller, and FloopF_{loop} is smaller, and the circular DNA state appears as a larger probability. At these situations, the sudden changes in composite promoters' transcription rate caused by lactate concentration are more significant, as shown in the S3, S1, and L2 promoters in Figure 11. Compared to the initial value of FloopF_{loop}, the order of magnitude of this parameter (about 30-40 kbTk_bT) is slightly greater than the value of the original setting of FloopF_{loop} (about 3-5 kbTk_bT). This change reduces the probability of the appearance of a circular DNA state, regardless of the oxygen and lactate concentrations. Compared with the initial parameter setting of FloopF_{loop}, the current parameter value of FloopF_{loop} is in better agreement with the promoter characterization results in the wet experiment.


Image 1
Image 2
Image 4
Image 1
Image 2
Image 4
Image 1
Image 2
Image 4
Image 1
Image 2
Image 4
Image 1
Image 2
Image 4
Image 1
Image 2
Image 4
Image 1
Image 2
Image 4
Image 1
Image 2
Image 4
Figure 11. Transcription rate distribution of 6 different composite promoters, and the relationship between transcription rate and concentration of lactate, fixing oxygen concentration at 0 and 0.2.

5 Effect of ΔeFR\Delta e_{FR}


According to the distance between the FNR protein binding site and the RNA polymerase binding site, the binding of the two proteins adopts different molecular mechanisms (Figure 8). When the two binding sites are 26.5 bases apart, only the C-terminal of RNA polymerase binds to FNR; When two binding sites are 6.5 bases apart, the C-terminus and N-terminus of RNA polymerase jointly bind to FNR. Based on this, we speculate that when the latter mechanism is combined, the energy reduction of the system is more significant, i.e. ΔeFR\Delta e_ {FR} has a larger absolute value.

Image 1
Figure 12. Molecular Mechanism of the Interaction between FNR Protein and RNA Polymerase

Based on this, we fixed other parameters in the model and adjust he value of ΔeFR\Delta e_{FR}. The value of this parameter in the original model is -8 kbT{k_b}T , and now observe the distribution of transcription rate and experimental data prediction when the energy term is taken as -6, -7, -9, -10, -11, -12 kbT{k_b}T units (Figure 9). We found that as the absolute binding energy between FNR and RNA polymerase increased, the overall transcription rate increased, but the targeting ability for the hypoxic and high lactate conditions corresponding to tumors (lactate>10mM,oxygen<0.2)(lactate>10mM, oxygen<0.2) decreased. We found that as the absolute binding energy between FNR and RNA polymerase increased, the overall transcription rate increased, but the targeting ability for the hypoxic and high lactate conditions corresponding to tumors (lactate>10mM,oxygen<0.2)(lactate>10mM, oxygen<0.2) decreased. The reason for this phenomenon is that under aerobic conditions, although most DNA strands exist in the form of loops, when the absolute value of binding energy is high, DNA strands that do not exist in the form of loops are more in the form of FNR bound RNA polymerase, resulting in higher compositional expression of pPepT.

Image 1
Image 2
Image 4

Image 1
Image 2
Image 1
Image 2
Image 1
Image 2

Image 4
Image 4
Image 4

Image 1
Image 2
Image 1
Image 2
Image 1
Image 2
Figure 13. Distribution of transcription rate with ΔeFR\Delta e_{FR} takes the value of -6, -7, -9, -10, -11, -12 kbT{k_b}T units.


6. Result


6.1 Probability Distribution of All States as Lactate Concentration Changing

Applying our AND-gate model, we presented the probability of different states with fixed oxygen concentration of 0.10.1 and lactate concentration ranging from 0.1mM, 1mM, 10mM, 30mM (Figure 5). We can see from the probability distribution diagram that when the concentration of lactic acid is low (0.1mM, 1mM), the system mainly exists in the form of DNA-loop. When the concentration of lactic acid is high (10mM, 30mM), the probability of DNA-loop appearing is almost zero, indicating good lactate concentration targeting. And under high lactate concentration conditions, the proportion of the four basic states (none, FNR, RNAP, FNR_RNAP) remains unchanged, and the probability of FNR and RNAP binding together is higher than the other three states. When the concentration of lactic acid increases to 30mM, the probability of FNR_RNAP state occurrence is much higher than other states, although when concentration of oxygen is not that low. In addition, when the lactate concentration is near the 10mM level, states of one LldR binding occurs, while when the lactate concentration is too low and too high, these states bear extreme low probabilities.


Img

Figure 14. Probability distribution of all states at different conditions.

6.2 Transcription Rate Prediction

As the oxygen concentration decreases and the lactate concentration increases, the transcription rate of pPepT increases. The sudden jump in transcription rate accompanied by an increase in lactate concentration occurs around 7mM, while the sudden jump in transcription rate accompanied by an increase in oxygen concentration occurs around 0.18. It can be seen that the transcription rate has a greater sudden change in lactate concentration. (Figure 6)


Considering that the oxygen concentration in normal tissues is about 0.2 and the lactate concentration is below 10mM, this result theoretically demonstrates the targeting of engineered Escherichia coli in low oxygen and high lactate environments corresponding to cancer.


Img

Figure 15. Heatmap (left) and 3D (right)picture of predicted transcription rate of pPepT at all probable conditions, oxygen[0, 0.3]oxygen \in [0, \ 0.3], lactate[0mM,30mM]lactate \in [0 mM, 30 mM].

Next, we use this model to match the experimental data (Figure 7). Firstly, it is assumed that there is a simple linear relationship between od value and transcription rate. Considering that the lactate concentrations during the experiment were 0.1mM, 1mM, and 10mM, we took the logarithm of lactate concentration and created a numerical image of transcription rate lactate concentration (Figure 7, right). In areas with lactate concentrations below 10mM, both experimental data and predicted images exhibit approximately linear changes in transcription rate with respect to lactate concentration logarithms. However, in the region around 10mM, theoretical predictions indicate a sudden jump in transcription rate. And the experimental data does not match it.


Image 1
Image 2
Figure 16.Left: Relationship of transcription rate and lactate, normal oxygen condition is shown in blue and low oxygen condition is shown in orange. Right: Relationship of transcription rate and log(lactate).



Integrate Model


1. Result


Aerobic intensity prediction model (step 1) provides relative transcription rates for different pPepT sequences, while AND-Gate model (step 2) provides relative transcription rates for different oxygen concentrations, lactate concentrations, and O1, pPepT, and O2 relative positions. For our initial goal of finding the most suitable composite promoter for targeting hypoxic and high lactate environments, we integrated the two models and presented the transcriptional rates of different promoters with different pPepT sequences as a function of oxygen and lactate conditions.

6 promoters with different relative positions of O1, O2, and pPepT (s3, s2, s1, ns, l2, l1) were nested with 8 possible pPepT sequences, and 48 composite promoters were obtained by permutation and combination. We assume that there is no mutual influence between the sequence targeting pPepT in the first step and the predictions targeting conditions, promoter length, and pPepT position in the second step. According to the multiplication principle of probability, the final prediction of hybrid promoter can be calculated by:

Rate(oxygen,lactate,xO1,O2,xO1,pPepT,α)Rate (oxygen, lactate, x_{O1,O2}, x_{O1,pPepT}, \alpha)
=Rate1(α)×Rate2(oxygen,lactate,xO1,O2,xO1,pPepT)= Rate_1(\alpha) \times Rate_2(oxygen, lactate, x_{O1,O2}, x_{O1,pPepT})
Table 6. Predicted Rate2(α)Rate_2(\alpha) by step 1.
α\alphaWAWAWTWTFAFAFTFT
Name of pPepTName \ of \ pPepTBWWJWWBWTJWTBFWJFWBFTJFT
Rate2(α)Rate_2(\alpha)388.241379.012980.648957.338724.798707.5691830.751787.23

Given the two base number, lactate and oxygen conditions, Rate1Rate_1 can be calculated based on the AND-Gate model, and then give the pPepT sequence to obtain Rate2Rate_2. Multiplying the two rates, we obtain the predicted results of the integrated model.


The results (Figure 12) indicate that all probable promoters have selectivity for lactate and oxygen conditions, but the basal expression levels of different promoters are different. S2, NS, and L1 bear higher basal expression levels. The basal expression levels of different promoters can be seen from the relative expression rate under low lactate concentration and normoxic conditions.

In addition, regardless of the form of the composite promoter, the relative transcription rate ratio between different pPepT sequences is certain. This is determined by the model itself, as we assume that the composite promoter has no effect on the prediction of the pPepT sequence, i.e. the two rate variables Rate1Rate_1 and Rate2Rate_2 are independent of each other.


Image 1
Figure 17. The relative transcription rates predicted by the integrated model for 48 possible promoters under hypoxic/aerobic conditions and different lactate concentrations.

We then selected the composite promoters that were successfully characterized in the wet experiment and standardized the data in the same way as wet experiments. Due to the high background expression intensity of the S2, NS, and L1 promoters, after standardization, these three promoters showed lower selectivity for hypoxic and high lactate environments. The other three promoters S3, S1, and L2 have higher selectivity for hypoxic and high lactate environments. The transcription rates of different pPepT promoters maintain the proportion predicted by step1. After standardization, the differences in transcription rates among different pPepT sequences were eliminated. Because the relative transcription rate of each pPepT sequence under different conditions is standardized using data corresponding to its own sequence.

Image 1
Image 1
Figure 18. Left: The relative transcription rates of all promoters successfully characterized in wet experiments under different conditions. Right: Standardized presentation of the data in the left image. The standardization method is to divide each row of data by the first column of that row.

The high selectivity of S3, S1, and L2 towards high lactate and low oxygen environments is consistent with actual experimental results.

Image 1
Image 1
Figure 19. Experimental results (left) and predicted results (right) of standardized relative transcription rate.

The simulation results indicate that among the many successfully characterized promoters, L2-WT, S3-WT, and S3-FA are more in line with the model's predicted target for high lactate and low oxygen environments, indicating that the construction of these sequences is more successful.


2. Discussion


2.1 Comparison between wet lab and dry lab results.

The fitting results of the dry experiment indicate that regardless of how the binding energy of protein DNA is adjusted, there is always a sudden jump region in the transcription rate regarding lactate concentration. Before the sudden concentration of lactate appears, the transcription rate remains at a low level. However, in actual measurements, the od value varies approximately linearly with the concentration of lactic acid. Possible reasons include: Firstly, the model may not include a pathway for the constitutive expression of some pPepT. Secondly, when the concentration of lactic acid is high, it is difficult for Escherichia coli to survive, and under existing experimental conditions, we cannot obtain od values when the concentration of lactic acid is greater than 10mM. Considering the impact of increased lactate concentration on mortality, the sudden jump phenomenon will be weakened or eliminated.


2.2 Noise.

Biological processes, including gene transcription and expression processes, can be considered as random processes. Therefore, considering the gene expression process of random noise is more in line with biological reality.


2.3 Parameters' Physical Meaning.

When considering the interaction function of oxygen and lactate small molecule concentration, we referred to the specific molecular mechanisms of two small molecules participating in the pathway and proposed a conjecture of the interaction function based on the molecular mechanisms. However, due to the lack of direct measurement of the interactions between various proteins and DNA, we are unable to accurately calibrate constoxy,aconst_{oxy, a}, constoxy,bconst_{oxy, b} and constlacconst_{lac} The specific values of the three parameters lac under different conditions do not currently have specific physical meanings, although they are physical quantities with certain units. This is the part where the model itself can be improved.


Code


Code

MC Model


Introduction


Humans are complex systems which can be simplified into interconnected compartments. When a drug, especially one based on bacteria, is administered, it doesn't remain static. It moves, interacts, and gets metabolized in various compartments of the body. To encapsulate this dynamic, a multi-compartment (MC) model that divides the body into several compartments is proposed, each representing a significant region where the drug might interact or get metabolized.


Methods


Compartments Designation


mcm_dig

For this model, the body is represented by the following compartments:


  • Central Compartment (C): This is the primary region where the drug is administered and from where it gets distributed to other parts of the body.
  • Tumor Compartment (T): A specific target for many drugs, this compartment represents the tumor region in the body.
  • Liver Compartment (L): The liver, a major organ for drug metabolism, has its own compartment.
  • Peripheral Compartment (P): This represents other parts of the body not explicitly modeled.
  • Phagocyte Compartment (PC): Representing the cells that engulf and destroy bacteria, this compartment is crucial for a bacteria-based drug.

Parameters and Their Significance


The model is governed by a plethora of parameters, each adding a layer of depth to the interactions:


  • Cx: The concentration of drug in the x-compartment.
  • kcijmaxk_{cij{\text{max}}} and KijK_{ij} determine the flow from the i-compartment to the j-compartment.
  • r: The intrinsic growth rate of the bacteria in the Tumor Compartment.
  • K: The carrying capacity, or the maximum concentration that the Tumor Compartment can sustain.

Mathematical Core


The heart of this model lies in its differential equations. These equations capture the rate of change of the drug concentration in each compartment over time. They consider various factors such as:


  • Transfer of drug between compartments.
  • Metabolism or elimination of the drug.
  • Growth or decay of bacterial concentration, especially in the tumor region.

For the differential equations, we assume that the gradient of concentration is respectively linear to the concentrations of itself and the compartments connected to it, with positive representing inflow and negative representing outflow. For the symbols, initial letters are adopted as the abbreviation of each compartment.


Following this logic, we get the ODEs:


Central Compartment (C):

dCcdt=kctCc+ktcCtkclCc+klcCl+kpclCpckcpCc+kpcCpkecCc\frac{dCc}{dt} = -k_{ct} \cdot Cc + k_{tc} \cdot Ct - k_{cl} \cdot Cc + k_{lc} \cdot Cl + k_{pcl} \cdot Cpc - k_{cp} \cdot Cc + k_{pc} \cdot Cp - k_{e_c} \cdot Cc

Tumor Compartment (T):

dCtdt=kctCcktcCt+rCt(1CtK)ketCt\frac{dCt}{dt} = k_{ct} \cdot Cc - k_{tc} \cdot Ct + r \cdot Ct \cdot \left(1 - \frac{Ct}{K}\right) - k_{e_t} \cdot Ct

Liver Compartment(L):

dCldt=kclCcklcClklpcCl+kpclCpckelCl\frac{dCl}{dt} = k_{cl} \cdot Cc - k_{lc} \cdot Cl - k_{lpc} \cdot Cl + k_{pcl} \cdot Cpc - k_{e_l} \cdot Cl

Peripheral Compartment (p):

dCpdt=kcpCckpcCpkepCp\frac{dCp}{dt} = k_{cp} \cdot Cc - k_{pc} \cdot Cp - k_{e_p} \cdot C_p

phagecyto Compartment (pc):

dCpcdt=klpcClkpclCpckepCpc\frac{dCpc}{dt} = k_{lpc} \cdot Cl - k_{pcl} \cdot Cpc - k_{e_p} \cdot C_pc

For better simulation of the real scenarios, we assume that k is a non-linear function of the concentration. So we get:


kct=kct_maxCcKct+Cck_{ct} = k_{ct\_max} \cdot \frac{Cc}{K_{ct} + Cc}
ktc=ktc_maxCtKtc+Ctk_{tc} = k_{tc\_max} \cdot \frac{Ct}{K_{tc} + Ct}
kcl=kcl_maxCcKcl+Cck_{cl} = k_{cl\_max} \cdot \frac{Cc}{K_{cl} + Cc}
klc=klc_maxClKlc+Clk_{lc} = k_{lc\_max} \cdot \frac{Cl}{K_{lc} + Cl}
klpc=klpc_maxClKlpc+Clk_{lpc} = k_{lpc\_max} \cdot \frac{Cl}{K_{lpc} + Cl}
kpcl=kpcl_maxCpcKpcl+Cpck_{pcl} = k_{pcl\_max} \cdot \frac{Cpc}{K_{pcl} + Cpc}

Simulating the Dynamics


With the model and its parameters in place, I set forth to simulate the bacterial concentration dynamics over a span of 10 time units. To add more depth, I varied the elimination rates in the Central compartment, observing how different rates affected the overall distribution of the drug.


Visualizing the Journey of Bacteria


Each simulation painted a vivid picture, showing how the bacterial concentration evolved in each compartment over time. The plots revealed fascinating patterns, indicating the balance between administration, distribution, metabolism, and elimination.


Results


Outcome Analysis


From the graph, we can see that drug concentration in the tumor compartments maintains a relatively high numerical value. The relative order of final concentration is as follows: tumor > central compartment > liver > peripheral compartment > phagocyte. This is quite a reasonable outcome since the hypoxia and high-lactate environment boost bacteria growth. As the only compartment connected to the tumor, the central compartment has the second highest concentration. Due to the comprehensive effect of a relatively larger decaying rate and moving-in rate, the order is followed by the peripheral and liver compartment. Phagocyte displays a constantly low concentration since it's where the bacteria are deliberately killed.


To simulate the changing trend of compartments when the bacteria shows different elimination rate which can be achieved through our envisioned programmable CAP system (Harimoto et al) by further genetic editing aiming at surface modulation, we change the k_e in the range from 0.05 to 1. This offers a reference for future modification of the bacteria to maintain a moderate drug concentration through CAP system adjustment.


Figure 16-25 | Concentration trend by different elimination rate


Numerical Sensitivity Analysis


We explore the impacts of varying each model parameter up and down by 1%, 5%, and 10%. By investigating these changes, we can observe how sensitive each compartment (Cc, Ct, Cl, Cpc, Cp) is to changes in each parameter. For each perturbation, the model's differential equations are solved, and the final concentrations in each compartment are recorded. The analysis produces a total of 15 plots, each showing the impact of parameter changes on a single compartment for a specific perturbation percentage.


Sensitivity Bidirectional

Discussion


  • the value of parameters are set to better simulate the real-life senario.
  • The Hill–Langmuir equation is a special case of a rectangular hyperbola and is commonly expressed in the following ways:
    Y=[L]nKd+[L]nY = \frac{{[L]^n}}{{K_d + [L]^n}}

    In this model, we choose n = 1. However, more possibilities are waiting to be explored with different choice of the n value.
    1. Hill Coefficient n>1:
      • Represents positive cooperativity, where binding of a ligand enhances the affinity of the receptor for more ligand binding.
      • A Hill coefficient greater than 1 would result in a steeper response curve. The system would be more sensitive to changes in concentration, leading to a more abrupt transition between the unbound and bound states.
    2. Hill Coefficient n=1:
      • Represents no cooperativity, where each binding event is independent.
      • With a Hill coefficient of 1, the system exhibits a standard dose-response relationship. It has a moderate and proportional response to changes in concentration.
    3. Hill Coefficient 0<n<1:
      • Represents negative cooperativity, where binding of a ligand decreases the affinity of the receptor for further ligand binding.
      • A Hill coefficient less than 1 results in a more gradual response curve, indicating the system is less sensitive to changes in concentration.

Conclusion


The multi-compartment model offered a profound insight into the journey of a bacteria-based drug in the human body. It's a testament to the power of mathematics in deciphering the complex dance of molecules within us. The insights from this model could pave the way for better drug administration strategies, ensuring optimal therapeutic effects while minimizing side effects.


References


  1. Harimoto, T., Hahn, J., Chen, YY. et al. A programmable encapsulation system improves delivery of therapeutic bacteria in mice. Nat Biotechnol 40, 1259–1269 (2022). https://doi.org/10.1038/s41587-022-01244-y
  2. Gao Y, Zhou H, Liu G, Wu J, Yuan Y, Shang A. Tumor Microenvironment: Lactic Acid Promotes Tumor Development. J Immunol Res. 2022 Jun 12;2022:3119375. doi: 10.1155/2022/3119375. PMID: 35733921; PMCID: PMC9207018.
  3. Brizel D. M., Schroeder T., Scher R. L., et al. Elevated tumor lactate concentrations predict for an increased risk of metastases in head-and-neck cancer. International Journal of Radiation Oncology • Biology • Physics . 2001;51(2):349–353. doi: 10.1016/S0360-3016(01)01630-3.
  4. Bintu L, Buchler NE, Garcia HG, Gerland U, Hwa T, Kondev J, Phillips R. Transcriptional regulation by the numbers: models. Curr Opin Genet Dev. 2005 Apr;15(2):116-24. doi: 10.1016/j.gde.2005.02.007. PMID: 15797194; PMCID: PMC3482385.
  5. Mettert EL, Kiley PJ. Reassessing the Structure and Function Relationship of the O2 Sensing Transcription Factor FNR. Antioxid Redox Signal. 2018 Dec 20;29(18):1830-1840. doi: 10.1089/ars.2017.7365. Epub 2017 Nov 14. PMID: 28990402; PMCID: PMC6217745
  6. Morrison M, Razo-Mejia M, Phillips R. Reconciling kinetic and thermodynamic models of bacterial transcription. PLoS Comput Biol. 2021 Jan 19;17(1):e1008572. doi: 10.1371/journal.pcbi.1008572. PMID: 33465069; PMCID: PMC7845990.
  7. Muller-Hill, B., 1996. The Lac Operon, first ed. de Gruyter, Berlin. Narang, A., Pilyugin, S.S., 2006. Why does the lac exhibit no bistability during growth of Escherichia coli on lactose or lactose + glucose? Bull. Math. Biol., submitted for publication.
  8. Jensen D, Galburt EA. The Context-Dependent Influence of Promoter Sequence Motifs on Transcription Initiation Kinetics and Regulation. J Bacteriol. 2021 Mar 23;203(8):e00512-20. doi: 10.1128/JB.00512-20. PMID: 33139481; PMCID: PMC8088511.
  9. Wang J, Zhang S, Lu H, Xu H. Differential regulation of alternative promoters emerges from unified kinetics of enhancer-promoter interaction. Nat Commun. 2022 May 17;13(1):2714. doi: 10.1038/s41467-022-30315-6. PMID: 35581264; PMCID: PMC9114328.
  10. Oehler S, Amouyal M, Kolkhof P, von Wilcken Bergmann B, Hill BM. Quality and position of the three lac operators of E. coli define efficiency of repression. The EMBO journal. 1994; 13(14):3348–3355. https://doi.org/10.1002/j.1460-2075.1994.tb06637.x PMID: 8045263
  11. Oehler S, Eismann ER, Kra¨mer H, Hill BM. The three operators of the lac operon cooperate in repression. The EMBO journal. 1990; 9(4):973–979. https://doi.org/10.1002/j.1460-2075.1990.tb08199.x PMID: 2182324
  12. Bintu L, Buchler NE, Garcia HG, Gerland U, Hwa T, Kondev J, Kuhlman T, Phillips R. Transcriptional regulation by the numbers: applications. Curr Opin Genet Dev. 2005 Apr;15(2):125-35. doi: 10.1016/j.gde.2005.02.006. PMID: 15797195; PMCID: PMC3462814.

Code


Code