Modeling

Overview

Building on the metal-ion adsorption properties of our proteins, we developed models to estimate the practical adsorption capacity of our system in both laboratory and industrial settings.

To do so, we divided the modeling work into three parts and used the models to guide our design:

The Validation of Proteins’ Adsorption Ability
This section predicts the structural stability of the mutant SmtA (MSmtA) and its ability to adsorb heavy metal ions. By simulating MSmtA's adsorption capacity and comparing it quantitatively with that of wild-type SmtA, this analysis supports the feasibility of our heavy metal remediation program.
Modeling of Protein Production in E. coli
This model quantifies the production process, identifies the optimal IPTG concentration to guide the experimental design, and addresses the problem of CsgA monomer aggregation by changing promoters.
The Vital Simulation of Practical Adsorption Environments
This model aims to measure the adsorption capacity of the device for metal ions. Following the hardware design, we simulated adsorption in flowing water to narrow the gap between the ideal environment and real-world conditions.

In addition, to modularize design at the protein level, we built a database describing possible routes of protein evolution.

Database
We have built a database to enrich the resources available to the synthetic biology community and to provide a basis for future design.

Part 1. The Validation of Proteins’ Adsorption Ability

Purpose

To enhance the adsorption capacity of SmtA, we replaced four positively charged residues of wild-type SmtA that have large solvent-accessible surface areas (Lys8, Lys22, Arg26, and Lys45) with cysteine, in order to increase the electronegativity of the protein surface (enlarging the negatively charged region). The negatively charged Coulomb cage therefore becomes larger and can better capture positively charged heavy metal ions.

To verify that our mutant protein can adsorb heavy metal ions and has an improved adsorption capacity, we predicted its structure and ran molecular simulations to test the effect of the mutations in a simulated environment.

Click here to view our documentation.

The Prediction of SmtA Structure

Figure 1: The mutant-SmtA structure predicted by AlphaFold2.

According to Figure 2 (predicted LDDT and predicted aligned error), the prediction is reliable.
The mutations change the structure of MSmtA only slightly.

Figure 2: The confidence of the structure predicted by AlphaFold2.

Using Molecular Docking to Verify Adsorption Feasibility

Although the structure does not change much after the mutation, the adsorption ability of MSmtA still needs to be examined.

To verify that the mutant protein can bind heavy metal ions (represented here by Cd ions only) and binds them better than wild-type SmtA, we used AutoDock Vina 1.2.2 to dock $Cd^{2+}$ to the protein before and after mutation, and evaluated the binding energy using the docking scores.

Figure 3: Wild-type SmtA and $Cd^{2+}$ docking result. $Cd^{2+}$ forms ionic bonds with Cys14, Cys54, Cys52, and Cys47. Docking score: -6.091 kcal/mol.

Figure 4: MSmtA and $Cd^{2+}$ docking result. $Cd^{2+}$ forms ionic bonds with Cys14, Cys54, Cys52, and Cys47. Docking score: -6.133 kcal/mol.

The docking score after the four-point mutation (-6.133 kcal/mol) is lower, i.e. more negative, than that of wild-type SmtA (-6.091 kcal/mol). The difference of 0.042 kcal/mol is small, so we can only preliminarily infer that the mutant SmtA has an improved adsorption capacity.
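As a rough illustration of how small this difference is, the sketch below converts the score difference into a ratio of binding constants. It rests on an assumption that is ours, not part of the docking protocol: the Vina scores are read as binding free energies at 298.15 K, so that $K_{mut}/K_{wt}=e^{-\Delta\Delta G/RT}$. The function name is hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def affinity_ratio(score_mutant, score_wild, temperature_k=298.15):
    """Ratio of binding constants implied by two docking scores (kcal/mol).

    Assumption: each score is treated as a binding free energy, so
    K_mut / K_wt = exp(-(dG_mut - dG_wt) / (R * T)).
    """
    ddg = score_mutant - score_wild
    return math.exp(-ddg / (R_KCAL * temperature_k))

ratio = affinity_ratio(-6.133, -6.091)
print(f"implied affinity ratio (mutant / wild type): {ratio:.3f}")
```

Under these assumptions the implied ratio is only a few percent above 1, which is why the inference is described as preliminary.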

By simulating MSmtA and quantitatively comparing its adsorption ability with that of wild-type SmtA, the simulations indicate that MSmtA adsorbs slightly better than the wild type, which supports the feasibility of our heavy metal program.

Part 2. Modeling of Protein Production in E. coli

Purpose

Protein is the heart of our adsorption device: the more protein is produced, the better the adsorption ability of the system. We therefore built the second part of the model with the aim of increasing protein production. Quantifying protein yield is also essential for the final calculation of adsorption capacity.

For this, we mathematically described the process of protein production in the system to identify the factors that affect protein production capacity, optimize protein production progress, and ultimately obtain quantified protein yield results.

We divided this part into 2 inner parts: A description of CsgA-AG4 production and a description of MSmtA production.

In the description of CsgA-AG4 production, we calculated the optimal IPTG concentration, which is the main factor affecting CsgA-AG4 production.

While addressing the toxicity of intracellular CsgA monomer aggregates, we also found that the Tac promoter is better than the T7 promoter at reducing toxicity.

(Note: we used the T7 promoter in E. coli BL21 and the Tac promoter in E. coli MC4100.)

Description of CsgA-AG4 production

Figure 4: Biological pathway of CsgA-AG4 production under the T7 promoter.

The CsgA-AG4 protein is encoded by the CsgA gene and is transcribed and translated under IPTG induction.

CsgA-AG4 is secreted out of the cell through the channel composed of CsgE, CsgF, and CsgG. Our goal is to obtain the amount secreted outside the cell as a function of time and to find the IPTG concentration that maximizes protein expression.

At the same time, CsgA monomers can polymerize into amyloids, and their intracellular aggregation may be toxic to the cells and the biofilm. Since we could not find a toxicity threshold for CsgA, we instead track its ratio to CsgC, which prevents CsgA from forming toxic intracellular aggregates.

At a CsgA:CsgC concentration ratio of 500:1, CsgC inhibits intracellular nucleation of CsgA. ^[1]

In our study, we exerted control over the production of CsgA proteins via the T7 promoter, which necessitates the presence of T7 RNA polymerase. To model protein yield within biological pathways, we adopted the following assumptions:

We disregarded the facilitated diffusion of IPTG by LacY, considering only free diffusion, given the high concentration of IPTG employed ^[2].
We neglected the inherent leakage from the operon, attributable to the elevated IPTG concentration used.
We treated E. coli as a homogeneously mixed single compartment, assuming uniform internal concentrations.
In our focus on CsgA yield, we omitted the specific details of CsgA export and instead assumed a constant production rate for CsgA-AG4.
We assumed that the binding of IPTG to LacI reaches equilibrium rapidly.

Modeling of the optimal IPTG concentration

Symbol	Value	Meaning	Unit	Reference
$K_{input}$	0.92	rate constant of IPTG entering the cells	$min^{-1}$	^[3]
$K_{output}$	0.05	rate constant of IPTG leaving the cells	$min^{-1}$	^[3]

First, consider IPTG entering the cells:

$IPTG_{out}→IPTG_{in}$

According to the law of conservation of mass:

$IPTG_{total}=IPTG_{in}+IPTG_{out}$

So we get:

$\frac{d[IPTG_{in}]}{dt}=K_{input}[IPTG_{out}]-K_{output}[IPTG_{in}]$
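The uptake equation has a closed-form solution once the conservation relation is substituted in. The sketch below assumes that $IPTG_{total}$ is constant, that $[IPTG_{in}]=0$ at $t=0$, and that both pools are expressed in the same concentration units (all three are simplifying assumptions of this sketch). Under these assumptions, $[IPTG_{in}](t)=IPTG_{total}\frac{K_{input}}{K_{input}+K_{output}}(1-e^{-(K_{input}+K_{output})t})$. The function name is hypothetical.

```python
import math

K_INPUT = 0.92   # min^-1, IPTG entering the cell (parameter table)
K_OUTPUT = 0.05  # min^-1, IPTG leaving the cell (parameter table)

def iptg_in(t_min, iptg_total, k_in=K_INPUT, k_out=K_OUTPUT):
    """Intracellular IPTG at time t_min (minutes), starting from zero.

    Solves d[in]/dt = k_in * (total - [in]) - k_out * [in] analytically.
    """
    k_sum = k_in + k_out
    plateau = iptg_total * k_in / k_sum
    return plateau * (1.0 - math.exp(-k_sum * t_min))

for t in (0, 1, 5, 30):
    print(t, round(iptg_in(t, 600.0), 1))
```

With the tabulated rate constants the plateau is reached within a few minutes, which is fast compared with the protein dynamics considered later.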

For the lac operon, in the absence of IPTG, LacI binds to LacO upstream of the T7 RNA polymerase gene in E. coli and inhibits the production of T7 RNA polymerase.

$LacI+{DNA}_{T7RNA\ polymerase}\rightarrow{LacI\bullet\ DNA}_{T7RNA\ polymerase}$

Then,

$\frac{d[LacI·DNA_{T7 RNA polymerase}]}{dt}=k_{on}[LacI][DNA_{T7 RNA polymerase}]-k_{off}[LacI·DNA_{T7 RNA polymerase}]$

In this formula, $k_{on}$ is the forward reaction rate constant and $k_{off}$ is the reverse reaction rate constant.

When the reaction equilibrates, $K_d[LacI·DNA_{T7RNApolymerase}]=[LacI][DNA_{T7RNApolymerase}]$, with $K_d=\frac{k_{off}}{k_{on}}$. Considering the conservation of mass, the ratio of uninhibited $DNA_{T7RNApolymerase}$ molecules to total $DNA_{T7RNApolymerase}$ is

$\frac{[DNA_{T7RNApolymerase}]}{[LacI·DNA_{T7RNApolymerase}]+[DNA_{T7RNApolymerase}]}=\frac{K_d}{K_d+[LacI]}$

We then assess the LacI concentration during induction. When IPTG is introduced during protein expression induction, it binds to LacI, inducing a conformational change that prevents LacI from binding to LacO.

This release of repression allows the T7 RNA polymerase gene and the target gene to be expressed normally, enabling the transcription of the target gene.

IPTG interaction with LacI:

$LacI+n_1IPTG → LacI·n_1IPTG$

We have:

$\frac{d[LacI·n_1IPTG]}{dt}=K_{on}[LacI][IPTG_{in}]^{n_1}-K_{off}[LacI·n_1IPTG]$

At steady state, the differential equation can be written as

$K_x^{n_1}[LacI·n_1IPTG]=[IPTG_{in}]^{n_1}[LacI]$, with $K_x^{n_1}=\frac{K_{off}}{K_{on}}$.

Combining this with the mass conservation equation $[LacI_{total}]=[LacI·n_1IPTG]+[LacI]$ gives:

$[LacI·n_1IPTG]=\frac{[IPTG_{in}]^{n_1}[LacI_{total}]}{K_x^{n_1}+[IPTG_{in}]^{n_1}}$

Then

$[LacI]=\frac{[LacI_{total}]}{1+(\frac{[IPTG_{in}]}{K_x})^{n_1}}$

The number of uninhibited promoters is:

$O=O_T·\frac{1}{1+\frac{[LacI_{total}]}{K_d}·\frac{1}{1+(\frac{[IPTG_{in}]}{K_x})^{n_1}}}$
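The expression for $O$ can be evaluated directly. The sketch below is a minimal transcription of the formula; the default values are taken at face value from the parameter table in this section, all concentrations are assumed to be in μM, and the function name is hypothetical.

```python
def free_promoter_fraction(iptg_in, laci_total=0.01, k_d=5e-5,
                           k_x=1.0, n1=2, o_total=1.0):
    """Uninhibited promoter amount O for a given intracellular IPTG level.

    free LacI = LacI_total / (1 + (IPTG_in / K_x)^n1)
    O         = O_T / (1 + free LacI / K_d)
    """
    free_laci = laci_total / (1.0 + (iptg_in / k_x) ** n1)
    return o_total / (1.0 + free_laci / k_d)

for iptg in (0.0, 10.0, 100.0, 600.0):
    print(iptg, round(free_promoter_fraction(iptg), 4))
```

Without IPTG almost all promoters are repressed, and the fraction rises monotonically towards $O_T$ as intracellular IPTG increases.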

Finally, we multiply the amount of uninhibited $DNA_{T7poly}$ by the rate constant at which each uninhibited DNA produces mRNA, which gives the T7 polymerase mRNA expression rate:

$\frac{d[mRNA_{T7poly}]}{dt}=K_{poly}[O]-K_{mdegT7}[mRNA_{T7poly}]$

Consider the translation of T7 polymerase:

$\frac{d[T7polymerase]}{dt}=k_{transT7}[mRNA_{T7poly}]-r_{degT7}[T7polymerase]$

The effect of T7 polymerase on the transcription of CsgA·AG4 can be described by a Hill equation, where $K_d$ here denotes the apparent dissociation constant of T7 RNA polymerase and the T7 promoter:

$\frac{d[mRNA·CsgA·AG4]}{dt}=\beta_{gCsgA}\frac{[T7polymerase]^{n_2}}{K_d+[T7polymerase]^{n_2}}-K_{mdeg}[mRNA·CsgA·AG4]$

Our parameters:

Symbol	Value	Meaning	Unit	Reference
$K_{poly}$	6.116	expression rate constant of T7 RNA polymerase	$min^{-1}$	^[4]
$O_T$	1	operator content
$[LacI]_{total}$	0.01	the normal concentration of LacI in one Escherichia coli cell	μM	^[5]
$K_{mdegT7}$	0.462	degradation constant of T7 RNA polymerase mRNA	$min^{-1}$	^[6]
$k_{transT7}$	0.5	translation rate constant of T7 RNA polymerase	$min^{-1}$	^[6]
$r_{degT7}$	0.2	degradation constant of T7 RNA polymerase	$min^{-1}$	^[6]
$n_1$	2	Hill coefficient of IPTG and LacI		^[7]
$K_d$	$5\times10^{-5}$	the unbound LacI concentration at which the transcription rate of the tac promoter is half-maximal	μM	^[8]
$K_x$	1	dissociation constant of IPTG and LacI	μM	^[7]
$\beta_{gCsgA}$	0.0206	the transcription rate of mRNA CsgA-AG4 by pT7	$μM·min^{-1}$	^[9]
$K_d$	3	apparent dissociation constant of T7 RNA polymerase and the T7 promoter	nM	^[9]
$n_2$	2	Hill coefficient of T7 RNA polymerase and T7 promoter		^[9]

At steady state, we can obtain the relation between the input IPTG concentration and the mRNA of the target protein.

Figure 5: The relation between IPTG concentration and production of mRNA·CsgA-AG4 under the T7 promoter. The best IPTG concentration is around 500-600 μM.

The figure shows that the mRNA yield increases sharply at IPTG concentrations around 200 μM. Once the IPTG concentration reaches around 500-600 μM, the mRNA yield hardly increases any further, which is consistent with the experimental result that the optimal IPTG concentration is 600 μM. The optimal IPTG concentration for this system is therefore around 600 μM.

Finally, we obtain the rate equations for CsgA·AG4, where intracellular CsgA-AG4 is transported out of the cell with rate constant $r_{transfer}$ and extracellular CsgA-AG4 degrades with rate constant $r_{deg2}$.

$\frac{d[CsgA·AG4]}{dt}=K_{trans}[mRNA·CsgA·AG4]-r_{deg}[CsgA·AG4]-r_{transfer}[CsgA·AG4]$

$\frac{d[CsgA·AG4_{ex}]}{dt}=r_{transfer}[CsgA·AG4]-r_{deg2}[CsgA·AG4_{ex}]$

Our parameters:

Symbol	Value	Meaning	Unit	Reference
$K_{mdeg}$	0.264	the degradation rate of mRNA CsgA-AG4	$min^{-1}$	^[10]
$K_{trans}$	34.2	the translation rate of mRNA CsgA-AG4	$min^{-1}$	^[10]
$r_{deg}$	0.00378	the degradation rate of CsgA-AG4	$min^{-1}$	estimated with ExPASy
$r_{transfer}$	0.0108	the transfer rate constant of CsgA-AG4	$min^{-1}$	measured by TUDelft (2015)
$r_{deg2}$	0.000385	the degradation rate of CsgA-AG4 in vitro	$min^{-1}$	estimated with ExPASy
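Because the protein equations are linear, their steady state follows directly from the tabulated constants. The sketch below assumes that the Hill activation term is saturated (close to 1) at the optimal IPTG concentration and that extracellular loss is first order with $r_{deg2}$; both are assumptions of this sketch, and the function name is hypothetical.

```python
BETA_G_CSGA = 0.0206   # uM/min, transcription rate by pT7
K_MDEG = 0.264         # 1/min, mRNA degradation
K_TRANS = 34.2         # 1/min, translation
R_DEG = 0.00378        # 1/min, intracellular protein degradation
R_TRANSFER = 0.0108    # 1/min, export out of the cell
R_DEG2 = 0.000385      # 1/min, extracellular (in vitro) degradation

def steady_state(beta, hill_activation=1.0):
    """Steady-state mRNA, intracellular and extracellular CsgA-AG4 (uM).

    Obtained by setting each time derivative to zero in turn.
    """
    mrna = beta * hill_activation / K_MDEG
    protein_in = K_TRANS * mrna / (R_DEG + R_TRANSFER)
    protein_ex = R_TRANSFER * protein_in / R_DEG2
    return mrna, protein_in, protein_ex

mrna, protein_in, protein_ex = steady_state(BETA_G_CSGA)
print(f"mRNA = {mrna:.4f} uM, intracellular CsgA-AG4 = {protein_in:.1f} uM")
```

With these inputs the intracellular steady state comes out at roughly 183 μM.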

The yield of protein under the optimal IPTG concentration is:

Figure 6: The changes of CsgA-AG4 concentration over time under the T7 promoter. The red line shows the CsgA·AG4 transferred out of the cell; the blue line shows the concentration of CsgA·AG4 inside the cell.

Modeling of the toxicity of CsgA polymer

We were concerned about potential overexpression of CsgA. Because the toxicity threshold of CsgA within the cell is unknown, we aimed to keep the concentration ratio of CsgA to CsgC at about 500:1, the ratio at which CsgC effectively inhibits the formation of CsgA fibers.

In our system, at steady state, the intracellular concentration of CsgA stabilizes at approximately $183 μM$, and the resulting ratio to CsgC is much greater than 500:1 ^[11]. This suggests that CsgC may not completely suppress the nucleation of CsgA within the cell, potentially harming the biofilm and cell viability.

To address this concern and prevent CsgA from surpassing its toxicity threshold, we intend to replace the T7 promoter with the tac promoter, which, although slightly less efficient in transcription, is expected to be more effective in regulating CsgA expression. This strategic modification aims to maintain a balanced CsgA-to-CsgC ratio and mitigate the risk of overexpression.

For the Tac promoter, we built an analogous model of the pathway:

Figure 7: The biological pathway of CsgA-AG4 production under the tac promoter.

$\frac{d[IPTG_{in}]}{dt}=K_{input}[IPTG_{out}]-K_{output}[IPTG_{in}]$

$\frac{d[mRNA]}{dt}=\frac{\beta_{tac}}{1+\frac{[LacI_{total}]}{K_d}·\frac{1}{1+(\frac{[IPTG_{in}]}{K_x})^{n_1}}}-K_{mdeg}[mRNA]$

Association of IPTG with mRNA production:

Figure 8: The relation between IPTG concentration and production of mRNA·CsgA-AG4 under the tac promoter. Promoter activity is the rate of mRNA production, i.e. how much mRNA is produced per second. The best IPTG concentration is 400 μM.

This figure shows that once the IPTG concentration reaches around 300-400 μM, the mRNA yield hardly increases any further, which is consistent with the experimental result that the optimal IPTG concentration is 400 μM.

$\frac{d[CsgA·AG4]}{dt}=K_{trans}[mRNA]-r_{deg}[CsgA·AG4]-r_{transfer}[CsgA·AG4]$

$\frac{d[CsgA·AG4_{ex}]}{dt}=r_{transfer}[CsgA·AG4]-r_{deg2}[CsgA·AG4_{ex}]$

The following results were obtained at the optimal IPTG concentration of 0.4 mM.

Figure 9: The changes of CsgA-AG4 concentration over time under the tac promoter. The red line shows the CsgA·AG4 transferred out of the cell; the blue line shows the concentration of CsgA·AG4 inside the cell.

We observed that the intracellular concentration of CsgA·AG4 stabilizes at approximately 63 μM, while the ratio of CsgA to CsgC decreases to around 500:1.
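The tac-promoter time course can be reproduced approximately with a simple explicit Euler integration. The sketch below assumes that intracellular IPTG is already equilibrated at 400 μM, takes $\beta_{tac}=0.0072$ $μM·min^{-1}$ and the other constants at face value from the parameter tables, and starts from zero mRNA and protein; the function name and step size are hypothetical choices.

```python
def simulate_tac(iptg_in=400.0, t_end=1000.0, dt=0.05):
    """Integrate mRNA, intracellular and extracellular CsgA-AG4 (uM)."""
    beta_tac, k_mdeg, k_trans = 0.0072, 0.264, 34.2
    r_deg, r_transfer, r_deg2 = 0.00378, 0.0108, 0.000385
    laci_total, k_d, k_x, n1 = 0.01, 5e-5, 1.0, 2

    # Transcription rate after LacI repression at the given IPTG level.
    free_laci = laci_total / (1.0 + (iptg_in / k_x) ** n1)
    production = beta_tac / (1.0 + free_laci / k_d)

    mrna = protein_in = protein_ex = 0.0
    for _ in range(int(round(t_end / dt))):
        d_mrna = production - k_mdeg * mrna
        d_in = k_trans * mrna - (r_deg + r_transfer) * protein_in
        d_ex = r_transfer * protein_in - r_deg2 * protein_ex
        mrna += dt * d_mrna
        protein_in += dt * d_in
        protein_ex += dt * d_ex
    return mrna, protein_in, protein_ex

mrna_tac, csga_in_tac, csga_ex_tac = simulate_tac()
print(f"intracellular CsgA-AG4 after 1000 min: {csga_in_tac:.1f} uM")
```

Under these assumptions the intracellular level settles near 64 μM, close to the value read from the figure, while the extracellular pool keeps accumulating because $r_{deg2}$ is small.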

To better assess the protein yield rather than just its concentration, we opted to express the output in μmol over time. This adjustment allows us to track the production of the protein over a specified period:

Figure 10: The CsgA-AG4 production yield over time.

The description of SmtA protein production

Based on the experimental pathways, we constructed models to describe the production process of SmtA.

$\frac{d[IPTG_{in}]}{dt}=K_{input}[IPTG_{out}]-K_{output}[IPTG_{in}]$

$\frac{d[mRNA_{T7poly}]}{dt}=\frac{K_{poly}O_T}{1+\frac{[LacI_{total}]}{K_d}·\frac{1}{1+(\frac{[IPTG_{in}]}{K_x})^{n_1}}}-K_{mdegT7}[mRNA_{T7poly}]$

$\frac{d[T7polymerase]}{dt}=k_{transT7}[mRNA_{T7poly}]-r_{degT7}[T7polymerase]$

$\frac{d[mRNA·SmtA]}{dt}=\beta_{gSmtA}\frac{[T7polymerase]^{n_2}}{K_d^{n_2}+[T7polymerase]^{n_2}}-K_{deg2}[mRNA·SmtA]$

$\frac{d[SmtA]}{dt}=K_{trans2}[mRNA·SmtA]-r_{deg2}[SmtA]$

Our parameters:

Symbol	Value	Meaning	Unit	Reference
$\beta_{tac}$	0.0072	the transcription rate of mRNA CsgA-AG4 by ptac	$μM·min^{-1}$	^[7]
$\beta_{gSmtA}$	0.0336	the transcription rate of mRNA SmtA	$μM·min^{-1}$	^[9]
$K_{deg2}$	0.204	the degradation rate of mRNA SmtA	$min^{-1}$	^[10]
$K_{trans2}$	26.4	the translation rate of mRNA SmtA	$min^{-1}$	^[10]
$r_{deg2}$	0.002928	the degradation rate of SmtA	$min^{-1}$	^[12]

Figure 11: The SmtA production yield over time.

Part 3. The Vital Simulation of Practical Adsorption Environments

Purpose

In the first two parts, we optimized the protein yield. In practical applications, however, quantifying and increasing the adsorption capacity of our system for several metal ions also requires considering how well the proteins adsorb in different environments.

Part 3 addresses this question.

Prediction of optimal adsorption environment

Since there is no literature mentioning the influence of the environment on AG4 adsorption capacity, we hope to analyze the effects of different times and temperatures on AG4 adsorption efficiency through experiments and models. Considering that the temperature in daily industrial wastewater treatment does not exceed 25 degrees Celsius, we set the maximum temperature to 25 degrees Celsius.

To find the conditions under which the adsorption capacity of CsgA-AG4 is optimal, we combined modeling with experiments: using a face-centered central composite design (α=1), we analyzed the response surface of the experimental data and obtained the optimal temperature and time for CsgA-AG4 adsorption.

Figure 12: The time-temperature-adsorption efficiency function. A shows a three-dimensional plot of the fitted adsorption efficiency as a function of temperature and time. The face-centered central composite design (α=1) was carried out using Design-Expert 13. B displays a contour plot of the same function.

C and D show the interactions between temperature and time. According to these plots, 8 hours and 25℃ are suitable conditions for the proteins to adsorb silver ions.

The final prediction result obtained by the least squares method was:

$y=0.539+0.126\times x_1-0.000365\times x_2+0.00043\times x_1x_2-0.010675\times x_1^2-0.000016\times x_2^2$

Symbol	Meaning	Unit
y	adsorption efficiency	%
$x_1$	time	h
$x_2$	temperature	℃
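The fitted response surface can be evaluated for any time and temperature. The sketch below simply transcribes the polynomial above; the function name is hypothetical, and the value is returned exactly as the polynomial gives it, without any unit conversion.

```python
def adsorption_efficiency(time_h, temp_c):
    """Fitted response surface: y as a function of time (h) and temperature (deg C)."""
    x1, x2 = time_h, temp_c
    return (0.539
            + 0.126 * x1
            - 0.000365 * x2
            + 0.00043 * x1 * x2
            - 0.010675 * x1 ** 2
            - 0.000016 * x2 ** 2)

y = adsorption_efficiency(8, 25)
print(f"y(8 h, 25 C) = {y:.4f}")
```

Evaluating the polynomial at 8 h and 25 ℃ gives a value of about 0.93.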

Among them, the F value of this model is 976.79, and the P value is lower than 0.0001, which means that this model is significant. At the same time, the adjusted $R^2$ is 0.9975, which means that the model fits well.

Test	Value
F value	976.79
P value	<0.0001
adjusted $R^2$	0.9975

Figure 13: Model diagnostics. Picture A shows the predicted values versus the actual values; the actual values are evenly distributed on both sides of the prediction line. Picture B shows the normal plot of residuals, which fall along a straight line. These pictures support the validity of the model to some extent.

Langmuir adsorption curve fitting and its microscopic interpretation

In wastewater treatment, metal ion concentrations vary. It is therefore essential to evaluate the adsorption efficiency of our biosorbents across these fluctuating concentrations, so we built a Langmuir adsorption model.

Let's start by making assumptions:

The biosorbent is assumed to be an ideal solid surface, and the residues on our protein that bind metal ions are regarded as metal ion binding sites.
The energy of all sites is equal, and the energy released on binding is the same for all sites.
Adsorption forms only a single layer of coverage.
Adsorbate molecules at adjacent sites have only ideal interactions, that is, the energy of side-to-side interactions is equal at all sites.
The transition from the unadsorbed state to the adsorbed state occurs in one step, without an intermediate state.
The biosorbents either bind n metal ions at one time or none at all; intermediate states with fewer than n bound ions are ignored.

According to the law of mass action:

$\frac{d[n_x·metal·protein]}{dt}=k_{onx}[metal_{ion}]^{n_x}·[protein]-k_{offx}[n_x·metal·protein]$

$\frac{d[metal·adsorbed]}{dt}=n_x(k_{onx}[metal_{ion}]^{n_x}·[protein]-k_{offx}[n_x·metal·protein])$

$\frac{d[protein]}{dt}=k_{offx}[n_x·metal·protein]-k_{onx}[metal_{ion}]^{n_x}·[protein]$

Setting these three equations to zero at equilibrium, with $K_{Dx}=\frac{k_{offx}}{k_{onx}}$ and $[protein_{total}]=[protein]+[n_x·metal·protein]$, gives the relationship between the adsorbed metal concentration and the remaining metal ion concentration:

$[metal·adsorbed]=\frac{n_x[protein_{total}][metal_{ion}]^{n_x}}{K_{Dx}+[metal_{ion}]^{n_x}}$
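The equilibrium relation is straightforward to evaluate numerically. The sketch below is a direct transcription with a hypothetical function name; the numbers in the usage lines are illustrative only and are not fitted values.

```python
def adsorbed_metal(metal_free, protein_total, n, k_d):
    """Adsorbed metal concentration at equilibrium.

    [adsorbed] = n * [protein_total] * [metal]^n / (K_D + [metal]^n)
    """
    m = metal_free ** n
    return n * protein_total * m / (k_d + m)

# Illustrative values only: when [metal]^n equals K_D, half of the
# maximum capacity n * [protein_total] is occupied.
half = adsorbed_metal(metal_free=2.0, protein_total=1.0, n=2, k_d=4.0)
print(half)
```

The curve saturates at $n_x[protein_{total}]$ for high metal concentrations, which is the plateau that the fit to experimental data has to capture.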

For CsgA-AG4, we used experimental data to fit the Langmuir adsorption curve expressed by the above equation:

Figure 14: Langmuir curve fitted to the experimental data using Python.

The Langmuir curve was fitted to the experimental data. From the fitted curve we obtained two parameters: $K_{D1}$ and $n_2$.

Explanation

Symbol	Meaning	Value	Reference
$K_{D1}$	$k_{offx}$ / $k_{onx}$	146.3151 μM	estimated from experiment
$n_2$	average number of ions adsorbed by one protein	6.4442	estimated from experiment
$R^2$ for $Ag^+$ adsorption	coefficient of determination of the fit	0.9727	estimated from experiment

Based on the molecular structure of CsgA-AG4, the ideal $n_2$ should be about 4. However, the fitted $n_2$ was 6.4442, which is larger than the ideal value.

Figure 15: Silver mapping and STEM-HAADF images. The left image is the silver mapping by SEM; the yellow color shows the distribution of the silver element, including elemental silver and silver ions. The middle and right images are STEM-HAADF images, which show nano-silver (elemental silver).

Silver mapping by scanning electron microscopy (SEM) and scanning transmission electron microscopy (STEM) shows a difference between the two images, which suggests that CsgA-AG4 may also hold unreduced $Ag^+$ through electrostatic interaction. This could explain why the fitted $n_2$ exceeds the ideal value.

In addition, to meet existing policy standards, we need to determine how much protein is required to bring the metal concentration below the national standard, given the heavy metal content of the sewage. We therefore transformed the formula.

According to the law of conservation of mass, $C_{Ag^+total}·V_{total}=C_{freeAg^+}·V_{total}+n_2C_{CsgA·AG4·n_2AgNP}·V_{solid}$, which lets us express the concentrations of free CsgA·AG4 and of $CsgA·AG4·n_2AgNP$ in terms of the silver concentrations. This yields a function that takes the initial concentration of $Ag^+$ and the concentration of CsgA·AG4 as input and returns the remaining concentration of $Ag^+$ at steady state.

The amount of protein needed once the metal ion concentration in the sewage reaches steady state therefore satisfies:

$\frac{1}{K_{D1}}=\frac{\frac{V_{total}}{V_{solid}n_2}([Ag^+]_{total}-[Ag^+])}{[Ag^+]^{n_2}·([CsgA·AG4]_{total}-\frac{V_{total}}{V_{solid}n_2}([Ag^+]_{total}-[Ag^+]))}$

Figure 16: Relationship between input CsgA-AG4 and residual $Ag^+$ concentration. We take input ion concentrations of 1 mg/L, 0.9 mg/L, and 0.8 mg/L in 1 L of sewage as examples. The red line indicates the Chinese standard of 0.3 mg/L. The horizontal coordinate of the point where the standard line intersects each curve is the amount of protein required. By changing the input ion concentration, we can obtain the amount of protein required to reach the national standard ^[13].
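The steady-state relation has no convenient closed form for the residual $Ag^+$ concentration, but it can be solved numerically. The sketch below rearranges the equation to $[Ag^+]^{n_2}([CsgA·AG4]_{total}-B)-K_{D1}B=0$ with $B=\frac{V_{total}}{V_{solid}n_2}([Ag^+]_{total}-[Ag^+])$ and finds the root by bisection. The function name, the default volumes, and the numbers in the usage lines are illustrative assumptions, not measured values.

```python
def residual_ag(ag_total, protein_total, k_d, n,
                v_total=1.0, v_solid=1.0, tol=1e-12):
    """Residual free Ag+ at steady state, found by bisection on [0, ag_total]."""
    scale = v_total / (v_solid * n)

    def balance(c):
        bound = scale * (ag_total - c)  # protein-metal complex concentration
        return c ** n * (protein_total - bound) - k_d * bound

    lo, hi = 0.0, ag_total  # balance(lo) < 0 and balance(hi) > 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if balance(mid) > 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Illustrative inputs only (arbitrary concentration units).
c_low_protein = residual_ag(ag_total=10.0, protein_total=3.0, k_d=4.0, n=2)
c_high_protein = residual_ag(ag_total=10.0, protein_total=6.0, k_d=4.0, n=2)
print(c_low_protein, c_high_protein)
```

Scanning `protein_total` and reading off where the residual concentration crosses the standard reproduces the construction described in the figure caption.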

For SmtA adsorption of heavy metal ions, we also need to consider the competition of different heavy metals for protein adsorption sites. We assume that already adsorbed heavy metals do not influence subsequently adsorbed ones ($K_d$ is constant), and we attribute the competition of metal ions for sites to the protein as a whole; that is, all sites of one protein are assumed to adsorb the same metal. Under the previous assumption, this does not affect the concentration of the remaining metal ions at equilibrium.

On this basis, we write the SmtA adsorption equations:

$n_1·Cd^{2+}+SmtA ↔ n_1·Cd-SmtA$
$n_3·Cu^{2+}+SmtA ↔ n_3·Cu-SmtA$
$n_4·Pb^{2+}+SmtA ↔ n_4·Pb-SmtA$

$\frac{d[Cd·SmtA]}{dt}=K_a([SmtA_{total}]-[Cd·SmtA]-[Cu·SmtA]-[Pb·SmtA])-k_{-a}[Cd·SmtA]$

$\frac{d[Cu·SmtA]}{dt}=K_b([SmtA_{total}]-[Cd·SmtA]-[Cu·SmtA]-[Pb·SmtA])-k_{-b}[Cu·SmtA]$

$\frac{d[Pb·SmtA]}{dt}=K_c([SmtA_{total}]-[Cd·SmtA]-[Cu·SmtA]-[Pb·SmtA])-k_{-c}[Pb·SmtA]$

It is obtained by the law of conservation of mass:

Dynamic adsorption model

To reflect how the biosorbent operates inside the device, we want the remaining ion concentration in the effluent as a function of adsorption time. To this end, we established a stochastic model using cellular automata to simulate the dynamic adsorption of ions by biosorbents ^[14].

We assumed that the sites are distributed homogeneously on the membrane and that each site is in one of two states, occupied or unoccupied. When the solution passes, unbound silver ions randomly hit unoccupied sites and are adsorbed with a certain probability; the site then becomes occupied, and the ion count in the solution decreases by 1. To account for the effect of occupied sites on their neighbors, the adsorption probability depends on the state of the 8 adjacent grid sites. At the same time, each occupied site has a probability of desorption, in which case the site becomes unoccupied and the ion count in the solution increases by 1.

Figure 17: The two states of sites in the membrane and their interconversion.

The adsorption surface is represented as a two-dimensional square array in which each element is a binding site. The occupancy of the eight adjacent sites modifies the probability of attaching to a given site, such that higher occupancy reduces the chance of attachment. The probability threshold for detachment is considered independent of occupancy.

In each simulation step, attachment is determined by generating a random number ( $p_1$ ) between 0 and 1. This number is scaled by a factor that depends on the fractional occupancy of the eight adjacent sites, a multiplier reflecting the surface density, and the metal ion concentration in the solution, giving the probability value ( $p$ ), which is compared with the absolute probability threshold ( $p_a$ ):

$p=p_1(1+\frac{o}{8·s}·\frac{solution_{total}}{solution_{now}})$

Explanation:

Parameter	Definition
$p_1$	a random value from 0 to 1
$p_2$	another random value from 0 to 1 (may be the same as $p_1$ )
$p_a$	the absolute probability threshold for attachment
$p_d$	the selected detachment probability threshold
o	the number of adjacent sites that are already occupied
s	a constant describing the density of sites
$solution_{total}$	the initial concentration of metal ions
$solution_{now}$	the current simulated concentration of metal ions

If $p$ is lower than $p_a$, the site is filled (its state is set to 1) and $solution_{now}$ is reduced by 1.

For detachment, if the site is occupied, a random value $p_2$ is compared with the selected detachment probability threshold ( $p_d$ ). If the threshold exceeds $p_2$, the site is reset to 0 and $solution_{now}$ is increased by 1.
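The attachment and detachment rules can be sketched as a small cellular automaton. The version below is only a minimal illustration: the grid size, thresholds, ion count, periodic boundaries, and the choice to sweep every site once per step and update the grid in place are all assumptions of this sketch, and the function name is hypothetical.

```python
import random

def simulate_adsorption(size=20, solution_total=300, steps=200,
                        p_a=0.3, p_d=0.02, s=1.0, seed=0):
    """Return the final grid and the free-ion count after each step."""
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]  # 0 = free site, 1 = occupied
    solution_now = solution_total
    history = []
    neighbours = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                  if (di, dj) != (0, 0)]
    for _ in range(steps):
        for i in range(size):
            for j in range(size):
                if grid[i][j] == 0:
                    if solution_now == 0:
                        continue  # no free ions left to adsorb
                    # occupied sites among the 8 neighbours (periodic edges)
                    o = sum(grid[(i + di) % size][(j + dj) % size]
                            for di, dj in neighbours)
                    p = rng.random() * (
                        1 + o / (8 * s) * solution_total / solution_now)
                    if p < p_a:
                        grid[i][j] = 1
                        solution_now -= 1
                elif p_d > rng.random():
                    grid[i][j] = 0
                    solution_now += 1
        history.append(solution_now)
    return grid, history

grid, history = simulate_adsorption()
print("free ions at the end:", history[-1])
```

Plotting `history` against the step index gives a residual-concentration curve of the kind shown in the figure, and repeating the run for different `solution_total` values gives the simulated isotherm that $p_a$ and $p_d$ are tuned against.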

Figure 18: Dynamic adsorption. The first picture shows the residual $Ag^+$ concentration over simulation time. The second picture shows the state of the membrane: green squares represent occupied sites, and white squares represent free sites.

By adjusting $p_a$ and $p_d$, we can obtain a Langmuir adsorption curve consistent with the experiment.

Finally, the simulation step at which equilibrium is reached is mapped to the time at which adsorption reaches equilibrium in the experiment, which gives the real time corresponding to each simulation step. From this, the remaining ion concentration in the effluent at different adsorption times can be obtained.

This model is only a preliminary idea at the moment due to limited time and computing power, but it may give a possible direction for future research.

Part 4. Database

While building our models, we found how difficult it is to find suitable parameters. Moreover, although excellent protein databases such as the PDB exist, there is still no database that provides information on mutation, modification, and fusion, in other words, on protein evolution.

For example, changing 4 amino acids in SmtA to increase its adsorption of heavy metal ions can be regarded as artificial protein evolution.

But how can we know, apply, and even predict these potential evolutions?

The answer is a database!

A database of protein evolution can serve not only as a basis for protein evolution research in molecular biology but also as auxiliary information for future design in the synthetic biology community.

(The code of the database is on our GitLab.)

We built 3 interrelated tables in our database using SQL Server to describe the different evolution pathways. The database can also collect relevant parameters of the new proteins:

Figure 19: Tables describing protein evolution. We made 3 tables: the blue table holds the information on the primary protein, the pink table expresses the relationship between primary proteins and evolved proteins, and the brown table holds parameters relevant to the evolved proteins. Together, these 3 tables describe the evolution of a protein (bottom left of the figure).
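To make the three-table layout concrete, here is a minimal sketch. The actual database runs on SQL Server; this example uses Python's built-in `sqlite3` module only so that it is self-contained, and every table and column name is hypothetical rather than the team's real schema. The single example row reuses the MSmtA docking score reported in Part 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE primary_protein (
    protein_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE evolution (
    evolved_id   INTEGER PRIMARY KEY,
    protein_id   INTEGER NOT NULL REFERENCES primary_protein(protein_id),
    evolved_name TEXT NOT NULL,
    pathway      TEXT NOT NULL
        CHECK (pathway IN ('mutation', 'modification', 'fusion'))
);
CREATE TABLE evolved_parameter (
    evolved_id INTEGER NOT NULL REFERENCES evolution(evolved_id),
    parameter  TEXT NOT NULL,
    value      REAL,
    unit       TEXT
);
""")

conn.execute("INSERT INTO primary_protein VALUES (1, 'SmtA')")
conn.execute("INSERT INTO evolution VALUES (1, 1, 'MSmtA', 'mutation')")
conn.execute(
    "INSERT INTO evolved_parameter VALUES "
    "(1, 'docking score with Cd2+', -6.133, 'kcal/mol')"
)

# Join the three tables to trace a primary protein to its evolved form
# and the parameters recorded for it.
rows = conn.execute("""
SELECT p.name, e.evolved_name, e.pathway, q.parameter, q.value, q.unit
FROM primary_protein AS p
JOIN evolution AS e ON e.protein_id = p.protein_id
JOIN evolved_parameter AS q ON q.evolved_id = e.evolved_id
""").fetchall()
print(rows)
```

Keeping the pathway in its own relationship table is one way to let a single primary protein point to several evolved forms, which is what the combination of directions described next relies on.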

One of the basic ideas of synthetic biology is modularization into building blocks ('blocking'). To enrich the resources already accumulated through the iGEM competition, we designed a blocking system for proteins.

For example:

A is a protein with three known evolutionary directions:
mutated into M-A
modified into Me-A
fused with protein B to become A-B

Now we can pick two or more directions to obtain a secondary evolution that meets our demand:

$(M-A)+(Me-A)+(A-B) → [Me-(M-A)]-B$

Figure 20: The application of the blocking protein database. The result of each evolution direction can be treated as a new block. By combining these blocks, we can obtain a protein with the desired ability.

When the database is rich enough, we can search it for a target protein by purpose:

Figure 21: Purpose-driven blocking.

Figure 22: Screenshot of the SQL Server Management Studio (SSMS) page. We searched for our protein in SSMS, and the result showed that the tables were linked successfully.

Further, we hope to build a web page that is accessible to everyone, so that users can obtain more information about target proteins and get a clearer view of their structure in the design step. We would then like to integrate an AI that can analyze protein structures.

In the future, we could even use the database to train an AI that predicts the likely evolutionary direction of proteins.

REFERENCE

[1] Evans, M.L. et al. (2015) 'The bacterial curli system possesses a potent and selective inhibitor of amyloid formation.' Molecular cell vol. 57,3. pp. 445-55. Available at: doi:10.1016/j.molcel.2014.12.025

[2] Huy, T. et al. (2015) 'Kinetics of the cellular intake of a gene expression inducer at high concentrations.' Molecular bioSystems vol. 11,9. pp. 2579-87. Available at: doi:10.1039/c5mb00244c

[3] TUDelft iGEM (2009) Available at: Team:TUDelft - 2009.igem.org (Accessed: 7 July 2023)

[4] Norton, M. et al. (2010) 'Determining the Rate of Transcription of T7 RNA Polymerase Using Single Molecule Fluorescence Imaging.' Available at: https://mds.marshall.edu/etd/112/

[5] Kalisky, T. et al. (2007) 'Cost-benefit theory and optimal design of gene regulation functions.' Physical biology vol. 4,4. pp. 229-45. Available at: doi:10.1088/1478-3975/4/4/001

[6] Ecuador iGEM (2021) Available at: https://2021.igem.org/Team:Ecuador (Accessed: 7 July 2023)

[7] Wright, J.K. et al. (1981) 'Lactose carrier protein of Escherichia coli: interaction with galactosides and protons.' Biochemistry vol. 20,22. pp. 6404-15. Available at: doi:10.1021/bi00525a019

[8] Stamatakis, M. and Nikos V.M. (2009) 'Comparison of deterministic and stochastic models of the lac operon genetic network.' Biophysical journal vol. 96,3. pp. 887-906. Available at: doi:10.1016/j.bpj.2008.10.028

[9] Ikeda, R.A., Lin, A.C., and Clarke, J. (1992) 'Initiation of transcription by T7 RNA polymerase at its natural promoters.' Journal of biological chemistry vol. 267, issue 4,5. pp. 2640-2649. Available at: https://www.sciencedirect.com/science/article/pii/S0021925818459297

[10] Kelly, J.R. et al. (2009) 'Measuring the activity of BioBrick promoters using an in vivo reference standard.' Journal of biological engineering vol. 3, 4. Available at: doi:10.1186/1754-1611-3-4

[11] Gorochowski, T.E. et al. (2018) 'Absolute Quantification of Translational Regulation and Burden Using Combined Sequencing Approaches.' bioRxiv. Available at: doi:10.1016/j.cell.2014.02.033

[12] Halter, M., Tona, A., Bhadriraju, K., Plant, A.L. and Elliott, J.T. (2007) 'Automated live cell imaging of green fluorescent protein degradation in individual fibroblasts.' Cytometry, 71A. pp. 827-834. Available at: https://doi.org/10.1002/cyto.a.20461

[13] Ministry of Ecology and Environment of the People's Republic of China (2023) 'Guideline on available techniques of water pollution prevention and control for electronic industry.' Available at: https://www.mee.gov.cn/ywgz/fgbz/bz/bzwb/kxxjszn/202306/t202306211034274.shtml

[14] Hubble, J. (2001) 'Stochastic modeling of affinity adsorption.' Biotechnology progress vol. 17,3. pp. 565-7. Available at: https://doi.org/10.1021/bp010014o
