
Abstract

This part presents a multifaceted model for the iGEM competition. The focus of our study is BC fibtils. The document is organized into four main sections:

BC fibtils Numerical Methods
Best medium condition model
Growth Time Optimization Model
Bioinformatics Amino Acid Fold Model

These models and methods are instrumental in understanding the synthesis mechanism of BC fibtils, optimizing production conditions, and exploring protein structures.

1. BC fibtils Numerical Methods

We introduce a numerical model based on ordinary differential equations that describes how BC fibtils form over time. Specifically, we use iterative methods to compute the amounts of the intermediate substances as glucose is transformed into BC fibtils. This model helps explain the mechanism of BC fibtils production.

2. Best medium condition model

Our model identifies optimal cultivation conditions and confidence intervals for BC fibtils yield by analyzing glucose and ethanol concentrations through surface fitting. Experimental results show variations in BC fibtils yield under different conditions.

3. Growth Time Optimization Model

This model determines the best timing for BC fibtils production by maximizing the yield-to-time ratio, employing experimental data and Logistic growth curve fitting to optimize production schedules.

4. Bioinformatics Amino Acid Fold Model

Amino acid folding tools analyze the three-dimensional structures of BpsA and BpsA-PcpS fusion proteins, highlighting potential structural deformations in regions with low confidence. This analysis guides decisions regarding fusion protein expression systems.

Symbols and Variables

Table 1. Symbols

1. BC fibtils Numerical Methods

We propose a numerical model, based on ordinary differential equations, that describes how the amount of BC fibtils changes over time. Specifically, we solve the equations iteratively to obtain the amount of each substance at each time step.

The symbols used for the variables are listed in Table 1.

Our analysis follows the reaction sequence shown in Figure 1.

Figure 1. BC-fibtils production and related mechanisms

The model rests on the following assumptions:

(1) The initial substrate concentration [S0] is much larger than the enzyme concentration [E0]. Binding to the enzyme therefore sequesters only a negligible fraction of the substrate when the reaction reaches equilibrium, so we can treat the free substrate concentration as equal to the total substrate concentration [S].

(2) The metabolic genetic circuit for the target product, BC fibtils, can be described as a series of enzymatic reactions.

(3) Nutrients are supplied only once, as a fixed amount at the beginning of the reaction, and at every moment each reaction obeys the Michaelis-Menten equation evaluated at the substrate concentration at that moment.

The first step converts glucose to glucose-6-phosphate (Glu-6-Phos). Since this step, catalysed by the enzyme glk (glucokinase), follows the Michaelis-Menten equation, the rates of change of glucose and Glu-6-Phos are:

$$\frac{dG}{dt}=-\frac{V_{Glu,Glu-6}[Glucose]}{K_{Glu,Glu-6}+[Glucose]}$$

$$\frac{dG6}{dt}=\frac{V_{Glu,Glu-6}[Glucose]}{K_{Glu,Glu-6}+[Glucose]}-\frac{V_{Glu-6,dis}[Glu-6] }{K_{Glu-6,dis}+[Glu-6]}\tag{1}$$

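Next, Glu-6-Phos is converted to glucose-1-phosphate (Glu-1-Phos) by PGM (phosphoglucomutase), and Glu-1-Phos is converted to UDP-glucose (UDP-Glu) by galU (UTP-glucose-1-phosphate uridylyltransferase):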

$$\frac{dG1}{dt}=\frac{V_{Glu-6,Glu-1}[Glu-6][PGM]}{K_{Glu-6,Glu-1}+[Glu-6]}-\frac{V_{Glu-1,UDPGlu}[Glu-1][galU]}{K_{Glu-1,UDPGlu}+[Glu-1]}$$

$$\frac{d(UDP-Glu)}{dt}=\frac{V_{Glu-1,UDPGlu}[Glu-1][galU]}{K_{Glu-1,UDPGlu}+[Glu-1]}-\frac{V_{UDP-Glu,UDP}[UDP-Glu]}{K_{UDP-Glu,UDP}+[UDP-Glu]}\tag{2}$$

Finally, UDP-Glu is consumed to synthesize BC fibtils, which gives:

$$\frac{d[BC-fibtils]}{dt}=\frac{V_{UDP-Glu,UDP}* V_{BC-fibtils}* [UDP-Glu]}{K_{UDP-Glu,UDP}+[UDP-Glu]}\tag{3}$$

Equations (1) to (3) therefore give the rate at which each substance changes over time. Given initial values, we iterate these equations to simulate the concentration of every substance.
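As an illustration of the iteration, here is a minimal Python sketch that steps Eqs. (1) to (3) forward with the simplest iterative scheme, forward Euler. Every kinetic parameter, the enzyme levels `PGM` and `galU` (held constant here, an assumption), the initial glucose amount, the step size and the time horizon are hypothetical placeholders, not values from our experiments. Following Eq. (1) as written, Glu-6-Phos is drained only through the lumped "dis" term.

```python
import numpy as np


def mm(v_max, k_m, s):
    """Michaelis-Menten rate v_max * s / (k_m + s)."""
    return v_max * s / (k_m + s)


# Hypothetical parameters, for illustration only (NOT fitted to our data).
P = {
    "V1": 1.0, "K1": 5.0,     # glucose -> Glu-6-Phos (glk)
    "Vd": 0.5, "Kd": 2.0,     # lumped drain of Glu-6-Phos ("dis" term)
    "V2": 0.8, "K2": 2.0,     # Glu-6-Phos -> Glu-1-Phos (PGM)
    "V3": 0.8, "K3": 2.0,     # Glu-1-Phos -> UDP-Glu (galU)
    "V4": 0.6, "K4": 2.0,     # UDP-Glu consumed for BC synthesis
    "V_bc": 1.0,              # multiplier V_{BC-fibtils} in Eq. (3)
    "PGM": 1.0, "galU": 1.0,  # enzyme levels, assumed constant
}


def rates(y, p=P):
    """Right-hand side of Eqs. (1)-(3); y = [Glucose, Glu-6, Glu-1, UDP-Glu, BC]."""
    glc, g6, g1, udpg, _bc = y
    j_glk = mm(p["V1"], p["K1"], glc)
    j_dis = mm(p["Vd"], p["Kd"], g6)
    j_pgm = mm(p["V2"], p["K2"], g6) * p["PGM"]
    j_galu = mm(p["V3"], p["K3"], g1) * p["galU"]
    j_bc = mm(p["V4"], p["K4"], udpg)
    return np.array([
        -j_glk,            # glucose is consumed
        j_glk - j_dis,     # Eq. (1): Glu-6-Phos
        j_pgm - j_galu,    # Glu-1-Phos
        j_galu - j_bc,     # Eq. (2): UDP-Glu
        p["V_bc"] * j_bc,  # Eq. (3): BC fibtils
    ])


def simulate(y0, dt=0.01, t_end=100.0):
    """Forward-Euler iteration y[k+1] = y[k] + dt * rates(y[k])."""
    n = int(round(t_end / dt))
    traj = np.empty((n + 1, len(y0)))
    traj[0] = y0
    for k in range(n):
        traj[k + 1] = traj[k] + dt * rates(traj[k])
    return traj


# Hypothetical start: 20 units of glucose, no intermediates, no BC yet.
traj = simulate(np.array([20.0, 0.0, 0.0, 0.0, 0.0]))
print("final [Glucose, Glu-6, Glu-1, UDP-Glu, BC]:", np.round(traj[-1], 3))
```

Forward Euler keeps the sketch short; a smaller `dt` or an adaptive ODE solver would give more accurate trajectories.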

Figure 2. Numerical simulation of main molecules

2. Best medium condition model

This model is designed to find the best medium composition and to estimate a confidence interval for the yield.

In our model, we use surface fitting to relate the yield of BC fibtils to the concentrations of glucose and ethanol in the medium.

We ran experiments at four glucose concentrations and three ethanol concentrations. The bacteria were cultivated under each condition for seven days, and the amount of BC fibtils was measured twice a day.

We then fit a polynomial surface $z=f(x,y)$ to the results, where $x$ is the glucose concentration, $y$ is the ethanol concentration and $z$ is the amount of BC fibtils. We use a quadratic (quadric) surface because its shape resembles the measured data and it is easy to optimize: a quadratic has a closed-form stationary point, and when the surface is concave, finding its maximum is a convex problem.
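The quadric fit can be sketched as an ordinary least-squares problem: each data point contributes one row $(x^2, xy, y^2, x, y, 1)$ of a design matrix, and the six coefficients are solved for together. The Python sketch below assumes that formulation; the glucose levels (10 to 40 g/L) and ethanol levels (1% to 2%) on the grid are hypothetical, and the "measurements" are noise-free values generated from the $curve_{48}$ coefficients, so the fit should recover them.

```python
import numpy as np


def quad_design(x, y):
    """Design-matrix columns for z = a x^2 + b xy + c y^2 + d x + e y + g."""
    return np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])


def fit_quadric(x, y, z):
    """Least-squares estimate of the six quadric coefficients (a, b, c, d, e, g)."""
    coef, *_ = np.linalg.lstsq(quad_design(x, y), z, rcond=None)
    return coef


# Hypothetical 4 x 3 grid: glucose (g/L) by ethanol (%).
gx, gy = np.meshgrid([10.0, 20.0, 30.0, 40.0], [1.0, 1.5, 2.0])
x, y = gx.ravel(), gy.ravel()

# Synthetic, noise-free yields generated from the curve_48 coefficients.
true = np.array([-0.05, 0.09, 1.346, 2.44, -6.56, -8.34])
z = quad_design(x, y) @ true

coef = fit_quadric(x, y, z)
print(np.round(coef, 4))
```

Because the fit only needs at least six well-spread points, the same function also works when some conditions are missing, as long as the remaining points still pin down all six coefficients.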

First, we plot the scatter diagram. For each combination of glucose concentration and ethanol percentage, we cultivated three replicate samples for 96 h at 30 °C. After the film formed, we weighed it every 24 h and averaged the replicates where several were available. In the scatter diagram below, red points are the 48 h data, green points the 72 h data and blue points the 96 h data.

Figure 3. Yield - $C_{Glu}$, $C_{EtOH}$ scatter diagram

Some of the experiments failed, so we fit the surfaces to the remaining, incomplete data. In the plot of the fitted surfaces below, each surface corresponds to one time point: from bottom to top, they represent the 48 h, 72 h and 96 h data.

Figure 4. Yield - $C_{Glu}$, $C_{EtOH}$ surface plot

The analytic expressions of the three surfaces are:

$$curve_{48}=-0.05x^2+0.09xy+1.346y^2+2.44x-6.56y-8.34$$

$$curve_{72}=-0.077x^2-0.575xy+0.1347y^2+4.403x+8.214y-28.875\tag{4}$$

$$curve_{96}=-0.069x^2-0.729xy+0.669y^2+4.558x+9.493y-23.372$$

The maximum of $curve_{48}$ is 21.428, attained at $(x,y)=(24.400,0)$.

The maximum of $curve_{72}$ is 34.068, attained at $(x,y)=(28.591,0)$.

The maximum of $curve_{96}$ is 51.901, attained at $(x,y)=(33.029,0)$.
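The reported maxima can be checked numerically. The three surfaces in Eq. (4) all have a positive $y^2$ coefficient, so they are not concave and the maximum over a bounded region may lie on its edge. The Python sketch below therefore uses a brute-force grid search over a rectangular box; the box limits (glucose 0 to 40 g/L, ethanol 0% to 2%) and the 0.01 grid step are assumptions made for this illustration.

```python
import numpy as np

# Coefficients (a, b, c, d, e, g) of z = a x^2 + b xy + c y^2 + d x + e y + g, from Eq. (4).
CURVES = {
    48: (-0.05, 0.09, 1.346, 2.44, -6.56, -8.34),
    72: (-0.077, -0.575, 0.1347, 4.403, 8.214, -28.875),
    96: (-0.069, -0.729, 0.669, 4.558, 9.493, -23.372),
}


def quadric(coef, x, y):
    a, b, c, d, e, g = coef
    return a * x**2 + b * x * y + c * y**2 + d * x + e * y + g


def grid_max(coef, x_range=(0.0, 40.0), y_range=(0.0, 2.0), step=0.01):
    """Brute-force maximum of a quadric over an assumed (glucose, ethanol) box."""
    nx = int(round((x_range[1] - x_range[0]) / step)) + 1
    ny = int(round((y_range[1] - y_range[0]) / step)) + 1
    X, Y = np.meshgrid(np.linspace(*x_range, nx), np.linspace(*y_range, ny), indexing="ij")
    Z = quadric(coef, X, Y)
    i, j = np.unravel_index(np.argmax(Z), Z.shape)
    return Z[i, j], X[i, j], Y[i, j]


for hours, coef in CURVES.items():
    z, x, y = grid_max(coef)
    print(f"curve_{hours}: max {z:.3f} at ({x:.2f}, {y:.2f})")
```

Within this box, the grid search lands on the same points as the values listed above, all on the edge $y=0$.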

Next, we use statistical methods to estimate the confidence interval of the yield.

We assume that, for a given glucose concentration, ethanol concentration and growth day, the yield of BC fibtils follows a normal distribution $N(\mu,\sigma^2)$, and we collect several samples under each condition. Given samples $x_1,x_2,\dots,x_n$, we estimate the parameters $\mu$ and $\sigma$ with the classic maximum likelihood estimation (MLE) method. The likelihood $L$ is:

$$L(\mathbf{x})=\prod_{i=1}^n \frac{1}{\sigma \sqrt{2\pi}}\exp(\frac{-(x_i-\mu)^2}{2\sigma^2})\tag{5}$$

and the log likelihood $LL$ is:

$$LL(\mathbf{x})=\sum_{i=1}^n \left(\frac{-(x_i-\mu)^2}{2\sigma^2}-\frac{1}{2}\ln(2\pi)-\ln(\sigma)\right)\tag{6}$$

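Since the constant term does not depend on $\mu$ or $\sigma$, maximizing $LL$ is equivalent to the following minimization problem:

$$\min_{\mu,\sigma} \left(\sum_{i=1}^n \frac{(x_i-\mu)^2}{2\sigma^2}+n\ln(\sigma)\right)\tag{7}$$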

Setting the partial derivatives with respect to $\mu$ and $\sigma$ to zero yields the best estimates of $\mu$ and $\sigma$.
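Setting the derivatives of Eq. (7) to zero gives the standard closed-form estimates:

$$\hat{\mu}=\frac{1}{n}\sum_{i=1}^n x_i,\qquad \hat{\sigma}^2=\frac{1}{n}\sum_{i=1}^n (x_i-\hat{\mu})^2$$

The Python sketch below computes these estimates and, assuming that the "confidence range of the yield" means the central 95% range of the fitted $N(\hat{\mu},\hat{\sigma}^2)$, returns that range. This interpretation and the sample values in the usage example are assumptions, not the exact procedure or data of the study.

```python
import math
from statistics import NormalDist


def normal_mle(samples):
    """Closed-form MLE of a normal distribution: sample mean and 1/n standard deviation."""
    n = len(samples)
    mu = sum(samples) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in samples) / n)
    return mu, sigma


def yield_range(samples, level=0.95):
    """Central `level` range of the fitted N(mu, sigma^2); needs non-identical samples."""
    mu, sigma = normal_mle(samples)
    dist = NormalDist(mu, sigma)
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Hypothetical yields from three replicates under one condition.
replicates = [10.0, 12.0, 14.0]
print(normal_mle(replicates), yield_range(replicates))
```

With only three replicates per condition, the 1/n estimate of $\sigma$ is rough and tends to underestimate the spread, so such a range should be read as indicative.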

3. Growth Time Optimization Model

This model determines the best time to harvest BC fibtils. To maximize production efficiency, our goal is to maximize the yield per unit time, i.e.:

$$\max_t \frac{f(t)}{t}\tag{8}$$

where $f(t)$ is the cumulative yield of BC fibtils after $t$ days.

To obtain $f(t)$, we measure $x_t$, the cumulative weight of BC fibtils after $t$ days of growth, and fit a logistic growth curve $f(t)=\frac{K}{1+\exp(-r(t-t_0))}$ to these data, where $K$ is the maximum yield, $r$ the growth rate and $t_0$ the inflection point.

The fitting objective is to minimize the L2 error, measured as the root-mean-square error (RMSE), between the curve and the measured data.
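The text does not name the fitting routine, so the Python sketch below swaps in one simple way to minimize the RMSE of a logistic curve: a grid search over $(r, t_0)$ in which, for each pair, the best $K$ has a closed form because $f$ is linear in $K$. The grid ranges and the noise-free "measurements" (generated from the 20 g/L glucose, 1% ethanol fit reported below) are hypothetical.

```python
import numpy as np


def logistic(t, K, r, t0):
    """Logistic growth curve f(t) = K / (1 + exp(-r (t - t0)))."""
    return K / (1.0 + np.exp(-r * (t - t0)))


def fit_logistic(t, x, r_grid, t0_grid):
    """Minimize RMSE over a (r, t0) grid; for each pair K = <b, x> / <b, b>."""
    R, T0 = np.meshgrid(r_grid, t0_grid, indexing="ij")
    b = logistic(t[None, None, :], 1.0, R[..., None], T0[..., None])  # unit-height curves
    K = (b @ x) / (b * b).sum(axis=-1)                                 # closed-form best K
    rmse = np.sqrt(np.mean((K[..., None] * b - x) ** 2, axis=-1))
    i, j = np.unravel_index(np.argmin(rmse), rmse.shape)
    return K[i, j], R[i, j], T0[i, j], rmse[i, j]


# Hypothetical noise-free cumulative weights on days 1..7.
t_days = np.arange(1.0, 8.0)
x_obs = logistic(t_days, 59.2, 0.57, 3.03)
K, r, t0, rmse = fit_logistic(t_days, x_obs,
                              np.linspace(0.1, 3.0, 291), np.linspace(0.0, 8.0, 801))
print(f"K={K:.2f}, r={r:.2f}, t0={t0:.2f}, RMSE={rmse:.2e}")
```

With real, noisy data the grid minimum is only as precise as the grid step; it can serve as a starting point for a finer local optimizer.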

Since we have data for many medium conditions, we check whether the optimal $t$ is similar across conditions.

We fit several conditions. For glucose 20 g/L and ethanol 1%, the fit is $f(t)=\frac{59.2}{1+\exp(-0.57(t-3.03))}$ with RMSE = 6.824. Interestingly, $\frac{f(t)}{t}$ decreases only slowly on $[2,4]$ ($f(2)/2=10.6$, $f(3)/3=9.8$, $f(4)/4=9.4$), but decreases steadily and more quickly for $t>4$.

For the other conditions, $\frac{f(t)}{t}$ has a peak. For glucose 30 g/L and ethanol 1.5%, the fit is $f_2(t)=\frac{51.5}{1+\exp(-1.44(t-3.11))}$, and for glucose 10 g/L and ethanol 2%, it is $f_3(t)=\frac{28.3}{1+\exp(-1.92(t-3.02))}$. Both $\frac{f_2(t)}{t}$ and $\frac{f_3(t)}{t}$ peak at about $t=4$.
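The objective in Eq. (8) can be evaluated directly on the three fitted curves. One subtlety: a logistic curve has $f(0)>0$, so $f(t)/t$ grows without bound as $t\to 0$, and the maximization only makes sense on a window away from zero. The Python sketch below assumes the window $t\in[2,10]$ days and a fine grid; both are choices made for this illustration.

```python
import numpy as np

# Fitted logistic parameters (K, r, t0) reported in the text.
FITS = {
    "glucose 20 g/L, ethanol 1%": (59.2, 0.57, 3.03),
    "glucose 30 g/L, ethanol 1.5%": (51.5, 1.44, 3.11),
    "glucose 10 g/L, ethanol 2%": (28.3, 1.92, 3.02),
}


def logistic(t, K, r, t0):
    return K / (1.0 + np.exp(-r * (t - t0)))


def best_harvest_day(K, r, t0, t_min=2.0, t_max=10.0, n=8001):
    """Grid maximum of f(t)/t (Eq. 8) on the assumed window [t_min, t_max]."""
    t = np.linspace(t_min, t_max, n)
    ratio = logistic(t, K, r, t0) / t
    k = np.argmax(ratio)
    return t[k], ratio[k]


for label, params in FITS.items():
    t_best, rate = best_harvest_day(*params)
    print(f"{label}: best t = {t_best:.2f} d, f(t)/t = {rate:.2f}")
```

For the 20 g/L glucose curve, the maximum sits at the left edge of the window because $f(t)/t$ only decreases there. For the other two curves, it falls near $t=4$, consistent with the peaks described above.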

Figure 5. Growth time optimization model

Considering the high variance of the yield for $t\in[2,3]$, we choose to harvest the BC fibtils every 4 days.

4. Bioinformatics Amino Acid Fold Model

BpsA is a single-module non-ribosomal peptide synthetase (NRPS) that, in a single enzymatic reaction, converts two molecules of L-glutamine into the directly detectable blue pigment indigoidine. Like all NRPS enzymes, BpsA must first be converted from an inactive apo form to an active holo form before it can synthesise its product. This conversion is achieved by attaching a 4′-phosphopantetheine (PPT) moiety, derived from coenzyme A, to the peptidyl carrier protein domain of BpsA, catalysed by a 4′-phosphopantetheinyl transferase (PPTase) such as PcpS.

Since existing studies generally express the two proteins separately, we wanted to explore the possibility of a BpsA-PcpS fusion protein. We used an amino acid folding tool to predict protein structures from their amino acid sequences: specifically, ColabFold v1.5.2-patch, an online tool that predicts the 3D structure of proteins.

First, we fold the amino acid sequence fragment of BpsA.

The 2D structure of the BpsA protein is shown below from two perspectives:

Figure 6. 2D structure of BpsA protein

The 3D structure of the BpsA protein is:

Figure 7. 3D structure of BpsA protein

Next, we fold the amino acid sequence fragment of the BpsA-PcpS fusion protein.

The 2D structure of the BpsA-PcpS fusion protein is shown below from two perspectives:

Figure 8. 2D structure of BpsA-PcpS fusion protein

The 3D structure of the BpsA-PcpS fusion protein is:

Figure 9. 3D structure of BpsA-PcpS fusion protein

Comparing the 3D structures of BpsA and the BpsA-PcpS fusion protein, we find that the fusion protein appears to have some structural deformation in its domains, concentrated in the green segment. However, this may simply reflect the low confidence of the structure prediction for this segment.
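One way to locate such low-confidence regions systematically is to read the per-residue confidence score (pLDDT) from the predicted structure file. The Python sketch below assumes, as in AlphaFold-style outputs, that the PDB file stores pLDDT in the B-factor column. It also adopts 70 as the cut-off between "confident" and "low" predictions, following the AlphaFold convention. It groups consecutive low-confidence residues into segments; the file name in the usage note is hypothetical.

```python
def plddt_per_residue(pdb_text):
    """Map (chain, residue number) -> pLDDT, read from the B-factor column of CA atoms.

    Assumes fixed-column PDB ATOM records with pLDDT stored as the B-factor.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            res_seq = int(line[22:26])
            scores[(chain, res_seq)] = float(line[60:66])
    return scores


def low_confidence_segments(scores, threshold=70.0):
    """Group consecutive residues with pLDDT < threshold into (chain, start, end)."""
    segments = []
    for (chain, res), score in sorted(scores.items()):
        if score >= threshold:
            continue
        if segments and segments[-1][0] == chain and segments[-1][2] == res - 1:
            segments[-1] = (chain, segments[-1][1], res)  # extend current segment
        else:
            segments.append((chain, res, res))            # start a new segment
    return segments
```

For example, `low_confidence_segments(plddt_per_residue(Path("fusion_prediction.pdb").read_text()))` (with `from pathlib import Path` and a hypothetical file name) lists the residue ranges to compare against the deformed region in the fusion protein.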

We fed this information back to the wet lab. After discussion, we took into account the antimicrobial effect of indigoidine, which could interfere with bacterial growth if it is produced too early, together with the structure prediction results. We therefore suggest that BpsA and PcpS be expressed separately by the bacteria rather than as a fusion protein. On the one hand, this may lead to higher indigoidine production efficiency; on the other hand, it may make the system easier to control.