Result

Based on the given parameters and initial conditions, we solve the differential equations and assume that the initial conditions for all equations except equations (1) and (2) are 0. The solutions to the differential equations are as follows:

Light Conditions

\( \mathrm{RFP}=1.35 \times 10^{-4}-1.8080 \times 10^{-8} \mathrm{e}^{-0.346 \mathrm{t}}+1.3710 \times 10^{-4} \mathrm{e}^{-6 \times 10^{-3} \mathrm{t}} \)

Figure 1. Curve equation of the rate of RFP synthesis under light conditions.

\( \mathrm{GFP}=4.8170 \times 10^{-1}-1.4991 \times 10^{-7} \mathrm{e}^{-0.346 \mathrm{t}}-0.4817 \mathrm{e}^{-6 \times 10^{-3} \mathrm{t}} \)

Figure 2. Curve equation of the rate of GFP synthesis under light conditions.

Based on the function curves and by taking the derivative of the differential equations and setting the derivative to approach zero within an acceptable error range, we can determine the feasible time for the system. It is found that the synthesis rates of red fluorescent protein (RFP) and green fluorescent protein (GFP) reach their maximum values at T=1000 min (no significant growth trend). Therefore, we will consider T=1000 min as the duration of the light condition cycle and use the synthesis rates of RFP and GFP at this time as the initial conditions for the differential equations under the no-light condition.


Dark Conditions

We substituted the synthesis rates of red fluorescent protein (RFP) and green fluorescent protein (GFP) under light conditions as initial conditions into the differential equation. Then, we solved the synthesis rate equations for both proteins under no light conditions and obtained the following results.

\( \begin{array}{c}\mathrm{RFP}=1.35 \times 10^{-4}-2.2814 \times 10^{-6} \mathrm{e}^{-0.346 \mathrm{t}}+2.6119 \times 10^{-4} \mathrm{e}^{-6 \times 10^{-3} \mathrm{t}} \\-2.7211 \times 10^{-4} \mathrm{e}^{-3 \times 10^{-3} \mathrm{t}}\end{array} \)

Figure 3. Curve equation of the rate of RFP synthesis under dark conditions.

\( \text { GFP }=-4.8170 \times 10^{-1}+5.4231 \times 10^{-5} \mathrm{e}^{-0.346 \mathrm{t}}+1.1140 \times 10^{-3} \mathrm{e}^{-6 \times 10^{-3} \mathrm{t}} \)

Figure 4. Curve equation of the rate of GFP synthesis under dark conditions.

Due to the complexity of the function's expression in low-light conditions, it is not possible to obtain an analytical solution for finding the zero points. Therefore, we adopt an image recognition approach to directly extract the solutions from images. By dividing the x and y axes equally based on coordinate proportions and pixels and using color recognition with OpenCV to identify curves, we obtained the intersections of the RFP and GFP curves with the coordinate axes under low-light conditions. Based on the specific meaning of the curves, these intersection points reflect the time at which the synthesis rates of RFP and GFP drop to zero.

Furthermore, we concatenate the curves of green fluorescent protein (GFP) and red fluorescent protein (RFP) under both light and dark conditions and conduct multiple experiments continuously. This allows us to determine the appropriate pulse frequency of light that enables the continuous alternate synthesis of red and green fluorescent proteins.

From the image results, it is evident that selecting T=25 minutes can effectively achieve the goal of reducing the synthesis rates of red fluorescent protein (RFP) and green fluorescent protein (GFP) to zero. This, in turn, initiates the next synthesis cycle.

Conclusion

Therefore, according to the analysis of the ODE equations, it can be inferred that when the ratio of time under light and time in the dark is 40:1 , a periodic pattern of synthesis of different types of fluorescent proteins can be observed. Protein synthesis cycle T = 1025 minutes.

Application

In fact, the model we have established has a much broader application. Apart from determining the light-dark cycles during the synthesis of fluorescent proteins, we can theoretically obtain the synthesis quantities of both red fluorescent protein (RFP) and green fluorescent protein (GFP) for any time period by integrating the rate equations. Thus, theoretically, we can achieve any desired ratio of the two fluorescent proteins by altering the light-dark cycle.

Reflection

The current prediction method still has issues. Each time we change the light-dark cycle, it requires re-solving the boundary conditions of the differential equations. Additionally, handling both light and dark conditions together increases the overall workload. Furthermore, theoretically, the model may not always align perfectly with experimental results. When new experimental data is obtained, it necessitates re-solving the differential equations and modifying the constants, which is not advantageous for practical applications. To address this, we further employ the ARIMA model to fit experimental data, allowing us to refine the model based on experimental results and cross-validate the rate-based differential equation model.

ARIMA (AutoRegressive Integrated Moving Average) is a statistical model used for time series analysis and forecasting. It is built by breaking down time series data into three main components, namely AutoRegressive (AR), Integrated (I), and Moving Average (MA).

Here is the mathematical expression and a brief introduction to the ARIMA model:

AutoRegressive (AR) Component

The AutoRegressive part of the ARIMA model is denoted as AR(p), where p is the order of AutoRegression. The AR(p) model represents a linear relationship between the current observation and the past p observations. Its mathematical expression is:

\( X_{t}=c+\emptyset_{1} X_{t-1}+\emptyset_{2} X_{t-2}+\ldots+\emptyset_{p} X_{t-p}+\varepsilon_{t}\)

Here, \( X_{t} \) is the current observation, \(c \) is a constant \(\emptyset_{1},\emptyset_{12}\ldots\emptyset_{p}\) are AutoRegressive coefficients, and \(\varepsilon_{t} \) is white noise error.

Integrated (I) Component

The Integrated part of the ARIMA model is denoted as I(d), where d is the order of differencing. Differencing is used to make the time series stationary, eliminating seasonality and trend. Its mathematical expression is:

\(Y_{t}=(1-B)^{d} X_{t}\)

Here, \(B\) is the differencing operator \(B Y_{t}=Y_{t-1}\), \(Y_{t}\) is the differenced time series, and \(X_{t}\) is the original time series.

Moving Average (MA) Component

The Moving Average part of the ARIMA model is denoted as MA(q), where q is the order of Moving Average. The MA(q) model represents a linear relationship between the current observation and the past q white noise errors. Its mathematical expression is:

\(X_{t}=\mu+\varepsilon_{t}-\theta_{1} \varepsilon_{t-1}-\theta_{1} \varepsilon_{t-1}-\ldots-\theta_{q} \varepsilon_{q-1}\)

Here, \(\mu\) is a constant, \(\theta_{1},\theta_{2}\ldots\theta_{q}\) are Moving Average coefficients, and \(\varepsilon_{t}\) is white noise error.

By combining the three steps together, we can finally establish a model to conduct time series prediction based on the data collected from the experiment.

Prediction results based on ARIMA model and real data from experiments. We can use this method to generate new data and compared it with that generated by ODE methods.

Figure 5. ARIMA time series forecasting curve.

Future work

Our model can be extended and generalized as follows:

1.Microorganism Selection

The choice of microorganisms can be expanded to include yeast, replacing Escherichia coli, to serve the subsequent tasks in the synthesis of fragrances within yeast.

2.Synthetic Pathways

We can introduce the Michaelis-Menten equation into the model to extend the rate model guiding the synthesis of fluorescent proteins to guide the synthesis of fragrances. This allows for the synthesis of different proportions of fragrances using light pulses of varying durations.

The model can be modified in this way:

\(\frac{d(K_{mRNA_{lpps\ or\ tps}})}{dt}=K'_{S_{Ind}}[L_I]-\delta'_{S_{Ind}}[K_{mRNA_{lpps\ or\ tps}}], \)

\(\ where\ L_{I}\ = \begin{cases} &1, when\ light\ is\ 'ON'\\ &0, when\ light\ is\ 'OFF'\\ \end{cases} \)

\(\frac{d(K_{mRNA_{sass\ or\ cyp \ or\ cpr}})}{dt}=K'_{S_{Ind}}[L_R] -\delta'_{S_{Ind}}[K_{mRNA_{sass\ or\ cyp \ or\ cpr}}], \)

\(\ where\ L_{R}\ = \begin{cases} &0, when\ light\ is\ 'ON'\\ &1, when\ light\ is\ 'OFF'\\ \end{cases} \)

\(\frac{d(mRNA_{lpps\ or\ tps})}{dt}=K_{mRNA_{lpps\ or\ tps}}+K_{basalmRNA_{lpps\ or\ tps}}-\delta_{mRNA_{lpps\ or\ tps}}[mRNA_{lpps\ or\ tps}] \)

\(\frac{d(mRNA_{sass\ or\ cyp \ or\ cpr})}{dt}=K_{mRNA_{sass\ or\ cyp \ or\ cpr}}+K_{basalmRNA_{sass\ or\ cyp \ or\ cpr}}-\delta_{mRNA_{sass\ or\ cyp \ or\ cpr}}[mRNA_{sass\ or\ cyp \ or\ cpr}] \)

\(\frac{d(lpps\ or\ tps)}{dt}=K_{lpps\ or\ tps}[mRNA_{lpps\ or\ tps}]-\delta_{lpps\ or\ tps}[lpps\ or\ tps] \)

\(\frac{d(sass\ or\ cyp \ or\ cpr)}{dt}=K_{sass\ or\ cyp \ or\ cpr}[mRNA_{sass\ or\ cyp \ or\ cpr}]-\delta_{sass\ or\ cyp \ or\ cpr}[sass\ or\ cyp \ or\ cpr] \)

\(\frac{d(CPP)}{dt}=\frac{K_{1}[tps][GGPP]}{[GGPP]+K_{m1}} \)

\(\frac{d(sclareol)}{dt}=\frac{K_{2}[lpps][CPP]}{[CPP]+K_{m2}} \)

\(\frac{d(santalene)}{dt}=\frac{K_{3}[sass][FPP]}{[FPP]+K_{m3}} \)

\(\frac{d(santalol)}{dt}=\frac{K_{4}[cyp\ or\ cpr][santalene]}{[cyp\ or\ cpr]+K_{m4}} \)

Equations (13) and (14) describe the enzymatic reactions during sclareol synthesis, where the synthesis rate depends on the concentrations of the substrate GGPP and the enzymes TPS and LPPS. Equations (15) and (16) describe the enzymatic reactions during santalol synthesis, where the synthesis rate depends on the concentrations of the substrate FPP and the enzymes SASS, CYP, and CPR. At this point, the aforementioned system of ODEs has become too complex, making it impossible to obtain the relevant parameters within these ODE equations through any fitting algorithm. It is also challenging to obtain analytical expressions for the yields of the two products over time. Therefore, we hope to establish the relationship between yield and time using another model that does not depend on process parameters.

3.Prediction Methods

ARIMA models based on time series characteristics can be effectively applied to practical tasks with different data sizes, enabling real-time improvement and refinement of model parameters based on real experimental data and cross-validation of ODE models. On the other hand, neural network models, SVM models, and similar approaches are more constrained by data volume and exhibit lower robustness.

Figure 6. Support Vector Machine (SVM) forecasting curve.

The diagram shows the effect of neural network. It is obvious that this method is not suitable in small scale of data.

Figure 7. Neural Network forecasting curve.

The diagram shows the effect of supportive vector machine. It is obvious that this method is also not suitable in this task.

4.Bidirectional Prediction

As experiments progress and we gather more data, we can adopt a neural network approach to fit the relationship between the synthesis of fragrances in different proportions and the corresponding light exposure frequency, light-dark cycle, and temperature conditions. This enables bidirectional prediction between fragrance blending ratios and synthesis conditions. We can add an inhibitory factor alpha to the model, which enables the model to be applied more effectively to yeast cells.

Reference

  1. [1] Jayaraman, Premkumar et al. “Blue light-mediated transcriptional activation and repression of gene expression in bacteria.” Nucleic acids research vol. 44,14 (2016): 6994-7005. doi:10.1093/nar/gkw548.
  2. [2] Xie, Jinger et al. “A comparative study examining the cytotoxicity of inducible gene expression system ligands in different cell types.” Toxicology in vitro : an international journal published in association with BIBRA vol. 22,1 (2008): 261-6. doi:10.1016/j.tiv.2007.08.019.
  3. [3] Lee, Tong Ihn et al. “Transcriptional regulatory networks in Saccharomyces cerevisiae.” Science (New York, N.Y.) vol. 298,5594 (2002): 799-804. doi:10.1126/science.1075090.
  4. [4] Zhang, Kai, and Bianxiao Cui. “Optogenetic control of intracellular signaling pathways.” Trends in biotechnology vol. 33,2 (2015): 92-100. doi:10.1016/j.tibtech.2014.11.007.
  5. [5] Harley, C B, and R P Reynolds. “Analysis of E. coli promoter sequences.” Nucleic acids research vol. 15,5 (1987): 2343-61. doi:10.1093/nar/15.5.2343.
  6. [6] Piccolo, Domenico. (2008). A distance measure for classifying ARIMA models. Journal of Time Series Analysis. 11. 153 - 164. 10.1111/j.1467-9892.1990.tb00048.x.
  7. [7] Schmidhuber, Juergen. “Deep Learning in Neural Networks: An Overview.” ArXiv.org, 30 Apr. 2014,arxiv.org/abs/1404.7828v4.
  8. [8] Stoianov, Ivilin, and Marco Zorzi. “Emergence of a 'visual number sense' in hierarchical generative models.” Nature neuroscience vol. 15,2 194-6. 8 Jan. 2012, doi:10.1038/nn.2996
  9. [9] Sun, G. Z., et al. “The Neural Network Pushdown Automaton: Model, Stack and Learning. Simulations.” ArXiv.org, 15 Nov. 2017, https://doi.org/10.48550/arXiv.1711.05738.
  10. [10] Kotioni,Christina.“Time Series Modeling of the Greek GDP.” www.academia.edu, www.academia.edu/12565534/Time_series_modeling_of_the_Greek_GDP.