Overview

1 Plant virus diffusion infection model

1.1 Wind Field Large Eddy Simulation (LES)

1.2 Turbulent diffusion model of plant viruses

1.3 Plant virus pattern dynamics

1.4 Plant virus infestation process

2 Movement of RNA in plant transduction

2.1 Physical Shapes and Entropic Elasticity of RNA in Fluids

2.2 Wiener process in vascular bundle

2.3 Transpiration tension in vascularity

3 Deep learning and central dogma

3.1 Replication and degradation dynamics

3.2 Feasibility analysis based on deep learning

4 Binding of antigens and antibodies

Reference

Overview

Rice blast occurs in major rice regions of the world, with the more severe rice areas concentrated in Asia and Africa. Rice blast can occur in all growth stages and parts of rice, the most serious of which are leaf blast and neck blast. In disease-endemic areas or diseased fields, about 10%~30% of rice yield is reduced or even extinct every year [1]

Major grasses (maize, wheat and rice) together account for about half of global caloric intake. Unfortunately, our global dependence on grasses has inadvertently promoted some of the most destructive fungal plant pathogens, such as Magnaporthe Oryza sativa, the pathogen of rice blast.

In order to resist the damage of viruses to plants, we designed saRNA-based plant vaccines, and modeled the processes of virus infection, vaccine transport and delivery, efficacy time, antigen-antibody matching, etc. The whole modeling adopts a cross-scale model, from the macroscopic infection process to the microscopic molecular transport, combined with the neural network model and finite element method, and compared with the experimental results, to provide explanations and feasibility demonstrations for biological phenomena. The entire model is divided into four parts:

Figure 1 workflow

(1) Plant virus diffusion infection model: the wind field large eddy simulation(LES) was carried out in the experimental area (Zhejiang University), which provided a basis for the diffusion coefficient in the plant virus turbulent diffusion model. Assuming the linear propagation of the initial instantaneous wavefront, the influence of temporal and spatial factors on virus propagation is investigated by analytical solution. For the nonlinear plant virus diffusion model, the spatiotemporal dynamics of infection were constructed according to pattern dynamics.

(2) Movement of RNA in plant transpiration: according to the random walk model and considering the influence of charge on the spatial configuration of RNA, a possible three-dimensional configuration is obtained. Since the random movement of the entire RNA molecule needs to be considered at its scale, with the concentration gradient-driven vaccine transport delivery, the vaccine transport delivery process under multiple inducing forces is simulated.

(3) Deep learning and central dogma: With the assistance of AlphaFold & deep learning and stochastic differential equations, the replication, translation and degradation processes of saRNA are constructed, the translation results of RNA are predicted, and the time of pharmacodynamic effects is analyzed according to the competitive model of RNA replication and degradation. In the process of training the model, white Gaussian noise and random restart are added to improve the robustness and stability of the model, and the generalization of translation result prediction is improved with the help of meta-learning.

(4) Binding of antigens and antibodies: according to the deep learning model, in the process of selecting the best antigen and antibody binding, the binding energy and chemical pairing relationship are synthesized, and finally the binding method with the highest score and the corresponding protein structure are selected. Finally, on the basis of the screening results, the binding probability is analyzed according to the coordination model to provide a feasibility demonstration for the vaccine effect.

1 Plant virus diffusion infection model

1.1 Wind Field Large Eddy Simulation (LES)

Meteorology is a huge factor affecting the whole process of plant growth, and there are various air currents around the world, as shown in the figure below [2]. Since the transmission mode of rice blast spores is mainly environmental transmission (such as airflow, etc.), it is necessary to simulate the airflow field eddy in the planting environment. The Yuquan Campus of Zhejiang University (longitude: 120.123077, latitude: 30.263842) where the ZJU-China team is located was used as an example to simulate the rice planting environment.

Figure 2 Meteorological simulation of Hangzhou area (2023-08-20-15:00)

Figure 3 Satellite imagery of Yuquan Campus of Zhejiang University

Figure 4 Geometric modeling

Because it is large-scale meteorological flow, then, large eddy simulation is used to directly calculate large-scale fluid pulsation, and small-scale vortex is filtered. For the fundamental Navier-Stokes tensor equation:

$$ \frac{\partial u_i}{\partial x_i} = 0 \tag{1} $$

$$ \frac{D u_i}{D t} = - \frac{1}{\rho}\frac{\partial p}{\partial x_i} + \nu \frac{\partial^2 u_i}{\partial x_j \partial x_j} + f_i \tag{2} $$

where $u_i$ represents the magnitude of velocity, $x_i$ represents the spatial position of the fluid, $ f_i $ represents the mass force, $\rho$ represents the density of the fluid, $p$ represents the pressure, $t$ represents time, and $\nu$ represents Kinematic viscosity.

The method of small-scale vortex filtration is to perform low-pass filter of the Navier-Stokes equation, which decomposes the fluid velocity into large-scale motion velocity (solvable scale pulsation) and small-scale pulsation velocity (sublattice-scale pulsation):

$$ u(\boldsymbol{x},t) = \tilde{u}(\boldsymbol{x},t) + u''(\boldsymbol{x},t) \tag{3} $$

where $u(\boldsymbol{x},t)$ represents the actual velocity, $\tilde{u}(\boldsymbol{x},t)$ represents the low-pass pulsation, and $u''(\boldsymbol{x},t)$ represents the residual pulsation. The integral filtering equation can be determined by the following expression:

$$ \tilde{u}(\boldsymbol{x},t) = \frac{1}{l^3} \int_{-\frac{l}{2}}^{\frac{l}{2}} \int_{-\frac{l}{2}}^{\frac{l}{2}} \int_{-\frac{l}{2}}^{\frac{l}{2}} u(\boldsymbol{\xi},t) G({\boldsymbol{x} - \boldsymbol{\xi}}) d\xi_1 d\xi_2 d\xi_3 \tag{4} $$

where $G(\boldsymbol{x} - \boldsymbol{\xi})$ is the filter function, and the integral filtering of the cube of length $l$ is performed, then the one-dimensional box integral filtering function can be expressed as:

$$ G(\eta) = \left\{ \begin{align*} & 1,\ \left| \eta \right| \lt \frac{l}{2} \\\\ & 0,\ \left| \eta \right| \gt \frac{l}{2} \end{align*} \right. \tag{5} $$

In addition to box filtering, there are also gaussian filters and other forms. Considering the uniform filtering process, in the physical space, the filtering operation and the partial conduction operation can be interchanged. Substitute the above form into the Navier-Stokes equation, we can derive the filtered Navier-Stokes equation:

$$ \frac{\partial \tilde{u}_i}{\partial x_i} = 0 \tag{6} $$

$$ \frac{\partial \tilde{u}_i}{\partial t} + \frac{\partial \tilde{u}_i \tilde{u}_j}{\partial x_j} = - \frac{1}{\rho} \frac{\partial \tilde{p}}{\partial x_i} + v \frac{\partial^2 \tilde{u}_i}{\partial x_j \partial x_j} + \tilde{f}_i + \frac{\partial \tau_{ij}}{\partial x_j} \tag{7} $$

where $\widetilde{(\cdot)}$ represents filtered variable, and $\tau_{ij} = \tilde{u}_i\tilde{u}_j - \widetilde{u_i u_j}$ is the sublattice stress. Sublattice stress is the momentum transport between filtered out small-scale pulsations and solvable-scale turbulence. To achieve large eddy simulation, it is necessary to construct a closure of sublattice stresses.

Assuming that the small-scale pulsations filtered out by the isotropic filter are locally equilibrium, that is, the energy transfer from the unsolvable scale pulsations of the solvable scale term is equal to the turbulent energy dissipation, the sublattice Reynolds stress mode in the vortex-viscous form is [3]:

$$ \tau_{ij} = \tilde{u}_i\tilde{u}_j - \widetilde{u_i u_j} = 2\nu_tS{ij} - \frac{1}{3}\ \tau_{kk} \delta_{ij} \tag{8} $$

where $\nu_t$ is the sublattice vortex viscosity coefficient, $S_{ij}$ is the strain rate tensor, and the figure below is the large eddy simulation results of the environment of Yuquan Campus of Zhejiang University in a defined physical space.

Figure 5 Large eddy simulation results

Assuming that the wind flows from the cold source to the hot environment, the West Lake is considered as the cold source, so the setting condition is that the air flows from east to west. From the simulation results, it can be seen that in the part where the mountain and architecture areas are not encountered, the air flow has relatively small vortex generation, while in the part with buildings and mountains, the air flow has a relatively large and unstable vortex state, and the streamline line is also deflected, so the planting environment here has the characteristics of turbulent atmosphere.

1.2 Turbulent diffusion model of plant viruses

Consider a large, homogeneous field, planted to the rice which is genetically uniform, and a single infection of the plant pathogen is introduced to the field. Consider further that the theory of turbulent diffusion governs the dispersal of windborne spores of most foliar pant pathogens which suggests that spread becomes more efficient as the area with diseased plants expands. This should result in epidemics with increasing frontal velocity (dispersive epidemic waves).

Under ideal circumstances, the direction from which the wind moves through the rice is random. Then, the fact leads to a focal epidemic whose front will expand radially in a wavelike manner from the site of initial infection [4].

Let $U(r,t)$ be the diseased rice tissue (e.g., the number of diseased leaflets per plant) at radial distance $r$ and time $t$. Based on the simulation of the airflow above (see Section 1.1 for details) with the theory of Richardson for turbulent atmospheric diffusion predicts that the diffusion coefficient is $K(r) \propto r^{\frac{4}{3}}$. And considering the disease intensifies follows the Vanderplank logistic equation with apparent infection rate $\gamma$, then the differential equation is:

$$ \frac{\partial U(r,t)}{\partial t} = \frac{1}{r} \frac{\partial}{\partial r}\left(K(r) r \frac{\partial U}{\partial r}\right) + \gamma U (1-U) \tag{9} $$

where $K(r) = C_1r^{\frac{4}{3}}$, $C_1$ is a constant, then:

$$ \frac{\partial U(r,t)}{\partial t} = \frac{C_1}{r} \frac{\partial}{\partial r}\left(r^{\frac{7}{3}} \frac{\partial U}{\partial r}\right) + \gamma U (1-U) \tag{10} $$

However, the equation (10) cannot be solved analytically. If we only focus on the front of the epidemic, the differential equation can be solved approximately by changing the logistic term $\gamma U (1-U)$ into linear term $\gamma U$:

$$ \frac{\partial U(r,t)}{\partial t} = \frac{C_1}{r} \frac{\partial}{\partial r}\left(r^{\frac{7}{3}} \frac{\partial U}{\partial r}\right) + \gamma U \tag{11} $$

solving the equation, together with the boundary condition $U(r,0) = \delta(r)$ and $U(\infty, t) \rightarrow 0$:

$$ U(r,t) = \frac{1}{6 \pi (\frac{4}{9} C_1 t)^3} \exp\left(\gamma t - \frac{9 r^{\frac{2}{3}}}{4 C_1 t}\right) $$

Figure 6 linear approximation (take $\gamma$ and $C_1$ as unit)

Table 1 Infection Rate vs. Time Series
Time Series	0	23	45	68	91	113	136	158	181
Infection Rate	0.00%	11.12%	23.18%	37.83%	47.23%	59.14%	69.54%	77.46%	80.58%

The table above shows the changes in the proportion of infections measured with the time series at a defined place $r$. According to the quasi-steady-state frontal approximation, the curves are fitted using the sets of data, and the Trust-Region Dogleg method is used to calculate the nonlinear system of equations with random initial iterations, and the constant values and iterative data are obtained in the following table (only convergent solutions are shown):

Table 2 Fitting constants
Times	$C_1$ origin	$\gamma$ origin	$r$ origin	$C_1$ converge	$\gamma$ converge	$r$ converge
1	0.221747	0.117418	0.296676	0.007803174	0.049709744	0.140243978
2	0.679728	0.136553	0.721227	0.007803174	0.049709744	0.140243978
3	0.08347	0.133171	0.173389	0.007803306	0.049710212	0.140244064
4	0.325146	0.105629	0.610959	0.007803174	0.049709744	0.140243978
5	0.695140	0.067993	0.254790	0.007803174	0.049709744	0.140243978
6	0.473486	0.152721	0.341125	0.007803304	0.049710294	0.140245163
7	0.318074	0.119215	0.939829	0.007803174	0.049709744	0.140243978
8	0.218677	0.105798	0.109697	0.007803174	0.049709744	0.140243978
9	0.192028	0.138874	0.696266	0.007803174	0.049709744	0.140243981
10	0.347713	0.149997	0.586092	0.007803174	0.049709744	0.140243978

It can be found that the convergent solutions are all the same convergent solution in the error range, other initial iteration points do not make the results converge or conform to practical requirements, and the larger the value of the random point (the more it deviates from the initial stability domain), the less it converges.

From the left graph, it can be seen that the data can match the curve only if it meets the quasi-static short time approximation, and the curve deviates after exceeding the quasi-static short time.

1.3 Plant virus pattern dynamics

In order to construct a mathematical model consisting of activator and inhibitor, assuming $V(r,t)$ as the dead or recovered rice tissue (i.e., that tissue cannot infect other plants) at radial distance $r$ and time $t$ , the equation is:

$$ \frac{\partial V(r,t)}{\partial t} = \frac{K_2}{r} \frac{\partial}{\partial r}\left(r\frac{\partial V}{\partial r}\right) + \tilde{G}(U,V) \tag{13} $$

where $K_2$ is the diffusion coefficient of the infection-free rice, $\tilde{G}(U,V)$ is the function controls interactions between nfected and uninfected plants. Considering a small area of the field, the diffusion coefficient can be regarded as a constant parameter $K_1$, then the equation is:

$$ \frac{\partial U(r,t)}{\partial t} = \frac{K_2}{r} \frac{\partial}{\partial r}\left(r\frac{\partial U}{\partial r}\right) + \tilde{F}(U,V) \tag{14} $$

Combine equations (13)(14) and take a polynomial function approximation to the function $\tilde{F}$ and $\tilde{G}$, assuming that the uninfected tissue will get infected by contacting with the other two infected tissue, then the form is consistent with pattern dynamics.

Figure 7 Pattern Dynamics

The whole process of the spread of M. Oryza sativa is shown in the figures above, the patterns start from small random position, and then spread to the whole area, and the speed of diffusion first increases and then decreases.

1.4 Plant virus infestation process

The rice blast is mainly in the form of conidia and hyphae under certain temperature and humidity conditions relying on airflow, through airborne transmission to plant leaves and other aerial parts, in the presence of water film and suitable environmental conditions, the viscosity of the conidia tip makes it adhered to the surface of aboveground tissues such as rice leaves. Then, under the premise of not needing external nutrients, the conidia only germinate through the endogenous nutrients of the spores to form bud tubes, and then the bud tubes specifically differentiate into attached cells with melanin. Then the attachment cells of melanin form infection plugs and penetrate the stratum corneum and superficial cell wall of some plant tissues above ground, invading epidermal cells or elongated cells. After invasion, rice blast bacteria grow in rice cells, and the hyphae rapidly expand and spread, and then infect adjacent epidermal cells and penetrate deep into mesophyll cells.

After the infection is completed, hyphae and conidia are produced in the infected cells, and are directly released into the air from the stomata or penetrating the plant epidermis, and new conidia are produced at the apex of the conidia peduncle, and mature conidia are released into the air, and again spread to the leaves of other rice by means of airflow propagation to cause secondary infection [5].

2 Movement of RNA in plant transduction

2.1 Physical Shapes and Entropic Elasticity of RNA in Fluids

Consider random walk of saRNA to describe its physical morphology in rice vascular. Assuming the polymer chains can be decentralized into plenty of rigid segments, denoted Kuhn length $a$. It is assumed that the physical morphology of saRNA is determined by end-to-end distance $R$, i.e., the distance that the end of the RNA segments leaves the initial end after random walk $N$ times. Using the averaging method, the expected value of $R$ is:

$$ \langle R \rangle = \langle \sum_{i=1}^N x_i \rangle \tag{15} $$

where $x_i = \pm a$ is the displacement of step $i$, then considering the variance of random walk:

$$ \langle R^2 \rangle = \langle \sum_{i=1}^N \sum_{j=1}^N x_i x_j \rangle \tag{16} $$

the expansion of equation (16) is:

$$ \langle R^2 \rangle = \sum_{i=1}^N \langle x^2_i \rangle \sum_{i \neq j=1}^N \langle x_i x_j \rangle \tag{17} $$

Due to the randomness of the walk, the walk at step $j$ is not correlated with the $i$ step, then the second term of equation (17) is 0, and the variance is determined entirely by the autocorrelation displacement:

$$ \langle R^2 \rangle = N a^2 \tag{18} $$

according to the results of one-dimensional random walk [6], it can be generalized to three-dimensional form:

$$ p(\boldsymbol{R};N) = \mathcal{N} e^{- \kappa \boldsymbol{R}^2} \tag{19} $$

the parameters $\mathcal{N}$ and $\kappa$ are determined by normalization and variance:

$$ \left\{ \begin{align*} \iiint_{-\infty}^{+\infty} p(\boldsymbol{R};N) d^3 \boldsymbol{R} &= 1 \\ \iiint_{-\infty}^{+\infty} R^2p(\boldsymbol{R};N) d^3 \boldsymbol{R} &= N a^2 \end{align*} \right. \tag{20} $$

the random walk chain distance distribution consisting of Kuhn length with $N$ step length $a$ is obtained:

$$ p(R;N) = \left( \frac{3}{2 \pi N a^2} \right)^{\frac{3}{2}} e^{- \frac{3R^2}{2Na^2}} \tag{21} $$

Figure 8 Gaussian Distribution of the Chain Length of RNA Molecules

From the results, it can be seen that ideally, the probability of conformation is the largest when $R$ is small (the probability of center position is the highest in the figure), that is, RNA molecular chains tend to form more uniform clusters, while fully elongated long chains are not easy to appear, and the simulation schematic diagram of the two cases based on molecules is as follows:

Figure 9 End-to-end distance of the chain

Appendix

The theory that RNA segments can be discretized into plenty of rigid segments is based on the theory of persistence length of RNA, assuming that within the persistence length, the molecular segments basically take the form of straight segments. The persistence length is the characteristic scale of the tangent vector correlation function attenuating along the chain segment:

$$ \langle \boldsymbol{t}(s) \cdot \boldsymbol{t}(u) \rangle = e^{-\frac{\left| s-u \right|}{\xi_p}} \tag{22} $$

where $\boldsymbol{t}(x)$ represents the tangent vector, $x$ represents the position along the segment in a chain segment of length $L$, and $\xi_p$ is the resident length, it should be noted that the exponential decay rate only applies to sufficiently small spacings $\left| s-u \right|$. By definition, the end-to-end distance is:

$$ R = \int_0^L \boldsymbol{t}(s)ds \tag{23} $$

then the end-to-end distance variance is:

$$ \langle R^2 \rangle = \langle \int_0^L \boldsymbol{t}(s)ds \cdot \int_0^L \boldsymbol{t}(u)du \rangle \approx 2 L \xi_p \tag{24} $$

It can be seen that the relationship between the Kuhn length $a$ and the resident length $\xi _p$ is $a = 2 \xi _ p$.

The physical morphology of RNA is not only affected by thermodynamics, but also by the electrostatic action of the spatial structure, as well as the electrical action of ions around the liquid during vascular transport (satisfying the Poisson-Boltzmann equation), considering the above influencing factors, the ball and stick model of a physical form of RNA is shown below [7]

Figure 10 RNA ball stick morphology

2.2 Wiener process in vascular bundle

Based on the investigation of its physical form in Section 2.1, if the saRNA molecule is regarded as a whole, it meets the requirements of the Wiener process for molecular scale of 0.1~10 microns. Consider RNA molecule with coordinates $\boldsymbol{q}$ and momentum $\boldsymbol{p}$ suspended in a liquid composed of solvent particles:

$$ H(\boldsymbol{p}, \boldsymbol{q}, \{\boldsymbol{p}_i, \boldsymbol{q}_i\}) = \frac{\boldsymbol{p}^2}{2M} + \sum_{i=1}^{N} \frac{\boldsymbol{p}^2_i}{2m_i} + \sum_{i=1}^{N}V_{FP}(\boldsymbol{q} - \boldsymbol{q}_i) + \sum_{1 \leq i \leq j \leq N} V_{FF}(\boldsymbol{q}_i - \boldsymbol{q}_j) + V_{ext}(\boldsymbol{q}) \tag{25} $$

where the $V_{FP}$ is the interaction of solvent particles and particles, the $V_{FF}$ is the interaction of solvent particles and solvent particles, and the $V_{ext}$ is the external field potential. Leaving aside the macroscopic flow of the fluid, assuming that there is a harmonic oscillator potential $V_{FP}(\boldsymbol{q} - \boldsymbol{q}_i) = \frac{1}{2}m_i\omega_i^2(\boldsymbol{q} - \boldsymbol{q}_i)^2$, and assuming that the interaction potential between solvent molecules is zero potential energy, and all solvent molecular masses are the same $m$, and $M \gg m$, then the Hamiltonian can be simplified to: $ H(\boldsymbol{p}, \boldsymbol{q}, \{\boldsymbol{p}_i, \boldsymbol{q}_i\})$ $=$ $\frac{\boldsymbol{p}^2}{2M} $ $+$ $\sum_{i}\frac{\boldsymbol{p}^2}{2m_i}$ $+$ $\frac{1}{2}\sum_i \omega_i^2(\boldsymbol{q} - \boldsymbol{q}_i)^2$ $+$ $V(\boldsymbol{q}) $

according to the momentum theorem and Hamiltonian equations:

$$ \frac{d^2 \boldsymbol{q}_i}{dt^2} = \frac{1}{m} \frac{d \boldsymbol{p}_i}{dt} = - \frac{1}{m} \frac{\partial H}{\partial \boldsymbol{q}_i} = - \omega_i^2 (\boldsymbol{q}_i - \boldsymbol{q}(t)) \tag{26} $$

From the Duhamel integral, it can be seen that the trajectory of the solvent molecule is:

$$ \boldsymbol{q}_i(t) = \boldsymbol{q}_i(0) cos(\omega_i t) + \frac{\boldsymbol{p}_i(0)}{m\omega_i}sin(\omega_it) + \omega_i \int_o^t \boldsymbol{q}(s)sin(\omega_i(t-s))ds \tag{27} $$

Integral by parts to equation (27):

$$ \boldsymbol{q}_i(t) = \boldsymbol{q}_i(0) cos(\omega_i t) + \frac{\boldsymbol{p}_i(0)}{m\omega_i}sin(\omega_it) + \boldsymbol{q}(t) - \boldsymbol{q}(0)cos(\omega_it) - \frac{1}{M} \int_0^t \boldsymbol{p}(s)cos(\omega_i(t-s))ds \tag{28} $$

Then the Hamiltonian equation for RNA clusters is:

$$ \frac{d \boldsymbol{q}}{dt} = \frac{\boldsymbol{p}}{M} \tag{29} $$

$$ \frac{d \boldsymbol{p}}{dt} = -\nabla V(\boldsymbol{q}) - \sum_{i=1}^{N}m_i\omega_i^2(\boldsymbol{q} - \boldsymbol{q}_i) \tag{30} $$

where $ (\boldsymbol{q} - \boldsymbol{q}_i) $ can be obtained by transposition from equation (28):

$$ \boldsymbol{q}_i(t) - \boldsymbol{q}(t) = \boldsymbol{q}_i(0)cos(\omega_it) + \frac{\boldsymbol{p}_i(0)}{m\omega_i} sin(\omega_it) - \boldsymbol{q}(0)cos(\omega_it) - \frac{1}{M}\int_0^t \boldsymbol{p}(s)cos(\omega_i(t-s))ds \tag{31} $$

let:

$$ K(t) = \sum \nolimits_i m\omega_i^2cos(\omega_it) \tag{32} $$

$$ \xi(t) = - \sum \nolimits_i m \omega_i^2(\boldsymbol{q}(0) - \boldsymbol{q}_i(0)) cos(\omega_it) + \sum \nolimits_i \boldsymbol{p}_i(0)\omega_isin(\omega_it) \tag{33} $$

Substitute the above equation into (29)(30), we can derive the equilibrium dynamical equation:

$$ \left \{ \begin{align*} \frac{d \boldsymbol{q}}{dt} & = \frac{\boldsymbol{p}}{M} \\ \frac{d \boldsymbol{p}}{dt} & = - \nabla V(\boldsymbol{q}) - \int_{0}^{t}\frac{\boldsymbol{p}(s)}{M}K(t-s)ds + \xi(t) \end{align*} \right. \tag{34} $$

It can be seen from the above kinetic equation that without considering the interaction between solvents, $K(t)$ reflects that the solvent molecules only play a buffering role, remembering the effect of RNA particles on the solution at the past moment and delaying this effect to the RNA particles; $\xi(t)$ is related to the initial states of the system, which are unknown and can be treated as random variables. Assuming that the solution is in equilibrium, $\xi(t)$ conforms to the Gaussian distribution, then there are:

$$ \langle \xi(t) \rangle = 0 \tag{35} $$

$$ \langle \xi_{\alpha}(t) \xi_{\beta}(t') \rangle = k_B T \delta_{\alpha \beta}K(t-t') \tag{36} $$

Taking $K(t) = 2 \mu \delta(t)$ gives the Langevin equation for RNA, where $\xi(t)$ is white Gaussian noise. Since the movement of RNA is a superposition of random motion and transpiration tension, if the influence of one of these factors is studied, it is necessary to characterize it with the assistance of another equivalent tracer particle, because the expression of fluorescent protein is related to RNA, and to a certain extent, the two are similar in scale, so fluorescent protein can be used as a "conjugated particle" to approximate the influence of RNA random motion factors. For solving the above stochastic differential equation, the following assumptions are met according to the experiment:

(1) Fluorescent label does not have a specific drift direction:

$$ \langle W_{t + t_0} - W_{t_0} \rangle = 0 \tag{37} $$

(2) The drift direction of the fluorescent label is independent of the previous time, and the different increments are independent of each other if $t_i \lt t_{i+1} \leq t_j \lt t_{j+1}$:

$$ \langle ( W_{t_{j+1}} - W_{t_j} ) ( W_{t_{i+1}} - W_{t_i} ) \rangle = 0 \tag{38} $$

(3) The mean displacement squared of the movement of the fluorescent label is proportional to time:

$$ \langle (W_{t + t_0} - W_{t_0})^2 \rangle \propto \left| t \right| \tag{39} $$

(4) The displacement increment of fluorescent label obeys the Gaussian distribution:

$$ W_{t + t_0} -W_{t_0} \sim \mathcal{N}(0, \sqrt{t}) \tag{40} $$

Since the mass of the fluorophore is small, the kinetic equation after ignoring the inertia term can be reduced to:

$$ \frac{d q(t)}{d t} = - \nabla V(\boldsymbol{q}) + \xi(t),\ \langle \xi(t) \rangle = 0,\ \langle \xi(t)\xi'(t) \rangle = \delta(t-t') \tag{41} $$

Solving the above equation, we can derive $q(t) = W_t$, i.e., the solution to the differential equation for the stochastic process is the Wiener process. For a stochastic $X_t$ process, if it has a Wiener process with an offset of $f(X_t, t)$ and a standard deviation $\sqrt{2D(X_t, t)}$, the corresponding Langevin equation is:

$$ dX_t = f(X_t,t)dt + \sqrt{2D(X_t,t)}dW_t \tag{42} $$

For the random process to take the value at the left end of the interval, the Ito- Langevin equation for fluorescence motion can be obtained by using the Ito integral:

$$ X_t = X_0 + \int_0^t f(X_t,t)dt + \lim_{\max{ \{t_{i+1}-t_i\} }\rightarrow 0} \sum \nolimits_i \sqrt{2D(X_t,t)}(W_{i+1} - W_i) \tag{43} $$

The trajectory can be obtained by computer simulation of the above process, assuming that the fluorescence starts from the black point to the red point terminate, and the trajectory color changes from the initial position white to the dark blue at the end point, which is consistent with the fluorescence diffusion of the experiment under statistical average, as shown in the figure below:

Figure 11 Timing Wiener process

2.3 Transpiration tension in vascularity

Rice transpiration can be modeled according to energy conservation and mass conservation, which links transpiration to leaf-to-air vapor pressure difference:

$$ \lambda E T = R_n - H_c \tag{44} $$

$$ \lambda E T = \frac{\rho C_p}{\gamma}g_tD_c \tag{45} $$

where $\lambda$ is the latent heat of vaporization, $ET$ is the evapotranspiration rate, $R_n$ is the net radiation intercepted by the crop, $H_c$ is the sensible heat exchanged between the canopy and the air, $\rho$ is the density of the air, $C_p$ is the specific heat of air, $\gamma$ is the psychrometric constant, $g_t$ is the total canopy conductance to water vapor transfer, $D_c$ is the canopy-to-air vapor pressure difference. Since some parameters are difficult to measure, the parameters need to be transformed or equivalented [8].

In the process of modeling plant conducting tissues, the transpiration of plants will cause the difference in water content in the ducts from roots to leaf fragments, resulting in inorganic salt concentration gradients, while the TLS segment of RNA can bind proteins to drive RNA movement in the ducts with a concentration gradient. Not only that, transpiration itself also causes water to flow from the roots to the leaves, and can also carry RNA throughout the body. Since the scale of the simulation is macroscopic, a cross-scale model needs to be established to simulate the average effect of the Wiener process in Section 2.2.

In the simulation, the chemical potential difference and osmotic pressure difference caused by the concentration gradient can be equivalent to the duct pipe pressure difference, and the overall equivalent structural pressure difference and the laminar flow of the pipe triggered by the initial flow rate can simulate the process of RNA passing through the plant conducting tissue to the whole body.

Figure 12 Small-scale approximation of straight-vascular bundle flow simulation

In the figure above, a small scale approximation part along the axis length of a plant vascular bundle is taken, and due to the small axis length of the selected segment, it can be approximately equivalent to a straight segment for a curved plant vascular. Considering the fluid pressure differential drive and Brownian motion caused by transpiration tension, the saRNA transport is relatively stable. And the actual plant vascular is with pipe curvature and there is a certain ring in the duct, ring and pipe bending will affect the flow state of fluid particles, making the actual flow more complex, as shown in the following two figures, vascular bending and the existence of ring lines will delay the flow speed, considering gravity, will further increase the resistance of the transpiration tension of the plant from the root to the leaf, so the flow state in the vascular tube is similar to laminar flow, the flow rate and streamline are relatively stable.

Figure 13 Curved vascular bundle flow velocity simulation

Figure 14 Flow vortex simulation of curved vascular bundles

Figure 15 Particle transport motion simulation

In the experimental results, taking the results of 720bp GFP mRNA and 3951bp RUBY mRNA as an example, three TLS and a control group were used in the experiment, and Actin was used as the control group in each collection, and the fluorescence intensity was measured with qPCR to reflect the mRNA concentration of different spatial positions of the plants on the fourth day.

(1) In the same spatial position, by filtering the flow direction data (that is, taking the Lagrange description coordinate system as the reference coordinate), a spatial position change map is made, and extracting different frames can verify that its motion is random in the relative coordinate system.

(2) With the increase of distance, the amount of mRNA decreases, which is the same as the finite element simulation result.

(3) With the increase of the bp number of mRNA, it is clear that the movement efficiency of the GFP mRNA group is higher than that of the RUBY mRNA group, that is, the larger physical configuration molecules as particle streams, their transport is slower, consistent with the finite element results, the figure below is the diffusion spatial location map and the relative mRNA magnitude.

Figure 16 Random motion in Lagrange description

Table 3 Experimental result
	0 cm			1 cm			2 cm			3 cm
Ara. $\Delta \Delta t$	0.95237	1.03555	1.01207	0.027021215	0.03111	0.02469	0.006631412	0.00713	0.00709	0.003939975	0.003794662	0.003794662
Ara. full	1.03142	0.89493	1.07365	0.502395045	0.50875	0.58817	0.1900898	0.14691	0.13312	0.16744408	0.128286766	0.281430415
Nico. full	0.99112	1.00888		0.354603498	0.36521	0.44602	0.066838305	0.05554	0.04482	0.065587886	0.08402915	0.07390646
None	1.04465	0.96651	0.98884	0.007697609	0.00743	0.00751	0.007601851	0.00888	0.00721	0.005060847	0.006411933	0.006501321

3 Deep learning and central dogma

3.1 Replication and degradation dynamics

Triggering the replication of saRNA requires the replicase to recognize a specific site, so the saRNA that is not bound to the replicase is regarded as an inactive state $I_s$ and the saRNA that binds to the replicase is regarded as an activated state $A_s$, then there are:

$$ I_{\text{saRNA}} \leftrightharpoons A_{\text{saRNA}} \tag{46} $$

The state function $p$ is used to digitize the active and inactive states $\varphi(I_{\text{saRNA}}) = 0; \varphi(A_{\text{saRNA}}) = 1$, and assume that the degradation products of saRNA are $\eta$, and the degradation products of proteins are $\xi$, $\alpha,\ \beta,\ \gamma,\ \omega,\ c_1,\ c_2$ are constants:

A ternary function is used to describe a random system, if $\lambda$ represents the amount of saRNA (including activated and inactivated, where the proportion of activated saRNA in the total is $\theta$), $\mu$ represents the amount of protein, and $\varphi$ is a state function variable, then the joint distribution rate is:

$$ \left\{ \begin{align*} & f_{\lambda \mu} = P[\text{saRNA} = \lambda, \text{protein} = \mu ; \varphi = 0] \\\\ & g_{\lambda \mu} = P[\text{saRNA} = \lambda, \text{protein} = \mu ; \varphi = 1] \end{align*} \right. \tag{47} $$

the master equation of the system over time is:

$$ \frac{df_{\lambda \mu}}{dt} = c_1 g_{\lambda \mu} - c_2 f_{\lambda \mu} + (1+\gamma)[(\lambda+1)f_{\lambda+1; \mu} - \lambda f_{\lambda \mu}] + \beta\lambda f_{\lambda ;\mu-1} +\omega(\mu+1)f_{\lambda; \mu+1}- (\beta \lambda +\omega \mu )f_{\lambda \mu} \tag{48} $$

$$ \frac{dg_{\lambda \mu}}{dt} =c_2f_{\lambda \mu} - c_1g_{\lambda \mu} +(1+\gamma)[(\lambda+1)g_{\lambda+1; \mu} -\lambda g_{\lambda \mu}] + \alpha \theta \varphi(A)g_{\lambda-1; \mu} - \alpha \theta \varphi(A)g_{\lambda \mu} + \beta \lambda g_{\lambda;\mu-1} + \omega(\mu + 1)g_{\lambda; \mu+1} - (\beta \lambda + \omega \mu)g_{\lambda \mu} \tag{49} $$

where $\lambda, \ \mu \in \mathbb{N}_0$, and the influence terms in the differential equation are all influence terms added considering feedback regulation. When there is a large number of genes and proteins, according to the Law of Large Numbers, the result will converge according to the probability, but this requires a lot of differential equations to describe the phenomenon, as the number of molecules increases, the number of differential equations required will exceed the solvable range, so considering practical applications, it is necessary to simplify the above master equation and describe the system with stochastic differential equations to make the problem solvable[9].

For the above phenomenon, the state of the gene is described by the random process $G(t)$:

$$ \frac{d\lambda}{dt} = - \lambda + H \cdot G(t) \tag{50} $$

$$ \frac{d \mu}{dt} = K \Lambda - \sigma \mu \tag{51} $$

The stable solution of the above equation is described by expectation as well as variance, as shown in the figure:

Figure 17 Relative mRNA level predicted of different days

Figure 18 Decline rate of mRNA

To better describe the phenomenon, different molecular behaviors can be modeled separately over time:

(1) Expression of plasmid: After the plasmid is injected into the plant, the expression amount will reach a peak after a certain period of time, after which it will be degraded, and it can be seen from the predicted results that the expression peaks on the 7th day of injection.

(2) Transcription and translation: Transcription and translation rely in part on the stochastic differential equation of (50)(51).

(3) Product degradation: After infection and vaccination, the previous product will be degraded during the recovery process of the plant, and the whole existence process lasts about 15 days from the prediction results.

3.2 Feasibility analysis based on deep learning

Artificial intelligence has been widely used in biological structure prediction, and the prediction of translated structure, especially the prediction of protein folding structure, has made great progress after the emergence of AlphaFold [10]. In this model, with the assistance of meta-learning, simple RNA sequence-based structure prediction is trained, and the feasibility of the project is analyzed through the results of structure prediction. Since AlphaFold does not solve the structural prediction caused by mutations, and mutations often occur in general biological experiments, the structural model based on mutation sequences is trained with the assistance of meta-learning, and the method of randomly adding restrictive white Gaussian noise is used in the generation of mutation raw samples, and the mutation sites are classified into different tasks in quantitative form, and the tasks are divided into Training Tasks and Testing Tasks in a ratio of 7:3. In each task, the original sample data is randomly produced by adding the method of restrictive white Gaussian noise for certain times, so that the data is more realistic, and the samples in each task are divided into Support Set and Query Set, according to which the Model Agnostic Meta Learning (MAML)[11] is used to train model parameters as the loss function to consider the value, and a random restart algorithm is added to avoid falling into local extremums in the optimization process. In the final protein structure prediction, this model uses the method of embedding AlphaFold to predict the results.

The criteria for evaluating the feasibility of the project are: after entering the genetic sequence in a certain plasmid measured in the experiment, the corresponding protein and its structure are predicted, and the experimental design is feasible if the final result (taking into account the bond angle, spacing, and amino acid type) matches the designed antibody with a high probability. By entering the sequence constructed in the experiment, the result feasibility is predicted.

Figure 19 Feasibility analysis process

4 Binding of antigens and antibodies

The selection process of antibodies can be with the assistance of deep learning models, in the process of selecting the best antigen and antibody binding, the binding energy and chemical pairing relationship are synthesized, and finally the binding method with the highest score and the corresponding protein structure are selected (see the Deep Learning Section for details), this chapter only discusses the binding probability of antigens and antibodies under the premise of selecting antibodies. Antigen-antibody binding is essentially a ligand receptor binding problem, consider the following chemical equation:

$$ A + B \leftrightharpoons AB \tag{52} $$

The extent to which the reaction proceeds can be described by the dissociation constant:

$$ K_d = \frac{[A][B]}{[AB]} \tag{53} $$

Since the probability of antigen-antibody binding is occupied by the receptor rate, the probability of the complex occupying the whole is:

$$ p = \frac{[A][B]}{[AB]} = \frac{\frac{[A]}{K_d}}{1 + \left( \frac{[A]}{K_d} \right)} \tag{54} $$

Considering the solution lattice model, i.e., $\Phi$ ligands are randomly distributed in $\Omega$ lattices, according to the Boltzmann distribution function:

$$ Z(\Phi, \Omega) = \sum e^{-\beta \Phi \varepsilon_{\text{sol}}} + e^{-\beta \varepsilon_b} \sum e^{-\beta(\Phi-1)\varepsilon_{\text{sol}}} \tag{55} $$

The probabilities can be deduced as [6]:

$$ p = \frac{e^{-\beta \varepsilon _b}\frac{\Omega^{\Phi-1}}{(\Phi-1)!} e^{-\beta(\Phi-1)\varepsilon_{\text{sol}}}} {\frac{\Omega ^ \Phi}{(\Phi)!} e ^{-\beta \Phi \varepsilon_{\text{sol}}} + e^{-\beta\varepsilon_b} \frac{\Omega^{\Phi-1}}{(\Phi-1)!} e^{-\beta(\Phi-1)\varepsilon_{\text{sol}}}} \tag{56} $$

where $\varepsilon_b$ is the ligand-receptor binding energy and $\varepsilon_{\text{sol}}$ is the energy of the ligand in solution, comparing (54)(56) two equations:

$$ K_d = \frac{1}{v} e^{\beta \Delta \varepsilon} \tag{57} $$

Taking $k_b = 1.38 \times 10^{-23} J/(K \cdot T)$, selecting the unit volume of $1nm^3$, $\varepsilon _b - \varepsilon_{\text{sol}} \approx -10 k_bT$ in the lattice model, it can be seen from the results that 95% binding rate can be achieved in a solution with a concentration of $10^1$ order of magnitude, and compared with different binding energy differences, the greater the binding energy difference, indicating that the antigen-antibody binding is more specific, the more stable the binding, and the lower the concentration required to achieve 95% binding rate. In this project, our antibodies have high specificity, and the reference concentration provided in the modeling can be selected to maximize antibody efficiency.

Figure 20 The probability varies with concentration

Reference

[1] HE Zhenrui, HUANG Xiaotong, SHU Canwei, ZHOU Erxun. Research Progress on Mycoviruses of Magnaporthe Oryza sativa[J]. Journal of Tropical Biology, 2021, 12(3): 385-392.

[2] earth :: a global map of wind, weather, and ocean conditions[EB/OL]. [2023-08-20]. https://earth.nullschool.net.

[3] Berselli L C. Mathematics of large eddy simulation of turbulent flows[M]. Berlin ; New York: Springer, 2006.

[4] Scherm H. On the velocity of epidemic waves in model plant disease epidemics[J]. Ecological Modelling, 1996, 87(1-3): 217-222.

[5] LE Mei-wang, CHEN Shiß, PAN Qing-hua, ZENG Ren-sen, LUO Shi-ming. A Review of Infecting Approaches of Rice Blast Fungus Magnaporthe grisea[J]. Acta Agriculturae Jiangxi, 2006, 18(2):101~105.

[6] Phillips R. Physical biology of the cell[M]. Second edition. London : New York, NY: Garland Science, 2013.

[7] Townshend R J L, Eismann S, Watkins A M, et al. Geometric deep learning of RNA structure[J]. Science, 2021, 373(6558): 1047-1051.

[8] Katsoulas N, Stanghellini C. Modelling Crop Transpiration in Greenhouses: Different Models for Different Applications[J]. Agronomy, 2019, 9(7): 392.

[9] Kærn M, Elston T C, Blake W J, et al. Stochasticity in gene expression: from theories to phenotypes[J]. Nature Reviews Genetics, 2005, 6(6): 451-464.

[10] Senior A W, Evans R, Jumper J, et al. Improved protein structure prediction using potentials from deep learning[J]. Nature, 2020, 577(7792): 706-710.

[11] Finn C, Abbeel P, Levine S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks[M]. arXiv, 2017.