Overview

The metabolic processes in living organisms inherently possess stochastic elements and intrinsic noise, rendering the attainment of a strictly analytical or precisely deterministic solution a formidable challenge in the realm of biomathematics. In response to this complexity, biologists and mathematicians frequently employ kinetics and statistics as powerful tools to establish mathematical models that encapsulate the dynamics of biochemical reactions ^[1]. In the context of elucidating the operational intricacies of the formaldehyde metabolism pathways and NCD synthesis within E. coli, and, more significantly, in comprehending the ramifications of alterations in the efficiency and activity of the enzymes governing these processes on the system's overall behaviour, we advocate for an in-depth, bottom-up approach. A kinetic modelling framework, grounded in the metabolic pathway, is essential for scrutinizing this dynamic system.

Biochemical Description and Analysis

To examine the NCD-dependent formaldehyde metabolic system, we adopt a biochemist's perspective. Metabolic pathways are conventionally represented as biochemical networks, and here we provide a structural-formula representation of the NCD-dependent formaldehyde pathway (Fig. 1). This representation lets us dissect the system step by step. Formaldehyde enters the cell along its concentration gradient and is then converted by our re-engineered metabolic pathway, so that the cell effectively functions as a formaldehyde-processing conduit, akin to a specialized pipeline.

Figure 1. The redesigned formaldehyde metabolic network.

Biochemical description

In a biochemical context, the progression of formaldehyde metabolism unfolds as follows:

Exogenous Formaldehyde Passive Diffusion: The initial stage entails the passive diffusion of formaldehyde (HCHO), a small hydrophilic molecule, across the cell membrane driven by concentration gradients. This diffusion process is reversible and is distinguished by subscripts ex and in denoting the formaldehyde's presence in distinct cellular compartments.

$\ce{Formaldehyde_{ex} <=> Formaldehyde_{in}}$

Oxidation of Cellular Formaldehyde: Following diffusion, formaldehyde is oxidized by formaldehyde dehydrogenase (FADH), which converts hazardous formaldehyde into formate. The non-natural cofactor NCD, an artificial analog of NAD, serves as the electron acceptor in this reaction, which can be represented as a reversible process.
$\ce{Formaldehyde + NCD+ + H2O <=>[FADH] Formate + NCDH + H+}$
A structural comparison between NAD (Nicotinamide Adenine Dinucleotide) and NCD (Nicotinamide Cytosine Dinucleotide) is depicted in Fig. 2, revealing the distinctions marked by cyan and tawny rectangles. While NCD and NAD share a common dinucleotide backbone covalently linked by pyrophosphate bonds, their different functional nitrogenous bases impart varying biochemical roles in vivo ^[2].

Figure 2. The structural difference between NAD and NCD.

Oxidation of Formate: Formate, the product of formaldehyde oxidation, undergoes further transformation into carbon dioxide, with this reaction also being dependent on NCD. This process involves an artificially mutated formate dehydrogenase (FDH*), distinguished by an asterisk, indicating differences from the wild-type FDH.
$\ce{Formate + NCD+ <=>[FDH^{*}] CO2 + NCDH + H+}$

Incorporating Downstream Products: To incorporate the downstream product of formaldehyde oxidation into existing natural metabolic pathways, carbon dioxide (CO2) is directed towards the tricarboxylic acid (TCA) cycle by a mutant malic enzyme (ME*). This conversion of CO2 into malate is known as the "Hydrocarboxylation of Pyruvate." According to the literature, this coupled reaction can be divided into two reactions involving oxaloacetate ^[2,3,4].
$\ce{Pyruvate + CO2 + NCDH + H+ ->[ME^*] Malate + NCD+}$
This complex reaction comprises two distinct sub-reactions:
a. Carboxylation of Pyruvate: In the first sub-reaction, pyruvate is carboxylated, yielding oxaloacetate intermediates. This step enables the immobilization of carbon dioxide.
$\ce{Pyruvate + CO2 ->[ME^*] Oxaloacetate}$
b. Reduction of Oxaloacetate Intermediate: In the second sub-reaction, the oxaloacetate intermediate is reduced, with the artificial hydride donor NCDH driving the reaction and regenerating the oxidized form of NCD. The primary product, malate, participates in various biochemical processes in vivo, including the TCA cycle. Significantly, this redesigned formaldehyde metabolic pathway presents an eco-friendly approach for malate production, reducing the reliance on hydrogen (H₂) as seen in conventional industrial processes ^[4].
$\ce{Oxaloacetate + NCDH + H+ ->[ME^*] Malate + NCD+}$

The mechanism of the hydrocarboxylation reaction

To elucidate the hydrocarboxylation reaction mechanism, we present a rational model based on structural analysis and a literature survey ^[2,3,4,5]. The process commences with pyruvate losing a proton under the influence of a Brønsted base, leading to the conjugation of the lone pair of the γ-carbon with the β-carbonyl group. By resonance, the resulting carbanion is equivalent to the enolate anion, which is better suited to nucleophilic attack. Subsequently, the enolate form of the pyruvate anion undergoes nucleophilic addition to carbon dioxide originating from the formaldehyde oxidation pathway, forming the oxaloacetate anion. This oxaloacetate anion is then reduced to the malate anion, with both steps facilitated by the enzymatic action of ME^*. Finally, the malate anion is protonated to give a stable species, completing the hydrocarboxylation reaction (Fig. 3).

Figure 3. The reaction mechanism of the pyruvate hydrocarboxylation.

Chemical Kinetic Model

Model establishment and analysis

Kinetic analysis is a time-honoured and effective approach for predicting and computationally simulating intricate biochemical networks. In our study, the reaction rates of the system are not treated as independent variables but are expressed as combinations of the concentrations of specific chemical species, e.g., formaldehyde, NCD, and CO₂. This allows us to simulate the dynamic behaviour of the newly redesigned metabolic network (Fig. 4).

Figure 4. The chemical kinetic pathway of formaldehyde metabolism.

For a reversible reaction with $s$ substrates at concentrations $[S]_1,[S]_2,...,[S]_i,...,[S]_s$ and $p$ products at concentrations $[P]_1,[P]_2,...,[P]_j,...,[P]_p$, we have

$\ce{S_1 + S_2 + ... + S_{\mathit{s}} <=>[\mathit{v_{+}}][\mathit{v_{-}}] P_1 + P_2 + ... + P_{\mathit{p}}}$

$v=v_{+}-v_{-}=k_{+}\prod_{i=1}^{s}[S]^{n_i}_i - k_{-}\prod_{j=1}^{p}[P]^{n_j}_j,$

where $n_i$ and $n_j$ are the reaction orders with respect to $S_i$ and $P_j$, respectively. We can describe the dynamics of the HCHO metabolic pathway by linear combinations of these rates, which yields the following set of ordinary differential equations, one per species.

$\frac{\mathrm{d\left[Formaldehyde\right]_{ex}}}{\mathrm{d}t}=-v^{(0)}_{+}+v^{(0)}_{-}$

$\frac{\mathrm{d\left[Formaldehyde\right]_{in}}}{\mathrm{d}t}=v^{(0)}_{+}-v^{(0)}_{-}-v^{(1)}_{+}+v^{(1)}_{-}$

$\frac{\mathrm{d\left[CO_{2}\right]}}{\mathrm{d}t}=v^{(2)}_{+}-v^{(2)}_{-}-v^{(3)}$

$\frac{\mathrm{d\left[NCD^{+}\right]}}{\mathrm{d}t}=-\frac{\mathrm{d\left[NCDH\right]}}{\mathrm{d}t}=-v^{(1)}_{+}+v^{(1)}_{-}-v^{(2)}_{+}+v^{(2)}_{-}+v^{(3)}$

$\frac{\mathrm{d\left[Malate\right]}}{\mathrm{d}t}=v^{(3)}-v^{(4)}+v^{(5)}$
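As a short consistency check on the balance equations above (a derivation added for illustration, assuming the cofactor is neither synthesized nor degraded on the time scale considered), the NCD equation implies that the total cofactor pool is conserved:

$\frac{\mathrm{d}}{\mathrm{d}t}\left(\left[\mathrm{NCD^{+}}\right]+\left[\mathrm{NCDH}\right]\right)=0 \quad\Rightarrow\quad \left[\mathrm{NCD^{+}}\right](t)+\left[\mathrm{NCDH}\right](t)=\left[\mathrm{NCD^{+}}\right](0)+\left[\mathrm{NCDH}\right](0)$

Under this assumption only one of the two cofactor concentrations needs to be integrated; the other follows from the initial total.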

Applying the law of mass action to the above system of ODEs, we have

$v^{(0)}_{+}=k^{(0)}_{+}[\mathrm{Formaldehyde_{ex}}]$

$v^{(0)}_{-}=k^{(0)}_{-}[\mathrm{Formaldehyde_{in}}]$

$v^{(1)}_{+}=k_1[\mathrm{Formaldehyde_{in}}][\mathrm{NCD}]$

$v^{(1)}_{-}=k_{-1}[\mathrm{Formate}][\mathrm{NCDH}]$

$v^{(2)}_{+}=k_2[\mathrm{Formate}][\mathrm{NCD}]$

$v^{(2)}_{-}=k_{-2}[\mathrm{CO_2}][\mathrm{NCDH}]$

$v^{(3)}=k_3[\mathrm{CO_2}][\mathrm{Pyruvate}][\mathrm{NCDH}],$

where $k^{(i)}_{+}$ and $k^{(i)}_{-}$ (written $k_i$ and $k_{-i}$ for $i \geq 1$) denote the rate constants of the forward and reverse reactions, respectively. To solve this system numerically, we use SimBiology, a MathWorks toolbox. SimBiology is commonly employed for dynamic modelling in pharmacokinetics, an area with significant parallels to metabolic networks. It is therefore well suited to producing numerical solutions for our analysis, as our subsequent model optimization confirms. This choice of tool both gives a comprehensive picture of the system and facilitates model refinement, contributing to the robustness and accuracy of our analysis (Fig. 5).

Figure 5. The construction of SimBiology simulation system.
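The mass-action system above can also be integrated outside SimBiology. The following is a minimal sketch in plain Python, not the SimBiology model itself: the rate constants and initial concentrations in `K` and `Y0` are hypothetical placeholders, the formate and pyruvate balances are assumptions derived from the reaction scheme (they are not listed explicitly above), and the $v^{(4)}$ and $v^{(5)}$ terms of the malate equation are omitted because their rate laws are not given. A fixed-step fourth-order Runge-Kutta scheme stands in for a production ODE solver.

```python
def reaction_rates(y, k):
    """Mass-action rates v0+/-, v1+/-, v2+/-, v3 as written in the text."""
    f_ex, f_in, formate, co2, ncd, ncdh, pyruvate, _malate = y
    return (
        k["k0f"] * f_ex,                   # v0+ : diffusion into the cell
        k["k0r"] * f_in,                   # v0- : diffusion out of the cell
        k["k1f"] * f_in * ncd,             # v1+ : formaldehyde oxidation
        k["k1r"] * formate * ncdh,         # v1-
        k["k2f"] * formate * ncd,          # v2+ : formate oxidation
        k["k2r"] * co2 * ncdh,             # v2-
        k["k3"] * co2 * pyruvate * ncdh,   # v3  : hydrocarboxylation
    )


def rhs(y, k):
    """Right-hand side of the ODE system, one entry per species."""
    v0f, v0r, v1f, v1r, v2f, v2r, v3 = reaction_rates(y, k)
    d_ncd = -v1f + v1r - v2f + v2r + v3
    return [
        -v0f + v0r,              # [Formaldehyde]_ex
        v0f - v0r - v1f + v1r,   # [Formaldehyde]_in
        v1f - v1r - v2f + v2r,   # [Formate]  (assumed balance)
        v2f - v2r - v3,          # [CO2]
        d_ncd,                   # [NCD+]
        -d_ncd,                  # [NCDH]
        -v3,                     # [Pyruvate] (assumed: consumed, not replenished)
        v3,                      # [Malate]   (v4 and v5 omitted)
    ]


def rk4_step(y, k, h):
    """Advance the state y by one classical Runge-Kutta step of size h."""
    def shifted(slope, scale):
        return [yi + scale * si for yi, si in zip(y, slope)]

    s1 = rhs(y, k)
    s2 = rhs(shifted(s1, h / 2), k)
    s3 = rhs(shifted(s2, h / 2), k)
    s4 = rhs(shifted(s3, h), k)
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, s1, s2, s3, s4)]


def simulate(y0, k, t_end, h=0.01):
    """Return the list of states from t = 0 to t = t_end."""
    y = list(y0)
    trajectory = [y]
    for _ in range(round(t_end / h)):
        y = rk4_step(y, k, h)
        trajectory.append(y)
    return trajectory


# Hypothetical parameters and initial concentrations (arbitrary units).
K = {"k0f": 1.0, "k0r": 0.5, "k1f": 2.0, "k1r": 0.1,
     "k2f": 1.5, "k2r": 0.1, "k3": 1.0}
SPECIES = ["F_ex", "F_in", "Formate", "CO2", "NCD+", "NCDH", "Pyruvate", "Malate"]
Y0 = [1.0, 0.0, 0.0, 0.0, 0.1, 0.0, 1.0, 0.0]

if __name__ == "__main__":
    final_state = simulate(Y0, K, t_end=10.0)[-1]
    for name, value in zip(SPECIES, final_state):
        print(f"{name:9s} {value:.4f}")
```

Because every reaction in this sketch moves one carbon unit and one cofactor molecule between pools, the sums [NCD+] + [NCDH] and F_ex + F_in + Formate + CO2 + Malate stay constant along the trajectory, which gives a quick sanity check on any reimplementation.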

Biochemical network simulation using SimBiology

Fig. 6 consists of five subgraphs (a)-(e) that show how the chemical species vary over time or with respect to other species. Fig. 6a shows the temporal evolution of the concentrations of the interacting species. The concentrations of environmental and cytoplasmic formaldehyde decrease progressively with time. In contrast, the other species, except for the constitutively expressed NCD, follow a peak-shaped pattern: an initial increase in concentration followed by a decrease. This trend reflects the balance between the production and consumption of each intermediate along the pathway.

Fig. 6b-e show phase diagrams, which illustrate how specific species vary with the concentrations of environmental formaldehyde, cytoplasmic formaldehyde, malate, and NCD. The phase diagrams give an overview of the relationships among these chemical components and help clarify how they interact.

Figure 6. The simulation results of formaldehyde metabolic network.

Local sensitivity analysis of the biochemical network

The sensitivities $S\left(\mathrm{d\left[Formaldehyde\right]_{ex}}, k\right)$ of a species with respect to the rate constants $k$ (e.g., $k^{(0)}_{+},k^{(0)}_{-}$) are time-dependent derivatives.

$\begin{aligned} S\left(\mathrm{d\left[Formaldehyde\right]_{ex}},k^{(0)}_{+}\right)&=\frac{\partial\mathrm{d\left[Formaldehyde\right]_{ex}}}{\partial k^{(0)}_{+}}\\ S\left(\mathrm{d\left[Formaldehyde\right]_{ex}},k^{(0)}_{-}\right)&=\frac{\partial\mathrm{d\left[Formaldehyde\right]_{ex}}}{\partial k^{(0)}_{-}}\\ &\;\;\vdots\\ S\left(\mathrm{d\left[Formaldehyde\right]_{ex}},k_{4}\right)&=\frac{\partial\mathrm{d\left[Formaldehyde\right]_{ex}}}{\partial k_{4}}, \end{aligned}$

where $\partial \mathrm{d\left[Formaldehyde\right]_{ex}}$ and $\partial k$ are the inputs to the sensitivity analysis. Local sensitivity analysis (LSA) was conducted using SimBiology, a platform that combines ordinary differential equation (ODE) solvers with complex-step approximation techniques. This approach allowed us to explore the system's sensitivity to parameter variations.
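To illustrate the complex-step idea on a case that can be checked by hand, the sketch below restricts the model to the diffusion step alone (an assumption made purely for illustration, with no downstream reactions and no internal formaldehyde at $t=0$), for which $[\mathrm{Formaldehyde}]_{ex}(t)$ has a closed form. The function names `fex_closed_form` and `complex_step` and all numerical values are hypothetical; this is not SimBiology's implementation.

```python
import cmath


def fex_closed_form(t, kf, kr, f0=1.0):
    """[Formaldehyde]_ex(t) for F_ex <=> F_in only, with F_in(0) = 0.

    Solves dF_ex/dt = -kf*F_ex + kr*(f0 - F_ex); cmath.exp is used so that
    the rate constants may carry a small imaginary perturbation.
    """
    s = kf + kr
    return f0 * (kr + kf * cmath.exp(-s * t)) / s


def complex_step(f, x, h=1e-20):
    """Complex-step derivative: f'(x) ~ Im(f(x + i*h)) / h."""
    return f(complex(x, h)).imag / h


# Hypothetical rate constants and observation time (arbitrary units).
KF, KR, T_OBS = 1.0, 0.5, 2.0

# Sensitivities of [Formaldehyde]_ex(T_OBS) to the two diffusion constants.
s_kf = complex_step(lambda k: fex_closed_form(T_OBS, k, KR), KF)
s_kr = complex_step(lambda k: fex_closed_form(T_OBS, KF, k), KR)

if __name__ == "__main__":
    print(f"d[F_ex]/d kf at t={T_OBS}: {s_kf:.6f}")
    print(f"d[F_ex]/d kr at t={T_OBS}: {s_kr:.6f}")
```

Unlike a finite difference, the complex-step estimate involves no subtraction of nearly equal numbers, so the step `h` can be made extremely small without loss of precision. In this reduced system the sensitivity to the forward constant is negative and the sensitivity to the reverse constant is positive, as expected for a reversible exchange.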

Combined with our previous analysis of the dynamic trends of the metabolic system, the LSA revealed crucial insights into the system's behaviour. Specifically, the rate constants kf_0 and kr_0 exert a significant influence on the concentration of environmental formaldehyde, underscoring the system's dependence on diffusion rates (Fig. 7a). Furthermore, cellular formaldehyde concentrations were highly sensitive to the parameters kf_0, kr_0, and kf_ncd, highlighting the pivotal role of in vivo expression of the non-natural cofactor NCD in regulating cellular formaldehyde levels, particularly by facilitating downstream oxidation (Fig. 7c). Notably, the cofactor NCD was most sensitive to the parameters kf_ncd, kf_0, kr_0 (partially), and kf_1, suggesting that optimizing endogenous NCD expression and introducing formaldehyde membrane transporters could enhance NCD's efficiency (Fig. 7d).

Fig. 7b shows the parameters involved in the LSA and the schematics of the external formaldehyde metabolic pathway. These LSA findings provide valuable insights for understanding and optimizing the system's behavior, offering opportunities for further research in various application domains.

Figure 7. The local sensitivity analysis of formaldehyde metabolic network.

In Fig. 8, we present the temporal evolution profiles of external formaldehyde, internal formaldehyde, and NCD under varying parameter conditions. To understand the system's dynamics, we scanned the parameters kf_0, kr_0, kf_1, and kf_ncd. Simulations run with different values of these kinetic constants led to discernible differences in the concentration dynamics of the species of interest. This variability is consistent with the local sensitivity results described above and confirms that these parameters strongly influence the system's behaviour and are natural targets for optimization.

Figure 8. Parametric scanning results for the species of interest
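A parameter scan of this kind can be sketched on a reduced model. The example below is an illustration under stated assumptions rather than the SimBiology scan: it keeps only diffusion and a pseudo-first-order oxidation step (NCD is treated as constant and absorbed into the hypothetical constant `k1_eff`), and the scanned values of `kf0` are arbitrary.

```python
def fex_at(t_end, kf0, kr0=0.5, k1_eff=1.0, h=0.001):
    """Integrate F_ex <=> F_in -> products with RK4; return F_ex(t_end).

    Initial state: F_ex = 1, F_in = 0 (arbitrary units).
    """
    def rhs(fex, fin):
        return (-kf0 * fex + kr0 * fin,
                kf0 * fex - (kr0 + k1_eff) * fin)

    fex, fin = 1.0, 0.0
    for _ in range(round(t_end / h)):
        a1, b1 = rhs(fex, fin)
        a2, b2 = rhs(fex + h / 2 * a1, fin + h / 2 * b1)
        a3, b3 = rhs(fex + h / 2 * a2, fin + h / 2 * b2)
        a4, b4 = rhs(fex + h * a3, fin + h * b3)
        fex += h / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
        fin += h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
    return fex


# Scan the inward diffusion constant over hypothetical values.
scan = {kf0: fex_at(2.0, kf0) for kf0 in (0.5, 1.0, 2.0)}

if __name__ == "__main__":
    for kf0, fex in scan.items():
        print(f"kf_0 = {kf0:.1f} -> F_ex(t=2) = {fex:.4f}")
```

In this reduced model a larger inward diffusion constant leaves less external formaldehyde at a fixed time, which mirrors the qualitative dependence on kf_0 reported above.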

Molecular Model

The aim of our project is to metabolize formaldehyde through the use of modified enzymes capable of utilizing the non-natural coenzyme NCD. Consequently, conducting molecular docking simulations with these pertinent enzymes is of paramount importance for our forthcoming wet-lab experiments. Furthermore, successful molecular docking serves as an additional validation of the effectiveness of our bioengineering efforts.

Method & Results

Receptor	Ligand
Ncds-2	CTP
Ncds-2	NMN

Figure 9. Molecular docking diagram of Ncds-2 and NMN. The NMN molecule engages in hydrogen bonding interactions with residues G9, T11, F12, H16, G107, D109, F177, and I179.

Figure 10. Molecular docking diagram of Ncds-2 and CTP. The CTP molecule engages in hydrogen bonding interactions with residues G9, I33, N40, R42, and S11.

Weighted RMSD

Given a set of $N$ points $A=\left\{a_{i}\right\}_{N}$ with associated weights $w=\left\{w_{i}\right\}_{N}$, and two ordered sets of $N$ points $A$ and $A^{\prime}$ with positions $\left\{a_{i}\right\}_{N}$ and $\left\{a_{i}^{\prime}\right\}_{N}$, which represent two different conformations of a molecule, we can establish the weighted RMSD between them ^[6].

$\operatorname{RMSD}\left(A, A^{\prime}\right)^{2}=\frac{1}{W} \sum_{i} w_{i}\left|\mathbf{a}_{i}-\mathbf{a}_{i}^{\prime}\right|^{2}$

where $W$ is the sum of the weights, $W=\sum_{i} w_{i}$ (if all the weights $\left\{w_{i}\right\}_{N}$ are equal to one, $W$ is the number of points, $W=N$).

We use AutoDock4 for molecular docking to simulate the affinity of the enzyme for various small molecules.
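The weighted RMSD defined above translates directly into code. The following sketch assumes the two conformations are given as equally long lists of 3D coordinates in matching atom order; the function name `weighted_rmsd` and the example coordinates are hypothetical.

```python
import math


def weighted_rmsd(a, a_prime, w=None):
    """Weighted RMSD between two conformations with matched atom order.

    a, a_prime : sequences of (x, y, z) coordinates of equal length.
    w          : optional per-atom weights; unit weights if omitted.
    """
    if w is None:
        w = [1.0] * len(a)
    if not (len(a) == len(a_prime) == len(w)):
        raise ValueError("coordinate and weight lists must have equal length")
    total_weight = sum(w)  # W in the formula above
    squared = sum(
        wi * sum((p - q) ** 2 for p, q in zip(ai, bi))
        for wi, ai, bi in zip(w, a, a_prime)
    )
    return math.sqrt(squared / total_weight)


if __name__ == "__main__":
    conf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    conf_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
    print(weighted_rmsd(conf_a, conf_b))              # unit weights
    print(weighted_rmsd(conf_a, conf_b, [3.0, 1.0]))  # first atom weighted more
```

With unit weights this reduces to the ordinary RMSD over $N$ atoms; mass weights or heavy-atom-only weights are common alternatives.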

Quaternion arithmetic

A quaternion $Q$ consists of a scalar part $\mathrm{s}$ and a vector part $\mathbf{q}=\left\{q_{x}, q_{y}, q_{z}\right\}^{T}$, so that $Q$ can be written as $[s, q]$. A rotation is represented by a unit quaternion $\hat{Q}$, which has a unit norm. Quaternion algebra includes various operations such as multiplication, division, inversion, and norm calculation. A brief overview of quaternion arithmetic can be found in Neveu et al. ^[7].

The rigid-body motion case

Following ^[7], let $a_{i}=\left\{x_{i}, y_{i}, z_{i}\right\}^{T}$ be the positions of the points in $A$, and let $A^{\prime}=\left\{\mathbf{a}_{i}^{\prime}\right\}_{N}$ be obtained by the rigid-body transformation $\mathbf{a}_{i}^{\prime}=\mathbf{R a}_{i}+\mathbf{T}$, where $\mathbf{R}$ is a $3 \times 3$ rotation matrix and $\mathbf{T}$ is a translation vector. If $Q=[s, q]$ is the unit quaternion corresponding to $\mathbf{R}$, the weighted RMSD between $A$ and $A^{\prime}$, the transformed positions, can be written according to the equation from "Rapid determination of RMSDs corresponding to macromolecular rigid body motions" as follows:

$\operatorname{RMSD}^{2}=\mathbf{T}^{2}+\frac{4}{W} \mathbf{q}^{T} \mathbf{I q}+2 \mathbf{T}^{T}\left(\mathbf{R}-\mathbf{E}_{3}\right) \mathbf{C}$

where $\mathbf{E_{3}}$ is the $3 \times 3$ identity matrix, $C$ is the center of mass, $C=\frac{1}{W}\left\{\sum w_{i} x_{i}, \sum w_{i} y_{i}, \sum w_{i} z_{i}\right\}^{T}$, and $\mathbf{I}$ denotes the inertia tensor,

$\mathbf{I}=\begin{pmatrix} \sum w_{i}(y_{i}^{2}+z_{i}^{2}) & -\sum w_{i} x_{i} y_{i} & -\sum w_{i} x_{i} z_{i} \\ -\sum w_{i} x_{i} y_{i} & \sum w_{i}(x_{i}^{2}+z_{i}^{2}) & -\sum w_{i} y_{i} z_{i} \\ -\sum w_{i} x_{i} z_{i} & -\sum w_{i} y_{i} z_{i} & \sum w_{i}(x_{i}^{2}+y_{i}^{2}) \end{pmatrix}$

The expression simplifies in a reference frame whose origin is the center of mass, where $C=0$. Thus, in this frame, the RMSD can be expressed with fewer arithmetic operations as

$\mathrm{RMSD}^{2}=\mathbf{T}_{\text{COM}}^{2}+\frac{4}{W}\mathbf{q}^{T} \mathbf{I}_{\text{COM}}\mathbf{q}$
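The rigid-body formula can be checked numerically against the direct definition. The sketch below is an illustration, not the RapidRMSD code: it builds a unit quaternion from an axis and angle, forms the rotation matrix and the inertia tensor, and compares $\mathbf{T}^{2}+\frac{4}{W}\mathbf{q}^{T}\mathbf{Iq}+2\mathbf{T}^{T}(\mathbf{R}-\mathbf{E}_{3})\mathbf{C}$ with the explicit sum over atoms. All function names, coordinates, weights, and the transformation are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def quaternion(axis, angle):
    """Unit quaternion [s, q] for a rotation by `angle` about `axis`."""
    norm = math.sqrt(dot(axis, axis))
    half = angle / 2
    return math.cos(half), [math.sin(half) * c / norm for c in axis]


def rotation_matrix(s, q):
    """Rotation matrix of the unit quaternion [s, q]."""
    x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - s * z), 2 * (x * z + s * y)],
        [2 * (x * y + s * z), 1 - 2 * (x * x + z * z), 2 * (y * z - s * x)],
        [2 * (x * z - s * y), 2 * (y * z + s * x), 1 - 2 * (x * x + y * y)],
    ]


def inertia_tensor(points, w):
    """I = sum_i w_i (|a_i|^2 E - a_i a_i^T), not divided by W."""
    tensor = [[0.0] * 3 for _ in range(3)]
    for wi, a in zip(w, points):
        r2 = dot(a, a)
        for r in range(3):
            for c in range(3):
                tensor[r][c] += wi * ((r2 if r == c else 0.0) - a[r] * a[c])
    return tensor


def rmsd2_direct(points, w, rot, trans):
    """(1/W) * sum_i w_i |R a_i + T - a_i|^2, evaluated atom by atom."""
    total = 0.0
    for wi, a in zip(w, points):
        ra = matvec(rot, a)
        total += wi * sum((ra[c] + trans[c] - a[c]) ** 2 for c in range(3))
    return total / sum(w)


def rmsd2_rigid(points, w, s, q, trans):
    """T^2 + (4/W) q^T I q + 2 T^T (R - E3) C."""
    big_w = sum(w)
    rot = rotation_matrix(s, q)
    com = [sum(wi * a[c] for wi, a in zip(w, points)) / big_w for c in range(3)]
    rc = matvec(rot, com)
    q_i_q = dot(q, matvec(inertia_tensor(points, w), q))
    coupling = dot(trans, [rc[c] - com[c] for c in range(3)])
    return dot(trans, trans) + 4.0 / big_w * q_i_q + 2.0 * coupling


# Hypothetical test molecule and rigid-body motion.
POINTS = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0], [1.0, 1.0, 1.0]]
WEIGHTS = [1.0, 2.0, 1.0, 0.5]
S, Q = quaternion([1.0, 2.0, 3.0], 0.7)
T = [0.3, -0.2, 0.5]

if __name__ == "__main__":
    print(rmsd2_direct(POINTS, WEIGHTS, rotation_matrix(S, Q), T))
    print(rmsd2_rigid(POINTS, WEIGHTS, S, Q, T))
```

Once the inertia tensor and the center of mass have been computed, `rmsd2_rigid` needs no further loop over atoms for a new rotation and translation, which is the source of the speed-up.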

RMSD for flexible molecules modelled with collective motions

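Following ^[7], let $\left\{f_{i}^{j}\right\}_{N}^{M}$ be a set of $M$ collective motions, where $\mathbf{f}_{i}^{j}=$ $\left\{f_{i x}^{j}, f_{i y}^{j}, f_{i z}^{j}\right\}^{T}$ is the displacement of atom $\mathrm{i}$ (out of $N$) in motion $\mathrm{j}$ (out of $M$). Given a set of amplitudes $\left\{\mu^{j}\right\}^{M}$, the flexible reference conformation $A^{F}=\left\{a_{i}^{F}\right\}_{N}$ has positions $\mathbf{a}_{i}^{F}=\mathbf{a}_{i}+\sum_{j=1}^{M} \mu^{j} \mathbf{f}_{i}^{j}$. A second conformation $A^{\prime F}$ is obtained with amplitudes $\left\{\lambda^{j}\right\}^{M}$ followed by a rotation $\mathbf{R}$ and a translation $\mathbf{T}$, so that its positions $\left\{\mathbf{a}_{i}^{\prime F}\right\}_{N}$ are $\mathbf{a}_{i}^{\prime F}=\mathbf{R}\left(\mathbf{a}_{i}+\sum_{j=1}^{M} \lambda^{j} \mathbf{f}_{i}^{j}\right)+\mathbf{T}$. The weighted RMSD between $\mathbf{A}^{F}$ and $\mathbf{A}^{\prime F}$ is given as follows: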

$\operatorname{RMSD}^{2}\left(A^{F}, A^{\prime F}\right)=\frac{1}{W} \sum_{i} w_{i}\left|\mathbf{a}_{i}+\sum_{j} \mu^{j} \mathbf{f}_{i}^{j}-\mathbf{R}\left(\mathbf{a}_{i}+\sum_{j} \lambda^{j} \mathbf{f}_{i}^{j}\right)-\mathbf{T}\right|^{2}$

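Using the quaternion representation of the vectors $\mathbf{a}_{i}, \mathbf{T}$ and of the rotation matrix $\mathbf{R}$, we can rewrite this equation as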

$\operatorname{RMSD}^{2}=\frac{1}{W} \sum_{i} w_{i}\left|\left[0, \mathbf{a}_{i}+\sum_{j} \mu^{j} \mathbf{f}_{i}^{j}\right]-\hat{Q}\left[0, \mathbf{a}_{i}+\sum_{j} \lambda^{j} \mathbf{f}_{i}^{j}\right] \hat{Q}^{-1}-[0, \mathbf{T}]\right|^{2}$

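where $\hat{Q}$ is the unit quaternion corresponding to the rotation $\mathbf{R}$. Because a unit quaternion has unit norm, we can multiply the expression inside the norm on the right by $\hat{Q}$ to obtain

$\operatorname{RMSD}^{2}=\frac{1}{W} \sum_{i} w_{i}\left|\left[0, \mathbf{a}_{i}+\sum_{j} \mu^{j} \mathbf{f}_{i}^{j}\right] \hat{Q}-\hat{Q}\left[0, \mathbf{a}_{i}+\sum_{j} \lambda^{j} \mathbf{f}_{i}^{j}\right]-[0, \mathbf{T}] \hat{Q}\right|^{2}$

Writing $\hat{Q}=[s, q]$ explicitly, we rewrite the previous RMSD expression as the squared norm of a quaternion with the following scalar and vector parts:

$\begin{aligned} \operatorname{RMSD}^{2} = \frac{1}{W} \sum_{i} w_{i} \Big| \Big[ & \mathbf{q} \cdot \Big(\mathbf{T} + \sum_{j} \lambda^{j} \mathbf{f}_{i}^{j} - \sum_{j} \mu^{j} \mathbf{f}_{i}^{j}\Big), \\ & - s \Big(\mathbf{T} + \sum_{j} \lambda^{j} \mathbf{f}_{i}^{j} - \sum_{j} \mu^{j} \mathbf{f}_{i}^{j}\Big) + \Big(2 \mathbf{a}_{i} - \mathbf{T} + \sum_{j} \mu^{j} \mathbf{f}_{i}^{j} + \sum_{j} \lambda^{j} \mathbf{f}_{i}^{j}\Big) \times \mathbf{q} \Big] \Big|^{2} \end{aligned}$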

Expanding the squared norm and expressing the result through the inertia tensor $\mathbf{I}$, the center of mass $\mathbf{C}$, and the rotation matrix $\mathbf{R}$, we obtain

$\begin{aligned} \operatorname{RMSD}^{2} = {} & \mathbf{T}^{2} + \frac{4}{W} \mathbf{q}^{T} \mathbf{I} \mathbf{q} + 2 \mathbf{T}^{T} (\mathbf{R} - \mathbf{E}_{3}) \mathbf{C} \\ & - 2 \sum_{j} \mu^{j} \mathbf{T}^{T} \mathbf{B}^{j} + \sum_{j k} \mu^{j} \mu^{k} \operatorname{Tr}(\mathbf{F}^{j k}) - 2 \sum_{j} \mu^{j} \operatorname{Tr}\left((\mathbf{R} - \mathbf{E}_{3})^{T} \mathbf{D}^{j}\right) \\ & + 2 \sum_{j} \lambda^{j} \mathbf{T}^{T} \mathbf{R} \mathbf{B}^{j} + \sum_{j k} \lambda^{j} \lambda^{k} \operatorname{Tr}(\mathbf{F}^{j k}) - 2 \sum_{j} \lambda^{j} \operatorname{Tr}\left((\mathbf{R} - \mathbf{E}_{3}) \mathbf{D}^{j}\right) \\ & - 2 \sum_{j} \sum_{k} \lambda^{j} \mu^{k} \operatorname{Tr}(\mathbf{R} \mathbf{F}^{j k}) \end{aligned}$

where $\operatorname{Tr}()$ denotes the trace of a matrix, $D^{j}$ are $M$ $3 \times 3$ matrices of cross-products,

$\mathbf{D}^{j} = \frac{1}{W} \begin{pmatrix} \sum w_{i} x_{i} f_{i x}^{j} & \sum w_{i} y_{i} f_{i x}^{j} & \sum w_{i} z_{i} f_{i x}^{j} \\ \sum w_{i} x_{i} f_{i y}^{j} & \sum w_{i} y_{i} f_{i y}^{j} & \sum w_{i} z_{i} f_{i y}^{j} \\ \sum w_{i} x_{i} f_{i z}^{j} & \sum w_{i} y_{i} f_{i z}^{j} & \sum w_{i} z_{i} f_{i z}^{j} \end{pmatrix}$

$F^{j k}$ are $M^{2}$ $3 \times 3$ matrices of cross-products,

$\mathbf{F}^{j k} = \frac{1}{W} \begin{pmatrix} \sum w_{i} f_{i x}^{k} f_{i x}^{j} & \sum w_{i} f_{i y}^{k} f_{i x}^{j} & \sum w_{i} f_{i z}^{k} f_{i x}^{j} \\ \sum w_{i} f_{i x}^{k} f_{i y}^{j} & \sum w_{i} f_{i y}^{k} f_{i y}^{j} & \sum w_{i} f_{i z}^{k} f_{i y}^{j} \\ \sum w_{i} f_{i x}^{k} f_{i z}^{j} & \sum w_{i} f_{i y}^{k} f_{i z}^{j} & \sum w_{i} f_{i z}^{k} f_{i z}^{j} \end{pmatrix}$

and $\mathbf{B}^{j}=\frac{1}{W}\left\{\sum w_{i} f_{i x}^{j}, \sum w_{i} f_{i y}^{j}, \sum w_{i} f_{i z}^{j}\right\}^{T}$ are $M$ vectors. In the center-of-mass reference frame, $\mathrm{C}=0$. If, in addition, the collective motions are orthonormal with respect to the weights, then $\operatorname{Tr}\left(\mathbf{F}^{j k}\right)=\delta_{j k} / W$. From now on, we will consider only this type of motions. Thus, in this case, the RMSD equation simplifies to

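$\begin{aligned} \operatorname{RMSD}^{2} = {} & \mathbf{T}^{2} + \frac{4}{W} \mathbf{q}^{T} \mathbf{I} \mathbf{q} + \frac{1}{W} \sum_{j} \left(\mu^{j 2} + \lambda^{j 2}\right) \\ & - 2 \sum_{j} \mu^{j} \mathbf{T}^{T} \mathbf{B}^{j} - 2 \sum_{j} \mu^{j} \operatorname{Tr}\left((\mathbf{R} - \mathbf{E}_{3})^{T} \mathbf{D}^{j}\right) \\ & + 2 \sum_{j} \lambda^{j} \mathbf{T}^{T} \mathbf{R} \mathbf{B}^{j} - 2 \sum_{j} \lambda^{j} \operatorname{Tr}\left((\mathbf{R} - \mathbf{E}_{3}) \mathbf{D}^{j}\right) \\ & - 2 \sum_{j k} \lambda^{j} \mu^{k} \operatorname{Tr}(\mathbf{R} \mathbf{F}^{j k}) \end{aligned}$

We refer to this expression as the master equation. Its first part, $\mathbf{T}^{2}+\frac{4}{W} \mathbf{q}^{T} \mathbf{I q}$, is the rigid-body contribution, the term $\frac{1}{W} \sum_{j}\left(\mu^{j 2}+\lambda^{j 2}\right)$ is the contribution of the flexible motions alone, and the remaining terms couple the two. Once the quantities $\mathbf{I},\mathbf{B}^{j},\mathbf{D}^{j}$ and $\mathbf{F}^{j k}$ that depend on the number of atoms and the number of collective motions are computed, the calculation time of RMSD is independent of the number of atoms and is at most quadratic with the number of collective motions. In the following, we will explicitly consider several special cases of simplified motions that also simplify the master equation and reduce its computational cost.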

RMSD corresponding to a rigid reference conformation

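If the reference conformation is rigid, all the amplitudes $\left\{\mu^{j}\right\}^{M}$ are zero, and the RMSD expression simplifies to the following:

$\operatorname{RMSD}^{2} = \mathbf{T}^{2} + \frac{4}{W} \mathbf{q}^{T} \mathbf{I} \mathbf{q} + \frac{1}{W} \sum_{j} \lambda^{j 2} + 2 \sum_{j} \lambda^{j} \mathbf{T}^{T} \mathbf{R} \mathbf{B}^{j} - 2 \sum_{j} \lambda^{j} \operatorname{Tr}\left((\mathbf{R} - \mathbf{E}_{3}) \mathbf{D}^{j}\right)$

Only single sums over $j$ remain, so the cost of evaluating this expression is linear in the number of collective motions $M$.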

RMSD corresponding to a pure flexible motion

When the study is limited to the flexible movements of a molecule, such as in the refinement of docking poses or the generation of pseudo-random structural ensembles, the master equation simplifies to

$\operatorname{RMSD}^{2}=\frac{1}{W} \sum_{j}\left(\mu^{j}-\lambda^{j}\right)^{2}$

Here the rotation $\mathbf{R}$ is the identity and the translation $\mathbf{T}$ is zero, so the cost is linear in the number of collective motions $M$.
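The pure flexible case is easy to verify on a toy system. The sketch below assumes unit weights (so $W=N$) and two hand-built collective motions that are orthonormal over all atoms; the molecule, the modes, and the amplitudes are hypothetical. It compares the atom-by-atom RMSD with the closed form $\frac{1}{W}\sum_{j}(\mu^{j}-\lambda^{j})^{2}$.

```python
import math


def deform(points, modes, amplitudes):
    """Return a_i + sum_j amplitude_j * f_i^j for every atom i."""
    return [
        [a[c] + sum(amp * mode[i][c] for amp, mode in zip(amplitudes, modes))
         for c in range(3)]
        for i, a in enumerate(points)
    ]


def rmsd2_direct(conf_a, conf_b):
    """Unit-weight squared RMSD computed atom by atom."""
    total = sum(
        sum((x - y) ** 2 for x, y in zip(a, b))
        for a, b in zip(conf_a, conf_b)
    )
    return total / len(conf_a)


def rmsd2_flexible(mu, lam, n_atoms):
    """Closed form (1/W) * sum_j (mu_j - lambda_j)^2 with W = n_atoms."""
    return sum((m - l) ** 2 for m, l in zip(mu, lam)) / n_atoms


INV_SQRT2 = 1.0 / math.sqrt(2.0)

# Two atoms and two modes; sum_i f_i^j . f_i^k = delta_jk by construction.
POINTS = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
MODES = [
    [[INV_SQRT2, 0.0, 0.0], [-INV_SQRT2, 0.0, 0.0]],  # stretch along x
    [[0.0, INV_SQRT2, 0.0], [0.0, INV_SQRT2, 0.0]],   # joint shift along y
]
MU = [0.4, -0.2]   # amplitudes of the first conformation
LAM = [0.1, 0.3]   # amplitudes of the second conformation

if __name__ == "__main__":
    direct = rmsd2_direct(deform(POINTS, MODES, MU), deform(POINTS, MODES, LAM))
    print(direct, rmsd2_flexible(MU, LAM, len(POINTS)))
```

The closed form never touches the coordinates, so its cost depends only on the number of modes, not on the number of atoms.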

RMSD corresponding to the relative rigid-body motion

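Consider a molecule $A=\left\{a_{i}\right\}_{N}$ and two of its conformations, $A_{1}=\left\{a_{i}^{(1)}\right\}_{N}$ and $A_{2}=\left\{a_{i}^{(2)}\right\}_{N}$. Let $\left\{\lambda^{j}\right\}^{M}$ be the amplitudes of the collective motions for $A_{1}$ and $\left\{\mu^{j}\right\}^{M}$ those for $A_{2}$. Let $R_{1}$ and $T_{1}$ be the rotation and translation that map $\mathrm{A}$ to $A_{1}$, and $R_{2}$ and $T_{2}$ those that map $\mathrm{A}$ to $A_{2}$. With $\left[s_{12}, q_{12}\right]$ the unit quaternion corresponding to the relative rotation $\mathbf{R}_{12} \equiv \mathbf{R}_{2}^{T} \mathbf{R}_{1}$, and with the relative translation $\mathbf{T}_{12} \equiv \mathbf{R}_{2}^{T}\left(\mathbf{T}_{1}-\mathbf{T}_{2}\right)$, the RMSD between $A_{1}$ and $A_{2}$ is given by a generalized version of the master equation as follows:

$\begin{aligned} \operatorname{RMSD}^{2} = {} & (\mathbf{T}_{2} - \mathbf{T}_{1})^{2} + \frac{4}{W} \mathbf{q}_{12}^{T} \mathbf{I} \mathbf{q}_{12} + \frac{1}{W} \sum_{j} \left(\mu^{j 2} + \lambda^{j 2}\right) \\ & - 2 \sum_{j} \mu^{j} \mathbf{T}_{12}^{T} \mathbf{B}^{j} - 2 \sum_{j} \mu^{j} \operatorname{Tr}\left((\mathbf{R}_{12} - \mathbf{E}_{3})^{T} \mathbf{D}^{j}\right) \\ & + 2 \sum_{j} \lambda^{j} \mathbf{T}_{12}^{T} \mathbf{R}_{12} \mathbf{B}^{j} - 2 \sum_{j} \lambda^{j} \operatorname{Tr}\left((\mathbf{R}_{12} - \mathbf{E}_{3}) \mathbf{D}^{j}\right) \\ & - 2 \sum_{j k} \lambda^{j} \mu^{k} \operatorname{Tr}(\mathbf{R}_{12} \mathbf{F}^{j k}) \end{aligned}$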

Computational analysis revealed that the root mean square deviation (RMSD) of the CTP molecule relative to the Ncds-2 protein is 2.68, computed over 43 matched atoms (43 to 43). In contrast, the RMSD of the NMN molecule relative to the Ncds-2 protein is 2.32, computed over 30 matched atoms (30 to 30).

Conclusion

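Together, the kinetic model and the molecular docking simulations characterize the formaldehyde metabolic pathway that depends on the non-natural cofactor $\mathrm{NCD}$, thereby providing valuable theoretical guidance for subsequent wet-lab experiments.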

References

[1] Raj, A., & van Oudenaarden, A. (2008). Nature, Nurture, or Chance: Stochastic Gene Expression and Its Consequences. Cell, 135(2), 216–226.

[2] Guo, X., Liu, Y., Wang, Q., Wang, X., Li, Q. X., Liu, W-J., & Zhao, Z. K. (2020). Non-natural Cofactor and Formate‐Driven Reductive Carboxylation of Pyruvate. Angewandte Chemie International Edition, 59(8), 3143–3146.

[3] Ji, D., Wang, L., Hou, S., Liu, W., Wang, J., Wang, Q., & Zhao, Z. K. (2011). Creation of Bioorthogonal Redox Systems Depending on Nicotinamide Flucytosine Dinucleotide. Journal of the American Chemical Society, 133(51), 20857–20862.

[4] Wang, J., Guo, X., Wan, L., Liu, Y., Xue, H., & Zhao, Z. K. (2022). Engineering Formaldehyde Dehydrogenase from Pseudomonas putida to Favor Nicotinamide Cytosine Dinucleotide. Chembiochem: A European Journal of Chemical Biology, 23(7), e202100697.

[5] Zelle, R. M., de Hulster, E., van Winden, W. A., de Waard, P., Dijkema, C., Winkler, A. A., Geertman, J.-M. A., van Dijken, J. P., Pronk, J. T., & van Maris, A. J. A. (2008). Malic Acid Production by Saccharomyces cerevisiae: Engineering of Pyruvate Carboxylation, Oxaloacetate Reduction, and Malate Export. Applied and Environmental Microbiology, 74(9), 2766–2777.

[6] Borda-Molina, D., Montaña, J. S., Zambrano, M. M., & Baena, S. (2017). Mining lipolytic enzymes in community DNA from high Andean soils using a targeted approach. Antonie Van Leeuwenhoek International Journal of General and Molecular Microbiology, 110(8), 1035–1051.

[7] Neveu, E., Popov, P., Hoffmann, A., Migliosi, A., Besseron, X., Danoy, G., Bouvry, P., & Grudinin, S. (2018). RapidRMSD: rapid determination of RMSDs corresponding to motions of flexible molecules. Bioinformatics, 34(16), 2757–2765.