The metabolic processes in living organisms inherently possess stochastic elements and intrinsic noise, rendering the attainment of a strictly analytical or precisely deterministic solution a formidable challenge in the realm of biomathematics. In response to this complexity, biologists and mathematicians frequently employ kinetics and statistics as powerful tools to establish mathematical models that encapsulate the dynamics of biochemical reactions [1]. In the context of elucidating the operational intricacies of the formaldehyde metabolism pathways and NCD synthesis within E. coli, and, more significantly, in comprehending the ramifications of alterations in the efficiency and activity of the enzymes governing these processes on the system's overall behaviour, we advocate for an in-depth, bottom-up approach. A kinetic modelling framework, grounded in the metabolic pathway, is essential for scrutinizing this dynamic system.

Biochemical Description and Analysis

In the examination of the NCD-dependent formaldehyde metabolic system, it is imperative to adopt a biochemist's perspective. Complex biochemical networks are conventionally employed to represent intricate metabolic pathways, and herein, we provide a structural formula-based representation of the NCD-dependent formaldehyde pathway (Fig. 1). This figurative framework enables us to dissect the system's progression systematically. The influx of formaldehyde into the cellular matrix is driven by concentration gradients, and, guided by our reengineered metabolic pathway, it undergoes a transformation within the cell, effectively functioning as a formaldehyde processing conduit, akin to a specialized pipeline.

Figure 1. The Formaldehyde redisgn metabolic networks.

Biochemical description

In a biochemical context, the progression of formaldehyde metabolism unfolds as follows:

  1. Exogenous Formaldehyde Passive Diffusion: The initial stage entails the passive diffusion of formaldehyde (HCHO), a small hydrophilic molecule, across the cell membrane driven by concentration gradients. This diffusion process is reversible and is distinguished by subscripts ex and in denoting the formaldehyde's presence in distinct cellular compartments.


  1. Oxidation of Cellular Formaldehyde: Following diffusion, formaldehyde encounters oxidation catalyzed by formaldehyde dehydrogenase (FADH). This catalytic reaction results in the conversion of hazardous formaldehyde into formate. Notably, the non-natural cofactor NCD, an artificial analog of NAD, plays a pivotal role as an electroreceptor in facilitating this reaction. FADH catalyzes this reaction, which can be represented as a reversible process.


    A structural comparison between NAD (Nicotinamide Adenine Dinucleotide) and NCD (Nicotinamide Cytosine Dinucleotide) is depicted in Fig. 2, revealing the distinctions marked by cyan and tawny rectangles. While NCD and NAD share a common dinucleotide backbone covalently linked by pyrophosphate bonds, their different functional nitrogenous bases impart varying biochemical roles in vivo [2].

Figure 2. The structure difference between NAD and NCD.

  1. Oxidation of Formate: Formate, the product of formaldehyde oxidation, undergoes further transformation into carbon dioxide, with this reaction also being dependent on NCD. This process involves an artificially mutated formate dehydrogenase (FDH*), distinguished by an asterisk, indicating differences from the wild-type FDH.


  1. Incorporating Downstream Products: The challenge of incorporating the downstream product of formaldehyde into the existing natural metabolic pathways is addressed by guiding carbon dioxide (CO2) towards the Tricarboxylic Acid Cycle (TCA) through a series of steps facilitated by a mutant-type malic enzyme (ME*). By bibliographic retrieval, we find that this coupled reaction can be divided into two oxaloacetate-involved reactions [2,3,4]. This transformation of CO2 into malate is achieved through a progression known as the "Hydrocarboxylation of Pyruvate."


    This complex reaction comprises two distinct sub-reactions:

    a. Carboxylation of Pyruvate: In the first sub-reaction, pyruvate is carboxylated, yielding oxaloacetate intermediates. This step enables the immobilization of carbon dioxide.


    b. Reduction of Oxaloacetate Intermediate: In the second sub-reaction, the oxaloacetate intermediate is reduced, with the artificial proton donor NCDH facilitating the reaction and regenerating the oxidized form of NCD. The primary product, malate, participates in various biochemical processes in vivo, including the TCA cycle. Significantly, this redesigned formaldehyde metabolic pathway presents an eco-friendly approach for malate production, reducing the reliance on hydrogen (H2) as seen in conventional industrial processes [4].


The mechanism of hydrocarboxlyation reaction

In the elucidation of the hydrocarboxylation reaction mechanism, we present a rational model based on a combination of structural analysis and comprehensive bibliographic research [2,3,4,5]. The process commences with pyruvate expelling a proton under the influence of a Brønsted base, leading to the conjugation of the lone pair of the γ-carbon with the β-carbonyl group. Employing resonance structures, the enolate anion isomer arises from the carbonyl anion, offering enhanced suitability for nucleophilic attacks. Subsequently, the enolate form of the pyruvate anion participates in a nucleophilic addition with carbon dioxide originating from the formaldehyde oxidation pathway, resulting in the formation of the oxaloacetate anion. This oxaloacetate anion then undergoes further reduction to yield the malate anion, all facilitated by the enzymatic action of ME*. Finally, the malate anion is protonated to attain a stable species, culminating in the hydrocarboxylation reaction (Fig. 3).

Figure 3. The reaction mechanism of the pyruvate hydrocarboxylation.

Chemical Kinetic Model

Model establishment and analysis

Kinetic analysis stands as a time-honoured and highly effective approach for the anticipation and computational simulation of intricate biochemical networks. In the context of our study, the rate variables within this system, e.g., formaldehyde, NCD, CO2, are no longer represented directly but have instead been replaced by a combination of concentrations for specific chemical species. This transformation allows us to embark on the crucial task of simulating the dynamic behaviour of the newly redesigned metabolic network (Fig. 4).

Figure 4. The chemical kinetic pathway of formaldehyde metabolism.

For a general reaction, s substrates with concentration [S]1,[S]2,[S]i,...,[S]s are converted into p products with concentration [P]1,[P]2,[P]j,...,[P]p, we have



where ni and nj define molecularities of substrate Si and product Pj respectively. We can describe the dynamics of the HCHO metabolic pathway by the linear combination of rates, here is a set of ordinary differential equations assigned to particular species as follows.








Assign law of mass action to the above system of ODE, we have








where k+(i) and k(i) denote the reaction rate constant of the forward reaction and reverse reaction respectively. To accomplish this, we capitalize on the capabilities of the SimBiology tool from the MathWorks toolbox. Notably, SimBiology is a tool commonly employed in the dynamic modelling of pharmacokinetics, an area that exhibits significant parallels with the characteristics of metabolic networks. Consequently, it becomes evident that SimBiology can yield precise and numerical outcomes in our analytical pursuits, as is further substantiated in our subsequent model optimization endeavours. This choice of methodology not only affords a comprehensive understanding of the system but also facilitates model refinement, contributing to the robustness and accuracy of our analysis (Fig. 5).

Figure 5. The construction of SimBiology simulation system.

Biochemical network simulation using SimBiology

Fig. 6 consists of five subgraphs (a)-(e) that collectively elucidate the dynamics of various chemical species over time or other species. This metabolic system behaves in the temporal evolution of concentrations for multiple interacting species (Fig. 6a), and a notable trend emerges, where the concentration of environmental/cytoplasmic formaldehyde decreases progressively with time. This decline is juxtaposed with the behaviour of other species, which exhibit a distinctive peak-shaped pattern, except for the constitutive expressed NCD. These species show an initial increase in concentration, followed by a subsequent decrease. This unique trend underscores the intricate and dynamic nature of the chemical interactions within the system, revealing a delicate balance of processes that drive these changes over time.

In contrast, Fig. 6b-e delves into the phase diagrams, illustrating the interdependencies and fluctuations of specific species concerning the concentrations of environmental formaldehyde, cytoplasmic formaldehyde, malate, and NCD. The phase graphs offer a comprehensive overview of the relationships and trends among these chemical components, facilitating a deeper understanding of their interactions and potentially uncovering critical insights into the system under study.

Figure 6. The simulation results of formaldehyde metabolic network.

Local sensitivity analysis of the biochemical network

The time-dependent sensitivities of environmental formaldehyde S(d[Formaldehyde]ex,k) with respect to each parameter (e.g. k+(0),k(0)) value are the time-dependent derivatives.


where the numerator d[Formaldehyde]ex is the output and the denominators k are the inputs to sensitivity analysis. local sensitivity analysis (LSA) was conducted using SimBiology, a platform that combines ordinary differential equation (ODE) solvers with complex-step approximation techniques. This approach allowed us to explore the system's sensitivity to parameter variations.

In conjunction with our previous analysis of dynamic trends of the metabolism system, we performed a local sensitivity analysis using SimBiology. The results revealed crucial insights into the system's behaviour. Specifically, we observed that the rate constants kf_0 and kr_0 exert a significant influence on the concentration of environmental formaldehyde, underscoring the system's dependency on diffusion progress rates (Fig. 7a). Furthermore, cellular formaldehyde concentrations exhibited high sensitivity to parameters kf_0, kr_0, and kf_ncd, highlighting the pivotal role of in vivo expression of the non-natural cofactor NCD in regulating cellular formaldehyde levels, particularly by facilitating downstream oxidation (Fig. 7c). Notably, the versatile cofactor NCD displayed heightened sensitivity to parameters kf_ncd, kf_0, kr_0 (partially), and kf_1, suggesting optimization prospects for endogenous NCD expression and the potential introduction of formaldehyde membrane transporters to enhance NCD's efficiency (Fig. 7d).

Fig. 7b shows the parameters involved in the LSA and the schematics of the external formaldehyde metabolic pathway. These LSA findings provide valuable insights for understanding and optimizing the system's behavior, offering opportunities for further research in various application domains.

Figure 7. The local sensitivity analysis of formaldehyde metabolic network.

In Fig. 8, we present the temporal evolution profiles of external formaldehyde, internal formaldehyde, and NCD under varying parameter conditions. To comprehensively understand the system's dynamics, we executed parameter iterations for kf_0, kr_0, kf_1, and kf_ncd. Remarkably, these iterations, performed under distinct initial conditions characterized by varying kinetic constants, led to discernable distinctions in the concentration dynamics of the species of interest. This observed variability in behaviour aligns seamlessly with the outcomes of our prior investigation into local sensitivity, as described earlier. These findings underscore the significant influence of parameter variations on the system's behaviour, offering valuable insights into its intricate dynamics and the potential for optimization.

Figure 8. Parametric scanning results for the species of interest

Molecular Model

The aim of our project is to metabolize formaldehyde through the use of modified enzymes capable of utilizing the non-natural coenzyme NCD. Consequently, conducting molecular docking simulations with these pertinent enzymes is of paramount importance for our forthcoming wet-lab experiments. Furthermore, successful molecular docking serves as an additional validation of the effectiveness of our bioengineering efforts.

Method & Results

Figure 9. Molecular docking diagram of Ncds-2 and NMN. The NMN molecule engages in hydrogen bonding interactions with residues G9, T11, F12, H16, G107, D109, F177, and I179.
Figure 10. Molecular docking diagram of Ncds-2 and CTP. The CTP molecule engages in hydrogen bonding interactions with residues G9, I33, N40, R42, and S11.

Weighted RMSD

Consider a molecule characterized by N atoms located at positions A={ai}N , with coordinates w={wi}N . When presented with two groups of N points, A and A , with respective coordinates {ai}N and {ai}N , which represent two different conformations of a molecule, we can establish the weighted RMSD between them [6] .


The term W is defined as the sum of the weights, i.e., W=iwi . The statistical weights {wi}N can be used to highlight the significance of a specific part of the structure. For instance, in the case of a protein, this could be the backbone or the side chains. More often, these weights are equivalent to the atomic masses (in this scenario, W is equal to the total mass of the molecule) or they may be set to 1 (in which case, W=N ).

We use Autodock4 for molecular docking to simulate the affinity of the enzyme for various small molecules.

Quaternion arithmetic

A quaternion, denoted as Q, can be expressed as a combination of a scalar component, s, and a three-dimensional vector q={qx,qy,qz}T, Thus, Q can be written as [s,q]. This representation is particularly useful for describing spatial transformations, especially rotations. For instance, a rotation operation can be represented by a rotation quaternion, denoted as Q^, which has a unit norm. Quaternion algebra includes various operations such as multiplication, division, inversion, and norm calculation. A brief overview of quaternion arithmetic can be found in our previous work [7].

The rigid-body motion case

Here, we recapitulate the main outcome of our prior work [7]. We presume that a molecule A with coordinates ai={xi,yi,zi}T is translated and rotated to new positions A={ai}N, which are given as ai=Rai+T. Here, R is the 3×3 rotation matrix and T is the translation 3 -vector. It is convenient to use the quaternion representation Q=[s,q] of the rotation matrix R. Then, the weighted RMSD between A, the original positions, and A, the transformed positions, can be written according to equation (Available in a centroid reference frame of C=0, Less arithmetic expressions) from Rapid determination of RMSDs corresponding to macromolecular rigid body motions as follows:


With E3 being the 3×3 identity matrix, C represents the centre of mass 1W{wixi,wiyi,wizi}T, and I denotes the inertia tensor.


We should mention that it is practical to work in the center-of-mass reference frame where C=0. Thus, in this frame, the RMSD can be expressed with fewer arithmetic operations as


RMSD for flexible molecules modelled with collective motions

We are expanding on our previous work [7] by incorporating molecular flexibility through linear collective motions. These can be computed using techniques such as Normal Mode Analysis (NMA) or Principal Component Analysis (PCA). More specifically, let {fij}NM be a set of M vectors that describe the linear collective motions applied to a molecule, where fij= {fixj,fiyj,fizj}T . Here, i is the atom index ranging from 1 to N , and j is the index of the collective motions ranging from 1 to M . Let {μj}M be the amplitudes of the collective motions for the reference conformation of the molecule. Then the reference flexible coordinates AF={aiF}N are given as aiF=ai+j=1Mμjfij . Now, to compute the flexible coordinates of the target conformation AF , we first add flexible displacements to the rigid coordinates of the reference conformation and then apply the rigid-body transformation to the result. Let {λj}M be the amplitudes of the collective motions for the target conformation of the molecule, where the collective motion vectors are the same as in the reference conformation. Similarly to the rigid-body case, let R and T be the rotation matrix and the translation vector of the target rigid-body transformation, respectively, so that the flexible target coordinates {aiF}N are given as aiF=R(ai+j=1Mλjfij)+T . Then, the weighted RMSD between positions AF and AF is given as follows:


We can rewrite the previous expression using the quaternion representation of vectors ai,T , and the rotation matrix R as


Here, the unit quaternion Q^ corresponds to the rotation matrix R . Since the norm of a quaternion does not change if we multiply it by a unit quaternion, we may right-multiply the kernel of the previous expression by Q^ to obtain


Using the scalar-vector representation of a quaternion, Q^=[s,q] , we rewrite the previous RMSD expression as


Performing scalar and vector products in the above equation, then grouping the terms that depend on atomic positions together, and after introducing the inertia tensor I, the center of mass vector C and reintroducing the rotation matrix R, we obtain


Here, Tr() is the matrix trace operator, Dj is the set of M3×3 matrices of cross-products


is the set of M23×3 matrices of cross-products


and Bj=1W{wifixj,wifiyj,wifizj}T are the centers of the collective motions. Again, it is practical to choose the reference frame of the target molecule such that C=0. Also, commonly used collective motions, e.g. those computed using the NMA or the PCA, possess the weight orthonormality property, i.e. Tr(Fjk)=δjk/W. From now on, we will consider only this type of motions. Thus, in this case, the RMSD equation simplifies to


This is our primary RMSD equation and the main result of this work, we call it master equation. It comprises the rigid contribution T2+4WqTIq, the flexible contribution 1Wj(μj2+λj2), and the cross-terms. Once the matrices and vectors I,Bj,Dj, and