Protein and Math Modeling

Protein and math modelling are in silico tools that can be used to forecast the effects of changes in the sequence or conformation of a protein or model a biological system under different constraints. This can help guide and confirm experimental work and reduce the amount of trial-and-error in the lab.

Modelling in PyRosetta

PyRosetta is a powerful computational tool used in structural biology and protein engineering¹. It is an interactive Python-based interface to the Rosetta molecular modeling suite, which allows users to design and predict the 3D structure of both small and large bio-molecules. PyRosetta offers a robust platform for understanding how changes in a protein's structure can influence its function and stability.

Our project harnesses the power of protein modeling to anticipate the effects of single-point mutations on the stability and function of various proteins integral to our goals. Utilizing PyRosetta, an industry-standard computational tool, we input the Protein Data Bank (PDB) files of our chosen proteins to simulate and assess the effects of these mutations. This approach not only aids in the reduction of trial-and-error in experimental work but also serves as a quality control measure, allowing us to monitor any potential mutations during our synthetic biology processes.

Objectives

The core aim of our simulation studies revolves around preserving protein functionality amidst potential mutations. Our objectives are delineated as follows:

Primary Goal: Ensuring that the function of a protein remains unaffected by single-point mutations. This understanding permits us to foresee the repercussions of spontaneous mutations during our synthetic biology undertakings.
Quality Control: By employing computational methods, we can verify the presence or absence of mutations in proteins throughout our processes. This capability is invaluable for monitoring and sustaining quality control within our project.

The motivations behind our protein selections and the design of our simulations are deeply rooted in these objectives, shaping our subsequent analyses.

Methods

Understanding the implications of point mutations on protein stability, especially when assessed using ΔΔG values, remains a vital component in the realm of protein bioinformatics. A paper by Hernández et al.² delves into this subject, elucidating a streamlined method, KORPM, that employs a residue-based orientational potential focusing on just three backbone atoms for predicting stability changes upon mutation. The performance metrics and approach of this method, especially its efficiency and reduced risk of overfitting, underscore the rapid advancements in predictive modeling of protein stability.

In our work, we harness the capabilities of PyRosetta to compute ΔΔG values, aiming to derive insights on the effects of point mutations, especially on the MxeGyrA protein. While the approaches might seem distinct at first, there's a deeper convergence at the level of objective: both methodologies strive for accurate, efficient, and robust predictions of the effects of mutations on protein stability.

While Hernández et al. focus on circumventing the challenges posed by mutations at interfaces and other extreme conditions, such aspects can be incorporated as supplementary considerations in our research. It might serve to provide context, ensuring that our stability predictions account for broader biochemical contexts.

Protein Selection

Our simulations focused on proteins pivotal to our project's objectives, namely IL_10, Staygold, and both single and docked inteins:

IL_10 and Staygold: Selected as our proof-of-concept, these proteins allowed us to illustrate the feasibility of our method in predicting the consequences of mutations on protein function and stability.
Single and Docked Inteins: Incorporated as a part of our Cell-Free Protein Synthesis (CFPS) project, these proteins were chosen for simulation to preempt and counteract any potentially detrimental mutations during the CFPS procedure.

These proteins were chosen based on their distinct properties and roles in our project, with simulations designed to derive insights into potential mutation impacts on their function and stability.

PyRosetta Scripting

We began by inputting the Protein Data Bank (PDB) file of our protein into PyRosetta. The software carries out an extensive conformational sampling, probing a broad spectrum of the protein's possible structures and orientations. Particular emphasis was placed on understanding the role of single-point mutations and their implications for the stability of the protein, evaluated using the ΔΔG (ddG) score metric.

Inputs

The PyRosetta script's primary input is a Protein Data Bank (PDB) file. This file contains the 3D atomic coordinates of the protein under study. PDB files are a standardized format in structural biology, utilized to archive and disseminate information about 3D structures of proteins, nucleic acids, and complex assemblies.

Outputs

Upon execution, the script yields a Comma-Separated Values (CSV) file that corresponds to the input PDB file. This CSV file has headers subdivided into two segments: pre-existing data and simulation-generated data.

Pre-Existing Data

This encompasses the information extant prior to running the script based on the input PDB file:

type: Specifies whether the amino acid is part of the main chain or a side chain.
residue_number: Indicates the position of the amino acid in the protein sequence.
previous_aa: The amino acid in the original protein structure.
new_aa_1l: The one-letter code of the mutated amino acid.
new_aa_3l: The three-letter code of the mutated amino acid.
conversion: Describes the specific mutation that occurred, e.g., "Ala to Gly."
new_seq: The sequence of the protein with the mutation incorporated.

Simulation-Generated Data

This set encapsulates data produced during the PyRosetta script's simulation:

fa_score: Full-atom score, indicating the overall energy of the protein structure.
ddg_score: Change in the free energy (ΔΔG) upon mutation, indicating the potential impact on protein stability.
hbond_score: Energy score based on hydrogen bonding in the protein.
sasa_score: Solvent accessible surface area score, hinting at potential protein-protein or protein-ligand interactions.
secondary_structure: Prediction of the protein's secondary structural elements, such as alpha-helices and beta-sheets, post-mutation.
diff_hbonds: Difference in the number of hydrogen bonds before and after the mutation.
diff_sasa: Change in solvent-accessible surface area upon mutation.
diff_secondary_structure: Changes in secondary structural elements due to mutation.

Results

In this section, we present the analysis and findings based on the execution of the PyRosetta script on our selected proteins. The outcomes will be discussed in light of both the pre-existing and simulation-generated data.

IL_10

ΔΔG Analysis

High ΔΔG Values: Among the top 100 ΔΔG values, it was observed that changing a non-arginine (R) amino acid to arginine caused a significant increase in the ΔΔG score. This suggests that such mutations could substantially destabilize the IL_10 protein.

Low ΔΔG Values: On the other hand, when looking at the lowest ΔΔG values, it was noticed that changing a wild-type amino acid to alanine (A), glycine (G), or serine (S) reduced the ΔΔG score substantially, implying that these mutations could stabilize the protein.

Solvent Accessible Surface Area (SASA) Analysis

High SASA Values: An examination of the SASA values that were above the mean plus two standard deviations showed that mutations which changed amino acids to tryptophan (W), tyrosine (Y), and glycine (G) seemed to increase the SASA value. Interestingly, all of the top SASA values were associated with a low ΔΔG score, indicating a potential correlation between these two parameters.

Low SASA Values: In contrast, examining the values below the mean minus two standard deviations, it was found that the majority of the low SASA values seemed to be associated with mutating tyrosine (Y), arginine (R), and lysine (K) to another amino acid. This implies that such mutations might decrease the protein's interaction with the solvent. Furthermore, residues at the 139th, 87th, and 129th positions seemed particularly susceptible to these drops in SASA, suggesting they could be crucial regions for the protein's solvent interaction.

StayGold

ΔΔG Analysis

High ΔΔG Values: Our findings showed that among the 15 mutants with the highest ΔΔG scores, 14 had proline (P) at the mutated position. This suggests that the introduction of proline can considerably destabilize the protein, indicated by the high ΔΔG score. Interestingly, the addition of this amino acid seemed to reduce the count of hydrogen bonds by one, though this impact might not be significant given the overall number of hydrogen bonds (over 250).

Low ΔΔG Values: Mutating the histidine residue (H), specifically the 382nd residue in the amino acid chain, significantly reduced the ΔΔG score. This implies that such a mutation could improve the stability of Staygold.

Residue 382: A deeper examination of residue 382 revealed that most mutations on this histidine residue resulted in a negative ΔΔG score, suggesting an increase in protein stability. The highest ΔΔG score at this position occurred when histidine was mutated to proline (P). Intriguingly, all mutations at this position caused minimal change to the SASA score.

**Figure 1**: The figure illustrates changes in mean ΔΔG values on the Y-axis, while the X-axis represents the original amino acid (the amino acid that was replaced due to mutation). As observed, mutating Glycine (G) throughout the amino acid chain to any other amino acid tends to negatively affect the score more than other mutations. On the other hand, while mutating Tryptophan (W) to another amino acid does impact the score, it leads to significantly less destabilization. The red line denotes the average change in the ΔΔG score, grouped by the amino acid that was replaced due to mutation.

**Figure 2**: This figure displays the mean ΔΔG scores for amino acids introduced into the chain due to point mutations. The ΔΔG values are categorized based on the type of amino acid added. Mutating any amino acid to Proline (P) significantly destabilizes the protein. In contrast, a mutation to Glycine (G) has minimal impact, as evidenced by the negligible change in ΔΔG values and preliminary analysis of the protein's 3D structure indicating no adverse effects on foldability. The red line in the chart represents the average ΔΔG value change across all mutations.

SASA Analysis

General Observations: Most of the mutants' SASA scores were noticeably close to the wild-type's, and the range of scores was quite narrow.

High SASA Values: Our observations revealed that mutations resulting in tryptophan (W) generally led to a significantly high SASA score, implying increased exposure of the protein to the solvent.

Low SASA Values: On the other end of the spectrum, the lowest SASA scores seemed to be associated with mutations on arginine (R). These low SASA scores might indicate a decrease in the solvent interaction of these specific mutants.

Docked Intein

ΔΔG Analysis

High ΔΔG Values: Mutations that involved changing various amino acids to proline (P) resulted in higher ΔΔG scores. This indicates that the introduction of proline can considerably destabilize the protein. Notably, the mean ΔΔG score across all mutations was 340, whereas it rose to 1830 when an amino acid was changed to proline.

Low ΔΔG Values: Based on the lowest ΔΔG scores, the 56th residue, followed by the 161st, 252nd, and 333rd residues, were most likely to contribute to lower ΔΔG scores in the event of a mutation. These lower scores suggest increased protein stability following these specific mutations.

**Figure 3**: The figure above presents a heatmap illustrating the ΔΔG scores of the MxeGyrA protein upon introducing a single amino acid change through a point mutation. The Y-axis represents the location of the amino acid within the chain, while the X-axis shows the amino acid introduced due to the mutation. White or colorless spots highlight positions in the chain where the mutated amino acid matches the wild-type amino acid already present at that position. Darker shades signify greater changes in the ΔΔG score, indicating increased instability. Conversely, lighter (more pinkish) shades represent locations where the ΔΔG score remains relatively unaffected. The heatmap reveals that mutations introducing Proline (P) notably compromise protein stability. Preliminary 3D structural analysis confirms this, showing adverse effects on protein foldability. Elevated ΔΔG levels, though of a lesser magnitude, are also observed when amino acids mutate to Tryptophan (W) or Tyrosine (Y).

SASA Analysis

High SASA Values: Statistical analysis showed that mutations at residue numbers 22 and 257 caused significant increases in SASA scores. In general, mutations resulting in tryptophan (W) led to considerably high SASA scores. These observations suggest a possible increase in the protein's exposure to the solvent following these mutations.

Low SASA Values: Mutations at several residues, including 219, 243, 107, 79, 75, led to a significant decrease in SASA scores. Particularly, the arginine (R) residue at position 219 showed high sensitivity to mutations, indicated by a marked drop in the SASA score. This suggests a potential decrease in solvent interaction for these mutants.

Modelling with GROMACS

Introduction and Background

GROMACS³ is a widely used open source software suite for modeling proteins, in particular for doing molecular dynamics. Molecular dynamics solve Newton’s equations of motion for a system of N atoms⁴.

Two types of modelling was performed, a simple MD run of Staygold Intein-Protein in a box of water, and a Free energy of solvation of Staygold Intein-Protein in water.

Objectives

Model the general stability of Staygold Intein-Protein.
Model the thermodynamic nature of Staygold Intein-Protein in water solvent to aid in wetlab results.

Methods

These simulations were completed using GROMACS version 4.5.5 with CUDA, running on UBC's ARC Sockeye⁵. The OPLS⁶ forcefield was used for simulations pertaining to Staygold.

Basic Modelling

An exploratory MD run on the Staygold Intein-Protein in water to model its stability was done. Using the Staygold Intein-Protein pdb file provided by the wet lab team, GROMACS was used to generate the topology of the protein. Next, the protein is solvated with water in a dodecahedron box, chosen because it is one of the smallest boxes saves CPU time⁷. Then to remove charges in the system, counter ions of Na and Cl were added in replacement of a few water molecules. The CHARMM⁸ forcefeild was chosen because it's design most aligned with our protein.

To stabilise the system, energy minimisation using steepest decent⁹, theromostat coupling and pressure coupling was performed. Configuration files were based on GROMACS provided tutorials¹⁰ ¹¹, help from Sockeye, and GROMACS documentation¹². Finally, an MD run of 500000 ns was performed; root mean square deviation, radius of gyration, and diffusion constant values were obtained.

Advanced Modelling

Next to model the Staygold Intein-Protein in water, the free energy of solvation was determined with GROMACS. The preparation steps done in basic modelling are also used in advanced modelling.

Solvation¹³ is the interaction of solvent and solute, with hydration being the subcase of water as the solvent. Calculating the Gibbs free energy of solvation for the Staygold Intein-Protein can give insight to the nature of its folding when in water. Protein folding is more favourable when, for example the interactions between water molecules and hydrophobic side chains are minimised.

Generally speaking, solvation requires a number of steps resulting in changes in energy. First, a cavity is created in the solvent to insert the solute. Then as the solute is inserted into the solvent, more changes in energy occur. Using GROMACS, free energy¹⁴ of solvation of a protein can be simulated using lambda (vector) states¹⁵. Each state represents the protein and solvent in a certain stage, with lambda 0 being the inistal state of protein seperated from solvent and the last lambda state being the final state of protein submerged in water.

Since each lambda state requires it’s own MD run, the time required to run these simulations on our local computers ended being around 2-3 days. On Sockeye, this was reduced to 1 day and 15 hours. Thus, for faster turnaround and to be able to debug these simulations quickly, UBC’s ARC Sockeye GPUs were used.

Results

Basic Modelling

MSD: The mean square displacement (MSD), or diffusion constant, was obtained for both Staygold and IL-10¹⁶ with GROMACS¹⁷. This will aid in our math modelling.
Radius of Gyration: indicator of protein structure compactness¹⁸. If the protein is stably folded, it will maintain a relatively steady value, otherwise if the protein unfolds then the value will change over time.

**Figure 4**: Diffusion constant for Staygold

**Figure 5**: Diffusion constant for IL-10

**Figure 6**: Radius of Gyration for Staygold

Advanced Modelling

The simulation of 20 lambdas were run on UBC Sockeye, taking about 5 days, with these results:

**Figure 7**: Free energy differences for Staygold solvation

A Gibbs free energy value of 11249.38 +/- 70.50 kJ/mol (2688.666348 cal/mol) was obtained when running gmx bar was run.

Temperature: 298 K

Detailed results in kT (see help for explanation):

lam_A  lam_B      DG   +/-     s_A   +/-     s_B   +/-   stdev   +/-
0      1  241.01  0.87   30.29  0.55   33.63  0.59  227.90 121.08
1      2  249.42  0.66   28.74  0.82   33.77  0.48  204.09 19.69
2      3  257.77  0.41   27.43  0.34   33.80  0.75  156.99 33.52
3      4  266.15  0.57   27.02  0.55   34.06  0.22  146.05  9.63
4      5  273.67  0.59   27.38  0.87   35.26  0.15  177.62 47.37
5      6  275.48  1.12   31.69  0.68   36.94  1.18  243.29 100.59
6      7  280.43  0.35   29.12  0.50   36.11  0.57  186.29 99.48
7      8  280.83  0.51   34.94  0.51   38.22  0.53  435.09 952.01
8      9  284.65  0.89   32.30  0.72   38.76  0.96  296.55 132.49
9     10  286.78  0.49   34.73  0.52   41.76  0.79  323.55 365.78
10     11  287.63  0.82   35.39  1.06   45.94  1.07  371.44 5265.04
11     12  281.06  1.22   42.29  0.68   43.53  0.47 1632.14 8266.62
12     13  283.29  2.29   40.84  1.34   54.52  1.57 7783.97 147761.99
13     14  276.36  2.26   45.32  1.28   59.66  1.15 3564.57 101308.68
14     15  259.96  2.75   58.34  0.61   63.35  1.40 224492.47 1888281.83
15     16  245.36  4.22   65.97  1.46   87.78  6.43 11629420.57 8082484763.52
16     17  179.58  9.85  114.98  2.38  123.33  4.91 33300971512.31 123325441600798448.00
17     18  172.35  9.23   70.01  2.64  677.44  9.43  245.69 14.00
18     19 -141.55  1.91   40.08  2.03   98.29  2.72  333.80  9.98

Final results in kJ/mol:

point      0 -      1,   DG 597.15 +/-  2.16
point      1 -      2,   DG 617.99 +/-  1.64
point      2 -      3,   DG 638.67 +/-  1.01
point      3 -      4,   DG 659.45 +/-  1.40
point      4 -      5,   DG 678.08 +/-  1.47
point      5 -      6,   DG 682.56 +/-  2.78
point      6 -      7,   DG 694.82 +/-  0.88
point      7 -      8,   DG 695.82 +/-  1.25
point      8 -      9,   DG 705.28 +/-  2.20
point      9 -     10,   DG 710.56 +/-  1.21
point     10 -     11,   DG 712.68 +/-  2.03
point     11 -     12,   DG 696.38 +/-  3.03
point     12 -     13,   DG 701.92 +/-  5.68
point     13 -     14,   DG 684.73 +/-  5.61
point     14 -     15,   DG 644.12 +/-  6.81
point     15 -     16,   DG 607.94 +/- 10.46
point     16 -     17,   DG 444.94 +/- 24.41
point     17 -     18,   DG 427.04 +/- 22.88
point     18 -     19,   DG -350.72 +/-  4.73

total      0 -     19,   DG 11249.38 +/- 70.50

Analysis and Relation to the Project

Advanced Modelling

A solvation process is thermodynamically favored only if the overall Gibbs energy of the solution is decreased, compared to the Gibbs energy of the separated solvent and solid¹³.

A Gibbs free energy value of 11249.38 +/- 70.50 kJ/mol (2688.666348 cal/mol) indicates that the solvation process is not thermodynamically favored. This could mean there's unfavorable interactions with Staygold and water such as exposed hydrophobic side chains.

Thus, this should be taken into account when designing experiments with the Staygold protein, and perhaps another protein of choice may be better suited.

Flux Based Analysis

Background

In the world of biomanufacturing, a key parameter that impacts industrial performance is biomass specific substrate uptake rate, or simply the biomass flux¹⁹. While E. Coli has long been the staple workhorse of metabolic engineering with a high growth rate of (60 minute doubling time), dedicated faster growing strains seem to reach a maximal limit of (40 minute doubling time)²⁰.

To achieve higher production rates, there is a lot of interest in further characterizing Vibrio natriegens, a common gram negative, non pathogenic bacteria with a growth rate of (10 minute doubling time) when cultured under ideal conditions²¹. This interest has sparked the creation of two annotated genomes²² ²³, validation of its competence in metabolic engineering and expediting common molecular cloning pipelines²⁴, and in the characterization of Vibrio natriegens' growth physiology, core metabolism and biomass consumption¹⁹. This research has culminated in the publication of the first genome scale metabolic model (GSMM) in 2023²⁵, which we aim to exploit by conducting constraint-based modelling and analysis.

Introduction to Flux Balance Analysis

Flux Balance Analysis (FBA) is a computational approach used in synthetic biology to model and analyze metabolic pathways in organisms²⁶. FBA predicts the metabolic fluxes that are feasible for a given metabolic network under specific constraints, such as nutrient availability or metabolic capacity. The goal of this workflow is to typically identify strategies for optimizing the production of a specific compound (known as the objective function) by maximising the rate of associated pathways (known as fluxes).

In our analysis, three foundational assumptions were applied to our FBA model:

Steady State Assumption: This assumes that the system is being modelled at steady-state, whereby the concentrations of metabolites and other components does not change over time. This allows the many dynamic aspects of the model to be simplified, and rather focuses on the fluxes between pathways in the system.
Mass Balance Assumption: This assumed that all mass is being conserved in the system. It extends the Steady State Assumption by allowing fluxes into and out of the system, but mandates that the fluxes of metabolites entering the system must equal the fluxes of metabolites exiting the system (for ex., carbon can enter and leave the system, but all carbon entering the system must also leave the system). This ensures consistency with the Steady State assumption while allowing the system itself to remain dynamic.
Limitation to a Single Objective Function: In theory, a system has infinite solutions, given the flux of each pathway is a continuous variables that can be optimized. Hence, despite having a finite number of pathways, it is impossible to optimise all pathways at the same time. FBA relies on optimising one pathway of interest at a time - identified as the objective function

Our objectives with FBA

As one of the first teams with access to a published GSMM model of Vibrio natriegens, we chose to investigate V. natriegens' suitability as a host in the context of Project PILOT, while also exploring open-ended questions that would expand the iGEM community's understanding of V. natriegens as a whole.

Therefore, our objectives with FBA were:

Determining the ideal sugar for optimal biomass growth.
Finding the optimal biomass bioreactor conditions that would promote the highest biomass accumulation rates.
Identifying critical genetic perturbations that strongly effect biomass growth (i.e., kill the cell), and therefore compare V. Natrigens’ robustness against E Coli.

Methods

We completed these analysis using the COBRA Toolbox in MATLAB. Specifically, we utilized the recently published Genome Scale Metabolic Model iLC858²⁷ of Vibrio natriegens and the E Coli base model iJO1366²⁸, using the COBRA Toolbox’s flux optimization algorithms and single reaction deletion algorithms to perform FBA and SRD analysis, respectively.

Results

Comparison of Various Carbon Sources in Vibrio natriegens’ Biomass Growth

To begin, we identified all exchange reactions present in the V. natriegens GSMM, and then filtered for all carbon exchanges, resulting in the following metabolites being identified:

Glucose
Fructose
Glycerol
D-Mannose
D-Mannitol
Dulcose
Amylotriose
Maltohexaose
Acetate

Next, we tested different uptake rates for each metabolite by relaxing its lower bound constraint , and classified them as either having an impact on the overall Biomass Accumulation Rate (BAR) or not. Solutions to the FBA were calculated using COBRA’s optimizeCbModel() function, and solutions being visualized using MATLAB’s base plot() function.

Results revealed that only Glucose and Fructose substantially affected overall BAR. Among the remaining carbon sources, D-Mannitol and Acetate caused minor fluctuations in the BAR, while Glycerol, D-Mannose, Dulcose, Amylotriose, and Matohexaose had no effects on the BAR.

**Figure 8**: Plots showing the effect of varying uptake rate of **only Glucose and Fructose** on the BAR. The optimal Fructose uptake rate is 650 mmol/day, allowing for a growth rate of 44.23 1/day. The optimal Glucose uptake rate is 980 mmol/day, allowing for a growth rate of 36.58 1/day.

While increasing Fructose uptake from 0mmol/h to its optimal at 650mmol/day allows for the peak biomass growth rate, it is interesting to note that the model can survive without any Fructose, but can not without any Glucose. This signifies the glucose exchange pathway is critical to the objective function. It would be interesting to model the optimal growth rate as a function of Fructose and Glucose in future studies.

**Figure 9**: Plots showing the effect of varying uptake rate of each metabolite on the BAR. Interestingly, **D-Mannitol** and **Acetate** cause minor, but noticeable fluctuations to the BAR, while the other carbon sources have no effect at the baseline growth rate.

Finding Optimal Bioreactor Conditions to Optimize Biomass Yield

A bioreactor is a controlled environment used in biotechnology and industrial processes to cultivate microorganisms, cells, or tissues for various applications, including fermentation, biofuel production, and pharmaceutical manufacturing. When designing a bioreactor, two crucial parameters to consider are the oxygen uptake rate (OUR) and glucose uptake rate (GUR). The OUR represents the rate at which microorganisms consume oxygen for metabolic processes, while the GUR reflects the rate at which they consume glucose as a carbon source. Properly optimizing these rates is vital to ensure optimal cell growth and productivity. A balanced OUR and GUR are essential to avoid oxygen limitation, which can lead to reduced cell viability, and glucose limitation, which can hinder biomass and product formation. Bioreactor design and operation must carefully manage these rates to achieve efficient and controlled bioprocesses, making them fundamental considerations in bioreactor design and operation.

As such, we recreated FBA optimization on a 3-Dimensional array, optimizing for BAR as a function of OUR and GUR. The results were plotted in a 3D scatterplot using MATLAB’s scatter3() function.

**Figure 10**: 3D FBA Visualization of the effect of OUR and GUR on BAR (Growth Rate). The GUR was varied between 0 to 2000 mmol/day, while OUR was varied between 0 to 300 mmol/day - although BAR values flatlined at 1500mmol/day and 200mmol/day, respectively. Using the *findpeaks()* function in MATLAB, the optimal **OUR and GUR is 200 mmol/day and 980mmol/day** respectively, resulting in a **BAR (growth rate) of 36.972 1/day**.

Comparing Metabolic Robustness of V. natriegens against E. Coli Using Single Reaction Deletion Analysis

Single Reaction Deletion (SRD) analysis within MATLAB's COBRA Toolbox is a valuable tool for elucidating the essential reactions within a metabolic network model. By systematically simulating the deletion of individual reactions and assessing their impact on the model's growth or objective function, SRD analysis helps pinpoint reactions critical to the network's function. This information is invaluable for identifying potential drug targets, understanding the network's robustness, and prioritizing genetic modifications for metabolic engineering. SRD analysis provides a comprehensive view of the metabolic dependencies, enabling a comparison of metabolic robustness between two or more metabolic models.

This analytical approach is employed in our project at a qualitative level, comparing the metabolic robustness of E. Coli to V. natriegens using the SingleRxnDeletion() function. We use this function to calculate Growth Ratios (GRs) after every SRD, which is the BAR flux ratio after and before a particular reaction was deleted. In essence, if a reaction has a GR of 0, or close to 0, it implies that after its deletion, the model has a significant reduction in its BAR - and that reaction is therefore critical to the model’s survival. Results from the SRD analysis were visualized using barcode plots, presented below:

**Figure 11 & 12**: Barcode plots showing differences in GR before and after SRDs. Qualitatively, the higher density of the E coli barcode plot implies fewer reactions have a GR of 0 after SRD.

At first glance, this may imply that the V. natriegens model is more sensitive to metabolite perturbations or changes in its environment. However, it must be noted that the E Coli. model used in this analysis is much more refined and comprehensive than the newly published V. natriegens model, and these results must therefore be validated using further bioinformatic pathway enrichment analysis.

Diffusion Modelling

Implications of Diffusion Modelling

Another important aspect of the mathematical model was gaining some insight into the distribution of our proteins of interest - Staygold and IL-10, inside a potential bioreactor.

Diffusion modelling is particularly useful in Scale-Up and Scale-Down studies; by modelling diffusion, we are able to predict the concentration of our proteins of interest as they change over time and space, which in turn provides the basis for calculations in transition from lab scale to industrial scale bioprocesses. Furthermore, it would also guide an industrial bioreactor’s geometry, identify potential regions of protein aggregation and optimize resource utilization.

Setting up the Model

For this investigation, we modelled a stir rod with radius in the relative center of the bioreactor. A region of bacterial abundance surrounds it between radius and , which will be considered the source of our two proteins of interest (Staygold and IL-10). Finally, the surface of the interior surface of the bioreactor, would be where the proteins would exit the system from through a filter, acting as a target/sink. Our goal is the model the diffusion of Staygold and IL-10 from the source to the sink, by generating a concentration profile on MATLAB, and investigating the effects of varying cell radius, diffusion coefficient and basal absorption rate.

**Figure 13**: Graphical Representation of the Bioreactor System

Figure 1 shows a diagram of the system of interest, where the system is being modelled in cylindrical coordinates. is defined as the radius of the stir rod, while is defined as the radius of the bacterial abundance. Both these regions are being modelled as concentric cylinders. Following the log phase of growth, the transformed cells are likely to begin ejecting excess the split inteins into its surroundings at steady-state. Hence, we may set Ci as the initial concentration of the protein obtained at the surface of the stir rod, and consider it to be a perfectly reflective boundary (boundary condition). For this design of the bioreactor, we are assuming a concentric filter at radius allowing for perfect absorption of the protein of interest, making the concentration of the proteins in this region 0 (boundary condition).

Having set up our boundary conditions, we may now make some general assumptions:

Considering the region of bacterial abundance of the bioreactor to be the solitary region producing our proteins of interest.
Assuming a constant, perfect efficiency filter at radius that has no protein aggregation.
The Staygold and IL-10 are assumed to be moving purely through passive diffusion between the source (region of bacterial abundance) and the sink (filter).
No Staygold and IL-10 is being lost to internal bacterial metabolic processes.
No Staygold and IL-10 is being lost to protein degradation.
The host bacteria’s endogenous activity is being ignored.

With these assumptions in place, we are ready to begin deriving a concentration profile for our system.

Deriving the Steady-State Concentration Profile of Our Proteins in our Diffusion System

We begin with the multi-dimensional diffusion equation at steady state (in cylindrical coordinates). The function incorporates the bacterial region’s basal production of Staygold and IL-10.

We will let be the location of the outer edge of the stir rod. will be the edge of the region of bacterial abundance, and will be the inner surface of the protein filter. This is summarized below, in Figure 2:

**Figure 14**: Cylindrical view of the system of interest. The boundary conditions have been incorporated into the system.

We have symmetry in , and no dependence on . We can therefore simplify this equation to:

Integrating both sides:

Integrating again:

Applying boundary conditions:

From equation 3:

From equation 2:

Subtracting equation 4 from equation 1:

From equation 4:

Pluggin in all constants:

Finding the Steady-State Concentration of ACC Molecules

We can integrate this function over the volume of interest to find the total number of protein molecules (Staygold and IL-10), , between the cell nucleus and the rhizosphere. Below, denotes the height of the cell.

$\text{[math]}$ $\text{[math]}$ $\text{[math]}$ $\text{[math]}$

We can divide by the volume () of interest to obtain the average concentration of protein (Staygold and IL-10) throughout the region.

$\text{[math]}$

Simulating the System

The ODEs were solved using MATLAB and the general concentration profile is presented below:

**Figure 15**: Concentration profile of the **StayGold**. The diffusion constant for the protein was calculated using GROMACS from the protein modelling sub team, and the plots were generated in MATLAB. Interestingly, we see the concavity of the function switch from concave up to concave down at the edge of bacterial abundance, which is where the basal rate of production turns off.

**Figure 16**: Concentration profile of the **IL-10** proteins. The diffusion constant for the protein was calculated using GROMACS from the protein modelling sub team, and the plots were generated in MATLAB. Interestingly, we see the concavity of the function switch from concave up to concave down at the edge of bacterial abundance, which is where the basal rate of production turns off.

Exploring Concentration Response to Media Conditions by Varying D

Since the value for the Diffusion Constant D was obtained from a source with a high degree of uncertainty, the team found it useful to visualize the maximum and minimum deviations to our standard concentration profile due to the error within D. This result is presented below:

**Figure 17**: Concentration profile of **StayGold** at the edges of their uncertainty values, as predicted by GROMACS.

**Figure 18**: Concentration profile of **IL-10** at the edges of their uncertainty values, as predicted by GROMACS.

We also conducted Sensitivity analysis of a varying Diffusion coefficient, plotting changes to the concentration profile by varying the GROMACS value by up to 50% in the positive and negative directions. The results are plotted below:

**Figure 19**: Sensitivty analysis profiles of **StayGold** at the edges of their uncertainty values, as predicted by GROMACS.

**Figure 20**: Sensitivty analysis profiles of **IL-10** at the edges of their uncertainty values, as predicted by GROMACS.

Exploring Concentration Response to Filter Efficiency by Varying

Optimal filter efficiency is essential to the our model assumptions, preventing protein aggregation near the edge of the bioreactor at steady state, making the filter a perfect absorber. However, it is likely that under sustained use the filter would lose efficiency over time, and hence the team found it useful to model the weakening filter’s efficiency as an increase in distance to the protein filter in our diffusion model : the idea being that a weaker filter is analogous to a perfect sink existing further away from the region of bacterial abundance. The results from this analysis are plotted below:

**Figure 21**: Concentration profiles of **StayGold** with varying filter distance. It appears that a inefficient filter produces higher levels of protein near the stir rod and in the region of bacterial abundance at steady state.

**Figure 22**: Concentration profiles of **IL-10** with varying filter distance. It appears that a inefficient filter produces higher levels of protein near the stir rod and in the region of bacterial abundance at steady state.

Conclusion

These findings provide a general idea of what the concentration profiles of our proteins of interest would look like inside a bioreactor. Follow up calculations could be to experimentally determine protein concentrations that negatively impact the bacterial populations producing the split inteins, before using these values as a upper bound to start constraining the model (for example, to calculate the optimal filter distance within the upper limit of the concentration profile).

Protein Modelling Sources

Staygold Simple Modelling

Staygold Solvation Modelling

Amino Acid Mutation Code Python Only

Chaudhury, S., Lyskov, S., & Gray, J. J. (2010). PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta. Bioinformatics, 26(5), 689-691. ↩
Hernández, I. M., Dehouck, Y., Bastolla, U., López-Blanco, J. R., & Chacón, P. (2023). Predicting protein stability changes upon mutation using a simple orientational potential. Structural Bioinformatics. ↩
GROMACS development team. (2022). Welcome to GROMACS. Welcome to GROMACS - GROMACS webpage https://www.gromacs.org documentation. https://www.gromacs.org/ ↩
GROMACS development team. (2023). Computational Chemistry and Molecular Modeling. Introduction - GROMACS 2023.2 documentation. https://manual.gromacs.org/current/reference-manual/introduction.html#computational-chemistry-and-molecular-modeling ↩
UBC. (2023). UBC ARC SOCKEYE. UBC Arc Sockeye. https://arc.ubc.ca/ubc-arc-sockeye ↩
GROMACS development team. (2023b). OPLS. Force fields in GROMACS - GROMACS 2023.2 documentation. https://manual.gromacs.org/current/user-guide/force-fields.html#opls ↩
GROMACS development team. (2023c). Periodic boundary conditions. Periodic boundary conditions - GROMACS 2023.2 documentation. https://manual.gromacs.org/current/reference-manual/algorithms/periodic-boundary-conditions.html ↩
MacKerell Lab. (n.d.). CHARMM force field files. https://www.charmm.org/archive/charmm/resources/charmm-force-fields/ ↩
GROMACS development team. (2023b). Energy minimization. Energy minimization - GROMACS 2023.2 documentation. https://manual.gromacs.org/current/reference-manual/algorithms/energy-minimization.html#em ↩
Blau, C., Zhmurov, A., Lindahl, E., Villa, A., & Jordan, J. (2021). Calculating Free Energy. Calculating free energy - GROMACS tutorials . https://tutorials.gromacs.org/docs/free-energy-of-solvation.html ↩
Lemkul, J. A. (2018). Gromacs tutorial. MD Tutorials. http://www.mdtutorials.com/gmx/free_energy/05_equil.html ↩
GROMACS development team. (2018). Getting good performance from mdrun. Getting good performance from mdrun - GROMACS 2018.8 documentation. https://manual.gromacs.org/2018-current/user-guide/mdrun-performance.html#gmx-mdrun-on-gpu ↩
Wikimedia Foundation. (2023, August 24). Solvation. Wikipedia. https://en.wikipedia.org/wiki/Solvation ↩ ↩²
GROMACS development team. (2023b). Free energy implementation. Free energy implementation - GROMACS 2023.2 documentation. https://manual.gromacs.org/current/reference-manual/special/free-energy-implementation.html ↩
GROMACS development team. (2023b). Free energy interactions. Free energy interactions - GROMACS 2023.2 documentation. https://manual.gromacs.org/current/reference-manual/functions/free-energy-interactions.html ↩
RCSB Protein Data Bank. (2006, October 7). 2H24: Crystal Structure of human IL-10. RCSB PDB. https://www.rcsb.org/structure/2h24 ↩
GROMACS development team. (2023b). gmx msd. gmx msd - GROMACS 2023.2 documentation. https://manual.gromacs.org/documentation/current/onlinehelp/gmx-msd.html ↩
Lemkul, J. A. (2018a). Gromacs tutorial. MD Tutorials. http://www.mdtutorials.com/gmx/lysozyme/10_analysis2.html ↩
Long CP, Gonzalez JE, Cipolla RM, Antoniewicz MR. Metabolism of the fast-growing bacterium Vibrio natriegens elucidated by 13C metabolic flux analysis. Metabolic engineering. 2017 Nov 1;44:191-7. ↩ ↩²
LaCroix RA, Sandberg TE, O'Brien EJ, Utrilla J, Ebrahim A, Guzman GI, Szubin R, Palsson BO, Feist AM. Use of adaptive laboratory evolution to discover key mutations enabling rapid growth of Escherichia coli K-12 MG1655 on glucose minimal medium. Applied and environmental microbiology. 2015 Jan 1;81(1):17-30. ↩
Payne WJ, Eagon RG, Williams AK. Some observations on the physiology of Pseudomonas natriegens nov. spec. Antonie Van Leeuwenhoek. 1961 Dec;27(1):121-8. ↩
Wang Z, Lin B, Hervey IV WJ, Vora GJ. Draft genome sequence of the fast-growing marine bacterium Vibrio natriegens strain ATCC 14048. Genome announcements. 2013 Aug 29;1(4):e00589-13. ↩
Maida I, Bosi E, Perrin E, Papaleo MC, Orlandini V, Fondi M, Fani R, Wiegel J, Bianconi G, Canganella F. Draft genome sequence of the fast-growing bacterium Vibrio natriegens strain DSMZ 759. Genome announcements. 2013 Aug 29;1(4):e00648-13. ↩
Weinstock MT, Hesek ED, Wilson CM, Gibson DG. Vibrio natriegens as a fast-growing host for molecular biology. Nature methods. 2016 Oct;13(10):849-51. ↩
Coppens L, Tschirhart T, Leary DH, Colston SM, Compton JR, Hervey IV WJ, Dana KL, Vora GJ, Bordel S, Ledesma‐Amaro R. Vibrio natriegens genome‐scale modeling reveals insights into halophilic adaptations and resource allocation. Molecular Systems Biology. 2023 Apr 12;19(4):e10523. ↩
Orth JD, Thiele I, Palsson BØ. What is flux balance analysis?. Nature biotechnology. 2010 Mar;28(3):245-8. ↩
Coppens L, Tschirhart T, Leary DH, Colston SM, Compton JR, Hervey IV WJ, Dana KL, Vora GJ, Bordel S, Ledesma‐Amaro R. Vibrio natriegens genome‐scale modeling reveals insights into halophilic adaptations and resource allocation. Molecular Systems Biology. 2023 Apr 12;19(4):e10523. ↩
Orth JD, Palsson B. Gap-filling analysis of the iJO1366 Escherichia coli metabolic network reconstruction for discovery of metabolic functions. BMC systems biology. 2012 Dec;6(1):1-5. ↩

Protein and Math Modeling

Modelling in PyRosetta

Objectives

Methods

Protein Selection

PyRosetta Scripting

Inputs

Outputs

Pre-Existing Data

Simulation-Generated Data

Results

IL_10

ΔΔG Analysis

Solvent Accessible Surface Area (SASA) Analysis

StayGold

ΔΔG Analysis

SASA Analysis

Docked Intein

ΔΔG Analysis

SASA Analysis

Modelling with GROMACS

Introduction and Background

Objectives

Methods

Basic Modelling

Advanced Modelling

Results

Basic Modelling

Advanced Modelling

Analysis and Relation to the Project

Advanced Modelling

Flux Based Analysis

Background

Introduction to Flux Balance Analysis

Our objectives with FBA

Methods

Results

Comparison of Various Carbon Sources in Vibrio natriegens’ Biomass Growth

Finding Optimal Bioreactor Conditions to Optimize Biomass Yield

Comparing Metabolic Robustness of V. natriegens against E. Coli Using Single Reaction Deletion Analysis

Diffusion Modelling

Implications of Diffusion Modelling

Setting up the Model

Deriving the Steady-State Concentration Profile of Our Proteins in our Diffusion System

Finding the Steady-State Concentration of ACC Molecules

Simulating the System

Exploring Concentration Response to Media Conditions by Varying D

Exploring Concentration Response to Filter Efficiency by Varying

Conclusion

Protein Modelling Sources

Footnotes