In synthetic biology, modeling is used to study complex problems such as protein structure, promoter sequence, gene expression, system behavior and so on. It helps to reduce costly and time-consuming trial-and-error experiments, better understand and optimize the design of the project.
This year, BNUZH-China 2023 developed five models in total to assist in project design. Additionally, modeling established a tight connection with experiment. Experiments presented problems and demands to modeling and provided experimental data for modeling to optimize parameters, while modeling optimized the experimental design conditions in advance and achieved the verification and visualization that the experiment could not finish.
The TARGETING Module included the Protein Molecular Model and the Salmonella Infection Model. In Protein Molecular Model, we predicted the structure of Lpp-OmpA-scFv, performed molecular docking with CEACAM5, calculated the developability indices and carried out a redesign of the anti-CEA scFv, incorporating humanizing mutations and enhancing antibody affinity. In Salmonella Infection Model, we simulated the invasion of the engineered bacteria to tumor cells and normal cells before and after expression of scFv respectively, and proved by the visualization of death ratio that the engineered bacteria had increased targeting to tumor cells and better killing of tumor cells after expressing scFv.
For KILLING Module, the Fenton Reaction Model was built. In Fenton Reaction Model, we developed a system of ordinary differential equations (ODEs) to kinetically observe changes in substance concentration related to Fenton reaction, and demonstrated that the therapeutic pathway special to kill tumor cells by generating hydrogen peroxide to promote Fenton reaction was feasible.
The SAFETY Module consisted of the Gene Circuit Model and the Plasmid Loss Model. In Gene Circuit Model, we solved a series of ODEs to study the relationship between the concentrations of the substances in the toxin-antitoxin system, and verified that the gene circuit designed in this module was capable of achieving the desired functions. In Plasmid Loss Model, we established the recurrence formula of the proportion of plasmid-containing bacteria in the presence or absence of toxin-antitoxin system respectively, and displayed the actual effect of the system in maintaining plasmid function through comparation of two cases.
1. Introduction
We conducted molecular modeling analysis on the single-chain antibody fragment variable (scFv) employed in this project to investigate the interaction between the scFv and the carcinoembryonic antigen (CEA) present on the surface of tumor cells. We employed molecular docking and corresponding calculations to predict the structure of the antibody and its affinity for CEACAM5.To assess the practical value of our antibody, we thoroughly examined the exploitability of scFv, including aggregation and stability. Traditionally, researchers have reduced the immunogenicity of mouse antibodies by grafting the CDR region from mouse antibodies onto the human variable region framework, thus creating humanized antibodies1 . Using the original sequence as a reference, we utilized modeling techniques to identify the key amino acid sites involved in antigen binding. Through virtual mutation of these sites, we aimed to enhance the affinity of the anti-CEA scFv and enable the engineered bacteria to exhibit stronger tumor cell targeting ability. In general, the protein molecular modeling process encompasses structural prediction of Lpp-OmpA-scFv and molecular docking with CEACAM5, followed by the modification of the antibody proteins based on an understanding of their complex structure (Figure 1).
Figure 1. Protein molecular modeling workflow
2. Structure prediction of Lpp-OmpA-scFv
Background
Since the structure of the Lpp-OmpA-scFv is unknown, our first step is to predict the structure of the antibody. AlphaFold2 (AF2), an advanced deep learning model, has achieved unprecedented performance in predicting the structure of single-chain proteins2. UCSF ChimeraX is the excellent molecular visualization tool launched by Resource for Biocomputing, Visualization, and Informatics (RBVI) following UCSF Chimera. We attempted to open the AlphaFold tool in ChimeraX and use ColabFold to make new protein predictions3.Methodology
We obtained the amino acid sequences of the Lpp-OmpA-scFv protein in a FASTA file. Additionally, we opened the AlphaFold panel in ChimeraX and copied the protein's amino acid sequence into the designated box.In the Options, we selected "Energy-minimize predicted structures", "Trim fetched structure to the aligned structure sequence" and "Use PDB templates when predicting structures". We clicked on the predict button to initiate the execution. Selecting all options may cause longer processing time, but it yields more accurate predictions. The 3D structure of the protein was predicted using ColabFold, a free computational environment provided by Google.
Results
Upon analyzing the prediction results, we downloaded the optimal prediction (Figure 2.A).We assessed the reliability of each segment of the predicted structure by analyzing the additional generated images(Figure 2.B, C, D). The AlphaFold prediction provided expected position error values for each residue pair (X.Y),showing the predicted position error at residue X when aligned with residue Y in the true structure. These residue-residue "predicted aligned error" values can be visualized with an error plot (Figure 2.B).Additionally, a sequence coverage plot was generated to examine the number of similar sequences found at different positions in the Lpp-OmpA-scFv (Figure 2.C).The predicted structures include atomic coordinates and confidence estimates for each residue, with scores ranging from 0 to 100. Higher scores indicate higher confidence. This confidence measure is called pLDDT and corresponds to the model's predicted per-residue scores on the IDDT-Ca metric (Figure 2.D).Figure 2.A Prediction of Lpp-OmpA-scFv protein structure based on Alphafold2
Figure 2.B Prediction of Lpp-OmpA-scFv protein structure based on Alphafold2
Figure 2.C Prediction of Lpp-OmpA-scFv protein structure based on Alphafold2
Figure 2.D Prediction of Lpp-OmpA-scFv protein structure based on Alphafold2
Analysis
To predict the transmembrane protein regions of the aforementioned structure, we incorporated an implicit membrane into the protein structure. Additionally, we modified the membrane properties by utilizing the Analyze Transmembrane Proteins tools in Discovery Studio. The Hidden Markov Model (HMM) was employed to predict the transmembrane helices based on the amino acid sequence of the protein. Subsequently, a hidden membrane consisting of two parallel planes was introduced to the protein structure (Figure 3). The placement of the membrane was determined by optimizing the simplified solvation energy. If there was a significant charge difference between protein residues located outside the membrane, adjustments were made to the membrane.Figure 3. Transmembrane protein regions prediction of Lpp-OmpA-scFv
Notes: The angle of inclination is defined as the angle between the first principal axis of the protein and the normal of the membrane, and the angle of rotation is the angle between the second principal axis of the atomic set and the normal of the plane defined by the first principal axis and the normal of the membrane.
3. Molecular docking
Background
We employed molecular docking to investigate the interaction between the antigenic protein and the scFv. Carcinoembryonic antigen related cell adhesion molecule 5(CEACAM5), also known as CEA or CD66e, belongs to the carcinoembryonic antigen family4. ZDOCK is a protein interaction docking program based on fast Fourier transform. It is primarily utilized to explore all potential binding modes by translating and rotating two proteins in space, and subsequently assess each binding model through an energy-based scoring function. In ZDOCK version 3.0.2, IFACE statistical potential energy, structural complementarity, and electrostatic scoring functions are employed. ZDOCK 3.0.2 was employed to dock CEACAM5 and Lpp-OmpA-scFv proteins, and the most optimal docking result was selected upon completion. The PyMol V2.4.0 software was utilized to label and present the binding sites of the docking complex..Methodology
The structure of the human CEACAM5 protein was searched for in the PDB database (https://www.rcsb.org/) and subsequently downloaded. The Lpp-OmpA-scFv prediction from the previous step was optimally prepared. We uploaded both protein structures and performed the ZDOCK 3.0.2 prediction. Following this, we downloaded and installed PyMol V2.4.0 software. Upon completion of the molecular docking, we opened the resulting complex structure file with the best prediction using PyMol V2.4.0 software. To delete all solvent molecules and display sticks within 5 angstroms of scFv or CEACAM5, we entered the command below into the command line.remove solvent
PyMOL>bg_color white
PyMOL>show sticks, byres CEACAM5 within 5 of scFv
PyMOL>show sticks, byres scFv within 5 of CEACAM5
Afterward, we conducted a search and selection process to identify residues that establish contact bonds, which were then utilized for labeling and displaying the binding sites of the docking complex. We particularly emphasized amino acids capable of forming hydrogen bonds on the interaction surface. Following that, we uploaded the docking complexes to PDBePISA in order to analyze the interaction domains and surfaces of the proteins involved.
Results
Amino acids capable of forming hydrogen bonds in the docking molecules are shown in the picture (Figure 4). The gray dotted line is the amino acid residue interaction hydrogen bond, the capital letter is the abbreviation of the corresponding amino acid, and the number is the position information of the amino acid in the protein sequence. For example, S362 is serine 362 in CEACAM5 protein. G130 is the glycine 130 in Lpp-OmpA-scFv protein. Upload the best predicted protein structure in PDBePISA, analyze the interaction between the two proteins and download the summary of the interaction surface (Table 1).Figure 4.A The docking of antibody molecules with antigen molecules
Figure 4.B The docking of antibody molecules with antigen molecules
Notes:(A)Results of molecular docking, Lpp-OmpA-scFv is shown in green and CEACAM5 is shown in orange. (B)Schematic representation of the molecular docking surface. Among them, amino acids that can form hydrogen bonds are shown.
Table 1. Summary of docking complex interface
4. Developmental analysis of antibodies
Background
High concentrations of antibodies often result in their aggregation, which can diminish their activity and potentially induce an immune response5. The calculation of antibody aggregation trend is an indicator to measure the tendency of protein surface amino acid aggregation6. Identifying the protein surface specify regions that are prone to aggregation allows us to use targeted mutation to engineer proteins with higher stability. The aggregation of scFv proteins is also an important factor affecting their developability. We calculated Developability Indices (DI) (Fv) scores for assessing the exploitability of scFv.Methodology
We use the apply forcefield command in the change forcefield tools to apply a CHARMm forcefield to the Lpp-OmpA-scFv protein structure. Then we calculated the solvent accessible surface area (SAA) for the side-chain of each residue in Discovery Studio. In the Tools Explorer, we launched the Macromolecules and Predict Protein Aggregation Tools panel, click ‘Calculate Aggregation Scores’. In the parameter setting interface that appears, we chose the proteins for the calculation and set the Cutoff Radius to 5,7,10. Per atom aggregation propensity score is calculated as the ratio of the actual side-chain SAA to the SAA of side-chain atoms in a fully exposed residue of the same type, multiplied by the residue hydrophobicity, for all residues within the Cutoff Radius of each atom. We use the Developability Index (DI) to analysis anti-CEA scFv protein, which is a measure used to rank the aggregation propensity of related proteins based on their hydrophobic and total charge properties. The DI is calculated from the aggregation propensity score (APS) minus the weighted squared total of the charge as described by the following formula:DI = [APS] - β x [total charge]²
[where APS is the positive aggregation score and is calculated as the summation of all positive atomic aggregation scores.]Results
The color is based on the protein aggregation trend score of the Cutoff radius=10. Atoms with a high protein aggregation trend score will be shown in red, while atoms with a low protein aggregation trend score will be shown in blue (Figure 5.C). In addition, a line chart and a point chart are automatically created (Figure 5.A, Figure 5.B). In the line chart, the amino acid sequence number was taken as the horizontal coordinate, and the protein aggregation trend score was taken as the vertical coordinate. In the point chart, the sequence number of protein aggregation sites was taken as the horizontal coordinate, and the score of protein aggregation trend was taken as the vertical coordinate. Site 1 is shown in red in Figure.5C, and we click the arrow on the right side of the Display Style at the top of the window to better show the details of the amino acids in Figure 5.D. Because anti-CEA scFv protein is an antibody with variable regions, the APS is calculated for these to generate DI(Fv) scores (Table 2).Figure 5.A Protein aggregation propensity scores calculation using CHARMm
Figure 5.B Protein aggregation propensity scores calculation using CHARMm
Figure 5.C Protein aggregation propensity scores calculation using CHARMm
Figure 5.D Protein aggregation propensity scores calculation using CHARMm
Notes: (A). Line chart of protein aggregation tendency score. (B).Point chart of aggregation tendency score.(C).Solvent surfaces stained according to protein aggregation tendency scores.(D).Show detail of site 1.
5. Redesign of antibody molecules
① Humanization of anti-CEA scFv by CDR Grafting
Background
Humanization is the process of engineering an antibody that retains the antigen-binding specificity and affinity of a non-human antibody, but exhibits low human immunogenicity and does not compromise stability7. Humanized scFv can mitigate the human immune response against mouse antibodies. Humanized scFv can reduce human anti-mouse antibody response. This project we aimed to humanize the heavy chain variable (VH) and light chain variable (VL) regions of the anti-CEA scFv to develop a humanized scFv exhibiting minimal immunogenicity while maintaining high binding affinity for CEACAM5.Methodology
The FASTA file of anti-CEA-scFv (without Lpp-OmpA) obtained in the previous steps were prepared. Open the FASTA file in Discovery Studio and load the sequences to annotate in a Sequence Window. Before humanizing scFv, we first identified and annotated domains and Complementarity Determining Regions (CDR) of antibody sequences. Several Hidden Markov Models (HMM) are precomputed for the protocol and used to scan the input protein sequence to identify the antibody domains. Then we selected the sequence, clicked on "Predict Humanizing Mutations" and set the target scores parameter to 0.9. We set both the "Germline and Frequent Residues" and "Machine Learning Prediction" parameters to “True” and started the prediction program. After completion of the run, we checked the residue substitution alignment to contain the various mutations suggested by the scheme, and calculated the mutation stability energy of the mutations.Results
First of all, we predicted sequence properties of anti-CEA scFv protein in Discovery Studio. The tool is perfect to predict the antibody domain, the features can be filtered to retain only those that are within the specified antibody domain or complementarity determining regions (CDRs). The characteristic prediction results of anti-CEA scFv protein sequences are shown in Figure 6.Figure 6. The characteristic prediction results of antibody protein sequence
Notes: The feature and antibody domains and CDRs showing in the graphic. The color of light chain CDR is magenta, the color of heavy chain CDR is red.
Predicting humanized antibody structure requires residues in the framework region of the mutation-variable domain, which can be obtained from the antibody sequence annotation information in our previous step, making these residues more similar to the types found in human antibodies (Figure 7). The interactive residue analysis report for each variable domain contains a legend that defines the colors and styles that represent the residue features in the table. There is a detail legend in the interactive report, which we display by hovering the cursor over each residue cell. The interactive residue analysis report for each variable field contains a legend that defines the color and style to represent the residue characteristics in the table (Table 3, Table 4).Figure 7. Residues substitutions of humanized antibody prediction
Notes: The first sequence is a query annotated using the normal antibody annotation style, and the second sequence is also a query sequence, but cursor residues are indicated by blue caret, hotspots are orange, and the combined Kabat and IMGT CDR regions are magenta or red, depending on the domain. The next three sequences show various mutations in hot spot residues not excluded from substitutions.
Table 3. VH_kappa interactive residue analysis report
Table 4. VL_kappa interactive residue analysis report
Figure 8.A The structure of Best Single Mutations
Figure 8.B The structure of Frequent Residue Substitutions
Figure 8.C The structure of Germline Substitutions
Notes: The protocol contains recommendations for the three types of humanized mutations. (A) Best_Single_Mutations sequence: The germ line sequence, the type of residue found in the ML model, or the frequency > 5%, produces the most stable structure; (B) Frequent_Residue_Substitutions sequence: Where appropriate, a residue of the type most commonly found in that location in human antibodies is replaced; (C) Germline_Substitutions sequence: Germline_substitutions sequence: Mutate residues in appropriate places to find a type in the best-matched human germline sequence.
Table 5. Developability indices of three types of humanized mutations
② Virtual amino acid mutations increase the binding affinity of antibodies
Background
Site-specific amino acid mutation of protein is often used in antibody design. Virtual amino acid mutation based on interaction can be performed on antibody-antigen complex through Discovery Studio software to improve the binding affinity of antibody.Methodology
Firstly, we uploaded the result structure file of the molecular docking. Subsequently, the protein complex was subjected to the CHARMm force field, and amino acid residues within a 3 Å radius around the ligand were selected. The Single Mutations mode was chosen, and all the previously selected amino acids were mutated to ALA. We used the MODELER program to mutate to specified types and optimize the mutated residues, as well as any surrounding residues within the selected radius. We ranked the results of mutational energy to identify the key amino acid residues that affect antibody affinity. These key amino acids were then mutated to the remaining 19 standard amino acids. These mutations were ranked in order of mutation energy, from lowest to highest.Additionally, we calculated the mutation energy to evaluate the effect of mutations on protein stability. The difference in free energy of folding between the mutant structures and the wild-type proteins (ΔΔGmut) represented the energy effect of each mutation on protein stability.
ΔΔGmut = ΔΔGfolding(mutant) - ΔΔGfolding(wild type)
ΔΔGfolding = ΔGfld - ΔGunf
Results
By sorting the results of single mutations in mutation energy, only 5 mutations with the highest and lowest mutation energy were displayed. Mutations in the last five index in the table (LEU196, ARG177, ILE193, TYR190, ASN194) can lead to a decrease in affinity, so these five amino acids are key amino acids for the interaction between antibodies and antigens(Figure 9, Table 6). We marked the positions of these key amino acids in the protein structure (Figure 10). Amino acids in the antibody that resulted in decreased antibody affinity after mutation to ALA are highlighted in red.Figure 9. Mutation energy (Binding) analysis in single mutation of scFv
Notes: Amino acid residues within 3 Å around the CEACAM5 were selected. Stabilizing mutation energy is less than -0.5 kcal/mol; neutral mutation energy is between -0.5 to 0.5 kcal/mol; destabilizing mutation energy is greater than 0.5 kcal/mol.
Table 6. Calculation of mutation energy (Binding) of anti-CEA scFv
Notes: The table in the Summary section of the report lists up to ten mutations with the five lowest energy mutations and five highest energy mutations. The ranking of the mutations was sorted from lowest to highest energy mutations.
Figure 10. Schematic representation of amino acids critical for antibody affinity
Notes: Molecule labeled in yellow represent anti-CEA scFv; Molecule labeled in blue represent CEACAM5. Amino acids in the antibody that resulted in decreased antibody affinity after mutation to ALA are highlighted in red, whereas amino acids with increased antibody affinity after mutation are highlighted in magenta.
In order to obtain more detailed information on how mutations enhance antibody affinity, we performed saturation mutagenesis on the five amino acids mentioned above. From the results (Table 7, Figure 11), we found that the amino acid site ASN195 is crucial for the affinity of anti-CEA scFv. When ASN195 becomes TRP, GLN, PHE, GLU and HIS, the affinity of anti-CEA scFv can be significantly improved.By calculation of mutations on protein stability, we found that, except for ASN195 mutated to GLU and GLN, several other mutations with increased affinity did not affect antibody stability (Table 8).
Figure 11. Mutation energy (Binding) analysis in single mutation of key amino acids in scFv
Table 7. Calculation of mutation energy (Binding) of key amino acids in anti-CEA scFv
Notes: The table in the Summary section of the report lists up to ten mutations with the five lowest energy mutations and five highest energy mutations. The ranking of the mutations are sorted from lowest to highest energy mutations
Table 8. Calculation of mutation energy (Stability) of key amino acids in anti-CEA
scFv
Notes: The table in the Summary section of the report lists up to ten mutations with the five lowest energy mutations and five highest energy mutations. The ranking of the mutations are sorted from lowest to highest energy mutations.
Introduction
Initially, when engineered bacteria expressing scFv was introduced to tumor cells, the bacteria exhibited random distribution alongside the cells. Over time, however, the movement of engineered bacteria expressing scFv became irregular. The experimental findings indicated that the position of the tumor cells remained relatively stable. Additionally, after discussing with the team members of the wet experiment, it was concluded that considering the death and division of bacteria and cells in the model is unnecessary in short time. There was minimal change observed in the number of engineered bacteria expressing scFv and tumor cells within a short duration compared to the baseline.We have obtained the following data by consulting information and calculation: 24-well plate with one well bottom area of 2cm2, and 50,000 tumor cells were distributed more uniformly in the 24-well plate, and on average, a single tumor cell occupies an area of 4 × 108 μm2. Engineered bacteria expressing scFv exhibits irregular movement and has a certain probability of entering cells. We learned from literature review that the radius of tumor cells and normal cells is approximately 8 μm8.
To determine the true distance between engineered bacteria expressing scFv and tumor cells, we randomly generated the horizontal and vertical coordinates of engineered bacteria expressing scFv using a modeling approach. cells number and infected time are deduced from distance. We assumed that engineered bacteria expressing scFv moves irregularly and forms an area over time. Considering the differential invasion rate of engineered bacteria expressing scFv towards normal cells and tumor cells, we can calculate the required time for each engineered bacterium expressing scFv to enter tumor cells by estimating the spreading rate of these bacteria. Based on the model results, we generated a comparative graph depicting the quantity of engineered bacteria expressing scFv infecting normal cells and tumor cells during a certain period of infection time.
Model Establishment
Table 9 Parameters in Salmonella Infection Model
Determination of parameters
Based on the calculations performed by the wet lab experimental members, the initial number of normal and tumor cells seeded in each well of the 24-well plate was close to 50 000. Referring to the modeling of HZAU-China 2018, we learned that the affinity ratio of attenuated Salmonella to infect tumor cells and normal cells was about 1:10, so we used this ratio as the infection affinity of engineered bacteria without scFv expression.To model the killing effect of the engineered bacteria on tumor cells during co-culture, the number of tumor cells and normal cells dying after a period of infection were predicted. Among them, we assumed that 20 attenuated Salmonella typhimurium needed to enter the tumor cells before the cells would rupture and die, based on our observations and estimation in wet experiments. Moreover, if a cell undergoes breakdown and dies, the attenuated Salmonella present inside the cell will also perish. These assumptions derived from the experimental group suggest that when a certain quantity of tumor cells is invaded by attenuated Salmonella, cellular rupture occurs and we can estimate the number of dead cells during the co-culture period.
In wet lab, we performed several experiments with engineered bacteria expressing scFv and infecting tumor cells (see Experiments for the further). We observed the experimental results by fluorescence microscopy and attempted to calculate and count the number of cells and bacteria using ImageJ software. We calculated that the affinity of the scFv-expressing strain to infect tumor cells was about 5.5 times higher than that of the original strain without scFv expression. This data collected by the experimental group will be used in our modeling.
Results and Analysis
The figures (Figure 12.A) presented below show the comparison of our engineered Salmonella infestation in normal cells and tumor cells. From the results of our modeling, over an extended period, attenuated Salmonella with scFv demonstrated more than two dozen times greater infestation in BGC-823 compared to HEK293T (Figure 12.A). Within a five-hour timeframe, over 80% of BGC-823 cells were infected by attenuated Salmonella, which aligns with the data obtained from the experimental group. From our modeling results, it was observed that the number of dead cells didn't change much at the beginning for a while(Figure 12.B). At 50 minutes, there was a surge in the number of dead BGC-823 cells and not much change in the number of HEK293T cells. However, at 150 minutes, the number of dead HEK293T cells started to increase.Similarly, we constructed a difference map of attenuated Salmonella invading tumor cells and normal cells over time from the affinity enhancement data validated in experimental groups. Furthermore, the anti-CEA scFv expression was modeled to evaluate the anti-CEA scFv cytotoxicity against tumor cells. The results showed that attenuated Salmonella with scFv infected tumor cells more easily, resulting in a higher rate of tumor cell death (Figure 12.C & D).
Changes of tumor cells infected by attenuated Salmonella typhimurium over time were displayed in the following figures:
Figure 12.A Number of infected cells after VNP20009(with scFv)infection over time
Figure 12.B Number of death cells after VNP20009(with scFv)infection over time
Figure 12.C Number of infected tumor cells after VNP20009 infection over time
Figure 12.D Number of death tumor cells after VNP20009 infection over time
2 JUMPER J, EVANS R, PRITZEL A, et al. Highly accurate protein structure prediction with AlphaFold [J]. Nature, 2021, 596(7873): 583-9.
3 MIRDITA M, SCHüTZE K, MORIWAKI Y, et al. ColabFold: making protein folding accessible to all [J]. Nature methods, 2022, 19(6): 679-82.
4 TAHERI M, SARAGOVI U, FUKS A, et al. Self recognition in the Ig superfamily: identification of precise subdomains in carcinoembryonic antigen required for intercellular adhesion [J]. Journal of Biological Chemistry, 2000, 275(35): 26935-43.
5 GRETCH D, SUTER M, STINSKI M. The use of biotinylated monoclonal antibodies and streptavidin affinity chromatography to isolate herpesvirus hydrophobic proteins or glycoproteins [J]. Analytical biochemistry, 1987, 163(1): 270-7.
6 THIAGARAJAN G, SEMPLE A, JAMES J K, et al. A comparison of biophysical characterization techniques in predicting monoclonal antibody stability; proceedings of the MAbs, F, 2016 [C]. Taylor & Francis.
7 SAFDARI Y, FARAJNIA S, ASGHARZADEH M, et al. Antibody humanization methods–a review and update [J]. Biotechnology and Genetic Engineering Reviews, 2013, 29(2): 175-86.
8 Takeishi, N., Imai, Y., Yamaguchi, T. & Ishikawa, T. Flow of a circulating tumor cell and red blood cells in microvessels. Physical Review E 92, 063011 (2015).
1. Consumption of glucose mediated by glucose oxidase
Introduction
Glucose plays a crucial role as an important nutrient in tumor growth. The proliferation of tumor cells mainly relies on aerobic glycolysis, making them more sensitive to changes in glucose concentration. The catalytic glycolysis process of glucose oxidase(GOx)can regulate the tumor microenvironment (TME).
On the one hand, GOx consumes oxygen, further increasing the hypoxia level, which is beneficial for better targeted and selective colonization of engineered bacteria. On the other hand, the generated H2O2 not only significantly enhances tumor oxidative stress but also can be converted into hydroxyl radicals to target cancer cells. The glucose and oxygen reaction, resulting in the formation of gluconic acid and hydrogen peroxide, is catalyzed by glucose oxidase (GOx).
Initially, GOx (FAD) binds with glucose and oxidizes it into gluconic acid. This oxidation process reduces GOx (FAD) to GOx (FADH2). Subsequently, oxygen oxidizes GOx (FADH2) back to GOx (FAD), generating hydrogen peroxide as a byproduct.
According to the data collected from thesises, the glucose concentration in tumor cells is typically 5 mmol/L. The oxygen content is typically 0.8 mmol/L. In reality, the glucose diffuses randomly, so that the glucose concentration can be approximately maintained at 5 mmol/L over a longer period of time. From the image, it is evident that approximately 1 mmol/L of hydrogen peroxide is generated within a time frame of 0.01 seconds. This observation suggests that the tumor cells have the ability to produce an ample quantity of hydrogen peroxide.
We modeled the reaction with the following set of ordinary differential equations.
Chemical kinetic equation
Parameters
Diagram
2. Fenton reation and lipid peroxidation
Introduction
In order to test whether the generated hydrogen peroxide succeeded in mediating efficient generation of polyunsaturated fatty acids (PUFAs), we constructed the following model. In the Fenton reaction, hydrogen peroxide produces hydroxyl radicals in the presence of ferrous ions. Hydroxyl radicals react with polyunsaturated fatty acids to produce polyunsaturated fatty acid free radicals, which destroy the structure of cell membrane and cause ferroptosis of cancer cells.
We built the following model based on the literature and the experimental results from wet lab. It can be found that H2O2 is significantly reduced, and close to 0.2 mmol/L of polyunsaturated fatty acid is generated in a short period of time. In the tumor cells, the reaction products will diffuse, Fe2+, and H2O2 is continuously generated. So the reaction will continue to generate a sufficient amount of polyunsaturated fatty acid. In the following formula, PUFAs are denoted by LH. And peroxidized PUFAs are denoted by L· in the following equation.
Chemical kinetic equation
Parameters
Diagram
According to the modeling results, intracellular glucose can produce polyunsaturated fatty acid free radicals effectively, resulting in lipid peroxidation in tumor cells.
2 Chang, M.-W., Chen, T.-S. & Chern, J.-M. Initial Degradation Rate of p-Nitrophenol in Aqueous Solution by Fenton Reaction. Industrial & Engineering Chemistry Research 47, 8533-8541 (2008).
3 Yin, H., Xu, L. & Porter, N.A. Free radical lipid peroxidation: mechanisms and analysis. Chem Rev 111, 5944-72 (2011).
4 Gou, Y. et al. Degradation of fluoroquinolones in homogeneous and heterogeneous photo-Fenton processes: A review. Chemosphere 270, 129481 (2021).
5 Li, Z. et al. Green rust (GR) and glucose oxidase (GOX) based Fenton-like reaction: Capacity of sustainable release, promoted conversion of glucose through GOX-iron and pH self-adjustment. Environmental Research 208, 112656 (2022).
6 Kremer, M.L. Mechanism of the Fenton reaction. Evidence for a new intermediate, (Physical Chemistry Chemical Physics, 1999).
7 Variations in motility and biofilm formation of Salmonella enterica serovar Typhi. (2014).
Introduction
In our project, our objective was to guarantee the efficacy and safety of the engineered bacteria upon introduction into the human body. We aimed to harness the therapeutic potential of these modified bacteria while minimizing any harm to normal cells. To achieve this goal, a gene circuit (a toxin-antitoxin system) was designed to achieve two functions in the Safety Module of our project. (see “Design” - “TA system” for the further)Figure 1. Schematic representation of the "logical suicide circuit"
The first function was to prevent plasmid loss. The engineered bacteria without the plasmid lost their therapeutic function, and their reduced targeting posed a safety threat to human body. Preventing plasmid loss was also an important measure to prevent horizontal gene transfer. Consequently, we anticipate that the engineered bacteria that have lost the plasmid would be capable of being induced suicide via toxin-antitoxin system.
Figure 2. Schematic representation of pSilence lost
Figure 3. Schematic representation of pFenton lost
The second function was to artificially initiate suicide of the engineered bacteria. It was anticipated that infection with the engineered bacteria would result in shrinkage and regression of the tumor tissue during later stages of treatment, while also increasing the possibility of the engineered bacteria invading normal tissues. Therefore, we aim to administer doxycycline orally to prompt the engineered bacteria to respond accordingly when abnormalities occured in the human body.
Figure 4. Dox induce the suicide of engineered bacteria
The questions then arose. Can these two functions be achieved? How do we select promoters with the right activity to enable the system to perform these two functions? What is the minimum doxycycline concentration required for bacterial suicide?
To verify the feasibility of the project design and optimize the experimental conditions before experiments, we need to study the relationship between the concentrations of the substances in this system.
Therefore, we developed Gene Circuit Model. Through the establishment and solution of ordinary differential equations and adjustment of parameters, we have obtained suggested answers to the questions above.
Model Assumptions
1. Post-transcriptional modifications do not affect the amount of mRNA and its translation rate.2. The effect of incorrect protein folding on protein concentration can be ignored.
3. The rate of transcription by constitutive promoter is constant.
4. Transcription factors are always in a state of activation as long as they are not bound.
5. The direct effect of doxycycline itself on protein translation can be ignored.
6. The threshold concentration of the toxin responsible for the suicide of the engineered bacteria is 0.1 nM.
Basic Mathematics Formulas
1. Basic protein expression modelBasic protein expression model is based on the central dogma to study the changes in protein concentration over time in the cell, which can be used directly to describe constitutive expression. It is the basis for modeling gene circuits.
In the formulas, m represents the concentration of mRNA, and p represents the concentration of protein. ap is used to characterize the promoter activity. rm is the degradation rate of mRNA. bp is used to characterize the binding efficiency of ribosomes to mRNA. rp is the degradation rate of protein.
2. Hill function of repressor
The Hill function can be utilized to depict the production rate of the expressed product Y, which undergoes regulation by transcription factor X. Transcription factor X possesses the capability to function as an activator or repressor. The Hill function of repressor is presented above.
In the formula, X represents the concentration of repressor, whereas Y represents the concentration of the protein subject to regulation by X. The maximum rate of generation β, denotes the rate at which the repressor fails to bind to the promoter entirely, i.e., the value of Y when X = 0. K symbolizes the repression coefficient of the gene, with the half-maximum repression effect being attained when X = K. Lastly, n signifies the Hill coefficient, determining the steepness of the input function.
Results
1. Expression of tetRIn our circuit design, constitutive promoter prpsM was used to regulate the expression of tetR protein. The process of this constitutive expression can be described as follows:
Using the basic protein expression model, we have formulated a set of ordinary differential equations (ODEs) corresponding to the reactions above:
Since the rate of transcription is greater than the rate of translation. We assumed that mRNA concentration is in equilibrium:
Consequently, the ODE was simplified as follow:
Table 1. Parameters in the expression of tetR
Figure 5. The expression of tetR protein
The concentration at equilibrium of tetR was approximately 18 nM, which would be used in the next process - expression of LacI.
2. Expression of LacI
The tetR protein produced in the previous process can repress the promoter pLtetO, which was used to regulate the expression of LacI protein. The reactions were described as follows:
The ODEs corresponding to the reactions above were as follows:
However, considering that it was an expression process regulated by a transcription factor, the Hill function could be applied and would be a better choice. Therefore, referencing the Hill function of repressor, we directly modeled the expression of LacI in a better way with one ODE:
We substituted the equilibrium value of tetR obtained in the previous step into the ODE as the concentration of tetR.
Table 2. Parameters in the expression of LacI
Figure 6. The expression of LacI protein
The concentration at equilibrium of LacI was approximately 0.00289 nM. We would use this value for next step - generation of antitoxin Sok.
3. Antagonism between antitoxin Sok and toxin Hok
The LacI protein produced in the previous process can inhibit the activity of the pLacO promoter, which was used to regulate the expression of Sok RNA. Meanwhile, the toxin Hok protein was expressed under regulation of constitutive promoter ptrp. Additionally, Sok RNA can indirectly repress Hok protein expression by binding to the mRNA of Mok, thus promoting the expression of Hok (In modeling, we ideally equated this to binding to the mRNA of Hok). The reactions were described as follows:
As in the previous steps, we opted to utilize the Hill function as a more suitable representation for the equation pertaining to the concentration of Sok. Therefore, the reactions described above were simulated using the subsequent set of ODEs:
Table 3. Parameters in the antagonism between antitoxin and toxin
Figure 7. The generation of Sok
Figure 8. The generation of Hok mRNA
As the figures showed, the Sok level was eventually equilibrated at approximately 3 nM and the Hok mRNA level was eventually equilibrated at approximately 1 nM when they were not bound.
Figure 9. The concentration of Sok
Figure 10. The concentration of Hok mRNA
Figure 11. The combination of Sok and Hok mRNA
When the combination of Sok and Hok mRNA was took account into the model, Sok level was eventually equilibrated at 2.97 nM and Hok mRNA level was eventually equilibrated at approximately 0.00021 nM. Additionally, the concentration of the complex Sok-Hok mRNA was eventually equilibrated at approximately 0.25 nM.
Figure 12. The expression of Hok
The figure presented above illustrated the expression of Hok in the presence of both plasmids, without the addition of doxycycline. The concentration of Hok was ultimately stabilized at 0.0252 nM, which we deemed inadequate for killing the engineered bacteria. Therefore, under normal circumstances, the toxin-antitoxin system will not affect the survival of the engineered bacteria.
4. Introduction of doxycycline
Through the three processes above, the gene circuit has been basically established. For the function of bacterial suicide, doxycycline was introduced to the system, which mainly influenced the concentration of tetR protein.
Therefore, we took the introduction of doxycycline into account in the models, and adapted the model of tetR expression, the first section described above.
Here were the supplementary equations:
Correspondingly, the adjusted versions of ODEs were as follows:
Table 4. Parameters in the expression of tetR after introducing doxcycline
Figure 13. Binding of doxycycline to tetR
Figure 14. The concentration of doxycycline
The results revealed that doxycycline could bind tetR before it was completely consumed, resulting in a significant reduction in the concentration of functional tetR, nearly reaching zero.
Figure 15. The expression of LacI (after the introduction of doxycycline)
Figure 16. The concentration of Sok (after the introduction of doxycycline)
Figure 17. The concentration of Hok (after the introduction of doxycycline)
The figure above displayed the expression of Hok in the presence of both plasmids and with the addition of doxycycline. In the presence of doxycycline, the concentration of Hok was finally equilibrated at 120 nM, which was high enough to kill the engineered bacteria. (In practice the engineered bacteria would be killed at a certain value before the Hok grew to 120 nM.) Consequently, the introduction of doxycycline can induce the engineered bacteria to commit suicide.
5. Simulation of plasmid loss
When losing pSilence plasmid, we set parameter atetR (Transcription rate constant of tetR mRNA) as zero to simulate this process.
Figure 18. The concentration of tetR (after losing the plasmid pSilence)
Figure 19. The concentration of LacI (after losing the plasmid pSilence)
Figure 20. The concentration of Hok (after losing the plasmid pSilence)
In the simulation of losing plasmid pSilence, the results were similar to the case of adding doxycycline. Due to the lack of tetR mRNA transcription, the concentration of Hok was finally equilibrated at 120 nM. (In practice the engineered bacteria would be killed at a certain value before the Hok grew to 120 nM.) Consequently, in the case of losing pSilence, the engineered bacteria can also commit suicide.
In the case of losing pFenton plasmid, we set parameter βLacI (Maximum generation rate constant of LacI), βSok (Maximum generation rate constant of Sok) and aHok (transcription rate constant of Hok) as 0 to simulate this plasmid loss.
Figure 21. The concentration of LacI (after losing the plasmid pFenton)
Figure 22. The concentration of Sok (after losing the plasmid pFenton)
Figure 23. The concentration of Hok (after losing the plasmid pFenton)
In the simulation of losing plasmid pFenton, the results showed as figures above. Because Hok mRNA was degraded much more slowly than Sok, residual Hok mRNA would translate into Hok. This resulted in a transient increase in the Hok concentration, up to approximately 0.25 nM, which we considered to be enough to kill the engineered bacteria. Consequently, in the case of losing pFenton, the engineered bacteria can also commit suicide.
Analysis
1. Promoter activityIn order to obtain the ideal results, we performed several adjustments of parameters. In this process, we have found some practical advice for the selection of promoters.
At first, the promoter we intended to use for Hok expression was highly active. Thus, in the modeling we assumed a higher value as the value of parameter aHok (transcription rate constant of Hok mRNA). However, the results did not correspond to those expected by our design - under normal conditions, the antitoxin Sok could not effectively inhibit the toxin expression, which meant that the engineered bacteria would be killed by toxins under any circumstances.
After analyzing our results, we have reached the conclusion that the degradation of Sok was rapid and its expression was non-constitutive due to repression by the repressor LacI. In the contrast, the degradation rate of Hok mRNA was significantly lower, and a constitutive promoter was employed. Therefore, it is recommended to use a lower-activity promoter for Hok and a considerably more active promoter for Sok.
2. Doxycycline concentration
According to our modeling results, the strong binding between doxycycline and tetR protein allows for the reduction of tetR concentration to 0, regardless of the amount of doxycycline added. This occured before the depletion of doxycycline concentration to 0 (in the presence of doxycycline). Consequently, if the concentration of doxycycline was sufficiently high to prolong the period of tetR concentration reaching 0, providing enough time for Hok expression to eliminate the engineered bacteria, the concentration of doxycycline was deemed effective.
Eventually, through modeling we drew a conclusion that the minimum effective concentration of doxycycline was approximately 100 nM (0.04 μg/mL). However, because the model had multiple idealized assumptions, it cannot provide accurate quantitative answer, but only provided an approximate range as a reference for follow-up experiments.
3. pFenton Plasmid Loss
In the model assumption, we set a threshold concentration of the toxin responsible for the suicide of the engineered bacteria, which was 0.1 nM. And then the modeling results showed an approximate maximum Hok concentration of 0.25 nM in the case of losing pFenton, which exceeded the threshold we set to a relatively smaller extent. This indicated that the killing efficiency of the system against the engineered bacteria free of pFenton may not reach 100%, which may influence the function of system in maintaining plasmids.
Therefore, the next model, Plasmid Loss Model, tested the effect of maintaining plasmids with a high but less than 100% of killing probability as a complement of this model to perfect the validation of the maintenance plasmid function.
Introduction
The Gene Circuit Model verified that, under ideal condition, the gene circuit can effectively induce the engineered bacteria that have lost the plasmid to undergo suicide.However, there are certain exceptional circumstances that may hinder the system from achieving a 100% killing rate against engineered bacteria that have lost the plasmid. For example, some bacteria may be resistant to toxins because of genetic mutations. Therefore, we can only assume that in practice, this system had a high probability of killing the bacteria that have lost the plasmid.
In a situation with such a high killing rate, what is the practical effect of this system in maintaining plasmids?
In the wet laboratory, it was difficult for us to verify the Safety Module by transferring the plasmid into the desired recipient bacteria. Because T7 RNA polymerase was missing from the genetic background of chassis VNP20009 and we failed to complement it in the plasmid, the engineered strain was unable to initiate T7 expression after our transfer. We transferred the plasmid into BL21 (DE3) competent cells. However, because the bacterial chromosome carried the DE3λ prophage and the fragment carried the genes LacI and sam7I, the background expression of LacI could interfere with our system. In conclusion, the toxin-antitoxin systems are difficult to validate effectively in wet lab experiments.
Therefore, our aim was to utilize modeling to accomplish what experiments could not achieve.
We employed Plasmid Loss Model to simulate and present the practical impact, ultimately confirming the functionality of our Safety system in maintaining plasmid from a macroscopic and realistic standpoint. Our objective involved establishing a correlation between generations and the proportion of cells that retain the plasmid. We used mathematics recursion formulas and computer simulations to demonstrate the practical effect of this system on plasmid maintenance.
Model Assumptions
1. Only the case of plasmid loss due to unequal division of bacteria was considered, since plasmid loss due to other reasons have a low probability.2. Only the loss of plasmid pFenton was considered because our wet-laboratory data indicated a significantly lower loss rate of plasmid pSilence, possibly due to the association of the plasmid skeleton with the replicon.
Notations
The primary notations used in this model are listed in Table 5.Table 5. Notations in the Plasmid Loss Model
Model Establishment
1. In the absence of toxin-antitoxin systemWe first study the case that there is no toxin-antitoxin system.
The subsequent generation of bacteria lacking the pFenton plasmid were still considered pFenton-free. Meanwhile, there was a certain probability (p) of plasmid loss during the division of bacteria with the pFenton. Subsequently, the population of bacteria free of pFenton plasmid and those containing it were quantitatively assessed as follows:
Figure 24. Relation of recurrence
Additionally, plasmid-free bacteria have a smaller burden. This needed to be factored into the model. Consequently, we made the assumption that when plasmid-free bacteria are subjected to one passaged, only 1/m of plasmid-containing bacteria go through the same process. We then adjusted the initial formulas accordingly:
Figure 25. Relation of recurrence considering growth difference
The recursion formula of the proportion of plasmid-containing bacteria was derived as follows:
2. In the presence of toxin-antitoxin system
On the basis of the previous case, when considering the toxin-antitoxin system, the bacteria losing the plasmid had a certain probability (k) of being induced to commit suicide. In this case, the number of the plasmid-free bacteria and the plasmid-containing bacteria were characterized as follows:
Figure 26. Relation of recurrence in the presence of toxin-antitoxin system
The recursion formula of the proportion of plasmid-containing bacteria was derived as follows:
To sum up, the recursion formulas of two cases were as follows:
In the absence of toxin-antitoxin system:
In the presence of toxin-antitoxin system:
Optimizations of parameters
Plasmid Loss Model featured a parameter, denoted as m, which quantified the disparity in growth rates between plasmid-free and plasmid-containing bacteria. This parameter can be determined and derived from bacterial growth data obtained from the wet lab. Additionally, the model employed the number of generations as the unit of time, while the experimental data utilized hours (or minutes) as the unit of time. To ensure effective alignment between the model and experimental data, it was necessary to have bacterial growth data for accurate time unit conversion. Therefore, wet lab provided our model with bacterial growth data to finish the above work.Besides, the data of plasmid loss in the absence of toxin-antitoxin system was also provided by wet lab to optimize parameter p in Plasmid Loss Model.
1. Logistic Equation
Logistic equation is a well-known model for population growth. Moreover, it is a significant model in ecology as it describes continuous population growth.
Within our project, we conducted experiments in wet lab to gather data on bacterial growth with and without plasmids. By analyzing the data distribution and evaluating the suitability of the Logistic model, we fit the experimental data with Logistic expressions to obtain complete growth curves.
The initial logistic equation was as follows:
There are two important parameters in this expression that have specific biological implications: (1) Intrinsic rate of increase r represents the growth ability of the population in an ideal state. (2) Carrying capacity K represents the equilibrium density of a species in a specific environment.
Solving this differential equation, we got the following expression:
We used nonlinear least squares method to fit the data into logistic expression. The visualizations of results were as follows:
Figure 27. BL21 (DE3) Specific Growth Rates (without plasmid)
Figure 28. BL21 (DE3) Specific Growth Rates (with plasmid)
2. Optimization of parameter m
As described above, parameter r in logistic expression represents intrinsic rate of increase, the growth ability of the population in an ideal state. Considering the biological implications of parameter r, we used the ratio of r for plasmid-free bacteria to r for plasmid-containing bacteria as the parameter m in Plasmid Loss Model.
According to our fitting results, the Specific Growth Rate parameter r in BL21 (DE3) (without plasmid) was 1.456, whereas it was 1.42 in BL21 (DE3) (with plasmid). Then parameter m can be calculated as follow:
Table 6. Symbols in the optimization of parameter m
3. Time unit conversion
On the complete growth curve of plasmid-free BL21 (DE3), we took the time difference between OD600 values of 0.2 and 0.4 as the minimum doubling time td. In addition, based on the observations of OD600 values, the exponential growth time was estimated to be 8 h during each 12-h growth period. Therefore, the unit conversion between the number of generations and the time unit of minutes was implemented5.
We calculated the data and found the following time points: (1) When OD600 was 0.2, the time was about 1.120 h. (2) When OD600 was 0.4, the time was about 1.734 h. Therefore, the following was the calculation of the time unit conversion in our model:
Figure 29. The calculation of td
Table 7. Symbols in the time unit conversion
4. Optimization of parameter p
Wet lab experiments also provided our model with plasmid loss data in the absence of toxin-antitoxin system to optimize parameter p. According to the experimental data, the loss of plasmid pFenton was 5.1% at 48 h and 15.4% at 72 h. In the model, that meant curve X(t) - t should cross two points (48, 94.9) and (72, 84.6) in the ideal state.
We used the method of minimizing residual sum of squares (RSS) to find the best fit of this curve to these two points.
The result showed that RSS was the smallest when p = 0.0022 (RSS = 0.00094).
Figure 30. The calculation of p
Table 8. Symbols in the optimization of parameter p
Results and Analysis
We used computer simulation to obtain the following results:Figure 31. Plasmid Maintenance Rates
Table 9. Parameters in the Plasmid Loss Model
As we can see from the figure, after introduction of the toxin-antitoxin system, our plasmid loss rate was greatly reduced. The loss of plasmid pFenton was 0.17% at 48 h and 0.32% at 72 h. Thus, our toxin-antitoxin system had effective function of maintaining plasmid.
We also tried to adjust for multiple different values of k, and observed the sensitivity of the model to parameter k. 0.92, 0.95, 0.98 were chosen for the adjustment.
Figure 32. Plasmid Maintenance Rates under Multiple k Values
Figure 33. Plasmid Maintenance Rates under Multiple k Values (0-100 h amplification)
The figure clearly indicated a significant reduction in plasmid loss rate following the introduction of the toxin-antitoxin system under multiple k values ranging from 0.92 to 0.98. Especially in the range of 0~100 h, there was no significant difference in the strong effect of the maintaining the plasmid with all values we tried. This demonstrated the remarkable effectiveness of the maintenance plasmid function in our toxin-antitoxin system within a certain margin of error. The inclusion of the Safety Module in our project design was justified.
2 Kawano, M. Divergently overlapping cis-encoded antisense RNA regulating toxin-antitoxin systems from E. coli: hok/sok, ldr/rdl, symE/symR. RNA Biol 9, 1520-1527, doi:10.4161/rna.22757 (2012).
3 Baumeister, R., Flache, P., Melefors, Ö., Gabain, A. v. & Hillen, W. Lack of a 5'non-coding region in Tn 1721 encoded tetR mRNA is associated with a low efficiency of translation and a short half-life in Escherichia coil. iNucleic acids research 19, 4595-4600 (1991).
4 Basu, S., Gerchman, Y., Collins, C. H., Arnold, F. H. & Weiss, R. A synthetic multicellular system for programmed pattern formation. Nature 434, 1130-1134 (2005).
5 Wu, K. & Wood, T. K. Evaluation of the hok/sok killer locus for enhanced plasmid stability. Biotechnology and bioengineering 44, 912-921 (1994).
6 Danino, T. et al. Programmable probiotics for detection of cancer in urine. Science translational medicine 7, 289ra284-289ra284 (2015).
In our relentless pursuit of innovation, BNUZH-China has ingeniously fused modeling with the HP activity, resulting in a captivating and one-of-a-kind event known as the "2023 BNUZH Synthetic Biology Programming Competition" (For more details: Education). BNUZH-China has harnessed their creative prowess to seamlessly meld principles of synthetic biology with programming challenges that are as engaging as they are educational. These challenges delve into the fundamental problems faced during the modeling process.
Through this audacious endeavor, we aim to cultivate a deeper understanding and appreciation for the field of synthetic biology. By introducing more individuals to the intriguing world of iGEM's modeling work, we strive to democratize the knowledge and joy associated with synthetic biology modeling. Our hope is that this bold initiative will empower a larger community to embrace the marvels of iGEM and foster a greater appreciation for the endless possibilities of synthetic biology.