Modeling

Binding Interaction

Introduction
When the engineered bacteria are released into the gut, they secrete the adhesion protein cp19k to help them colonize it. The best targets for intestinal colonization are usually the villi of the small intestine. In celiac disease, however, the villi of a patient's small intestine are severely damaged, so the dry lab group turned its attention to the mucosal layer of the small intestine. This layer is composed primarily of loosely structured mucin that resident microbes can penetrate, which gives cp19k an opportunity to bind. [1] Intestinal epithelial cells also contain an anchoring protein, type VII collagen, which primarily stabilizes the structure of the extracellular matrix (ECM) of the stroma and the basement membrane. [2] Damaged intestinal villi expose type VII collagen, providing binding sites for the engineered bacteria.
Both proteins are suitable candidates for wet-lab assays, but given the challenges associated with mucin secretion and safety, the wet-lab assays focused primarily on collagen VII adhesion experiments. The dry lab group used computational modeling to simulate the binding interaction between cp19k and the two receptor proteins. After comparing different datasets and models, we decided to conduct a series of combined model studies using Discovery Studio [6] and PPA-Pred (Protein-Protein Affinity Predictor) [3].
Figure 5.1.1 3D structure of cp19k
Figure 5.1.2 3D structure of Collagen VII
Figure 5.1.3 3D structure of Mucin
Prediction of Affinity
Yugandhar and Gromiha (2014) introduced a method to predict the binding affinity between two proteins from their amino acid sequences. They made the PPA-Pred web server publicly available, allowing users to predict the affinity between two proteins by uploading protein sequences less than 50 amino acids in length. The server reports ΔG (the molar Gibbs free energy of binding) and the equilibrium dissociation constant (Kd) to indicate the strength of the affinity. [4] These two quantities are related by the following equation:
$$\Delta G = R T \ln K_d \qquad (1)$$
where R is the gas constant (1.987 × 10^−3 kcal mol^−1 K^−1) and T is the absolute temperature (K).
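As a minimal sketch of how ΔG and Kd can be converted into each other, assuming the standard relation ΔG = RT ln Kd and a temperature of 298.15 K (the temperature is an assumption; the text does not state which value PPA-Pred uses):

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1, as quoted in the text


def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant (mol/L).

    temperature_k defaults to 298.15 K, an assumed value.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def kd_from_delta_g(delta_g_kcal, temperature_k=298.15):
    """Dissociation constant (mol/L) from a binding free energy (kcal/mol)."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # Illustrative Kd values only, not PPA-Pred output.
    for kd in (1e-6, 1e-9):
        print(f"Kd = {kd:.0e} M  ->  dG = {delta_g_from_kd(kd):.2f} kcal/mol")
```

A lower Kd gives a more negative ΔG, which is why either value can be used to compare the predicted affinity of two protein pairs.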
Result analysis:
Kd can also be expressed in terms of the binding reaction between the two proteins:
$$K_d = \frac{[L][R]}{[LR]} \qquad (2)$$
In the formula above, [L] represents the concentration of the free ligand, [R] the concentration of the unbound receptor, and [LR] the concentration of the bound ligand-receptor complex.
This equation shows that a lower Kd corresponds to a larger [LR] relative to the free species, indicating a tighter interaction [4]. Based on the values reported by PPA-Pred, we can therefore assume that mucin has a higher affinity for cp19k than collagen VII.
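The link between Kd and the amount of complex can be illustrated with a short sketch. It assumes a simple 1:1 binding reaction L + R ⇌ LR with known total concentrations, and solves the mass-balance quadratic that follows from the definition of Kd. The concentrations are illustrative and are not taken from the project data.

```python
import math


def complex_concentration(total_ligand, total_receptor, kd):
    """Equilibrium [LR] for L + R <-> LR (all values in the same molar unit).

    From Kd = (Lt - x)(Rt - x) / x, the complex concentration x is the
    smaller root of x^2 - (Lt + Rt + Kd) x + Lt * Rt = 0.
    """
    if total_ligand < 0 or total_receptor < 0 or kd <= 0:
        raise ValueError("concentrations must be non-negative and Kd positive")
    b = total_ligand + total_receptor + kd
    return (b - math.sqrt(b * b - 4.0 * total_ligand * total_receptor)) / 2.0


if __name__ == "__main__":
    # Hypothetical totals of 1 uM ligand and 1 uM receptor.
    for kd in (1e-6, 1e-9):
        bound = complex_concentration(1e-6, 1e-6, kd)
        print(f"Kd = {kd:.0e} M  ->  [LR] = {bound:.3e} M")
```

With the same total amounts of both proteins, the lower Kd leaves more of the material in the bound complex.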
However, beyond this equation, the algorithm also takes into account other factors, such as binding site residues, the number of hydrogen bonds, electrostatic energy, and bond angles. These factors can be interrelated and may introduce a degree of error. The PPA-Pred output can therefore serve only as a rough estimate to guide the Discovery Studio hypothesis; it does not accurately describe the affinity between the two proteins.
Discovery Studio
Discovery Studio is a suite of tools for simulating small and large molecules. Its functions include, but are not limited to, ligand design, macromolecule engineering (protein-protein docking, antibody design and optimization), and quantitative structure-activity relationship (QSAR) analysis. The dry lab group primarily used the protein-protein docking functionality of this suite, such as ZDOCK, to study the adhesive properties in the project and to provide the wet lab group with more reliable data for estimating the adhesive performance of cp19k.
ZDOCK is a protein-protein interaction docking program based on the fast Fourier transform. It explores all possible spatial binding modes by translating and rotating two proteins, and evaluates each binding model based on an energy-based scoring function.
The main parameters involved are:
PoseNum: Number of different protein poses generated during binding site prediction. (Typically, a larger PoseNum indicates high-probability binding sites and poses.)
ZDOCK score: The score produced by the ZDOCK algorithm used for protein-protein docking. (A higher ZDOCK score usually indicates a more favorable binding site and pose.)
RMSD: Root-mean-square deviation. Measures the structural difference between the predicted protein binding site and the experimentally determined site. (A smaller RMSD indicates a higher structural similarity between the predicted and experimental sites.)
Cluster: Similar predicted sites grouped together. It is used to identify sites that share similarities.
ZRANK: Used to rank the quality of different binding sites. (Lower ZRank scores usually indicate more promising binding sites.)
E_RDock: Estimates the binding energy or interaction energy between the protein and the ligand in a given docking pose. The value is used to assess the quality of a docking prediction. (A lower E_RDock indicates a more favorable or stable binding pose.)
ZDOCK Analysis Workflow:
Rigid sampling around the proposed binding region:
ZDOCK was used to generate 2000 conformations around the putative binding region, excluding residues that do not belong to the region. The default ZDOCK parameters and a 10° rotational sampling interval were used. In general, reducing the rotation angle can improve prediction accuracy but significantly increases the running time of the algorithm.
Scoring based on DFIRE energy function:
An all-atom statistical energy function based on DFIRE is used to estimate the binding affinity of the ZDOCK-generated conformations. At least one pair of heavy atoms should be within 4.5 Å of each other.
Clustering of the scored conformations:
The conformations are ordered by DFIRE energy. The first cluster contains the structures that lie within a root-mean-square deviation (RMSD) cutoff of the lowest-energy structure. The RMSD is based on Cα atoms and is calculated using only the residues that belong to the putative binding region. For targets with unknown binding regions, the total RMSD was calculated. The next cluster is built around the lowest-energy structure among the remaining conformations. This process is repeated until the first 15 clusters are obtained. For each cluster, the conformation with the lowest DFIRE energy was selected as the representative structure for further analysis.
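The clustering step can be sketched as a greedy loop. This is an illustration under stated assumptions, not the Discovery Studio implementation: the RMSD cutoff is a hypothetical parameter (the text does not give its value), and the toy example replaces real Cα RMSD with the distance between one-dimensional positions.

```python
def cluster_by_energy(energies, rmsd, cutoff, max_clusters=15):
    """Greedy clustering seeded by the lowest-energy unassigned pose.

    energies: dict mapping pose id -> energy (lower is better)
    rmsd: function (pose_a, pose_b) -> RMSD between two poses
    cutoff: cluster radius (hypothetical value, not given in the text)
    """
    remaining = sorted(energies, key=energies.get)  # ascending energy
    clusters = []
    while remaining and len(clusters) < max_clusters:
        seed = remaining[0]
        # The seed always belongs to its own cluster, so the loop terminates.
        members = [seed] + [p for p in remaining[1:] if rmsd(seed, p) <= cutoff]
        clusters.append({"representative": seed, "members": members})
        assigned = set(members)
        remaining = [p for p in remaining if p not in assigned]
    return clusters


if __name__ == "__main__":
    # Toy data: four poses with made-up energies and 1D "positions".
    toy_energy = {"a": -10.0, "b": -9.0, "c": -8.0, "d": -7.5}
    toy_position = {"a": 0.0, "b": 1.0, "c": 10.0, "d": 11.0}

    def toy_rmsd(p, q):
        return abs(toy_position[p] - toy_position[q])

    for cluster in cluster_by_energy(toy_energy, toy_rmsd, cutoff=2.0):
        print(cluster["representative"], cluster["members"])
```

Because the poses are processed in order of increasing energy, the seed of each cluster is also its lowest-energy member, which matches the choice of representative structure described above.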
Combined with the parameter definitions above, this workflow lets us roughly interpret the values in the table below.
Analysis - 1:
Based on the definitions of these parameters, the highlighted Pose1789 was the best binding site among all predictions. At this site, the two proteins showed higher affinity and binding strength than at the other sites tested. In practice, most users of ZDOCK output the first ten rows of the results to judge whether the choice of the optimal site is reasonable. Most of the parameter values for Pose1789 in Table 1 meet the conditions for it to be the optimal pose.
ZRANK, in turn, uses its own scoring function for its calculations. This scoring function is a linearly weighted sum of energy terms (van der Waals forces, short-range electrostatic repulsion, long-range electrostatic repulsion, etc.). The following equation describes ZRank:
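$$\mathrm{Score} = w_{\mathrm{vdW\_a}} E_{\mathrm{vdW\_a}} + w_{\mathrm{vdW\_r}} E_{\mathrm{vdW\_r}} + w_{\mathrm{elec\_sra}} E_{\mathrm{elec\_sra}} + w_{\mathrm{elec\_srr}} E_{\mathrm{elec\_srr}} + w_{\mathrm{elec\_lra}} E_{\mathrm{elec\_lra}} + w_{\mathrm{elec\_lrr}} E_{\mathrm{elec\_lrr}} + w_{\mathrm{ds}} E_{\mathrm{ds}} \qquad (3)$$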
The ZRank score is calculated using a scoring function that incorporates various energy terms, including the following.
- EvdW_a: Van der Waals attractive energy.
- EvdW_r: Van der Waals repulsive energy.
- Eelec_sra: Short-range electrostatic attractive energy.
- Eelec_srr: Short-range electrostatic repulsive energy.
- Eelec_lra: Long-range electrostatic attractive energy.
- Eelec_lrr: Long-range electrostatic repulsive energy.
- Eds: Desolvation energy.
These energy terms are used to evaluate the interaction between molecules and determine the overall ZRank score, which serves as an important metric in assessing the binding affinity and quality of protein-protein interactions.
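The linear weighting of these energy terms can be sketched as follows. The term names mirror the list above, but the weights and energies in the example are placeholders chosen for illustration; they are not the published ZRANK weights and not values from our docking runs.

```python
ZRANK_TERMS = (
    "vdw_a", "vdw_r",          # van der Waals attractive / repulsive
    "elec_sra", "elec_srr",    # short-range electrostatics
    "elec_lra", "elec_lrr",    # long-range electrostatics
    "ds",                      # desolvation
)


def weighted_score(energies, weights):
    """Return the sum of weight * energy over the seven energy terms."""
    missing = [t for t in ZRANK_TERMS if t not in energies or t not in weights]
    if missing:
        raise KeyError(f"missing terms: {missing}")
    return sum(weights[t] * energies[t] for t in ZRANK_TERMS)


if __name__ == "__main__":
    # Placeholder inputs: unit weights and made-up energies for one pose.
    unit_weights = {t: 1.0 for t in ZRANK_TERMS}
    toy_energies = {
        "vdw_a": -10.0, "vdw_r": 4.0,
        "elec_sra": -3.0, "elec_srr": 1.0,
        "elec_lra": -2.0, "elec_lrr": 0.5,
        "ds": -1.5,
    }
    print(weighted_score(toy_energies, unit_weights))
```

Attractive terms lower the score and repulsive terms raise it, so a pose with a more negative total is ranked as more favorable.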
Figure 5.1.4 3D structure of the binding protein of cp19k and mucin
Figure 5.1.5 3D structure of the binding protein of cp19k and Collagen VII.
Analysis -2:
Based on the structural characteristics of collagen VII, it is evident from these two images (Figure 5.1.3/5.1.2) that there is no significant chemical link between cp19k and mucin. In contrast, cp19k and collagen VII are clearly linked through the NC-1 and NC-2 domains of collagen VII (both α chains), which is almost entirely contrary to the data from PPA-Pred. The wet lab data support this result: the combination of cp19k and collagen significantly enhances adhesion (p < 0.05). The next step is to investigate the reasons for this unexpected result:
  • When the dry lab group ran Discovery Studio, the mucin results could not be optimized, so we lost some data that could have been used for comparison. Based on questions previously raised by Discovery Studio users, we suspect that the large size of mucin, or significant conformational changes in mucin, makes it difficult for the software to predict the binding site between mucin and cp19k. The absence of an interaction cannot be concluded on this basis alone.
  • The calculation of ZRank considers numerous physical factors. Elongated side chains between the two molecules can increase the distance between them, and according to the equation, an increase in distance may lead to an increase in the Score value.
    Figure 5.1.6 Interface structure of the binding protein of cp19k and Collagen VII
    Solvent Accessibility: The total surface area of each atom or residue in a molecule that can be reached by water or another solvent.
    Ligand Contact Surface Area: The contact area between the ligand and the receptor protein.
    Ligand Polar Contact Surface Area: The contact surface area between the ligand and the receptor that involves polar interactions.
  • For the Pose1789 binding site, the ligand-receptor complex has a higher total solvent accessibility (possibly because it forms more hydrogen bonds), so it interacts more readily with water, and binding may be affected in the intestinal environment. The contact area and the polar contact area between the two are approximately doubled; a higher ligand contact area is associated with tighter binding and a lower dissociation constant. However, the polar contact area is the smaller of the two, which means that fewer polar interactions are available to maintain binding stability.
Summary
Although the dry lab group has addressed the unexpected ZRank results, some issues persist. For example, because Discovery Studio ran inefficiently, the dry lab group could not finalize the data comparison. These software tools have long runtimes, especially for relatively large protein structures, so the modeling team is still exploring alternative docking software to generate more data and improve our understanding of the two receptor proteins.
The modeling team is also working on reconciling the data generated by these different docking tools, because the tools operate on different principles. One of them works primarily from amino acid sequences, much like Discovery Studio, but takes a different computational approach. At present, the results from these tools cannot be compared directly. We will address this challenge in our subsequent work.

Mathematical

Abstract:
In this model, we simulate the course of our engineered bacteria from successful colonization of the duodenum, through multiplication, to the lowest bacterial concentration required to treat the disease. The aim is to establish a direct quantitative relationship between the initial bacterial concentration we introduce and the concentration of bacteria that have colonized, multiplied, and become effective against the disease.
Main study variables:
Because this model considers bacterial growth kinetics only over a short time frame, and to keep the model simple, we study only the concentration of adhesion proteins, which is the most important factor affecting the bacterial growth rate in this scenario. Nutrient concentrations and competition from the rest of the intestinal flora are not considered. We divided the project into three parts and linked them through the concentration of adhesion proteins, because the bacteria can go on to secrete the potent proteases that relieve inflammation only if adhesion proteins allow them to colonize the intestinal wall of the duodenum. Ideally, we integrate three quantitative relationships over three linked periods, which are described in the models that follow. (Note that the concentration of adhesion proteins is not the main variable of this model. We use the length of a villus and time as the main variables because they fit our model better.)
Main study models:
(1) Fluid mechanics and mass flux model
To simulate the adhesion of the engineered bacteria to the lumen wall, we borrowed the concept of the mass flux of adhesion proteins to the surface of the intestinal villi. One reason is that, even though celiac disease damages the intestinal surface, a considerable surface area of villi remains available as adhesion sites for the engineered bacteria. Another reason is that this model considers only one intestinal plane, that is, a two-dimensional structure, as the object of study. The quantity we need is therefore the amount of material passing through a unit plane per unit time, that is, the mass flux. (We assume that each villus is a cylinder and that all intestinal villi have the same width and length. We ignore the damage that inflammation does to the intestinal lining, and we assume that the villi are neatly arranged on the lining with no gaps between them.)

In these formulas, J is the mass flux (g·s^−1·m^−2); q is the groundwater flux (Darcy velocity) (m^3·m^−2·s^−1); Ct is the current concentration of adhesion proteins in the groundwater (g·m^−3); Qf is the volume flow rate (m^3·s^−1); h is the length of a villus (m); l is the width of a villus (m); L is the length of the duodenum (m); t is the time (s); V is the volume of liquid (m^3), where, to simplify the formulas, we mainly consider the fluid in a cylindrical region around the inner wall plane of the intestinal tract under study; and A is the cross-sectional area of a villus (m^2), with the flux assumed by default to be the same from all angles. In this model, we assume that each villus is a standard cylinder with a rigid structure that is not prone to deformation.
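The formulas themselves are not reproduced in this text, so the following is only a sketch of how the listed quantities are commonly related in a Darcy-type transport model. It assumes the standard advective relations q = Qf / A and J = q · Ct, which are consistent with the units given above but may differ from the exact expressions used in the model. All numerical values are illustrative.

```python
def darcy_flux(volume_flow_rate, area):
    """q = Qf / A, in m^3 m^-2 s^-1 (assumed relation)."""
    if area <= 0:
        raise ValueError("area must be positive")
    return volume_flow_rate / area


def mass_flux(q, concentration):
    """J = q * Ct, in g s^-1 m^-2 (assumed advective mass flux)."""
    return q * concentration


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the unit bookkeeping.
    qf = 2e-6   # volume flow rate, m^3 s^-1
    a = 1e-4    # cross-sectional area, m^2
    ct = 5.0    # adhesion protein concentration, g m^-3
    q = darcy_flux(qf, a)
    print(f"q = {q:.3g} m/s, J = {mass_flux(q, ct):.3g} g s^-1 m^-2")
```

Multiplying a flux in m·s^−1 by a concentration in g·m^−3 gives g·s^−1·m^−2, the unit stated for J.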


In these formulas, the absorption flux is integrated along the surface of the villi over the length L of the duodenum, which gives the rate of mass flow to the whole plane of the wall. A second integration over time gives the change over time in the total mass of material flowing toward the plane. m is the mass of material flowing to the plane (g·s·m^−1). Ø is a dimensionless quantity that we introduce to simplify the formulas.
(2) Adhesion efficiency model
This model is about the quantitative relationship between the concentration of engineered bacteria, the type of adhesion proteins, and the ability of engineered bacteria to adsorb to the duodenal wall. We wanted to integrate this model with the mass flux model we obtained earlier to achieve our simulation of what we call "adhesion flux".
In these formulas, θ is the coverage of the surface by the adsorbed adhesion reagent (the proportion of the surface that is covered), taken from the Langmuir model; k is the adhesion coefficient of the adhesion protein in question; and ρ is the density of the adhesion protein (g·m^−3). Because we consider a short time interval, we assume that bacteria do not adhere to one another or form clumps. The coverage of adhesion proteins therefore represents the success rate of bacterial colonization on the duodenal wall.
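For reference, the textbook Langmuir isotherm gives the coverage as θ = k·c / (1 + k·c). The sketch below uses that standard form with a generic concentration c. It is an assumption that the model's own expression has this shape, since the formula is not reproduced in this text, and the numbers are illustrative.

```python
def langmuir_coverage(k, concentration):
    """Fractional surface coverage from the standard Langmuir isotherm.

    k: adhesion (affinity) coefficient, in inverse concentration units
    concentration: adsorbate concentration, in matching units
    """
    if k < 0 or concentration < 0:
        raise ValueError("k and concentration must be non-negative")
    x = k * concentration
    return x / (1.0 + x)


if __name__ == "__main__":
    # Hypothetical coefficient; coverage saturates toward 1 as c grows.
    for c in (0.0, 0.5, 5.0, 50.0):
        print(f"c = {c:5.1f}  ->  theta = {langmuir_coverage(2.0, c):.3f}")
```

The coverage rises almost linearly at low concentration and saturates below 1, so adding more adhesion protein yields diminishing returns once most sites are occupied.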
(3) Bacterial growth kinetics model:
Because we introduced genes for adhesive proteins when we engineered the bacteria, the bacteria can colonize the duodenal wall, grow, and multiply. The duodenum provides a stable chemical environment that also effectively restrains the engineered bacteria, so they always grow and reproduce below their maximum growth rate. We therefore base this model on the chemostat model and, with appropriate adjustments, use it to simulate the growth and reproduction of the engineered bacteria in the human duodenum.

In this formula, N(t) is the number of cells at time t; N0 is the initial number of cells; and r is the growth rate (s^−1). C(t) is the concentration of adhesion proteins in the groundwater; because bacteria that fail to attach to the villi can still produce adhesion proteins, we assume by default that the adhesion proteins increase exponentially. This formula links the initial number of bacteria that we introduce to the number of bacteria at any given time. We can use it to find the lowest number of bacteria needed for effective treatment of inflammation.
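As a minimal sketch, assume the growth formula has the plain exponential form N(t) = N0 · e^(rt), which is what the symbol definitions suggest but which is not reproduced in this text. Under that assumption, the lowest initial cell number for a given target follows by inverting the formula. The growth rate and target below are illustrative, not measured values.

```python
import math


def cells_at(n0, growth_rate, t):
    """N(t) = N0 * exp(r * t), with r in s^-1 and t in s (assumed form)."""
    return n0 * math.exp(growth_rate * t)


def minimum_inoculum(target_cells, growth_rate, t):
    """Smallest N0 that reaches target_cells by time t under the same model."""
    return target_cells * math.exp(-growth_rate * t)


if __name__ == "__main__":
    # Hypothetical rate corresponding to one doubling per hour.
    r = math.log(2) / 3600.0
    print(cells_at(100.0, r, 2 * 3600))          # about four times the start
    print(minimum_inoculum(1e9, r, 2 * 3600))    # about a quarter of the target
```

The same inversion would apply if the attached fraction from the adhesion model were used to scale N0 before growth begins.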
Main variables in this model:
1) J: mass flux (g·s^−1·m^−2)
2) q: groundwater flux (Darcy velocity) (m^3·m^−2·s^−1)
3) Ct: current concentration of adhesion proteins in groundwater (g·m^−3)
4) Qf: volume flow rate (m^3·s^−1)
5) h: length of a villus (m)
6) l: width of a villus (m)
7) L: length of the duodenum (m)
8) t: time (s)
9) V: volume of liquid (m^3)
10) A: cross-sectional area of a villus (m^2)
11) m: mass of material flowing to the plane (g·s·m^−1)
12) θ: degree of coverage of the adhesion reagent adsorbed to the surface
13) k: adhesion coefficient
14) ρ: density of adhesion protein (g·m^−3)
15) N(t): the number of cells at time t
16) N0: the initial number of cells
17) r: growth rate (s^−1)

References

  • Grondin, J. A., Kwon, Y. H., Far, P. M., Haq, S., & Khan, W. I. (2020). Mucins in Intestinal Mucosal Defense and Inflammation: Learning From Clinical and Experimental Studies. Frontiers in immunology, 11, 2054.
    https://doi.org/10.3389/fimmu.2020.02054
  • Owczarzy, A., Kurasiński, R., Kulig, K., Rogóż, W., Szkudlarek, A., & Maciążek-Jurczyk, M. (2020). Collagen-structure, properties and application. Engineering of biomaterials, 23(156).
  • K. Yugandhar, M. Michael Gromiha, Protein–protein binding affinity prediction from amino acid sequence, Bioinformatics, Volume 30, Issue 24, December 2014, Pages 3583–3589,
    https://doi.org/10.1093/bioinformatics/btu580
  • Hunter, S. A., & Cochran, J. R. (2016). Cell-Binding Assays for Determining the Affinity of Protein-Protein Interactions: Technologies and Considerations. Methods in enzymology, 580, 21–44.
    https://doi.org/10.1016/bs.mie.2016.05.002
  • Mortensen, J. H., & Karsdal, M. A. (2016). Type VII collagen. In Biochemistry of collagens, laminins and elastin (pp. 57-60). Academic Press.
  • BIOVIA Discovery Studio.
    https://www.3ds.com/products-services/biovia/products/molecular-modeling-simulation/biovia-discovery-studio/
  • Zhang, C., Liu, S. and Zhou, Y. (2005), Docking prediction using biological information, ZDOCK sampling technique, and clustering guided by the DFIRE statistical energy function. Proteins, 60: 314-318.
    https://doi.org/10.1002/prot.20576
  • Chen, R. and Weng, Z. (2003), A novel shape complementarity scoring function for protein-protein docking. Proteins, 51: 397-408.
    https://doi.org/10.1002/prot.10334
  • Pierce, B. and Weng, Z. (2007), ZRANK: Reranking protein docking predictions with an optimized energy function. Proteins, 67: 1078-1086.
    https://doi.org/10.1002/prot.21373