Binding Interaction
Introduction
When the engineered bacteria are released into the gut, they secrete the adhesion protein cp19k, which helps them colonize the gut. The usual targets for intestinal colonization are the villi of the small intestine. In celiac disease, however, the villi of the patient's small intestine are severely damaged. The dry lab group therefore turned its attention to the mucosal layer of the small intestine. This layer consists mainly of a loosely structured mucin layer that resident microbes can penetrate, which gives cp19k an opportunity to bind. [1] Intestinal epithelial cells are also anchored by type VII collagen, an anchoring protein that primarily stabilizes the extracellular matrix (ECM) of the stroma and the basement membrane. [2] Damaged intestinal villi expose type VII collagen, providing binding sites for the engineered bacteria.
Both proteins are suitable candidates for wet lab assays, but given the challenges associated with mucin secretion and safety, the wet lab assays focused primarily on collagen VII adhesion experiments. The dry lab group used computational modeling to simulate the binding interaction between cp19k and each of the two receptor proteins. After comparing different datasets and models, we decided to conduct a series of combined modeling studies using Discovery Studio [6] and PPA-Pred (Protein-Protein Affinity Predictor) [3].
Figure 5.1.1 3D structure of cp19k
Figure 5.1.2 3D structure of Collagen VII
Figure 5.1.3 3D structure of Mucin
Prediction of Affinity
Yugandhar and Gromiha (2014) introduced a method to predict the binding affinity between two proteins from their amino acid sequences. They made the PPA-Pred web server publicly available, allowing users to predict the affinity between two proteins by uploading protein sequences less than 50 amino acids in length. The server reports the binding free energy ΔG (molar Gibbs free energy) and the equilibrium dissociation constant (Kd), which together indicate the strength of the affinity. [4] These parameters are related by the following equation:

$$\Delta G = RT\ln K_d \qquad (1)$$

where R is the gas constant (1.987 × 10^−3 kcal mol^−1 K^−1) and T is the absolute temperature (K).
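The conversion in Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, assuming Kd is given in mol/L (1 M standard state) and a temperature of 298.15 K; neither the temperature nor the function names come from PPA-Pred itself.

```python
import math

# Gas constant in kcal mol^-1 K^-1, as given in the text.
R_KCAL = 1.987e-3


def delta_g_from_kd(kd_molar: float, temp_k: float = 298.15) -> float:
    """Binding free energy in kcal/mol from Kd in mol/L, following Eq. (1): dG = RT ln Kd."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


def kd_from_delta_g(delta_g_kcal: float, temp_k: float = 298.15) -> float:
    """Inverse of Eq. (1): Kd = exp(dG / RT), returned in mol/L."""
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))


if __name__ == "__main__":
    # Hypothetical Kd values, chosen only to show the scale of dG.
    for kd in (1e-6, 1e-9):
        print(f"Kd = {kd:.0e} M -> dG = {delta_g_from_kd(kd):.2f} kcal/mol")
```

A tighter binder (smaller Kd) gives a more negative ΔG; each tenfold decrease in Kd lowers ΔG by about RT ln 10 ≈ 1.36 kcal/mol at 298 K.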
Result analysis:
Expressed in terms of the binding reaction between the two proteins, Kd is:

$$K_d = \frac{[L][R]}{[LR]} \qquad (2)$$

In this formula, [L] is the concentration of free ligand, [R] the concentration of unbound receptor, and [LR] the concentration of the bound ligand-receptor complex.
This equation shows that, for given concentrations of free ligand and free receptor, a lower Kd corresponds to a larger [LR], indicating tighter binding [4]. Based on the PPA-Pred output, we can therefore hypothesize that mucin has a higher affinity for cp19k than collagen VII does.
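To make the link between Kd and [LR] concrete, Equation (2) can be combined with the receptor mass balance [R]total = [R] + [LR], which gives the fraction of receptor bound as [L] / (Kd + [L]). The sketch below assumes the free ligand concentration is known; the Kd values in the example are hypothetical and are not the PPA-Pred results.

```python
def fraction_receptor_bound(free_ligand_molar: float, kd_molar: float) -> float:
    """[LR] / ([R] + [LR]) = [L] / (Kd + [L]), derived from Eq. (2)."""
    if free_ligand_molar < 0 or kd_molar <= 0:
        raise ValueError("need [L] >= 0 and Kd > 0")
    return free_ligand_molar / (kd_molar + free_ligand_molar)


if __name__ == "__main__":
    ligand = 1e-6  # hypothetical free ligand concentration, mol/L
    for label, kd in (("tighter binder", 1e-7), ("weaker binder", 1e-5)):
        print(label, round(fraction_receptor_bound(ligand, kd), 3))
```

At a fixed ligand concentration, lowering Kd a hundredfold moves the receptor from mostly free to mostly bound, which is the sense in which a lower Kd means a larger [LR].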
However, besides the relations above, the algorithm also accounts for other factors, such as binding site residues, the number of hydrogen bonds, and electrostatic energy and bond angles. These factors can be interrelated and may introduce error. The PPA-Pred output can therefore serve only as a rough estimate that forms the hypothesis tested with Discovery Studio; it does not precisely describe the affinity between the two proteins.
Discovery Studio
Discovery Studio is a suite of tools for simulating small and large molecules. Its capabilities include, but are not limited to, ligand design, macromolecule engineering (protein-protein docking, antibody design, and optimization), and quantitative structure-activity relationship (QSAR) analysis. The dry lab group mainly used the suite's protein-protein docking functionality, such as ZDOCK, to study the adhesive interactions in the project and to provide the wet lab group with more reliable estimates of the adhesive performance of cp19k.
ZDOCK is a protein-protein docking program based on the fast Fourier transform (FFT). It systematically samples rigid-body binding modes by rotating and translating one protein relative to the other, and evaluates each binding model with an energy-based scoring function.
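The FFT trick behind this kind of translational search can be illustrated with a toy example. This is a minimal sketch, not ZDOCK's implementation: it assumes both proteins have already been mapped onto equally sized grids (ZDOCK uses its own shape-complementarity, electrostatic, and desolvation grids) and it scores only translations for one fixed ligand rotation. The function names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def fft_translation_scores(receptor: np.ndarray, ligand: np.ndarray) -> np.ndarray:
    """Overlap score for every cyclic translation t of the ligand grid.

    scores[t] = sum_x receptor[x] * ligand[x - t] (indices modulo the grid shape),
    i.e. the overlap of the receptor grid with np.roll(ligand, t).
    """
    if receptor.shape != ligand.shape:
        raise ValueError("grids must have the same shape")
    # Correlation theorem for real grids: FFT(scores) = FFT(receptor) * conj(FFT(ligand)).
    spectrum = np.fft.fftn(receptor) * np.conj(np.fft.fftn(ligand))
    return np.real(np.fft.ifftn(spectrum))


def brute_force_scores(receptor: np.ndarray, ligand: np.ndarray) -> np.ndarray:
    """Same scores computed by explicitly shifting the ligand (slow reference)."""
    axes = tuple(range(ligand.ndim))
    scores = np.empty(receptor.shape)
    for t in np.ndindex(*receptor.shape):
        scores[t] = np.sum(receptor * np.roll(ligand, t, axis=axes))
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rec, lig = rng.random((8, 8)), rng.random((8, 8))
    fast = fft_translation_scores(rec, lig)
    assert np.allclose(fast, brute_force_scores(rec, lig))
    best = np.unravel_index(np.argmax(fast), fast.shape)
    print("best translation:", tuple(int(i) for i in best))
```

The FFT route costs O(N log N) per rotation for a grid of N points, versus O(N^2) for explicit shifting, which is what makes scanning every translation for many rotations practical.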
The main parameters reported in the docking output are:
PoseNum: The index of each protein pose generated during docking. It identifies a pose; on its own it does not measure binding quality.
ZDOCK score: The score from the ZDOCK algorithm used for protein-protein docking. (A higher ZDOCK score usually indicates a more favorable binding site and pose.)
RMSD: Root-mean-square deviation. It measures the structural difference between the predicted protein binding site and the experimentally determined site. (A smaller RMSD indicates higher structural similarity between the predicted and experimental sites.)
Cluster: Poses with similar binding sites are grouped into clusters, which helps identify sites that share similarities.
ZRANK: Used to rank the quality of different binding sites. (Lower ZRANK scores usually indicate more promising binding sites.)
E_RDock: Estimates the binding (interaction) energy between the receptor protein and the ligand protein in a given docking pose and is used to assess the quality of a docking prediction. (A lower E_RDock indicates a more favorable or stable binding pose.)
ZDOCK Analysis Workflow:
Rigid sampling around the proposed binding region:
ZDOCK was used to generate 2000 conformations around the putative binding region, excluding residues that do not belong to the region. The default ZDOCK parameters and a 10° rotational sampling interval are generally used. A finer rotational step can improve prediction accuracy, but it significantly increases the running time of the algorithm.
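The trade-off between rotational step and runtime can be estimated with a rough scaling argument: if rotations are spread roughly uniformly over 3D rotation space, their number grows approximately as the inverse cube of the angular step, and each rotation needs its own translational search. The sketch below encodes only this approximation (function names are hypothetical); ZDOCK's actual rotation sets are precomputed and their exact sizes differ.

```python
def relative_rotation_count(step_deg: float, reference_step_deg: float = 10.0) -> float:
    """Approximate ratio of sampled rotations at `step_deg` versus `reference_step_deg`.

    Assumes the rotation count scales as (1 / step)^3, i.e. uniform sampling of 3D rotations.
    """
    if step_deg <= 0 or reference_step_deg <= 0:
        raise ValueError("angular steps must be positive")
    return (reference_step_deg / step_deg) ** 3


def estimated_runtime(reference_runtime_s: float, step_deg: float,
                      reference_step_deg: float = 10.0) -> float:
    """Scale a runtime measured at the reference step, assuming runtime ~ number of rotations."""
    return reference_runtime_s * relative_rotation_count(step_deg, reference_step_deg)


if __name__ == "__main__":
    for step in (15.0, 10.0, 6.0):
        print(f"{step:>4} deg step: ~{relative_rotation_count(step):.1f}x the rotations of a 10 deg step")
```

Under this approximation, halving the step from 10° to 5° multiplies the work by roughly eight.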
Scoring based on DFIRE energy function:
The all-atom statistical energy function DFIRE is used to estimate the binding affinity of the ZDOCK-generated conformations. At least one pair of heavy atoms should lie within 4.5 Å of each other.
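The 4.5 Å heavy-atom condition can be checked with a simple distance test. This sketch assumes the pair consists of one atom from each protein and uses a hypothetical minimal atom record (element, x, y, z); it is not the DFIRE code, which also evaluates a distance-dependent energy for each atom pair.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical minimal atom record: (element symbol, x, y, z), coordinates in Å.
Atom = Tuple[str, float, float, float]


def heavy_atoms(atoms: Sequence[Atom]) -> List[Atom]:
    """Drop hydrogens; everything else counts as a heavy atom."""
    return [a for a in atoms if a[0].upper() != "H"]


def has_heavy_atom_contact(protein_a: Sequence[Atom], protein_b: Sequence[Atom],
                           cutoff: float = 4.5) -> bool:
    """True if any heavy atom of protein_a lies within `cutoff` Å of a heavy atom of protein_b."""
    heavy_b = heavy_atoms(protein_b)
    for _, *pa in heavy_atoms(protein_a):
        for _, *pb in heavy_b:
            if math.dist(pa, pb) <= cutoff:
                return True
    return False
```

A brute-force double loop is fine for a sketch; for full-size proteins a spatial grid or k-d tree would normally replace it.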
Clustering guided by DFIRE energy:
The conformations are ordered by DFIRE energy. The first cluster contains the lowest-energy structure together with all structures within a root-mean-square deviation (RMSD) cutoff of it. The RMSD is based on Cα atoms and is calculated using only residues belonging to the putative binding region; for targets with unknown binding regions, the RMSD over the whole structure is used. The next cluster is built around the lowest-energy structure among the remaining conformations. This process is repeated until the first 15 clusters are obtained. For each cluster, the conformation with the lowest DFIRE energy is selected as the representative structure for further analysis.
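The clustering procedure is a greedy scheme, sketched below under stated assumptions: each pose is represented by its Cα coordinates in the shared receptor frame (so RMSD is computed without superposition), energies stand in for DFIRE values, and the RMSD cutoff is a hypothetical parameter because its value is not given here.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two equally long Cα coordinate lists (no superposition)."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def greedy_energy_clusters(
    energies: Sequence[float],
    coords: Sequence[Sequence[Coord]],
    rmsd_cutoff: float,
    max_clusters: int = 15,
) -> List[Tuple[int, List[int]]]:
    """Cluster poses around successive lowest-energy structures.

    Returns (representative_index, member_indices) pairs. Because each cluster is
    seeded with the lowest-energy unclustered pose, the seed is also the
    lowest-energy member, i.e. the representative structure.
    """
    if rmsd_cutoff < 0:
        raise ValueError("rmsd_cutoff must be non-negative")
    remaining = sorted(range(len(energies)), key=lambda i: energies[i])
    clusters: List[Tuple[int, List[int]]] = []
    while remaining and len(clusters) < max_clusters:
        seed = remaining[0]  # lowest energy among unclustered poses
        members = [i for i in remaining if ca_rmsd(coords[seed], coords[i]) <= rmsd_cutoff]
        clusters.append((seed, members))
        taken = set(members)
        remaining = [i for i in remaining if i not in taken]
    return clusters
```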
Using the parameter definitions above, we can roughly interpret the results in the table below.
Analysis - 1:
Based on the definitions of these parameters, the highlighted Pose1789 was the best binding site among all predictions: at this site, the two proteins showed higher affinity and binding strength than at the other sites tested. In practice, ZDOCK users usually output the top ten ranked poses to judge whether the choice of the optimal site is reasonable. Most of the parameter values for Pose1789 in Table 1 meet the criteria for an optimal pose.
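Checking whether a chosen pose holds up among the top-ranked predictions can be sketched as follows. The pose records and values are made up for illustration (they are not the Table 1 data), the class and function names are hypothetical, and the two criteria used are the "lower is better" conventions for ZRANK and E_RDock described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pose:
    name: str
    zrank: float    # lower = more promising
    e_rdock: float  # lower = more favorable


def top_by_zrank(poses: List[Pose], n: int = 10) -> List[Pose]:
    """The n poses with the lowest ZRANK score, best first."""
    return sorted(poses, key=lambda p: p.zrank)[:n]


def criteria_won(candidate: Pose, shortlist: List[Pose]) -> int:
    """How many of the two 'lower is better' criteria the candidate wins within the shortlist."""
    wins = 0
    if candidate.zrank == min(p.zrank for p in shortlist):
        wins += 1
    if candidate.e_rdock == min(p.e_rdock for p in shortlist):
        wins += 1
    return wins


if __name__ == "__main__":
    # Made-up poses for illustration only.
    poses = [Pose("PoseA", -80.0, -30.0), Pose("PoseB", -95.0, -42.0), Pose("PoseC", -60.0, -35.0)]
    shortlist = top_by_zrank(poses, n=10)
    print([p.name for p in shortlist], criteria_won(shortlist[0], shortlist))
```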
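ZRANK is computed with a scoring function that linearly weights several energy terms (van der Waals forces, short-range electrostatic repulsion, long-range electrostatic repulsion, etc.). The following equation describes ZRANK:

$$E_{\mathrm{ZRANK}} = w_{\mathrm{vdW,a}}E_{\mathrm{vdW,a}} + w_{\mathrm{vdW,r}}E_{\mathrm{vdW,r}} + w_{\mathrm{sra}}E_{\mathrm{elec,sra}} + w_{\mathrm{srr}}E_{\mathrm{elec,srr}} + w_{\mathrm{lra}}E_{\mathrm{elec,lra}} + w_{\mathrm{lrr}}E_{\mathrm{elec,lrr}} + w_{\mathrm{ds}}E_{\mathrm{ds}} \qquad (3)$$

where each w is the weight of the corresponding energy term.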
The ZRANK score is calculated with a scoring function that combines the following energy terms:
- EvdW_a: Van der Waals attractive energy.
- EvdW_r: Van der Waals repulsive energy.
- Eelec_sra: Short-range electrostatic attractive energy.
- Eelec_srr: Short-range electrostatic repulsive energy.
- Eelec_lra: Long-range electrostatic attractive energy.
- Eelec_lrr: Long-range electrostatic repulsive energy.
- Eds: Desolvation energy.
These energy terms are used to evaluate the interaction between molecules
and determine the
overall ZRank score, which
serves as an important metric in assessing the binding affinity and quality of protein-protein
interactions.
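Equation (3) is a plain weighted sum, which the sketch below evaluates. The weights and energies in the example are placeholders, not the optimized weights published by Pierce and Weng (2007) [9]; in practice the individual energy terms come from the docking software. The term keys are hypothetical names mirroring the list above.

```python
from typing import Mapping

# Energy terms of Eq. (3), named after the list above.
ZRANK_TERMS = ("vdw_a", "vdw_r", "elec_sra", "elec_srr", "elec_lra", "elec_lrr", "ds")


def zrank_score(energies: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Linear combination sum_i w_i * E_i over the seven ZRANK energy terms."""
    missing = [t for t in ZRANK_TERMS if t not in energies or t not in weights]
    if missing:
        raise KeyError(f"missing terms: {missing}")
    return sum(weights[t] * energies[t] for t in ZRANK_TERMS)


if __name__ == "__main__":
    # Placeholder values for illustration only (arbitrary units).
    weights = {t: 1.0 for t in ZRANK_TERMS}
    energies = {"vdw_a": -40.0, "vdw_r": 12.0, "elec_sra": -8.0, "elec_srr": 3.0,
                "elec_lra": -5.0, "elec_lrr": 2.0, "ds": -6.0}
    print(zrank_score(energies, weights))
```

Because the score is linear, a term's influence on the ranking is set entirely by its weight, which is why the published weights were optimized against known complexes.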
Figure 5.1.4 3D structure of the cp19k-mucin binding complex
Figure 5.1.5 3D structure of the cp19k-Collagen VII binding complex
Analysis - 2:
Based on the structural characteristics of collagen VII, the two complex structures (Figures 5.1.4 and 5.1.5) show no clear interaction between cp19k and mucin. In contrast, cp19k and collagen VII are clearly linked through the NC-1 and NC-2 domains of collagen VII (both on the α chain), which almost entirely contradicts the PPA-Pred prediction. This is supported by the wet lab data, in which the combination of cp19k and collagen significantly enhanced adhesion (p < 0.05). We considered the following possible reasons for this unexpected result:
- When the dry lab group used Discovery Studio, the mucin results could not be optimized, so we lost some data that could have been used for comparison. Based on questions previously raised by other Discovery Studio users, we suspect that this may be due to the large size of mucin or to significant conformational changes in mucin, which make it difficult for the software to predict the binding site between mucin and cp19k. The absence of an interaction therefore cannot be concluded from this result alone.
- The ZRANK calculation takes many physical energy terms into account. Elongated side chains at the interface between the two molecules can inadvertently increase the distance between them, and according to Equation (3), an increase in distance may lead to a higher (less favorable) score.
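The distance argument can be illustrated with the attractive part of a generic 12-6 Lennard-Jones potential, -4ε(σ/r)^6. This is an assumption for illustration only: ZRANK uses its own van der Waals parameterization, and the ε and σ values below are hypothetical. Because the attractive energy becomes less negative as r grows, a positively weighted attractive term contributes less favorably at larger separations, which raises the total score.

```python
def lj_attractive(r: float, epsilon: float = 0.2, sigma: float = 3.5) -> float:
    """Attractive (r^-6) part of a 12-6 Lennard-Jones potential; r and sigma in Å."""
    if r <= 0:
        raise ValueError("distance must be positive")
    return -4.0 * epsilon * (sigma / r) ** 6


if __name__ == "__main__":
    for r in (4.0, 5.0, 6.0):
        print(f"r = {r} Å: attractive term = {lj_attractive(r):.4f}")
```

The r^-6 dependence means that doubling the separation weakens the attraction by a factor of 64.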
Figure 5.1.5 Interface structure of the cp19k-Collagen VII binding complex
- Solvent Accessibility: The total surface area of each atom or residue in a molecule that is accessible to water or another solvent.
- Ligand Contact Surface Area: The contact area between the ligand and the receptor protein.
- Ligand Polar Contact Surface Area: The contact area between the ligand and the receptor that involves polar interactions.
For the Pose1789 binding site, the ligand-receptor complex has a higher total solvent accessibility (possibly because it forms more hydrogen bonds), which makes it interact more readily with water, so binding may be affected in the intestinal environment. The ligand contact area is approximately twice the polar contact area. A larger ligand contact area is associated with tighter binding and a lower dissociation constant; however, the comparatively small polar contact area means that fewer polar interactions are available to maintain binding stability.
Summary
Although the dry lab group has proposed an explanation for the unexpected ZRANK results, some issues remain. For example, because Discovery Studio runs slowly, the dry lab group could not finalize the data comparison. Since these tools have long runtimes, especially for relatively large protein structures, the modeling team is still exploring alternative docking software to generate more data and improve our understanding of the two receptor proteins.
The modeling team is also working on reconciling the data generated by these different docking tools. This is necessary because the tools operate on different principles. For example, one tool works primarily from amino acid sequences, much like Discovery Studio, but uses a different computational approach. At present, the results from these tools are not straightforward to compare, and this is a challenge we will address in subsequent work.
Mathematical
Abstract:
In this model, we simulate how our engineered bacteria progress from successful colonization of the duodenum to multiplication, until they reach the lowest bacterial concentration required to treat the disease. The goal is to establish a direct quantitative relationship between the initial bacterial concentration we introduce and the concentration of bacteria that have colonized, multiplied, and finally become effective against the disease.
Main study variables:
Because this model considers bacterial growth kinetics only over a short time frame, and to keep the model simple, we study only the concentration of adhesion proteins, which we take to be the most important factor affecting the bacterial growth rate in this scenario. Nutrient concentrations and other competitive responses of the intestinal flora are not considered. We divided the project into three parts and linked them through the concentration of adhesion proteins, because the bacteria can secrete potent proteases to relieve inflammation only after they colonize the duodenal wall, and colonization depends on the adhesion proteins. Ideally, we aim to integrate three quantitative relationships, one for each of the three linked periods described below.
(Note that in this model we do not use the concentration of adhesion proteins as the main variable. Instead, we use the length of a villus and time as the main variables, which fit our model better.)
Main study models:
(1) Fluid mechanics and mass flux model
To simulate the adhesion of the engineered bacteria to the lumen wall, we borrowed the concept of the mass flux of adhesion proteins to the surface of the intestinal villi. One reason is that, even though celiac disease damages the intestinal surface, a considerable surface area of villi remains available as adhesion sites for the engineered bacteria. Another reason is that this model treats a single intestinal plane, that is, a two-dimensional structure, as the object of study. The quantity we need is therefore the amount of material passing through a unit area of the plane per unit time, that is, the mass flux. (Here we assume that each villus is a cylinder and that all intestinal villi have the same width and length. We ignore the damage that inflammation does to the intestinal lining, and we assume that the villi are neatly arranged on the lining with no gaps between them.)
In these formulas, J is the mass flux (g·s^-1·m^-2); q is the groundwater flux (Darcy velocity) (m^3·m^-2·s^-1); Ct is the current concentration of adhesion proteins in the groundwater (g·m^-3); Qf is the volume flow rate (m^3·s^-1); h is the length of a villus (m); l is the width of a villus (m); L is the length of the duodenum (m); t is time (s); and V is the volume of liquid (m^3). To simplify the formulas, we mainly consider the fluid within a cylindrical region around the inner wall plane of the intestinal segment under study. A is the cross-sectional area of a villus (m^2), and we assume that the flux is the same from all directions. In this model, each villus is a standard, rigid cylinder that does not deform easily.
In these formulas, the absorbed flux is integrated along the surface of the villi over the length L of the duodenum, which gives the rate of mass flow to the whole wall plane. Integrating this rate once more, over time, gives the change in the total mass of material flowing toward the plane over time. Here m is the mass of material flowing to the plane (g·s·m^-1), and Ø is a dimensionless quantity used only to simplify the formulas.
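A minimal numerical sketch of these two integration steps, under stated assumptions: the local mass flux is taken as the standard advective relation J = q·Ct (which matches the units listed above), the wall is treated per unit width in the two-dimensional model, and the trapezoidal rule performs both integrations. The flux profile, numbers, and function names in the example are hypothetical.

```python
from typing import Callable


def advective_mass_flux(q: float, c_t: float) -> float:
    """J = q * Ct: Darcy flux (m^3 m^-2 s^-1) times concentration (g m^-3) gives g m^-2 s^-1."""
    return q * c_t


def trapezoid(f: Callable[[float], float], a: float, b: float, n: int = 200) -> float:
    """Composite trapezoidal rule for the integral of f over [a, b] with n panels."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)


def mass_to_wall(flux: Callable[[float, float], float], length: float,
                 t_end: float, n: int = 200) -> float:
    """Integrate J(x, t) over x in [0, length], then integrate that rate over t in [0, t_end].

    The result is per unit wall width in the 2D model (g m^-1 when J is in g m^-2 s^-1).
    """
    def rate(t: float) -> float:
        # Mass flow rate to the wall at time t, per unit width.
        return trapezoid(lambda x: flux(x, t), 0.0, length, n)

    return trapezoid(rate, 0.0, t_end, n)


if __name__ == "__main__":
    q = 1e-5              # m^3 m^-2 s^-1, hypothetical
    duodenum_len = 0.25   # m, illustrative

    def flux(x: float, t: float) -> float:
        # Hypothetical concentration rising linearly in time (g m^-3).
        return advective_mass_flux(q, 0.1 + 1e-4 * t)

    print(mass_to_wall(flux, duodenum_len, t_end=3600.0))
```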
(2) Adhesion efficiency model
This model describes the quantitative relationship between the concentration of engineered bacteria, the type of adhesion protein, and the ability of the engineered bacteria to adsorb to the duodenal wall. We aim to combine this model with the mass flux model above to simulate what we call the "adhesion flux".
In these formulas, θ is the fractional coverage of the surface by the adsorbed adhesion reagent (the proportion of the surface covered), taken from the Langmuir model; k is the adhesion coefficient of the given adhesion protein; and ρ is the density of the adhesion protein (g·m^-3). Because we consider a short time interval, we assume that the bacteria do not adhere to one another or form clumps. The coverage of adhesion proteins therefore represents the success rate of bacterial colonization on the duodenal wall.
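For reference, the standard Langmuir isotherm has the form θ = Kc / (1 + Kc). The sketch below assumes the coverage expression takes this form with K = k and c = ρ (so k must carry units of m^3·g^-1 for kρ to be dimensionless); that mapping is an assumption, and the function names are hypothetical.

```python
def langmuir_coverage(k: float, rho: float) -> float:
    """Fractional surface coverage theta = k*rho / (1 + k*rho) (assumed Langmuir form)."""
    if k < 0 or rho < 0:
        raise ValueError("k and rho must be non-negative")
    return k * rho / (1.0 + k * rho)


def density_for_coverage(theta: float, k: float) -> float:
    """Invert the isotherm: rho = theta / (k * (1 - theta)) for 0 <= theta < 1 and k > 0."""
    if not 0.0 <= theta < 1.0 or k <= 0:
        raise ValueError("need 0 <= theta < 1 and k > 0")
    return theta / (k * (1.0 - theta))
```

Coverage saturates toward 1 as kρ grows, so raising the adhesion protein density gives diminishing returns once most of the surface is covered.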
(3) Bacterial growth kinetics model:
Because we introduced adhesion protein genes when engineering the bacteria, the bacteria can colonize the duodenal wall, grow, and multiply. Because the chemical environment of the duodenum is stable and can effectively restrain the engineered bacteria, they always grow and reproduce below their maximum growth rate. We therefore base this model on the chemostat model and, with appropriate adjustments, use it to simulate the growth and reproduction of the engineered bacteria in the human duodenum.
In this formula, N(t) is the number of cells at time t, N0 is the initial number of cells, and r is the growth rate (s^-1).
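In a chemostat at constant substrate concentration, the cell balance is dN/dt = (μ - D)N, where μ is the specific growth rate and D the dilution rate; its solution is the exponential N(t) = N0·e^(rt) with r = μ - D. The sketch below, assuming constant μ and D (hypothetical values, not fitted to the duodenum), integrates this balance with forward Euler and compares it with the closed form.

```python
import math


def exponential_cells(n0: float, r: float, t: float) -> float:
    """Closed form N(t) = N0 * exp(r * t)."""
    return n0 * math.exp(r * t)


def euler_net_growth(n0: float, mu: float, dilution: float, t_end: float, dt: float = 1.0) -> float:
    """Forward-Euler integration of dN/dt = (mu - D) * N with constant mu and D."""
    n = n0
    for _ in range(int(round(t_end / dt))):
        n += (mu - dilution) * n * dt
    return n


if __name__ == "__main__":
    n0, mu, d, t_end = 1e3, 2e-4, 5e-5, 3600.0  # hypothetical values (cells, s^-1, s^-1, s)
    print(euler_net_growth(n0, mu, d, t_end), exponential_cells(n0, mu - d, t_end))
```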
Here C(t) is the concentration of adhesion proteins in the groundwater. Because bacteria that fail to attach to the villi can still produce adhesion proteins, we assume that the adhesion protein concentration also grows exponentially.
This formula links the initial number of bacteria we introduce to the number of bacteria at any given time. We can use it to find the minimum number of bacteria needed for effective treatment of the inflammation.
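Solving N(t) = N0·e^(rt) for N0 gives the smallest inoculum that reaches a target cell number by time t. The sketch below additionally divides by the colonization success rate θ, following the statement that coverage represents the colonization success rate; treating θ as a simple multiplicative factor is an assumption, and the target cell number and doubling time are hypothetical inputs.

```python
import math


def minimum_initial_cells(n_target: float, r: float, t: float, coverage: float = 1.0) -> float:
    """Smallest introduced cell number whose colonized fraction grows to n_target by time t.

    Assumes only a fraction `coverage` (0 < theta <= 1) of introduced cells colonize,
    after which they grow as N(t) = N0 * exp(r * t).
    """
    if n_target <= 0 or t < 0:
        raise ValueError("n_target must be positive and t non-negative")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    return n_target * math.exp(-r * t) / coverage


if __name__ == "__main__":
    doubling_time = 3600.0               # s, hypothetical
    r = math.log(2) / doubling_time      # s^-1
    print(minimum_initial_cells(1e8, r, t=4 * 3600.0, coverage=0.1))
```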
Main variables in this model:
1) J: mass flux (g·s^-1·m^-2)
2) q: groundwater flux (Darcy velocity) (m^3·m^-2·s^-1)
3) Ct: current concentration of adhesion proteins in groundwater (g·m^-3)
4) Qf: volume flow rate (m^3·s^-1)
5) h: length of a villus (m)
6) l: width of a villus (m)
7) L: length of the duodenum (m)
8) t: time (s)
9) V: volume of liquid (m^3)
10) A: cross-sectional area of a villus (m^2)
11) m: mass of material flowing to the plane (g·s·m^-1)
12) θ: degree of coverage of the adhesion reagent adsorbed to the surface (dimensionless)
13) k: adhesion coefficient
14) ρ: density of adhesion protein (g·m^-3)
15) N(t): number of cells at time t
16) N0: initial number of cells
17) r: growth rate (s^-1)
References
- Grondin, J. A., Kwon, Y. H., Far, P. M., Haq, S., & Khan, W. I. (2020). Mucins in Intestinal Mucosal Defense and Inflammation: Learning From Clinical and Experimental Studies. Frontiers in Immunology, 11, 2054. https://doi.org/10.3389/fimmu.2020.02054
- Owczarzy, A., Kurasiński, R., Kulig, K., Rogóż, W., Szkudlarek, A., & Maciążek-Jurczyk, M. (2020). Collagen-structure, properties and application. Engineering of Biomaterials, 23(156).
- Yugandhar, K., & Gromiha, M. M. (2014). Protein–protein binding affinity prediction from amino acid sequence. Bioinformatics, 30(24), 3583–3589. https://doi.org/10.1093/bioinformatics/btu580
- Hunter, S. A., & Cochran, J. R. (2016). Cell-Binding Assays for Determining the Affinity of Protein-Protein Interactions: Technologies and Considerations. Methods in Enzymology, 580, 21–44. https://doi.org/10.1016/bs.mie.2016.05.002
- Mortensen, J. H., & Karsdal, M. A. (2016). Type VII collagen. In Biochemistry of collagens, laminins and elastin (pp. 57-60). Academic Press.
- BIOVIA Discovery Studio. https://www.3ds.com/products-services/biovia/products/molecular-modeling-simulation/biovia-discovery-studio/
- Zhang, C., Liu, S., & Zhou, Y. (2005). Docking prediction using biological information, ZDOCK sampling technique, and clustering guided by the DFIRE statistical energy function. Proteins, 60, 314-318. https://doi.org/10.1002/prot.20576
- Chen, R., & Weng, Z. (2003). A novel shape complementarity scoring function for protein-protein docking. Proteins, 51, 397-408. https://doi.org/10.1002/prot.10334
- Pierce, B., & Weng, Z. (2007). ZRANK: Reranking protein docking predictions with an optimized energy function. Proteins, 67, 1078-1086. https://doi.org/10.1002/prot.21373