Curli

Our research team set out to explore the effect of introducing the pBbB8k-csg-amylase plasmid, together with the arabinose inducer, on curli fiber expression. Our initial hypothesis was that this plasmid would increase curli fiber expression and aid biofilm growth. After experimentation and analysis, our results support this hypothesis and move us toward our engineering goals.

After running the Congo Red Fluorescence Assay (found in protocols), we observed several interesting results.

Figure 1: Fluorescence measurements from plate reader

pBbB8k-csg-amylase and Arabinose Induction:

Confirming our hypothesis, introducing pBbB8k-csg-amylase alongside an arabinose inducer elevated the expression of curli fibers. In Figure 1, the arabinose-induced cultures showed more than twice the Congo Red fluorescence of the controls. Additionally, the E. coli Nissle culture carrying the curli plasmid without the arabinose inducer showed wild-type levels of curli fiber expression. Compared with similar experiments measuring Congo Red amyloid binding in stationary-phase cultures on solid agar, the difference is considerably stronger in liquid culture. This may indicate a lack of native curli fiber expression during growth in liquid culture, which is compensated for by induced expression from the curli plasmid.
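The fold change quoted above is the ratio of mean fluorescence in the induced cultures to mean fluorescence in the controls. A minimal Python sketch of that arithmetic follows; the function name `fold_change` and the replicate readings are placeholders for illustration, not our plate reader data.

```python
from statistics import mean


def fold_change(induced, control):
    """Ratio of mean induced fluorescence to mean control fluorescence."""
    if not induced or not control:
        raise ValueError("both groups need at least one reading")
    control_mean = mean(control)
    if control_mean == 0:
        raise ValueError("control mean is zero; fold change is undefined")
    return mean(induced) / control_mean


# Placeholder readings in arbitrary fluorescence units (hypothetical values).
example_induced = [2100.0, 2300.0, 2200.0]
example_control = [1000.0, 1100.0, 900.0]
print(round(fold_change(example_induced, example_control), 2))
```

A value above 2 from this calculation corresponds to the "more than twice" statement; blank subtraction, if used, would need to be applied to the readings before calling the function.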

Figure 2a: 0.1% Arabinose Induction

Figure 2b: 0% Arabinose Induction

Arabinose Concentration Sensitivity:

Interestingly, while arabinose acts as an inducer, concentrations beyond 0.01% adversely impact curli production. Both colorimetric and fluorescence measurements, taken in liquid and solid cultures, demonstrated this effect. We postulate that higher concentrations of arabinose compromise survivability and thereby reduce curli fiber production. In Figure 1, an inducer concentration of 0.01% produced the greatest binding of Congo Red to amyloid fibers, as measured by fluorescence. Additionally, Figure 2 shows that cultures grown on solid agar and induced with 0.1% arabinose grew markedly worse than the 0% arabinose controls. Moving forward, we used these data to establish effective induction levels for curli expression.
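Choosing the inducer concentration with the strongest Congo Red signal amounts to comparing mean fluorescence across concentrations. The sketch below assumes replicate readings keyed by arabinose percentage; the function name and every number in it are hypothetical and only mirror the qualitative trend described above.

```python
from statistics import mean


def best_inducer_concentration(readings):
    """Return the concentration (% arabinose) with the highest mean fluorescence.

    `readings` maps a concentration to a list of replicate fluorescence values.
    """
    if not readings:
        raise ValueError("no readings supplied")
    return max(readings, key=lambda conc: mean(readings[conc]))


# Hypothetical replicate readings (arbitrary units), not measured data.
placeholder_readings = {
    0.0: [1000.0, 1050.0],
    0.01: [2200.0, 2300.0],
    0.1: [1200.0, 1100.0],
}
print(best_inducer_concentration(placeholder_readings))
```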

Figure 3a: 37°C

Figure 3b: 30°C

Nissle 202 Strain – A Suitable Candidate:

Our experiments showed that the Nissle 202 strain is proficient in curli expression and can support biofilm growth at both 30°C and 37°C. As seen in the images of Figure 3, we have documented, for the first time to our knowledge, curli expression in cultures grown at 37°C. Because we observed minimal to no colorimetric difference between cultures grown at these two temperatures, we conclude that temperature within this 7°C range has no effect on curli expression level and likely does not affect biofilm growth. However, cultures grown at 37°C appear to have larger and more cohesive growth areas, possibly indicating a stronger metabolic burden of curli fibers at higher temperatures.

Figure 4: Comparison with cloning strains

Comparison with Cloning Strains:

Figure 4 shows a comparative analysis with E. coli DH5α, which revealed that the E. coli Nissle strain has superior curli fiber production in stationary growth phase on agar plates. This was determined by qualitative observation of colony size and color, indicating biofilm spread and accumulation of Congo Red dye, respectively. By color alone, the Nissle culture appears unmistakably scarlet red, while DH5α appears as a white to pinkish colony completely distinct from Nissle. This trend held regardless of whether pBbB8k-csg-amylase was present or whether arabinose induction was applied. Nissle's intrinsic ability to generate greater curli fiber densities could offer considerable advantages in applications involving stomach colonization.

Figure 5: Plate read data with curli

Curli Fibers Enhance Survival in Acidic Conditions:

The most promising result for our further engineering goals is the enhanced survivability that curli fibers offer in low-pH environments. In experiments comparing wild-type Nissle with curli-fiber-producing strains, the presence of curli fibers enabled survival at pH 3.5, whereas the wild-type Nissle strain could only endure down to pH 4. These results are reflected in the growth curves shown in Figure 5. To our surprise, overexpressing curli fibers aided acid resistance at this pH.
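One simple way to read a tolerance limit off plate reader data is to take the lowest pH at which the final optical density still clears a growth threshold. The sketch below assumes final OD600 values keyed by pH; the threshold of 0.2 and all readings are illustrative placeholders chosen to mirror the pattern described above, not values from Figure 5.

```python
def lowest_tolerated_ph(final_od_by_ph, growth_threshold=0.2):
    """Return the lowest pH whose final OD exceeds the threshold, or None."""
    growing = [ph for ph, od in final_od_by_ph.items() if od > growth_threshold]
    return min(growing) if growing else None


# Hypothetical final OD600 readings keyed by pH (not measured data).
placeholder_wild_type = {5.0: 0.90, 4.0: 0.60, 3.5: 0.05, 3.0: 0.04}
placeholder_curli = {5.0: 0.90, 4.0: 0.70, 3.5: 0.40, 3.0: 0.05}

print(lowest_tolerated_ph(placeholder_wild_type))
print(lowest_tolerated_ph(placeholder_curli))
```

The threshold is a judgment call; a stricter or looser cutoff can shift the reported limit, so it should be fixed before comparing strains.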

Figure 6: Plate read data with curli and GadE

csg and gadE Synergistic Effects:

Once the gadE construct was completed, we transformed it into Nissle 1917 and ran a plate assay testing survival across a range of pH values, with supplemented glutamate (since one of the major acid resistance pathways activated by gadE requires glutamate) and with curli, to compare the resistance offered by each. We also performed a cotransformation with both to see whether they exhibited synergistic effects. After the plate read was completed, we analyzed the data and determined that coexpressed GadE and curli exhibited a synergistic effect and allowed long-term survival at pH 2.0. From these data, we learned that supplementation with excess glutamate was required for this acid-resistant phenotype. We also learned that GadE in the absence of curli does not confer strong acid resistance. These results are reflected in Figure 6. Our next data-driven step is to test whether cells coexpressing curli and GadE can perform tasks such as conjugation or chemotaxis under highly acidic conditions.

Curli Discussion:

E. coli Nissle 1917 and its Modification for Enhanced Stomach Colonization

The Escherichia coli Nissle 1917 (EcN) strain has proven to possess therapeutic attributes, such as the capacity to compete with and exclude pathogenic bacteria in the gut and the ability to produce antimicrobial substances. Typically, EcN flourishes in the small intestine. However, to enable its colonization and thriving in the stomach, a strategic genetic modification is proposed.

Central to this initiative is the expression of curli fibers. These proteinaceous surface appendages, produced mainly by enteric bacteria, serve pivotal roles in bacterial physiology. Curli fibers enable cell-cell interactions and support bacterial adhesion to various surfaces. This adherence is integral to biofilm formation—a structured, protective community of bacterial cells enclosed in a self-produced polymeric matrix. Beyond their role in biofilm architecture, curli fibers enhance bacterial resilience against host defenses and offer resistance against external challenges, like acidity, desiccation, and antimicrobials.

In the unique environment of the stomach, characterized by its mucosal lining and acidic pH, curli fibers can offer a strategic advantage. The lower stomach lining or the antrum is a region where mucus is less acidic and thick, potentially allowing for bacterial colonization. Curli fibers, with their adhesive properties, can facilitate the binding of the modified Nissle strain to the epithelial cells of the antrum. This binding may offer a stable niche, enabling the bacteria to withstand stomach peristalsis and remain adhered to the antrum, thereby enhancing its colonization potential.

Furthermore, Helicobacter pylori is a bacterium known to colonize regions of the stomach, particularly in the protective mucus layer, leading to chronic gastritis and increasing the risk of ulcer disease and gastric cancer. The curli-enhanced EcN can be employed as a strategic tool to target these regions. By promoting co-aggregation or even competitive exclusion, the modified Nissle can interact with H. pylori, potentially disrupting its colonization or even reducing its numbers. This interaction could offer a novel therapeutic angle in managing H. pylori-related diseases, by utilizing the probiotic potential of Nissle in conjunction with its enhanced adhesive properties via curli fibers.

mCherry Binder Design

Summary

We designed 154,019 binders to the fluorescent protein mCherry over the course of our project. Every binder design went through the same pipeline: RFdiffusion to generate plausible backbone structures, then ProteinMPNN to generate sequences, and finally ESMFold to predict structure from sequence. We automated this pipeline and released the automation as our team’s software, EZBinder.

Our mCherry binder design proceeded in 5 runs: Initial, Partial Diffusion, Rescue, Recycle Rescue, and Recycle Partial Diffusion.

Initial: EZBinder outputs from 6IR2
Partial Diffusion: RFdiffusion partial diffusion using 6IR2 as the initial motif
Rescue: RFdiffusion partial diffusion using the highest pLDDT output from Initial as the initial motif
Recycle Rescue: RFdiffusion partial diffusion using the highest pLDDT output from Rescue as the initial motif
Recycle Partial Diffusion: RFdiffusion partial diffusion using the highest pLDDT output from Partial Diffusion as the initial motif
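The relationship between the five runs can be written down as a small parent map, which makes it easy to see how many rounds of partial diffusion separate a run from the 6IR2 input. The sketch below transcribes the list above; the names `RUN_PARENT` and `lineage` are ours for illustration and are not part of EZBinder.

```python
# Parent of each run, transcribed from the run descriptions above.
# "6IR2" is the starting structure named in the list.
RUN_PARENT = {
    "Initial": "6IR2",
    "Partial Diffusion": "6IR2",
    "Rescue": "Initial",
    "Recycle Rescue": "Rescue",
    "Recycle Partial Diffusion": "Partial Diffusion",
}


def lineage(run):
    """Return the chain of inputs from 6IR2 down to the given run."""
    if run not in RUN_PARENT:
        raise KeyError(f"unknown run: {run}")
    chain = [run]
    while chain[-1] in RUN_PARENT:
        chain.append(RUN_PARENT[chain[-1]])
    chain.reverse()
    return chain


for name in RUN_PARENT:
    print(" -> ".join(lineage(name)))
```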

Every run except for Initial used 15 diffuser timesteps in RFdiffusion and no additional weights, in contrast to the EZBinder defaults: potentials.guiding_potentials=[\"type:interface_ncontacts,weight:1,type:binder_ROG,weight:1\"] denoiser.noise_scale_ca=0.5 denoiser.noise_scale_frame=0.5.

ESMFold metrics

Each run of binders we generated produced plausible protein structures with very high confidence in several important computational metrics output by ESMFold. Every one of our runs outperformed the real nanobodies for mCherry in every metric except for the percentage of high-confidence pTM scores, which is biased by the different sample sizes.

pLDDT is a measure of local accuracy, i.e., how far ESMFold believes that its predicted residue location will differ from the real location. It ranges from 0 to 100, with 100 as the best confidence. The cutoff for very high pLDDT varies between sources; we chose 90 for our analysis. Every run generated thousands of potential binder candidates that met this criterion, despite the fact that pLDDT was by far our most demanding filter for binder designs. Interestingly, recycling binder designs inspired by an actual structure through our protocol (Partial Diffusion, Recycle Partial Diffusion) reduced the percentage of structures with very high pLDDT, whereas recycling binder designs completely hallucinated by RFdiffusion (Rescue, Recycle Rescue) increased the share of very high pLDDTs relative to the initial run. Successive recycling through the protocol (Recycle Partial Diffusion, Recycle Rescue) reduced both the top pLDDT and the share of very high pLDDTs relative to the parent runs. From these results we can conclude that successive recycles don’t improve the quality of the generated structure.
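The two quantities compared between runs above, the top pLDDT and the share of designs with very high pLDDT, can be computed per run as sketched below. This assumes one mean pLDDT value per design and treats the cutoff of 90 as inclusive, which is an assumption; the scores shown are placeholders rather than our results.

```python
def summarize_plddt(plddts, cutoff=90.0):
    """Return the count, top score, and share of scores at or above the cutoff."""
    if not plddts:
        raise ValueError("no pLDDT scores supplied")
    very_high = sum(1 for score in plddts if score >= cutoff)
    return {
        "n": len(plddts),
        "top": max(plddts),
        "share_very_high": very_high / len(plddts),
    }


# Hypothetical per-design mean pLDDT values for one run.
placeholder_scores = [95.1, 91.3, 88.0, 72.4]
print(summarize_plddt(placeholder_scores))
```

Reporting the share rather than the raw count keeps runs of different sizes comparable.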

PAE is predicted aligned error, i.e., how far ESMFold believes that a particular predicted residue location will differ when the entire predicted structure is aligned to the entire real structure. A PAE of less than 10 means that a design is a good candidate for wetlab experimentation, and lower values are better. PAE universally increases as binders are recycled through the protocol.

pTM is the predicted TM-score, a metric similar to PAE, but it takes a global view of the protein rather than a per-residue view like PAE and pLDDT. RFdiffusion recommends using cutoffs based on mean AlphaFold PAE, but ESMFold already reports pTM as a global metric, so we’ve included both values for completeness. The share of high-confidence (>0.7) pTM scores increases as binders are recycled through our protocol, even though the mean PAE also increases. PAE and pTM should carry the same information, with higher pTM accompanying lower PAE, but our results show the two metrics moving in opposite directions in terms of confidence. Future work that uses ESMFold as a drop-in replacement for AlphaFold should take this behavior into account.
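The relationship between the two global metrics can be quantified with a Pearson correlation over per-design mean PAE and pTM. The sketch below is a plain-Python implementation with hypothetical inputs; when reading the coefficient, keep the sign conventions in mind (lower PAE is better, higher pTM is better).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-design values, not taken from our spreadsheet.
placeholder_mean_pae = [4.0, 6.5, 9.0, 12.0]
placeholder_ptm = [0.85, 0.78, 0.70, 0.55]
print(round(pearson(placeholder_mean_pae, placeholder_ptm), 3))
```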

Docking Funnels

We took the highest pLDDT binder from each run and docked it to mCherry using the default RosettaDock docking protocol. To evaluate docking effectiveness, we plotted energy in REU (Rosetta Energy Units) against RMSD from the input structure (commonly referred to as a docking funnel). We evaluated the energy at the interface of the binder and target, as well as the total energy of the complexed structure. Relative y values should only be compared within a plot, and relative x values should be ignored. RMSD is a measure of deviation from the initial input structure, which we created by concatenating PDB files. A tight column of points indicates a single docking position, and the presence of many low-energy conformations indicates that the dock is good.
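The two visual cues described above, a tight column of points and many low-energy conformations, can be roughly summarized in numbers. The sketch below takes (RMSD, energy) pairs for the docked decoys and reports the lowest energy and the RMSD spread among the lowest-energy decoys. The choice of `top_k` and the use of standard deviation as the spread measure are illustrative assumptions, not part of RosettaDock, and the decoy values are placeholders.

```python
from statistics import pstdev


def funnel_summary(decoys, top_k=5):
    """Summarize a docking funnel from (rmsd, energy) pairs.

    Returns the minimum energy, the RMSD of that decoy, and the RMSD
    standard deviation among the top_k lowest-energy decoys.
    """
    if top_k < 1 or len(decoys) < top_k:
        raise ValueError("need at least top_k decoys and top_k >= 1")
    lowest = sorted(decoys, key=lambda decoy: decoy[1])[:top_k]
    rmsds = [rmsd for rmsd, _ in lowest]
    return {
        "min_energy": lowest[0][1],
        "rmsd_at_min": lowest[0][0],
        "rmsd_spread": pstdev(rmsds),
    }


# Hypothetical decoys as (RMSD, interface energy in REU).
placeholder_decoys = [
    (0.5, -25.0), (0.7, -24.0), (0.6, -23.5),
    (5.0, -10.0), (8.0, -2.0), (12.0, 3.0),
]
print(funnel_summary(placeholder_decoys, top_k=3))
```

A small `rmsd_spread` among the lowest-energy decoys corresponds to the tight column of points; a large one suggests several competing docking positions.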

Docking funnel of highest pLDDT mCherry Binders, complex energy

Docking funnel of highest pLDDT mCherry Binders, truncated to the region between 1200 and -1200. This plot uses the total complexed energy of the protein in REU (Rosetta Energy Units).

Docking funnel of highest pLDDT mCherry Binders, interface energy

Docking funnel of highest pLDDT mCherry Binders, truncated to the region between 0 and -30. This plot uses the interface energy of the protein in REU (Rosetta Energy Units).

We omitted Recycle Partial Diffusion from this graphic because none of its docked structures had a negative score, meaning that the complexed energy was predicted to be higher than the energy of the binder and monomer separately, so the binder does not work in silico. Recycle Partial Diffusion created a protein that was computationally predicted to be stable but not to dock, which makes it useless as a binder.

The energetics at the interface and for the entire complex (both docking funnels above) show the characteristic “funnel” behavior of a good dock. As RMSD from the input structure increases, the energy increases, and there are many structures that are slightly different from one another that are all low-energy.

Lam2 is a nanobody that has been experimentally shown to bind mCherry. The top binder from each run shows energetics comparable to this nanobody, and several have lower-energy conformations than the nanobody. Our best docked structure is from Recycle Partial Diffusion, which has a lower-energy conformation than the real nanobody and a tighter spread of RMSD, suggesting that the docked structure has even less “wobble” than the experimentally verified binder.

Data Availability

We have made the ESMFold scores for all of our generated PDB structures available on our wiki. We have archived all of the PDB structures as well, but the archive is too large to be hosted on the iGEM wiki. Please contact igem@uoregon.edu if you would like a copy of this archive. Each protein has a unique ID, the sha1sum of its amino acid sequence. This value is provided in the linked spreadsheet under the column seq_hash.
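The seq_hash identifier can be reproduced from a sequence with any SHA-1 implementation. The sketch below uses Python's `hashlib`; it assumes the hash was taken over the plain one-letter amino acid string with no trailing newline or header, which is an assumption about the convention rather than something stated above. The example sequence is a made-up fragment.

```python
import hashlib


def seq_hash(sequence: str) -> str:
    """SHA-1 hex digest of an amino acid sequence.

    Assumes the sequence is hashed exactly as given: one-letter codes,
    ASCII-encoded, with no trailing newline.
    """
    return hashlib.sha1(sequence.encode("ascii")).hexdigest()


# Hypothetical sequence fragment, used only to show the call.
print(seq_hash("MVSKGEEDNMA"))
```

If a computed digest does not match the spreadsheet, a trailing newline is the first thing to check, since piping text through `sha1sum` on the command line often includes one.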

FliG Binder Design

Another computational arm of the project focused on designing a binder to the E. coli FliG protein in order to direct chemotaxis toward our selected bacteria. The goal was to create an inhibitor that binds E. coli FliG so that the flagella rotate only counterclockwise, which would allow the cell to move toward the selected bacteria. Harshey et al. explored the requirement of YcgR for motility inhibition. YcgR is a c-di-GMP binding protein that binds FliG and FliM and biases the flagellar motor toward counterclockwise rotation¹. We created the inhibitor using computational modeling. Figure 3D of that article depicts which FliG residues alter motility when mutated. We used that information to choose the hotspot binding sites on FliG so that the inhibitor would change motility.

To begin, we retrieved the sequence for E. coli FliG from EcoCyc². The sequence was submitted to AlphaFold2 on Google Colaboratory³. The crystal structure of FliG from Thermotoga maritima (PDB ID 1LKV) was used as a template for AlphaFold2 to model E. coli FliG. Once the E. coli FliG model was generated, it was truncated to keep only the C-terminal domain. The E. coli FliG residues that Harshey et al. found to bind YcgR were used as the hotspots for the designed inhibitor. The PDB file was relaxed using FastRelax. The protein was named Eflig and was run through RFdiffusion to generate backbone structures for inhibitors of Eflig. It was initially run with no potentials and a denoiser scale of 0. The inhibitor backbones were run through ProteinMPNN with FastRelax to generate sequences. These were then run through AlphaFold2 to predict whether the proteins would adopt the designed conformation and bind at the desired location.

AlphaFold2 reported several metrics. The predicted local distance difference test (pLDDT) and predicted aligned error (PAE) of the designs were obtained using AlphaFold2. A high pLDDT score means that the designed protein is likely to actually adopt the predicted structure⁴. A low pLDDT score means that the designed protein may contain disordered regions and may not fold well. A pLDDT above 80 is considered high, and one below 50 is considered low. The PAE score estimates how confidently the binder is positioned relative to the target protein. A low PAE means that the predicted orientation and position of the binder are reliable, so the binder is likely to bind well; a PAE below 10 is considered good. A high PAE means that the binder's placement is uncertain, so it will most likely not bind at all.
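The cutoffs described above translate directly into two small helper functions. This is a sketch of the thresholds as stated (pLDDT above 80 high, below 50 low, PAE below 10 good); the label "intermediate" for the range in between and the function names are our own additions for illustration.

```python
def plddt_band(plddt):
    """Classify a pLDDT score using the cutoffs stated in the text."""
    if plddt > 80:
        return "high"
    if plddt < 50:
        return "low"
    return "intermediate"  # label for the 50-80 range is an assumption


def pae_is_good(pae):
    """True when the PAE is below the cutoff of 10 stated in the text."""
    return pae < 10


for score in (96.0, 70.0, 45.0):
    print(score, plddt_band(score))
for error in (5.0, 21.0):
    print(error, pae_is_good(error))
```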

Initially, in Round 1, the designs were run with no potentials and a denoiser scale of 0. These initial values were chosen as a baseline to see how the binders would turn out. There were 1362 binder designs made with these settings. The following figure shows the distribution of average pLDDT across these designs. The average pLDDT values fall between 60 and 74, meaning that the model has low confidence and the designs cannot be trusted to fold. The majority of the pLDDTs were around 70. The designs do have potential because all were above a pLDDT of 50, so none had extremely low values. From these data, however, we cannot claim that the binders will actually fold.

The PAE data for Round 1 is shown below. The average PAE was around 21, which is a high PAE. The lowest PAE was around 17, which is still a high PAE. The high PAE values signify that there is a low confidence that the binder and FliG protein would bind.

After looking at the data obtained from AlphaFold2, we had an idea of how to improve the EfliG designs. In RFdiffusion, adding potentials and a denoiser scale could improve the design of our specific proteins. The binder radius of gyration (ROG) potential steers how compact the binder should be, so adding it should improve the folding and design of the protein. We added an ROG weight of 0.5 and an interface_ncontacts weight of 1. The RFdiffusion GitHub notes that a denoiser scale of 0.5 improves binder design, so we used a denoiser scale of 0.5. These new potentials and denoiser scales were incorporated into our second round of design.
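For readers unfamiliar with the term, the radius of gyration is the root-mean-square distance of a set of atoms from their centroid, so a smaller value means a more compact chain. The sketch below computes it for unweighted coordinates (for example, one Cα position per residue); it illustrates the quantity itself and makes no claim about how RFdiffusion evaluates its ROG potential internally. The coordinates are toy values.

```python
from math import sqrt


def radius_of_gyration(coords):
    """Root-mean-square distance of points from their centroid.

    `coords` is a sequence of (x, y, z) tuples, all weighted equally.
    """
    n = len(coords)
    if n == 0:
        raise ValueError("no coordinates supplied")
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    squared = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    )
    return sqrt(squared / n)


# Toy coordinates in angstroms: a compact set and an extended one.
compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
print(radius_of_gyration(compact) < radius_of_gyration(extended))
```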

From these designs, we obtained AlphaFold2 data, shown in the figure below. The pLDDTs from Round 2 were higher than in Round 1 and more widely spread, averaging around 75. Many were above 80, meaning that many designs had the potential to actually fold into a working inhibitor: of 1793 designs, 153 had a pLDDT above 80 and 26 had a pLDDT above 90. Unlike Round 1, this round produced designs with real potential to fold.

The PAE data from Round 2 was also gathered and is shown in the figure below. The average PAE from this set is around 16, which is lower than the first round of design. There are also 70 binders with a PAE lower than 10, meaning that there is a high likelihood for those binders to bind to the target protein.

In order to have a protein that actually folds well and binds to the target protein, both the pLDDT and PAE values have to be good. Of the 1793 proteins from Round 2, 38 were selected based on pLDDT and PAE scores and further analyzed. The highest pLDDT was 96 and the lowest was 80. The lowest PAE score was 5 and the highest 10.
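Requiring both metrics to be good is a joint filter followed by a ranking. The sketch below keeps designs with pLDDT of at least 80 and PAE of at most 10, matching the ranges reported above, and then sorts by pLDDT (descending) with PAE as the tie-breaker. The sort order, the record layout, and the example designs are assumptions for illustration, not the exact procedure used to pick the 38 candidates.

```python
def select_candidates(designs, min_plddt=80.0, max_pae=10.0):
    """Keep designs passing both cutoffs, best pLDDT first, then lowest PAE."""
    passing = [
        d for d in designs
        if d["plddt"] >= min_plddt and d["pae"] <= max_pae
    ]
    return sorted(passing, key=lambda d: (-d["plddt"], d["pae"]))


# Hypothetical designs; ids and scores are placeholders.
placeholder_designs = [
    {"id": "design_c", "plddt": 85.0, "pae": 12.0},  # fails PAE cutoff
    {"id": "design_b", "plddt": 88.0, "pae": 7.0},
    {"id": "design_d", "plddt": 70.0, "pae": 8.0},   # fails pLDDT cutoff
    {"id": "design_a", "plddt": 95.0, "pae": 5.5},
]
for design in select_candidates(placeholder_designs):
    print(design["id"], design["plddt"], design["pae"])
```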

Of the 38 binders, six were selected based on pLDDT and PAE and ordered. The following table lists the pLDDT and PAE values of the binders ordered. The highest pLDDT value ordered was 96.39, which is very high, and the lowest was 89.98. The lowest PAE value was 5.12 and the highest was 6.3. Binders with a lower pLDDT tended to have a higher PAE.

References:

1. Paul, K.; Nieto, V.; Carlquist, W. C.; Blair, D. F.; Harshey, R. M. The C-Di-GMP Binding Protein YcgR Controls Flagellar Motor Direction and Speed to Affect Chemotaxis by a “Backstop Brake” Mechanism. Molecular Cell 2010, 38 (1), 128–139. https://doi.org/10.1016/j.molcel.2010.03.001.
2. Keseler, I. M.; Gama-Castro, S.; Mackie, A.; Billington, R.; Bonavides-Martínez, C.; Caspi, R.; Kothari, A.; Krummenacker, M.; Midford, P. E.; Muñiz-Rascado, L.; Ong, W. K.; Paley, S.; Santos-Zavaleta, A.; Subhraveti, P.; Tierrafría, V. H.; Wolfe, A. J.; Collado-Vides, J.; Paulsen, I. T.; Karp, P. D. The EcoCyc Database in 2021. Front Microbiol 2021, 12, 711077. https://doi.org/10.3389/fmicb.2021.711077. https://biocyc.org/gene?orgid=ECOLI&id=EG11654
3. Mirdita, M.; Schütze, K.; Moriwaki, Y.; Heo, L.; Ovchinnikov, S.; Steinegger, M. ColabFold: Making Protein Folding Accessible to All. Nat Methods 2022, 19 (6), 679–682. https://doi.org/10.1038/s41592-022-01488-1.
4. Varadi, M.; Anyango, S.; Deshpande, M.; Nair, S.; Natassia, C.; Yordanova, G.; Yuan, D.; Stroe, O.; Wood, G.; Laydon, A.; Žídek, A.; Green, T.; Tunyasuvunakool, K.; Petersen, S.; Jumper, J.; Clancy, E.; Green, R.; Vora, A.; Lutfi, M.; Figurnov, M.; Cowie, A.; Hobbs, N.; Kohli, P.; Kleywegt, G.; Birney, E.; Hassabis, D.; Velankar, S. AlphaFold Protein Structure Database: Massively Expanding the Structural Coverage of Protein-Sequence Space with High-Accuracy Models. Nucleic Acids Res 2021, 50 (D1), D439–D444. https://doi.org/10.1093/nar/gkab1061.