Rational Protein Design

To prevent self-assembly of the displayed mFadA, we need to delete the unrelated structural domains and retain only the functional domains that bind to FadA.

We truncated mFadA at N-terminal positions between 20 and 30 and obtained 10 truncated fragments, which we nominated as candidate “B-domains”. Each candidate was docked with an mFadA monomer to verify its ability to target mFadA. The evaluation index was the interface score (I_sc).

Figure 4. Truncate mFadA at different positions to divide into two function domains.

We used Rosetta local_docking to dock each of the 10 B-domains with mFadA (nstruct=10000 for each), took the smallest I_sc from each run, and repeated the procedure 3 times.
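
The selection rule described here (keep the lowest I_sc from each docking run, then repeat) can be sketched in Python. This is a minimal illustration, not the team's actual script. It assumes a Rosetta-style score file in which every data line starts with `SCORE:`, the first such line names the columns, and one column is called `I_sc`. The toy numbers, the decoy names and the helper names are hypothetical, and averaging the three minima is only one possible way to summarise the repeats; the text does not specify it.

```python
from statistics import mean


def parse_column(score_text, column="I_sc"):
    """Collect one numeric column from Rosetta-style score-file text.

    Assumed layout: lines starting with "SCORE:"; the first one is the header.
    """
    header = None
    values = []
    for line in score_text.splitlines():
        if not line.startswith("SCORE:"):
            continue  # skip SEQUENCE: and any other lines
        fields = line.split()[1:]
        if header is None:
            header = fields  # first SCORE: line holds the column names
            continue
        row = dict(zip(header, fields))
        values.append(float(row[column]))
    return values


def best_interface_score(score_text):
    """Lowest (most favorable) I_sc among all decoys of one docking run."""
    return min(parse_column(score_text, "I_sc"))


# Toy score files for three repeats of one B-domain (made-up values).
RUNS = [
    "SEQUENCE:\nSCORE: total_score I_sc description\n"
    "SCORE: -310.2 -5.2 cut27_0001\nSCORE: -312.8 -7.9 cut27_0002\n",
    "SEQUENCE:\nSCORE: total_score I_sc description\n"
    "SCORE: -311.0 -8.3 cut27_0001\nSCORE: -308.7 -4.0 cut27_0002\n",
    "SEQUENCE:\nSCORE: total_score I_sc description\n"
    "SCORE: -309.9 -7.6 cut27_0001\nSCORE: -310.4 -7.0 cut27_0002\n",
]

best_per_repeat = [best_interface_score(run) for run in RUNS]
summary = mean(best_per_repeat)  # assumption: repeats are summarised by their mean
print(best_per_repeat, round(summary, 3))
```

Keeping the parser independent of column order means the same function would still work if the real score file carries additional columns before or after `I_sc`.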

Figure 5. Analysis of docking results with different B-domains.
a, the binding energy of different B-domains and mFadA.
b, the arginine has no contacts with other residues.

As we expected, the longer the retained fragment, the higher the binding energy (Fig. 5a). Among the fragments, cut28 and cut29 have the higher binding energies. However, the largest contribution to the score comes from Arg27 (Figure 5b), which is not buried in the centre of the active site and will not contribute to the self-assembly of the pilus. More importantly, we examined the micro-interactions at the centre of the active site (Figure 5d): the primary and secondary hydrophobic interactions make the major contribution to pilus self-assembly. The leucine chain consisting of L7, L11, L14, L21, L53, L76 and L84 plays the main hydrophobic binding role, and a shell of salt bridges (ionic interactions) wraps the hydrophobic centre to form a strong and stable bond.

Taking into account the docking results above and the micro-scale interactions we identified, we ultimately chose a truncated 27-amino-acid peptide from the N-terminus of mFadA as the core component for targeting Fn, which we call the “B-domain”.

Figure 5-1. Microscopic interactions derived from dry experimental results.

Learn from us:

Utilizing dry experiments to guide the design of wet experiments is common in a variety of synthetic biology work. Like our work above, dry experiments are an excellent aid. Science is not arbitrary or idealistic; it pursues a series of rigorous logical chains. We have taken this into account and have carefully explored where the best place to cut is, rather than making arbitrary decisions and conducting wet experiments. We hope you can glean some insights from it.

Design of Fishing Rod Protein
We sequentially designed fishing rod proteins for E. coli and BL. We fused them with Lpp-OmpA, INP, OmpC, GL-BP, and our optimized GL-BP (violet) using a flexible linker (yellow) between them. Structural predictions using AlphaFold2 revealed that B-domain directly participated in the folding of Lpp-OmpA and OmpC, resulting in the loss of B-domain's targeting ability. INP and GL-BP, due to the flexibility of the linker, had their B-domains enclosed by membrane proteins, leading to a reduction in their targeting capabilities. Consequently, we carried out optimization design for GL-BP.

Figure 6. Structure prediction of different fishing rod proteins. a. Surface display with Lpp-OmpA; b. Surface display with INP; c. Surface display with OmpC; d. Surface display with wild-type GL-BP; e. Surface display with GL-BP (optimized)

We initially used the wild-type GL-BP as the membrane protein (green) and designed the fishing rod protein by adding a (GSGSGSG)5 flexible hydrophilic linker (yellow) at the N-terminal end, along with the pilus fragment (orange). We hoped that this signal peptide-membrane protein-linker-pilus fragment “fishing rod protein” would participate in pilus self-assembly, thereby achieving “bacterium-bacterium targeting.” However, GL-BP has a groove that consistently attracts the B-domain, diminishing its targeting capability.
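
The domain order named in this paragraph (signal peptide, membrane protein, linker, pilus fragment) can be written down as a small assembly helper that also records where each domain starts and ends, which is useful when residue numbers in the fusion protein are discussed. This is only a sketch: the three protein sequences are short placeholders rather than the real sequences, and the helper name `assemble` is hypothetical. Only the (GSGSGSG)5 linker is taken from the text.

```python
def assemble(parts):
    """Concatenate (name, sequence) pairs.

    Returns the full sequence and a dict of 1-based inclusive (start, end)
    spans for each named domain.
    """
    sequence = ""
    spans = {}
    for name, domain_seq in parts:
        start = len(sequence) + 1
        sequence += domain_seq
        spans[name] = (start, len(sequence))
    return sequence, spans


LINKER = "GSGSGSG" * 5  # (GSGSGSG)5, 35 residues

# Placeholder sequences (NOT the real signal peptide, GL-BP or B-domain).
parts = [
    ("signal_peptide", "MKK"),
    ("membrane_protein", "ACDEFGHIKL"),
    ("linker", LINKER),
    ("pilus_fragment", "MNPQRS"),
]

fishing_rod, spans = assemble(parts)
for name, (start, end) in spans.items():
    print(f"{name}: residues {start}-{end}")
```

With the real sequences substituted in, the same span table would show which domain a given residue number belongs to.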

Figure 7. B-domain was bound by a groove of GL-BP

Upon examining the binding interactions between the pilus fragment and the GLBP protein, we identified three binding sites. Among these, the ionic interaction between the 96th lysine residue of GLBP and the 487th aspartic acid residue was the most pronounced, leading us to consider the 487th aspartic acid as the “culprit”. Additionally, the asparagine at position 430 and the phenylalanine at position 431 were found to participate in the binding, but we do not believe they play a pivotal role. Consequently, we attempted to mutate 96K, replacing it with E, A, or V, in order to disrupt this ionic interaction.

Figure 8. Interactions between the groove and B-domain

We individually engineered the 96E, 96V, and 96A mutant variants. Structural predictions revealed that only the 96A variant still exhibited binding between the pilus fragment and GL-BP. In contrast, the 96V and 96E variants successfully released the pilus fragment from the groove, preserving its activity.

Figure 9. Structure prediction on mutations of position 96.

While the structural predictions for the mutant variants aligned with our expectations, we remained curious about the impact of the asparagine at position 430 and the phenylalanine at position 431 on the folding of the wild-type fishing rod protein. Since positions 430 and 431 were both involved in binding, in addition to the ionic interaction formed at position 96, we attempted similar mutations at 430 and 431 and systematically combined mutations at these three positions. In this way we constructed eight additional mutant variants and ran structural predictions on them.
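
Systematically combining substitutions at several positions amounts to taking a Cartesian product of the per-position choices. The sketch below illustrates this. The wild-type residues (K96, N430, F431) and the substitutions at position 96 (E, A, V) come from the text; the alanine substitutions at 430 and 431 are placeholders, because the text does not say which residues were used there. The enumeration therefore does not reproduce the team's exact set of eight variants.

```python
from itertools import product

WILD_TYPE = {96: "K", 430: "N", 431: "F"}

# Position 96 choices are from the text; 430/431 choices are hypothetical.
SUBSTITUTIONS = {96: ["E", "A", "V"], 430: ["A"], 431: ["A"]}


def enumerate_variants(wild_type, substitutions):
    """List every combination of substitutions, excluding the wild type.

    Each variant is named like "K96E/N430A", listing only changed positions.
    """
    positions = sorted(wild_type)
    # At each position: keep the wild-type residue or pick one substitution.
    options = [[wild_type[p]] + substitutions[p] for p in positions]
    variants = []
    for combo in product(*options):
        changes = [
            f"{wild_type[p]}{p}{residue}"
            for p, residue in zip(positions, combo)
            if residue != wild_type[p]
        ]
        if changes:  # an empty list means the unmodified wild type
            variants.append("/".join(changes))
    return variants


variants = enumerate_variants(WILD_TYPE, SUBSTITUTIONS)
print(len(variants))
print(variants)
```

A subset of such an enumeration can then be chosen for structure prediction when predicting every combination is too costly.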

Figure 10. The mutations we built and the resulting condition of the B-domain

From the graph, it can be observed that the RMSD values of the mutants are quite close, indicating that the mutations have a relatively minor impact on the proper folding of GL-BP. Additionally, except for rod7 and rod9, the B-domain of every fishing rod protein folded successfully and was not enclosed by GL-BP.

Figure 11. Results of structure prediction. a. RMSD of FRPs; b. the condition of B-domain and GL-BP

Although this portion of the work provided some structural and functional guidance for wet experiments, we believed that more rigorous validation, such as molecular dynamics simulations or extensive wet experiments with GL-BP mutants, was still needed to demonstrate GL-BP's performance more conclusively. Consequently, in the final wet experiments, we abandoned the surface display systems of Lpp-OmpA, INP, and OmpC, opting instead to use the wild-type GL-BP as the membrane protein for surface display.

Learn from us:

Dry lab experiments, like wet lab experiments, are subject to various serendipitous occurrences. When a phenomenon arises, we should not hastily assume, impose rigid biases, or draw conclusions. Instead, we should critically analyze the underlying reasons behind the observed phenomenon and persistently repeat trials, ultimately uncovering the natural laws.

Fusion protein design
To validate the feasibility of FK-13-based fusion protein design and to ensure that the cytotoxic activity of FK-13 remains intact, we constructed four different sequences, using LinkerG as a flexible linker and LinkerA and LinkerB as rigid linkers.

Figure 12. Sequence with different linker and FK-13

The structures of the four sequences were predicted using AlphaFold2, and the results are as follows.

Figure 13. Structure predictions of FK-13 with different linkers. a: High rigidity of FK-13; b: The flexible linker peptide still maintained flexibility with FK-13; c-d: EAAAK on the C-terminus would impair the anticancer activity of FK-13, and reversing the linker peptide would avoid this effect.

Further, we constructed the fusion proteins BTP and TTP using both flexible and rigid linker peptides. The sequences are as follows:

Figure 14. Sequence of HlpA, BTP and TTP. Note: the full sequence of HlpA can be found in the Registry: BBa_K4990007

First, we started with individual HlpA monomers and homodimers. Next, we used flexible linker peptides, forward and reverse rigid linker peptides, and designed fusion proteins both at the N-terminus and C-terminus of HlpA. Additionally, to ensure that the cytotoxic domain and the targeting domain are oriented in different directions, we designed an additional β-turn added to the N-terminus of HlpA, allowing the cytotoxic peptide to be adequately separated from the targeting structural domain.

Figure 14-1. Structure prediction of TTP and BTP

Learn from us:

Do you know why we predicted the HlpA homodimer? LZU_2022 used surface display of HlpA monomers for targeting CRC, so we also considered HlpA monomers as functional segments for targeting CRC. The structure of the HlpA monomer is intricate, and the chains of the dimer are tightly intertwined. We were skeptical about whether split HlpA chains could still self-assemble into dimers. However, in the gel electrophoresis of the dual-targeting peptides, we observed remarkably clear dimerization, which we had not anticipated. While in awe of nature's wonders, we immediately ran dry lab simulations based on this wet lab feedback. Using Rosetta local_docking with -nstruct=10000 on the A and B chains of the HlpA homodimer, we obtained a binding energy beyond our expectations, with an I_sc of -147.573 REU. This far exceeded the values we had previously obtained for binding, adhesion, and targeting, providing strong evidence for the tendency of HlpA to form dimers. This is a successful case of dry lab experiments guiding wet lab designs, with wet lab feedback in turn shaping dry lab work.

Fig 14-2. Interaction of DEH and heparin. A. DEH monomers are typically observed at 17kDa (yellow squares) while dimers are observed at 35kDa (red squares). Breakage of the linker and dimerization of HlpA may form a DEH-HlpA complex with a molecular weight of approximately 28kDa (blue squares). B. The structure of DEH monomer (~15kDa, or ~17kDa with purification tag) C. The structure of DEH dimer (~30kDa, or ~35kDa with purification tag) D. The dimerization of HlpA and the interaction between the dimer and heparin.

Whole-cell lysate from cells expressing the HlpA-EGFP fusion protein was used. To detach the tumor cells and allow full interaction between them and the fusion protein, the cells were treated with trypsin. The trypsin was washed off with PBS, and the cells were then co-incubated with the lysate for 6 hours. Following the incubation, the sample was resuspended in PBS and centrifuged three times to remove weakly bound fusion protein. The results indicate that the HlpA-EGFP fusion protein accumulates abundantly on the surface of tumour cells, emitting intense green fluorescence.

Moreover, we observed an increase in cell clusters under white light, possibly due to HlpA fragment dimerization and adhesion to the tumour cell surface. This provides further evidence of HlpA's dimerization ability.

In simple terms, the dual-targeting peptide is assembled from the best BTP and TTP designed in the previous design cycle. It consists of five parts: 1. B-domain for targeting Fn. 2. Linker A for separating functional domains. 3. FK-13 for targeting Fn and CRC. 4. Linker B, which not only separates the functional domains but also ensures the positive center of FK-13. 5. HlpA monomer for targeting CRC. These segments have been individually uploaded as parts to the Registry, and you can find detailed descriptions of them in the contributions.

Figure 14-3. Validation of the targeting ability of HlpA
Figure 15. Domains of DEH

Two DEH sequences were constructed sequentially and folded using AlphaFold2.

The best DEH was subjected to wet lab characterization.

Figure 16. Sequence of DEH optimization.

Figure 17. Structure predictions of DEHs

It can be seen that the complete B-domain (①) cannot form a complete α-helix. Therefore, the optimized DEH had the C-terminal five amino acids removed from the B-domain to ensure the stable formation of the α-helix at the N-terminus of DEH. As shown in ④, the added GPNG sequence once again played its role, directing the Fn-targeting segment downward and the CRC-targeting functional domain upward, effectively achieving "function domain isolation".

We find that fusion cytolytic peptides with such separated structural domains possess better targeting and cytotoxic activities, which will be advantageous for future synthetic biology modifications. Therefore, this optimized DEH was selected as our final targeting cytotoxic peptide.

Learn from us:

We recommend that future protein designers, when creating fusion proteins, always keep "function domain isolation" in mind. The methods we have employed include changing the types and lengths of linkers, altering the order of fusion protein splicing, and modifying certain amino acids to achieve "function domain isolation." No one desires a protein with a functional surface that results in "1+1 > or < 2." Only when "1+1=2" is achieved, can it be considered the most controllable and predictable outcome. Synthetic biology is akin to building with blocks, and we hope that these functional modules can be stacked like building blocks, each performing its designated role, which has always been our ultimate goal.

mFadA rational redesign
Though mFadA demonstrated outstanding adhesive stability in model predictions, we still aspired for it to possess superior self-assembly and adhesive capabilities. Consequently, the CPU-China team decided to modify the wild-type mFadA monomer, hoping that the modified mFadA monomer's B-chain (the longer orange chain in Figure 3.) would achieve a lower energy when binding with the A-chain (the turn of the purple chain in Figure 3.) of the wild-type mFadA monomer. In essence, the modified mFadA protein exhibits enhanced stability when adhering to the wild-type mFadA, bolstering the engineered bacterium's ability to target colorectal cancer cells.

To actualize our modifications, we undertook the following steps:

1. Initially, we used AlphaFold2 to fold two adhering mFadA proteins into their optimal conformation, setting "template_mode" to pdb and taking the top-ranked of the five relaxed models as the lowest-energy docking conformation.
2. The lowest-energy docking conformation was uploaded to the Rosetta Online Server that Includes Everyone (ROSIE). We used its stabilize-pm feature to perform saturation mutagenesis on only the mFadA segment displayed by the engineered bacterium (an asymmetric mutation scheme), selecting mutations that reduce the overall energy.
3. The mFadA protein displayed by the engineered bacterium was subjected to the mutations generated in step 2. Using Google Colab, the modified protein and the wild-type protein were folded again via AlphaFold2 to obtain the optimal docking conformation, resulting in a design of 35 mutants in total.
4. The 35 mutants were uploaded to ROSIE, and the score feature was used to evaluate the docking results. We compared the total scores of the mutant docking results with that of the wild-type docking result; a lower total score indicates a more favorable interaction. Based on the output total scores, we ranked the 36 dimers (the 35 mutants plus the wild type), selecting the 5 dimers with the lowest scores.
5. The 5 mutated dimers obtained in step 4 underwent molecular dynamics simulation in GROMACS, where their binding energies were calculated.
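
Two bookkeeping parts of the steps above, applying point mutations to a sequence (step 3) and keeping the lowest-scoring dimers (step 4), can be sketched as follows. This is an illustration under stated assumptions: mutations are written in the notation used on this page (for example `9G→R`, meaning position 9, G replaced by R), the 20-residue sequence is a placeholder chosen only so that the listed positions match, and the scores are made-up numbers. The AlphaFold2, ROSIE and GROMACS steps themselves are not reproduced here.

```python
def apply_mutations(sequence, mutations):
    """Apply point mutations written like "9G→R" (1-based positions)."""
    residues = list(sequence)
    for mutation in mutations:
        site, new_residue = mutation.split("→")
        position, old_residue = int(site[:-1]), site[-1]
        if residues[position - 1] != old_residue:
            raise ValueError(f"expected {old_residue} at position {position}")
        residues[position - 1] = new_residue
    return "".join(residues)


def select_lowest(scores, n=5):
    """Names of the n entries with the lowest (most favorable) scores."""
    return sorted(scores, key=scores.get)[:n]


# Placeholder sequence (NOT mFadA): S at 6, G at 9, E at 10, A elsewhere.
PLACEHOLDER = "AAAAASAAGEAAAAAAAAAA"
mutant = apply_mutations(PLACEHOLDER, ["9G→R", "10E→L", "19A→R"])
print(mutant)

# Made-up total scores for a handful of dimers.
TOY_SCORES = {
    "wild_type": -95.0, "m1": -104.5, "m2": -99.0, "m3": -107.2,
    "m4": -101.3, "m5": -103.0, "m6": -98.4,
}
top_five = select_lowest(TOY_SCORES, n=5)
print(top_five)
```

The check on the original residue guards against off-by-one errors when mutation lists are copied between tools that number residues differently.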

Figure 19. Saturation mutagenesis of the mFadA homodimer. a. positions 1-20; b. positions 21-40; c. positions 41-60; d. positions 61-80; e. positions 81-99; f. positions 100-111

The mutants obtained are as follows:

Mutant 1: 9G→R (I_sc=-494.085)
Mutant 2: 10E→L (I_sc=-497.952)
Mutant 3: 9G→K (I_sc=-494.072)
Mutant 4: 6S→Q, 9G→R (I_sc=-495.48)
Mutant 5: 9G→R, 10E→L, 19A→R (I_sc=-498.652)
Benchmark: Wild-type (I_sc=-479.283)
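
To make the comparison with the benchmark explicit, the difference between each listed score and the wild-type score can be computed directly. The sketch below uses only the six values listed above; the variable names are arbitrary, and a more negative difference means a lower score than the wild type.

```python
WILD_TYPE_SCORE = -479.283

MUTANT_SCORES = {
    "Mutant 1": -494.085,  # 9G→R
    "Mutant 2": -497.952,  # 10E→L
    "Mutant 3": -494.072,  # 9G→K
    "Mutant 4": -495.48,   # 6S→Q, 9G→R
    "Mutant 5": -498.652,  # 9G→R, 10E→L, 19A→R
}

# Difference relative to the wild-type benchmark, rounded to 3 decimals.
DELTAS = {
    name: round(score - WILD_TYPE_SCORE, 3)
    for name, score in MUTANT_SCORES.items()
}

# Rank from the largest improvement (most negative difference) downward.
RANKED = sorted(DELTAS, key=DELTAS.get)

for name in RANKED:
    print(f"{name}: {DELTAS[name]:+.3f}")
```

On these values, Mutant 5 shows the largest decrease relative to the wild type (-19.369) and Mutant 3 the smallest (-14.789).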

Figure 5. Mutants with the lowest total scores. The legend entries A, B, C, D, E correspond to mutants 1, 2, 3, 4, 5.

As a simple, efficient protein that forms stable connections, mFadA is well placed to be useful in future applications such as tumor targeting, bacterial adhesion, and bacterial localization. Our ambitions for mFadA modification therefore extend far beyond our current achievements. At present, our modifications to the mFadA protein are confined to the first 20 amino acids, which are capable of producing strong interactions, and we have only conducted saturation mutagenesis for these 20 positions. In future research, we aim to modify other amino acid residues, in the hope of deriving even more stable complexes between mutant and wild-type mFadA from alternative sites. We also plan to elongate the leucine zipper of the mFadA protein, aiming to achieve more stable aggregates through enhanced hydrophobic interactions between amino acids. We look forward to the forthcoming experimental successes of the CPU-China team and anticipate that the modified mFadA will offer even more exciting surprises and applications.

Figure n. The scores of mutations