Purpose & Methods

Protein modeling tools such as AlphaFold 2 (AF2) can streamline the design of DNA parts encoding complex fusion proteins. With near-crystallographic accuracy, AF2 is the top protein structure prediction tool, capable of generating models for most protein types (Jumper et al., 2021). ColabFold, an open-source, cloud-based version of AF2, provides researchers with free, user-friendly access to the AF2 software (Mirdita et al., 2022). Given these benefits, any team can efficiently model the 3-dimensional (3D) structure of a fusion protein. Many different sequences can therefore be tested beforehand, greatly reducing part assembly optimization time. In our case, we used ColabFold to visualize the distance and orientation of domains, the impact of various linker types, and the overall probability of folding.
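As a rough illustration of the domain distance measurement mentioned above, the sketch below computes the separation between the geometric centers of two domains from their alpha-carbon coordinates. The function names, the residue ranges, and the toy coordinates are all assumptions made for this example and are not taken from our sensor models.

```python
import math

def centroid(points):
    """Geometric center of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def domain_distance(ca_coords, range_a, range_b):
    """Distance between the centroids of two domains.

    ca_coords maps residue number -> (x, y, z) of the alpha carbon.
    range_a and range_b are inclusive (first, last) residue ranges.
    The result is in the units of the input (normally angstroms).
    """
    def select(first, last):
        return [xyz for res, xyz in ca_coords.items() if first <= res <= last]
    return math.dist(centroid(select(*range_a)), centroid(select(*range_b)))

# Toy coordinates (hypothetical): two residues per "domain".
toy_ca = {
    1: (0.0, 0.0, 0.0),
    2: (2.0, 0.0, 0.0),
    3: (0.0, 3.0, 4.0),
    4: (2.0, 3.0, 4.0),
}
separation = domain_distance(toy_ca, (1, 2), (3, 4))
print(f"Centroid separation: {separation:.1f}")
```

A centroid distance ignores domain orientation, so it is best read alongside the rendered 3D model rather than as a replacement for it.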

Our team relied on ColabFold, which can be found here. Note that this link is for the developer’s GitHub page and will not take you directly to their Google Colab notebook. However, all of their notebooks are accessible through the GitHub page. We used the Google Colab notebook labeled “AlphaFold2_mmseqs2” for all AF2 predictions. The notebook was run with “num_relax” set to 1 and “template_mode” set to pdb100 (Protein Data Bank); the remaining settings were unchanged. Structure assessment, beyond that provided by ColabFold, was performed using the Swiss-Model web server found here. The “structure assessment” page is under the “tools” tab. The default settings were used for all of the assessments.

In the field of protein structure modeling, multiple sequence alignments (MSAs) are commonly used to obtain the homologous sequences required for structure predictions. AF2 is no exception and uses a variety of databases to obtain protein sequence information (Jumper et al., 2021; Mirdita et al., 2022). While AF2 utilizes MSAs to build an initial framework, they are not required for further refinement of that model (Jumper et al., 2021; Mirdita et al., 2022).

The predicted Local Distance Difference Test (pLDDT) score is AF2's per-residue confidence estimate, reported on a scale from 0 to 100. It predicts how well the local environment of each residue would agree with an experimentally determined structure under the lDDT metric (Mariani et al., 2013; Jumper et al., 2021). The higher the pLDDT of a region, the higher the expected quality of the model in that region. QMEAN Z-scores estimate model quality by comparing the input structure with experimental structures of similar size, along five metrics explained in Benkert et al. (2011).
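To make the per-residue scores easier to compare between domains, the sketch below reads pLDDT values from a predicted model and averages them over named residue ranges. It relies on the fact that AF2 and ColabFold write pLDDT into the B-factor column of their PDB output. The function names, the domain boundaries, and the four-residue toy model are assumptions made for illustration only.

```python
from statistics import mean

def plddt_per_residue(pdb_text):
    """Return {residue_number: pLDDT} read from the B-factor column of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # PDB fixed columns: atom name 13-16, residue number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores

def mean_plddt_by_domain(scores, domains):
    """Average pLDDT over each named, inclusive (first, last) residue range."""
    return {
        name: mean(s for res, s in scores.items() if first <= res <= last)
        for name, (first, last) in domains.items()
    }

def _toy_ca_line(serial, resnum, plddt):
    """Build one correctly spaced CA record for the demonstration below."""
    return (f"ATOM  {serial:5d}  CA  ALA A{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")

# Hypothetical four-residue model: two confident residues, two uncertain ones.
toy_pdb = "\n".join(
    _toy_ca_line(i, i, score)
    for i, score in enumerate([92.0, 88.0, 35.0, 45.0], start=1)
)
domains = {"folded_domain": (1, 2), "linker": (3, 4)}  # assumed boundaries
averages = mean_plddt_by_domain(plddt_per_residue(toy_pdb), domains)
print(averages)
```

For a real model, the contents of the ranked PDB file would replace `toy_pdb`, and the residue ranges would need to match the actual domain boundaries of the construct.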

MolProbity is another model quality assessment tool, focused on identifying irregular backbone and side-chain conformations throughout the input structure (Chen et al., 2009). For example, residues can be too close together or improperly oriented. Generally, higher overall MolProbity scores indicate lower-quality models (Chen et al., 2009). Ramachandran plots, which contribute to the MolProbity score, group residues based on secondary structure and conformational energetics. Dark green regions represent optimal residue conformations for a particular secondary structure. Poorly positioned residues fall in the lighter green or white regions of the plot.
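The two axes of a Ramachandran plot are the backbone dihedral angles phi and psi. The sketch below shows one common way to compute a signed dihedral from four atom positions and to apply it to the backbone atoms of a residue. It is a generic geometric calculation, not the code used by Swiss-Model or MolProbity, and the coordinates in the example are toy values chosen so the angles are easy to check by hand.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond (cis = 0, trans = 180)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def phi_psi(prev_c, n, ca, c, next_n):
    """phi = C(i-1)-N-CA-C and psi = N-CA-C-N(i+1) for residue i."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)

# Toy geometry (not realistic bond lengths): phi is 90 degrees, psi is trans.
phi, psi = phi_psi((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 2))
print(f"phi = {phi:.1f}, |psi| = {abs(psi):.1f}")
```

Plotting psi against phi for every residue gives the scatter that the assessment server overlays on its favored and allowed regions.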

Cameleon-Q-Sensor

Figure 1. GIF of Cam-Q-Sensor. The AF2 prediction of our Cameleon-Q-Sensor. mScarlet is red, Cameleon is green, and ShadowR is purple. The N-terminus begins with mScarlet and the C-terminus ends with ShadowR. This image was generated using PyMOL.

During the early stages of our project we performed in-silico analysis of de-novo designed anti-Fusarium nanobodies, a process eventually replaced with library screening. Unfortunately, in-silico design often ends in failure, meaning a functional sensor was likely out of our reach. However, by synthesizing an analogous system, the Cameleon-Q-Sensor (Figure 1), we could still characterize our chromogenic proteins. The Calmodulin domain placed between mScarlet and ShadowR was taken directly from a class of calcium-sensing biosensors called Cameleons. We had hoped to perform calcium titrations with purified versions of the sensor, allowing us to characterize the quenching interaction between mScarlet and ShadowR. However, numerous challenges prevented this from occurring. Despite our difficulties, this sensor can be a valuable system for teams testing similar types of quenching interactions.

Figure 2. Cam-Q-Sensor Modeling Results. A. Multiple Sequence Alignments (MSAs) for the input sequence. Plots are generated automatically during the AF2 workflow. B. A per-residue confidence score using the Predicted Local Distance Difference Test (pLDDT). Plots are generated automatically during the AF2 workflow. C & D. Another pLDDT, QMEANDisCo, and QMEAN Z-Scores were obtained automatically during the structure assessment process. E-I. MolProbity scores and the accompanying Ramachandran plots were also generated automatically during the structure assessment process. The images in this figure were partly generated using BioRender.

Figure 2A shows query coverage, colored by the percent identity of the MSA reference sequences to the input sequence. Interestingly, only the Calmodulin domain shows a spike in the number of MSA references, and these have poor average similarity, implying reduced prediction accuracy for that region. Due to the engineered nature of this domain, reduced homology with natural Calmodulin proteins is expected. The large number of reference sequences (30,000) and the poor average similarity of those sequences can be explained by AF2’s use of partial homologies to improve accuracy. Conversely, mScarlet and ShadowR have easily obtainable alignments and require fewer reference sequences due to their variety of PDB accessions and similarity to GFP, respectively. Figure 2A illustrates this through the absence of peaks. Figures 2B and C show pLDDT scores from AF2 and Swiss-Model, respectively. Both show acceptable quality for the mScarlet and ShadowR domains, although the scores in Figure 2B are noticeably higher. This is conceivably due to pLDDT’s reliance on MSAs and the different reference sequences used by AF2 and Swiss-Model: different reference sequences in the MSAs would cause a change in the pLDDT values. Figure 2D shows QMEAN Z-scores, all of which are within tolerable limits.
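The two quantities shown in an MSA coverage plot can be sketched in a few lines. The example below counts, for each position of the query, how many aligned sequences cover it, and computes the identity of each hit to the query as a fraction of the full query length. The identity definition, the function names, and the three toy sequences are assumptions for illustration; the ColabFold plotting code may differ in detail.

```python
def msa_coverage(query, hits, gap="-"):
    """Number of hits with a residue (not a gap) at each query position."""
    coverage = [0] * len(query)
    for hit in hits:
        for i, residue in enumerate(hit):
            if residue != gap:
                coverage[i] += 1
    return coverage

def identity_to_query(query, hit):
    """Fraction of query positions where the hit carries the same residue."""
    matches = sum(1 for q, h in zip(query, hit) if q == h)
    return matches / len(query)

# Toy alignment (hypothetical): every row is already aligned to the query length.
query = "MKTAYIAK"
hits = [
    "MKTAYIAK",  # the query itself
    "MKSA----",  # partial homolog covering the N-terminal half
    "--TAYLAK",  # partial homolog covering the C-terminal part
]
coverage = msa_coverage(query, hits)
identities = [identity_to_query(query, h) for h in hits]
print(coverage)
print(identities)
```

A region with many hits but low identities, as described for the Calmodulin domain, would show up here as high coverage counts paired with small identity fractions.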

Figures 2E-I show the MolProbity score and Ramachandran plots. The bad angles score in Figure 2E is potentially a cause for concern. Swiss-Model classifies any angle as bad if it is more than four standard deviations from its ideal value. Although these residues are not a substantial proportion of the overall structure, they may severely hamper the folding process. Despite AF2’s automatic relaxation through a brief Amber simulation, a more thorough relaxation/refinement process could reduce the number of bad angles. Roughly 1.67% of our residues are Ramachandran outliers, meaning they reside in the light-green or white regions of the plot. While suboptimal, this is acceptable and may resolve with longer relaxation/refinement, as mentioned above.
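The four-standard-deviation rule and the outlier percentage are simple arithmetic, sketched below. The ideal angle and standard deviation used in the example (111.0 and 2.7 degrees for the N-CA-C angle) are illustrative values assumed for this sketch, and the residue counts are invented; neither comes from our assessment output.

```python
def is_bad_angle(observed, ideal, sigma, cutoff=4.0):
    """True if the observed angle deviates from ideal by more than cutoff * sigma."""
    return abs(observed - ideal) > cutoff * sigma

def percent_outliers(n_outliers, n_residues):
    """Share of residues classified as outliers, as a percentage."""
    return 100.0 * n_outliers / n_residues

# Assumed reference values for one backbone angle (illustrative only).
IDEAL_N_CA_C = 111.0
SIGMA_N_CA_C = 2.7

strained = is_bad_angle(123.0, IDEAL_N_CA_C, SIGMA_N_CA_C)   # 12.0 > 10.8
relaxed = is_bad_angle(115.0, IDEAL_N_CA_C, SIGMA_N_CA_C)    # 4.0 <= 10.8
share = percent_outliers(3, 120)                             # hypothetical counts
print(strained, relaxed, share)
```

With a cutoff of four standard deviations, only strongly distorted geometry is flagged, which is why even a short list of bad angles deserves a closer look.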

Antigen Fragments

Figure 3. GIF of Antigen Fragment. The AF2 prediction of the Antigen Fragment. Linker regions are pink, N- and C-cap motifs are blue, and the Fusarium antigen fragment is green. The N-terminus is on the left and the C-terminus is on the right. This image was generated using PyMOL.

Our sensor requires binding specificity against an extracellular protein produced by Fusarium graminearum. After searching the UniProt database we found a well-documented Endo-1,4-beta xylanase enzyme, UniProt accession I1RII8. Homologues of this enzyme are seen across a handful of Fusarium-related families, potentially enabling rapid binder diversification. Overexpression and secretion seem to occur during infection, but can also be induced through simple alterations to the culture medium, making this an optimal target for the sensor (Beliën et al., 2005; Brunner et al., 2007; Dong et al., 2012; Hatsch et al., 2006; Pollet et al., 2009; Sella et al., 2013). Eukaryotic protein expression is complex, posing numerous challenges that we needed to account for when performing heterologous expression in E. coli. Therefore, a great deal of optimization would be required to isolate the enzyme. We also considered sensor integration by exploring how fusing this enzyme with a nanobody would affect the sensor dynamics. The large size of this fusion means an increased probability of poor folding, function, and/or expression, all of which would require an immense optimization period.

To account for these factors we extracted a small alpha-helical region from the enzyme (Figure 3), which we call the antigen fragment. However, removing the helix from its natural environment could introduce folding instability. This became more concerning when we found that the N-cap was absent from the extracted sequence. The N-cap is present in the original structure by virtue of interactions with distant regions of the backbone. Therefore, we decided to create our own capping sequences using AF2.

Figure 4. Antigen Fragment and Improved Antigen Fragment Modeling Results. A & B. A per-residue confidence score using the Predicted Local Distance Difference Test (pLDDT). Plots are generated automatically during the AF2 workflow. C & D. Another pLDDT, QMEANDisCo, and QMEAN Z-Scores were obtained automatically during the structure assessment process. E & F. Antigen Fragment (E) and Improved Antigen Fragment (F). The graphs show secondary structure predictions with the QMEAN gradient. C indicates loop or coil type folds while H indicates alpha-helical folds. G-I. Ramachandran plots and MolProbity scores generated automatically during the structure assessment process. The images in this figure were partly generated using BioRender.

After numerous simulations we derived two sequences capable of consistently producing AF2 models with pLDDT scores in the mid 80s to 90s (Figure 4A). Fragments extracted from homologous enzymes of related species were also predicted with similar accuracy when the capping sequences were applied. It appears that our capping regions interact with one another rather than with residues in the fragment (Figure 5). This may explain why the same capping regions can be successfully applied to various fragments. Importantly, we have only applied these regions to a handful of other fragments, so their broad applicability remains untested.

Figure 5. Antigen Fragment Structure. A. AF2-derived model of the Antigen Fragment with sequence information for each section. B & C. Close-up view of the predicted N (purple) and C (yellow) capping regions of the Antigen Fragment. Some residues are labelled with their single-letter amino acid code and position number for easier reference. The images in this figure were generated using the Protein Data Bank (PDB) viewer and BioRender.

Interestingly, Figure 4C shows a substantial drop in pLDDT score for these fragments using Swiss-Model. As mentioned previously, we believe this is caused by different MSA clusters. Secondary structure predictions, illustrated in Figure 4E, seem to support the AF2 models (Figure 3) and indicate that the capping regions are successfully maintaining fragment helicity. Notably, the proportion of Ramachandran favored residues was quite low (Figure 4G, I & L). Disordered regions, such as the linkers placed after the capping sequences, are always predicted with very low pLDDTs by AF2. The residues in these regions typically have irregular conformations as a result, explaining the odd MolProbity and Ramachandran values. Figure 5A-C shows the sequence and structure of our antigen fragment, including predicted capping interactions.

Figure 6. Improved Antigen Fragment Structure. A. AF2-derived model of the Improved Antigen Fragment with sequence information for each section. B & C. Close-up view of the predicted N (pink/purple) and C (orange) capping regions of the Improved Antigen Fragment. Some residues are labelled with their single-letter amino acid code and position number for easier reference. The images in this figure were generated using the Protein Data Bank (PDB) viewer and BioRender.

We noticed that fusing the antigen fragment to larger proteins, even with long flexible linkers, caused massive drops in prediction quality. This is problematic in the context of our sensor. Even though the fragment was already being synthesized, we decided to test a few more designs, hoping one might improve fusion stability. It turns out that the first seven and last seven residues of Alfa-Tag can be used as effective capping regions. We also included three alanine residues between each capping region and the fragment to further ensure that capping only occurs between residues in that region. The improved performance can be seen in Figure 4B, D, F, H, J & K. Unfortunately, these changes only improved helical stability and did not sufficiently improve fusion stability. Figure 6 shows the sequence, structure, and capping interactions of the improved fragment.
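The capping scheme described above can be written as a small sequence-assembly helper. In the sketch below, the 13-residue tag sequence is the core ALFA-tag sequence as we understand it to be reported by Götzke et al. (2019) and should be checked against that paper before use. The fragment sequence is a placeholder rather than our actual antigen fragment, and placing the alanine spacer on both sides of the fragment is an assumption.

```python
def cap_fragment(fragment, tag, cap_len=7, spacer="AAA"):
    """Flank a helical fragment with caps taken from the two ends of a tag.

    Layout: N-cap + spacer + fragment + spacer + C-cap, where the N-cap is the
    first cap_len residues of the tag and the C-cap is the last cap_len residues.
    """
    if len(tag) < cap_len:
        raise ValueError("tag is shorter than the requested cap length")
    n_cap = tag[:cap_len]
    c_cap = tag[-cap_len:]
    return n_cap + spacer + fragment + spacer + c_cap

ALFA_TAG = "SRLEEELRRRLTE"          # assumed core tag sequence; verify before use
PLACEHOLDER_FRAGMENT = "EELLKKAEEL"  # hypothetical helix, not the real fragment

capped = cap_fragment(PLACEHOLDER_FRAGMENT, ALFA_TAG)
print(capped)
```

Because the tag is 13 residues long, the two seven-residue caps share its central leucine, so each cap carries a full copy of that residue.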

Alfa-Tag-Q-Sensor

Figure 7. GIF of Alfa-Tag-Q-Sensor. The AF2 prediction of our Alfa-Tag-Q-Sensor. mScarlet is red, Alfa-Tag is green, Alfa-Tag Nanobody is blue, and ShadowR is purple. The N-terminus begins with mScarlet and the C-terminus ends with ShadowR. This image was generated using PyMOL.

We wanted to know if our desired sensor arrangement was even possible. At the time, however, we lacked the nanobody and antigen sequence information required to model it. Our primary challenge was identifying analogous systems. As it turns out, nanobodies targeting small helical peptides are very rare. Fortunately, the Alfa-Tag peptide and its corresponding nanobody were recently developed, and their complex was deposited in the PDB (Götzke et al., 2019). To create the model we replaced Cameleon with the Alfa-Tag complex, fusing the nanobody and Alfa-Tag with a GGGGS (x5) linker. When modeled in multimer mode, which is used for complexes, and without any sequence alterations, AF2 accurately recreates the Alfa-Tag complex. This suggests that AF2 requires more diverse sequence information to predict how the complex will fold as a part of the sensor (Figure 8A-C). This seems likely for the Cameleon domain as well.
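The two kinds of input described above, a single linked chain and a separate-chain complex, can be prepared as follows. ColabFold accepts a complex as one query string with the chains separated by a colon. The sequences below are short placeholders rather than the real nanobody or tag, and the nanobody-linker-tag order is an assumption made for this sketch.

```python
def flexible_linker(repeats=5, unit="GGGGS"):
    """Build a repeated glycine-serine linker, e.g. GGGGS (x5)."""
    return unit * repeats

def fuse(*parts):
    """Concatenate domains and linkers into one single-chain sequence."""
    return "".join(parts)

def complex_query(*chains):
    """Join separate chains with ':' so ColabFold models them as a complex."""
    return ":".join(chains)

NANOBODY = "ACDEFGHIK"   # placeholder sequence (hypothetical)
TAG = "LMNPQRSTV"        # placeholder sequence (hypothetical)

linker = flexible_linker()
single_chain = fuse(NANOBODY, linker, TAG)   # linked version, modeled as a monomer
as_complex = complex_query(NANOBODY, TAG)    # unlinked version, multimer mode
print(len(linker), single_chain, as_complex)
```

Running both inputs through the same notebook settings gives a direct comparison between the free complex and the linked arrangement used inside the sensor.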

Figure 8. Alfa-Tag-Q-Sensor Modeling Results. A. Multiple Sequence Alignments (MSAs) for the input sequence. Plots are generated automatically during the AF2 workflow. B. A per-residue confidence score using the Predicted Local Distance Difference Test (pLDDT). Plots are generated automatically during the AF2 workflow. C & D. Another pLDDT, QMEANDisCo, and QMEAN Z-Scores were obtained automatically during the structure assessment process. E-I. MolProbity scores and the accompanying Ramachandran plots were also generated automatically during the structure assessment process. The images in this figure were partly generated using BioRender.

REFERENCES

  1. Beliën, T., Van Campenhout, S., Van Acker, M., & Volckaert, G. (2005). Cloning and characterization of two endoxylanases from the cereal phytopathogen Fusarium graminearum and their inhibition profile against endoxylanase inhibitors from wheat. Biochemical and Biophysical Research Communications, 327(2), 407–414. https://doi.org/10.1016/j.bbrc.2004.12.036.
  2. Benkert, P., Biasini, M., & Schwede, T. (2011). Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics, 27(3), 343–350. https://doi.org/10.1093/bioinformatics/btq662.
  3. Brunner, K., Lichtenauer, A. M., Kratochwill, K., Delic, M., & Mach, R. L. (2007). Xyr1 regulates xylanase but not cellulase formation in the head blight fungus Fusarium graminearum. Current Genetics, 52(5-6), 213–220. https://doi.org/10.1007/s00294-007-0154-x.
  4. Chen, V. B., Arendall, W. B., Headd, J. J., Keedy, D. A., Immormino, R. M., Kapral, G. J., Murray, L. W., Richardson, J. S., & Richardson, D. C. (2009). MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallographica Section D Biological Crystallography, 66(1), 12–21. https://doi.org/10.1107/s0907444909042073.
  5. Dong, X., Meinhardt, S. W., & Schwarz, P. (2012). Isolation and Characterization of Two Endoxylanases from Fusarium graminearum. Journal of Agricultural and Food Chemistry, 60(10), 2538–2545. https://doi.org/10.1021/jf203407p.
  6. Götzke, H., Kilisch, M., Martínez-Carranza, M., Sograte-Idrissi, S., Rajavel, A., Schlichthaerle, T., Engels, N., Jungmann, R., Stenmark, P., Opazo, F., & Frey, S. (2019). The ALFA-tag is a highly versatile tool for nanobody-based bioscience applications. Nature Communications, 10(1), 4403. https://doi.org/10.1038/s41467-019-12301-7.
  7. Hatsch, D., Phalip, V., Petkovski, E., & Jeltsch, J.-M. (2006). Fusarium graminearum on plant cell wall: No fewer than 30 xylanase genes transcribed. Biochemical and Biophysical Research Communications, 345(3), 959–966. https://doi.org/10.1016/j.bbrc.2006.04.171.
  8. Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., & Back, T. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589. https://doi.org/10.1038/s41586-021-03819-2.
  9. Mariani, V., Biasini, M., Barbato, A., & Schwede, T. (2013). lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics, 29(21), 2722–2728. https://doi.org/10.1093/bioinformatics/btt473.
  10. Mirdita, M., Schütze, K., Moriwaki, Y., Heo, L., Ovchinnikov, S., & Steinegger, M. (2022). ColabFold: making protein folding accessible to all. Nature Methods, 1–4. https://doi.org/10.1038/s41592-022-01488-1.
  11. Miyawaki, A., Griesbeck, O., Heim, R., & Tsien, R. Y. (1999). Dynamic and quantitative Ca2+ measurements using improved cameleons. Proceedings of the National Academy of Sciences, 96(5), 2135–2140. https://doi.org/10.1073/pnas.96.5.2135.
  12. Nagai, T., Yamada, S., Tominaga, T., Ichikawa, M., & Miyawaki, A. (2004). Expanded dynamic range of fluorescent indicators for Ca2+ by circularly permuted yellow fluorescent proteins. Proceedings of the National Academy of Sciences of the United States of America, 101(29), 10554–10559. https://doi.org/10.1073/pnas.0400417101.
  13. Pollet, A., Beliën, T., Fierens, K., Delcour, J. A., & Courtin, C. M. (2009). Fusarium graminearum xylanases show different functional stabilities, substrate specificities and inhibition sensitivities. Enzyme and Microbial Technology, 44(4), 189–195. https://doi.org/10.1016/j.enzmictec.2008.12.005.
  14. Ross, B. L., Tenner, B., Markwardt, M. L., Zviman, A., Shi, G., Kerr, J. P., Snell, N. E., McFarland, J. J., Mauban, J. R., Ward, C. W., Rizzo, M. A., & Zhang, J. (2018). Single-color, ratiometric biosensors for detecting signaling activities in live cells. ELife, 7. https://doi.org/10.7554/elife.35458.
  15. Sella, L., Gazzetti, K., Faoro, F., Odorizzi, S., D’Ovidio, R., Schäfer, W., & Favaron, F. (2013). A Fusarium graminearum xylanase expressed during wheat infection is a necrotizing factor but is not essential for virulence. Plant Physiology and Biochemistry, 64, 1–10. https://doi.org/10.1016/j.plaphy.2012.12.008.
  16. Studer, G., Rempfer, C., Waterhouse, A. M., Gumienny, R., Haas, J., & Schwede, T. (2019). QMEANDisCo – Distance Constraints Applied on Model Quality Estimation. Bioinformatics, 36(6). https://doi.org/10.1093/bioinformatics/btz828.