Engineering Success | UniMuenster

Overview

Designing a project as complex as BeeVAX involves extensive planning and continually refining those plans to achieve optimal outcomes.

Our project aimed to create an enhanced version of Vitellogenin via protein engineering. To achieve this goal, we designed our experiments within each lab group in both the dry lab and the wet lab. We were very excited to apply the plan to real laboratory conditions and test whether our design worked as intended. Throughout this process, failures, whether minor or major, challenged our motivation. Nevertheless, these challenges often became opportunities to improve our design and to learn for future attempts. We believe that this is the essence of engineering success. In the following, we present three examples that passed through every step of the engineering cycle and contributed to an optimized version of Vitellogenin: 1) investigating the best binding affinity to zymosan, 2) predicting zymosan binding in Vitellogenin models and 3) enabling the expression of Vitellogenin.

"The human foot is a masterpiece of engineering and a work of art."

- Leonardo da Vinci

Introduction

"The human foot is a masterpiece of engineering and a work of art." More broadly, nature as a whole can be seen as an evolutionary masterpiece of engineering, and we can learn from it. As bioengineers, we apply our knowledge of biological mechanisms and systems to create new, innovative systems. In this way, we can provide solutions to unsolved problems or invent new products using the inherent power of nature itself.

Our project follows a middle-out approach, combining forward and reverse bioengineering to generate the bee's egg-yolk protein Vitellogenin with an optimized binding affinity for fungal PAMPs, in order to enable a new treatment against Varroa mite infestation of bee hives. The reverse engineering approach aims to gain more knowledge about the binding sites and domains of Apis mellifera's Vitellogenin. In the forward engineering steps, we focused on increasing the binding affinity of Vitellogenin to fungal PAMPs through random mutagenesis, more precisely, the directed evolution of the protein combined with a yeast surface display (YSD).

To successfully engineer our protein of interest, we passed through the engineering cycle in different parts of our project. The cycle consists of four successive steps: Design, Build, Test and Learn. During the Design and Build steps, the known and unknown parts of the project are assessed and implemented in a detailed lab protocol. Testing either confirms the design or shows that it needs optimization. The latter leads to the Learn step, which enables a redesign and optimization of the project.

The graphic shows the model of an engineering cycle containing four steps: Design, Build, Test and Learn. The first step, Design, contains the following information: Plan which materials, methods and steps to apply to reach the aims of your initial ideation. The second step, Build, contains the following information: Implement the design by creating the associated biological system. The third step, Test, contains the following information: Measure and check whether your created system works the way you designed it. The last step, Learn, contains the following information: Analyze your test results and use them to redesign and improve your system. After the four steps, an arrow indicates that the cycle starts again with the next design cycle.

Figure 1: Model of the engineering cycle with the four steps Design, Build, Test and Learn.

Each part of the project passed through several cycles to reach its aims. Looking back, the project as a whole also went through several of these cycles. In the following, some aspects of the engineering process of our project are described. Importantly, each new cycle starts from a previous failure: the weak point is identified, and the project design is optimized in the next cycle.

Investigating the Best Binding Affinity to Zymosan for YSD

One principal part of our project was the use of a yeast surface display (YSD) to present the domains of Apis mellifera's Vitellogenin on the surface of Saccharomyces cerevisiae. To read more about the constructs, visit the project description. The YSD cells were analyzed by fluorescence-activated cell sorting (FACS) with two main aims: 1) to investigate which domain of Vitellogenin shows the highest zymosan binding affinity and 2) to sort error-prone PCR mutants of these domains to find mutations that increase the binding affinity to zymosan.

This was the first attempt to measure zymosan binding to Vitellogenin domains via a YSD. Therefore, no protocols for handling zymosan or for testing its binding affinity in a YSD-based approach were available. We had to rely on YSD protocols from antibody engineering, which use different binding substrates, to develop an appropriate method. In antibody engineering, the substrates are mainly biotinylated [1][2]. Hence, these approaches do not use fluorescently labeled substrates and are inherently different from ours.

The graphic shows two engineering cycles for the best binding affinity for zymosan in the YSD. It starts at Design 1 with the following information: Adapting FACS washing protocols to bee Vitellogenin and zymosan. The next step is Build 1 with the following information: Cells are centrifuged at 13,000 x g, resuspended in 500 microliters of PBS and incubated at 35 °C for 5 minutes. The next step is Test 1 with the following information: After the washing steps, the antibody signal of the constructs is decreased. The next step is Learn 1 with the following information: Washing steps are not sufficient to adjust the zymosan concentration. Then, the second cycle begins. The step Design 2 contains the following information: Plan to test the binding kinetics of bee Vitellogenin to zymosan. The next step is Build 2 with the following information: Determine which concentrations are needed for a kinetic calculation. The next step is Test 2 with the following information: Realizing that the zymosan rate is reached earlier than expected. The last step is Learn 2 with the following information: The dissociation constant of 0.61 micrograms per milliliter determines the right zymosan concentration.

Figure 2: Schematic illustration of the two engineering cycles to find the best binding affinity to zymosan for YSD.

Cycle I

Design & Build I: To find the best binding affinity, the zymosan had to be fragmented and labeled with a fluorescent molecule, fluorescein-5-thiosemicarbazide (5-FTSC). 5-FTSC has an absorption maximum at 492 nm and an emission maximum at 516 nm. The labeled fragments were dissolved in PBS buffer at 1 mg/mL, and the cells were incubated in 100 µL of the prepared mixture for 60 minutes. The final volume contained 500 µL of phosphate-buffered saline (PBS) with 2x10⁶ Saccharomyces cerevisiae EBY100 cells displaying the four different constructs. The amount of 1 mg of labeled zymosan was an estimate intended to ensure that the Vitellogenin domains on the cell surface were oversaturated. Afterwards, the cells were washed according to the following protocol: each washing step included centrifugation of the sample at 13,000 x g for 1 minute and resuspension of the cell pellet in 500 µL of cold PBS buffer. PBS buffer is widely used to remove unbound and excess compounds in enzyme immunoassays [3]. The samples were then incubated at 35 °C for 5 minutes. The temperature of 35 °C was based on the natural hive temperature of Western honey bees [4]. After every washing step, the sample was analyzed by FACS to estimate the required number of washing steps.

Test I: During the FACS run, we performed three washing steps for every sample. We stopped after the third wash because the cell counts in the FACS had decreased drastically, leaving too few cells for a fourth run.

The FACS results were analyzed with the flow cytometry analysis software FlowJo. After the first washing step, the red fluorescence at 633 nm was lower than in the unwashed samples. Fig. 3 shows this decrease at three different zymosan concentrations for the construct displaying the α‑helical+DUF1943 domain. During the analysis, we focused on the α‑helical+DUF1943 construct because it showed the best binding affinity to zymosan. The red fluorescence corresponds to the binding of the cMyc antibody to the construct. It is therefore independent of zymosan binding and should not change during the washing steps. Apparently, however, the washing steps were too harsh for the low binding affinity of the cMyc antibody to the cMyc tag. Additionally, the washing steps did not wash off the zymosan, as indicated in Fig. 4.

The figure shows a line chart with three graphs representing the three tested zymosan concentrations. The x-axis shows only two points: pre-wash and post-wash. The y-axis shows the percentage of cells in the "high display" gate. All graphs show a strong decrease after the wash.

Figure 3: The red fluorescence signal at 633 nm before and after the first washing step, shown for the construct Vg_α‑helical+DUF1943 as an illustrative example. A decrease in the antibody signal was found in all samples at the three zymosan concentrations of 0.02 µg/mL, 20 µg/mL and 200 µg/mL.
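To make the y-axis of Fig. 3 concrete, here is a minimal sketch of how a "percentage of cells in the high display gate" and its pre-/post-wash change could be computed. It assumes per-cell red-channel intensities exported as plain numbers and a single threshold gate; real FlowJo gates are drawn interactively and can be two-dimensional. The function names and the intensities in the example are hypothetical, not our measured data.

```python
# Minimal sketch (assumption): a one-parameter threshold gate applied to exported
# per-cell red-channel intensities. Function names and data are hypothetical.
from typing import Sequence


def percent_in_gate(intensities: Sequence[float], threshold: float) -> float:
    """Percentage of events whose intensity is at or above the gate threshold."""
    if len(intensities) == 0:
        raise ValueError("no events to gate")
    in_gate = sum(1 for value in intensities if value >= threshold)
    return 100.0 * in_gate / len(intensities)


def relative_change(pre: float, post: float) -> float:
    """Relative change (%) of a gated percentage from pre-wash to post-wash."""
    if pre == 0:
        raise ValueError("pre-wash percentage is zero")
    return 100.0 * (post - pre) / pre


if __name__ == "__main__":
    # Made-up intensities for illustration only.
    pre_wash = [100.0, 900.0, 1200.0, 1500.0]
    post_wash = [80.0, 300.0, 1100.0, 400.0]
    pre = percent_in_gate(pre_wash, threshold=1000.0)
    post = percent_in_gate(post_wash, threshold=1000.0)
    print(pre, post, relative_change(pre, post))  # 50.0 25.0 -50.0
```

A relative change is used rather than a difference of percentages so that samples with different starting display levels can be compared on the same scale.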

Learn I: From the unexpected results of the first FACS run, we concluded that the washing steps are too harsh for the weak binding of the cMyc antibody-antigen complex. Furthermore, washing did not reduce the amount of unbound zymosan. Therefore, we needed a new way to estimate the right amount of zymosan for the measurement.

Cycle II

Design & Build II: Another approach to finding the right amount of zymosan is to determine the binding kinetics of zymosan to the Vitellogenin domains. This requires normalizing the results to a cell control and fitting a kinetic model. The fit yields the dissociation constant K_D, which describes the binding affinity of one molecule to another. With this approach, the ideal amount of zymosan for every domain can be estimated.
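The text does not name the model used for the fit. A common choice for this kind of titration, and the one assumed here, is a one-site saturation model in which the fluorescence signal F at zymosan concentration c rises from a baseline F_min to a plateau F_max, with θ the fraction of occupied binding sites:

$$
\theta(c) = \frac{c}{K_D + c}, \qquad F(c) = F_{\min} + \left(F_{\max} - F_{\min}\right)\theta(c)
$$

At c = K_D, θ = 1/2, so under this model K_D is the zymosan concentration that produces a half-maximal signal.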

We tested zymosan concentrations in the range of 0.2 µg/mL to 200 µg/mL. We chose a broad range because we could not narrow it down further. The fit becomes more accurate the more zymosan concentrations within the range are measured.

Test II: During the second FACS run, we tested four concentrations in tenfold steps: 0.2 µg/mL, 2 µg/mL, 20 µg/mL and 200 µg/mL. In the third FACS run, the same concentrations were tested, supplemented by 0.5 µg/mL, 5 µg/mL and 50 µg/mL. Unfortunately, the third FACS run did not yield any usable results because the constructs carried defective inserts. Therefore, we based the calculation of K_D on a limited number of data points. During the process, we decided to focus on the construct Vg_α-helical+DUF1943 because it showed the best binding affinity. To read more about the binding affinity of the constructs and the defects, visit the results page.
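As an illustration of how K_D can be extracted from such a concentration series, the sketch below fits a one-site saturation model F = F_min + (F_max - F_min)·c/(K_D + c) by least squares. The model choice and the fitting strategy (a log-spaced grid search over K_D, with baseline and amplitude solved in closed form at each step) are our assumptions for this example, not necessarily the procedure used for the reported value. The concentrations are those of the second and third FACS runs; the signal values are synthetic.

```python
# Sketch (assumptions): one-site saturation model, least-squares fit via a
# log-spaced grid search over K_D. For a fixed K_D the model is linear in the
# baseline (F_min) and amplitude (F_max - F_min), so both are solved directly.
from typing import Sequence, Tuple


def occupancy(c: float, kd: float) -> float:
    """Fraction of binding sites occupied at concentration c (one-site model)."""
    return c / (kd + c)


def fit_one_site(
    conc: Sequence[float],
    signal: Sequence[float],
    kd_min: float = 1e-3,
    kd_max: float = 1e3,
    steps: int = 2000,
) -> Tuple[float, float, float]:
    """Return (K_D, F_min, F_max) minimizing the sum of squared residuals."""
    if len(conc) != len(signal) or len(conc) < 3:
        raise ValueError("need at least three paired concentrations and signals")
    n = len(conc)
    mean_y = sum(signal) / n
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        x = [occupancy(c, kd) for c in conc]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            continue
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, signal))
        amplitude = sxy / sxx                    # F_max - F_min
        baseline = mean_y - amplitude * mean_x   # F_min
        sse = sum((yi - (baseline + amplitude * xi)) ** 2 for xi, yi in zip(x, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, baseline, baseline + amplitude)
    if best is None:
        raise ValueError("no valid fit in the K_D search range")
    _, kd, f_min, f_max = best
    return kd, f_min, f_max


if __name__ == "__main__":
    # Concentrations in ug/mL from the FACS runs; signals are synthetic values
    # generated with K_D = 0.61, F_min = 5 and F_max = 85 (not measured data).
    conc = [0.2, 0.5, 2.0, 5.0, 20.0, 50.0, 200.0]
    signal = [5 + 80 * occupancy(c, 0.61) for c in conc]
    print(fit_one_site(conc, signal))
```

With real, noisy data and only four concentrations, the uncertainty of the fitted K_D can be large, which is consistent with the remark that more data points would give a more reliable value.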

Learn II: Overall, we were able to determine the K_D of the labeled zymosan for the construct Vg_α‑helical+DUF1943, which is 0.61 µg/mL. Based on the results, we decided to use 2 µg/mL for the library sorting, as they represent low and high saturation points of the Vg_α‑helical+DUF1943 construct. In this way, we bypassed the washing steps and found a justified concentration of labeled zymosan for the YSD constructs. To obtain a more accurate K_D, more data points should be measured.
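As a back-of-the-envelope check of what the reported K_D implies, the snippet below computes the expected fraction of occupied domains, c/(K_D + c), at the four concentrations of the second FACS run. It assumes a simple one-site binding model without ligand depletion and treats the µg/mL value as an apparent K_D; zymosan is a particulate polysaccharide, so the real situation is likely more complex.

```python
# Back-of-the-envelope check (assumptions: one-site binding, no ligand depletion,
# K_D treated as an apparent value in ug/mL).
K_D = 0.61  # ug/mL, value reported for Vg_α-helical+DUF1943


def fraction_bound(c: float, kd: float = K_D) -> float:
    """Expected fraction of displayed domains occupied at zymosan concentration c."""
    return c / (kd + c)


if __name__ == "__main__":
    for c in (0.2, 2.0, 20.0, 200.0):
        # prints 0.25, 0.77, 0.97 and 1.00 for the four concentrations
        print(f"{c:>6} ug/mL -> {fraction_bound(c):.2f}")
```

Under these assumptions, 2 µg/mL sits on the rising part of the curve (roughly three quarters occupied) rather than at full saturation, which leaves room to detect variants that bind more strongly.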

Outlook

In summary, we established a protocol to test the zymosan binding affinity of Vitellogenin domains in a YSD. Still, more data points are required to further refine the kinetics and, consequently, our working protocol. The incubation time of zymosan with the domain-displaying cells could also be adapted to further increase binding.

Predicting Zymosan Binding in Vitellogenin Models

For our protein engineering approach, a prediction of Vitellogenin's binding sites was required. It would allow us to check mutations for improved binding affinity in silico by performing docking simulations. Unfortunately, no published studies predict concrete binding sites. Therefore, we decided to model the binding sites for PAMPs as a dry lab approach.

The graphic shows two engineering cycles for the prediction of zymosan binding in Vitellogenin. It starts at Design 1 with the following information: Finding a tool which predicts binding pockets for zymosan. The next step is Build 1 with the following information: Use of P2Rank with the AlphaFold structure of Vitellogenin. The next step is Test 1 with the following information: P2Rank indicates 42 binding pocket candidates. The next step is Learn 1 with the following information: P2Rank is not specific enough to focus on certain binding pockets. Then, the second cycle begins. The step Design 2 contains the following information: Finding a method to narrow down the predicted binding pockets. The next step is Build 2 with the following information: AlphaFold mean confidence score and previous research on the domains. The next step is Test 2 with the following information: 4 promising pocket candidates are left. The last step is Learn 2 with the following information: The combination of P2Rank and AlphaFold narrowed down the potential binding pockets.

Figure 5: Schematic illustration of the two engineering cycles of the prediction of zymosan binding in Vitellogenin.

Cycle I

Design & Build I: As a first step, we searched for computational tools to identify the relevant binding pocket of Vitellogenin. Most potential tools were not directly accessible and therefore out of the question. P2Rank, in comparison, was developed with ease of use in mind and showed the most promising results. It is a stand-alone, template-free tool for the prediction of ligand binding sites based on machine learning [5]. Because no experimental structure of Vitellogenin is available, we used the structure predicted by AlphaFold.

P2Rank reports its results in .csv files, giving a probability for every binding site candidate. To visualize the candidates inside the protein structure, we wrote a script for ChimeraX that displays each candidate in a different color and with its corresponding number.
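The following is a minimal sketch of such a visualization script, not our actual script. It assumes that the P2Rank predictions CSV contains `name`, `probability` and `residue_ids` columns, with residues written as chain_number (for example `A_101`) and values possibly padded with spaces. It turns each pocket above an optional probability cutoff (such as the 10% threshold used in Test I) into ChimeraX `color` commands with an arbitrary hex palette.

```python
# Sketch (assumptions): P2Rank's predictions CSV has "name", "probability" and
# "residue_ids" columns, residues written as chain_number (e.g. "A_101"), values
# possibly padded with spaces. Output lines are ChimeraX "color" commands.
import csv
import io
from typing import Dict, List

# Arbitrary hex palette, cycled if there are more pockets than colors.
PALETTE = ["#e6194b", "#3cb44b", "#ffe119", "#4363d8", "#f58231", "#911eb4"]


def read_pockets(csv_text: str, min_probability: float = 0.0) -> List[Dict[str, str]]:
    """Parse the CSV, strip padding, keep pockets at or above min_probability."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [column.strip() for column in next(reader)]
    pockets = []
    for row in reader:
        if not row:
            continue
        record = {key: value.strip() for key, value in zip(header, row)}
        if float(record["probability"]) >= min_probability:
            pockets.append(record)
    return pockets


def residues_by_chain(residue_ids: str) -> Dict[str, List[str]]:
    """Group 'A_10 A_11 B_5' into {'A': ['10', '11'], 'B': ['5']}."""
    groups: Dict[str, List[str]] = {}
    for token in residue_ids.split():
        chain, number = token.split("_", 1)
        groups.setdefault(chain, []).append(number)
    return groups


def chimerax_commands(pockets: List[Dict[str, str]]) -> List[str]:
    """One 'color /chain:res,res #hex' command per pocket and chain."""
    commands = []
    for index, pocket in enumerate(pockets):
        color = PALETTE[index % len(PALETTE)]
        for chain, numbers in residues_by_chain(pocket["residue_ids"]).items():
            commands.append(f"color /{chain}:{','.join(numbers)} {color}")
    return commands


if __name__ == "__main__":
    example = (
        "name,rank,probability,residue_ids\n"
        "pocket1,1,0.98,A_10 A_11\n"
        "pocket2,2,0.05,A_20\n"
    )
    # Saving these lines as a .cxc file lets ChimeraX execute them via "open".
    print("\n".join(chimerax_commands(read_pockets(example, min_probability=0.10))))
```

Emitting one command per chain keeps every atom specification simple; for a single-chain AlphaFold model, each pocket produces exactly one line.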

Test I: P2Rank predicted 42 binding pocket candidates for the honey bee's Vitellogenin, 14 of which had a probability higher than 10%. For further analysis, we set a probability threshold of 10%.

The picture shows the protein structure of Vitellogenin, a large coil of alpha helices and beta sheets. The vWF domain protrudes prominently from the protein. The 42 pocket candidates are spread across the main body of the protein without any apparent pattern.

Figure 6: Vitellogenin structure in ChimeraX, assembled with AlphaFold, indicating 42 predicted pocket candidates highlighted in different colors.

Learn I: Due to Vitellogenin's large size of 1770 amino acids, we already expected to find many potential pocket candidates. Additionally, Vitellogenin is known to have many different biological functions, with binding activity towards different types of substrates besides PAMPs. The prediction of multiple binding sites is therefore consistent with the literature and supports the plausibility of the tool's output [6]. Nevertheless, 14 pocket candidates are too many to work with effectively.

Cycle II

Design & Build II: Our idea was to narrow down the pocket candidates by combining other tools with P2Rank to get more information about the candidates.

Whenever AlphaFold models are used for protein function prediction, the confidence score provided by AlphaFold (pLDDT, predicted local distance difference test) should be considered [7]. The pLDDT score is a per-residue confidence score. On average, a pocket candidate consists of 19 residues. Therefore, we implemented a script that adds the average confidence score of all residues involved in each pocket to the .csv file. Moreover, previous research [8] suggested that only the α‑helical and the DUF1943 domains bind PAMPs. Our own yeast surface display results agree with this suggestion. Consequently, we planned to consider only pocket candidates located within these two domains.
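A sketch of how the mean pLDDT per pocket can be obtained is shown below. It assumes the AlphaFold model is available as a PDB file, in which AlphaFold stores the per-residue pLDDT (0 to 100) in the B-factor column, and that pocket residues use P2Rank's chain_number notation. The function names are illustrative and not those of our own script.

```python
# Sketch (assumptions): AlphaFold model in PDB format with pLDDT (0-100) in the
# B-factor column; pocket residues written as chain_number (e.g. "A_101").
from statistics import mean
from typing import Dict, Tuple


def per_residue_plddt(pdb_text: str) -> Dict[Tuple[str, int], float]:
    """Map (chain, residue number) to pLDDT, read from fixed-width ATOM records."""
    plddt: Dict[Tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain = line[21]                    # column 22: chain identifier
        residue_number = int(line[22:26])   # columns 23-26: residue sequence number
        b_factor = float(line[60:66])       # columns 61-66: B-factor field (pLDDT)
        # All atoms of a residue carry the same pLDDT, so the first atom suffices.
        plddt.setdefault((chain, residue_number), b_factor)
    return plddt


def mean_pocket_plddt(residue_ids: str, plddt: Dict[Tuple[str, int], float]) -> float:
    """Average pLDDT over the residues listed for one pocket."""
    values = []
    for token in residue_ids.split():
        chain, number = token.split("_", 1)
        values.append(plddt[(chain, int(number))])
    return mean(values)
```

The value returned by `mean_pocket_plddt` for each row of the P2Rank CSV can then be written to an additional column, which yields the "Mean pLDDT" column of Table 1.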

For more insights, visit the modeling page.

Test II: After calculating the mean confidence score for all 14 pocket candidates from the first cycle, eight pockets showed a mean pLDDT above 85 (on AlphaFold's 0 to 100 scale). Four of them are located inside the relevant domains. Overall, we were able to narrow down the promising pocket candidates to four, as shown in Fig. 7.

Table 1: Probability given by P2Rank for the 14 selected pocket candidates. Mean pLDDT is the average AlphaFold confidence score of the involved residues. Combining the P2Rank and AlphaFold scores is required to narrow down the pockets further.

Name	Rank	Probability	Mean pLDDT
Pocket 1	1	0.98	90.93
Pocket 2	2	0.94	84.99
Pocket 3	3	0.92	95.61
Pocket 4	4	0.76	92.38
Pocket 5	5	0.65	76.17
Pocket 6	6	0.47	73.23
Pocket 7	7	0.27	84.66
Pocket 8	8	0.21	85.54
Pocket 9	9	0.18	83.89
Pocket 10	10	0.17	89.52
Pocket 11	11	0.16	81.08
Pocket 12	12	0.13	94.65
Pocket 13	13	0.12	95.63
Pocket 14	14	0.11	95.59
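The confidence step of Test II can be reproduced directly from Table 1, as in the short sketch below. It applies the probability threshold of 10% and a mean pLDDT cutoff of 85. The domain filter (α‑helical and DUF1943) is deliberately left out, because it needs domain boundaries and pocket residue lists that are not given in the table.

```python
# Reproduces the pLDDT filtering step using the values from Table 1.
# The domain filter is omitted: it needs residue-level data not listed here.
from typing import List, Tuple

# (pocket number, P2Rank probability, mean pLDDT) copied from Table 1.
TABLE_1: List[Tuple[int, float, float]] = [
    (1, 0.98, 90.93), (2, 0.94, 84.99), (3, 0.92, 95.61), (4, 0.76, 92.38),
    (5, 0.65, 76.17), (6, 0.47, 73.23), (7, 0.27, 84.66), (8, 0.21, 85.54),
    (9, 0.18, 83.89), (10, 0.17, 89.52), (11, 0.16, 81.08), (12, 0.13, 94.65),
    (13, 0.12, 95.63), (14, 0.11, 95.59),
]


def confident_pockets(
    rows: List[Tuple[int, float, float]],
    min_probability: float = 0.10,
    min_plddt: float = 85.0,
) -> List[int]:
    """Pockets above both the P2Rank probability and the mean pLDDT cutoffs."""
    return [n for n, prob, plddt in rows if prob > min_probability and plddt > min_plddt]


if __name__ == "__main__":
    print(confident_pockets(TABLE_1))  # [1, 3, 4, 8, 10, 12, 13, 14]
```

The eight pockets returned include the four final candidates (1, 3, 4 and 10); the remaining reduction to four relies on the domain criterion.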

Learn II: If your protein of interest is large and has multiple proposed functions, additional information is required to make reasonable predictions of binding sites. The addition of the AlphaFold scores seems to be a useful complement to that.

Outlook

Additional research on the potential functions of the four binding pockets is needed to focus on one of them. According to the literature [8], candidate 10 (shown in red in Fig. 7) seems to be the most promising one. To combine these findings with the wet lab, we would like to sequence the improved variants from the directed evolution approach. The locations of the mutations in the sequencing data could then be compared to the locations of the pocket candidates.

Figure 7: Vitellogenin structure in ChimeraX, assembled with AlphaFold, indicating four predicted pocket candidates, highlighted in different colors. The α‑helical domain is presented in turquoise and the DUF1943 domain is presented in dark blue. Pocket 1 is yellow, pocket 3 is orange, pocket 4 is hot pink, and lastly, pocket 10 is red.

Enabling the Expression of Vitellogenin

The third engineering example concerns the discovery and resolution of plasmid multimerization in Escherichia coli Turbo while working towards the expression of Vitellogenin. Previous studies observed that Recombinase A (RecA), an enzyme active in recombination, is a main contributor to plasmid dimerization and multimerization, but RecA-independent dimerization and multimerization have also been observed [9, 10]. During the engineering cycle, we observed decreased plasmid multimerization in the RecA-deficient E. coli Top10 compared to E. coli Turbo, which has an active RecA [11, 12].

The graphic shows two engineering cycles for the expression system of Vitellogenin. It starts at Design 1 with the following information: In silico design of the pSB3CYP_Zeo shuttle vector. The next step is Build 1 with the following information: Construction of the vector in Escherichia coli Turbo. The next step is Test 1 with the following information: The plasmid shows an unexpectedly large size. The next step is Learn 1 with the following information: Escherichia coli Turbo multimerizes the plasmids due to RecA activity. Then, the second cycle begins. The step Design 2 contains the following information: Investigation of a recombinase A-deficient strain. The next step is Build 2 with the following information: Reconstruction of the plasmids in Escherichia coli Top10. The next step is Test 2 with the following information: Checking the plasmids for the correct size. The last step is Learn 2 with the following information: Escherichia coli Top10 is deficient in RecA, which leads to significantly less multimerization.

Figure 8: Schematic illustration of the two engineering cycles undertaken for the construction of the pSB3CYP plasmids and the resolution of the problem of plasmid dimerization/multimerization.

Cycle I

Design & Build I: After choosing E. coli Turbo as the strain for assembling the construct, we started to design the plasmid for the expression of Apis mellifera Vitellogenin in Komagataella phaffii. In the course of this assembly, we constructed the shuttle vector pSB3CYP_Zeo (BBa_K4675009), which other iGEM teams can use to assemble their own expression vectors. Our first aim was to exchange the selection marker and origin of replication of the pSB3CY_His (BBa_K4188002) plasmid for ones suitable for Komagataella phaffii. The zeocinR gene and the PARS origin of replication were incorporated into the vector by restriction digestion, using NotI and SpeI for zeocinR and SpeI and BamHI for PARS. After that, we assembled the expression cassette containing the AOX1 promoter, the AOX1 terminator and vitellogenin via Golden Gate Assembly (RFC[1000]) using the type IIS restriction enzyme SapI.
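Both cloning strategies above depend on where the enzyme recognition sites occur: an internal site in a part can lead to cuts at unwanted positions during the restriction cloning or the SapI-based Golden Gate step. The sketch below scans a circular plasmid sequence on both strands for the standard recognition sequences of NotI, SpeI, BamHI and SapI. It is an illustrative check with hypothetical function names, not a tool we report using, and the example sequences are toy strings rather than our constructs.

```python
# Sketch: find recognition sites on both strands of a (circular) DNA sequence.
# Function names are hypothetical; sequences in the usage example are toy strings.
from typing import Dict, List

SITES: Dict[str, str] = {
    "NotI": "GCGGCCGC",
    "SpeI": "ACTAGT",
    "BamHI": "GGATCC",
    "SapI": "GCTCTTC",  # Type IIS: non-palindromic site, cuts outside of it
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str, circular: bool = True) -> List[int]:
    """0-based top-strand start positions of the site or its reverse complement."""
    seq = seq.upper()
    # For a circular plasmid, append the first len(site) - 1 bases so that sites
    # spanning the end and the start of the sequence are found as well.
    search = seq + seq[: len(site) - 1] if circular else seq
    hits = set()
    for motif in {site, reverse_complement(site)}:
        start = search.find(motif)
        while start != -1:
            hits.add(start)
            start = search.find(motif, start + 1)
    return sorted(hits)


def site_report(seq: str, circular: bool = True) -> Dict[str, List[int]]:
    """Positions of every enzyme site in SITES for one sequence."""
    return {enzyme: find_sites(seq, site, circular) for enzyme, site in SITES.items()}


if __name__ == "__main__":
    toy = "AAAA" + "GCTCTTC" + "TTTT" + "GAAGAGC" + "AA"
    print(site_report(toy))  # SapI found at 4 (top strand) and 15 (bottom strand)
```

Searching for the reverse complement matters for SapI in particular: because its site is not palindromic, a scan of only the top strand would miss sites on the opposite strand.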

Test I: Throughout this phase, we tested the vectors and their intermediates for correct assembly. PCR and gel electrophoresis were used to verify the integration of parts into the vector. Interestingly, gel electrophoresis revealed that some vectors were larger than anticipated. For example, the Lvl2_plasmids based on pSB3CY_aeBlue (BBa_K4188001), which were expected to be approximately 6-7 kb in size, also appeared at approximately 20 kb.

Learn I: During this phase of the engineering cycle, we began to investigate why the plasmids appeared larger than their sequences suggested. In a discussion, one of our advisors pointed out that E. coli Turbo tends to dimerize or even multimerize plasmids. Indeed, studies have shown that plasmids often dimerize or multimerize in E. coli strains that are not deficient in Recombinase A (RecA), such as E. coli Turbo [9, 10, 12].

Cycle II

Figure 9: Gel electrophoresis checking for plasmid dimerization in E. coli Turbo and Top10. "Turbo" refers to a plasmid isolated from E. coli Turbo, "Top10" refers to a plasmid isolated from E. coli Top10. The plasmids were tested based on the pSB3CY_aeBlue (BBa_K4188001) with either a tryptophan synthase (Trp) or uracil synthase (Ura). The plasmid containing the tryptophan synthase has a length of 6818 bp, the one containing the uracil synthase has a length of 6252 bp. In all tested plasmids, a second band at 20 kb was observed. These bands could be the result of dimerized or multimerized plasmids. The bands corresponding to the multimerized plasmids appear fainter in the Top10 samples.
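A quick arithmetic check helps to interpret the 20 kb band: which whole multiple of each monomer length comes closest? For both plasmids (6818 bp and 6252 bp), three copies (20,454 bp and 18,756 bp) are closest. This is only a plausibility check under a stated assumption: if the plasmids were run uncut, their supercoiled or nicked forms do not migrate like linear size markers, so apparent sizes on the gel are approximate. The function names are illustrative.

```python
# Plausibility check: which multimer of each plasmid is closest to the ~20 kb band?
# Monomer lengths are taken from the Figure 9 caption; function names are illustrative.
from typing import Dict, List

MONOMER_BP: Dict[str, int] = {"Trp": 6818, "Ura": 6252}


def multimer_sizes(monomer_bp: int, max_n: int = 4) -> List[int]:
    """Lengths of the monomer, dimer, trimer, ... up to max_n copies."""
    return [n * monomer_bp for n in range(1, max_n + 1)]


def nearest_multimer(monomer_bp: int, band_bp: int, max_n: int = 6) -> int:
    """Number of monomer units whose combined length is closest to band_bp."""
    return min(range(1, max_n + 1), key=lambda n: abs(n * monomer_bp - band_bp))


if __name__ == "__main__":
    for name, size in MONOMER_BP.items():
        print(name, multimer_sizes(size), nearest_multimer(size, 20_000))
```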

Design & Build II: Based on the knowledge gained in the Learn I phase, we investigated RecA-deficient strains used for cloning. E. coli Top10 is such a strain; it carries the recA1 mutation and should show reduced or no dimerization or multimerization of plasmids [9, 11]. We then resumed the construction of the previously designed plasmid as described above, but using E. coli Top10 instead of E. coli Turbo.

Test II: We tested the plasmids again via PCR and gel electrophoresis to detect possible dimerization and multimerization. Gel electrophoresis showed that the plasmids based on pSB3CY_aeBlue (BBa_K4188001) cloned in E. coli Top10 appeared not only at 6-7 kb but also at approximately 20 kb. Comparing the plasmids from the two E. coli strains, we observed fainter bands at 20 kb for Top10, possibly indicating less plasmid multimerization in E. coli Top10 (Fig. 9).

Learn II: E. coli Top10 is a RecA-deficient strain, and gel electrophoresis showed decreased plasmid dimerization and multimerization (Fig. 9) [11]. Studies on recombination in E. coli suggest that recombinases such as RecA support strand pairing and the transfer of homologous sequences during recombination [13]. Still, we observed multimerization in plasmids isolated from E. coli Top10, which suggests that other factors also play a role in plasmid multimerization [10]. In 1994, Bi and Liu showed that plasmid dimerization can still occur in RecA-deficient strains and concluded that RecA-independent recombination can still occur in homologous sequences of up to 300 bp in length [14]. Based on these data, we conclude that E. coli Top10 is preferable to E. coli Turbo for plasmid cloning, but further optimization is needed to reduce plasmid dimerization and multimerization.

References

[1] Bhupal Ban, Robert C. Blake, Diane A. Blake, “Yeast Surface Display Platform for Rapid Selection of an Antibody Library via Sequential Counter Antigen Flow Cytometry” (2022) Antibodies, doi: 10.3390/antib11040061
[2] David W. Colby, Brenda A. Kellogg, Christilyn P. Graff, Yik A. Yeung, Jeffery S. Swers, K. Dane Wittrup, “Engineering Antibody Affinity by Yeast Surface Display” Methods in Enzymology, (2004), doi: 10.1016/S0076-6879(04)88027-3
[3] Mandy Alhajj, Muhammad Zubair, Aisha Farhana, “Enzyme Linked Immunosorbent Assay”, National Library of Medicine (2023)
[4] Roberto Bava, Fabio Castagna, Christian Piras, Vincenzo Musolino, Carmine Lupia, Ernesto Palma, Domenico Britti, Vincenzo Musella, “Entomopathogenic Fungi for Pests and Predators Control in Beekeeping” (2022) doi: 10.3390/vetsci9020095
[5] Radoslav Krivák, David Hoksza, “P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure,” Journal of Cheminformatics, Aug. 2018
[6] V. Leipart et al., “Structure prediction of honey bee vitellogenin: a multi-domain protein important for insect immunity,” FEBS Open Bio, vol. 12, no. 1, pp. 51–70, Oct. 2021, doi: 10.1002/2211-5463.13316.
[7] M. Akdel et al., “A structural biology community assessment of AlphaFold2 applications,” Nature Structural & Molecular Biology, vol. 29, no. 11, pp. 1056–1067, Nov. 2022, doi: 10.1038/s41594-022-00849-w.
[8] V. Leipart et al., “Identification of 121 variants of honey bee Vitellogenin protein sequences with structural differences at functional sites,” Protein Science, vol. 31, no. 7, Jun. 2022, doi: 10.1002/pro.4369.
[9] J. R. Bedbrook and F. M. Ausubel, “Recombination between bacterial plasmids leading to the formation of plasmid multimers,” Cell, vol. 9, 4 PT 2, pp. 707-716, 1976, doi: 10.1016/0092-8674(76)90134-3.
[10] R. A. Fishel, A. A. James, and R. Kolodner, “recA-independent general genetic recombination of plasmids,” Nature, vol. 294, no. 5837, pp. 184-186, 1981, doi: 10.1038/294184a0.
[11] Genotypes and Genetic Markers of E. coli Competent Cells | Thermo Fisher Scientific - DE. [Online]. Available: https://www.thermofisher.com/de/de/home/life-science/cloning/cloning-learning-center/invitrogen-school-of-molecular-biology/molecular-cloning/transformation/competent-cell-genotypes-genetic-markers.html (accessed: Oct. 5, 2023).
[12] New England Biolabs, NEB® Turbo Competent E. coli (High Efficiency) | NEB. [Online]. Available: https://www.neb.com/en/products/c2984-neb-turbo-competent-e-coli-high-efficiency#Product%20Information (accessed: Oct. 5, 2023).
[13] S. C. Kowalczykowski, “Initiation of genetic recombination and recombination-dependent replication,” Trends in Biochemical Sciences, vol. 25, no. 4, pp. 156–165, 2000, doi: 10.1016/s0968-0004(00)01569-3.
[14] X. Bi and L. F. Liu, “recA-independent and recA-dependent intramolecular plasmid recombination. Differential homology requirement and distance effect,” Journal of Molecular Biology, vol. 235, no. 2, pp. 414–423, 1994, doi: 10.1006/jmbi.1994.1002.
