Designing a project as complex as BeeVAX involves extensive planning and continually refining those plans to achieve optimal outcomes.
Our project aimed to create an enhanced version of Vitellogenin via protein engineering. To achieve this goal, we designed experiments within each lab group, in both the dry lab and the wet lab. We were excited to apply these plans under real laboratory conditions and test whether our designs worked as intended. Throughout this process, failures, whether minor or major, challenged our motivation. Nevertheless, these setbacks often became opportunities to improve our design and learn something for future attempts. We believe that this is the essence of engineering success. In the following, we present three examples that passed through every step of the engineering cycle and contributed to an optimized version of Vitellogenin: 1) Investigating the best binding affinity to zymosan, 2) Predicting zymosan binding in Vitellogenin models and 3) Enabling the expression of Vitellogenin.
"The human foot is a masterpiece of engineering and a work of art." More broadly you could say nature in general is an evolutionary masterpiece of engineering and we are the ones to learn from it. As bioengineers we apply our knowledge about biological mechanisms and systems to create new innovative systems. Thereby, we can enable solutions for unsolved problems or invent new products using the inherent power of nature itself.
Our project is a middle-out approach that combines forward and reverse bioengineering to generate the bee's egg-yolk protein Vitellogenin with an optimized binding affinity for fungal PAMPs, enabling a new treatment against Varroa mite infestation of bee hives. The reverse engineering approach aims to gain more knowledge about the binding sites and domains of Apis mellifera's Vitellogenin. In the forward engineering steps, we focused on increasing the binding affinity of Vitellogenin to fungal PAMPs by a random mutagenesis-based approach, more precisely, directed evolution of the protein combined with a yeast surface display (YSD).
To successfully engineer our protein of interest, we passed through the engineering cycle in different parts of our project. The cycle consists of four successive steps: Design, Build, Test and Learn. During the Design and Build steps, the known and unknown parts of the project are assessed and translated into a detailed lab protocol. Testing then either confirms the design or marks it for optimization. The latter leads to the Learn step, which enables a re-design and optimization of the project design.
Each part of the project passed through several cycles before reaching its aims. Looking back, the project as a whole also went through several of these cycles. In the following, we describe some aspects of the engineering process of our project. It is important to keep in mind that an engineering process starts from a failure: the weak point is identified, the project design is optimized, and a new cycle begins.
One principal part of our project was the use of a yeast surface display (YSD) to present the domains of Apis mellifera's Vitellogenin on the surface of Saccharomyces cerevisiae. To read more about the constructs visit the project description. The cells of the YSD were analyzed using fluorescence-activated cell sorting (FACS) with two main aims: 1) to investigate which domain of Vitellogenin shows the highest zymosan binding affinity and 2) to sort error-prone PCR mutated versions of these domains in order to find mutations that lead to an increased binding affinity to zymosan.
This is the first attempt to measure zymosan binding to Vitellogenin domains via a YSD. Therefore, no protocols for handling zymosan and testing its binding affinity with a YSD-based approach were available. We had to rely on YSD protocols that use other binding substrates, as is common in antibody engineering, to derive a suitable method. For antibody engineering, the substrates are mainly biotinylated [1][2]. Hence, these approaches do not use fluorescently labeled substrates and are inherently different from ours.
Design & Build I: To find the best binding affinity, the zymosan had to be fragmented and labeled with a fluorescent molecule, Fluorescein-5-thiosemicarbazide (5-FTSC). 5-FTSC has an absorbance maximum at 492 nm and an emission maximum at 516 nm. The labeled fragments were dissolved in PBS buffer at 1 mg/mL, and the cells were incubated in 100 µL of the prepared mixture for 60 minutes. The final volume contained 500 µL of phosphate buffered saline (PBS) with 2 × 10^6 Saccharomyces cerevisiae EBY100 cells displaying the four different constructs. The amount of 1 mg of labeled zymosan was an estimate chosen to ensure that the Vitellogenin domains on the surface of the cells were oversaturated. Afterwards, the cells were washed according to the following protocol: each washing step consisted of centrifugation of the sample at 13,000 x g for 1 minute and resuspension of the cell pellet in 500 µL cold PBS buffer. PBS buffer is widely used to remove unbound and excess compounds in enzyme immunoassays [3]. Subsequently, the samples were incubated at 35 °C for 5 minutes. The temperature of 35 °C was based on the natural hive temperature of Western honey bees [4]. After every step, the sample was analyzed by FACS to estimate the required number of washing steps.
Test I: During the FACS run we performed three washing steps for every sample. We stopped after the third wash because the cell counts in the FACS decreased drastically afterwards, leaving too few cells for a fourth run.
The FACS results were analyzed with the flow cytometry analysis software FlowJo. After the first washing step, the samples showed a lower red fluorescence at a wavelength of 633 nm than the unwashed samples. Fig. 3 shows the decrease for three different zymosan concentrations of the construct displaying the α‑helical+DUF1943 domain. During the analysis we focused on α‑helical+DUF1943 because it showed the best binding affinity to zymosan. The red fluorescence corresponds to the cMyc antibody bound to the construct. It is therefore independent of zymosan binding and should not change throughout the washing steps. Apparently, however, the washing steps were too harsh for the weak binding of the cMyc antibody to the cMyc tag. Additionally, the washing steps did not wash off the zymosan, as indicated in Fig. 4.
Learn I: From the unexpected results of the first FACS run we concluded that the washing steps are too harsh for the weak binding of the cMyc antibody-antigen complex. Furthermore, the washing did not noticeably remove zymosan. Therefore, we needed to find a new way to estimate the right amount of zymosan for the measurement.
Design & Build II: Another approach to find the right amount of zymosan is the calculation of the binding kinetics of zymosan to the domains of Vitellogenin. Therefore, a normalization of the results with a cell control is required and a kinetics model fitting is needed. The binding kinetics lead to a dissociation constant KD which is used to describe the binding affinity of one molecule to another. Using this approach, the ideal amount of zymosan for every domain can be estimated.
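Such a normalization and fit could look like the following minimal sketch. It assumes a simple one-site equilibrium binding model, signal = Bmax · c / (KD + c), and assumes that normalization means subtracting the cell control signal; neither is specified in the text. All function names are illustrative, and the data points are synthetic, not measured values.

```python
import math

def binding_signal(conc, bmax, kd):
    """One-site equilibrium binding (assumed model): hyperbolic saturation curve."""
    return bmax * conc / (kd + conc)

def normalize(sample_signals, control_signals):
    """Assumed normalization: subtract the cell control measured at the same concentration."""
    return [s - c for s, c in zip(sample_signals, control_signals)]

def best_bmax(concs, signals, kd):
    """For a fixed KD the model is linear in Bmax, so least squares has a closed form."""
    x = [c / (kd + c) for c in concs]
    return sum(xi * s for xi, s in zip(x, signals)) / sum(xi * xi for xi in x)

def fit_kd(concs, signals, kd_min=1e-3, kd_max=1e3, steps=6001):
    """Scan KD on a logarithmic grid and keep the value with the smallest squared error."""
    lo, hi = math.log10(kd_min), math.log10(kd_max)
    best_err, best_kd = float("inf"), kd_min
    for i in range(steps):
        kd = 10 ** (lo + (hi - lo) * i / (steps - 1))
        bmax = best_bmax(concs, signals, kd)
        err = sum((s - binding_signal(c, bmax, kd)) ** 2
                  for c, s in zip(concs, signals))
        if err < best_err:
            best_err, best_kd = err, kd
    return best_kd, best_bmax(concs, signals, best_kd)

# Synthetic demonstration data (NOT measured values): true KD = 1.5, Bmax = 1000,
# plus a constant background of 50 that the cell control removes.
concs = [0.2, 2.0, 20.0, 200.0]          # µg/mL
raw = [binding_signal(c, 1000.0, 1.5) + 50.0 for c in concs]
control = [50.0] * len(concs)

kd, bmax = fit_kd(concs, normalize(raw, control))
print(f"fitted KD = {kd:.2f} µg/mL, Bmax = {bmax:.0f}")
```

The fitted KD carries the same unit as the concentrations. With only a handful of concentrations the fit is poorly constrained, so additional points around the expected KD make the estimate more reliable.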
Zymosan concentrations were tested in a range of 0.2 µg/mL to 200 µg/mL. We chose a broad range because we could not narrow it down further beforehand. The kinetics become more accurate the more zymosan concentrations within the range are measured.
Test II: During the second FACS run we tested four concentrations in tenfold increments: 0.2 µg/mL, 2 µg/mL, 20 µg/mL and 200 µg/mL. For the third FACS run the same concentrations were tested and extended by 0.5 µg/mL, 5 µg/mL and 50 µg/mL of zymosan. Unfortunately, the third FACS run did not yield any usable results because the constructs carried defective inserts. Therefore, we based the calculation of the dissociation constant KD on a limited number of measuring points. During the process we decided to focus on the construct Vg_α-helical+DUF1943 because we observed the best binding affinity for this variant. To read more about the binding affinity of the constructs and the defects visit the results page.
Learn II: Overall, we were able to determine a KD of 0.61 µg/mL for the binding of labeled zymosan to the construct Vg_α‑helical+DUF1943. Based on the results we decided to use 2 µg/mL for the library sorting as they represent low and high saturation points of the Vg_α‑helical+DUF1943 construct. Thereby, we bypassed the washing steps and found a justified concentration of labeled zymosan to use for the YSD constructs. To obtain an even more accurate KD, a higher number of measuring points should be considered.
In the end, we established a protocol to test the zymosan binding affinity of Vitellogenin domains in a YSD. Still, more measuring points are required to further optimize the kinetics and consequently our working protocol. In addition, the incubation time of zymosan with the domain-displaying cells could be adapted to further increase the binding.
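To illustrate what a KD of 0.61 µg/mL implies for the tested concentrations, the expected fraction of occupied binding sites can be estimated. This is a rough sketch that assumes a one-site equilibrium model, which the text does not state; the function name is illustrative.

```python
def fraction_bound(conc, kd):
    """Fraction of displayed domains occupied at equilibrium (assumed one-site model)."""
    return conc / (kd + conc)

KD_UG_PER_ML = 0.61  # KD reported above for Vg_α-helical+DUF1943

# Concentrations from the second FACS run, in µg/mL
for conc in (0.2, 2.0, 20.0, 200.0):
    occupancy = fraction_bound(conc, KD_UG_PER_ML)
    print(f"{conc:>6} µg/mL -> {occupancy:.0%} of sites occupied")
```

Under this assumption, 2 µg/mL corresponds to roughly 77% occupancy, and a concentration equal to the KD gives 50% by definition.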
For our protein engineering approach of Vitellogenin, a binding site prediction was required. This would allow us to check mutations for improved binding affinity in silico by performing docking simulations. Unfortunately, there are no papers available that predict concrete binding sites. So, we decided to model the binding sites for PAMPs as a dry lab approach.
Design & Build I: As a first step, we looked for computational tools to identify the relevant binding pocket of Vitellogenin. Most potential tools were not directly accessible and therefore out of the question. In comparison, P2Rank was developed with ease of use in mind and showed the most promising results. It is a stand-alone, template-free tool for the prediction of ligand binding sites based on machine learning [5]. Because no experimental structure of Vitellogenin is available, we used the structure predicted by AlphaFold.
In P2Rank, the results are presented in .csv files with a given probability for every binding site candidate. To visualize the candidates inside the protein structure, we wrote a script for ChimeraX presenting each candidate in different colors and with corresponding numbers.
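A script of this kind could be sketched as follows. This is not the script we used; it assumes the column names `name`, `rank`, `probability` and `residue_ids` (with entries such as `A_123`) that P2Rank writes into its predictions .csv, and it only emits ChimeraX `color` commands plus `#` comment lines. The sample rows are invented for demonstration.

```python
import csv
import io

COLORS = ["red", "orange", "yellow", "green", "cyan", "blue", "purple", "magenta"]

def read_pockets(csv_text):
    """Parse pocket rows; header names and values are stripped of padding spaces."""
    reader = csv.reader(io.StringIO(csv_text.strip()))
    header = [h.strip() for h in next(reader)]
    pockets = []
    for row in reader:
        if not row:
            continue
        rec = dict(zip(header, (v.strip() for v in row)))
        pockets.append({
            "name": rec["name"],
            "rank": int(rec["rank"]),
            "probability": float(rec["probability"]),
            # "A_123" -> ("A", "123"): chain ID and residue number
            "residues": [tuple(r.split("_", 1)) for r in rec["residue_ids"].split()],
        })
    return pockets

def chimerax_commands(pockets, min_probability=0.0):
    """Build one comment line per pocket and one color command per chain."""
    kept = [p for p in pockets if p["probability"] >= min_probability]
    lines = []
    for i, pocket in enumerate(kept):
        color = COLORS[i % len(COLORS)]
        lines.append(f"# {pocket['name']} (rank {pocket['rank']}, "
                     f"probability {pocket['probability']:.2f})")
        by_chain = {}
        for chain, resnum in pocket["residues"]:
            by_chain.setdefault(chain, []).append(resnum)
        for chain, nums in by_chain.items():
            lines.append(f"color /{chain}:{','.join(nums)} {color}")
    return lines

# Invented sample rows in the assumed P2Rank layout (NOT real predictions)
SAMPLE_CSV = """\
name     ,  rank,   score, probability, sas_points, surf_atoms, center_x, center_y, center_z, residue_ids, surf_atom_ids
pocket1  ,     1,   20.50,   0.980,     50,         20,   1.0,   2.0,   3.0, A_10 A_11 A_15, 100 101
pocket2  ,     2,    1.20,   0.050,     10,          5,   4.0,   5.0,   6.0, A_200 A_201, 300
"""

for line in chimerax_commands(read_pockets(SAMPLE_CSV), min_probability=0.10):
    print(line)
```

The printed lines can be saved to a ChimeraX command file and opened after loading the structure. Adding pocket numbers as labels would require further commands that are not shown here.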
Test I: The tool P2Rank provided us with 42 predicted binding pocket candidates for the honey bee's Vitellogenin, with 14 of them having a predicted probability higher than 10%. For further analysis, we decided to set 10% probability as a threshold.
Learn I: Due to Vitellogenin's large size of 1770 amino acids, we expected to find many potential pocket candidates. Additionally, Vitellogenin is known to have many different biological functions and binds different types of substrates besides PAMPs. Finding multiple binding sites is therefore consistent with a plausible prediction by the applied tool [6]. Nevertheless, 14 pocket candidates are too many to work with appropriately.
Design & Build II: Our idea was to narrow down the pocket candidates by combining other tools with P2Rank to get more information about the candidates.
Whenever AlphaFold models are used for protein function prediction, the confidence score (pLDDT - predicted local distance difference test) provided by AlphaFold should be considered [7]. The pLDDT score is a per-residue confidence score. On average, the pocket candidates consist of 19 residues. Therefore, we implemented a tool to add the average confidence score of all involved residues to every pocket into the .csv file. Moreover, previous research [8] suggested only the α‑helical and the DUF1943 domains to bind PAMPs. Our own results of the yeast surface display align with this suggestion. Consequently, we planned to only consider pocket candidates placed in both domains.
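The averaging step can be sketched as follows. The sketch relies on the fact that AlphaFold PDB files store the per-residue pLDDT in the B-factor column; the function names, the demo helper and the domain range in the example are hypothetical and are not taken from our tool.

```python
def plddt_per_residue(pdb_text):
    """Read pLDDT from the B-factor column (columns 61-66) of each C-alpha ATOM record."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores

def mean_plddt(residues, scores):
    """Average pLDDT over the residues that form one pocket."""
    values = [scores[res] for res in residues]
    return sum(values) / len(values)

def in_domains(residues, domain_ranges):
    """True if every pocket residue lies inside one of the (start, end) ranges."""
    return all(any(start <= num <= end for start, end in domain_ranges)
               for _, num in residues)

def demo_atom_line(serial, resnum, plddt, chain="A"):
    """Demo helper: build a fixed-width C-alpha ATOM record with dummy coordinates."""
    return (f"ATOM  {serial:5d}  CA  ALA {chain}{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")

# Invented two-residue example (NOT taken from the Vitellogenin model)
pdb_text = "\n".join([demo_atom_line(1, 10, 90.0), demo_atom_line(2, 11, 80.0)])
pocket = [("A", 10), ("A", 11)]
scores = plddt_per_residue(pdb_text)
print(mean_plddt(pocket, scores))            # mean of 90.0 and 80.0
print(in_domains(pocket, [(1, 50)]))         # hypothetical domain range
```

The domain check takes the residue ranges as a parameter because the domain boundaries depend on the annotation that is used.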
For more insights, visit the modeling page.
Test II: After calculating the mean confidence score for all 14 pocket candidates of the first cycle, eight pockets showed a mean pLDDT above 85 (the table below lists pLDDT on AlphaFold's scale of 0 to 100). Four of them were located inside the relevant domains. Overall, we were able to limit the number of promising pocket candidates to four, as shown in Fig. 7.
Name | Rank | Probability | Mean pLDDT |
---|---|---|---|
Pocket 1 | 1 | 0.98 | 90.93 |
Pocket 2 | 2 | 0.94 | 84.99 |
Pocket 3 | 3 | 0.92 | 95.61 |
Pocket 4 | 4 | 0.76 | 92.38 |
Pocket 5 | 5 | 0.65 | 76.17 |
Pocket 6 | 6 | 0.47 | 73.23 |
Pocket 7 | 7 | 0.27 | 84.66 |
Pocket 8 | 8 | 0.21 | 85.54 |
Pocket 9 | 9 | 0.18 | 83.89 |
Pocket 10 | 10 | 0.17 | 89.52 |
Pocket 11 | 11 | 0.16 | 81.08 |
Pocket 12 | 12 | 0.13 | 94.65 |
Pocket 13 | 13 | 0.12 | 95.63 |
Pocket 14 | 14 | 0.11 | 95.59 |
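The two thresholds can be applied to the table above in a few lines. The sketch below assumes cutoffs of 10% for the P2Rank probability and 85 for the mean pLDDT on the 0 to 100 scale; the values are copied from the table, and the function name is illustrative.

```python
# (rank, P2Rank probability, mean pLDDT) copied from the table above
POCKETS = [
    (1, 0.98, 90.93), (2, 0.94, 84.99), (3, 0.92, 95.61), (4, 0.76, 92.38),
    (5, 0.65, 76.17), (6, 0.47, 73.23), (7, 0.27, 84.66), (8, 0.21, 85.54),
    (9, 0.18, 83.89), (10, 0.17, 89.52), (11, 0.16, 81.08), (12, 0.13, 94.65),
    (13, 0.12, 95.63), (14, 0.11, 95.59),
]

def shortlist(pockets, min_probability=0.10, min_plddt=85.0):
    """Keep the ranks of pockets that exceed both the probability and pLDDT cutoffs."""
    return [rank for rank, prob, plddt in pockets
            if prob > min_probability and plddt > min_plddt]

confident = shortlist(POCKETS)
print(confident)  # ranks of the pockets that pass both cutoffs
```

Eight pockets pass both cutoffs. The further reduction to four requires the domain boundaries, which are not part of the table.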
Learn II: If your protein of interest is large and has multiple proposed functions, additional information is required to make reasonable predictions of binding sites. The addition of the AlphaFold scores seems to be a useful complement to that.
Additional research on the potential function of the four binding pockets is needed to clearly focus on one of them. According to literature [8], candidate 10 (shown in red in Fig. 7) seems to be the most promising one. To combine the findings with the wet lab, we would like to sequence the improved variants from the directed evolution approach. In the sequencing data, the location of the mutations can be compared to the location of the pocket candidates.
The third engineering example is how we encountered and addressed plasmid multimerization in Escherichia coli Turbo during the expression of Vitellogenin. Previous studies observed that Recombinase A (RecA), an enzyme active in recombination, is a main contributor to plasmid dimerization and multimerization, although RecA-independent dimerization and multimerization have also been observed [9, 10]. Over the course of the engineering cycle, we observed decreased plasmid multimerization in the RecA-deficient E. coli Top10 compared to E. coli Turbo, which has an active RecA [11, 12].
Design & Build I: After deciding on E. coli Turbo as the strain in which we wanted to assemble the construct, we started to design the plasmid for the expression of Apis mellifera Vitellogenin in Komagataella phaffii. As part of the plasmid assembly, we planned to construct the shuttle vector pSB3CYP_Zeo (BBa_K4675009). The shuttle vector can be used by other iGEM teams for the assembly of their own expression vectors. Our first aim was to exchange the selection marker and origin of replication of the pSB3CY_His (BBa_K4188002) plasmid for ones suitable for Komagataella phaffii. The zeocinR gene and the PARS origin of replication were incorporated into the vector by restriction digestion, using the enzymes NotI and SpeI for zeocinR and SpeI and BamHI for PARS. Next, we assembled the expression cassette containing the AOX1 promoter, the AOX1 terminator and vitellogenin via Golden Gate Assembly (RFC [1000]) using the type IIS restriction enzyme SapI.
Test I: Throughout this phase, we tested the vectors and their intermediates for correct assembly. PCRs and gel electrophoresis were done to verify the integration of parts into the vector. Interestingly, gel electrophoresis of some vectors revealed that they were larger than anticipated. For example, the Lvl2 plasmid constructs based on pSB3CY_aeBlue (BBa_K4188001), which were expected to be approximately 6-7 kb in size, also appeared at approximately 20 kb during gel electrophoresis.
Learn I: During this phase of the engineering cycle, we began to investigate why plasmids appeared larger than their sequence suggests. One of our advisors told us during a discussion that E. coli Turbo has the negative trait of often dimerizing or even multimerizing plasmids. Indeed, studies have shown that plasmids often dimerize or multimerize in E. coli strains that are not deficient in Recombinase A (RecA), like E. coli Turbo [9, 10, 12].
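As a rough plausibility check, the apparent size on the gel can be compared with the expected monomer size to estimate how many plasmid copies a multimer would contain. This is only a back-of-the-envelope sketch: band sizes read from a gel are approximate, and the function name is illustrative.

```python
def multimer_order(observed_kb, monomer_kb):
    """Ratio of apparent band size to expected monomer size (approximate copy number)."""
    return observed_kb / monomer_kb

OBSERVED_KB = 20.0  # approximate size of the unexpected band

# Expected monomer size range from the text: 6-7 kb
for monomer_kb in (6.0, 7.0):
    ratio = multimer_order(OBSERVED_KB, monomer_kb)
    print(f"monomer {monomer_kb} kb -> about {ratio:.1f} copies")
```

For a monomer of 6-7 kb, a band near 20 kb corresponds to about three copies, which would be in line with a trimer rather than a dimer.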
Design & Build II: Based on the knowledge gained in the Learn I phase, we started to investigate RecA-deficient strains used for cloning. E. coli Top10 is such a strain with a deleted recA gene and should show reduced or no dimerization or multimerization of plasmids [9, 11]. The construction of the previously designed plasmid was then resumed as explained above, but using E. coli Top10 instead of E. coli Turbo.
Test II: We tested the plasmids again via PCR and gel electrophoresis to detect possible dimerization and multimerization. Gel electrophoresis showed that the plasmids based on pSB3CY_aeBlue (BBa_K4188001) cloned into E. coli Top10 appeared not only at 6-7 kb but also at approximately 20 kb. Comparing the plasmids from the two E. coli strains, we saw fainter bands at 20 kb for E. coli Top10, possibly indicating less plasmid multimerization in this strain (Fig. 2).
Learn II: E. coli Top10 is a RecA-deficient strain, and the gel electrophoresis showed decreased plasmid dimerization and multimerization occurring during incubation (Fig. 2) [11]. Studies on recombination in E. coli suggest that recombinases such as RecA support the strand pairing and transfer of homologous sequences during recombination [13]. Still, we saw multimerization in plasmids isolated from E. coli Top10, which suggests that other factors also play a role in plasmid multimerization [10]. Bi et al. showed in 1994 that plasmid dimerization can still occur in RecA-deficient strains. They concluded that RecA-independent recombination can still occur in homologous sequences of up to 300 bp in length [14]. Based on this data, we conclude that using E. coli Top10 is beneficial for plasmid cloning compared to E. coli Turbo, but further optimization is needed to decrease the occurrence of plasmid dimerization and multimerization.