Western honey bees (Apis mellifera) suffer from infestation by the Varroa mite. Traditional control methods have major disadvantages for beekeepers, because the agents used can accumulate in honey and beeswax and spoil the final product. We therefore set out to develop a new method to combat the Varroa mite using the egg yolk protein Vitellogenin.
Since Vitellogenin's involvement in transgenerational immune priming was only recently discovered [1], there is little information on where and how pathogen-associated molecular patterns (PAMPs) are bound and transported by Vitellogenin. Creating a Vitellogenin variant optimized for our specific needs was therefore a challenging task: where in the protein should we start searching if we don't know what to look for? Only after heated discussions and extensive research did we develop a comprehensive plan to optimize Vitellogenin for binding fungal PAMPs. Stronger binding should enhance the bees' immunity to entomopathogenic fungi, allowing these fungi to be used to combat Varroa mite infestation.
Starting from wild-type Vitellogenin, we searched for optimized variants in two ways. In the wet lab, we used directed evolution to introduce mutations into the protein and sorted cells displaying the variants by binding affinity using fluorescence-activated cell sorting (FACS). In the dry lab, we ran multiple predictions and simulations of individual Vitellogenin variants from the Western honey bee and other wild bee species. If we found a Vitellogenin variant with the desired properties, the next step would be to express it in Komagataella phaffii and test its binding affinity for fungal PAMPs in a specifically designed binding assay.
Designing a protein whose mechanism of function is poorly understood presents numerous challenges, particularly for a complex protein like Vitellogenin. Its many domains and functional regions [1] make it almost impossible to identify the residues responsible for PAMP affinity without complex and lengthy trials. This limited the prospects of success for a purely rational design approach.
We therefore also pursued a different approach: directed evolution. Here, a library of sequence variants is generated by random mutagenesis and screened with a high-throughput method to isolate mutants with the desired properties. The major advantage of this approach is that no prior knowledge of the protein's mechanism is needed [2].
Directed evolution has been employed for a wide range of applications, including the optimization of whole-cell biotechnological processes such as fermentation [3–6], plant breeding [7], protein optimization [2], and biosensors [8], as well as newer applications such as the prediction of antibiotic resistance [9, 10] and many more [11, 12]. Its relevance is further underlined by the 2018 Nobel Prize in Chemistry awarded to Frances H. Arnold, who was honoured for pioneering the directed evolution of enzymes.
Many commercial proteins have been optimized by directed evolution. Parameters that have been modified include protein stability and activity across pH and temperature ranges [13–15], substrate turnover [16, 17], and substrate specificity [18]; in principle, almost any property of a protein can be influenced by a suitable directed evolution approach.
As previously discussed, the parameter we needed to improve was the zymosan affinity of A. mellifera's Vitellogenin. We hypothesized that improved zymosan binding would enable the queen bee to bind and transport more zymosan into the oocytes [19] and thus confer stronger protection to all her offspring against entomopathogenic fungi, which could then be used to treat Varroa mite infestation.
With this aim, we needed to choose a suitable workflow for generating and screening a library. We quickly found that surface display is one of the most widely used and effective methods for affinity engineering. It exploits the ability of cells to produce proteins and transport them to the cell surface, where they are easily accessible for screening, which circumvents laborious purification. Downstream single-cell flow cytometry and fluorescence-activated cell sorting (FACS) make it possible to screen large libraries and isolate mutants that display proteins with the desired traits [20–24].
Mutant libraries can be displayed on phages [25, 26] as well as on bacterial [27], insect [28], mammalian [29, 30], and yeast cells [31–33]. Of these options, we chose yeast, as it offered the most complete platform with the fewest disadvantages. Yeasts are easier to cultivate and handle in experiments than insect or mammalian cells while still being a eukaryotic expression system, meaning they can perform posttranslational modifications and support correct protein folding. Phages and bacteria do not offer these properties, which makes yeast the ideal organism for screening Vitellogenin mutant libraries.
Among yeast species, Saccharomyces cerevisiae is the most common choice for displaying protein libraries [31–33]. In general, a yeast surface display construct is composed of several elements (Fig. 1). A cell wall protein (CWP) interacts with a polypeptide on the cell wall, which in turn is typically linked to glycosylphosphatidylinositol (GPI) [34, 35], thereby anchoring the construct on the cell surface. The protein of interest (POI) is fused to the CWP via a flexible linker, which allows the CWP and the POI to fold and move independently of one another and increases the likelihood that the displayed protein remains functional [36–38]. Lastly, an epitope tag is included in the construct. It allows quantification of fusion protein expression and subsequent normalization of the measured binding affinity [24, 39].
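To make the normalization idea concrete, here is a minimal Python sketch. It assumes that each cell yields one display signal (the fluorescent antibody against the epitope tag) and one binding signal (the fluorescently labeled ligand), and that background levels are estimated from unstained control cells. The function name and the background-subtraction scheme are illustrative assumptions, not the analysis pipeline we used.

```python
from statistics import median


def normalized_binding(display, binding, display_bg, binding_bg, min_display=1e-9):
    """Return per-cell binding normalized to surface display.

    display, binding: per-cell fluorescence values (same order, same length).
    display_bg, binding_bg: background levels, e.g. medians of unstained
    controls (an assumption of this sketch).
    Cells whose background-corrected display is not positive are skipped,
    because a binding/display ratio is meaningless for non-displaying cells.
    """
    ratios = []
    for d, b in zip(display, binding):
        d_corr = d - display_bg
        b_corr = max(b - binding_bg, 0.0)
        if d_corr > min_display:
            ratios.append(b_corr / d_corr)
    return ratios


# Hypothetical values for three cells; the third displays nothing above background.
ratios = normalized_binding([1100, 2100, 90], [600, 1100, 300], display_bg=100, binding_bg=100)
print(ratios)
print(median(ratios))
```

Dividing by the display signal means that a cell carrying twice as many copies of the same variant does not look like a better binder, which is exactly the bias the epitope tag is meant to remove.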
Several CWPs are available as anchors. We decided to use the widely employed a-agglutinin mating protein Aga2p, which forms two disulfide bonds with Aga1p on the cell surface. Aga2p also directs secretion of the construct, which reduces the size and complexity of the expressed protein because no separate secretion signal is needed [31, 34, 40].
We chose a (G4S)3 linker, as it offers a good compromise between flexibility and length. Its polar serine residues also form hydrogen bonds with surrounding water molecules, reducing interactions of the linker with the protein domains [41].
For quantification we used the HA epitope and the c-Myc tag, which are widely used in antibody engineering studies that, like our project, often aim to increase binding affinity [24, 42, 43]. The two epitopes provided redundancy, with HA at the N-terminus and c-Myc at the C-terminus: if steric hindrance made the c-Myc tag inaccessible for antibody binding, we could use the HA tag instead.
The last decision for our construct was which parts of the protein to display. Vitellogenin is a large protein of 202 kDa. As there was no literature on displayed proteins of this size and full-length Vitellogenin secretion from yeast had previously been unsuccessful [44], we decided to display individual Vitellogenin domains.
Analyzing the functional domains [1] (Fig. 2), we found a long sequence without a proposed function between the DUF1943 domain and the von Willebrand factor (vWf) domain. As the vWf domain is predicted to be primarily involved in protein stabilization [1], we deemed this section of the protein irrelevant to our engineering effort. Additionally, the N-terminal β-barrel domain is a putative transcription factor [45] linked to the rest of the protein via a cleavable poly-serine linker [46]. Given this role, it is not necessarily transported into the oocytes and likely has no function in PAMP binding. We therefore also excluded it, together with the poly-serine linker, from the surface display constructs. These decisions left two domains of interest: the alpha-helical domain, which had already been suggested to bind PAMPs on its own, albeit with reduced affinity [47], and the adjacent DUF1943 domain, which is known to contribute to PAMP binding in fish and coral species [48, 49]. We reasoned that assessing the binding affinity of both domains separately and in combination could give us valuable insight into the mechanism of PAMP binding, which had not previously been described for A. mellifera's Vitellogenin. We also decided to try displaying a truncated Vitellogenin lacking the β-barrel domain and poly-serine linker and ending directly after the vWf domain. This resulted in four constructs to be assembled and displayed: one with only the alpha-helical domain (45.5 kDa), one with the DUF1943 domain (34.27 kDa), a third with the alpha-helical and DUF1943 region (97.75 kDa), and lastly a truncated Vitellogenin, far larger than the others at 144.99 kDa.
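Domain sizes like those above can be estimated directly from the amino acid sequence. The Python sketch below sums standard average residue masses plus one water per chain, similar in spirit to tools such as ExPASy ProtParam; it ignores glycosylation and other modifications, so it is an approximation under that assumption and not necessarily how the values above were obtained. The (G4S)3 linker serves as an example input because the Vitellogenin domain sequences are not reproduced here.

```python
# Average residue masses in Da (amino acid minus water), standard values.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water per chain for the free N- and C-termini


def protein_mass_kda(sequence: str) -> float:
    """Approximate average mass of an unmodified polypeptide in kDa."""
    seq = sequence.upper().replace(" ", "")
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return (sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER) / 1000.0


def coding_length_bp(sequence: str) -> int:
    """Length of the coding DNA for the sequence (3 bp per residue, no stop codon)."""
    return 3 * len(sequence)


linker = "GGGGS" * 3  # the (G4S)3 linker as an example input
print(round(protein_mass_kda(linker), 3))
print(coding_length_bp(linker))
```

The coding length is a quick cross-check against colony PCR band sizes, keeping in mind that primers binding outside the insert add some extra base pairs.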
Our choice of the Aga2p anchor gave us a clear direction for the yeast strain. The widely employed S. cerevisiae strain EBY100 [24, 32, 39] carries the cell wall-bound Aga1p component of the anchor complex under the control of the galactose-inducible GAL1 promoter, making it ideal for our concept. The vector pCTCon2 is well suited for use in this strain, as it also expresses the surface display construct under a GAL1 promoter and allows auxotrophic selection in tryptophan dropout medium (Addgene #41843).
After constructing the display vectors, we needed to generate a highly diverse, unbiased library. For this we chose error-prone PCR with the nucleotide analogues dPTP and 8-oxo-dGTP. Unlike saturation mutagenesis, it requires no prior knowledge of domain function [50], and it is easier to execute in a short time frame than DNA shuffling [51]. The nucleotide analogues used here can mediate both transitions and transversions while introducing less GC bias into the mutated sequence than manganese-based error-prone PCR [52, 53].
We generated the library by amplifying the plasmid backbone, without the insert, with a high-fidelity polymerase, and the inserts in error-prone reactions with varying nucleotide analogue concentrations to cover a range of mutation rates. Choosing suitable mutation rates is critical, considering that 30–50% of mutations are deleterious to protein function [54, 55]. Highly mutated libraries are therefore unusable for directed evolution, while too few amino acid exchanges result in a small and less diverse variant pool. We used 0.2, 2, and 20 µM of nucleotide analogues for library generation to cover a wide range of mutation rates.
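The trade-off between diversity and function can be illustrated with a simple model. The Python sketch below assumes, as a simplification, that amino acid substitutions per variant follow a Poisson distribution with mean lam and that each substitution is independently deleterious with a fixed probability (40%, the midpoint of the 30–50% range cited above). The mean substitution rates are hypothetical, not measured values for our libraries.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k substitutions when the mean per variant is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def fraction_without_deleterious(lam: float, p_deleterious: float) -> float:
    """Fraction of variants carrying no deleterious substitution.

    If substitutions are Poisson(lam) and each is independently deleterious
    with probability p_deleterious, deleterious hits are Poisson(lam * p).
    """
    return math.exp(-lam * p_deleterious)


# Hypothetical mean substitutions per variant for three library conditions.
for lam in (0.5, 2.0, 8.0):
    unmutated = poisson_pmf(0, lam)
    tolerated = fraction_without_deleterious(lam, 0.4)
    print(f"lambda={lam}: unmutated={unmutated:.2f}, no deleterious hit={tolerated:.2f}")
```

Under these assumptions, a low rate leaves most variants unchanged, while at eight substitutions per variant only about 4% avoid every deleterious hit, which is why spreading the library over several analogue concentrations is a reasonable hedge.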
80-bp overhangs complementary to the plasmid backbone were introduced during the error-prone PCR to enable efficient homologous recombination of the library into the backbone inside the yeast cells [56]. The fragments were purified, concentrated by lyophilisation, pooled with the backbone, and used to transform yeast by electroporation. The transformed yeast cells were first grown in selective SD+CAA medium and then transferred to SG+CAA medium to induce display of the library (Fig. 3).
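One common way to add such homology arms is as 5' tails on the amplification primers. The Python sketch below assumes that design (the text does not specify how the overhangs were attached) and builds forward and reverse primers from hypothetical backbone flanks and an insert; the toy sequences stand in for the real pCTCon2-based backbone.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def homology_primers(upstream: str, downstream: str, insert: str,
                     arm_len: int = 80, anneal_len: int = 20):
    """Build primers that add backbone homology arms to an insert.

    upstream / downstream: backbone sequence directly 5' / 3' of the
    insertion site (top strand). insert: top-strand sequence to amplify.
    Forward primer = last arm_len nt of upstream + first anneal_len nt of insert.
    Reverse primer = reverse complement of (last anneal_len nt of insert +
    first arm_len nt of downstream).
    """
    if len(upstream) < arm_len or len(downstream) < arm_len:
        raise ValueError("backbone flanks shorter than the requested arm length")
    fwd = upstream[-arm_len:] + insert[:anneal_len]
    rev = reverse_complement(insert[-anneal_len:] + downstream[:arm_len])
    return fwd, rev


# Toy sequences (hypothetical), chosen so the arms are easy to recognize.
up, down = "A" * 100, "C" * 100
ins = "ATG" + "GCT" * 30 + "TAA"
fwd, rev = homology_primers(up, down, ins)
print(len(fwd), len(rev))
```

Only the 3' annealing part of each primer participates in the first PCR cycles, so its melting temperature, not the full primer length, sets the annealing temperature.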
For FACS, the library was first prepared by labelling the c-Myc tag with mouse anti-c-Myc (ThermoFisher #910MYCL) and goat anti-mouse Alexa Fluor® 633 (ThermoFisher #A-21126) antibodies. We then exposed the cells to FTSC-labeled zymosan at varying concentrations. Variants with improved binding are more likely to retain the bound PAMP on the cell surface, leading to their positive selection in FACS. This approach was designed to identify Vitellogenin variants with a higher affinity for zymosan, which could improve the transgenerational immune priming of honey bees against entomopathogenic fungi and thereby enable the use of these fungi against Varroa mites.
To enable screening by FACS, the zymosan had to be fragmented and labeled with a fluorescent dye. As mentioned above, zymosan is a large, water-insoluble polysaccharide (approx. 300 kDa), so we first subjected it to acid hydrolysis. We chose mild conditions, 0.5 M HCl at 70 °C for 2 hours, to make the fragments water-soluble while keeping them as large as possible, so that Vitellogenin could still recognize them as PAMPs. After hydrolysis, the soluble zymosan fragments were centrifuged through a 3 kDa cut-off membrane to remove very small oligosaccharides, which were unlikely to be recognized as PAMPs (Fig. 4).
The larger zymosan fragments were then labeled with the fluorescent dye fluorescein-5-thiosemicarbazide (FTSC). The dye reacts with the reducing ends of the sugars to form hydrazone-type (thiosemicarbazone) linkages, which are subsequently reduced by sodium cyanoborohydride (NaBH3CN) to yield the fluorescent zymosan derivatives. Suitable labeling conditions are a molar ratio of 8:1 (FTSC to reducing ends) and 70 µL of 0.4 M NaBH3CN per mg of zymosan, dissolved in a 7:3 mixture of DMSO and acetic acid (HOAc) (Fig. 5).
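Because the FTSC amount is defined relative to reducing ends, it depends on the average fragment size. The Python sketch below turns the stated ratios into reagent amounts. It assumes one reducing end per fragment and a hypothetical number-average fragment mass of 5 kDa; the real value would have to be estimated, for example from the MALDI-TOF data. The FTSC molar mass is the commonly listed value for C21H15N3O5S.

```python
FTSC_MW = 421.43  # g/mol, fluorescein-5-thiosemicarbazide (C21H15N3O5S)


def ftsc_labeling_amounts(zymosan_mg: float, mean_fragment_da: float,
                          ftsc_excess: float = 8.0,
                          nabh3cn_ul_per_mg: float = 70.0):
    """Estimate reagent amounts for reducing-end labeling.

    Assumes one reducing end per fragment, so moles of reducing ends =
    mass / number-average fragment mass (a simplifying assumption).
    Returns (reducing ends in nmol, FTSC in mg, 0.4 M NaBH3CN volume in uL).
    """
    reducing_ends_mol = (zymosan_mg / 1000.0) / mean_fragment_da
    ftsc_mg = reducing_ends_mol * ftsc_excess * FTSC_MW * 1000.0
    nabh3cn_ul = zymosan_mg * nabh3cn_ul_per_mg
    return reducing_ends_mol * 1e9, ftsc_mg, nabh3cn_ul


# Hypothetical: 1 mg of fragments with an assumed mean mass of 5 kDa.
ends_nmol, ftsc_mg, reductant_ul = ftsc_labeling_amounts(1.0, 5000.0)
print(f"{ends_nmol:.0f} nmol ends, {ftsc_mg:.3f} mg FTSC, {reductant_ul:.0f} uL NaBH3CN")
```

Smaller fragments carry more reducing ends per milligram, so underestimating the fragment mass errs on the side of a larger dye excess.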
Afterwards, the FTSC-labeled zymosan was again passed through a 3 kDa membrane to remove unreacted chemicals. It was then precipitated in ethanol, in which free FTSC is soluble, to remove residual dye. Several methods were used to validate the acid hydrolysis and FTSC labeling. First, the hydrolysis was verified by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). Labeling was then confirmed by observing fluorescence of the precipitated zymosan under UV light, with unlabeled zymosan as a negative control. As no suitable protocols were available, we developed our own; they are described on our Contribution page.
Acid hydrolysis was verified by MALDI-TOF MS (Fig. 6). Glucose has a molecular weight of 180 Da, and a water molecule (18 Da) is eliminated when a glycosidic bond forms, so each additional glucose unit in a chain adds 162 Da. Since zymosan is a polymer of β-1,3- and β-1,6-linked glucose monomers, the peaks in the mass spectrum with a consistent spacing of 162 m/z confirm that the hydrolysis was successful. The data also show that the abundance of zymosan fragments decreases with increasing molecular size.
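The 162 m/z spacing can be turned into an explicit peak list to compare against a spectrum. The Python sketch below uses monoisotopic masses and assumes singly charged sodium adducts [M+Na]+, which are common for neutral oligosaccharides in MALDI but are an assumption here; other adducts would shift every peak by the same offset without changing the spacing. The helper that checks spacing in observed peaks is illustrative only.

```python
# Monoisotopic masses in Da.
ANHYDROHEXOSE = 162.0528  # glucose (C6H12O6, 180.0634) minus water (18.0106)
WATER = 18.0106
SODIUM_ION = 22.9892      # Na+ (sodium atom minus one electron)


def hexose_ladder(max_dp: int, adduct: float = SODIUM_ION):
    """Expected m/z of singly charged ions for glucose oligomers DP 2..max_dp."""
    return {dp: dp * ANHYDROHEXOSE + WATER + adduct for dp in range(2, max_dp + 1)}


def consistent_spacing(peaks, step=ANHYDROHEXOSE, tol=0.5):
    """True if every gap between consecutive sorted peaks is a whole number of steps."""
    peaks = sorted(peaks)
    for a, b in zip(peaks, peaks[1:]):
        n = round((b - a) / step)
        if n < 1 or abs((b - a) - n * step) > tol:
            return False
    return True


ladder = hexose_ladder(6)
for dp, mz in ladder.items():
    print(dp, round(mz, 2))
print(consistent_spacing(list(ladder.values())))
```

Allowing gaps of whole multiples of 162 tolerates missing peaks, which is useful because the abundance of larger fragments drops off.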
Following the successful fragmentation of the zymosan polymer, we labeled the resulting fragments with FTSC. The labeled polymer showed pronounced fluorescence at 492 nm (Fig. 7). The flow-through of the washing steps also fluoresced, but the signal decreased with each wash, indicating successful removal of residual FTSC. In contrast, the negative control of unlabeled zymosan fragments showed no fluorescence signal.
First, we constructed the base surface display plasmid by Gibson Assembly. It contained the functional elements of the surface display construct, but not the Vitellogenin domains. To identify positive clones, we ran colony PCRs amplifying the complete insert. Several colonies showed the expected fragment size of 374 bp, and sequencing of several of these colonies confirmed correct assembly of the construct (Fig. 8).
Next, we integrated the four Vitellogenin fragments into the construct. We amplified the domains from a fully assembled, codon-optimized Vitellogenin sequence ordered from IDT. After transformation, we again ran colony PCRs and then sent the constructs for sequencing to verify their identity.
Two colonies of Vg_α‑helical showed the expected band at ~1400 bp. Both were sent for sequencing, and one was confirmed to be correct (Fig. 10).
For Vg_DUF1943, two colonies showed the expected band at ~1000 bp. Both were sent for sequencing and confirmed to contain the correct insert (Fig. 12; only one sequence shown).
For Vg_α‑helical+DUF1943, two colonies again showed the expected band at ~2200 bp. Both were sent for sequencing and confirmed to contain the correct construct (Fig. 14; only one sequence shown).
For Vg_truncated, one colony showed a promising fragment at ~3900 bp. Sequencing confirmed that it contained the correct fragment (Fig. 16).
The first step of our engineering was to verify that our constructs were correctly displayed. We were particularly concerned about the larger constructs, Vg_α‑helical+DUF1943 and Vg_truncated.
We compared our constructs against eight controls (Fig. 17): a cell control, single-antibody controls with only the primary or only the secondary antibody, and a double-antibody control, each run both without and after incubation with zymosan to check for background interaction of the yeast with the PAMP.
Before analyzing flow cytometry data, the “events” relevant to the experiment must be isolated. An event is any signal recorded by a flow cytometer, regardless of its source, so events originating from debris, dead cells, or other sources need to be excluded before further analysis. Several measurements are used to determine which accumulation of events, or “population,” contains the desired cells. Forward scatter (FSC) measures light scattered in the direction of the laser beam, while side scatter (SSC) measures light scattered sideways, at roughly 90°. Each signal can be described by its height (e.g., FSC-H), width (e.g., FSC-W), and area (e.g., FSC-A). The height is the peak signal intensity, the width is the time the event takes to pass through the beam, and the area is the integral of the signal over that time. Different combinations of these parameters reveal different properties of the measured events.
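The three pulse parameters can be illustrated on a digitized detector pulse. The Python sketch below is a simplified model: real cytometers use their own instrument-specific pulse processing, and the sample values, sampling interval, and trigger threshold are hypothetical.

```python
def pulse_parameters(samples, dt, threshold):
    """Height, width and area of one digitized detector pulse.

    samples: detector signal sampled at a fixed interval dt while the event
    crosses the laser; threshold: trigger level defining the pulse duration.
    Height = peak value; width = time above threshold;
    area = integral of the signal (rectangle rule).
    """
    height = max(samples)
    width = sum(1 for s in samples if s > threshold) * dt
    area = sum(samples) * dt
    return height, width, area


single = [0, 1, 3, 1, 0]
doublet = [0, 1, 3, 1, 1, 3, 1, 0]  # two cells passing back to back
print(pulse_parameters(single, 1.0, 0.5))
print(pulse_parameters(doublet, 1.0, 0.5))
```

In this toy example the doublet has the same height as the single cell but twice the width and area, which is the property used for doublet discrimination.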
FSC-A vs SSC-A plots are used to separate cells from debris and other impurities: FSC-A indicates cell size, while SSC-A reflects internal complexity, also called granularity. As seen in Fig. 18, we observed a well-defined cell population without debris or other populations that would indicate dead cells or other cell types. An area, or “gate,” was set around this population to select its cells for further analysis. We then checked our samples for cell aggregates. Cells can stick together or, by chance, pass the laser together and be recorded as a single event. Such multi-cell events would interfere with data interpretation and need to be excluded, which we did using FSC-A vs FSC-H plots. Because aggregates take longer to pass through the beam, they have a larger FSC-A and FSC-W for a given FSC-H and therefore appear as a population offset below the single-cell diagonal. No such subpopulation was observed in the FSC-A vs FSC-H plots, indicating that only single cells were measured.
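The doublet gate can also be expressed numerically through the FSC-A/FSC-H ratio. The Python sketch below flags events whose ratio lies well above the median ratio; the cut-off of 1.3 times the median is a hypothetical choice for illustration, whereas in practice (and in our analysis) the gate is drawn on the FSC-A vs FSC-H plot.

```python
from statistics import median


def singlet_mask(fsc_a, fsc_h, tolerance=1.3):
    """Flag events whose FSC-A/FSC-H ratio is close to that of typical singlets.

    Doublets have roughly the same height but a larger area, so their A/H
    ratio is elevated. The cut-off (tolerance x median ratio) is an
    illustrative assumption.
    """
    ratios = [a / h for a, h in zip(fsc_a, fsc_h)]
    cutoff = tolerance * median(ratios)
    return [r <= cutoff for r in ratios]


# Hypothetical events; the fourth has about twice the area at the same height.
fsc_a = [5.0, 5.2, 4.9, 10.1, 5.1]
fsc_h = [3.0, 3.1, 2.9, 3.0, 3.0]
print(singlet_mask(fsc_a, fsc_h))
```

Using the median as the reference keeps the cut-off stable even if a sizeable fraction of events were aggregates.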
We then investigated which constructs were displayed. We used two fluorescent markers, FTSC on zymosan and Alexa Fluor® 633 on the secondary antibody, to measure PAMP binding relative to protein display. A cell control with both antibodies and 0.2 mg of zymosan served as the baseline for comparison. A “high display” gate was set (Fig. 19) spanning the whole PAMP fluorescence range, with its lower border on the antibody fluorescence axis placed at the edge of the main population. This gate included all cells that interacted with the antibodies and excluded those that did not. The event count in this gate was then compared between the constructs and the cell control: a higher percentage of cells in the gate indicates more binding of the secondary antibody, as expected if a construct is displayed.
As seen in Fig. 19, the Vg_α‑helical, Vg_α‑helical+DUF1943, and Vg_truncated domains were successfully displayed. The cell control showed a small population of cells interacting with the antibody, resulting in 8.75% of events in the gate. We attempted to normalize the measurements to the control, but because of the low signal-to-noise ratio of our samples, all normalized signals collapsed to zero, making them unusable. Even without normalization, however, we observed a clear increase in event count for all three constructs. As expected, construct size had a strong impact on display efficiency: the Vg_α‑helical domain showed the strongest display, with 11.75 percentage points more events in the gate than the cell control, while the larger Vg_α‑helical+DUF1943 construct was displayed to a slightly lower degree, with an increase of 10.05 percentage points. To our delight, the ~145 kDa truncated sequence was also displayed at a sufficient level, 4.65 percentage points above the cell control. Only the display of the separate Vg_DUF1943 domain could not be demonstrated, as the FACS runs were unusable due to technical issues with the cytometer.
Before analysis, we inspected every sample in an SSC-A vs time plot to check for irregularities. Almost all runs showed regular readings, but the Vg_DUF1943 run had severe dropouts and shifts (Fig. 20). We tried to compensate for these problems with a more extensive gating strategy. First, a gate called “FACS failure” was used to remove all events recorded during the dropout. Unfortunately, plotting the remaining cells as FSC-A vs SSC-A still showed a very irregular event pattern: one diffuse population at a position similar to that of normal cells and a second population with SSC-A approaching zero. We gated the events at the usual cell position and analyzed them further. Using the same “high display” and “high binding” gates as for the other samples, we found that they had a display value of about half that of the cell control and only baseline PAMP binding. Due to time constraints, we could not repeat the measurement and therefore cannot draw a conclusion about the display of the Vg_DUF1943 domain.
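Dropouts like these show up as time intervals with abnormally few events. The Python sketch below is an automated variant of the manual “FACS failure” time gate: it bins events by acquisition time and excludes bins whose event count falls below half the median bin count. The bin width and the 50% threshold are hypothetical choices, and the event times are made up for illustration.

```python
from statistics import median


def stable_time_mask(event_times, bin_width, min_rel_rate=0.5):
    """Flag events recorded while the acquisition rate was stable.

    Events are binned by acquisition time; bins whose event count falls below
    min_rel_rate x the median bin count are treated as dropouts, and all
    events in them are excluded.
    """
    bins = [int(t // bin_width) for t in event_times]
    counts = [0] * (max(bins) + 1)
    for b in bins:
        counts[b] += 1
    typical = median(counts)
    good_bins = {i for i, c in enumerate(counts) if c >= min_rel_rate * typical}
    return [b in good_bins for b in bins]


# Hypothetical acquisition: ~4 events per second, with a dropout between 2 s and 3 s.
times = [0.1, 0.3, 0.5, 0.7, 1.1, 1.3, 1.6, 1.9, 2.5, 3.0, 3.2, 3.6, 3.8]
print(stable_time_mask(times, bin_width=1.0))
```

A rate-based check only catches dropouts; shifts in signal level, which the Vg_DUF1943 run also showed, would need a separate check on the median SSC-A per time bin.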
After verifying the display of our domains of interest, we investigated which of them contribute to Vitellogenin's PAMP binding affinity. Within the “high display” gate, we created a new gate called “high binding” (Fig. 21). It was set to include the top ~50% of events of the strongest-binding domain. This range was chosen because it reduces the effect of both positive and negative outliers: a gate set too high would increase the share of positive outliers among the included events, while a broader gate extending to lower binding signals would include almost all events of strongly binding domains, making these domains harder to distinguish.
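The nested gating logic can be sketched in Python. The sketch assumes that each event is reduced to a (display, binding) pair and that both gates are simple thresholds, whereas in the actual analysis the gates were drawn as regions on the cytometry plots. The binding cut-off is placed at the median binding signal of the reference sample's high-display events, so roughly half of them fall inside. All values are hypothetical.

```python
from statistics import median


def percent_in_gate(events, display_min, binding_min=float("-inf")):
    """Percentage of events with display >= display_min and binding >= binding_min.

    events: list of (display, binding) fluorescence pairs for gated singlets.
    """
    inside = [e for e in events if e[0] >= display_min and e[1] >= binding_min]
    return 100.0 * len(inside) / len(events)


def high_binding_threshold(reference_events, display_min):
    """Binding cut-off at the median binding signal of the reference sample's
    high-display events, so that ~50 % of them fall inside the gate."""
    return median(b for d, b in reference_events if d >= display_min)


# Hypothetical (display, binding) values for a strong binder and a weak one.
strong = [(900, 800), (1000, 1200), (950, 400), (1100, 1000), (50, 30)]
weak = [(900, 300), (1000, 500), (60, 20), (1200, 900)]
cut = high_binding_threshold(strong, display_min=500)
print(cut)
print(percent_in_gate(strong, 500, cut))
print(percent_in_gate(weak, 500, cut))
```

Fixing the cut-off once on the strongest binder and then applying the same gate to every sample keeps the percentages comparable across constructs.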
The cell control again showed some binding, which could not be compensated. We expected the Vg_α‑helical domain to have decent binding affinity, as it was previously suggested to be the main factor in PAMP recognition [57], but we could not confirm this: with 19.2% of events in the gate, it was only marginally above the cell control. We suspected misfolding to be responsible for its poor binding, but in silico analysis did not support this. The Vg_α‑helical+DUF1943 construct showed the strongest binding, with 52.7% of events in the gate, strongly suggesting that the DUF1943 domain plays a role in PAMP binding. This was further supported by the measurements of Vg_truncated, which had only 35.3% of events in the “high binding” gate, indicating that the remaining protein participates in zymosan recognition only to a small degree, if at all.
As we could not measure the binding affinity of Vg_DUF1943 alone, we were unable to determine whether zymosan binding is mediated only by the DUF1943 domain or whether the α‑helical domain contributes as well. Based on earlier studies describing PAMP recognition by the α‑helical domain and the marginal binding observed in our measurements, we hypothesize that both domains bind the polymer cooperatively.
Based on these results, we chose Vg_α‑helical+DUF1943 for further work, as it offered the best balance between display efficiency and binding affinity. Before screening the library, we needed to characterize the zymosan binding behavior of Vg_α‑helical+DUF1943. Without a suitable PAMP concentration, we would either saturate all protein variants regardless of their affinity or provide too little PAMP to generate a signal sufficient for sorting. We measured zymosan dilutions between 0.02 µg/mL and 200 µg/mL with the display, gated them as described above, and used the event ratio to determine the degree of PAMP saturation of the displayed domains. Fitting a one-site saturation curve of Michaelis-Menten form to our data yielded KD = 0.61 µg/mL. Since Vitellogenin does not convert its ligand, there is no turnover term, so the fitted half-saturation constant (KM) equals KD (Fig. 22). Molar concentrations could not be determined, as the ratio of different zymosan oligomers in the labeled solution was unknown. Based on this, we chose a concentration of 2 µg/mL for library screening, which should saturate the displayed domains to ~75%. We expected this to improve library sorting, as binding cells would be strongly labeled and easier to isolate. Zymosan concentrations around or below KD would be more sensitive but could compromise sorting, as binding cells would be harder to distinguish from low- or non-binding ones.
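The fit and the choice of screening concentration can be reproduced with a short Python sketch. It assumes a one-site model, signal = Bmax · c / (KD + c), without ligand depletion, and fits it by a log-spaced grid search over KD with the optimal Bmax computed in closed form for each candidate; this stands in for whatever fitting software was actually used. The data points are synthetic and noise-free, generated with KD = 0.61 µg/mL purely for illustration.

```python
import math


def occupancy(conc, kd):
    """Fraction of sites bound for a one-site model without ligand depletion."""
    return conc / (kd + conc)


def fit_one_site(conc, signal, log10_min=-3.0, log10_max=3.0, steps=3001):
    """Least-squares fit of signal = bmax * conc / (kd + conc).

    For each candidate kd on a log grid, the best bmax has a closed form
    (linear least squares), so only kd needs to be searched.
    Returns (kd, bmax) in the units of conc and signal.
    """
    best = None
    for i in range(steps):
        kd = 10 ** (log10_min + (log10_max - log10_min) * i / (steps - 1))
        g = [occupancy(c, kd) for c in conc]
        bmax = sum(y * gi for y, gi in zip(signal, g)) / sum(gi * gi for gi in g)
        sse = sum((y - bmax * gi) ** 2 for y, gi in zip(signal, g))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]


# Synthetic, noise-free data generated with kd = 0.61 ug/mL (illustration only).
conc = [0.02, 0.2, 2.0, 20.0, 200.0]
signal = [0.55 * occupancy(c, 0.61) for c in conc]
kd, bmax = fit_one_site(conc, signal)
print(f"KD ~ {kd:.2f} ug/mL, occupancy at 2 ug/mL ~ {occupancy(2.0, kd):.0%}")
```

The sketch recovers KD close to 0.61 µg/mL and predicts about 77% occupancy at 2 µg/mL, in line with the ~75% saturation targeted above.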
We generated the library by error-prone PCR using 0.1, 1, and 10 µM each of 8-oxo-dGTP and dPTP. The fragments and a backbone amplified with a high-fidelity polymerase were used to transform yeast, and the cells were then prepared for sorting. We used two sorting gates: one isolated cells combining high display with high binding, while the other captured cells with slightly lower display and strong binding. This design avoided favoring variants that merely improve display, as these would be enriched in a high-display gate. Due to time constraints, we were limited to a single FACS run and had to sort single cells immediately. Iterative FACS runs would have allowed us to sort the top percent of binding cells and regrow them into a higher-quality library for another round of sorting. This would enrich improved variants to a point where they could be easily isolated, with a lower risk of losing them through an overly aggressive sorting approach.
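Why iterative sorting helps can be shown with a simple enrichment model in Python. It assumes that an improved variant falls into the sort gate with probability p_improved and any other cell with probability p_background, and that regrowth preserves the ratio between the two; the starting frequency and gate probabilities are hypothetical.

```python
def enrich(fraction, p_improved, p_background):
    """Fraction of improved variants after one sort-and-regrow round.

    Each round multiplies the odds of an improved variant by
    p_improved / p_background. Assumes regrowth does not change ratios.
    """
    kept_improved = fraction * p_improved
    kept_background = (1.0 - fraction) * p_background
    return kept_improved / (kept_improved + kept_background)


def rounds_of_sorting(f0, p_improved, p_background, n_rounds):
    history = [f0]
    for _ in range(n_rounds):
        history.append(enrich(history[-1], p_improved, p_background))
    return history


# Hypothetical: 1 improved variant in 10,000 cells; the gate keeps 50 % vs 5 %.
for round_no, f in enumerate(rounds_of_sorting(1e-4, 0.5, 0.05, 4)):
    print(f"round {round_no}: {f:.2%} improved")
```

With these assumed numbers, a single round only raises improved variants to about 0.1% of the population, whereas four rounds bring them to about half, which is why a single sort makes isolating improved clones so much harder.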
Unfortunately, analysis of the cell sorting data showed that zymosan binding in all libraries was not significantly different from that of the cell control (Fig. 23).
To investigate the cause, we cloned a fraction of each library into a pBSK vector and sent the clones for sequencing. This revealed that the error-prone PCR had amplified a range of unspecific and gapped vector fragments (Fig. 24). This made the library unusable: the fragments lacked the homologous ends, which are present only at the flanks of Vg_α‑helical+DUF1943, so recombination of the fragments inside the yeast was impossible. The primers' poor performance was likely caused by the switch from NEB Q5® polymerase, with which they had been very reliable, to the Taq polymerase required for error-prone PCR, and the accompanying change in annealing temperature. The new reaction conditions apparently led to unspecific primer annealing. Because shorter fragments are amplified with higher efficiency, they become enriched over the PCR cycles and largely replace the desired sequence.
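The effect of unequal amplification efficiency compounds exponentially with cycle number. The Python sketch below compares two products under an idealized exponential model that ignores the late plateau phase of PCR; the efficiencies (95% for a short off-target product, 80% for the full-length insert) and the cycle number are hypothetical.

```python
def relative_yield(efficiency_a, efficiency_b, cycles, start_ratio=1.0):
    """Ratio of product A to product B after exponential amplification.

    Each cycle multiplies a product by (1 + efficiency); start_ratio is the
    initial A:B template ratio. Ignores the late plateau phase of PCR.
    """
    return start_ratio * ((1 + efficiency_a) / (1 + efficiency_b)) ** cycles


# Hypothetical: short off-target product at 95 % vs full-length insert at 80 %.
ratio = relative_yield(0.95, 0.80, 30)
print(f"short:full-length ratio after 30 cycles from equal start: {ratio:.1f}")
```

Under these assumptions, a per-cycle advantage of only 15 percentage points turns an equal start into a roughly 11-fold excess of the short product after 30 cycles.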
While our assay proved reliable and easy to perform, some adjustments could improve it for future applications. We recommend adding a terminator such as PGK1 or CYC1 to the display constructs. By terminating transcription directly after the coding sequence, it could increase protein yield and the display capacity of the cells while reducing the metabolic burden of expression [58].
Another issue was the poor separation of displaying and non-displaying cells in the cell staining, which appears to be a common problem for large or flexible display constructs [39, 59]. A first step to improve staining would be to use the second epitope tag (HA) to avoid potential steric hindrance of antibody binding at the c-Myc tag. If this does not help, changes to the staining protocol, such as a longer antibody incubation time or a lower incubation temperature, are viable approaches [39]. Wen et al. (2010) encountered staining separation issues similar to ours and resolved them by using a single-antibody stain instead of the standard two-step protocol. We think these adjustments would greatly improve the precision of analysis and cell sorting, leading to better identification of displaying cells and reduced carryover of non-displaying cells.
For library generation, we advise checking the error-prone PCR product on a gel for unspecific fragments. We found no sources suggesting that nucleotide analogues cause unspecific primer binding or gapped PCR products. If the issues recur with adjusted annealing temperatures and primers, it would be advisable to switch to a manganese-based error-prone PCR with adjusted dNTP ratios to counteract GC bias.
As an alternative approach to increasing PAMP binding affinity, we wanted to express C-type lectin 14 (CTL14) of Helicoverpa armigera in Escherichia coli and test it in the zymosan binding assay. C-type lectins (CTLs) are a broad superfamily of proteins characterized by their C-type lectin domain (CTLD), and a common feature is calcium-dependent ligand binding [56]. These proteins play important roles in immunity, for example by recognizing microbes such as fungi and bacteria through glycan structures on their PAMPs [56]. CTLs are key factors in the immune systems of invertebrates, including insects such as Apis mellifera [57, 58]. We found that CTL14 of H. armigera can bind the PAMP zymosan [59]. β-Glucans, the main constituent of zymosan, are also major cell wall components of Beauveria bassiana, one of the entomopathogenic fungi that could be used against the Varroa mite [60], which makes CTL14 a suitable protein to test in the zymosan binding assay. Furthermore, we wanted to test a fusion protein of A. mellifera Vitellogenin and CTL14, hypothesizing that combining the binding capabilities of both proteins would improve zymosan binding. While planning our lab work, we identified pET28b as a suitable vector for recombinant protein expression, as expression can be induced via the lac operon with isopropyl β-D-1-thiogalactopyranoside (IPTG). To construct the fusion protein, we also needed a linker that would keep both proteins active and their structures distinct. We therefore simulated the conformation of the Vitellogenin-CTL14 fusion protein with different linkers using AlphaFold.
In the end, we chose to express the fusion protein with the helical linker A(EAAAK)2A, as the simulations predicted that it keeps the protein structures distinct and the His-tag accessible for purification. More information on the fusion protein modeling is available on our modeling page. For vector construction we chose E. coli Top10 for its high transformation efficiency, and for protein expression E. coli BL21, as it has reduced protease activity and carries the phage T7 RNA polymerase gene under lac control, making it ideal for high-level expression from the pET28b vector. For this part of the project, we used E. coli instead of Komagataella phaffii, which we used to express Vitellogenin alone, because E. coli is commonly used in the literature to express CTLs [61]. With this system, we aimed to investigate whether combining Vitellogenin and CTL14 in a fusion protein could substantially change binding affinity and potentially enhance the bee's immune response.
Having decided on the expression strain, linker and vector, we began constructing three pET28b-based expression vectors: one containing ctl14, one containing vitellogenin and one containing both genes joined by the linker. CTL14 and Vitellogenin were each cloned individually into pET28b so that their zymosan binding could serve as a reference for normalizing the binding of the fusion protein. We chose Gibson Assembly to construct the expression vectors because pET28b lacks the restriction sites needed for Golden Gate Assembly.
Our plan was to add BsaI restriction sites flanking the genes during construction of the expression vectors to make them compatible with RFC[1000]. Expression of the three recombinant proteins would then have been induced via the lac operon on the pET28b vector.
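Parts following RFC[1000] are assembled with Type IIS enzymes, so a gene that is to receive flanking BsaI sites should not already contain internal BsaI (GGTCTC) or SapI (GCTCTTC) recognition sites. The sketch below scans a DNA sequence on both strands for these two sites. It assumes, as a simplification, that BsaI and SapI are the only enzymes relevant for compatibility, and it is meant to be run on the gene before the flanking sites are added.

```python
# Recognition sequences (top strand) of the Type IIS enzymes considered here.
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "SapI": "GCTCTTC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an ACGT sequence (case-insensitive input)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq: str) -> dict[str, list[int]]:
    """Return sorted 0-based start positions of each site on either strand."""
    seq = seq.upper()
    hits: dict[str, list[int]] = {}
    for name, site in TYPE_IIS_SITES.items():
        positions = []
        # Neither site is palindromic, so both orientations must be searched.
        for motif in (site, reverse_complement(site)):
            start = seq.find(motif)
            while start != -1:
                positions.append(start)
                start = seq.find(motif, start + 1)
        hits[name] = sorted(positions)
    return hits


def is_rfc1000_compatible(seq: str) -> bool:
    """True if the sequence contains no internal BsaI or SapI site (simplified check)."""
    return all(not positions for positions in find_sites(seq).values())


if __name__ == "__main__":
    gene = "ATGGGTCTCAAAGCTTAA"  # hypothetical toy sequence with one BsaI site
    print(find_sites(gene))            # {'BsaI': [3], 'SapI': []}
    print(is_rfc1000_compatible(gene))  # False
```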
We began by preparing the vitellogenin and ctl14 genes, adding a sequence encoding a 10x His-tag to their 3′ ends (the C-termini of the encoded proteins) by PCR amplification. The tag allows the proteins to be purified later by affinity chromatography. Gel electrophoresis was used to verify the lengths of the DNA fragments (Fig. 28).
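A C-terminal His-tag is typically added by giving the reverse primer a 5′ tail that encodes the His residues and a stop codon. The sketch below builds such a reverse primer from the 3′ end of a coding sequence. The codon choice (CAC), the TAA stop codon, the fixed annealing length and the function name are hypothetical choices for illustration; a real primer would be designed to a target melting temperature.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an ACGT sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def his_tag_reverse_primer(cds: str, anneal_len: int = 20, n_his: int = 10,
                           his_codon: str = "CAC", stop: str = "TAA") -> str:
    """Reverse primer (5'->3') that appends n_his His codons plus a stop codon in frame.

    The primer anneals to the last `anneal_len` bases of the coding sequence
    (native stop codon removed) and carries the tag as a 5' tail.
    """
    cds = cds.upper()
    if cds[-3:] in STOP_CODONS:
        cds = cds[:-3]  # remove the native stop so translation continues into the tag
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Top-strand sequence the PCR product should end with: gene 3' end + tag + stop.
    target = cds[-anneal_len:] + his_codon * n_his + stop
    return reverse_complement(target)


if __name__ == "__main__":
    toy_cds = "ATGGCTAAAGAAGGTCTGTAA"  # hypothetical toy ORF, not vitellogenin or ctl14
    primer = his_tag_reverse_primer(toy_cds, anneal_len=18)
    print(primer, len(primer))  # 51 nt: 18 annealing + 30 His codons + 3 stop
```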
Next, we added the overhangs required for Gibson Assembly with the pET28b vector to the amplified fragments by PCR with optimized primers (Fig. 29). Initially, the primers created overhangs consisting of pET28b vector sequence, so that the fusion sites for the Gibson Assembly were located in the vector rather than in the genes. For this purpose, the empty pET28b vector was linearized by PCR at the desired positions, with the ends carrying the overhangs needed for Gibson Assembly.
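In Gibson Assembly, each PCR primer carries a 5′ tail homologous to the neighboring fragment, so that the amplified insert ends overlap the linearized vector. The sketch below assembles such insert primers from the vector sequence on either side of the insertion point. The 20-nt overlap and annealing lengths and the function names are assumptions for illustration, not the values used in our primers.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an ACGT sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gibson_insert_primers(insert: str, vector_up: str, vector_down: str,
                          overlap: int = 20, anneal: int = 20) -> tuple[str, str]:
    """Forward/reverse primers (5'->3') giving the insert vector-homologous ends.

    vector_up:   top-strand vector sequence directly upstream of the insertion point
    vector_down: top-strand vector sequence directly downstream of the insertion point
    """
    insert, vector_up, vector_down = (s.upper() for s in (insert, vector_up, vector_down))
    # 5' tail = last `overlap` bases upstream; 3' part anneals to the start of the insert.
    forward = vector_up[-overlap:] + insert[:anneal]
    # Reverse primer = reverse complement of (insert end + first `overlap` bases downstream).
    reverse = reverse_complement(insert[-anneal:] + vector_down[:overlap])
    return forward, reverse


def expected_product(insert: str, vector_up: str, vector_down: str, overlap: int = 20) -> str:
    """Top strand of the PCR product these primers should generate."""
    return (vector_up[-overlap:] + insert + vector_down[:overlap]).upper()


if __name__ == "__main__":
    # Hypothetical toy sequences, not pET28b or the real genes.
    up = "GATTACA" * 4
    down = "CCCGGGTT" * 4
    insert = "ATG" + "GCT" * 12 + "TAA"
    fwd, rev = gibson_insert_primers(insert, up, down)
    product = expected_product(insert, up, down)
    print(fwd, rev, sep="\n")
    print(product.startswith(fwd), product.endswith(reverse_complement(rev)))  # True True
```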
Following amplification, the reaction mixture was digested with DpnI to remove the template plasmid and increase cloning efficiency. DpnI cuts only methylated GATC sites, which are present in plasmid DNA isolated from E. coli but not in PCR products. Two Gibson Assemblies were then performed for the constructs pET28b_Vg and pET28b_CTL14, and the products were transformed into E. coli Top10. Since pET28b carries a kanamycin resistance gene, transformants were selected on LB agar plates supplemented with kanamycin. The resulting colonies were screened for the desired genes by colony PCR, but no positive colonies were found.
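DpnI cleaves the sequence GATC between GA and TC (leaving blunt ends) only when the adenine is Dam-methylated, which is why it removes template plasmid propagated in dam+ E. coli while leaving unmethylated PCR products intact. The toy model below illustrates this on a linear sequence; treating methylation as a single flag for the whole molecule is a simplification, and the function name is hypothetical.

```python
def dpni_digest(seq: str, dam_methylated: bool) -> list[str]:
    """Toy model of DpnI digestion of a linear DNA molecule.

    DpnI cuts GA^TC (blunt) only if the adenine in GATC is Dam-methylated.
    GATC is palindromic, so scanning one strand finds every site.
    """
    seq = seq.upper()
    if not dam_methylated:
        return [seq]  # unmethylated PCR product: left intact
    fragments, start = [], 0
    pos = seq.find("GATC")
    while pos != -1:
        cut = pos + 2  # cut between GA and TC
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find("GATC", pos + 1)
    fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    toy = "AAGATCTTTTGATCAA"  # hypothetical sequence with two GATC sites
    print(dpni_digest(toy, dam_methylated=True))   # ['AAGA', 'TCTTTTGA', 'TCAA']
    print(dpni_digest(toy, dam_methylated=False))  # ['AAGATCTTTTGATCAA']
```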
After consulting our instructors, we extended the DpnI digestion of the pET28b vector to overnight. We also redesigned the Gibson Assembly overhang primers so that the overhangs extended not only into the pET28b vector but also into the genes; these overhangs were 20 to 44 bases long. Because the PCRs with these extended primers still gave unsatisfactory results, we used gradient PCR and touchdown PCR to find suitable annealing conditions for the primers. After many optimization steps, we successfully amplified the genes with the overhangs required for cloning into the pET28b vector. One result is shown in Fig. 30.
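Gradient PCR tests several annealing temperatures in parallel, while touchdown PCR starts above the expected annealing temperature and lowers it step by step each cycle before continuing at a final temperature. The sketch below generates both kinds of temperature programs. All temperatures, step sizes and cycle numbers in it are hypothetical placeholders, not the conditions we used.

```python
from dataclasses import dataclass


@dataclass
class Cycle:
    number: int      # 1-based cycle number
    anneal_c: float  # annealing temperature in degrees Celsius


def touchdown_schedule(start_c: float, final_c: float, step_c: float = 1.0,
                       extra_cycles: int = 20) -> list[Cycle]:
    """Lower the annealing temperature by step_c per cycle while it is above
    final_c, then run extra_cycles at final_c."""
    if step_c <= 0 or start_c < final_c:
        raise ValueError("need step_c > 0 and start_c >= final_c")
    cycles: list[Cycle] = []
    temp = start_c
    while temp > final_c:
        cycles.append(Cycle(len(cycles) + 1, temp))
        temp = round(temp - step_c, 2)  # rounding avoids float drift for steps like 0.5
    for _ in range(extra_cycles):
        cycles.append(Cycle(len(cycles) + 1, final_c))
    return cycles


def gradient_temps(low_c: float, high_c: float, n_wells: int) -> list[float]:
    """Evenly spaced annealing temperatures across a gradient block."""
    if n_wells < 2:
        raise ValueError("a gradient needs at least two wells")
    step = (high_c - low_c) / (n_wells - 1)
    return [round(low_c + i * step, 2) for i in range(n_wells)]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(gradient_temps(55, 65, 5))  # [55.0, 57.5, 60.0, 62.5, 65.0]
    program = touchdown_schedule(65, 55)
    print(len(program), program[0].anneal_c, program[-1].anneal_c)  # 30 65 55
```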
At this point, we shifted our focus to the vector in which Vitellogenin is joined to CTL14 by a protein linker. First, a His-tag coding sequence was added to the 3′ end of the ctl14 gene by PCR. Optimized primers were then used to attach the linker sequence to the 3′ end of the vitellogenin gene. Because of the linker between Vitellogenin and CTL14, different Gibson overhangs had to be added by PCR. For the Gibson Assembly, we used the same linearized vector as for the pET28b_Vg plasmid. The Gibson Assembly of pET28b_Vg_alphahelical_CTL14His10xC was then carried out and transformed into E. coli Top10. Unfortunately, subsequent colony PCRs showed no positive colonies, and repeating the colony PCRs with optimized primers for the incorporated genes did not detect any either.
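To place the A(EAAAK)2A linker between the two genes, its amino acid sequence has to be back-translated into DNA that can be carried on a primer tail. The sketch below uses a hypothetical one-codon-per-amino-acid table (GCG, GAA and AAA, codons that are frequent in E. coli) and checks that the result translates back to the linker. A real design might vary codons between the two EAAAK repeats to make the DNA less repetitive for primer binding and Gibson overlaps.

```python
# Hypothetical codon choice: one codon per amino acid, using codons that are
# frequent in E. coli (GCG for Ala, GAA for Glu, AAA for Lys).
PREFERRED_CODON = {"A": "GCG", "E": "GAA", "K": "AAA"}

# Minimal decoding table covering only the codons for A, E and K.
CODON_TO_AA = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "GAA": "E", "GAG": "E",
    "AAA": "K", "AAG": "K",
}


def back_translate(protein: str) -> str:
    """Encode a protein with one fixed codon per amino acid."""
    missing = sorted(set(protein) - set(PREFERRED_CODON))
    if missing:
        raise ValueError(f"no codon defined for: {', '.join(missing)}")
    return "".join(PREFERRED_CODON[aa] for aa in protein)


def translate(dna: str) -> str:
    """Translate DNA using the minimal table above (raises KeyError for other codons)."""
    if len(dna) % 3:
        raise ValueError("DNA length must be a multiple of 3")
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    linker = "AEAAAKEAAAKA"  # A(EAAAK)2A written out
    linker_dna = back_translate(linker)
    print(linker_dna, len(linker_dna))  # 36 nt = 12 codons, so the reading frame is kept
    assert translate(linker_dna) == linker
```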
We encountered similar problems here and tried hard to troubleshoot them, but we did not succeed in assembling the desired vector. Due to time constraints, we were unable to investigate further potential sources of error.
Our goal was to express the optimized Apis mellifera Vitellogenin, created either through directed evolution in the wet lab or through rational design in the dry lab, for further analysis and characterization. As the expression system, we used the yeast Komagataella phaffii (formerly Pichia pastoris).
Our research into A. mellifera Vitellogenin began after we decided to tackle the problem that Varroa mites and other pests pose to Western honey bees. We learned that Vitellogenin is glycosylated under native conditions, which led us to look for an expression organism capable of glycosylation [1]. As a eukaryote, Komagataella phaffii can glycosylate proteins and is often used instead of Escherichia coli to express glycosylated proteins, since E. coli does not natively glycosylate proteins [62–66]. For this reason, we chose K. phaffii rather than E. coli as the expression host. We obtained the K. phaffii CBS7435 PEP4 KO strain in cooperation with the Institute for Microbiology and Biotechnology from the University of Vienna. This strain shows reduced protein degradation because the deleted PEP4 gene encodes proteinase A, a vacuolar protease that is also required to activate other vacuolar proteases.
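N-linked glycosylation occurs at asparagine residues within the sequence motif (sequon) N-X-S/T, where X is any amino acid except proline. The sketch below lists candidate sequons in a protein sequence under that textbook rule. It is only a motif scan: a sequon does not guarantee that the site is actually glycosylated, O-glycosylation is not covered, and the function name is hypothetical.

```python
import re

# N-X-S/T with X != P. The lookahead makes overlapping sequons (e.g. "NNST") visible.
_SEQUON = re.compile(r"(?=(N[^P][ST]))")


def n_glycosylation_sequons(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position of N, motif) for every N-X-S/T sequon (X != P)."""
    return [(m.start() + 1, m.group(1)) for m in _SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    toy = "MNATNPSNGT"  # hypothetical peptide, not the Vitellogenin sequence
    print(n_glycosylation_sequons(toy))  # [(2, 'NAT'), (8, 'NGT')]
```

Running such a scan on the actual Vitellogenin sequence would give a list of candidate sites to compare against the glycosylation pattern expected from K. phaffii.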
Next, we investigated promoters and terminators suitable for expression in K. phaffii. The methanol-inducible AOX1 promoter and the corresponding AOX1 terminator are widely used because they enable high expression levels in K. phaffii [66, 67]. For our project, we decided to build on the work of the 2022 iGEM team from Münster and use their vector pSB3CY_His (BBa_K4188002), designed for S. cerevisiae (Fig. 33). pSB3CY_His contains a 2µ origin of replication and a his3 auxotrophy marker, and we replaced both with parts commonly used in the literature for recombinant protein expression in K. phaffii. Transformants are usually selected via an auxotrophy marker or an antibiotic resistance gene on the plasmid; in our project, we selected them with the antibiotic Zeocin®. Many K. phaffii expression vectors carry PARS (Pichia autonomously replicating sequence) for replication. Because both Zeocin® selection and PARS are widely used, we integrated these parts into our vector to generate the pSB3CYP_Zeo plasmid (BBa_K4675009) (Fig. 34). The A. mellifera vitellogenin gene was ordered from IDT as four fragments and assembled via Gibson Assembly (Fig. 35).
To construct pSB3CYP_VgHis10xC, we designed primers to amplify PARS, zeoR, the AOX1 promoter and the AOX1 terminator from the Open Yeast Library, and we additionally amplified the A. mellifera vitellogenin gene by PCR. E. coli Turbo was used to propagate the newly constructed plasmids. The primers added SapI restriction sites and fusion sites to the AOX1 promoter, the vitellogenin gene and the AOX1 terminator for assembly into the vector in place of the mRFP gene; the vitellogenin primer also added a C-terminal 10x His-tag. zeoR was amplified with primers adding NotI and SpeI restriction sites, and PARS with primers adding SpeI and BamHI restriction sites. Before PARS could be integrated into the pSB3CY_His shuttle vector (BBa_K4188002), a BamHI site within the PARS origin had to be removed by site-directed mutagenesis of a single base. All PCRs used a high-fidelity polymerase to minimize the introduction of unwanted mutations. However, the PCRs amplifying the parts from the distribution kit failed, and primer optimization did not help (Fig. 36). We also checked the plasmids by gel electrophoresis to detect dimerization or multimerization, and the gel showed that nearly all tested plasmids had multimerized during incubation (Fig. 37). After discussing the issue with our instructors and advisors, we therefore ordered the genes from the distribution kit as synthetic DNA from IDT. Using these as templates, we successfully amplified the fragments carrying the restriction and fusion sites. zeoR and PARS were integrated into the vector by restriction cloning, replacing the original 2µ origin and his3 marker. The next step was to replace mRFP with the Vitellogenin expression cassette.
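Removing the internal BamHI site (GGATCC) from PARS with a single base change amounts to picking one substitution inside the site that does not create a new recognition site for the enzymes used in the restriction cloning (NotI, SpeI, BamHI). The sketch below enumerates such candidate substitutions. The function names are hypothetical, and whether a given substitution preserves PARS function cannot be judged from the sequence alone.

```python
# Recognition sites of the enzymes used for restriction cloning; all three are
# palindromic, so scanning the top strand also covers the bottom strand.
CLONING_SITES = {"NotI": "GCGGCCGC", "SpeI": "ACTAGT", "BamHI": "GGATCC"}


def site_positions(seq: str, site: str) -> list[int]:
    """0-based start positions of `site` in `seq`."""
    positions, start = [], seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def single_base_knockouts(seq: str, site: str = "GGATCC") -> list[tuple[int, str, str]]:
    """Single substitutions inside an occurrence of `site` that leave the mutated
    sequence free of all CLONING_SITES (so none are found if the site occurs twice).

    Returns (1-based position, original base, new base) tuples.
    """
    seq = seq.upper()
    candidates = []
    for start in site_positions(seq, site):
        for i in range(start, start + len(site)):
            for base in "ACGT":
                if base == seq[i]:
                    continue
                mutant = seq[:i] + base + seq[i + 1:]
                if not any(site_positions(mutant, s) for s in CLONING_SITES.values()):
                    candidates.append((i + 1, seq[i], base))
    return candidates


if __name__ == "__main__":
    toy = "AAAGGATCCAAA"  # hypothetical fragment with one BamHI site, not PARS
    options = single_base_knockouts(toy)
    print(len(options), options[:3])  # 18 candidates, starting with position 4
```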
The Golden Gate assemblies (RFC[1000]) with SapI did not work with our protocol, as SapI did not appear to cut the vector. Again, we observed plasmid dimerization in E. coli Turbo, so shortly afterwards we switched to E. coli Top10 (Fig. 38). The PCRs amplifying the parts for the expression cassettes were repeated and our advisors checked the protocols, but we still could not construct the vector. We then ordered the pSB3CYP_VgHis10xC vector as seven fragments, with the overhangs necessary for Gibson Assembly already included in the ordered sequences, and assembled the vector via Gibson Assembly (Fig. 39). However, sequencing of the construct revealed a mutation in the stop codon of the zeoR gene, which made the gene non-functional and selection of transformed K. phaffii impossible. Due to time constraints, we were unable to correct the zeoR stop codon by site-directed mutagenesis.
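SapI recognizes GCTCTTC and cuts one nucleotide downstream on this strand and four nucleotides downstream on the opposite strand, leaving a 3-nt 5′ overhang that serves as the fusion site in SapI-based Golden Gate Assembly. The sketch below reads the overhangs produced by SapI sites in either orientation and checks that a set of junction overhangs is unique, also with respect to reverse complements. It is a simplified sequence-level model (linear DNA, overhangs reported as top-strand sequence), and the function names are hypothetical.

```python
SAPI_SITE = "GCTCTTC"     # top-strand recognition sequence
SAPI_SITE_RC = "GAAGAGC"  # same site in the opposite orientation

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an ACGT sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def _find_all(seq: str, motif: str) -> list[int]:
    positions, start = [], seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def sapi_overhangs(seq: str) -> list[str]:
    """Top-strand sequence of the 3-nt overhang left by each SapI site.

    SapI cuts GCTCTTC(N)1 on the recognition strand and (N)4 on the other strand,
    so the overhang is the 3 nt after the first N following the site. For a site in
    reverse orientation, the cut lies upstream of the site instead.
    """
    seq = seq.upper()
    overhangs = []
    for p in _find_all(seq, SAPI_SITE):
        ov = seq[p + 8:p + 11]
        if len(ov) == 3:  # skip sites too close to the end of a linear sequence
            overhangs.append(ov)
    for q in _find_all(seq, SAPI_SITE_RC):
        if q >= 4:
            overhangs.append(seq[q - 4:q - 1])
    return overhangs


def fusion_sites_compatible(junctions: list[str]) -> bool:
    """True if no junction overhang repeats another or its reverse complement."""
    seen: set[str] = set()
    for ov in junctions:
        if ov in seen or reverse_complement(ov) in seen:
            return False
        seen.add(ov)
    return True


if __name__ == "__main__":
    print(sapi_overhangs("AAAGCTCTTCAATGCCCC"))            # ['ATG']
    print(sapi_overhangs("CCGTAAGAAGAGCTT"))               # ['GTA']
    print(fusion_sites_compatible(["ATG", "TAA", "GCT"]))  # True
    print(fusion_sites_compatible(["ATG", "CAT"]))         # False
```

A check like this only confirms that the designed fusion sites are consistent on paper; it cannot explain why SapI failed to cut in practice.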