General Design

Our initial goal was to convert blood antigens A and B into type O in order to alleviate shortages of stored blood. During our literature search we found a paper demonstrating efficient conversion of A to O [1]. Because that conversion is already possible, we decided to try to convert antigen B into antigen A instead. This would expand the general conversion repertoire to include B to AB, B to A and AB to A, making tailored blood types possible.

No enzymes were known to catalyze this reaction, so we decided to use a screening method to identify candidate enzymes. As a cost- and time-effective approach, we used AlphaFold 2 structures of enzymes from promising enzyme families and screened them for general ligand fit with state-of-the-art molecular docking, with the plan of validating the predictions in wetlab experiments. This in silico screening approach allowed us to start with hundreds of candidate enzyme sequences for which no experimental crystal structures were available, and to test only the most promising candidates in the wetlab.

Furthermore, we decided to investigate whether a family of proteins, the so-called carbohydrate-binding modules (CBMs), could increase the affinity of enzymes for the surfaces of erythrocytes. CBMs have been shown to fold independently of an enzyme's catalytic domain and can localize an enzyme to a specific molecule or surface [2].

Fig. 1 - Generalized overview scheme

Drylab

We designed a funnel-like screening pipeline that eliminates as many enzymes as possible early in the screening, because the complexity of the algorithms employed increases steadily from step to step. The pipeline was developed over multiple design cycles, gradually expanding from a brute-force search of the structural space to a pre-screening pipeline that, in the end, allowed us to compare more sequences with less compute time.
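
The funnel principle can be illustrated with a short Python sketch. This is not the pipeline we ran: the stage names, relative costs, thresholds and candidate fields are hypothetical placeholders chosen only to show why cheap filters should run before expensive ones.

```python
from typing import Callable, Iterable

# A stage is (name, relative cost per candidate, keep-predicate).
Stage = tuple[str, float, Callable[[dict], bool]]


def run_funnel(candidates: Iterable[dict], stages: list[Stage]):
    """Apply stages from cheapest to most expensive, dropping failures early."""
    remaining = list(candidates)
    log = []
    for name, cost, keep in sorted(stages, key=lambda stage: stage[1]):
        spent = cost * len(remaining)  # cost is paid only for the survivors
        remaining = [c for c in remaining if keep(c)]
        log.append((name, len(remaining), spent))
    return remaining, log


# Hypothetical stages, thresholds and fields, for illustration only.
stages = [
    ("docking", 100.0, lambda c: c["min_cu_distance"] <= 3.0),
    ("sequence filter", 1.0, lambda c: c["has_conserved_domain"]),
    ("structure check", 10.0, lambda c: c["plddt"] >= 70.0),
]

candidates = [
    {"id": "enzA", "has_conserved_domain": True, "plddt": 91.0, "min_cu_distance": 2.4},
    {"id": "enzB", "has_conserved_domain": False, "plddt": 88.0, "min_cu_distance": 2.1},
    {"id": "enzC", "has_conserved_domain": True, "plddt": 55.0, "min_cu_distance": 2.9},
    {"id": "enzD", "has_conserved_domain": True, "plddt": 80.0, "min_cu_distance": 4.2},
]

survivors, log = run_funnel(candidates, stages)
for name, kept, spent in log:
    print(f"{name}: {kept} candidates kept, {spent} cost units spent")
```

With these made-up numbers the funnel spends 234 cost units, whereas running every stage on all four candidates would cost 444.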

Build of screening pipeline

First, we searched the literature for enzyme families that are, in theory, capable of modifying the correct hydroxy group. Multiple families of oxidoreductases were evaluated on the following criteria: the number of characterized homologues, the likelihood of ligand binding, the ease of purification, the availability of homologue structures as controls for AlphaFold structures and docking experiments, the complexity of the binding pocket, and known similar ligands.

The following enzyme families were considered against these criteria:

CAZy family AA5: copper radical oxidases that are known to oxidize sugar substrates, and more specifically the C6 hydroxy group of galactose.

CAZy family AA3: flavoprotein oxidoreductases that are known to oxidize sugar substrates, for example the C2 hydroxy group of glucose.

CAZy family AA10: copper-dependent lytic polysaccharide monooxygenases. Because these enzymes catalyze the oxidative cleavage of polysaccharides, this family was ruled out very early in our research.

Initially, family AA3 was selected because of one enzyme, pyranose 2-oxidase, which according to the literature is capable of catalyzing reactions involving certain pyranoses and disaccharides. This made the family a promising one in which to screen for enzymes specific to glycans with a terminal galactose. Other criteria were favorable as well, such as known experimental structures, a known catalytic mechanism and other characterized enzymes in the family. However, the flavin cofactor is quite large and hard to model accurately, because AlphaFold 2 predicts protein structures without any cofactors. Attempts to transplant the cofactor from known experimental structures proved too work-intensive for larger-scale screening. We therefore started another design cycle and rebuilt the pipeline around family AA5, which we had initially discarded. Family AA5 offers very high structural conservation, a copper ion cofactor that is comparatively easy to model and insert, a known catalytic mechanism, a number of characterized enzymes, and a number of experimentally solved structures.

A phylogenetic tree of family AA5 was constructed from protein sequences obtained from CAZy and UniProt, with a particular focus on bacterial homologues for ease of expression in E. coli. Conserved domains were used to identify clades of potentially different function. The clades were further validated with structural alignments of the enzyme structures, in addition to the sequence alignment used to construct the tree. Samples from each structurally distinct clade were selected for further analysis by molecular docking.

Structures were docked against a library of selected blood glycan fragments, ranging from mono- to tetrasaccharides. This served to test whether docking behavior observed for monosaccharides could be generalized to larger fragments, giving a first indication of the required size of our glycan models. We found that if a larger fragment docked well to the enzyme, so did the monosaccharide corresponding to its terminal sugar, and that there was little difference in binding pose between the terminal sugar of the fragment and the corresponding monosaccharide. The screening library also contained known substrates of the control enzyme structures, such as galactose, a known substrate of galactose oxidase (GOX), and glucose, on which galactose oxidase shows virtually no activity. We screened against a collection of different known blood antigens to ensure the specificity of our selected enzymes. Enzymes were selected based on the computed Euclidean distance between the hydroxy groups of the docked ligand and the enzyme's active site. When we compared galactose and glucose docked to GOX, the binding strength determined by molecular mechanics was not an indicator of catalytic activity on the substrate, whereas the distance of the docked ligand to the active site was. Because the proposed mechanism for this family involves a covalent bond between the hydroxy group and the copper, we modeled our distance criterion on copper-oxygen bonds, that is, a distance between 2 and 3 Å, with a shorter distance being better.
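
The distance criterion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our screening code: the function names, the pose dictionary layout and all coordinates are made up, and the 3 Å cutoff is simply the upper end of the range given above.

```python
import math

# Assumed acceptance cutoff for a copper-oxygen contact, in angstroms,
# taken from the upper end of the 2 to 3 angstrom range in the text.
MAX_CU_O_DISTANCE = 3.0


def closest_hydroxyl(copper_xyz, hydroxyl_oxygens):
    """Return (label, distance) of the hydroxy oxygen nearest to the copper."""
    return min(
        ((label, math.dist(copper_xyz, xyz)) for label, xyz in hydroxyl_oxygens.items()),
        key=lambda pair: pair[1],
    )


def rank_poses(copper_xyz, poses):
    """Keep poses whose nearest hydroxy oxygen is within the cutoff, nearest first."""
    hits = []
    for pose_id, oxygens in poses.items():
        label, distance = closest_hydroxyl(copper_xyz, oxygens)
        if distance <= MAX_CU_O_DISTANCE:
            hits.append((pose_id, label, round(distance, 2)))
    return sorted(hits, key=lambda hit: hit[2])


# Made-up coordinates; real values would be read from the docked poses.
copper = (0.0, 0.0, 0.0)
poses = {
    "galactose_pose1": {"O6": (2.0, 0.5, 0.0), "O4": (4.0, 1.0, 0.0)},
    "glucose_pose1": {"O6": (3.5, 2.0, 1.0), "O4": (5.0, 0.0, 0.0)},
}
ranked = rank_poses(copper, poses)
print(ranked)
```

In this toy input only the galactose pose places a hydroxy oxygen within the cutoff, so it is the only pose that is kept. Reporting which hydroxy group is closest also makes it possible to check that the pose presents the intended position to the copper.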

Testing

The phylogenetic tree was validated using known biological information such as conserved domains, as well as information obtained from literature research. Clades were designated and validated using structural alignment of the enzymes. AlphaFold 2 structures were tested against experimentally solved crystal structures of GlxA and GOX by structurally aligning each predicted structure with its ground-truth structure. Molecular docking was validated by redocking known ligands into known structures. The next step in the design cycle is to test our predictions in the wetlab and to refine our molecular docking predictions based on the wetlab results.
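
A redocking check is commonly summarized as the root-mean-square deviation (RMSD) between the crystallographic ligand pose and the redocked pose. The sketch below shows that calculation under simplifying assumptions: both poses list the same atoms in the same order, both share the receptor's coordinate frame so no superposition is needed, and symmetry-equivalent atoms are not handled. The 2 Å success cutoff is a commonly used convention assumed here, not a value from our work, and the coordinates are made up.

```python
import math


def pose_rmsd(reference, docked):
    """RMSD in angstroms between matched atoms of two poses in the same frame."""
    if len(reference) != len(docked) or not reference:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(reference, docked))
    return math.sqrt(squared / len(reference))


# Assumed success criterion for redocking, in angstroms.
REDOCK_CUTOFF = 2.0

# Made-up coordinates for illustration only.
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
redocked_pose = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]

rmsd = pose_rmsd(crystal_pose, redocked_pose)
reproduced = rmsd <= REDOCK_CUTOFF
print(f"RMSD = {rmsd:.2f} angstroms, pose reproduced: {reproduced}")
```

Comparing a predicted structure with a crystal structure would additionally require superimposing the two structures before computing the RMSD, which this sketch deliberately leaves out.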

Learn

During this process, we expanded the screening pipeline to include as much prior knowledge from different sources as possible. Initially, the pipeline did not include any preselection steps; instead, we tried to brute-force dock all structures in the database. This would have been a more exhaustive way to explore the structural space, but it would also have taken so long that we would not have been able to explore as much of the structural space as we wanted in the time we had. We therefore decided to construct a preselection pipeline to search the structural space more efficiently. At first, this led to the construction of phylogenetic trees and then to the selection of clades of hypothesized similar function.

We realized at that point that we would be able to screen many more structures than initially expected, so we expanded the database with more distantly related homologues. We decided to start a new design cycle to build, test and learn from the expanded database. After conversations with our advisor Dr. Thales Kronenberger, we decided to switch our docking software from AutoDock to Glide, because AutoDock is not capable of docking that involves metal ions. Because our enzymes are metalloenzymes with a copper ion in the active center, this could have led to inaccurate docking results. The switch allowed us to refine the docking by including the copper cofactor in the simulations. The copper ions were transplanted into the AlphaFold structures using homology modeling, with manual review of each docked structure.

The next design cycle would incorporate the results of the wetlab experiments. If the wetlab results showed that the predictions were accurate, a next step could be to improve the time efficiency of the system by training a machine learning algorithm to predict which enzyme can bind which ligand. If, however, the wetlab experiments showed the predictions to be inaccurate, a next step could be to refine the docking using flexible side chains or to relax the structures using molecular dynamics.

Wetlab

Design of Enzymes

The sequences for expression were designed based on structures selected from our drylab screening. DNA sequences were codon-optimized for E. coli and synthesized by Twist Bioscience. Plasmids containing the sequences of a control enzyme (GlxA, an AA5 family enzyme from Streptomyces lividans, in both a soluble and a membrane-anchor-containing version) were supplied by Dr. Xiaobo Zhong from the University of Leiden and Professor Jonathan Worrall from the University of Essex.

The CBM domain was selected based on the literature and adapted from the sequence reported in [3], in which the authors found that the domain binds to blood antigen B. A fusion protein of this domain and a GFP tag was designed; the sequence was codon-optimized for expression in E. coli and synthesized by the commercial supplier IDT.

Build

The first step of our wetlab build stage was to assemble the plasmids containing our selected sequences. First, we transformed the empty plasmid into Stellar E. coli, purified it and linearized it by PCR. Then we amplified the gene fragments of our inserts by PCR, adding the required Gibson overhangs. We chose Gibson Assembly because of its high probability of success and because it does not depend on restriction sites. The assembled plasmids were transformed into Stellar E. coli, from which the plasmids were purified and then transformed into NovaBlue E. coli for expression.
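
How Gibson overhangs are added to an insert by PCR can be shown with a small Python sketch. It is a simplified illustration, not the primer design we used: the function name, the overlap and annealing lengths, and all sequences are assumptions, and real primer design would also consider melting temperature and secondary structure.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gibson_primers(vector_left, vector_right, insert, overlap=20, anneal=18):
    """Build insert primers whose 5' tails match the ends of the linearized vector.

    vector_left  : top-strand vector sequence ending at the insertion point
    vector_right : top-strand vector sequence starting at the insertion point
    overlap      : length of the tail shared with the vector (assumed value)
    anneal       : length of the insert-binding part (assumed value)
    """
    if len(insert) < anneal:
        raise ValueError("insert is shorter than the annealing region")
    if min(len(vector_left), len(vector_right)) < overlap:
        raise ValueError("vector flanks are shorter than the overlap")
    forward = vector_left[-overlap:] + insert[:anneal]
    reverse = reverse_complement(insert[-anneal:] + vector_right[:overlap])
    return forward, reverse


# Toy sequences with short lengths so the result can be checked by eye.
forward_primer, reverse_primer = gibson_primers(
    vector_left="GGATCCAAAA",
    vector_right="TTTTGAATTC",
    insert="ATGGCTAGCAAGTAA",
    overlap=4,
    anneal=5,
)
print(forward_primer, reverse_primer)
```

The PCR product made with such primers carries the vector-matching tails at both ends, which is what allows it to be joined to the linearized plasmid without restriction sites.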

Test

After transformation, we screened the colonies for inserts using colony PCR. A few of the positive colonies were sent for Sanger sequencing to confirm that the insert sequence was correct and free of mutations. Unfortunately, because of problems during our wetlab build phase, we had very limited time to purify and test our enzymes. We managed to perform only one purification and did not have enough time to troubleshoot it.

For the CBM domain gene construct, GFP fluorescence was tested directly on the expressing E. coli using a fluorescence microscope, without purifying the protein.

The next steps in the testing phase were to express and purify the enzyme, and then to test its oxidase activity on different ligands using a peroxidase assay. Protein purification was done using a His6 tag and a Ni-NTA-based resin. For this, expression of the gene construct was induced with IPTG and the cells were lysed by sonication. Protein concentration was checked with a Bradford assay, and the progress of the purification was documented by SDS-PAGE. Because the protein was predicted, based on its homologues, to produce hydrogen peroxide during catalysis, we planned to measure peroxide production over time after the addition of multiple predicted substrates as a proxy for catalytic efficiency. Due to time constraints, we managed to perform the peroxidase assay only once, with galactose as the substrate.
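
Peroxide production over time can be reduced to one number per substrate by fitting a straight line to the early, approximately linear part of the signal. The Python sketch below shows such a fit. It rests on assumptions that are not results of ours: the readings are made up, the signal is treated as linear over the chosen window, and a no-substrate control is subtracted as background. Converting the slope into a peroxide concentration would additionally require a standard curve.

```python
def initial_rate(times, signal):
    """Least-squares slope of signal versus time (signal units per time unit)."""
    n = len(times)
    if n != len(signal) or n < 2:
        raise ValueError("need at least two matched time points")
    mean_t = sum(times) / n
    mean_s = sum(signal) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, signal))
    return sxy / sxx


# Made-up absorbance readings, for illustration only.
times_min = [0, 1, 2, 3, 4]
with_substrate = [0.05, 0.15, 0.25, 0.35, 0.45]
no_substrate_control = [0.05, 0.06, 0.07, 0.08, 0.09]

# Background-corrected rate in absorbance units per minute.
net_rate = initial_rate(times_min, with_substrate) - initial_rate(
    times_min, no_substrate_control
)
print(f"net initial rate: {net_rate:.3f} absorbance units per minute")
```

Comparing the background-corrected slopes of several substrates at the same enzyme concentration gives the kind of relative activity ranking that the assay was meant to provide.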

Learn

Several setbacks in the wetlab showed us what to improve. First, we had issues with the availability of supervision and lab space, which meant that we started quite late in the year. Together with further issues during the wetlab work, this taught us that everything, and experiments in particular, takes a lot longer than originally planned.

We had many issues amplifying the linearized plasmid with Pfu polymerase, as this polymerase tended to produce incorrect fragments. We tried a range of annealing temperatures using gradient PCR, as well as a buffer for high-GC sequences. When this did not work, we tried different primers, suspecting nonspecific primer binding to be the issue. After none of these changes to the PCR worked, we tried the reaction with Q5 polymerase, which produced the correct fragment on the first try. From this we learned a lot about PCR troubleshooting and that the choice of polymerase can be crucial to PCR success.

Unfortunately, we were only barely able to complete one engineering cycle in the wetlab and did not get a chance to improve the experiments or track down mistakes. The next engineering cycle would depend on the results of repeated peroxidase assays and would most likely start with another drylab engineering cycle to improve the predictions of our screening approach.

References

[1] Rahfeld, P., Sim, L., Moon, H. et al. An enzymatic pathway in the human gut microbiome that converts A to universal O type blood. Nat Microbiol 4, 1475–1485 (2019).

[2] Armenta, S., Moreno-Mendieta, S., Sánchez-Cuapio, Z., Sánchez, S. and Rodríguez-Sanoja, R. Advances in molecular engineering of carbohydrate-binding modules. Proteins 85, 1602–1617 (2017).

[3] Wakinaka, T., Kiyohara, M., Kurihara, S., Hirata, A., Chaiwangsri, T., Ohnuma, T., Fukamizo, T., Katayama, T., Ashida, H. and Yamamoto, K. Bifidobacterial α-galactosidase with unique carbohydrate-binding module specifically acts on blood group B antigen. Glycobiology 23(2), 232–240 (2013).