PCB pollution is a dire environmental issue that demands our immediate attention. To address this problem, we mutated an existing aerobic dehalogenase, RdhANp, to confer novel activity against three PCB congeners: PCB 11, 132 and 209. Our project is strongly guided by the principles of the engineering cycle. The progression from dry lab to wet lab represents a macro iteration of design, build, test and learn, while the individual steps of our work can be further split into micro iterations of the engineering cycle.
To ensure that our computationally generated mutations of RdhANp would indeed dehalogenate the selected PCB congeners, we began by developing an evaluation protocol using Rosetta, a powerful command-line software suite for computational macromolecular modeling. The initial evaluation protocol was based on a basic Rosetta GALigandDock run and aimed to recapitulate the reported activities of RdhANp against a set of bromo/chloro organic halides.
We downloaded the set of bromo/chloro organic halides from PubChem and docked them against the crystal structure of RdhANp (PDB ID: 6ZY1 [1]). We then relaxed each resulting complex using the following Rosetta command:
$ROSETTA/main/source/bin/relax.static.macosclangrelease -s complex_relaxed.pdb -relax:constrain_relax_to_start_coords -relax:coord_constrain_sidechains -relax:ramp_constraints false -missing_density_to_jump -extra_res_fa params/B12.params -beta -extra_res_fa params/LIGAND.params -default_repeats 2 -optimization:default_max_cycles 200
The relaxed structure was docked using a basic GALigandDock protocol in “eval” mode, which calculates the Rosetta energy scores of the entire structure without significant repacking (i.e., without allowing large rotamer changes or sampling of the ligand binding space). The dH and dG values from the output score files were our primary measures of interest. Unfortunately, with this initial protocol the dH/dG values correlated only weakly with the reported activities of RdhANp against the tested bromo/chloro halides, and the ligand positions varied.
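As an illustration of how dH values can be pulled out of a score file and compared against reported activities, here is a minimal Python sketch. It assumes the usual Rosetta score-file layout (a `SCORE:` header line naming the columns, followed by one `SCORE:` line per pose with the pose tag in the final `description` column) and a column literally named `dH`; check the header of your own GALigandDock output before relying on that. Spearman rank correlation is our choice of statistic here, not necessarily the one used in the analysis above, and the substrate tags and numbers are hypothetical placeholders.

```python
def read_score_column(text, column):
    """Return {pose tag: value of `column`} from Rosetta score-file text."""
    header = None
    values = {}
    for line in text.splitlines():
        if not line.startswith("SCORE:"):
            continue  # skip the SEQUENCE: line and anything else
        fields = line.split()[1:]  # drop the leading "SCORE:" token
        if header is None or fields == header:
            header = fields  # column names (repeated when runs are appended)
            continue
        row = dict(zip(header, fields))
        values[row["description"]] = float(row[column])
    return values


def ranks(xs):
    """1-based ranks of xs, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical score file and reported activities (%), for illustration only.
score_text = """SEQUENCE:
SCORE: total_score dH dG description
SCORE: -950.1 -18.2 -9.1 substrate_a_0001
SCORE: -948.7 -12.5 -6.0 substrate_b_0001
SCORE: -951.3 -20.4 -10.2 substrate_c_0001
"""
activity = {"substrate_a_0001": 70.0, "substrate_b_0001": 20.0, "substrate_c_0001": 85.0}

dH = read_score_column(score_text, "dH")
tags = sorted(dH)
rho = spearman([dH[t] for t in tags], [activity[t] for t in tags])
print(rho)  # -1.0
```

A strongly negative rho is what a useful protocol should produce, since a more negative dH is meant to indicate a more favourable complex and therefore higher activity.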
With further research, we found that the dehalogenase facilitates catalysis primarily via proton transfer at Y426, and that the ideal distance between the hydroxyl oxygen (OH) of Y426 in chain A and the halogenated carbon is 3.5 Å [2]. A 5.0 Å distance between the cobalt of cobalamin and the halogenated carbon was also an important determinant of effective coordination [3]. We therefore added both distances to our relax protocol as constraints in a constraint file.
We also experimented with different relax parameters and constraint weights, repeating the relaxation and docking steps to better recapitulate the reported activities. We also tried pair-fitting the tested substrates onto the ligand of another RdhANp crystal structure (PDB ID: 6ZXX [1]). These repeated experiments led us to a set of parameters whose output dH values recapitulated the reported activities more accurately and reliably. Our final relaxation command was the following:
$ROSETTA/main/source/bin/relax.static.macosclangrelease -s $prefix.pdb -relax:constrain_relax_to_start_coords -relax:coord_constrain_sidechains -relax:ramp_constraints true -missing_density_to_jump -extra_res_fa B12.params -extra_res_fa LIGAND.params -default_repeats 2 -optimization:default_max_cycles 200 -beta -constraints:cst_fa_file LIGAND.txt -constraints:cst_fa_weight 0.1
Our final constraint file was the following:
AtomPair OH 426A C9 1X SCALARWEIGHTEDFUNC 1000 HARMONIC 3.5 0.1
AtomPair CO 703A C9 1X SCALARWEIGHTEDFUNC 50 HARMONIC 5.0 0.1
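To get a feel for how strongly these two restraints act, the penalty they add can be computed by hand. The sketch below assumes Rosetta's documented function forms, HARMONIC x0 sd giving ((x - x0)/sd)^2 and SCALARWEIGHTEDFUNC w multiplying that value by w, and treats the `-constraints:cst_fa_weight 0.1` flag from the relax command as a further overall multiplier on the constraint energy. The coordinates and distances in the example are hypothetical.

```python
import math

# (label, SCALARWEIGHTEDFUNC weight, HARMONIC x0 in Å, HARMONIC sd in Å),
# transcribed from the two AtomPair lines of the constraint file.
CONSTRAINTS = [
    ("Y426 OH to ligand C9", 1000.0, 3.5, 0.1),
    ("cobalamin CO to ligand C9", 50.0, 5.0, 0.1),
]
CST_FA_WEIGHT = 0.1  # value passed to -constraints:cst_fa_weight


def harmonic(x, x0, sd):
    """Rosetta HARMONIC function: ((x - x0) / sd) ** 2."""
    return ((x - x0) / sd) ** 2


def constraint_penalty(distances):
    """Weighted penalty for measured distances, given in CONSTRAINTS order."""
    raw = sum(w * harmonic(d, x0, sd)
              for (_, w, x0, sd), d in zip(CONSTRAINTS, distances))
    return CST_FA_WEIGHT * raw


# Distances can come straight from atom coordinates (hypothetical values in Å).
oh_y426, c9_ligand = (0.0, 0.0, 0.0), (3.6, 0.0, 0.0)
d_tyr = math.dist(oh_y426, c9_ligand)  # 3.6 Å, i.e. 0.1 Å longer than ideal
print(constraint_penalty([d_tyr, 5.0]))  # about 100: only the Y426 restraint is violated
print(constraint_penalty([3.5, 5.1]))    # about 5: same 0.1 Å error on the cobalamin restraint
```

Because the scalar weights differ 20-fold, a 0.1 Å deviation from the Y426 restraint costs 20 times more than the same deviation from the cobalamin restraint, so the relax step prioritizes keeping the halogenated carbon close to the catalytic tyrosine.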
Our working protocol suggests that a dH of -17 to -20 corresponds to an RdhANp activity of 65-80%. We therefore used this range as the dH threshold for computationally determining whether a mutated RdhANp sequence confers sufficient activity against the selected PCB congeners.
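A minimal sketch of applying this threshold to a batch of scored designs follows. Exactly where a design "passes" inside the -17 to -20 band is a judgment call; here we assume, for illustration only, that any dH at or below -17 reaches the threshold and that a dH at or below -20 lies beyond the band. The design names and dH values are hypothetical.

```python
THRESHOLD_START = -17.0  # least favourable dH inside the 65-80 % activity band
THRESHOLD_END = -20.0    # most favourable dH inside the band


def classify(dH):
    """Place a dH value relative to the -17 to -20 threshold band."""
    if dH <= THRESHOLD_END:
        return "beyond band"
    if dH <= THRESHOLD_START:
        return "within band"
    return "below threshold"


def passing_designs(scores):
    """Names of designs whose dH reaches the threshold, best (lowest dH) first."""
    hits = [name for name, dH in scores.items() if classify(dH) != "below threshold"]
    return sorted(hits, key=scores.get)


scores = {"design_01": -14.8, "design_02": -18.3, "design_03": -21.0}
print(passing_designs(scores))  # ['design_03', 'design_02']
```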
Once we had established a working protocol for evaluating RdhANp activity against various substrates, we built docked models of RdhANp with the PCB congeners the wet lab would be using (PCB 11, 132 and 209). PCB 132 has two docking conformations, para and ortho, which were treated as two separate docked structures. We then evaluated the predicted activity of the wild-type enzyme in each structure with our protocol. Based on the resulting dH values, PCB 11 appeared to have the potential for the highest activity with the wild-type enzyme. All results can be found on the Results page of our wiki.
To redesign the protein-ligand interface so that RdhANp would have greater affinity for the PCB congeners, we looked into several design tools. Although RosettaDesign is a plausible platform for redesign, it is less effective for native protein structures such as the one we are using. Our graduate advisor, Preetham, strongly recommended that we use an updated, not yet published version of ProteinMPNN [4], a message passing neural network (MPNN) for protein sequence design, which redesigns proteins to better interact with a ligand of choice.
We sent our PCB-docked enzyme structures to Preetham to generate the mutated sequences. Mutations were restricted to residues within 8 Å of the ligand and at least 5 Å away from the cobalamin, because the ligand's interaction with the cofactor is critical to dehalogenation activity.
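The residue selection rule can be expressed as a simple distance filter. This sketch assumes atoms are available as (x, y, z) coordinates grouped by residue, measures the closest atom-to-atom distance, and reads "5 Å away from the cobalamin" as "no atom closer than 5 Å". The residue IDs and coordinates are hypothetical, and in practice the selected positions would be passed to the design tool through its own interface.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Smallest distance (Å) between any atom in atoms_a and any atom in atoms_b."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def designable_residues(residues, ligand_atoms, cofactor_atoms,
                        max_to_ligand=8.0, min_to_cofactor=5.0):
    """Residues close to the ligand but kept clear of the cofactor."""
    return [
        res_id for res_id, atoms in residues.items()
        if min_distance(atoms, ligand_atoms) <= max_to_ligand
        and min_distance(atoms, cofactor_atoms) >= min_to_cofactor
    ]


# Toy geometry: ligand at the origin, cobalamin 10 Å away along x.
ligand = [(0.0, 0.0, 0.0)]
cobalamin = [(10.0, 0.0, 0.0)]
residues = {
    "res_a": [(4.0, 0.0, 0.0)],    # 4 Å from ligand, 6 Å from cobalamin: designable
    "res_b": [(7.0, 0.0, 0.0)],    # near ligand but only 3 Å from cobalamin: fixed
    "res_c": [(-12.0, 0.0, 0.0)],  # 12 Å from ligand: fixed
    "res_d": [(0.0, 6.0, 0.0)],    # 6 Å from ligand, about 11.7 Å from cobalamin: designable
}
print(designable_residues(residues, ligand, cobalamin))  # ['res_a', 'res_d']
```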
As a preliminary test, we generated mutated sequences for RdhANp docked to 2,6-dichlorophenol to verify whether the mutations changed the RdhANp energy according to our evaluation protocol. Preetham then helped us generate 16 sequences for each of the PCB-RdhANp constructs. However, the catalytic tyrosine Y426 was not fixed in the first set of designs, and it was mutated to phenylalanine. Because phenylalanine is structurally similar to tyrosine (differing only by the hydroxyl group), we reverted this mutation and continued evaluating these sequences. At the same time, a second set of sequences was generated with LigandMPNN with Y426 fixed.
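Reverting Y426F amounts to a single-position substitution in each designed sequence. In this sketch, `first_resnum` (the PDB number of the first residue in the sequence string) is a hypothetical parameter: sequence indices and PDB residue numbers only line up when numbering starts at 1 without gaps, so this should be checked against the structure before use.

```python
def restore_catalytic_tyr(seq, first_resnum=1, resnum=426):
    """Revert a Tyr->Phe design at the catalytic position; leave other residues untouched."""
    idx = resnum - first_resnum
    if not 0 <= idx < len(seq):
        raise IndexError(f"residue {resnum} is outside the sequence")
    if seq[idx] == "F":
        return seq[:idx] + "Y" + seq[idx + 1:]
    return seq


# Toy fragment numbered 423-426 (hypothetical): T423 A424 V425 F426
print(restore_catalytic_tyr("TAVF", first_resnum=423))  # TAVY
```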
These mutated sequences (from both the Y426F set and the fixed-Y426 set) were uploaded to Benchling and aligned to identify the full set of mutations. Of the sequences generated, 70-80% were unique. We then ran our evaluation protocol on the PDB file of each sequence. While some sequences resulted in a worse dH, most improved the dH of our PCB-RdhANp structures. However, only a select few crossed the dH threshold of -17 to -20. We selected the sequences with the lowest dH for experimental validation.
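Assuming the designed sequences have the same length as the wild type (the designs substitute residues in place, with no insertions or deletions), mutation calling, the unique-sequence fraction and the lowest-dH selection can also be scripted. This is a minimal sketch rather than the Benchling workflow we used; `first_resnum` and all example values are hypothetical.

```python
def call_mutations(wt, designed, first_resnum=1):
    """List substitutions as wild-type aa, residue number, new aa (e.g. 'K1L')."""
    if len(wt) != len(designed):
        raise ValueError("sequences must be the same length (no indels)")
    return [f"{w}{i + first_resnum}{d}"
            for i, (w, d) in enumerate(zip(wt, designed)) if w != d]


def fraction_unique(seqs):
    """Fraction of designed sequences that are distinct."""
    return len(set(seqs)) / len(seqs)


def lowest_dH(scores, n):
    """The n design names with the most favourable (lowest) dH."""
    return sorted(scores, key=scores.get)[:n]


print(call_mutations("KAEP", "LAEY"))                        # ['K1L', 'P4Y']
print(fraction_unique(["LAEY", "LAEY", "KAEY", "LAEP"]))     # 0.75
print(lowest_dH({"s1": -15.0, "s2": -19.5, "s3": -17.2}, 2))  # ['s2', 's3']
```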
The mutations that appeared most frequently across all 4 docked PCB models were K295L, E298L, K310A, P314Y, M315L, S423T, S425V, M426L, H457D, V556I and A563G. This suggests that a single mutated sequence may have good activity against all the PCB congeners.
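One way to make "most frequent across all 4 models" precise is to count, for each mutation, how many docked models produced at least one sequence containing it. The sketch below does this with `collections.Counter`; the per-model mutation lists are invented placeholders, not our design output.

```python
from collections import Counter


def recurrent_mutations(mutations_by_model, min_models):
    """(mutation, model count) pairs seen in at least min_models models, most widespread first."""
    counts = Counter()
    for sequences in mutations_by_model.values():
        # Count each mutation once per model, however many sequences carry it.
        counts.update({m for muts in sequences for m in muts})
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(m, c) for m, c in ranked if c >= min_models]


# Hypothetical mutation lists: one inner list per designed sequence.
designs = {
    "PCB11":        [["A10G", "L20V"], ["A10G"]],
    "PCB132_ortho": [["A10G", "S30T"]],
    "PCB132_para":  [["A10G", "L20V"]],
    "PCB209":       [["A10G"], ["S30T", "L20V"]],
}
print(recurrent_mutations(designs, min_models=3))  # [('A10G', 4), ('L20V', 3)]
```

Sorting by count and then by name keeps the output deterministic, since iteration order over a set of strings can change between Python runs.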
We chose the expression vector pET24(+) (5232 bp, commercially available from Novagen) for expression in E. coli. It confers kanamycin resistance, carries a C-terminal 6xHis tag for later steps such as protein verification by western blot, contains restriction enzyme sites, and is medium copy.
We took the sequences from pm and codon-optimized them for E. coli K-12 using the IDT codon optimization tool. Since the backbone lacks a ribosome binding site/start codon, we added an ATG start codon (Met) to each coding sequence, as well as a TAA stop codon (the most common stop codon in E. coli).
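The start/stop codon step can be checked programmatically after codon optimization. This sketch assumes the optimizer returns a bare DNA coding sequence with no flanking sequence; it adds ATG only if the sequence does not already begin with one, rejects internal stop codons, and replaces any existing terminal stop with TAA. The function name and example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def add_start_and_stop(cds):
    """Return cds with an ATG start codon and a single TAA stop codon."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]  # drop any existing stop; TAA is appended below
    internal = [i for i, c in enumerate(codons) if c in STOP_CODONS]
    if internal:
        raise ValueError(f"internal stop codon at codon {internal[0] + 1}")
    if not codons or codons[0] != "ATG":
        codons.insert(0, "ATG")  # start codon encoding Met
    return "".join(codons) + "TAA"


print(add_start_and_stop("GCTAAA"))     # ATGGCTAAATAA
print(add_start_and_stop("atggcttga"))  # ATGGCTTAA
```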
Unfortunately, as of the wiki freeze, our experimental team had been unable to validate the expression of the plasmids due to purchasing complications with Twist Bioscience.
While waiting on feedback from the wet lab, we decided to further evaluate the mutated sequences generated by MPNN, to assess whether the individual mutations were indeed beneficial to RdhANp activity against the target PCB congener.
Each mutation in the sequences sent to the wet lab was reverted to the wild-type residue using PyMOL, and each reverted structure was saved as a new structure. These structures were then relaxed and scored using our previously established evaluation protocol.
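At the sequence level, the single-reversion variants can be enumerated automatically, which makes it harder to miss a position. This is a sketch with hypothetical names and toy sequences; the actual reversions were made on structures in PyMOL, and each variant would still need to be rebuilt, relaxed and scored.

```python
def single_reversions(wt, designed, first_resnum=1):
    """Map labels like 'L1K' (designed aa, position, wild-type aa) to single-reversion sequences."""
    if len(wt) != len(designed):
        raise ValueError("sequences must be the same length")
    variants = {}
    for i, (w, d) in enumerate(zip(wt, designed)):
        if w != d:
            # Restore only this one position; every other designed residue is kept.
            variants[f"{d}{i + first_resnum}{w}"] = designed[:i] + w + designed[i + 1:]
    return variants


print(single_reversions("KAEP", "LAEY"))  # {'L1K': 'KAEY', 'Y4P': 'LAEP'}
```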
Using the original dH as a baseline for comparison, we found that some mutations indeed improved dH while others made it worse. However, reverting all mutations that worsened dH did not significantly improve the overall dH. Instead, only specific combinations of reversions effectively improved the dH of the structures. So far, we have managed to improve the dH of the PCB 132 ortho and para structures with a unique set of reversions.
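Because only particular combinations of reversions helped, a systematic search would need to score combinations rather than single reversions. The sketch below enumerates every subset of reversions with `itertools.combinations`. With n mutations there are 2^n - 1 non-empty subsets, so for long mutation lists the `max_size` cap (a hypothetical parameter) keeps the number of structures to relax and score manageable. Positions are 0-based sequence indices, and the sequences are toy examples.

```python
from itertools import combinations


def reversion_combinations(wt, designed, max_size=None):
    """Yield (positions, sequence) for every combination of reverted mutations."""
    positions = [i for i, (w, d) in enumerate(zip(wt, designed)) if w != d]
    if max_size is None:
        max_size = len(positions)
    for k in range(1, max_size + 1):
        for subset in combinations(positions, k):
            seq = list(designed)
            for i in subset:
                seq[i] = wt[i]  # restore the wild-type residue at each chosen position
            yield subset, "".join(seq)


for subset, seq in reversion_combinations("KAEP", "LAEY"):
    print(subset, seq)
# (0,) KAEY
# (3,) LAEP
# (0, 3) KAEP
```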
We learnt that not all mutations introduced by MPNN led to better scores, and that the energetic effects of individual mutations are not additive. Moving forward, the best sequence generated by MPNN should therefore undergo an additional per-mutation assessment to ensure that every introduced mutation actually improves the activity of the protein against the target ligand.
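Non-additivity can be quantified by comparing the dH change measured when several reversions are applied together with the sum of the changes measured for each reversion alone; a non-zero difference indicates that the effects interact (epistasis). The values below are hypothetical and only illustrate the arithmetic.

```python
def non_additivity(baseline, single, combined, subset):
    """Observed minus additively expected change in dH when `subset` is reverted together.

    baseline: dH of the designed structure
    single:   {reversion label: dH with only that reversion applied}
    combined: dH with every reversion in `subset` applied at once
    """
    expected = sum(single[label] - baseline for label in subset)
    observed = combined - baseline
    return observed - expected


# Hypothetical example: two reversions that each lower dH on their own...
single = {"rev1": -16.0, "rev2": -15.5}
# ...but together barely change it.
delta = non_additivity(-15.0, single, combined=-15.2, subset=["rev1", "rev2"])
print(round(delta, 2))  # 1.3
```

A positive value here means the combination is worse than the single reversions would predict, which is exactly the situation in which reverting every individually unfavourable mutation at once fails to help.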