Protein Modeling

Objective:

Our purpose was to redesign existing dehalogenases with improved specificity for different PCB congeners. While there are multiple methods such as creating a mutant library via mutagenesis, we decided to use a computational approach. We utilize various protein modeling and designing software for guided, precise predictions of advantageous mutations, decreasing the time and resources required to experimentally validate the mutated enzymes.

Protein Selection:

Conversations between human practices and stakeholders informed the team that while dehalogenases which dechlorinate PCB exist, they are largely anaerobic and would not be stable for use in the natural environment. Hence we scanned through literature for well characterized aerobic dehalogenase which we may modify to possess specificity for PCBs. After much deliberation, we selected RdhANp, an aerobic dehalogenase that has great potential for the remediation of organohalides due to its exceptional enzymatic activity in oxygen environments [1]. Additionally, RdhANp is a registered iGEM part, which means working on RdhANp would give us the opportunity to improve an existing part.

Catabolic reductive dehalogenase NpRdhA, PDB ID = 6ZXX. Visualized using ChimeraX [2]

Evaluation Model:

During our literature review of existing dehalogenases, we found a paper that described the activities of RdhANp, an aerobic dehalogenase against a set of bromo/chloro organic halides. We decided to use this paper as a guide to generate a protocol using Rosetta GALigandDock as the basis to recapitulate the activities via modeling. Rosetta, the tool used for this, is a powerful command-line program suite for computational macromolecular modeling initially developed by Dr. David Baker’s lab at the University of Washington. By developing metrics by which to compare designs based on known substrate activities, our subsequent evaluation of computationally generated designs for our compounds of interest (PCBs) would be more efficient.

Before beginning the docking, pre-processing to generate all necessary input files was required. We started building our protocol by first generating parameter files (.params) for ligands using the command copied below. To do so, the substrate molecule structures were downloaded from pubchem [3]. Charge was added using protein modeling software Chimera [4] to the ligands, which were saved as Mol2 files, before running a rosetta script to generate the params files. This step is crucial for ligand docking, without which Rosetta will not recognize the ligands.

[$ROSETTA/main/source/scripts/python/public/generic_potential/mol2genparams.py -s ligand_name.mol2]

The different bromo/chloro ligands were manually docked to the RdhANp protein (PDB ID: 6ZY1 [5] ) using PyMOL pairfitting [6]. Each structure was then relaxed using a rosetta script [7,8,9]:

$ROSETTA/main/source/bin/relax.static.macosclangrelease -s complex_relaxed.pdb -relax:constrain_relax_to_start_coords -relax:coord_constrain_sidechains -relax:ramp_constraints false -missing_density_to_jump -extra_res_fa params/B12.params -beta -extra_res_fa params/LIGAND.params -default_repeats 2 -optimization:default_max_cycles 200

The relaxed structure was docked using a basic GALigandDock protocol on “eval” mode, where the protein and ligand would not be moved so that the Rosetta energy scores could be calculated [10]. From the resulting score files, dH and dG were the primary measures of interest.

Unfortunately, while there was a little correlation between the dH/dG values with the reported activities with this initial protocol, the distances between ligand and catalytic residues were inconsistent. With further research, we found that the dehalogenase facilitates catalysis primarily via proton transfer at Y426. The best distance between the hydroxide oxygen of Y426A and halogenated carbon was determined to be 3.5 Å [11]. Distance between cobalamin and halogenated carbon is also important for effective coordination, distance at 5.0 Å [12]. These distances were added to our relax protocol using constraint files.

Active site. Dark blue = cofactor, pink = ligand.

Consequently, the energy scores showed an improvement in the correlation with reported activities and distances, but the dH was too high to effectively recapitulate the reported activity of the RdhANp. Therefore, we experimented with different relax parameters and constraint weights, repeating the relaxation and docking steps. Additionally, we switched to docking to another similar model (PDB ID: 6ZXX [5]), where the ligands in the crystal structure were in slightly different and more convenient positions/states. As a result, we found a set of parameters that recapitulated the reported activities more and identified dH values that could be representative of higher activity. The relaxation command used was the following:

Our working protocol indicates that a dH of -17 to -20 would represent a RdhANp activity of 65-80%. With this threshold established, we constructed models using the PCB congeners the wet lab would be using (PCB 11, 132 and 209) by the same methods. Of these, PCB 132 has two docking conformations – para and ortho – which were treated as two separate docked structures. PCB 11 seemed to have the potential for highest activity with the wild-type enzyme based on the dH values resulting from the docking evaluation. All results obtained may be found in the results page of our wiki.

Summary of the design evaluation model, using Rosetta scripts.

Protein Design with Rosetta and LigandMPNN:

To design the enzyme binding site for a small molecule ligand, we looked into what potential softwares could be accessible for our team and best fulfill our requirements. Rosetta, which we had been using up to this point for docking, has some design capabilities, but is less effective in designing mutations in native protein structures. Therefore, at the suggestions of our graduate advisor Preetham, we decided to pursue a different approach, using a deep-learning based protein design method. We sent our PCB-docked enzyme structures to Preetham, who works in the Institute of Protein Design at the University of Washington and could generate the mutated sequences using an updated version of ProteinMPNN that can take smaller molecules into context (LigandMPNN) [13]. Our requirements were that residues interacting with the cofactor cobalamin and specific catalytic residues such as tyrosine 426 were retained, while the residues interacting with the “back” of the substrate were modified to increase sensitivity and specificity for each of the ligands. Therefore, the potential amino acids to be mutated were constrained to be within 8 angstroms of the ligand and 5 angstroms away from the cobalamin.

Redesign interface: yellow represents a substrate, gray – the cofactor, and red - residues < A away from the cofactor (were not redesigned). Created by Preetham Venkatesh.

Sequences for RdhANp docked to 2,6-dichlorophenol were generated first to ensure that the mutations generated improved RdhANp energy according to our evaluation model. Then, Preetham helped us generate 16 sequences for each of the PCB-RdhANp constructs. However, in this initial test, the catalytic tyrosine Y426 was not fixed and was mutated to a phenylalanine.

Since phenylalanine is structurally very similar to tyrosine, we decided to move forward with evaluating these structures as well, while the structures with Y426 fixed (so it could never be mutated) were generated using MPNN again, allowing for further comparisons. To undo the mutation, PyMOL was used to edit the phenylalanine back into a tyrosine.

These mutated sequences (both the F426Y and never mutated Y426) were uploaded to Benchling and aligned. Of the 16-32 sequences for each PCB, 70-80% were unique sequences. PDB files for each of the sequences were then evaluated using our evaluation model, resulting in some sequences with up to a -8 decrease in dH score (an improvement), and one sequence for PCB 132 reaching our goal of -20 dH score. The mutations which appeared most frequently were across all 4 docked PCB models were K295L, E298L, K310A, P314Y, M315L, S423T, S425V, M426L, H457D, V556I and A563G. These sequences can be found on the results page. The sequences resulting in the lowest dH score were sent in FASTA format to the wet lab for experimental validation.

To further evaluate the mutated sequences, each mutation was reverted using PyMOL with the following command: alter resi [residue number], resn=’[amino acid lettering’. The resulting molecule was then relaxed and scored using our evaluation metrics. As of the wiki freeze deadline, we have obtained the energy scores for reversions of the best mutational sequences for PCB 132 ortho, para conformation and PCB 209. We found that indeed certain mutations increased the dH energy. However as energy scores are not additive, only a unique combination of reversions would improve the dH energy of the structures. We have thus far managed to improve the dH of PCB 132 ortho and para structures with a set of reversions. We also began work to generate a second set of mutations using LigandMPNN but this time including the second shell – residues colored in red in the redesign interface diagram. Unfortunately, as of the wiki freeze deadline, we have yet to complete the evaluation of the new mutations, hence identification of the best sequence to send for experimental validation will be left as a future direction.

As a point of comparison, we used Rosetta docking protocols to (1) design mutations in the residues which when visualized in PyMOL or ChimeraX appeared to have the most potential to interact with the back of the ligand, and (2) to insert the mutations generated using LigandMPNN. The residues identified for the former purpose were F294, K310, F313, N455 (or F291, K307, K310, N452 using PDB numbering, chain A). The Rosetta resfile for each mutation experiment included the respective residue using resfile syntax, 452 A ALLAA, where ALLAA signifies that the selected residue can be mutated to any standard amino acid. Of these mutations, K310A seemed to be the most successful mutation in lowering dH by approximately 1 for each ligand, while the result of mutations in the other three residues varied from ligand to ligand, and in some cases worsened the energy score. The improvement, if any, with just these mutations was far less significant than the LigandMPNN generated sequences.

Since each of the four ligands/orientations had different mutations designed by LigandMPNN, we did an additional test in which the mutations that gave the best results for each ligand complex were repeated in the original complex using Rosetta and in the other three ligand complexes. These mutations were designed using resfiles in 2 ways (one flexible, the other strict): (1) the residues can be mutated to any amino acid (ALLAA), and (2) the residues are mutated to be the exact same amino acids as the mutated sequence generated by LigandMPNN. Hence, for each of the four ligand complexes, we tested eight different cases. For PCB 132 ortho (LG10 in the results) and PCB 209 (LG12), the best (lowest) dH score was calculated for the structure with the strict mutations generated by LigandMPNN for the respective ligands, although some of the other mutations also gave comparable results. For example, the mutations (flexible) of the mutated sequence for PCB 11 also had good results for each of the other ligand complexes. All mutations improved the energies for every ligand complex. The full quantitative results of all rosetta design experiments can be found in the third protein modeling result on the results page.

Summary of the protein design workflow.

Conclusion:

Four sequences were generated for testing in the wet lab. While the laboratory validation has not been completed at this moment, based on the eventual feedback from the wet lab, the models may be adjusted. Furthermore, we have identified three potential methods and areas for further investigation:

Repeat the ligandMPNN design steps but expand the residue selection for mutations. Residues further from the active site may not affect specificity as much, but could improve the enzyme stability or potentially generate larger changes in the structure that increases affinity for the PCBs.
Explore alternative software for ligand modeling, and
Search for alternative crystal structures or proteins that could have similar dehalogenation capabilities such as PceA, and repeat the modeling experiments.

The goal of computationally modeling potential mutated sequences has been met, but experimental validation would be the necessary next step to make further progress with future modeling efforts.

References:

Halliwell, T., Fisher, K., Rigby, S. E. J., & Leys, D. (2022). Heterologous production and biophysical characterization of catabolic Nitratireductor pacificus pht-3B reductive dehalogenase. Methods in enzymology, 668, 327–347. https://doi.org/10.1016/bs.mie.2022.01.004
UCSF ChimeraX: Structure visualization for researchers, educators, and developers. Pettersen EF, Goddard TD, Huang CC, Meng EC, Couch GS, Croll TI, Morris JH, Ferrin TE. Protein Sci. 2021 Jan;30(1):70-82.
Kim, S., Chen, J., Cheng, T., Gindulyte, A., He, J., He, S., Li, Q., Shoemaker, B. A., Thiessen, P. A., Yu, B., Zaslavsky, L., Zhang, J., & Bolton, E. E. (2023). PubChem 2023 update. Nucleic Acids Res., 51(D1), D1373–D1380. https://doi.org/10.1093/nar/gkac956
UCSF Chimera--a visualization system for exploratory research and analysis. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, Ferrin TE. J Comput Chem. 2004 Oct;25(13):1605-12.
Halliwell, T., Fisher, K., Payne, K. A. P., Rigby, S. E. J., & Leys, D. (2020). Catabolic Reductive Dehalogenase Substrate Complex Structures Underpin Rational Repurposing of Substrate Scope. Microorganisms, 8(9), 1344. https://doi.org/10.3390/microorganisms8091344
The PyMOL Molecular Graphics System, Version 2.5.4 Schrödinger, LLC.
*Tyka MD, *Keedy DA, André I, Dimaio F, Song Y, Richardson DC, Richardson JS, and Baker D. (2011). Alternate states of proteins revealed by detailed energy landscape mapping. J Mol Biol 405(2):607-18. doi: 10.1016/j.jmb.2010.11.008. (*Co-primary authors.)
Khatib F, Cooper S, Tyka MD, Xu K, Makedon I, Popovic Z, Baker D, and Players F. (2011). Algorithm discovery by protein folding game players. Proc Natl Acad Sci USA 108(47):18949-53. doi: 10.1073/pnas.1115898108.
Maguire JB, Haddox HK, Strickland D, Halabiya SF, Coventry B, Griffin JR, Pulavarti SVSRK, Cummins M, Thieker DF, Klavins E, Szyperski T, DiMaio F, Baker D, and Kuhlman B. (2021). Perturbing the energy landscape for improved packing during computational protein design. Proteins 89(4):436-449. doi: 10.1002/prot.26030.
Fleishman SJ, Leaver-Fay A, Corn JE, Strauch E-M, Khare SD, Koga N, Ashworth J, Murphy P, Richter F, Lemmon G, Meiler J, and Baker D. (2011). RosettaScripts: A Scripting Language Interface to the Rosetta Macromolecular Modeling Suite. PLoS ONE 6(6):e20161. doi: 10.1371/journal.pone.0020161.
Payne, K. A., Quezada, C. P., Fisher, K., Dunstan, M. S., Collins, F. A., Sjuts, H., Levy, C., Hay, S., Rigby, S. E., & Leys, D. (2015). Reductive dehalogenase structure suggests a mechanism for B12-dependent dehalogenation. Nature, 517(7535), 513–516. https://doi.org/10.1038/nature13901
Johannissen, L. O., Leys, D., & Hay, S. (2017). A common mechanism for coenzyme cobalamin-dependent reductive dehalogenases. Physical chemistry chemical physics : PCCP, 19(8), 6090–6094. https://doi.org/10.1039/c6cp08659d
Dauparas, J., Anishchenko, I., Bennett, N., Bai, H., Ragotte, R. J., Milles, L. F., Wicky, B. I. M., Courbet, A., de Haas, R. J., Bethel, N., Leung, P. J. Y., Huddy, T. F., Pellock, S., Tischer, D., Chan, F., Koepnick, B., Nguyen, H., Kang, A., Sankaran, B., Bera, A. K., … Baker, D. (2022). Robust deep learning-based protein sequence design using ProteinMPNN. Science (New York, N.Y.), 378(6615), 49–56. https://doi.org/10.1126/science.add2187

Kinetic Modeling

PCB and Nitrogen Uptake

Understanding the distribution and accumulation of PCBs and nitrogen in the ecosystems which the team could deploy the final product is essential. The goal of this project was to design code that could model real-world conditions for wet lab for the areas that human practices had determined to be good locations for deployment.

The plan for this project was to create a fleshed-out model based on data that was accessible online in order to be able to predict the speeds at which our species could grow and uptake PCBs. We also wanted to answer the question: How can we best predict how a species will behave in the presence of nitrogen and PCBs?

Through literature review, we were able to find a few papers that allowed us to model biofilm nitrogen transformations in experiments investigating the treatment of wastewaters. Thus, we were able to analyzed and create a model through biofilm bulk water equations and various microbiological process rate equations to output biofilm thickness and bulk water concentrations of ammoniacal nitrogen, and nitrogen uptake and PCB in biofilm.

Our first model was a general model of biofilm thickness based off of nitrogen concentration. We took values for our species from other cyanobacteria and used the explicit Euler method.

Our second model was a general model of how quickly PCBs could be uptaken into our species. Much of the values used in this model were estimated. However, the initial value of PCBs is based off of the amount of PCBs in the Spokane River. This model used adsorption and desorption rates as the modeling method.

Our final model analyzed NO3 influx into our species as a final look at nitrogen uptake. We used the Michaelis-Menten equation as our base equation.

Ultimately, the modeling for this project was not able to model too many situations before wet lab began modifying the species. Due to the need for data to train the models, we had to resort to simple models that did not require data, but also did not give more detailed descriptions of performance. Some walls we ran into regarding data were Wetlab running into issues in determining their bacteria and obtaining it. We had hoped to circumvent it through extensive digging into related papers, but were still unable to obtain several parameters such as the Synechococcus elongatus PCB adsorption and desorption values. If the team had more time, we would like to be able to do genome scale modeling. It is a process that has been used when engineering other metabolic processes, and has been used on our specific species in the past as well. We would also like to do other general modeling techniques based off of data provided by wet lab, if they are able to get any eventually.

Species Remediation

Our project aims to create an organism to break down PCBs by inserting a modified reductive dehalogenase enzyme into it. Our model seeks to explore how this modification will affect our organism as well as to find ways of optimizing our organism’s growth. To do this we used Flux Balance Analysis (FBA)

Flux Balance Analysis is a constraints based genomic scale modeling approach. It uses an assumption of steady state, reaction stoichiometry as well as data on uptake to predict fluxes through an organism's metabolic pathways (Orth). We used iEC1344_C from the BIGG database as our wild type organism (Monk). FBA allows us to make quantitative predictions about the growth rate of our organism. Also the rate of PCB breakdown is likely to be limited by uptake, not the efficiency of the enzyme as PCB concentration in waste water is relatively low. This makes FBA a good choice of model as we can assume that PCBs aren’t building up in the organism and our organism is in steady state.

PCB uptake
-> PCB
PCB Dehalogenation Reaction
pcb + co2dam_c + atp_c -> biphenyl + co2dam_c + adp_c + 0.42 pcb
Biphenyl Secretion
biphenyl ->
A PCB exchange reaction was added as well as a biphenyl exchange. The average PCB uptaken from wastewater results in approximately 0.58 PCBs fully dehalogenated (Min). Vitamin B12 acts as a cofactor for our dehalogenation enzyme (Payne). The energy cost was assumed to be 1 atp.

Bph Pathway (Deneth)
BphA
Biphenyl + O2 + NADH + [H+] -> cis-(2R,3S)-dihydroxy-2,3-dihydrobiphenyl + [NAD+]
BphB
cis-(2R,3S)-dihydroxy-2,3-dihydrobiphenyl + [NAD+] -> 2,3-dihydroxybipheny + [H+] + NADH
BphC
2,3-dihydroxybiphenyl + O2 -> 2-Hydroxy-6-oxo-6-phenylhexa-2,4-dienoate
BphD
2-Hydroxy-6-oxo-6-phenylhexa-2,4-dienoate + H2O -> benzoate + 2-Hydroxypenta-2,4-dienoate
BphH
2-Hydroxypenta-2,4-dienoate -> 4-Hydroxy-2-oxovalerate
BphI
4-Hydroxy-2-oxovalerate -> pyruvate + acetaldehyde
BphJ:
pyruvate + acetaldehyde -> Acetyl-CoA
The bph pathway breaks down biphenyls, the result after PCBs are dechlorinated. While we originally considered adding this pathway to our organism it was somewhat unfeasible to add so many enzymes. Also while interviewing experts Human Practices found that this pathway was fairly ubiquitous in nature and that it may not have a significant beneficial effect for our organism. To evaluate whether we should add this to our model we added the above pathway to our model and compared the growth rate before and after adding it.

Bph Added vs No Bph
To see the effect of the bph pathway on our organism we compared our model with and without the bph pathway. As we can see in the graph on the left, the bph pathway provides benefit to the growth of our organism.

Growth Rate Given data for uptake from our other model we set our PCB uptake reaction’s flux to uptake at each time point and ran FBA.

Sources
Denef VJ, Park J, Tsoi TV, Rouillard JM, Zhang H, Wibbenmeyer JA, Verstraete W, Gulari E, Hashsham SA, Tiedje JM. Biphenyl and benzoate metabolism in a genomic context: outlining genome-wide metabolic networks in Burkholderia xenovorans LB400. Appl Environ Microbiol. 2004 Aug;70(8):4961-70. doi: 10.1128/AEM.70.8.4961-4970.2004. PMID: 15294836; PMCID: PMC492332.

Min Yao, Zhongjian Li, Xingwang Zhang, Lecheng Lei, "Polychlorinated Biphenyls in the Centralized Wastewater Treatment Plant in a Chemical Industry Zone: Source, Distribution, and Removal", Journal of Chemistry, vol. 2014, Article ID 352675, 10 pages, 2014. https://doi.org/10.1155/2014/352675

Monk JM, Koza A, Campodonico MA, Machado D, Seoane JM, Palsson BO, Herrgård MJ, Feist AM. Multi-omics Quantification of Species Variation of Escherichia coli Links Molecular Features with Strain Phenotypes. Cell Syst. 2016 Sep 28;3(3):238-251.e12. doi: 10.1016/j.cels.2016.08.013. Epub 2016 Sep 22. PMID: 27667363; PMCID: PMC5058344.

Orth JD, Thiele I, Palsson BØ. What is flux balance analysis? Nat Biotechnol. 2010 Mar;28(3):245-8. doi: 10.1038/nbt.1614. PMID: 20212490; PMCID: PMC3108565.

Payne KA, Quezada CP, Fisher K, Dunstan MS, Collins FA, Sjuts H, Levy C, Hay S, Rigby SE, Leys D. Reductive dehalogenase structure suggests a mechanism for B12-dependent dehalogenation. Nature. 2015 Jan 22;517(7535):513-516. doi: 10.1038/nature13901. Epub 2014 Oct 19. PMID: 25327251; PMCID: PMC4968649.

GAC Affinity

Sorption kinetics refers to the rate and mechanism by which substances (in this case, contaminants like PCBs) get attached to or absorbed by another material (here, activated carbon or AC).

Studies have shown that activated carbon adsorption kinetics can limit removal rates of PCB’s from contaminated waste-water solutions in treatment systems. We hope to employ various models to predict and describe the sorption kinetics. We hope the models will be linear, but some may be mathematical formulations. These will help predict how AC will behave in real-world conditions. First, we would have not narrow down which factors affect sorption kinetics (in a manner relevant to PCB degradation). Notable factors include temperature, pH, and presence of other chemicals. Second, we would need to find and derive the necessary modeling equations. We plan to do this in MATLAB.

With our project, we wanted to answer the question of: How can we best calculate and model adsorption rates constants using a biofilm/GAC system?

Our first model was originally based on a different published study researching PAH’s (Polycyclic Aromatic Hydrocarbon) sorption kinetics equations on GAC)

Where: Deff = effective diffusion coefficient, Ds = surface diffusivity, ɛP = intraparticle porosity, Dp = pore diffusivity of GAC, Kʟ<ᴘᴀʜ> = Langmuir adsorption constant (to be translated to Kʟ<ᴘᴄʙ>), qm = solid-phase concentration inside particles, and ρp = particle density.

If we could get the effective diffusion coefficient, we could find an optimized contact time with the biofilm and wastewater, as well as other useful values for bioengineering a solution. One immediate issue with this was that there were 209 different congeners to apply the above equation to, and thus 209 different effective diffusion coefficients to consider in all future models. We figured we could get around this since our team was only focused on removing 3 specific congener types. Literature data was to be used to fit linear driving force models with both linear and Freundlich models of equilibrium. We also planned on using MATLAB for our future values.

Even though we were only really searching for literature values from 3 specific congeners, reliable information was limited and there were certain parameters we could not verify, such as their solid-phase concentration in GAC’s (qm). We had devised some potential experimentation for Wetlab to conduct in order to find these values, however by then we had run out of time. We had also tried a more extensive literature search, however the best we could find were somewhat related models, none of which had any relevant numbers we could use. If we had more time, we would have liked to use sorption kinetics models to help with multiple aspects of the project, optimizing contact time for starters. We would also like to provide specific species (bacteria used by wetlab) PCB adsorption and desorption values to better model PCB and nitrogen uptake.

MODEL

Protein Modeling

Objective:

Protein Selection:

Evaluation Model:

Protein Design with Rosetta and LigandMPNN:

Conclusion:

References:

Protein

Kinetic Modeling

PCB and Nitrogen Uptake

Species Remediation

GAC Affinity

Kinetics

MODEL

Protein Modeling

Objective:

Protein Selection:

Evaluation Model:

Params Script

Relax Script

Shell Script to Run Docking Protocol: run.sh

Docking protocol: dock.xml

Constrain File Example: constrain.txt

Relax Script Command: relax.sh

Protein Design with Rosetta and LigandMPNN:

Conclusion:

References:

Protein

Kinetic Modeling

PCB and Nitrogen Uptake

Species Remediation

GAC Affinity

Kinetics