Because iGEM runs on a tight schedule and laboratory results take a long time to acquire, we used computational tools both to take our project further and to support the findings in the laboratory. The aim of our project was to improve the degradation of PFOA by DeHa catalysis.
To perform computational analyses on the dehalogenases, protein structures are needed. With the intention of performing docking simulations, we used cutting-edge machine learning tools to predict the structures of the four dehalogenases.
Method
Given the DNA sequences from the USAFA21 team, protein structures were predicted using ColabFold. Each structure was corrected and minimized with the preparation wizard in Schrödinger’s Maestro for further analysis. To verify the integrity of the predictions, the evaluations generated by AlphaFold were considered, and the structural alignment tool FoldSeek was used for further verification.
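Since ColabFold takes protein sequences as input, the DNA sequences first have to be translated. The following is a minimal sketch of that step, assuming the coding sequence starts in frame 1 and uses the standard genetic code; the function names `translate` and `to_fasta` are illustrative and not part of any tool used in the project.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence (frame 1) up to the first stop codon."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def to_fasta(name: str, protein: str, width: int = 60) -> str:
    """Format a protein sequence as a FASTA record."""
    lines = [protein[i:i + width] for i in range(0, len(protein), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"
```

The resulting FASTA text is the kind of input that a structure prediction tool expects; the actual sequences are found on the Results page and are not reproduced here.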
ColabFold
ColabFold is open-source software that uses AlphaFold2 and MMseqs2 to perform rapid structure predictions [1].
AlphaFold uses a multiple sequence alignment of similar sequences, together with homologous structures from databases, to accurately predict the structure from the primary sequence [2].
To improve the search for similar sequences and homologous structures, ColabFold replaces the search step of native AlphaFold with MMseqs2. The ColabFold authors report that this search is roughly 40 to 60 times faster than the one in the native AlphaFold2 pipeline while remaining sensitive [1].
Evaluation
ColabFold reports three quality measures for evaluating the predicted structures: (1) Predicted Local Distance Difference Test, (2) Predicted Aligned Error and (3) Sequence Coverage.
Predicted Local Distance Difference Test (pLDDT)
The pLDDT is a per-residue estimate of how well the local distances between residues in the predicted structure are expected to agree with the true structure [3]. It therefore indicates how much confidence can be placed in local regions of the structure, including possible functionally important domains.
It scores from 0 to 100, with >90 being very high confidence.
The pLDDT score tends towards <50 at the terminal positions of the structure, which removes the structural value of the prediction in these regions but indicates a possible disordered domain.
Like DeHa 2, the terminals of DeHa 4 tend towards a score of <50, removing structural value from these regions.
Like DeHa 2 and DeHa 4, the first ~20 residues and the terminal of DeHa 5 lack structural value due to the scores being <50. This is especially seen in the structure of DeHa 5 with the possibly disordered tail region.
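AlphaFold-based tools write the pLDDT of each residue into the B-factor column of the predicted .pdb file, so the low-confidence regions described above can be listed directly from the file. The sketch below assumes a standard fixed-column PDB file and uses the <50 cutoff from the text; the function names are illustrative.

```python
def plddt_per_residue(pdb_text: str) -> dict[int, float]:
    """Read the per-residue pLDDT from the B-factor column of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residue_number = int(line[22:26])
            scores[residue_number] = float(line[60:66])
    return scores


def low_confidence_segments(scores, cutoff=50.0):
    """Group consecutive residues with pLDDT below the cutoff into (start, end) ranges."""
    segments = []
    for residue in sorted(scores):
        if scores[residue] >= cutoff:
            continue
        if segments and residue == segments[-1][1] + 1:
            # Extend the current segment by one residue.
            segments[-1] = (segments[-1][0], residue)
        else:
            segments.append((residue, residue))
    return segments
```

A segment that covers the first or last residues of the chain corresponds to the low-scoring terminals discussed above.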
Predicted Aligned Error
pLDDT does not verify the position of domains relative to each other. For this, the Predicted Aligned Error (PAE) is applied. It scores the confidence in the position of each residue relative to every other residue in the structure.
The scores are visualized in a 2D plot with the residues on each axis. The colour indicates the expected error in the position of one residue when the predicted structure and the true structure are aligned on the other residue. A high error means that there is little confidence in the relative position of the two residues. The colour tends from blue towards red as the error increases, and a lower score is preferred.
The strong red bands at the left side and top of the graph show that residues 0 to ~10 have very little positional confidence in relation to all other residues. This tells us that there is very little confidence in the position of this N-terminal segment relative to the rest of the structure. This supports the pLDDT scoring in pointing to a possibly disordered domain at the N-terminus.
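The red bands can also be expressed as numbers by averaging the PAE between the N-terminal segment and the rest of the chain, and comparing this to the average within the rest of the chain. The sketch below assumes the PAE matrix has already been loaded as a square list of lists (the exact file layout differs between tool versions and is not assumed here); `segment_end` is an illustrative parameter for the length of the N-terminal segment.

```python
def mean_pae(pae, rows, cols):
    """Mean PAE over the block pae[i][j] for i in rows and j in cols (0-based)."""
    values = [pae[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


def segment_vs_rest(pae, segment_end):
    """Return (PAE between the first segment_end residues and the rest, PAE within the rest)."""
    n = len(pae)
    head = range(0, segment_end)
    body = range(segment_end, n)
    # PAE is not symmetric, so both off-diagonal blocks are averaged.
    between = (mean_pae(pae, head, body) + mean_pae(pae, body, head)) / 2
    within = mean_pae(pae, body, body)
    return between, within
```

A `between` value far above the `within` value corresponds to the red bands at the left and top of the plot.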
Sequence Coverage
The sequence coverage displays the number of homologous sequences along the length of the query sequence. The varying colours indicate the level of identity to the homologous sequences. It can be a useful measure for predicting conserved domains if the level of identity to homologous proteins is high.
There is a large decrease in homologous sequences at the beginning of the DeHa 5 sequence, compared to the other Sequence Coverages. The rest of the DeHas show good coverage over their entire sequences, enabling more confident speculation.
Verification with FoldSeek
To further verify the confidence of the structures generated by ColabFold, FoldSeek is applied to find tertiary resemblance to other structures. This will tell us whether the functions of the enzymes match the predicted structures.
FoldSeek is a tool developed to align tertiary structures to those found in protein databases. In this verification, uncharacterized proteins from the databases will be ignored, since we are interested in linking the function to the structures predicted.
Querying the dehalogenases all gave results related to haloacid dehalogenases, with very similar structures. The alignment score (TM-score) and RMSD are given in the table below, and quantify the quality of the alignments:
Table 1) An overview of the scoring from FoldSeek. The TM-score measures the quality of the alignment, while RMSD is the root-mean-square deviation. The closer the TM-score is to 1, the better, and the lower the RMSD, the better.
| Dehalogenase | TM-score | RMSD [Å] |
|---|---|---|
| DeHa 1 | 0.88061 | 2.21 |
| DeHa 2 | 0.86421 | 2.7 |
| DeHa 4 | 0.96049 | 1.27 |
| DeHa 5 | 0.73167 | 3.39 |
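To make the two measures concrete, the sketch below computes the RMSD and the TM-score for paired coordinates that have already been superposed. It uses the published TM-score definition with the length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8; alignment tools additionally search for the superposition that maximises the score, which is not done here. The function names are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired, already superposed coordinates (in Å)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def tm_d0(target_length):
    """Length-dependent distance scale d0 of the TM-score."""
    if target_length <= 15:
        raise ValueError("d0 is only defined here for chains longer than 15 residues")
    return 1.24 * (target_length - 15) ** (1 / 3) - 1.8


def tm_score(coords_a, coords_b, target_length):
    """TM-score of one fixed superposition of aligned residue pairs."""
    d0 = tm_d0(target_length)
    terms = (1 / (1 + (math.dist(a, b) / d0) ** 2) for a, b in zip(coords_a, coords_b))
    return sum(terms) / target_length
```

Because each residue pair contributes at most 1 and the sum is divided by the target length, unaligned or poorly fitting regions such as a disordered tail lower the TM-score even when the core of the structure fits well.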
Concluding Remarks
The pLDDT evaluation shows reliable structure predictions for most of the structure of each dehalogenase. For DeHa 2, 4 and 5 the terminals lack some structural value, especially for DeHa 5. The same can be said about DeHa 5's Predicted Aligned Error, where a large uncertainty is found in the relative position of two domains, possibly indicating a disordered region. This region in DeHa 5 is also found to have fewer hits among homologous sequences in the Sequence Coverage.
From the FoldSeek verification it is found that the dehalogenases are similar in tertiary structure to enzymes with functions related to the proposed function of the dehalogenases. DeHa 4 was found to have 45% sequence identity to a fluoroacetate dehalogenase specifically, while the other queried dehalogenases were found to have ~25% sequence identity to haloacid dehalogenases.
DeHa 4 has an almost identical structure to the fluoroacetate dehalogenase, as is shown by the TM-score and RMSD. As can be seen in the alignment with DeHa 5, the long disordered terminal region is present, as suspected from the structure prediction evaluation. This deviation is also reflected in the scoring of DeHa 5, as it has a lower TM-score of 0.73167 and a higher RMSD of 3.39. This disordered region could be problematic in the docking investigations if it is found to be involved with the active site (see Docking subpage).
These results support the validity of the structural predictions of the dehalogenases, as can be seen in both the evaluation from ColabFold and the functional likeness from FoldSeek. This increases the legitimacy of the subsequent investigations using the predicted structures.
Introduction
The degree of improved efficiency can be investigated computationally by using docking to calculate relative binding energies for the structures predicted in the Structure Prediction subpage. By comparing the binding energies before and after error-prone PCR, we can estimate the degree of increased efficiency.
PFOA Binding Prediction
Method
To reach these goals we had to create a pipeline starting with the DeHa protein structures from the protein structure prediction. The figure below shows the steps leading to the docking simulations:
The resulting .pdb file from the structure prediction did not include an annotated active site, which is a requirement in docking simulations. The active site was deduced using NCBI's Conserved Domain Search (CD-search) tool to find conserved domains, together with the SiteMap tool in Maestro, which uses algorithms to predict possible active sites. Docking was also carried out in Maestro, using the Glide tool on receptors generated by the Receptor Generation tool.
These simulations resulted in the docking scores found in table 2, and the intermolecular ligand-residue interactions underlying these scores.
Point of Reference
Docking scores are relative, and the most informative results will come from comparing the optimized and wild-type dehalogenases, as they will most likely have similar structures. As a point of reference, docking was performed on the human heart fatty acid binding protein (7FEK), which has been shown experimentally to bind PFOA [12]. The resulting docking score was -7.880 kcal/mol, with PFOA posed in the binding pocket in a manner relatively similar to that found by X-ray crystallography, as can be seen in figure 5. Given that the pose of PFOA in this protein has been determined experimentally and our estimated pose is similar, this supports the accuracy of the binding pockets found with SiteMap. Hydrogen bonding to the carboxyl group through bound water molecules is an intermolecular interaction that stabilizes PFOA in the binding pocket of human heart fatty acid binding protein, along with other interactions, primarily hydrophobic ones around the fluorine atoms.
Results - Identification of Binding Pockets
Table 2) An assembly of results from the docking of PFOA to the dehalogenases.
| Dehalogenase | CD-search result | CD-search shared residues | Dscore | SiteScore | Docking Score [kcal/mol] |
|---|---|---|---|---|---|
| DeHa 1 | 100% (12/12) | 9 | 1.079 | 0.933 | -4.040 |
| DeHa 2 | 100% (16/16) | 3 | 0.920 | 0.900 | -5.281 |
| DeHa 4 | NA | NA | 0.881 | 0.721 | -3.603 |
| DeHa 5 | 100% (12/12) | 9 | 1.035 | 0.943 | -3.665 |
The table above summarizes the results from the docking of PFOA to the dehalogenases.
The CD-search results showed 100% identity in the residues of an active-site conserved domain found in enzymes of the L-2 haloacid dehalogenase family (CDD: cd02588). There were, however, no decisive results for DeHa 4, due to a lack of specific annotated functional domains. The hits for DeHa 4 do, however, include enzymes involved in hydrolysis as well as haloalkane dehalogenases (accession: PRK03204).
The CD-search shared residues are the number of residues shared between the predicted active site of the enzymes from SiteMap and the residues of the conserved domain from the CD-search. As can be seen in figures 6-7 in the next section, which show the predicted active sites, these residues are positioned close to each other in space, which supports the prediction from SiteMap.
Dscores and SiteScores vary between the dehalogenases, with DeHa 4 scoring the worst. All Dscores are above 0.80, and all SiteScores except that of DeHa 4 (0.721) are above 0.80, which indicates promising binding pockets.
The Docking Scores are estimates of the binding energy. The lower (more negative) the score, the better the predicted binding. The Docking Scores of different enzymes cannot be directly compared, as is explained later in the Limitations and Assumptions section. However, they are still an indicator of the ability of the enzyme to bind a ligand.
L-2 Haloacid Dehalogenase Family
Members of the L-2 haloacid dehalogenase family are involved in hydrolytic dehalogenation and convert L-2-haloacids to D-2-hydroxyacids, releasing the halogen. The substrates and the specific halogens hydrolyzed depend on the structure of the enzyme and the catalytic mechanism, both of which vary greatly within the family.
The SN2 nucleophilic substitution involves the carboxyl group of an aspartic acid or glutamic acid residue [4,5].
SiteScore and Dscore
SiteScore is based on various properties such as the number of residues in the pocket and hydrophobicity. A score greater than 0.80 has been found to distinguish probable binding sites. A score greater than 1 is a strong indicator of a site being likely to bind ligands [6].
Dscore evaluates the druggability (how well a drug would bind the pocket) of a pocket. It is based on the same properties but weighs them differently because some sites might bind a ligand tightly but are not druggable.
This can be illustrated by a pocket with specific charged residues. It will bind a specific ligand tightly, but most drugs might not be able to interact with these residues, and will therefore not bind in the pocket. A Dscore greater than 0.80 has been found to indicate a druggable site. A score greater than 1 is associated with very druggable sites.
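The two thresholds quoted above can be applied to the Dscores and SiteScores from Table 2 as a simple check. This is a minimal sketch; the labels "strong", "probable" and "below threshold" are our own shorthand for the three ranges and are not SiteMap terminology.

```python
def classify(score: float) -> str:
    """Label a SiteScore or Dscore using the two thresholds quoted in the text."""
    if score > 1.0:
        return "strong"
    if score > 0.80:
        return "probable"
    return "below threshold"


# (Dscore, SiteScore) per enzyme, copied from Table 2.
TABLE_2 = {
    "DeHa 1": (1.079, 0.933),
    "DeHa 2": (0.920, 0.900),
    "DeHa 4": (0.881, 0.721),
    "DeHa 5": (1.035, 0.943),
}

labels = {
    enzyme: {"Dscore": classify(dscore), "SiteScore": classify(sitescore)}
    for enzyme, (dscore, sitescore) in TABLE_2.items()
}
```

With these thresholds, the only value that falls below 0.80 is the SiteScore of DeHa 4.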
Results - Posing and Ligand Interactions
The intermolecular interactions formed between PFOA and the dehalogenases, and the orientation of PFOA in the binding pockets, are depicted in the following section. In figures 6-7 below, the binding pockets predicted by SiteMap are colored red, and the conserved domains associated with the proposed function of the dehalogenases are colored green. Residues that are found with both SiteMap and CD-search are colored green as well.
Results - Experimental Optimization
Experimentally, a set of variants was generated with error-prone PCR. These have been assessed, and 3 of them were evaluated to be of interest to investigate computationally with docking, similarly to what was done in the previous section. Read more about the mutations and the sequences on the Results page.
The 3 promising variants are DeHa1 variant 5 (DeHa1_var5), DeHa2 variant 8 (DeHa2_var8) and DeHa5 variant 1 (DeHa5_var1), selected for their level of fluorescence when exposed to PFOA. The variants underwent the same steps of structural prediction, minimization, active site identification, receptor generation and docking as previously.
Table 3 below displays the Dscore, SiteScore and Docking Score for the variants. In figure 9 below, the posing of PFOA in these variants is shown. In figure 10, the ligand interactions are shown.
Table 3) An assembly of results from the docking of PFOA to the dehalogenase variants.
| Variant | Dscore | SiteScore | Docking Score [kcal/mol] |
|---|---|---|---|
| DeHa1_var5 | 0.860 | 1.091 | -3.328 |
| DeHa2_var8 | 1.045 | 1.029 | -4.724 |
| DeHa5_var1 | 1.123 | 1.091 | -4.074 |
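The comparison between each variant and its wild-type can be made explicit by subtracting the Docking Scores. The sketch below copies the values from the wild-type and variant docking tables; the dictionary layout and function name are illustrative.

```python
# Docking Scores in kcal/mol, copied from the wild-type and variant docking tables.
WILD_TYPE = {"DeHa 1": -4.040, "DeHa 2": -5.281, "DeHa 5": -3.665}
VARIANTS = {
    "DeHa1_var5": ("DeHa 1", -3.328),
    "DeHa2_var8": ("DeHa 2", -4.724),
    "DeHa5_var1": ("DeHa 5", -4.074),
}


def score_changes(wild_type, variants):
    """Return variant score minus wild-type score; negative values mean a better score."""
    return {
        name: round(score - wild_type[parent], 3)
        for name, (parent, score) in variants.items()
    }


changes = score_changes(WILD_TYPE, VARIANTS)
improved = [name for name, delta in changes.items() if delta < 0]
```

Only DeHa5_var1 ends up with a negative difference, meaning it is the only variant with a better Docking Score than its wild-type.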
Concluding Remarks
Proposed active sites were detected with confidence:
Conserved domains were detected for the DeHas, except DeHa 4, supporting the proposed catalytic function of the enzymes.
These domains were in close proximity of the active sites detected with SiteMap, making the docking performed with these sites more valid.
The significance of SiteScores and Dscores depends on the system, receptor, and ligand. PFOA is a mainly hydrophobic molecule, making Dscore a better indicator of promising binding pockets, due to the difference in penalization of hydrophobicity [7].
Generally, common intermolecular interactions were seen: hydrogen bonding with the carboxyl group, hydrophobic interactions around the fluorine atoms, and some charged interactions, though these are somewhat unexpected given the hydrophobic nature of PFOA. Usually, it is found that charged residues are located on the surface of the protein unless the natural ligand itself is charged [8].
The conserved domains found to be from L-2 haloacid dehalogenases could help clarify the mechanism and product of the degradation reactions. From this it would seem that the fluorine on the C2 carbon of PFOA would be cleaved off in an SN2 reaction involving a carboxyl group from aspartic acid or glutamic acid. This was shown by Adamu et al. [9], and the resulting molecule would contain a hydroxyl group in place of the fluorine. This could potentially spark a method of continued degradation, by oxidizing it to a carbonyl that resembles the carboxyl group of PFOA and could perhaps “trick” the dehalogenase into continuing its degradation on the C3 carbon of PFOA. This is, however, merely speculation.
DeHa 1 and DeHa 5 supposedly cleave the terminal C-C bond of PFOA, but that cannot be explained by these results and will have to be determined experimentally.
The experimental mutagenesis gave variants which showed greater fluorescence in the laboratory (see Results page). This can, however, not be supported by the results from this docking investigation. The only variant to receive a better Docking Score than its wild-type is DeHa5_var1. As will be explained further in Limitations and Assumptions, the binding process is very dynamic, and the rigid docking performed here can therefore be inaccurate in relation to the experimental findings.
Limitations and Assumptions
Finding candidates for active sites for docking was done with CD-search and SiteMap. CD-search has a limitation in finding domains that span multiple sections of the peptide, that are brought together in the tertiary structure. In the case of DeHa 4, it was impossible to verify the active sites generated by SiteMap with annotated active sites found experimentally, which reduces the confidence of the results.
As mentioned above in the Results section, the Docking Scores of different enzymes cannot easily be compared. The Docking Scores are the result of a range of parameters and properties, which make the scoring very conditional. We cannot therefore conclude anything directly from the Docking Scores alone.
The docking simulations were performed with standard precision, with standard settings, constraints, and conditions. Results closer to the real-life in vitro functioning of the DeHa enzymes could be achieved with greater knowledge of the biology we are working with, which would allow settings such as rotation of side chains, Van der Waals radii scaling and specific interactions with solvent water molecules to be adjusted. However, we are not in possession of such information, and the specificity of our analysis is therefore limited.
Generally, it can be said that these computational approaches greatly simplify the system at hand, since the calculations are based on equations that try to describe the energies of the interaction. In real life, enzymes are in a constant dynamic state, with the binding pocket changing conformation and allowing entry of the ligand. Rigid docking has very limited give in terms of the ligand being allowed to induce structural changes in the enzyme. It follows that even a favorable docking score does not account for the entry of the ligand into the binding pocket, which is greatly influenced by the dynamic structural changes the enzyme is constantly undergoing. This underlines the importance of performing experiments to verify the results.
It should be noted that the predicted binding affinity alone does not inform on the catalytic activity of the enzyme on the ligand. That is because the docking does not take into account the fit induced by the ligand, or how a change in conformation could reveal residues involved in the mechanisms of catalysis. The residue-ligand interactions could, however, still help elucidate the underlying mechanisms of PFOA dehalogenation, if such catalysis is shown experimentally.
Introduction
The reason behind wanting to perform a virtual screening was to find possible candidates for future investigation. The power of high-throughput methods is that thousands of ligands can be analyzed in a relatively short time span.
PFAS Library
To perform a virtual screening, a library containing various PFAS molecules is needed. A list of PFAS molecules compiled by the Swedish Chemicals Agency is available from the EPA on the CompTox Chemicals Dashboard [10]. The library includes 2418 entries, which is reduced to 1632 when entries with missing structures are removed.
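Removing the entries without a structure can be sketched as a simple filter over the exported list. The example below assumes a CSV export in which the structure is stored in a column named `SMILES`; both the column name and the function name are assumptions for illustration and may differ from the actual export.

```python
import csv
import io


def entries_with_structure(csv_text: str, structure_column: str = "SMILES"):
    """Keep only rows whose structure column is present and non-empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if (row.get(structure_column) or "").strip()]
```

Applied to the full library, a filter of this kind is what reduces the number of entries to those that can be prepared as ligands.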
Workflow
To perform the virtual screening, the Virtual Screening Workflow tool in Schrödinger's Maestro was used. Docking was performed with the receptor grids previously generated (see the docking subpage). The ligands were prepared with the LigPrep tool, generating 3D structures and stereoisomers. No filtering was applied during the ligand selection step. All 1632 ligands were docked using a high-throughput docking method, and the top 10% scoring ligands moved on to standard precision docking. Following this, the top 10% scoring ligands were extracted and investigated. Where several stereoisomers of the same ligand had been generated, only the best scoring one was kept.
The ligands were scored on multiple parameters, of which Docking Score and Prime Molecular Mechanics with Generalized Born and Surface Area solvation (MM-GBSA) were of specific interest. The poses of the 3 highest scoring ligands for each dehalogenase were investigated in a similar manner to the previous investigation on the Docking page.
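The two selection steps of the workflow, keeping the top 10% and collapsing stereoisomers, can be sketched as follows. This is not the Maestro implementation: the rounding up of the 10% cut and the `name-number` naming of stereoisomers are assumptions made for the example.

```python
import math


def top_fraction(scores: dict, fraction: float = 0.10) -> dict:
    """Keep the best-scoring fraction of ligands (more negative score is better)."""
    keep = max(1, math.ceil(len(scores) * fraction))  # rounding up is an assumption
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return dict(ranked[:keep])


def best_stereoisomer(scores: dict) -> dict:
    """Collapse titles such as 'ligand-1', 'ligand-2' to the best-scoring isomer per ligand."""
    best = {}
    for title, score in scores.items():
        ligand = title.rsplit("-", 1)[0]  # assumed 'name-number' isomer naming
        if ligand not in best or score < best[ligand][1]:
            best[ligand] = (title, score)
    return {title: score for title, score in best.values()}
```

Applying `top_fraction` after each docking stage and `best_stereoisomer` at the end mirrors the order of the steps described above.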
Prime MM-GBSA vs Docking Score
Prime MM-GBSA is a method that calculates binding energies while considering a set of parameters, such as ligand strain, when docking a set of ligands to a single receptor. It is based on the Variable-dielectric Surface Generalized Born (VSGB) energy model, which includes physics-based corrections for intermolecular interactions, to obtain a closer-to-reality estimate of the binding energies. Prime MM-GBSA can be used as a means of post-processing the ligand poses [11,12].
Prime MM-GBSA can be used in induced fit docking, where extensive alterations in the tertiary structure of the binding pocket occur. Induced fit docking is, however, a more complicated endeavor than rigid docking [13].
The Docking Score is generated by the rigid docking of the Glide tool. Docking with Glide is a simpler procedure and was therefore the primary method in this project (see the Docking subpage) [14].
Results - Selection of Best Ligands
The number of extracted ligands after curation and clean-up, and the limits used to extract the best three, are given in the following table:
Table 3) An overview of the virtual screening output. The number of ligands found to dock well, and the limits set for each parameter for determining the 3 best ligands are included.
| Dehalogenase | Virtual-screening output | Docking Score limit | Prime MM-GBSA limit |
|---|---|---|---|
| DeHa 1 | 11 ligands | 0.50 | 0.30 |
| DeHa 2 | 20 ligands | 0.45 | 0.2 |
| DeHa 4 | 15 ligands | 0.40 | 0.25 |
| DeHa 5 | 17 ligands | 0.40 | 0.30 |
These results are depicted in the scatterplots below (figure 11), with the top 3 ligands depicted with orange dots. The limits were set to sort out the rest as illustrated with the dotted lines.
The following tables summarize the energies of these 3 ligands for each dehalogenase:
Table 4) DeHa 1 results of the 3 ligands evaluated to dock the best.
The following plots show the docking results and ligand interaction diagrams for the best evaluated ligand for each DeHa (panels: DeHa 1, DeHa 2, DeHa 4 and DeHa 5).
Concluding Remarks
It was found that all the dehalogenases bound their 3 highest scoring ligands with better docking scores than that of PFOA. Comparing the ligand-residue interaction diagrams shows that many of the same residues from the binding pockets are involved. However, with some of the ligands from the virtual screening, additional strong interactions are present, such as hydrogen bonding and salt bridge interactions in the case of DeHa 1.
It would be of interest for future work to investigate the catalytic activity of the enzymes on these compounds, as the docking simulations alone cannot determine whether the dehalogenases are able to degrade these PFAS molecules.
Limitations and Assumptions
When determining the 3 best ligands for each dehalogenase, two parameters were considered. The weighting of these parameters is, however, difficult to determine. Arbitrary decisions had to be made in determining the limits in relation to the parameters. Generally, an attempt was made to minimize the limits in a sensible manner, such that 3 ligands were isolated. The two binding energy evaluations use different algorithms to calculate binding energies. Docking Score is used to compare the high-throughput docking results with the results from the Docking subpage, as these were evaluated on Docking Score as well.
To keep the computational demands realistic, only high-throughput docking and standard precision docking were applied. Better posing of the ligands might have come from using extra precision docking, but the computational time would have been significantly longer, which is not desirable for the scale of the project.
Introduction
For our project, it is of interest to investigate in silico how the DeHas could be improved in their ability to degrade PFOA, by inducing mutations. Our experimental approach in the laboratory utilizes random mutations in the sequence of the dehalogenases, but the approach in this section will be directed and supervised, meaning that a knowledge of biochemistry will be applied to induce specific rational substitutions.
The method for this investigation will be based on the methods of the docking previously performed (see the docking subpage). DeHa 1 was chosen to demonstrate the method. The other DeHas will not be discussed here as the method is a lengthy process, and we were only able to explore one enzyme.
After looking at the current interaction with PFOA in the dehalogenase’s binding pockets, a residue of choice will be substituted in relation to the wanted effect and specified restrictions. This choice will be based on which stronger interactions could be achieved by substituting a residue, and substitution restrictions defined by the Point Accepted Mutation (PAM) matrix.
Multiple candidates will be generated. Having substituted this amino acid, a new structure will be generated with ColabFold, minimized, aligned with the wild-type, and docked with PFOA, with the same settings as previously. Then a selection process will occur, with the top performing substitutions receiving an additional substitution, followed by structure prediction, minimization and so on.
Results
Finding candidates for substitution involves looking at the ligand interaction diagram, as well as the spatial orientation of PFOA in the binding pocket. This is done while considering the residues proposed to be involved in the catalysis mechanism, which was investigated in the Docking section.
Considering this, and limiting the steric impact of the substitution by following the Point Accepted Mutation (PAM) matrix, the following initial candidates were chosen to strengthen the interaction with the carboxyl group of PFOA: D50E (meaning that residue number 50 is substituted from aspartic acid to glutamic acid), G208D, G208E, G208N, S210D, S210E, G211D, G211E and G211N.
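The substitution notation can be turned into mutant sequences for structure prediction with a small helper. The sketch below parses codes such as D50E and applies them to a sequence, checking that the wild-type residue is actually present at the given position; the function names are illustrative, and the short sequence in the usage note is a made-up example rather than the DeHa 1 sequence.

```python
import re

MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(code: str):
    """Split a code such as 'D50E' into (wild-type residue, 1-based position, new residue)."""
    match = MUTATION_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not a point-substitution code: {code!r}")
    wild_type, position, new = match.groups()
    return wild_type, int(position), new


def apply_mutations(sequence: str, codes) -> str:
    """Apply point substitutions, checking that the wild-type residue matches."""
    residues = list(sequence)
    for code in codes:
        wild_type, position, new = parse_mutation(code)
        if residues[position - 1] != wild_type:
            raise ValueError(f"{code}: found {residues[position - 1]} at position {position}")
        residues[position - 1] = new
    return "".join(residues)
```

For example, `apply_mutations("MADG", ["D3E", "G4N"])` gives `"MAEN"`, and a code whose wild-type residue does not match the sequence raises an error instead of silently producing a wrong mutant.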
First cycle:
Table 8) An overview of the first cycle of substitutions in DeHa 1. The alignment score and RMSD from the alignment with the wild-type structure, and the Docking Score with PFOA, are included.
| Substitution | Alignment Score | RMSD [Å] | Docking Score [kcal/mol] |
|---|---|---|---|
| G211D | 0.002 | 0.226 | -5.552 |
| G211E | 0.002 | 0.227 | -4.461 |
| G211N | 0.002 | 0.196 | -5.689 |
| S210D | 0.000 | 0.109 | -5.078 |
| S210E | 0.002 | 0.203 | -2.885 |
| G208D | 0.002 | 0.236 | -3.211 |
| G208E | 0.004 | 0.333 | -5.245 |
| G208N | 0.002 | 0.243 | -3.674 |
| D50E | 0.000 | 0.109 | -2.237 |
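Ranking the first-cycle substitutions by Docking Score gives a starting point for the selection, before the poses are inspected. The sketch below copies the scores from the table above and compares them to the wild-type DeHa 1 score from the docking results; the variable and function names are illustrative.

```python
# Docking Scores in kcal/mol for the first cycle, copied from the table above.
FIRST_CYCLE = {
    "G211D": -5.552, "G211E": -4.461, "G211N": -5.689,
    "S210D": -5.078, "S210E": -2.885, "G208D": -3.211,
    "G208E": -5.245, "G208N": -3.674, "D50E": -2.237,
}
WILD_TYPE_SCORE = -4.040  # DeHa 1 with PFOA, from the docking results


def rank(scores: dict) -> list:
    """Sort substitutions from best (most negative) to worst Docking Score."""
    return sorted(scores, key=scores.get)


ranking = rank(FIRST_CYCLE)
better_than_wild_type = [m for m in ranking if FIRST_CYCLE[m] < WILD_TYPE_SCORE]
```

The ranking alone is not sufficient for the selection, since the orientation of PFOA in the binding pocket also has to be compared with the wild-type.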
Though the S210D substitution yielded a good docking score of -5.078 kcal/mol, the pose of PFOA in this mutated binding pocket shows a strong change in the orientation of the molecule. This could possibly affect the DeHa’s ability to degrade the molecule, though it cannot be verified at this stage.
G211N will be built upon in the second iteration, due to the high similarity of the spatial orientation of PFOA in the wild-type and this mutant, which can be seen in figure 16, and its docking score of -5.689 kcal/mol, the best of the first cycle.
In this last iteration, D50E is combined with the G211N mutation even though its docking score in the first cycle was poor. It is attempted again because of the position of the aspartic acid, which is close to being able to form a hydrogen bond with the carboxyl group of PFOA but falls short due to the length of its side chain. Glutamic acid is very similar but has a side chain one carbon longer and could therefore perhaps interact with the carboxyl group.
Table 9) An overview of the second cycle of substitutions in DeHa 1. The alignment score and RMSD from the alignment with the wild-type structure, and the Docking Score with PFOA, are included.
| Substitutions | Alignment Score | RMSD [Å] | Docking Score [kcal/mol] |
|---|---|---|---|
| G211N + D50E | 0.005 | 0.346 | -3.598 |
| G211N + G208E | 0.006 | 0.376 | -2.750 |
| G211N + S210D | 0.002 | 0.215 | -5.197 |
In both of the worst scoring sets of mutations, PFOA fails to enter the binding pocket. With the highest scoring set of mutations, PFOA does make it inside the binding pocket but is not found to interact with the enzyme in the same way as with the wild-type.
Concluding Remarks
The poses found in this investigation are not definitive, as molecular interactions are very dynamic, and one of the other mutants with a different orientation of PFOA might be able to bind PFOA in the same way as the wild-type due to the ever-changing structure of the binding pocket. It is therefore worth testing a range of mutants and not treating the docking scores as definitive evaluations. A general way to validate the binding is to check whether PFOA can enter the binding pocket, even though its orientation might be altered by the mutation. Mutations that result in PFOA preferably binding outside the binding pocket are deemed the most destructive, as there will likely be no catalysis.
Of the 3 final candidates, G211N + G208E does not allow PFOA to enter the binding pocket during the docking prediction. This combination of mutations likely alters the tertiary structure of the enzyme in the region of the binding pocket, not allowing PFOA to enter. This is not a surprise given the relatively large alignment score and RMSD. G211N + D50E suffers the same fate, as it is also predicted not to allow entry of PFOA into the binding pocket.
G211N + S210D scores the highest in the second iteration and would of the 3 candidates be the one that should be investigated experimentally. Additionally, the G211N mutant should be tested in the laboratory, as it appears to bind PFOA in a similar way to that of the wild-type, but with greater strength.
Limitations and Assumptions
When considering possible candidates, it is important to consider the 3D orientation of PFOA in the binding pocket as predicted by Glide. The ligand interaction diagram alone is not sufficient, as it is a 2D projection of posing.
In silico enzyme optimization can be done in many ways. Unsupervised methods, where many nucleotide substitutions occur, are very computationally heavy and time-consuming. Given the biochemical knowledge in the team, we opted for a supervised, specific methodology, where very few but specific residues are substituted. An attempt is made to take into account the possible downsides of a substitution by considering PAM, to ensure that the substitutions are not unlikely ones. Unlikely substitutions could have a greater impact on the tertiary structure of the enzyme, which cannot be directly considered when substituting.
This method has shortcomings compared to unsupervised methods, as it is very narrow and does not investigate many possibilities. It is impossible to know whether some substitutions might enhance the binding and catalytic activity through changes in the tertiary structure. These substitutions could, however, be found with unsupervised, highly computational methods. The glycine residues substituted in the combination G211N + G208E are found in proximity of the carboxyl group of PFOA but cannot easily be substituted for hydrogen bond forming residues, due to the large change in spatial occupation. This could, however, be exactly the change that improves the binding, though it is not likely and is not considered with this method. To investigate whether this could be the case, experiments would have to be carried out. Perhaps other computational methods involving molecular dynamics could give deeper insights.
A downside of the unsupervised method could however be the possibility that the stronger binding affinity comes with a loss of catalytic activity, because this can be highly dependent on very specific residues being present in the enzyme.
With this method, the binding affinity of PFOA alone is considered, while experimental optimization is unsupervised, and the increased binding affinity is the result of changes in various elements. Given that the enzymes already can bind and to some extent degrade PFOA, it is fair to expect that there will not be major structural changes in the dehalogenases from the experimental mutagenesis. Therefore, this computational approach is more similar to the experimental approach.
It should however be noted that this investigation is a simplification of a massively complex area of in silico enzyme engineering, which takes many cycles and iterations in order to achieve results. This is a peek into the possibility of targeted mutagenesis for improving the function of the dehalogenase.
Implementation
Mirdita, M., Schütze, K., Moriwaki, Y., Heo, L., Ovchinnikov, S., & Steinegger, M. (2022). ColabFold: making protein folding accessible to all. Nature Methods, 19(6), 679-682. https://doi.org/10.1038/s41592-022-01488-1
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., . . . Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589. https://doi.org/10.1038/s41586-021-03819-2
Mariani, V., Biasini, M., Barbato, A., & Schwede, T. (2013). lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics, 29(21), 2722-2728. https://doi.org/10.1093/bioinformatics/btt473
Lu, S., Wang, J., Chitsaz, F., Derbyshire, M. K., Geer, R. C., Gonzales, N. R., Gwadz, M., Hurwitz, D. I., Marchler, G. H., Song, J. S., Thanki, N., Yamashita, R. A., Yang, M., Zhang, D., Zheng, C., Lanczycki, C. J., & Marchler-Bauer, A. (2020). CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res, 48(D1), D265-D268. https://doi.org/10.1093/nar/gkz991
Wang, Y., Xiang, Q., Zhou, Q., Xu, J., & Pei, D. (2021). Mini Review: Advances in 2-Haloacid Dehalogenases [Review]. Frontiers in Microbiology, 12. https://doi.org/10.3389/fmicb.2021.758886
Ghattas, M. A., Raslan, N., Sadeq, A., Al Sorkhy, M., & Atatreh, N. (2016). Druggability analysis and classification of protein tyrosine phosphatase active sites. Drug Des Devel Ther, 10, 3197-3209. https://doi.org/10.2147/dddt.S111443
Ghattas, M. A., Raslan, N., Sadeq, A., Al Sorkhy, M., & Atatreh, N. (2016). Druggability analysis and classification of protein tyrosine phosphatase active sites. Drug Des Devel Ther, 10, 3197-3209. https://doi.org/10.2147/dddt.S111443
Isom, D. G., Castañeda, C. A., Cannon, B. R., Velu, P. D., & García-Moreno E., B. (2010). Charges in the hydrophobic interior of proteins. Proceedings of the National Academy of Sciences, 107(37), 16096-16100. https://doi.org/10.1073/pnas.1004213107
Adamu, A., Wahab, R. A., Shamsir, M. S., Aliyu, F., & Huyop, F. (2017). Deciphering the catalytic amino acid residues of l-2-haloacid dehalogenase (DehL) from Rhizobium sp. RC1: An in silico analysis. Comput Biol Chem, 70, 125-132. https://doi.org/10.1016/j.compbiolchem.2017.08.007
U.S. Environmental Protection Agency, “The CompTox Chemistry Dashboard: a community data resource for environmental chemistry” https://comptox.epa.gov/dashboard/.
Halgren, T. A.; Murphy, R. B.; Friesner, R. A.; Beard, H. S.; Frye, L. L.; Pollard, W. T.; Banks, J. L., "Glide: A New Approach for Rapid, Accurate Docking and Scoring. 2. Enrichment Factors in Database Screening," J. Med. Chem., 2004, 47, 1750–1759
Li, J., Abel, R., Zhu, K., Cao, Y., Zhao, S., & Friesner, R. A. (2011). The VSGB 2.0 model: a next generation energy model for high resolution protein structure modeling. Proteins, 79(10), 2794-2812. https://doi.org/10.1002/prot.23106
Jacobson, M. P.; Pincus, D. L.; Rapp, C. S.; Day, T. J. F.; Honig, B.; Shaw, D. E.; Friesner, R. A., "A Hierarchical Approach to All-Atom Protein Loop Prediction," Proteins: Structure, Function and Bioinformatics, 2004, 55, 351-367
Friesner, R. A.; Murphy, R. B.; Repasky, M. P.; Frye, L. L.; Greenwood, J. R.; Halgren,T. A.; Sanschagrin, P. C.; Mainz, D. T., "Extra Precision Glide: Docking and Scoring Incorporating a Model of Hydrophobic Enclosure for Protein-Ligand Complexes," J. Med. Chem., 2006, 49, 6177–6196