Introduction

Because of the time constraints of iGEM and the time it takes to obtain results in the laboratory, we used computational tools both to take our project further and to support the findings from the laboratory. The aim of our project was to improve the degradation of PFOA by DeHa catalysis.

Fig 0) The method for predicting protein structure. The open-source tool ColabFold is used, which applies the AlphaFold2 model and MMseqs2 to predict structures. These are then verified by AlphaFold and with FoldSeek.

To perform computational analyses on the dehalogenases, protein structures are needed. With the intention of performing docking simulations, we used cutting-edge machine learning tools to predict the structures of the four dehalogenases.

Method

Given the DNA sequences from the USAFA21 team, protein structures were predicted using ColabFold. The structures were corrected and minimized with the Protein Preparation Wizard in Schrödinger’s Maestro for further analysis. To verify the integrity of the predictions, the confidence measures generated by AlphaFold were considered, and the structural alignment tool FoldSeek was used for additional verification.
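Since ColabFold takes protein sequences as input, the DNA sequences first have to be translated. The block below is a minimal sketch of that step, assuming the standard genetic code, reading frame 1 and a coding sequence that starts at the first base. The function names and the toy sequence are illustrative and are not the team's actual script or sequences.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence (frame 1) up to the first stop codon."""
    dna = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X = unknown codon
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def to_fasta(name: str, protein: str, width: int = 60) -> str:
    """Format a protein sequence as a FASTA record with wrapped lines."""
    lines = [protein[i:i + width] for i in range(0, len(protein), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


# Hypothetical toy sequence, not one of the USAFA21 dehalogenases.
toy_dna = "ATGGCCGATTAA"
print(to_fasta("toy_DeHa", translate(toy_dna)), end="")
```

The resulting FASTA text can then be used as the input for a structure prediction.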

ColabFold

ColabFold is open-source software that uses AlphaFold2 and MMseqs2 to perform rapid structure predictions¹.

AlphaFold uses a multiple sequence alignment of similar sequences, together with homologous structures from databases, to accurately predict the structure from the primary sequence².

To speed up the search for similar sequences and homologous structures, ColabFold uses the MMseqs2 software in place of the search tools of the native AlphaFold pipeline. MMseqs2 enables a sensitive search at a much higher speed: its developers report profile searches with the sensitivity of PSI-BLAST at over 400 times the speed.

Evaluation

To evaluate its predictions, the model generates three quality measures: (1) Predicted Local Distance Difference Test, (2) Predicted Aligned Error and (3) Sequence Coverage.

Predicted Local Distance Difference Test (pLDDT)

The pLDDT is a per-residue measure of confidence in the local distances between residues in the predicted structure³. A high score supports the integrity of local domains in the structure, and therefore increases the confidence in possible functionally important domains.

The score ranges from 0 to 100, with values above 90 indicating very high confidence.

**Fig 1)** The pLDDT results for DeHa 1.

The pLDDT score drops below 50 towards the terminus of the structure. This removes the structural value of the prediction in this region, but may indicate a disordered domain.

Like DeHa 2, the terminals of DeHa 4 tend towards scores of <50, removing structural value from these regions.

Like DeHa 2 and DeHa 4, the first ~20 residues and the terminus of DeHa 5 lack structural value because their scores are <50. This is especially visible in the structure of DeHa 5, with its possibly disordered tail region.
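The low-confidence stretches described above can also be located programmatically. AlphaFold-style PDB files store the pLDDT of each residue in the B-factor column, so a minimal sketch, assuming a single-chain PDB file and the cutoff of 50 used in the text, could look as follows. The helper `toy_ca_line` and the scores are made up for illustration.

```python
def plddt_per_residue(pdb_text):
    """Read per-residue pLDDT from the B-factor column of C-alpha atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def low_confidence_regions(scores, cutoff=50.0):
    """Return (first, last) residue numbers of consecutive runs below cutoff."""
    regions, start, prev = [], None, None
    for res in sorted(scores):
        if scores[res] < cutoff:
            if start is not None and res != prev + 1:
                regions.append((start, prev))  # numbering gap closes the run
                start = None
            if start is None:
                start = res
            prev = res
        elif start is not None:
            regions.append((start, prev))
            start = None
    if start is not None:
        regions.append((start, prev))
    return regions


def toy_ca_line(res_num, plddt):
    """Build one fixed-width PDB ATOM record (hypothetical glycine C-alpha)."""
    return (f"ATOM  {res_num:5d}  CA  GLY A{res_num:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


# Made-up scores: a low-confidence start and a low-confidence last residue.
toy_scores = [35.0, 42.0, 48.0, 81.0, 93.0, 95.0, 90.0, 44.0]
toy_pdb = "\n".join(toy_ca_line(i + 1, s) for i, s in enumerate(toy_scores))
print(low_confidence_regions(plddt_per_residue(toy_pdb)))
```

Reading only the C-alpha atoms is sufficient here because AlphaFold writes the same pLDDT value to every atom of a residue.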

Predicted Aligned Error

pLDDT does not verify the position of domains relative to each other. For this, the Predicted Aligned Error (PAE) is used. It scores the confidence in the position of each residue relative to every other residue in the structure.

The scores are visualized in a 2D graph with the residues on each axis. The colour indicates the expected position error of one residue when the predicted structure and the true structure are aligned on the other residue. A high error means that the relative position of the two residues is uncertain, not that the residues are far apart. Blue indicates a low error and red a high error. A lower score is preferred.

**Fig 2)** The Predicted Aligned Error plots for the dehalogenases. Starting from the top, the plots are from DeHa 1, DeHa 2, DeHa 4 and DeHa 5.

The strong red bands at the left side and top of the graph show that residues 0 to ~10 have very little positional confidence in relation to all other residues. This tells us that there is very little confidence in the position of this region relative to the rest of the structure. This supports the pLDDT scoring, which points to a possibly disordered domain at the N-terminus.
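The red bands can be quantified by averaging the PAE over the block of the matrix that pairs the N-terminal residues with the rest of the protein. The following is a minimal sketch on a made-up 6x6 matrix. The function names and values are illustrative, and a real PAE matrix would be read from the prediction output.

```python
def mean_block_pae(pae, rows, cols):
    """Mean PAE over the block defined by the given row and column indices."""
    values = [pae[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


def interdomain_pae(pae, domain_a, domain_b):
    """Average both off-diagonal blocks, since a PAE matrix is not symmetric."""
    return 0.5 * (mean_block_pae(pae, domain_a, domain_b)
                  + mean_block_pae(pae, domain_b, domain_a))


# Made-up example: residues 0-1 form an uncertain tail, residues 2-5 a core.
tail, core = [0, 1], [2, 3, 4, 5]
toy_pae = [
    [0.0 if i == j else (25.0 if (i in tail) != (j in tail) else 2.0)
     for j in range(6)]
    for i in range(6)
]

print("tail vs core:", interdomain_pae(toy_pae, tail, core))
print("within core: ", mean_block_pae(toy_pae, core, core))
```

A large gap between the two numbers corresponds to the red bands in the plot: the core is internally well defined, while the placement of the tail relative to it is not.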

Sequence Coverage

The sequence coverage displays the number of homologous sequences along the length of the sequence. The varying colours indicate the level of identity to the homologous sequences. It can be a useful measure for predicting conserved domains if the level of identity to homologous proteins is high.

There is a large decrease in homologous sequences at the beginning of the DeHa 5 sequence, compared to the other Sequence Coverages. The rest of the DeHas show good coverage over their entire sequences, enabling more confident speculation.

Verification with FoldSeek

To further verify the confidence of the structures generated by ColabFold, FoldSeek is applied to find tertiary resemblance to other structures. This will tell us whether the functions of the enzymes match the predicted structures.

FoldSeek is a tool developed to align tertiary structures to those found in protein databases. In this verification, uncharacterized proteins from the databases will be ignored, since we are interested in linking the function to the structures predicted.

Querying the dehalogenases all gave results related to haloacid dehalogenases, with very similar structures. The alignment score (TM-score) and RMSD are given in the table below, and quantify the quality of the alignments:

**Table 1)** An overview of the scoring from FoldSeek. TM-score is a scoring of the alignment, while RMSD is the root-mean-square deviation. The closer TM-score is to 1, the better, and the lower the RMSD, the better.
	TM-Score	RMSD
DeHa 1	0.88061	2.21
DeHa 2	0.86421	2.7
DeHa 4	0.96049	1.27
DeHa 5	0.73167	3.39
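To make the two measures in Table 1 concrete, the sketch below evaluates the standard definitions of RMSD and TM-score from the distances between aligned residue pairs after superposition. It does not reproduce the alignment search that FoldSeek performs, and the distances are made up. The length-dependent scale d0 follows the usual TM-score definition, which is only meaningful for chains clearly longer than 15 residues.

```python
import math


def rmsd(distances):
    """Root-mean-square deviation over aligned residue pairs (in angstrom)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score(distances, target_length):
    """TM-score for aligned pair distances, normalized by the target length."""
    if target_length <= 15:
        raise ValueError("the d0 formula used here assumes target_length > 15")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Made-up example: 100-residue target, 90 aligned pairs at 1.5 angstrom each.
toy_distances = [1.5] * 90
print(f"RMSD     = {rmsd(toy_distances):.2f}")
print(f"TM-score = {tm_score(toy_distances, 100):.3f}")
```

Unaligned residues contribute nothing to the sum but still count in the normalization, which is why a long unaligned tail lowers the TM-score.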

**Fig 4)** The alignment between DeHa1 and a structurally similar haloacid dehalogenase.

Concluding Remarks

The predictions show reliable structural evaluation from the pLDDT for most of the structure of each of the dehalogenases. For DeHa 2, 4 and 5 the terminals lack some structural value, especially for DeHa 5. The same can be said about DeHa 5's Predicted Aligned Error score, where a large uncertainty is found in the relative position of two domains, possibly indicating a disordered region. This region in DeHa 5 is also found to have fewer hits in homologous sequences in the Sequence Coverage.

From the FoldSeek verification it is found that the dehalogenases are similar in tertiary structure to enzymes with functions related to the proposed function of the dehalogenases. DeHa 4 was found to have 45% sequence identity to a fluoroacetate dehalogenase specifically, while the other queried dehalogenases were found to have ~25% sequence identity to haloacid dehalogenases.

DeHa 4 has an almost identical structure to the fluoroacetate dehalogenase, as is shown by the TM-score and RMSD. As can be seen in the alignment with DeHa 5, the long, disordered terminal region is present, as suspected from the structure prediction evaluation. This deviation is also reflected in the scoring of DeHa 5, which has a lower TM-score of 0.73167 and a higher RMSD of 3.39. This disordered region could be problematic in the docking investigations if it is found to be involved with the active site (see Docking subpage).

These results show high validity in the structural prediction of the dehalogenases, as can be seen in both the evaluation from ColabFold and in the functional likeness from FoldSeek. This will increase the legitimacy of the subsequent investigation using the predicted structures.

Introduction

The degree of improved efficiency can be investigated computationally by docking PFOA to the structures predicted in the Structure Prediction subpage and calculating relative binding energies. By comparing the binding energies before and after Error-Prone PCR, we can acquire an estimate of the degree of increased efficiency.

Fig 4) The method used in this docking investigation. A structure prediction is generated with the AlphaFold model. Active sites are then generated and verified, and the docking is performed.

PFOA Binding Prediction

Method

To reach these goals we had to create a pipeline starting with the DeHa protein structures from the protein structure prediction. The figure below shows the steps leading to the docking simulations:

The resulting .pdb file from the structure prediction did not include an annotated active site, which is a requirement in docking simulations. The active site was deduced using NCBI's Conserved Domain Search (CD-search) tool to find conserved domains, as well as the SiteMap tool in Maestro, which uses algorithms to predict possible active sites. Docking was carried out in Maestro as well, using the Glide tool on receptors generated by the Receptor Generation tool.

These simulations resulted in the docking scores found in table 2, and the intermolecular ligand-residue interactions underlying these scores.

Point of Reference

Docking scores are relative, and the most meaningful results will come from comparing the optimized and wild-type dehalogenases, as these will most likely have similar structures. As a point of reference, docking was performed on the human heart fatty acid binding protein (7FEK), which has been shown experimentally to bind PFOA12. The resulting docking score was -7.880 kcal/mol, with PFOA posing in the binding pocket in a way relatively similar to that found by X-ray crystallography, as can be seen in figure 5. Since the experimentally determined pose and our predicted pose are similar, this supports the reliability of the binding pockets found with SiteMap. Hydrogen bonding through bound water molecules on the carboxyl group is an intermolecular interaction that stabilizes PFOA in the binding pocket of human heart fatty acid binding protein, along with other interactions, primarily hydrophobic ones around the fluorine atoms.

**Fig 5)** The posing and ligand interactions of human heart fatty acid binding protein with PFOA.

Results - Identification of Binding Pockets

**Table 2)** An assembly of results from the docking of PFOA to the dehalogenases.
	CD-search result	CD-search shared residues	Dscore	SiteScore	Docking Score [kcal/mol]
DeHa 1	100% (12/12)	9	1.079	0.933	-4.040
DeHa 2	100% (16/16)	3	0.920	0.900	-5.281
DeHa 4	NA	NA	0.881	0.721	-3.603
DeHa 5	100% (12/12)	9	1.035	0.943	-3.665

The table above summarizes the results from the docking of PFOA to the dehalogenases.

The CD-search results showed 100% identity in residues from an active site conserved domain in enzymes from the L-haloacid dehalogenase family (CDD: cd02588). There were however no decisive results for DeHa 4, due to a lack of specific annotated functional domains. The hits do, however, include enzymes involved in hydrolysis as well as haloalkane dehalogenases (accession: PRK03204).

The CD-search shared residues are the number of residues shared between the predicted active site of the enzymes from SiteMap, and the residues of the conserved domain from the CD-search. As can be seen in figures 6-7, showing the predicted active sites in the next section, the residues from the two methods are positioned close to each other in space, which supports the prediction from SiteMap.

Dscores and SiteScores vary between the dehalogenases, with DeHa 4 scoring the worst. All Dscores are above 0.80, indicating promising binding pockets, although the SiteScore of DeHa 4 (0.721) falls below the 0.80 threshold.

The Docking Scores are the predicted energies of the binding. The lower the binding energy, the better the binding. The docking scores cannot be directly compared, as is explained later in the Limitations and Assumptions section. However, it is still an indicator of the ability of the enzyme to bind a ligand.

L-2 Haloacid Dehalogenase Family

Members of the L-2 haloacid dehalogenase family are involved in hydrolytic dehalogenation and convert L-2-haloacids to D-2-hydroxyacids, releasing the halogen. The substrates and specific halogens hydrolyzed depend on the structure of the enzyme and the catalytic mechanism, in which there is great diversity in the family.

The SN2 nucleophilic substitution involves the carboxyl group of an aspartic acid or glutamic acid residue^4,5.

SiteScore and Dscore

SiteScore is based on various properties such as the number of residues in the pocket and hydrophobicity. A score greater than 0.80 has been found to distinguish probable binding sites. A score greater than 1 is a strong indicator of a site being likely to bind ligands⁶.

Dscore evaluates the druggability (how well a drug would bind the pocket) of a pocket. It is based on the same properties but weighs them differently because some sites might bind a ligand tightly but are not druggable.

This can be illustrated by a pocket with specific charged residues. These will bind a specific ligand tightly, but most drugs might not be able to interact with these residues, and would therefore not bind in the pocket. A score greater than 0.80 has been found to indicate a druggable site. A score greater than 1 is associated with very druggable sites.
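The thresholds described above can be applied directly to the values in Table 2. The following is a small sketch that labels each score using the 0.80 and 1 cutoffs from the text. The label names are our own shorthand and not SiteMap terminology.

```python
# (Dscore, SiteScore) per dehalogenase, copied from Table 2.
SITE_RESULTS = {
    "DeHa 1": (1.079, 0.933),
    "DeHa 2": (0.920, 0.900),
    "DeHa 4": (0.881, 0.721),
    "DeHa 5": (1.035, 0.943),
}


def classify(score):
    """Label a SiteScore or Dscore using the 0.80 and 1 cutoffs."""
    if score > 1.0:
        return "strong"
    if score > 0.80:
        return "probable"
    return "below threshold"


for name, (dscore, sitescore) in SITE_RESULTS.items():
    print(f"{name}: Dscore {dscore:.3f} ({classify(dscore)}), "
          f"SiteScore {sitescore:.3f} ({classify(sitescore)})")
```

With these cutoffs, every Dscore passes 0.80, while the SiteScore of DeHa 4 is the only value that falls below it.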

Results - Posing and Ligand Interactions

The intermolecular interactions formed between PFOA and the dehalogenases, and the orientation of PFOA in the binding pockets, are depicted in the following section. In the figures 6-7 below, the SiteMap predicted binding pockets will be colored red, and the conserved domains associated with the proposed function of the dehalogenases will be colored green. Residues that are found with both SiteMap and CD-search will be colored green as well.

**Fig 6)** DeHa 1 with the most probable binding pocket illustrated. The red residues are a result of the SiteMap prediction, and the green residues are a result of the conserved domain analysis.

**Fig 7)** The posing of PFOA in DeHa 1.

**Fig 8)** The ligand-receptor interactions on an intermolecular level in DeHa 1. The primary interactions are polar, electrostatic and hydrophobic.

Results - Experimental Optimization

Experimentally, a set of variants was generated with Error-Prone PCR. These have been assessed, and 3 of them were judged to be of interest for computational investigation with docking, similarly to what was done in the previous section. Read more about the mutations and the sequences on the Results page.

The 3 promising variants are DeHa1 variant number 5 (DeHa1_var5), DeHa2 variant 8 (DeHa2_var8) and DeHa5 variant 1 (DeHa5_var1), chosen for their level of fluorescence when exposed to PFOA. The variants underwent the same steps of structural prediction, minimization, active site identification, receptor generation and docking as previously.

Table 3 below displays the Dscore, SiteScore and Docking Score for the variants. In figure 9 below, the posing of PFOA in DeHa1_var5 is shown. In figure 10, the ligand interactions are shown.

**Fig 9)** The posing of PFOA in DeHa1_var5.

**Fig 10)** The ligand-receptor interactions on an intermolecular level in DeHa1_var5.

**Table 3)** An assembly of results from the docking of PFOA to the dehalogenase variants.
	Dscore	SiteScore	Docking Score [kcal/mol]
DeHa1_var5	0.860	1.091	-3.328
DeHa2_var8	1.045	1.029	-4.724
DeHa5_var1	1.123	1.091	-4.074

Concluding Remarks

Proposed active sites were detected with confidence:

Conserved domains were detected for the DeHas, except DeHa 4, verifying the catalytic function of the enzymes.
These domains were in close proximity of the active sites detected with SiteMap, making the docking performed with these sites more valid.

The significance of SiteScores and Dscores depends on the system, receptor, and ligand. PFOA is a mainly hydrophobic molecule, making Dscore a better indicator of promising binding pockets, due to the difference in penalization of hydrophobicity⁷.

Generally, common intermolecular interactions were seen: hydrogen bonding with the carboxyl group, hydrophobic interactions around the fluorine atoms, and some charged interactions, though these are somewhat unexpected given the hydrophobic nature of PFOA. Usually, it is found that charged residues are located on the surface of the protein unless the natural ligand itself is charged⁸.

The conserved domains found to be from L-2 haloacid dehalogenases could help clarify the mechanism and product of the degradation reactions. From this it would seem that the fluorine on the 2C carbon of PFOA would be cleaved off in an SN2 reaction involving a carboxyl group from aspartic acid or glutamic acid. This was shown by Adamu et al.⁹, and the resulting molecule would contain a hydroxyl group in place of the fluorine. This could potentially enable continued degradation, by oxidizing the hydroxyl group to a carbonyl that resembles the carboxyl group of PFOA and could perhaps “trick” the dehalogenase into continuing its degradation on the 3C carbon of PFOA. This is, however, merely speculation.

DeHa 1 and DeHa 5 supposedly cleave the terminal C-C bond of PFOA, but that cannot be explained by these results and will have to be determined experimentally.

The experimental mutagenesis gave variants which showed greater fluorescence in the laboratory (see Results page). This cannot, however, be supported by the results from this docking investigation. The only DeHa to receive a better Docking Score is DeHa5_var1. As will be explained further in Limitations and Assumptions, the binding process is very dynamic, and the rigid docking performed here can therefore be inaccurate in relation to the experimental findings.
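The comparison between wild-type and variant Docking Scores can be reproduced from Tables 2 and 3 with a few lines. In this sketch a negative change means a more favourable (lower) Docking Score for the variant. The dictionary layout is our own.

```python
# Docking Scores [kcal/mol] from Table 2 (wild-type) and Table 3 (variants).
WILD_TYPE = {"DeHa 1": -4.040, "DeHa 2": -5.281, "DeHa 5": -3.665}
VARIANTS = {
    "DeHa 1": ("DeHa1_var5", -3.328),
    "DeHa 2": ("DeHa2_var8", -4.724),
    "DeHa 5": ("DeHa5_var1", -4.074),
}


def score_changes(wild_type, variants):
    """Return variant name -> change in Docking Score (variant - wild-type)."""
    return {
        name: round(score - wild_type[parent], 3)
        for parent, (name, score) in variants.items()
    }


changes = score_changes(WILD_TYPE, VARIANTS)
improved = [name for name, delta in changes.items() if delta < 0]

for name, delta in changes.items():
    print(f"{name}: {delta:+.3f} kcal/mol")
print("Improved:", improved)
```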

Limitations and Assumptions

Finding candidates for active sites for docking was done with CD-search and SiteMap. CD-search has a limitation in finding domains that span multiple sections of the peptide, that are brought together in the tertiary structure. In the case of DeHa 4, it was impossible to verify the active sites generated by SiteMap with annotated active sites found experimentally, which reduces the confidence of the results.

As mentioned above in the Results section, the Docking Scores of different enzymes cannot easily be compared. The Docking Scores are the result of a range of parameters and properties, which make the scoring very conditional. We cannot therefore conclude anything directly from the Docking Scores alone.

The docking simulations were performed with standard precision, using standard settings, constraints, and conditions. Results closer to the real-life in vitro functioning of the DeHa enzymes could be achieved with greater knowledge of the biology we are working with, for example by allowing rotation of side chains, scaling Van der Waals radii, and modelling specific interactions with solvent water molecules. However, we are not in possession of such information, and the specificity of our analysis is therefore limited.

Generally, these computational approaches greatly simplify the system at hand, since the calculations are based on equations that approximate the energies of the interaction. In real life, enzymes are in a constant dynamic state, with the binding pocket changing conformation and allowing entry of the ligand. Rigid docking has very limited give in terms of the ligand being allowed to induce structural changes in the enzyme. As a consequence, even a favourable docking score does not account for the entry of the ligand into the binding pocket, which is greatly influenced by the dynamic structural changes the enzyme is constantly undergoing. This underlines the importance of performing experiments to verify the results.

It should be noted that the predicted binding affinity alone does not inform on the catalytic activity of the enzyme on the ligand. This is because the docking does not take into account the induced fit caused by the ligand, or how a change in conformation could reveal residues involved in the mechanism of catalysis. The residue-ligand interactions could still, however, help elucidate the underlying mechanisms of PFOA dehalogenation, if such catalysis is shown experimentally.

Introduction

The reason behind wanting to perform a virtual screening was to find possible candidates for future investigation. The power of high-throughput methods is to analyze thousands of ligands in a relatively short time span.

Fig 10) The method used in this section. The structures generated previously are used with a PFAS library to perform a virtual screening.

PFAS Library

To perform a virtual screening, a library containing various PFAS molecules is needed. A list of molecules created by the Swedish Chemicals Agency can be found on the EPA's CompTox Chemicals Dashboard¹⁰. The library includes 2418 entries but is reduced to 1632 when entries with missing structures are removed.
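Removing entries without a structure is a simple filtering step. The sketch below shows one way to do it on a CSV export. The column names `PREFERRED_NAME` and `SMILES`, the placeholder values for a missing structure and the toy rows are all assumptions for illustration and should be checked against the actual export.

```python
import csv
import io

MISSING = {"", "N/A", "-"}  # assumed placeholders for a missing structure


def drop_missing_structures(rows, structure_field="SMILES"):
    """Keep only the rows that carry a usable structure string."""
    return [
        row for row in rows
        if (row.get(structure_field) or "").strip() not in MISSING
    ]


# Toy export with hypothetical entries; only the first row has a structure.
toy_export = io.StringIO(
    "PREFERRED_NAME,SMILES\n"
    "Perfluorooctanoic acid,"
    "OC(=O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F\n"
    "Hypothetical entry A,\n"
    "Hypothetical entry B,N/A\n"
)

rows = list(csv.DictReader(toy_export))
kept = drop_missing_structures(rows)
print(f"{len(rows)} entries, {len(kept)} with a structure")
```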

Workflow

To perform the virtual screening, the virtual screening workflow tool in Schrödinger's Maestro was used. Docking was performed with the receptor grids previously generated (see the docking subpage). The ligands were prepared with the LigPrep tool, generating 3D structures and stereoisomers. No filtering was applied during the ligand selection step. All 1632 ligands were docked using a high-throughput docking method, after which the top-scoring 10% of the ligands moved on to standard precision docking. Following this, the top-scoring 10% of the ligands were extracted and investigated. Where several stereoisomers of the same ligand had been generated, only the best-scoring one was kept.

The ligands are scored on multiple parameters, of which Docking Score and Prime Molecular Mechanics with Generalized Born and Surface Area solvation (MM-GBSA) are of specific interest. The poses of the 3 highest-scoring ligands from each dehalogenase were investigated, in a similar manner to the previous investigation on the Docking page.
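The funnel described above (top 10% after high-throughput docking, top 10% after standard precision docking, then one stereoisomer per ligand) can be sketched in a few lines. This is only an illustration of the selection logic with made-up scores. The actual docking and rescoring were done inside Maestro, and the `rescore_sp` callable is a stand-in for that step.

```python
def top_percent(scored, percent=10):
    """Keep the best-scoring share of (key, score) pairs, lowest score first."""
    keep = max(1, (len(scored) * percent + 99) // 100)  # ceiling division
    return sorted(scored, key=lambda pair: pair[1])[:keep]


def best_per_compound(scored):
    """Collapse stereoisomers: keep the best-scoring isomer of each compound."""
    best = {}
    for (compound, isomer), score in scored:
        if compound not in best or score < best[compound][1]:
            best[compound] = (isomer, score)
    return best


def screening_funnel(htvs_scores, rescore_sp, percent=10):
    """Two-stage selection followed by stereoisomer clean-up."""
    stage1 = top_percent(list(htvs_scores.items()), percent)
    stage2 = top_percent([(key, rescore_sp(key)) for key, _ in stage1], percent)
    return best_per_compound(stage2)


# Made-up library: 100 compounds with 2 stereoisomers each.
toy_htvs = {(f"C{i // 2}", i % 2): -i / 100 for i in range(200)}
result = screening_funnel(toy_htvs, lambda key: 2 * toy_htvs[key])
print(result)
```

In this toy run, both stereoisomers of the same compound survive the two docking stages, and the clean-up step keeps only the better-scoring one.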

Prime MM-GBSA vs Docking Score

Prime MM-GBSA is a technology that calculates binding energies, while considering a set of parameters, such as ligand strains, when docking a set of ligands to a single receptor. It is based on a Variable dielectric Surface Generalized Born (VSGB) energy model that includes physics-based corrections for intermolecular interactions, to acquire a closer-to-reality estimation of the binding energies. Prime MM-GBSA can be utilized as a means of post-processing the ligand poses^11,12.

Prime MM-GBSA can be used in induced fit docking, where extensive alterations in the tertiary structure of the binding pocket occur. Induced fit docking is however a more complicated endeavor, compared to rigid docking¹³.

The Docking Score is generated by the rigid docking of the Glide tool. Performing docking with Glide is a simpler procedure, and it was therefore the primary method used in this project (see the Docking subpage)¹⁴.

Results - Selection of Best Ligands

The number of extracted ligands after curation and clean-up, and the limits used to extract the best three, are given in the following table:

**Table 3)** An overview of the virtual screening output. The number of ligands found to dock well, and the limits set for each parameter for determining the 3 best ligands are included.
	Virtual-screening output	Docking Score limit	Prime MM-GBSA limit
DeHa 1	11 ligands	0.50	0.30
DeHa 2	20 ligands	0.45	0.2
DeHa 4	15 ligands	0.40	0.25
DeHa 5	17 ligands	0.40	0.30

These results are depicted in the scatterplots below (figure 11), with the top 3 ligands depicted with orange dots. The limits were set to sort out the rest as illustrated with the dotted lines.


Fig 11) Scatterplots showing the ligands discovered during the virtual screening. The 3 ligands assessed to be better are shown with orange dots. The striped lines show the limits with respect to each parameter. Refer to the Limitations and Assumptions section at the bottom of the page for more information.

The following tables summarize the energies of these 3 ligands for each dehalogenase:

**Table 4)** DeHa 1 results of the 3 ligands evaluated to dock the best.
	Prime MM-GBSA	Docking Score
Untitled_Molecule_000188	-30.02	-5.836
2,2,3,3,4,4,5,5,6,6,7,7-Dodecafluorooctanediamide	-14.80	-6.156
2,2,3,3,4,4,4-Heptafluoro-1-(1H-imidazol-1-yl)butan-1-one	-13.88	-5.922

**Table 5)** DeHa 2 results of the 3 ligands evaluated to dock the best.
	Prime MM-GBSA	Docking Score
Potassium 2,3,4,5-tetrachloro-6-({3-[(1,1,2,2,3,3,4,4,4-nonafluorobutane-1-sulfonyl)oxy]phenyl}carbamoyl)benzoate	-39.34	-6.015
Octafluoronaphthalene	-38.45	-6.528
Lithium ({1,4-bis[(3,3,4,4,5,5,6,6,6-nonafluorohexyl)oxy]-1,4-dioxobutan-2-yl}sulfanyl)acetate	-36.82	-6.109

**Table 6)** DeHa 4 results of the 3 ligands evaluated to dock the best.
	Prime MM-GBSA	Docking Score
Untitled_Molecule_000085	-24.78	-5.648
2,2,3,3,3-Pentafluoro-1-(1H-imidazol-1-yl)propan-1-one	-22.05	-5.607
1,1,2,2-Tetrafluoro-N,N-dimethylethan-1-amine	-19.12	-5.572

**Table 7)** DeHa 5 results of the 3 ligands evaluated to dock the best.
	Prime MM-GBSA	Docking Score
7-[9-(4,4,5,5,5-Pentafluoropentane-1-sulfonyl)nonyl]estra-1,3,5(10)-triene-3,17beta-diol	-56.41	-5.779
7-{9-[(4,4,5,5,5-Pentafluoropentyl)sulfanyl]nonyl}estra-1,3,5(10)-triene-3,17beta-diol	-56.21	-5.667
3-[(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,8-Heptadecafluorooctane-1-sulfonyl){3-[(2-hydroxyethyl)(dimethyl)azaniumyl]propyl}amino]propane-1-sulfonate	-30.13	-5.721

Results - Docking and Interactions

The following plots show the docking results and ligand interaction diagrams for the best evaluated ligand for each DeHa.

DeHa 1

DeHa 2

DeHa 4

DeHa 5

Concluding Remarks

It was found that all the dehalogenases bound the 3 highest-scoring ligands with docking scores better (more negative) than that of PFOA. Comparing the ligand-residue interaction diagrams shows that many of the same residues from the binding pockets are involved. However, with some of the ligands from the virtual screening, additional strong interactions are present, such as hydrogen bonding and salt bridge interactions in the case of DeHa 1.

It would be of interest for future investigators to investigate the catalytic activity of the enzymes on these compounds, as the docking simulations alone cannot determine whether the dehalogenases are able to degrade these PFAS molecules.

Limitations and Assumptions

When determining the 3 best ligands for each dehalogenase, two parameters were considered. The weighting of these parameters is, however, difficult to determine. Arbitrary decisions had to be made in determining the limits in relation to the parameters. Generally, an attempt was made to minimize the limits in a sensible manner, such that 3 ligands were isolated. The two binding energy evaluations use different algorithms to calculate binding energies. Docking Score is used to compare the high-throughput docking results with the results from the Docking subpage, as these were evaluated on Docking Score as well.

To keep the computational demands realistic, only high-throughput docking and standard precision docking were applied. Better posing of the ligands might have come from using extra precision (XP) docking, but the computational time would have been significantly longer, which is not desirable for the scale of the project.

Introduction

For our project, it is of interest to investigate in silico how the DeHas could be improved in their ability to degrade PFOA, by inducing mutations. Our experimental approach in the laboratory utilizes random mutations in the sequence of the dehalogenases, but the approach in this section will be directed and supervised, meaning that a knowledge of biochemistry will be applied to induce specific rational substitutions.

Fig 13) The method used in the optimization. A cyclic approach is taken, where multiple iterations are performed.

The method for this investigation will be based on the methods of the docking previously performed (see the docking subpage). DeHa 1 was chosen to demonstrate the method. The other DeHas will not be discussed here as the method is a lengthy process, and we were only able to explore one enzyme.

After looking at the current interaction with PFOA in the dehalogenase’s binding pockets, a residue of choice will be substituted in relation to the wanted effect and specified restrictions. This choice will be based on which stronger interactions could be achieved by substituting a residue, and substitution restrictions defined by the Point Accepted Mutation (PAM) matrix.

Multiple candidates will be generated. Having substituted this amino acid, a new structure will be generated with ColabFold, minimized, aligned with the wild-type, and docked with PFOA, with the same settings as previously. Then a selection process will occur, with the top performing substitutions receiving an additional substitution, followed by structure prediction, minimization and so on.

Results

Finding candidates for substitution involves looking at the ligand interaction diagram, as well as the spatial orientation of PFOA in the binding pocket. This is done while considering the residues proposed to be involved in the catalysis mechanism, which was investigated in the Docking section.

Considering this, and limiting the steric impact of the substitutions by keeping them in accordance with the Point Accepted Mutation (PAM) matrix, the following initial candidates were chosen to strengthen the interaction with the carboxyl group of PFOA: D50E (meaning residue number 50 being substituted from aspartic acid to glutamic acid), G208D, G208E, G208N, S210D, S210E, G211D, G211E and G211N.
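The substitution notation used above (wild-type residue, position, new residue) is easy to apply to a sequence programmatically before a new structure prediction. A minimal sketch follows, using a made-up toy sequence rather than the DeHa 1 sequence. The function names are our own.

```python
import re

MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(sequence, mutation):
    """Apply one substitution such as 'D50E' (1-based residue numbering)."""
    match = MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation {mutation!r}")
    wild, position, new = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild:
        raise ValueError(
            f"expected {wild} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[:position - 1] + new + sequence[position:]


def apply_substitutions(sequence, mutations):
    """Apply several substitutions, e.g. a double mutant from a later cycle."""
    for mutation in mutations:
        sequence = apply_substitution(sequence, mutation)
    return sequence


toy_sequence = "MKDGSG"  # hypothetical, for illustration only
print(apply_substitution(toy_sequence, "D3E"))
print(apply_substitutions(toy_sequence, ["G4N", "D3E"]))
```

Checking the wild-type residue before substituting guards against numbering mistakes, for example an offset between the sequence and the structure file.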

First cycle:

**Table 8)** Results of the first cycle of substitutions in DeHa 1. The Alignment Score and RMSD describe the structural alignment with the wild-type, and the Docking Score is from the docking of PFOA.
	Alignment Score	RMSD [Å]	Docking Score [kcal/mol]
G211D	0.002	0.226	-5.552
G211E	0.002	0.227	-4.461
G211N	0.002	0.196	-5.689
S210D	0.000	0.109	-5.078
S210E	0.002	0.203	-2.885
G208D	0.002	0.236	-3.211
G208E	0.004	0.333	-5.245
G208N	0.002	0.243	-3.674
D50E	0.000	0.109	-2.237

Though the S210D serine substitution yielded a good docking score of –5.078 kcal/mol, evaluating the posing of PFOA in this mutated binding pocket shows a strong change in orientation of the molecule. This could possibly affect the DeHa’s ability to degrade the molecule, though it cannot be verified at this stage.

G211N will be built upon in the second iteration, due to the high similarity of the spatial orientation of PFOA in the wild-type and this mutant, which can be seen in figure 16, and its docking score of -5.689 kcal/mol, the best of the first cycle.

In the second iteration, D50E is combined with the G211N mutation even though its docking score was quite poor. It is attempted again because of the position of the aspartic acid, which is close to being able to form a hydrogen bond with the carboxyl group of PFOA but falls short due to the length of its side chain. Glutamic acid is very similar but has a side chain one carbon longer, and could therefore perhaps interact with the carboxyl group.
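The score-based part of this selection can be reproduced from Table 8. The sketch below ranks the first-cycle mutants by Docking Score and lists those scoring better than the wild-type DeHa 1 value from Table 2. It deliberately ignores the pose inspection, which in the text is what sets S210D aside despite its good score.

```python
WILD_TYPE_SCORE = -4.040  # DeHa 1 Docking Score [kcal/mol] from Table 2

# Docking Scores [kcal/mol] of the first-cycle mutants, copied from Table 8.
FIRST_CYCLE = {
    "G211D": -5.552, "G211E": -4.461, "G211N": -5.689,
    "S210D": -5.078, "S210E": -2.885,
    "G208D": -3.211, "G208E": -5.245, "G208N": -3.674,
    "D50E": -2.237,
}


def rank_by_score(scores):
    """Sort mutants from most to least favourable Docking Score."""
    return sorted(scores.items(), key=lambda item: item[1])


ranked = rank_by_score(FIRST_CYCLE)
better_than_wild_type = [m for m, s in ranked if s < WILD_TYPE_SCORE]

for mutant, score in ranked:
    print(f"{mutant:6s} {score:7.3f}")
print("Better than wild-type:", better_than_wild_type)
```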

**Table 9)** Results of the second cycle of substitutions in DeHa 1, built on the G211N mutant. The Alignment Score and RMSD describe the structural alignment with the wild-type, and the Docking Score is from the docking of PFOA.
	Alignment Score	RMSD [Å]	Docking Score [kcal/mol]
G211N + D50E	0.005	0.346	-3.598
G211N + G208E	0.006	0.376	-2.750
G211N + S210D	0.002	0.215	-5.197

With the two worst-scoring combinations of mutations, PFOA fails to enter the binding pocket. With the highest-scoring combination, PFOA does make it inside the binding pocket, but is not found to interact with it in the same way as with the wild-type.

**Fig 16)** The binding of PFOA in the wild-type with the residues of interest tagged.

Concluding Remarks

The posings found in this investigation are not definite, as molecular interactions are very dynamic, and one of the other mutants with a different orientation of PFOA might be able to bind PFOA in the same way as the wild-type due to the ever-changing structure of the binding pocket. It is therefore worth it to test a range of mutants and take the docking scores as not being definitive evaluations. A general approach to validate the binding is whether the PFOA can enter the binding pocket even though the orientation might be altered by the mutation. Mutations resulting in PFOA binding preferably outside the binding pocket, are deemed the most destructive, as there will likely be no catalysis.

Of the 3 final candidates, G211N + G208E fails to enter the binding pocket during the docking prediction. This combination of mutations likely alters the tertiary structure of the enzyme in the region of the binding pocket, preventing PFOA from entering. This is not a surprise given the relatively large alignment score and RMSD. G211N + D50E suffers the same fate, as it is also predicted not to allow entry of PFOA into the binding pocket.

G211N + S210D scores the highest in the second iteration and is, of the 3 candidates, the one that should be investigated experimentally. Additionally, the G211N mutant should be tested in the laboratory, as it appears to bind PFOA in a similar way to the wild-type, but with greater strength.

Limitations and Assumptions

When considering possible candidates, it is important to consider the 3D orientation of PFOA in the binding pocket as predicted by Glide. The ligand interaction diagram alone is not sufficient, as it is only a 2D projection of the pose.

In silico enzyme optimization can be done in many ways. Unsupervised methods, where many nucleotide substitutions occur, are very computationally heavy and time-consuming. Given the biochemical knowledge in the team, we opted for a supervised, specific methodology, where very few but specific residues are substituted. An attempt is made to take into account the possible downsides of a substitution by considering PAM (point accepted mutation), so that unlikely substitutions are avoided. Unlikely substitutions could have a greater impact on the tertiary structure of the enzyme, an effect that cannot be considered directly when substituting.

This method has shortcomings compared to unsupervised methods, as it is very narrow and does not investigate many possibilities. It is impossible to know whether some substitutions might enhance the binding and catalytic activity through changes in the tertiary structure. Such substitutions could, however, be found with unsupervised, computationally intensive methods. The glycine residues targeted by G211N + G208E are found in proximity to the carboxyl group of PFOA but cannot easily be substituted with a hydrogen-bond-forming residue, due to the large change in spatial occupation. This could nevertheless be exactly the change that improves the binding, though it is not likely, and it is not considered with this method. To investigate whether this could be the case, experiments would have to be carried out. Other computational methods involving molecular dynamics could perhaps give deeper insights.
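The targeted substitutions are written in the usual point-mutation notation, for example G211N (glycine at position 211 replaced by asparagine). A minimal sketch of how such labels could be parsed and applied to a sequence is shown below. It is illustrative only: the function names and the short toy sequence are hypothetical, and in this investigation the mutations were introduced in Maestro rather than with a script. Checking the wild-type residue before substituting guards against mistyped labels.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str
    position: int  # 1-based residue number
    mutant: str


_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Parse a label such as 'G211N' into (wild_type, position, mutant)."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitutions(sequence, labels):
    """Return a copy of `sequence` with all substitutions applied.

    Raises ValueError if a position is out of range or if the residue
    found at that position differs from the stated wild-type residue.
    """
    residues = list(sequence)
    for label in labels:
        sub = parse_substitution(label)
        if not 1 <= sub.position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        found = sequence[sub.position - 1]
        if found != sub.wild_type:
            raise ValueError(f"{label}: expected {sub.wild_type}, found {found}")
        residues[sub.position - 1] = sub.mutant
    return "".join(residues)


# Toy five-residue sequence (hypothetical, not a dehalogenase sequence).
toy_sequence = "MDGSG"
mutated = apply_substitutions(toy_sequence, ["G3N", "S4D"])
print(mutated)
```

With a real dehalogenase sequence, the labels from a combination such as "G211N + S210D" could be obtained by splitting on "+" and passed in the same way.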

A downside of the unsupervised method could however be the possibility that the stronger binding affinity comes with a loss of catalytic activity, because this can be highly dependent on very specific residues being present in the enzyme.

With this method, only the binding affinity of PFOA is considered, whereas experimental optimization is unsupervised, and any increase in binding affinity there is the result of changes in various elements. Given that the enzymes can already bind and to some extent degrade PFOA, it is fair to expect that the experimental mutagenesis will not cause major structural changes in the dehalogenases. Therefore, this computational approach is more similar to the experimental approach.

It should however be noted that this investigation is a simplification of a massively complex area of in silico enzyme engineering, which takes many cycles and iterations in order to achieve results. This is a peek into the possibility of targeted mutagenesis for improving the function of the dehalogenase.

Mirdita, M., Schütze, K., Moriwaki, Y., Heo, L., Ovchinnikov, S., & Steinegger, M. (2022). ColabFold: making protein folding accessible to all. Nature Methods, 19(6), 679-682. https://doi.org/10.1038/s41592-022-01488-1
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., . . . Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589. https://doi.org/10.1038/s41586-021-03819-2
Mariani, V., Biasini, M., Barbato, A., & Schwede, T. (2013). lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics, 29(21), 2722-2728. https://doi.org/10.1093/bioinformatics/btt473
Lu, S., Wang, J., Chitsaz, F., Derbyshire, M. K., Geer, R. C., Gonzales, N. R., Gwadz, M., Hurwitz, D. I., Marchler, G. H., Song, J. S., Thanki, N., Yamashita, R. A., Yang, M., Zhang, D., Zheng, C., Lanczycki, C. J., & Marchler-Bauer, A. (2020). CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res, 48(D1), D265-D268. https://doi.org/10.1093/nar/gkz991
Wang, Y., Xiang, Q., Zhou, Q., Xu, J., & Pei, D. (2021). Mini Review: Advances in 2-Haloacid Dehalogenases [Review]. Frontiers in Microbiology, 12. https://doi.org/10.3389/fmicb.2021.758886
Ghattas, M. A., Raslan, N., Sadeq, A., Al Sorkhy, M., & Atatreh, N. (2016). Druggability analysis and classification of protein tyrosine phosphatase active sites. Drug Des Devel Ther, 10, 3197-3209. https://doi.org/10.2147/dddt.S111443
Ghattas, M. A., Raslan, N., Sadeq, A., Al Sorkhy, M., & Atatreh, N. (2016). Druggability analysis and classification of protein tyrosine phosphatase active sites. Drug Des Devel Ther, 10, 3197-3209. https://doi.org/10.2147/dddt.S111443
Isom, D. G., Castañeda, C. A., Cannon, B. R., Velu, P. D., & García-Moreno E., B. (2010). Charges in the hydrophobic interior of proteins. Proceedings of the National Academy of Sciences, 107(37), 16096-16100. https://doi.org/10.1073/pnas.1004213107
Adamu, A., Wahab, R. A., Shamsir, M. S., Aliyu, F., & Huyop, F. (2017). Deciphering the catalytic amino acid residues of l-2-haloacid dehalogenase (DehL) from Rhizobium sp. RC1: An in silico analysis. Comput Biol Chem, 70, 125-132. https://doi.org/10.1016/j.compbiolchem.2017.08.007
U.S. Environmental Protection Agency, “The CompTox Chemistry Dashboard: a community data resource for environmental chemistry” https://comptox.epa.gov/dashboard/.
Halgren, T. A.; Murphy, R. B.; Friesner, R. A.; Beard, H. S.; Frye, L. L.; Pollard, W. T.; Banks, J. L., "Glide: A New Approach for Rapid, Accurate Docking and Scoring. 2. Enrichment Factors in Database Screening," J. Med. Chem., 2004, 47, 1750–1759
Li, J., Abel, R., Zhu, K., Cao, Y., Zhao, S., & Friesner, R. A. (2011). The VSGB 2.0 model: a next generation energy model for high resolution protein structure modeling. Proteins, 79(10), 2794-2812. https://doi.org/10.1002/prot.23106
Jacobson, M. P.; Pincus, D. L.; Rapp, C. S.; Day, T. J. F.; Honig, B.; Shaw, D. E.; Friesner, R. A., "A Hierarchical Approach to All-Atom Protein Loop Prediction," Proteins: Structure, Function and Bioinformatics, 2004, 55, 351-367
Friesner, R. A.; Murphy, R. B.; Repasky, M. P.; Frye, L. L.; Greenwood, J. R.; Halgren, T. A.; Sanschagrin, P. C.; Mainz, D. T., "Extra Precision Glide: Docking and Scoring Incorporating a Model of Hydrophobic Enclosure for Protein-Ligand Complexes," J. Med. Chem., 2006, 49, 6177–6196
