Model | PuiChing-Macau

Background

Because FRET efficiency depends strongly on the separation and relative spatial arrangement of the two fluorophores, we needed further verification that the sequences designed for our FRET system are reliable and that the data acquired from FRET imaging are credible. To accomplish this, we developed a protein model. The model combines several computational methods: structure prediction from sequence, RMSD calculations, protein-protein docking, and distance measurements. We aim to predict and elucidate the interaction between ATRIP-eGFP and RPA1-mCherry, shedding light on how the two fluorescent proteins are positioned as their fusion partners interact. The model also helps rule out the possibility that unexpected or failed experimental outcomes stem from the fusion proteins being unable to bind. As our experiments progressed and the model was extended to measure the separation between the fluorescent proteins, it provided further support that the increased FRET observed after UV treatment was most likely due to the close proximity of the proteins.

Approach 1: Discovery Studio

Initially, we planned to use ZDOCK, a function in Discovery Studio, to predict the conformations of the protein complexes. ZDOCK is an algorithm that predicts the structures of protein complexes from the structures of the individual proteins. It scores and ranks docking poses using Pairwise Shape Complementarity (PSC), Desolvation (DE), and Electrostatics (ELEC) [1].

Below are the equations of the scoring function[1]:

We obtained the structures of the fluorophores (eGFP, mCherry, eCFP, YFP) and their respective DNA damage response (DDR) proteins (ATRIP, RPA1) from the Protein Data Bank (PDB). By docking each fluorophore to its partner protein, we planned to gain insight into their arrangements, feasibility, and binding configurations, and then to dock the resulting bound structures together to further our understanding. The results of this attempt are shown in the table below:

Receptor	Ligand	ZDock Score
ATRIP	EGFP	25.02
RPA	mCherry	19.74
ATRIP	ECFP	27.54
RPA	EYFP	17.62

	Figure 1. ZDOCK results of ATRIP-EGFP		Figure 2. ZDOCK results of RPA-mCherry
	Figure 3. ZDOCK results of ATRIP-ECFP		Figure 4. ZDOCK results of RPA-EYFP

We then docked RPA-1 with ATR-ATRIP, intending to combine the result with the fluorophore-receptor docking result for further analysis:

Figures 5, 6. Left: ZDock results of the ATR-ATRIP complex docked with RPA. Right: Graph of cluster, density, and ZDock score, where each point represents a proposed structural configuration generated by ZDock.

In this approach, six structures from PDB were used:

5YZ0: ATR-ATRIP complex
2B29: RPA
4EUL: EGFP
4ZIO: mCherry
5OX8: ECFP
1YFP: YFP

The results consisted of clusters, cluster sizes, and ZDock scores. ZDock scores come from a scoring algorithm that ranks each pose using the scoring function mentioned above; the higher the score, the more feasible the pose. In the results for RPA1 docked with ATR-ATRIP, the number of clusters generally decreases as the ZDock score increases. However, this trend alone was not conclusive evidence that our system was effective and efficient. Also, after extensive reading in the relevant field, we recognized flaws in this approach. Modeling with the whole existing complex is not realistic: it does not match our vector design, which only includes ATRIP, and the fluorescent proteins were not attaching to the sites we had intended. Simply docking the PDB files to form our engineered protein, instead of predicting the structures of our own proteins, neglects possible structural deformation caused by the fusion, since FRET pairs and fluorophore-tagged proteins do not occur naturally.

Improvement

We consulted Mr. Edison Ong, a Bioinformatics Scientist at Moderna, who advised us to take an alternative approach with a different set of software tools. The recommended method consisted of predicting the protein structure, validating it against existing protein structures, docking the structures, and measuring the molecular separation in the docked poses. At the same time, we found that the PDB file of RPA-1 we used in Cycle 1.0 does not contain the protein-interacting domain (N-terminal) but only the DNA binding domains (dbd-A, dbd-B, dbd-C). After correcting the sequence, we began by predicting how the peptide chains from our designed sequences would fold into proteins.

Preparation: AlphaFold protein structure prediction

In order to give an accurate representation of the proteins expressed from our vectors, we obtained the amino acid sequences of the involved proteins from Vectorbuilder, the platform where our vectors were designed. Given the lack of known protein structures for our target proteins, ATRIP and RPA1, we employed AlphaFold[2][3] for the prediction of their 3D models. To initiate the process, we utilized the modified amino acid sequence of each protein and subjected it to AlphaFold's simulation.

The sequences used for this prediction are listed below in the format of RefSeq Transcript ID-linker-Marker:

EGFP-ATRIP: (NM_130384.3)-3xGGGGS-EGFP
mCherry-RPA1: (NM_002945.5)-3xGGGGS-mCherry
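
As a minimal sketch of how a construct in the format above can be assembled, the Python snippet below joins a target protein, three GGGGS repeats, and a marker into one amino acid string. The function name `build_fusion` and the short placeholder fragments are hypothetical; the real sequences come from the vector designs.

```python
LINKER_UNIT = "GGGGS"   # one repeat of the flexible linker
LINKER_REPEATS = 3      # 3xGGGGS, as in the constructs listed above


def build_fusion(target_seq: str, marker_seq: str) -> str:
    """Return target + 3xGGGGS linker + marker as one amino acid string."""
    linker = LINKER_UNIT * LINKER_REPEATS
    return target_seq + linker + marker_seq


# Placeholder fragments only, NOT the real ATRIP or EGFP sequences.
example = build_fusion("MAGTSAPGSK", "MVSKGEELFT")
print(example)
print(len(example))
```

The resulting string is what would be submitted to the structure predictor, so that the linker and marker are folded together with the target protein rather than docked on afterwards.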

Figure 7. Per-Residue Count

Figure 8. Confidence Score

Figure 9, 10. AlphaFold predictions of RPA1-mCherry(left), ATRIP-EGFP(right)

Verification: Using PyMOL to verify AlphaFold's outcomes

To check the accuracy of the predicted structures, we used PyMOL to align AlphaFold’s predictions with existing structures from PDB and calculated the RMSD (Root Mean Square Deviation) of each part of the protein against the known structure. The align function superimposes the predicted structure onto the existing structure and quantifies their similarity. In simple terms, a large RMSD indicates that the predicted structure deviates substantially from the reference, and an RMSD below 5 Å is considered ideal.
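
To make the RMSD figure concrete, here is a minimal Python sketch of the calculation itself. It assumes the two structures are already superimposed and that atoms are paired by index; PyMOL's align additionally performs the superposition first, which this sketch does not do. The function name `rmsd` and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumes the structures are already superimposed and atoms are
    paired by index; no alignment is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates in angstroms (hypothetical, not from our structures).
predicted = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(round(rmsd(predicted, reference), 3))
```

Because every squared deviation is averaged before the square root is taken, a few badly placed regions can dominate the value, which is worth keeping in mind when comparing a fusion protein against a reference taken from a larger complex.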

Figure 11. PyMOL's analysis of our predicted structure of RPA-mCherry from AlphaFold and existing structures of RPA and mCherry.

Below are the RMSD values of RPA1-mCherry against two structures of RPA70 and one of mCherry. Codes in parentheses indicate the structures obtained from PDB.

	RPA1-mCherry
RPA70 N-term (2b29)	0.369
RPA70 dbd-A, dbd-B, dbd-C (4gnx-C sequence: dbd-A:181-290, dbd-B:301-422, dbd-C:436-616)	0.494, 0.662, 0.958
mCherry (4zio)	2.581

Below are the RMSD values of ATRIP-EGFP against ATRIP and EGFP. Codes in parentheses indicate the structures obtained from PDB.

	ATRIP-eGFP
ATRIP(5yx0-C)	43.318
eGFP(6b7t-A)	0.298

PyMOL’s calculated RMSD between the predicted ATRIP-eGFP and ATRIP was 43.318, which is extraordinarily high. Still, this is expected, as the reference structure of ATRIP comes from an ATR-ATRIP complex rather than from ATRIP alone. Our vector contains only ATRIP, because ATR-ATRIP would be too large to fit into a single lentiviral vector. Therefore, the structure of ATRIP on its own is expected to differ from its structure within the complex.

Approach 2: ClusPro

Next, we used ClusPro 2.0 [4][5][6][7], running on its CPU option, to perform molecular docking simulations. The docking calculations were carried out with the hydrophobic-favored coefficients to enhance the accuracy of the results.

E = 0.40E_rep − 0.40E_att + 600E_elec + 2.00E_DARS
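
To illustrate how the hydrophobic-favored scoring function above combines its terms, the following Python sketch evaluates the weighted sum for a single pose. The coefficients are those quoted above; the function name `weighted_energy` and the term values in `pose_terms` are hypothetical and are not taken from our docking runs.

```python
# Coefficients of the hydrophobic-favored scoring function quoted above.
WEIGHTS = {"rep": 0.40, "att": -0.40, "elec": 600.0, "dars": 2.00}


def weighted_energy(terms):
    """Weighted sum of the repulsive, attractive, electrostatic and DARS terms."""
    return sum(WEIGHTS[name] * terms[name] for name in WEIGHTS)


# Made-up term values for one pose, for illustration only.
pose_terms = {"rep": 100.0, "att": 250.0, "elec": -0.05, "dars": -120.0}
print(weighted_energy(pose_terms))
```

The large weight on the electrostatic term and the doubled weight on E_DARS show why small changes in those terms can shift the ranking of a pose more than the van der Waals terms do.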

A total of 30 models were generated through the docking simulations. We then focused our analysis on the 10 top-scoring models. All of these models placed the two fluorophores within 10-100 Å of each other, the distance range over which FRET is effective, indicating that our designed bio-reporter system is feasible in terms of configuration and proximity.

Figure 12. ClusPro hydrophobic-favored model no. 6, with a fluorophore separation of 56.5 Å

Figure 13. ClusPro hydrophobic-favored model no. 7, with a fluorophore separation of 97.4 Å

Approach 3: HDOCK

In addition to ClusPro, we used HDOCK to dock our predicted structures, in the hope that the results of the two simulations would cross-validate each other. Where the outcomes of the two algorithms differ, it is meaningful to compare the scores, parameters, and configurations that each provides.

Each HDOCK model comes with two scores: a docking score and a confidence score. The docking score is calculated by a knowledge-based iterative scoring function, and a more negative docking score indicates a more likely binding model. However, since the score has not been calibrated to experimental data, it should not be interpreted as the actual binding affinity of the two molecules. The confidence score is derived from the docking score and is designed to indicate the likelihood of binding in protein-protein/RNA/DNA complexes. Generally, when the confidence score is above 0.7, the two molecules are very likely to bind in that pose [8]. The confidence score is defined as follows:

Confidence_score = 1.0/[1.0+e^{0.02*(Docking_Score+150)}]

Below are the docking scores and confidence scores for the top ten models, which are ranked by their docking scores:

Rank	1	2	3	4	5	6	7	8	9	10
Docking Score	-252.39	-239.71	-231.76	-228.53	-221.35	-217.78	-217.14	-215.96	-214.32	-213.43
Confidence Score	0.8857	0.8574	0.8369	0.8279	0.8064	0.7950	0.7930	0.7890	0.7835	0.7805
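
As a quick check of the formula, the Python sketch below recomputes the confidence score from each docking score in the table. The function name `confidence_score` is our own label for the expression given above; the computed values agree with the tabulated ones to within rounding.

```python
import math


def confidence_score(docking_score: float) -> float:
    """HDOCK confidence score as defined above: 1 / (1 + e^(0.02 * (score + 150)))."""
    return 1.0 / (1.0 + math.exp(0.02 * (docking_score + 150.0)))


# Docking scores of the top ten models, copied from the table above.
docking_scores = [-252.39, -239.71, -231.76, -228.53, -221.35,
                  -217.78, -217.14, -215.96, -214.32, -213.43]

for rank, score in enumerate(docking_scores, start=1):
    print(rank, score, round(confidence_score(score), 4))
```

Note that a docking score of exactly -150 gives a confidence of 0.5, and the 0.7 threshold mentioned above corresponds to a docking score of roughly -192, so all ten models sit comfortably past it.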

Figure 14. The top-ranked pose (rank 1) generated by HDOCK, with a fluorophore separation of 132.5 Å.

Figure 15. The 10th-ranked pose generated by HDOCK, with a fluorophore separation of 12.7 Å.

Using PyMOL to measure the distance

Based on our literature review, FRET is an accurate measure of molecular proximity at separations of roughly 10–100 Å [9]. Using PyMOL, we analyzed the results of both ClusPro and HDOCK by measuring the distance, in angstroms, between the two fluorophores attached to ATRIP and RPA in the poses generated by these algorithms. All of the top 10 ClusPro models fell within this range, with separations of 56.5-97.4 Å. The HDOCK results displayed a wider spectrum, ranging from 12.7 Å to 132.5 Å, so some HDOCK poses exceed the 100 Å limit. It is important to emphasize that the scores given by ClusPro and HDOCK are not directly correlated with the distances measured in PyMOL; a higher-ranked pose does not necessarily have a larger or smaller separation. However, these highly ranked poses are the ones most likely to form, so analyzing their distances is meaningful, as they account for a substantial portion of the poses generated.
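
The distance check described above can be sketched in Python as follows. This is a minimal illustration that assumes the separation is measured between the centroids of the two fluorophore domains, which is only one possible convention. The helper names `centroid`, `centroid_distance`, and `within_fret_range` are hypothetical; the four distances are the ones reported in the figure captions.

```python
import math

FRET_MIN_A = 10.0    # lower bound of the FRET-effective range, in angstroms
FRET_MAX_A = 100.0   # upper bound of the FRET-effective range, in angstroms


def centroid(coords):
    """Mean position of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))


def centroid_distance(coords_a, coords_b):
    """Distance between the centroids of two sets of atoms."""
    return math.dist(centroid(coords_a), centroid(coords_b))


def within_fret_range(distance_a):
    """True if a separation in angstroms lies inside the 10-100 range."""
    return FRET_MIN_A <= distance_a <= FRET_MAX_A


# Separations reported in the figure captions, in angstroms.
reported = {
    "ClusPro model 6": 56.5,
    "ClusPro model 7": 97.4,
    "HDOCK rank 1": 132.5,
    "HDOCK rank 10": 12.7,
}
for name, dist in reported.items():
    print(name, dist, within_fret_range(dist))
```

Applied to real poses, `centroid_distance` would take the atom coordinates of the EGFP and mCherry domains from each docked model; here only the reported values are classified.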

Future improvements

In this investigation, we obtained results from AlphaFold, then PyMOL, then ClusPro and HDOCK, and finally PyMOL again. The results generally favored our system design despite a few discrepancies, such as the high RMSD between ATRIP-EGFP and ATRIP (5yx0-C) and the wide range of distances between EGFP and mCherry in the poses generated by HDOCK. We believe that further validation can be carried out experimentally through western blotting, which can confirm that the fusion proteins are expressed at the expected size.

References

[1] Chen, R., Li, L., & Weng, Z. (2003). ZDOCK: an initial-stage protein-docking algorithm. Proteins, 52(1), 80–87. https://doi.org/10.1002/prot.10389

[2] Mirdita, M., Schütze, K., Moriwaki, Y., Heo, L., Ovchinnikov, S., & Steinegger, M. (2022). ColabFold: Making protein folding accessible to all. Nature Methods, 19(6), 679-682. https://doi.org/10.1038/s41592-022-01488-1

[3] Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A., Ballard, A. J., Cowie, A., Nikolov, S., Jain, R., Adler, J., Back, T., . . . Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589. https://doi.org/10.1038/s41586-021-03819-2

[4] Desta, I. T., Porter, K. A., Xia, B., Kozakov, D., & Vajda, S. (2020). Performance and its limits in rigid body protein-protein docking. Structure, 28(9), 1071–1081. https://doi.org/10.1016/j.str.2020.06.006

[5] Vajda, S., Yueh, C., Beglov, D., Bohnuud, T., Mottarella, S. E., Xia, B., Hall, D. R., & Kozakov, D. (2017). New additions to the ClusPro server motivated by CAPRI. Proteins: Structure, Function, and Bioinformatics, 85(3), 435–444. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5313348/pdf/nihms834822.pdf

[6] Kozakov, D., Hall, D. R., Xia, B., Porter, K. A., Padhorny, D., Yueh, C., Beglov, D., & Vajda, S. (2017). The ClusPro web server for protein-protein docking. Nature Protocols, 12(2), 255–278. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5540229/pdf/nihms883869.pdf

[7] Kozakov, D., Beglov, D., Bohnuud, T., Mottarella, S., Xia, B., Hall, D. R., & Vajda, S. (2013). How good is automated protein docking? Proteins: Structure, Function, and Bioinformatics, 81(12), 2159–2166. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3934018/pdf/nihms556382.pdf

[8] Yan, Y., Tao, H., He, J., & Huang, S. (2020). The HDOCK server for integrated protein–protein docking. Nature Protocols, 15(5), 1829-1852. https://doi.org/10.1038/s41596-020-0312-x

[9] Sekar, R. B., & Periasamy, A. (2003). Fluorescence resonance energy transfer (FRET) microscopy imaging of live cell protein localizations. The Journal of cell biology, 160(5), 629–633. https://doi.org/10.1083/jcb.200210140