Introduction

Structure

Prediction

Docking

Quantum Computation

MD Simulation

Summary

Reference

Introduction

Through the previous experiments conducted by Professor Zhou's team, it was found that oncolytic viruses can bind to the transferrin receptor 1 (TfR1) on the surface of tumor cells by interacting with the NA protein on their surface, thereby mediating viral internalization. Therefore, the initial experiments will focus on these two proteins.

Our physical simulations primarily include protein structure prediction, protein-protein docking, quantum chemistry calculations, and molecular dynamics simulations.

Structure of Human Transferrin Receptor

The transferrin receptor (TfR) is involved in a dynamic process of endocytosis and subsequent reemergence at the cell surface. During this process, the receptor internalizes iron-loaded transferrin (Tf) through clathrin-mediated endocytosis. Once inside the cell, iron is released from Tf in the endosome, allowing the TfR to recycle apotransferrin back to the cell surface for further use.

Fig. 1. Structure of TfR1 on plasma membrane.

Human TfR1 is a homodimeric type II transmembrane protein consisting of two 90-kD subunits. Each subunit comprises a short NH2-terminal cytoplasmic region (residues 1 to 67) containing the internalization motif YTRF, followed by a single transmembrane pass (residues 68 to 88). The large extracellular portion (ectodomain, residues 89 to 760) of each subunit contains a binding site for the 80-kD Tf molecule.

The TfR1 dimer has a globular extracellular structure, separated from the membrane by a stalk of about 30Å, which includes residues immediately following the transmembrane pass and an O-linked glycan at Thr104, along with two intermolecular disulfide bonds (formed by Cys89 and Cys98). Notably, the intermolecular disulfide bonds are not required for dimerization.

Fig. 2. Individual TfR domains.

In the ribbon diagrams, we can observe three domains of the protein structure: domain I, which is the protease-like domain (labeled as A); domain II, known as the apical domain (labeled as B); and domain III, referred to as the helical domain (labeled as C). The secondary structure elements are appropriately labeled on the diagrams. In the text, we identify these elements first based on their respective domain numbers and then based on their linear order within each domain[1].

Structure of Neuraminidase

The viral NA (Neuraminidase) assembles as a tetramer of four identical polypeptides and constitutes approximately 10-20% of the total glycoproteins on the virion surface when embedded in the viral envelope. An average-sized virion of 120 nm typically possesses around 40-50 NA spikes and 300-400 HA spikes[2][3].Each of the four NA monomers consists of approximately 470 amino acids and folds into four distinct structural domains: the cytoplasmic tail, the transmembrane region, the stalk, and the catalytic head.

Fig. 3. NA exists as a tetramer of four identical monomers.

The NA active site comprises two parts: an inner shell consisting of eight highly conserved residues (Arg118, Asp151, Arg152, Arg224, Glu276, Arg292, Arg371, and Tyr406) that directly interact with sialic acids. Additionally, there is an outer shell of ten residues (Glu119, Arg156, Trp178, Ser179, Asp198, Ile222, Glu227, Glu277, Asn294, and Glu425) which do not interact with sialic acid but play a crucial structural role, defined as framework residues[4].Among the inner shell residues, three arginine residues (Arg118, 292, 371) interact with the carboxylate of the sialic acid substrate. Arg152 binds to the acetamido group on the sugar ring, while Glu276 interacts with the 8- and 9-hydroxyl groups on the glycerol side chain. The active site of the enzyme is highly conserved in terms of spatial orientation and sequence properties, making it an excellent target for drug inhibition.

Fig. 4. The catalytic sites responsible for cleaving the sugar residues.

Structure Prediction With Alphafold

Abstract

Scientific Problems

Proteins play a crucial role in life, and understanding their structure is essential to comprehend their functions. Although significant efforts have led to the determination of structures for about 100,000 unique proteins, this is just a fraction of the vast number of known protein sequences. The process of determining protein structures is time-consuming, causing a bottleneck in achieving comprehensive structural coverage. To overcome this limitation, accurate computational methods are necessary to predict a protein's three-dimensional structure based solely on its amino acid sequence, a longstanding problem known as the "protein folding problem."

Solution ideas

In recent years, with the continuous development of computer hardware and algorithms, artificial intelligence, especially neural networks, has made significant advancements. In addressing the challenging task of protein structure prediction from sequences, researchers have started to explore the use of neural networks as a potential solution.One of the most remarkable advancements in this field is AlphaFold2.

Expected results
  • Investigating structural changes due to amino acid Insertion and deletion and in the stalk region of the NA monomer.
  • Exploring the feasibility of the project at the structural level.
  • Evaluating the importance of homologous sequences in protein structure prediction.

Technical Principles

AlphaFold 2 is a cutting-edge deep learning system developed by DeepMind, an artificial intelligence research lab under Alphabet Inc. It is designed to predict the 3D structure of proteins, a crucial task in the field of biology and bioinformatics. Released in 2020, AlphaFold 2 utilizes advanced deep learning techniques, including deep convolutional neural networks and attention mechanisms, to accurately predict the folding of proteins from their amino acid sequences[5].

Fig. 5. Architectural details of alphafold2.

1.Trunk of the Network (Evoformer):

  • The inputs to AlphaFold are the primary amino acid sequence of a protein and aligned sequences of homologous proteins (Multiple Sequence Alignments or MSAs).
  • The trunk of the network processes these inputs through repeated layers of a novel neural network block called the Evoformer. The Evoformer incorporates attention-based and non-attention-based components to exchange information within the MSA and pair representations.
  • The MSA representation is initialized with the raw MSA and is continuously refined as the processing proceeds. The Evoformer allows direct reasoning about the spatial and evolutionary relationships of the protein's residues.
  • The trunk produces two arrays: an Nseq x Nres array representing the processed MSA and an Nres x Nres array representing residue pairs.

2.Structure Module:

  • The structure module takes the processed MSA and residue pair representations as inputs and introduces an explicit 3D structure in the form of rotations and translations for each residue of the protein (global rigid body frames).
  • Key innovations in this section include breaking the chain structure to allow simultaneous local refinement of all parts of the structure, a novel equivariant transformer to reason about unrepresented side-chain atoms, and a loss term that emphasizes the orientational correctness of the residues.
  • The network reinforces the notion of iterative refinement by repeatedly applying the final loss to outputs and feeding the outputs recursively into the same modules (recycling), contributing significantly to accuracy with minor extra training time.

Besides,the above is just a rough introduction to AlphaFold2. For more information about AlphaFold, please refer to its supplementary materials.[4]

Results and Discussion

Predicted Structure

Our project experimentally inserted amino acids into the stalk region of the NA receptor on the virus surface. To explore the structural changes of NA before and after the insertion of amino acids, we utilized alphafold2 to predict the structures of NA-Insert-14, NA-Insert-28, NA-Delete-9, NA-Delete-24 respectively(The numbers following indicate the quantity of inserted or deleted amino acids).The predictions were conducted on Google Colab platform.

pLDDT (predicted Local Distance Difference Test) evaluates the deviation of the distance between each pair of amino acid residues in the predicted structure from statistically acceptable distance distributions. Higher plDDT scores indicate better quality of the predicted structure.

PAE (Predicted Aligned Error) measures the average root-mean-square error of the backbone atoms between the predicted structure and its sequence comparative template. Lower PAE values indicate the predicted structure aligns better with the template structure and thus is of higher quality.

An MSA contains multiple homologous protein sequences aligned together.The per-residue counts provide an indication of how conserved each alignment position is and higher counts indicate more conserved positions. Regions with very low per-residue counts are more likely to be variable or gapped regions.

As for Insert-14, its predicted structure(the portion highlighted in blue represents the predicted results of insertion segment), MSA, pLDDT, and pAE are as follows:

Fig. 6-1 The predicted structure of NA with the insertion of 14 amino acids was determined using AlphaFold2.
Fig. 6-2 The multiple sequence alignment of NA with the insertion of 14 amino acids.
Fig. 6-3 The predicted Local Distance Difference Test and predicted aligned error of NA with the insertion of 14 amino acids.We can see that the pLDDT score is lower in the region where amino acids are inserted, indicating that the predictions for this region are relatively unreliable.

As for Insert-28, its predicted structure(the portion highlighted in blue represents the predicted results of insertion segment), MSA, pLDDT, and pAE are as follows:

Fig. 7-1 The predicted structure of NA with the insertion of 28 amino acids was determined using AlphaFold2.
Fig. 7-2 The multiple sequence alignment of NA with the insertion of 28 amino acids.
Fig. 7-3 The predicted Local Distance Difference Test and predicted aligned error of NA with the insertion of 28 amino acids.Similarly,We can see that the pLDDT score is lower in the region where amino acids are inserted, indicating that the predictions for this region are relatively unreliable.

As for Delete-9, its predicted results(The portion highlighted in red indicates the positions of deleted segment), MSA, pLDDT, and pAE are as follows:

Fig. 8-1 The predicted structure of NA with the deletion of 9 amino acids was determined using AlphaFold2.
Fig. 8-2 The multiple sequence alignment of NA with the deletion of 9 amino acids.
Fig. 8-3 The predicted Local Distance Difference Test and predicted aligned error of NA with the deletion of 9 amino acids.

As for Delete-24, its predicted results(The portion highlighted in red indicates the positions of deleted segment), MSA, pLDDT, and pAE are as follows:

Fig. 9-1 The predicted structure of NA with the deletion of 24 amino acids was determined using AlphaFold2.
Fig. 9-2 The multiple sequence alignment of NA with the deletion of 24 amino acids.
Fig. 9-3 The predicted Local Distance Difference Test and predicted aligned error of NA with the deletion of 24 amino acids.
Prediction Convergence

AlphaFold2 can quickly reach its final state when predicting the structures of simple proteins, but it requires a longer period to converge for more complex protein structures. In order to investigate whether the predicted protein structures have converged in the prediction results, a comparison was made between the structures of si28 and si14 at epoch numbers of 20 and 40. It was found that the structures they obtained were very close in the end. Therefore, it can be considered that the predicted protein structures have converged.

For the insertion of 14 amino acid residues, we used AlphaFold to predict their structures over 20 and 40 epochs, and then compared the structural differences using PyMOL.

Fig. 10 The align result of NA with the insertion of 14 amino acids.

For the insertion of 28 amino acid residues, we used AlphaFold to predict their structures over 20 and 40 epochs, and then compared the structural differences using PyMOL.

Fig. 11 The align result of NA with the insertion of 28 amino acids.

The root mean square deviation (RMSD) of atomic positions is a measure of the average distance between corresponding atoms (typically backbone atoms) in superimposed proteins.Typically, a lower RMSD value is considered better.

The RMSD for SI14 and SI28 is 0.073 and 0.010, respectively. The structures at different cycles for each of them show no significant differences, indicating that the predicted protein structures have converged.

Protein-Protein Docking

Abstract

Scientific Problems

Since we have performed sequence insertions and deletions in the NA protein and predicted their structures, we are uncertain whether the structural modifications in the modified NA protein will lead to changes in its binding to TfR1.

Solution Idea

Protein-protein docking is a computational method used to simulate the interactions between two or more proteins, aiming to predict their binding configurations and affinities. With the continuous development of computational algorithms, scoring functions, and data-driven approaches, protein-protein docking has evolved into a vital tool in the field of life sciences, facilitating research on protein interactions, drug discovery, and protein engineering. Despite existing challenges, protein-protein docking holds promise for applications in scientific research and pharmaceutical development.

Expected Results
  • Calculate the interface and energy of NA and TfR1 free docking.
  • Identify the amino acids with strong mutual interactions in the two proteins for subsequent quantum chemistry calculations.
  • Simulate the docking of NA and TfR1 through computational methods.

Technical Principles

pyDock is a computational tool developed for studying and predicting protein-protein interactions. It primarily focuses on providing insights into the structural aspects of these interactions. Here's an overview of pyDock:

  • Functionality: pyDock is designed to predict the structural aspects of protein-protein interactions, with a particular emphasis on rigid-body docking. It employs various scoring functions, including electrostatics, desolvation energy, and limited van der Waals contribution, to evaluate the quality of docking orientations.
  • Algorithms: It utilizes FFT-based algorithms, such as FTDock, to generate potential docking poses for the interacting proteins. These poses are then assessed using the pyDock scoring function to identify the best docking orientations.
  • Applications: pyDock is valuable for researchers in structural biology and related fields who seek to understand the structural aspects of protein-protein interactions. It aids in predicting how proteins come together to form complexes.

On the other hand, pyDockWEB is a web server that provides a user-friendly interface for accessing pyDock's capabilities. Here's an overview of pyDockWEB[6]:

  • Web-Based Platform: pyDockWEB is an online platform that allows users to submit their protein structures (the 3D coordinates of two interacting proteins) through a web interface.
  • Predictive Services: Upon receiving the protein coordinates, pyDockWEB utilizes pyDock's algorithms and scoring functions to perform structural predictions. It returns the best rigid-body docking orientations for the submitted proteins.
  • Scoring Function: The scoring function employed by pyDockWEB includes factors such as electrostatics, desolvation energy, and limited van der Waals contributions. These factors are used to assess the quality of the predicted docking orientations.

Rosetta is a comprehensive software suite for modeling macromolecular structures. As a flexible, multi-purpose application, it includes tools for structure prediction, design, and remodeling of proteins and nucleic acids. Since 1998, Rosetta web servers have run billions of structure prediction and protein design simulations, and billions or trillions more have been run on supercomputer clusters[7].

Researchers use Rosetta to better understand treatments of infectious diseases, cancers, and autoimmune disorders. Further applications involve the development of vaccines, new materials, targeted protein binders, and enzyme design.

Rosetta began as a structure prediction tool, and has consistently been a strong performer in the Critical Assessment of Structure Prediction (CASP) community-wide blind prediction exercises. It has grown to offer a wide variety of effective sampling algorithms to explore backbone, side-chain and sequence space, and its excellence has generalized to more community-wide exercises including RNA-puzzles and Critical Assessment of PRotein Interactions (CAPRI). Rosetta boasts broadly tested scoring (energy) functions and contains an unparalleled breadth of applications from folding to docking to design.

Results and Discussion

Binding Pose

We have obtained the structure of the modified NA using AlphaFold2. Next, we will investigate the binding interaction between NA and TfR1 using pyDock, and attempt to qualitatively analyze the binding interaction between the virus and tumor cells based on the predicted results.

We first used PyDock to preliminarily explore the binding states of four predicted structures with TfR1(PDB ID:1SUV). Using the four structures as ligands and TfR1 as the receptor, we initially allowed them to dock freely. The docking results were then sorted, and the best docking states for each were displayed.

Insert-14 with TfR1:

Fig. 12-1 The binding pose of NA with the insertion of 14 amino acids to TfR1.

Insert-28 with TfR1:

Fig. 12-2 The binding pose of NA with the insertion of 28 amino acids to TfR1.

Delete-9 with TfR1:

Fig. 12-3 The binding pose of NA with the deletion of 9 amino acids to TfR1.

Delete-24 with TfR1:

Fig. 12-4 The binding pose of NA with the deletion of 24 amino acids to TfR1.

Due to the observed disparities between the predicted and actual protein structures, some of the protein-protein docking results are less reliable. Then we restrained the docking sites of INSERT-28 and INSERT-14 to the head region of NA and subsequently compared their top 10 docking outcomes.

Restrained insert-14 with TfR1:

Fig. 12-5 The restrained binding pose of NA with the insertion of 14 amino acids to TfR1.

Restrained insert-28 with TfR1:

Fig. 12-6 The restrained binding pose of NA with the insertion of 28 amino acids to TfR1.

Below are the top 10 scoring complexes for each of the six docking scenarios mentioned above:

Conf Ele Desolv VDW Total RANK
3773 -21.221 -17.865 32.385 -35.848 1
27 -22.753 -15.409 34.624 -34.700 2
49 -23.403 -17.937 70.862 -34.254 3
66 -22.555 -15.748 42.657 -34.037 4
1103 -22.826 -12.336 18.966 -33.265 5
2957 -23.983 -6.367 -5.691 -30.919 6
3929 -14.823 -18.411 34.965 -29.738 7
9022 -17.340 -12.717 5.852 -29.472 8
1549 -19.767 -9.760 28.889 -26.638 9
264 -21.957 -14.317 102.712 -26.003 10
table. 1. The top 10 docking results of Insert-14 with TfR1:
Conf Ele Desolv VDW Total RANK
940 -23.440 -19.393 60.159 -36.817 1
9101 -18.558 -16.044 15.990 -33.003 2
426 -22.368 -16.249 66.277 -31.989 3
5135 -8.381 -31.885 113.747 -28.891 4
2282 -23.589 -11.497 73.340 -27.752 5
1270 -20.864 -18.246 115.257 -27.585 6
386 -21.418 -12.918 69.598 -27.376 7
7439 -4.074 -30.570 80.983 -26.545 8
6282 -21.574 -8.113 34.229 -26.265 9
822 -26.372 -6.331 65.737 -26.129 10
table. 2. The top 10 docking results of Insert-28 with TfR1:
Conf Ele Desolv VDW Total RANK
4013 -37.735 -2.817 -32.463 -43.798 1
4008 -17.058 -20.229 31.478 -34.139 2
2420 -15.568 -19.289 22.107 -32.646 3
5828 -19.634 -9.540 -16.283 -30.803 4
4511 -20.089 -15.028 61.492 -28.968 5
5800 -24.615 -5.355 11.137 -28.856 6
2737 -10.754 -23.431 54.599 -28.725 7
2998 -10.861 -23.195 56.826 -28.374 8
9266 -24.351 -2.255 -8.501 -27.456 9
7073 -22.812 -8.298 43.351 -26.775 10
table. 3. The top 10 docking results of Delete-9 with TfR1:
Conf Ele Desolv VDW Total RANK
7703 -29.995 -2.332 -23.245 -34.651 1
6484 -26.766 -7.388 1.565 -33.997 2
9153 -29.034 -6.918 22.928 -33.659 3
8844 -15.825 -16.123 -2.201 -32.168 4
6829 -26.091 -4.627 -13.630 -32.081 5
3544 -21.154 -14.105 47.515 -30.507 6
917 -25.163 -9.679 45.251 -30.318 7
7296 -28.028 2.416 -41.867 -29.799 8
2154 -21.186 -5.412 -21.411 -28.739 9
8051 -9.118 -24.172 46.513 -28.638 10
table. 4. The top 10 docking results of Delete-24 with TfR1:
Conf Ele Desolv VDW relRST Total RANK
1138 -22.741 -12.364 18.167 7.349 -40.637 1
50 -23.363 -17.955 70.387 6.299 -40.578 2
25 -22.695 -15.412 34.620 5.512 -40.157 3
65 -22.529 -15.752 42.736 5.512 -39.519 4
2950 -21.359 -17.557 33.264 1.837 -37.427 5
5148 -22.857 -6.277 -5.319 5.774 -35.440 6
8544 -17.137 -12.504 5.874 5.512 -34.566 7
4511 -23.895 -3.694 18.670 8.399 -34.121 8
8137 -11.906 -19.657 31.963 4.462 -32.829 9
105 -21.972 -14.346 102.738 6.299 -32.343 10
table. 5. The top 10 docking results of Restrained insert-14 with TfR1:
Conf Ele Desolv VDW relRST Total RANK
851 -23.499 -19.433 60.382 0.000 -36.894 1
2738 -27.255 -6.373 67.237 8.136 -35.040 2
6729 -21.633 -8.434 33.382 8.136 -34.865 3
1858 -27.740 -6.351 75.684 8.136 -34.659 4
961 -21.268 -13.027 70.815 5.249 -32.463 5
402 -22.380 -16.349 66.357 0.000 -32.093 6
6275 -11.302 -20.723 37.487 1.575 -29.851 7
8455 -14.753 -12.155 40.347 6.562 -29.435 8
9467 -24.090 -11.485 69.066 0.000 -28.668 9
5395 -12.425 -25.094 92.783 0.262 -28.503 10
table. 6. The top 10 docking results of Restrained insert-28 with TfR1:
  • Conf:Column containing the conformation number of the docking pose as in the rot file .
  • Ele:Electrostatic energy component.
  • Desolv:Desolvation energy component.
  • relRST:Percentage of the restrictions imposed that the conformation complies with.
  • VDW:Van der Waals energy component (term weighted to 0.1 by default).
  • Total:Total binding energy (representing the sum of the 3 previous energies).
  • RANK:Conformation rank according to its computed binding energy.

Due to the unusual nature of the best docking conformations for NA with the insertion of 14 and 28 amino acids into TfR1, the practical significance of their top docking scores is limited. Among the remaining four groups, we can see that the best docking score is achieved when nine amino acids are deleted from NA, followed by the insertion of 14 amino acids into NA. This suggests that when considering alterations to the stalk region of NA, it may also affect the affinity of NA for TfR1, potentially having implications in real-world scenarios.

Binding Site Analysis

Since the above binding pose was generated using a predicted structure, it inherently contains some level of error compared to the actual situation. Therefore, when delving deeper into the analysis of protein-protein interactions, we utilize experimentally determined structures of NA (PDB ID:3BEQ) and TfR1 (PDB ID:1SUV)for analysis. We employ Rosetta for analyzing protein-protein interactions.

First, we removed unwanted solvent and other molecular components from the obtained PDB files and performed PDB file repairs on NA and TfR1, resulting in input conformations for the docking of the two large molecules. NA was treated as the ligand, while TfR1 was treated as the receptor. Using PyDock, we obtained multiple docking conformations and we selected the one with the highest overall docking score for protein-protein interaction analysis using Rosetta.

Fig. 13-1 The optimal docking conformation of the protein-protein interaction between the NA head and TfR1.
Fig. 13-2 An enlarged view of the docking interface of the docking complex, with annotations highlighting the amino acid residues involved in the formation of polar bonds between NA and TfR1 , respectively numbered as G461 , R107 , K111 of NA and S644 , D468 , R651 , E612 of TfR1. The short yellow lines represent the formed polar bonds.

Then we performed protein-protein interaction calculations for TfR1 and NA using RosettaDock, and selected the two amino acid residues with the strongest binding interactions for quantum dynamics calculations.

click to download short-range.csv
Fig. 13-3 Residue-wise calculated amino acid affinity within interacting regions.
Fig. 13-4 The amino acid residues with stronger interactions in NA and TfR1, with the ones enclosed in red boxes being the most affinity-enhancing.

In this case, NA-Lys111(D) and TfR1-Glu612(B) will be selected for further calculations.

Gaussian quantum computation

An introduction to Gaussian and Quantum computation

Chemistry, as a discipline that studies various molecular structures and properties, focuses on the electrons outside the atomic nucleus. It is the distinct electronic structures of atoms and molecules that determine their unique properties. Guided by this principle, chemists strive to obtain precise quantitative information about electronic structures to gain fundamental insights into chemical processes. However, it is unfortunate that the behavior of electrons is governed by the wave equation within the framework of quantum mechanics, and for multi-electron systems, the equations derived from quantum mechanics are so complex that they are mathematically intractable. As a result, the scientific community has historically relied on empirical formulas, qualitative methods, and semi-quantitative approaches to describe systems, but the accuracy of these methods is often insufficient for practical applications.

Until the end of the last century, continuous iterations in computer technology led to a significant improvement in computational power. Simultaneously, various approximation methods for the Schrödinger equation were developed, making it possible to perform quantum chemistry calculations with the assistance of computers. Among them, the ab initio methods and density functional theory (DFT) emerged as outstanding representatives of first-principles approaches, showcasing remarkable achievements in quantum chemistry calculations.

As a widely used quantum computational software, Gaussian performs the core part of its calculations by solving the Schrödinger equation using approximation methods of different accuracies. Specifically, in the case of ab initio calculations, the Bohr-Oppenheimer approximation is first employed to simplify the system. The Hartree-Fock self-consistent field equations are then solved, incorporating various levels of electron correlation energy corrections through perturbation theory. This process ultimately yields an approximate solution to the Schrödinger equation.

On the other hand, if density functional theory (DFT) is employed, the solution to the exchange-correlation potential term in the Kohn-Sham equation is obtained using techniques such as local density approximation, generalized gradient approximation, or hybrid functionals. This allows for an approximate solution to the Kohn-Sham equation, which is equivalent to obtaining an approximate solution to the Schrödinger equation.

Once the computational method is determined, various quantitative research methods can be employed for the input system, such as conformational optimization, energy calculations, and transition state search. These methods can be further enhanced by selecting appropriate atomic orbital basis sets, incorporating solvent effects, applying corrections for polarization and dispersion effects, and considering weak interactions such as dispersion forces and long-range electromagnetic interactions.

By considering these factors and applying the necessary modifications, a comprehensive quantitative analysis can be conducted on the system of interest.

Relying on quantum chemistry calculations provides us with a deeper understanding of chemical systems, helping us comprehend the essence of chemical processes and even make quantitative predictions. The results obtained from quantum chemistry calculations allow us to explore the electronic structure, chemical bonding, energetics, and reactivity of molecules and materials with high precision. This knowledge can be utilized to elucidate reaction mechanisms, optimize reaction conditions, design novel compounds, and predict properties and behaviors of chemical systems under different conditions.

Quantum chemistry calculations serve as powerful tools in guiding experimental investigations and unveiling the underlying principles governing chemical phenomena. By leveraging the quantitative insights gained from these calculations, researchers can make informed decisions and contribute to advancements in various fields, including drug discovery, materials science, catalysis, and environmental chemistry.

Gaussian quantum computation

Based on the molecular docking results obtained from Rosetta Dock, we identified the lowest energy residue interaction pair from the computed interaction energies. It consists of glutamic acid at position 612 in chain B and lysine at position 111 in chain D.

Fig. 14-1.The result of pyDock.

The crystal structure analysis using Pymol indicates that the docking region involves two α-helices, one from chain B and the other from chain D, which are in contact with each other. The critical binding residues within this region are identified as B612E and D111K.

Fig. 14-2.The core site of docking complex.

Considering that molecular docking based on Rosetta Dock relies mainly on molecular force fields and empirical formulas to calculate conformational potential energy, its accuracy is limited, and often qualitative analysis is performed. Therefore, in order to accurately characterize the residue status of the docking site, we have undertaken quantitative calculations and research on the core residues using the Gaussian quantum chemistry software.

Specifically, we begin by using Pymol to select the two residues of interest at the docking site. Subsequently, we extract and convert them into GIF format files to serve as the basis for the Gaussian input files.

Fig. 14-3. Two residues of core site.

Taking into account the time complexity of common ab initio and DFT calculation methods, it is typically proportional to the cube of the number of atoms N, denoted as t-N3. To simplify the calculations, we have chosen to input only the outermost fragment of the system, building upon the existing foundation.

Fig. 14-4. Part of the two residues that selected to be computed next.

To strike a balance between the demands of quantitative research and computational cost, we have conducted thorough investigations and selected the B3LYP calculation method within the density functional theory (DFT) framework. The B3LYP method, known as a hybrid functional, offers a high level of accuracy and applicability, particularly in large molecular systems. Moreover, it incorporates electron correlation energy through the orbital approximation provided by the Kohn-Sham equation, eliminating the need for additional multi-level perturbation system corrections required in various ab initio methods.

Formula. 1. Kohn-Shame equation, which is the cornerstone equation in the field of electronic structure.

To optimize the conformation and hydrogen bonding angles, we have selected the widely used 6-31G atomic orbital basis set. In order to account for higher-order angular momentum (advanced d and p orbital components) and polarization effects, we have incorporated correction factors, such as polarizable continuum models. Additionally, to achieve more accurate calculations of various energy terms, we have considered the influence of electron clouds over a larger spatial range, including the inclusion of dispersion functionals to account for weak intermolecular interactions, as well as dispersion corrections for common macroscopic systems. Finally, the system is configured in a solvent environment, specifically water, and the initial Cartesian coordinates of the system's conformation are imported to commence calculation and optimization.

Fig. 14-5. The input Gaussian file.

By iteratively solving the Hartree-Fock self-consistent field equations, the system begins the search for a fixed point. Through the convergence criteria, the iteration process is terminated, and a final computation result is obtained through a stationary calculation. This final result includes the optimized conformation and various energy terms of the system.

Fig. 14-6. Computation process and control panel.

convergence criteria

Fig. 14-7. The convergence criteria of Hartree-Fock self consistent equation.

Using the self-consistent iteration process in Gaussian, we have obtained the computationally optimized conformation with the lowest energy after the docking process.

Fig. 14-8. The optimized conformation of two residues part.

The hydrogen bonding angle in the obtained optimized conformation is 175.37°, and the hydrogen bond length is 1.6 Å.

Fig. 14-9. The optimized hydrogen bonding angle.
Fig. 14-10. The optimized hydrogen bond length.

The quantitative results obtained are consistent with the general understanding of hydrogen bonding interactions in the scientific community. The hydrogen bond length indicates a strong interaction, which aligns with the nature of acid-base residue docking. Furthermore, based on the energy calculations, the single-point energy is -325.0316 Hartree, and the Gibbs free energy after correction is -324.9396 Hartree.

Fig. 14-11. Summary of output file in energy(after optimized)

By comparing this result with the energy before docking, the change in Gibbs free energy is determined.

Formula. 2. The Gibbs free energy change for calculating conformational changes.
Fig. 14-12. Summary of output file in energy(before optimized).

In conjunction with thermodynamic principles, the change in free energy is related to the power of the transformation equilibrium. It can be quantitatively described by a specific formula.

Formula. 3. Computing the theoretical equilibrium constant based on the Gibbs free energy change.

After calculation, the change in free energy leads to an enlargement of the equilibrium constant by a factor of 10^7. This suggests that the docking of the core binding site contributes to the protein's affinity at a similar magnitude.

Our quantum chemistry calculations, research, and optimization are now completed.

Molecular Dynamic Simulation

Abstract

Scientific Problems

Because the predicted structure of NA that we obtained is based on homologous sequences and does not take into account the interactions of different amino acids within the same protein, it does not consider the influence of the protein's environment. In reality, protein structures can undergo changes in different environments, such as solvents, solutes, temperature, pressure, and so on. We want to investigate what kind of alterations the predicted structure we obtained might undergo when it is exposed to conditions that are closer to the real environment.

Solution Idea

Molecular dynamics simulation is a powerful computational technique used in the field of molecular modeling and computational chemistry to study the behavior of atoms and molecules over time. It provides a window into the dynamic world of molecules by numerically solving the equations of motion for individual particles within a simulated system.

In this method, a molecular system is represented as a collection of atoms or molecules, each with defined positions and velocities. These particles interact with each other according to a chosen force field, which describes the potential energy and forces between them. By iteratively updating the positions and velocities of these particles over very short time steps, typically on the order of femtoseconds, one can observe how the system evolves over time.

Molecular dynamics simulations offer valuable insights into various phenomena, such as protein folding, chemical reactions, phase transitions, and material properties. They are widely used in fields ranging from biochemistry and material science to drug discovery and nanotechnology, allowing researchers to gain a deeper understanding of molecular behavior and design novel molecules and materials with specific properties.

Expected Results

We will individually introduce the four protein structures obtained into a solvent along with ions to investigate their respective specific changes during the simulation. By simulating their behavior in a more realistic environment, we aim to explore the stability of predicted protein structures through computational methods.

Technical Priciples

GROMACSis a versatile and high-performance software package used for molecular dynamics simulations. It allows researchers to simulate the behavior of systems with hundreds to millions of particles, primarily focusing on biochemical molecules like proteins, lipids, and nucleic acids. However, it is also used for non-biological systems such as polymers and fluid dynamics[8].

Key features of GROMACS include:

  • Exceptional Performance: GROMACS is known for its exceptional computational speed due to algorithmic optimizations, SIMD intrinsics for CPUs, and GPU support.
  • CPU and GPU Usage: It can utilize both CPU and GPU resources simultaneously, with load-balancing options.
  • Advanced Algorithms: GROMACS includes state-of-the-art algorithms for extending simulation time steps while maintaining accuracy.
  • Topology Builder: It provides an automated topology builder for various biomolecules and small molecules.
  • Besides, Gromacs offers user-friendly features such as hardware compatibility, CPU and GPU usage, parallel execution, and lossy compression.

Result and Discussion

Firstly , We used charmm-gui to Generate Topology , Define box and Solvate , Add Ions . Then We got gromacs input files for each predicted structure .

The following are the input files for GROMACS for insert-14, insert-28, delete-9, and delete-24. As you can see, the solvent and ions are simply mechanically added to the PDB file, without any energy minimization or system equilibration.

Fig. 15-1. This is a PDB file for molecular dynamics simulations with the insertion of 14(left) and 28(right) amino acids into NA.Among them, the purple spheres represent chloride ions, the green spheres represent sodium ions, and the red dots represent water molecules. The input NA structure is presented in green.
Fig. 15-2.This is a PDB file for molecular dynamics simulations with the deletion of 9(left) and 24(right) amino acids into NA.Among them, the purple spheres represent chloride ions, the green spheres represent sodium ions, and the red dots represent water molecules. The input NA structure is presented in green.

Before we can begin dynamics, we must ensure that the system has no steric clashes or inappropriate geometry. The structure is relaxed through a process called energy minimization (EM).

As shown in the figure, as the step size increases, the potential energy of the system gradually decreases, and it stops energy minimization when it reaches a certain value.

Fig. 15-3.Here is the potential energy change plot for NA with the insertion of 14 amino acids(left) and 28 amino acids(right) during energy minimization. It can be observed that with increasing time, the potential energy of both systems decreases.
Fig. 15-4.Here is the potential energy change plot for NA with the deletion of 9 amino acids(left) and 24 amino acids(right) during energy minimization. It can be observed that with increasing time, the potential energy of both systems decreases.

Energy minimization (EM) ensures that we start with a reasonable structure concerning geometry and solvent orientation. However, before we can commence actual dynamics simulations, we need to equilibrate the solvent and ions surrounding the protein. Attempting unrestrained dynamics at this stage could lead to system collapse. The main reason is that the solvent is primarily optimized within itself rather than with the solute. We must first bring the solvent to the desired simulation temperature and establish the correct orientation around the solute (the protein). Once we've achieved the correct temperature based on kinetic energies, we'll apply pressure to the system until it reaches the desired density.

Fig. 15-5.These are the temperature and pressure variation plots for NA with the insertion of 14 amino acids during system equilibration. It can be observed from the plots that the temperature and pressure remain relatively stable.
Fig. 15-6.These are the temperature and pressure variation plots for NA with the insertion of 28 amino acids during system equilibration. It can be observed from the plots that the temperature and pressure remain relatively stable.
Fig. 15-7.These are the temperature and pressure variation plots for NA with the deletion of 9 amino acids during system equilibration. It can be observed from the plots that the temperature and pressure remain relatively stable.
Fig. 15-8.These are the temperature and pressure variation plots for NA with the deletion of 24 amino acids during system equilibration. It can be observed from the plots that the temperature and pressure remain relatively stable.

After successfully completing the two equilibration phases, the system has reached a stable state at the desired temperature and pressure. We can now proceed by removing the position restraints and initiating production molecular dynamics simulations for data acquisition.

Fig. 15-9.This depicts the molecular dynamics simulation of NA with 14 amino acids inserted into a solution with water as the solvent and NaCl as the solute. On the left is the total energy of the system, which shows fluctuations within a certain range, indicating relative stability. On the right is a visualization of the molecular dynamics simulation process for this system.
Fig. 15-10.This depicts the molecular dynamics simulation of NA with 28 amino acids inserted into a solution with water as the solvent and NaCl as the solute. On the left is the total energy of the system, which shows fluctuations within a certain range, indicating relative stability. On the right is a visualization of the molecular dynamics simulation process for this system.
Fig. 15-11.This depicts the molecular dynamics simulation of NA with 9 amino acids deleted into a solution with water as the solvent and NaCl as the solute. On the left is the total energy of the system, which shows fluctuations within a certain range, indicating relative stability. On the right is a visualization of the molecular dynamics simulation process for this system.
Fig. 15-12.This depicts the molecular dynamics simulation of NA with 24 amino acids deleted into a solution with water as the solvent and NaCl as the solute. On the left is the total energy of the system, which shows fluctuations within a certain range, indicating relative stability. On the right is a visualization of the molecular dynamics simulation process for this system.

Summary

Our physical simulation focuses on two crucial proteins, TfR1 and NA, which are key in experiments. Starting from the sequences used in wet lab experiments, we have conducted structure prediction, protein docking, quantum chemistry calculations, and molecular dynamics simulations. All of these processes rely on computational methods to simulate their specific behaviors in the microscopic world. Furthermore, based on the simulation results obtained, in conjunction with the specifics of wet lab experiments, they assist us in better exploring unknown results and phenomena.

In addition to the content presented on our website, we also explored various other physical simulations during our iGEM modeling process. These included enhancing sampling through temperature-based changes and employing QMMM (Quantum Mechanics/Molecular Mechanics) simulations using CHARMM. However, due to existing constraints, including our understanding of these methods, the computational resources at our disposal, and time limitations, we have decided to leave these aspects for further exploration in our ongoing learning journey or for future teams to use as references.

Reference

[1]C. Martin Lawrence et al. ,Crystal Structure of the Ectodomain of Human Transferrin Receptor.Science286,779-782(1999).DOI:10.1126/science.286.5440.779

[2]McAuley Julie L. et al. ,Influenza Virus Neuraminidase Structure and Functions.Frontiers in Microbiology(VOLUME:10,YEAR:2019).DOI:10.3389/fmicb.2019.00039

[3]V. Moules et al. ,In vitro characterization of naturally occurring influenza H3NA− viruses lacking the NA gene segment: Toward a new mechanism of viral resistance?Virology,Volume 404, Issue 2,2010,Pages 215-224.DOI:10.1016/j.virol.2010.04.030.

[4]Colman, P., Varghese, J. & Laver, W. Structure of the catalytic and antigenic sites in influenza virus neuraminidase. Nature 303, 41–44 (1983).DOI:/10.1038/303041a0.

[5]Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). DOI:10.1038/s41586-021-03819-2

[6]Jimenez-Garcia B., Pons C. and Fernandez-Recio J. pyDockWEB: a web server for rigid-body protein-protein docking using electrostatics and desolvation scoring. Bioinformatics (2013) 29(13):1698-1699.

[7]Marcel G. Schaap, Feike J. Leij, Martinus Th. van Genuchten.rosetta: a computer program for estimating soil hydraulic parameters with hierarchical pedotransfer functions.Journal of Hydrology,Volume 251, Issues 3–4,2001,Pages 163-176. DOI:10.1016/S0022-1694(01)00466-8.

[8]Van Der Spoel, D., Lindahl, E., Hess, B., Groenhof, G., Mark, A.E. and Berendsen, H.J.C. (2005), GROMACS: Fast, flexible, and free. J. Comput. Chem., 26: 1701-1718.DOI:10.1002/jcc.20291