Introduction
Structure
Prediction
Docking
Quantum Computation
MD Simulation
Summary
Reference
Through the previous experiments conducted by Professor Zhou's team, it was found that oncolytic viruses can bind to the transferrin receptor 1 (TfR1) on the surface of tumor cells by interacting with the NA protein on their surface, thereby mediating viral internalization. Therefore, the initial experiments will focus on these two proteins.
Our physical simulations primarily include protein structure prediction, protein-protein docking, quantum chemistry calculations, and molecular dynamics simulations.
The transferrin receptor (TfR) is involved in a dynamic process of endocytosis and subsequent reemergence at the cell surface. During this process, the receptor internalizes iron-loaded transferrin (Tf) through clathrin-mediated endocytosis. Once inside the cell, iron is released from Tf in the endosome, allowing the TfR to recycle apotransferrin back to the cell surface for further use.
Human TfR1 is a homodimeric type II transmembrane protein consisting of two 90-kD subunits. Each subunit comprises a short NH2-terminal cytoplasmic region (residues 1 to 67) containing the internalization motif YTRF, followed by a single transmembrane pass (residues 68 to 88). The large extracellular portion (ectodomain, residues 89 to 760) of each subunit contains a binding site for the 80-kD Tf molecule.
The TfR1 dimer has a globular extracellular structure, separated from the membrane by a stalk of about 30Å, which includes residues immediately following the transmembrane pass and an O-linked glycan at Thr104, along with two intermolecular disulfide bonds (formed by Cys89 and Cys98). Notably, the intermolecular disulfide bonds are not required for dimerization.
In the ribbon diagrams, we can observe three domains of the protein structure: domain I, which is the protease-like domain (labeled as A); domain II, known as the apical domain (labeled as B); and domain III, referred to as the helical domain (labeled as C). The secondary structure elements are appropriately labeled on the diagrams. In the text, we identify these elements first based on their respective domain numbers and then based on their linear order within each domain[1].
The viral NA (Neuraminidase) assembles as a tetramer of four identical polypeptides and constitutes approximately 10-20% of the total glycoproteins on the virion surface when embedded in the viral envelope. An average-sized virion of 120 nm typically possesses around 40-50 NA spikes and 300-400 HA spikes[2][3].Each of the four NA monomers consists of approximately 470 amino acids and folds into four distinct structural domains: the cytoplasmic tail, the transmembrane region, the stalk, and the catalytic head.
The NA active site comprises two parts: an inner shell consisting of eight highly conserved residues (Arg118, Asp151, Arg152, Arg224, Glu276, Arg292, Arg371, and Tyr406) that directly interact with sialic acids. Additionally, there is an outer shell of ten residues (Glu119, Arg156, Trp178, Ser179, Asp198, Ile222, Glu227, Glu277, Asn294, and Glu425) which do not interact with sialic acid but play a crucial structural role, defined as framework residues[4].Among the inner shell residues, three arginine residues (Arg118, 292, 371) interact with the carboxylate of the sialic acid substrate. Arg152 binds to the acetamido group on the sugar ring, while Glu276 interacts with the 8- and 9-hydroxyl groups on the glycerol side chain. The active site of the enzyme is highly conserved in terms of spatial orientation and sequence properties, making it an excellent target for drug inhibition.
Proteins play a crucial role in life, and understanding their structure is essential to comprehend their functions. Although significant efforts have led to the determination of structures for about 100,000 unique proteins, this is just a fraction of the vast number of known protein sequences. The process of determining protein structures is time-consuming, causing a bottleneck in achieving comprehensive structural coverage. To overcome this limitation, accurate computational methods are necessary to predict a protein's three-dimensional structure based solely on its amino acid sequence, a longstanding problem known as the "protein folding problem."
Solution ideasIn recent years, with the continuous development of computer hardware and algorithms, artificial intelligence, especially neural networks, has made significant advancements. In addressing the challenging task of protein structure prediction from sequences, researchers have started to explore the use of neural networks as a potential solution.One of the most remarkable advancements in this field is AlphaFold2.
Expected resultsAlphaFold 2 is a cutting-edge deep learning system developed by DeepMind, an artificial intelligence research lab under Alphabet Inc. It is designed to predict the 3D structure of proteins, a crucial task in the field of biology and bioinformatics. Released in 2020, AlphaFold 2 utilizes advanced deep learning techniques, including deep convolutional neural networks and attention mechanisms, to accurately predict the folding of proteins from their amino acid sequences[5].
1.Trunk of the Network (Evoformer):
2.Structure Module:
Besides,the above is just a rough introduction to AlphaFold2. For more information about AlphaFold, please refer to its supplementary materials.[4]
Our project experimentally inserted amino acids into the stalk region of the NA receptor on the virus surface. To explore the structural changes of NA before and after the insertion of amino acids, we utilized alphafold2 to predict the structures of NA-Insert-14, NA-Insert-28, NA-Delete-9, NA-Delete-24 respectively(The numbers following indicate the quantity of inserted or deleted amino acids).The predictions were conducted on Google Colab platform.
pLDDT (predicted Local Distance Difference Test) evaluates the deviation of the distance between each pair of amino acid residues in the predicted structure from statistically acceptable distance distributions. Higher plDDT scores indicate better quality of the predicted structure.
PAE (Predicted Aligned Error) measures the average root-mean-square error of the backbone atoms between the predicted structure and its sequence comparative template. Lower PAE values indicate the predicted structure aligns better with the template structure and thus is of higher quality.
An MSA contains multiple homologous protein sequences aligned together.The per-residue counts provide an indication of how conserved each alignment position is and higher counts indicate more conserved positions. Regions with very low per-residue counts are more likely to be variable or gapped regions.
As for Insert-14, its predicted structure(the portion highlighted in blue represents the predicted results of insertion segment), MSA, pLDDT, and pAE are as follows:
As for Insert-28, its predicted structure(the portion highlighted in blue represents the predicted results of insertion segment), MSA, pLDDT, and pAE are as follows:
As for Delete-9, its predicted results(The portion highlighted in red indicates the positions of deleted segment), MSA, pLDDT, and pAE are as follows:
As for Delete-24, its predicted results(The portion highlighted in red indicates the positions of deleted segment), MSA, pLDDT, and pAE are as follows:
AlphaFold2 can quickly reach its final state when predicting the structures of simple proteins, but it requires a longer period to converge for more complex protein structures. In order to investigate whether the predicted protein structures have converged in the prediction results, a comparison was made between the structures of si28 and si14 at epoch numbers of 20 and 40. It was found that the structures they obtained were very close in the end. Therefore, it can be considered that the predicted protein structures have converged.
For the insertion of 14 amino acid residues, we used AlphaFold to predict their structures over 20 and 40 epochs, and then compared the structural differences using PyMOL.
For the insertion of 28 amino acid residues, we used AlphaFold to predict their structures over 20 and 40 epochs, and then compared the structural differences using PyMOL.
The root mean square deviation (RMSD) of atomic positions is a measure of the average distance between corresponding atoms (typically backbone atoms) in superimposed proteins.Typically, a lower RMSD value is considered better.
The RMSD for SI14 and SI28 is 0.073 and 0.010, respectively. The structures at different cycles for each of them show no significant differences, indicating that the predicted protein structures have converged.
Since we have performed sequence insertions and deletions in the NA protein and predicted their structures, we are uncertain whether the structural modifications in the modified NA protein will lead to changes in its binding to TfR1.
Solution IdeaProtein-protein docking is a computational method used to simulate the interactions between two or more proteins, aiming to predict their binding configurations and affinities. With the continuous development of computational algorithms, scoring functions, and data-driven approaches, protein-protein docking has evolved into a vital tool in the field of life sciences, facilitating research on protein interactions, drug discovery, and protein engineering. Despite existing challenges, protein-protein docking holds promise for applications in scientific research and pharmaceutical development.
Expected ResultspyDock is a computational tool developed for studying and predicting protein-protein interactions. It primarily focuses on providing insights into the structural aspects of these interactions. Here's an overview of pyDock:
On the other hand, pyDockWEB is a web server that provides a user-friendly interface for accessing pyDock's capabilities. Here's an overview of pyDockWEB[6]:
Rosetta is a comprehensive software suite for modeling macromolecular structures. As a flexible, multi-purpose application, it includes tools for structure prediction, design, and remodeling of proteins and nucleic acids. Since 1998, Rosetta web servers have run billions of structure prediction and protein design simulations, and billions or trillions more have been run on supercomputer clusters[7].
Researchers use Rosetta to better understand treatments of infectious diseases, cancers, and autoimmune disorders. Further applications involve the development of vaccines, new materials, targeted protein binders, and enzyme design.
Rosetta began as a structure prediction tool, and has consistently been a strong performer in the Critical Assessment of Structure Prediction (CASP) community-wide blind prediction exercises. It has grown to offer a wide variety of effective sampling algorithms to explore backbone, side-chain and sequence space, and its excellence has generalized to more community-wide exercises including RNA-puzzles and Critical Assessment of PRotein Interactions (CAPRI). Rosetta boasts broadly tested scoring (energy) functions and contains an unparalleled breadth of applications from folding to docking to design.
We have obtained the structure of the modified NA using AlphaFold2. Next, we will investigate the binding interaction between NA and TfR1 using pyDock, and attempt to qualitatively analyze the binding interaction between the virus and tumor cells based on the predicted results.
We first used PyDock to preliminarily explore the binding states of four predicted structures with TfR1(PDB ID:1SUV). Using the four structures as ligands and TfR1 as the receptor, we initially allowed them to dock freely. The docking results were then sorted, and the best docking states for each were displayed.
Insert-14 with TfR1:
Insert-28 with TfR1:
Delete-9 with TfR1:
Delete-24 with TfR1:
Due to the observed disparities between the predicted and actual protein structures, some of the protein-protein docking results are less reliable. Then we restrained the docking sites of INSERT-28 and INSERT-14 to the head region of NA and subsequently compared their top 10 docking outcomes.
Restrained insert-14 with TfR1:
Restrained insert-28 with TfR1:
Below are the top 10 scoring complexes for each of the six docking scenarios mentioned above:
Conf | Ele | Desolv | VDW | Total | RANK |
---|---|---|---|---|---|
3773 | -21.221 | -17.865 | 32.385 | -35.848 | 1 |
27 | -22.753 | -15.409 | 34.624 | -34.700 | 2 |
49 | -23.403 | -17.937 | 70.862 | -34.254 | 3 |
66 | -22.555 | -15.748 | 42.657 | -34.037 | 4 |
1103 | -22.826 | -12.336 | 18.966 | -33.265 | 5 |
2957 | -23.983 | -6.367 | -5.691 | -30.919 | 6 |
3929 | -14.823 | -18.411 | 34.965 | -29.738 | 7 |
9022 | -17.340 | -12.717 | 5.852 | -29.472 | 8 |
1549 | -19.767 | -9.760 | 28.889 | -26.638 | 9 |
264 | -21.957 | -14.317 | 102.712 | -26.003 | 10 |
Conf | Ele | Desolv | VDW | Total | RANK |
---|---|---|---|---|---|
940 | -23.440 | -19.393 | 60.159 | -36.817 | 1 |
9101 | -18.558 | -16.044 | 15.990 | -33.003 | 2 |
426 | -22.368 | -16.249 | 66.277 | -31.989 | 3 |
5135 | -8.381 | -31.885 | 113.747 | -28.891 | 4 |
2282 | -23.589 | -11.497 | 73.340 | -27.752 | 5 |
1270 | -20.864 | -18.246 | 115.257 | -27.585 | 6 |
386 | -21.418 | -12.918 | 69.598 | -27.376 | 7 |
7439 | -4.074 | -30.570 | 80.983 | -26.545 | 8 |
6282 | -21.574 | -8.113 | 34.229 | -26.265 | 9 |
822 | -26.372 | -6.331 | 65.737 | -26.129 | 10 |
Conf | Ele | Desolv | VDW | Total | RANK |
---|---|---|---|---|---|
4013 | -37.735 | -2.817 | -32.463 | -43.798 | 1 |
4008 | -17.058 | -20.229 | 31.478 | -34.139 | 2 |
2420 | -15.568 | -19.289 | 22.107 | -32.646 | 3 |
5828 | -19.634 | -9.540 | -16.283 | -30.803 | 4 |
4511 | -20.089 | -15.028 | 61.492 | -28.968 | 5 |
5800 | -24.615 | -5.355 | 11.137 | -28.856 | 6 |
2737 | -10.754 | -23.431 | 54.599 | -28.725 | 7 |
2998 | -10.861 | -23.195 | 56.826 | -28.374 | 8 |
9266 | -24.351 | -2.255 | -8.501 | -27.456 | 9 |
7073 | -22.812 | -8.298 | 43.351 | -26.775 | 10 |
Conf | Ele | Desolv | VDW | Total | RANK |
---|---|---|---|---|---|
7703 | -29.995 | -2.332 | -23.245 | -34.651 | 1 |
6484 | -26.766 | -7.388 | 1.565 | -33.997 | 2 |
9153 | -29.034 | -6.918 | 22.928 | -33.659 | 3 |
8844 | -15.825 | -16.123 | -2.201 | -32.168 | 4 |
6829 | -26.091 | -4.627 | -13.630 | -32.081 | 5 |
3544 | -21.154 | -14.105 | 47.515 | -30.507 | 6 |
917 | -25.163 | -9.679 | 45.251 | -30.318 | 7 |
7296 | -28.028 | 2.416 | -41.867 | -29.799 | 8 |
2154 | -21.186 | -5.412 | -21.411 | -28.739 | 9 |
8051 | -9.118 | -24.172 | 46.513 | -28.638 | 10 |
Conf | Ele | Desolv | VDW | relRST | Total | RANK |
---|---|---|---|---|---|---|
1138 | -22.741 | -12.364 | 18.167 | 7.349 | -40.637 | 1 |
50 | -23.363 | -17.955 | 70.387 | 6.299 | -40.578 | 2 |
25 | -22.695 | -15.412 | 34.620 | 5.512 | -40.157 | 3 |
65 | -22.529 | -15.752 | 42.736 | 5.512 | -39.519 | 4 |
2950 | -21.359 | -17.557 | 33.264 | 1.837 | -37.427 | 5 |
5148 | -22.857 | -6.277 | -5.319 | 5.774 | -35.440 | 6 |
8544 | -17.137 | -12.504 | 5.874 | 5.512 | -34.566 | 7 |
4511 | -23.895 | -3.694 | 18.670 | 8.399 | -34.121 | 8 |
8137 | -11.906 | -19.657 | 31.963 | 4.462 | -32.829 | 9 |
105 | -21.972 | -14.346 | 102.738 | 6.299 | -32.343 | 10 |
Conf | Ele | Desolv | VDW | relRST | Total | RANK |
---|---|---|---|---|---|---|
851 | -23.499 | -19.433 | 60.382 | 0.000 | -36.894 | 1 |
2738 | -27.255 | -6.373 | 67.237 | 8.136 | -35.040 | 2 |
6729 | -21.633 | -8.434 | 33.382 | 8.136 | -34.865 | 3 |
1858 | -27.740 | -6.351 | 75.684 | 8.136 | -34.659 | 4 |
961 | -21.268 | -13.027 | 70.815 | 5.249 | -32.463 | 5 |
402 | -22.380 | -16.349 | 66.357 | 0.000 | -32.093 | 6 |
6275 | -11.302 | -20.723 | 37.487 | 1.575 | -29.851 | 7 |
8455 | -14.753 | -12.155 | 40.347 | 6.562 | -29.435 | 8 |
9467 | -24.090 | -11.485 | 69.066 | 0.000 | -28.668 | 9 |
5395 | -12.425 | -25.094 | 92.783 | 0.262 | -28.503 | 10 |
Due to the unusual nature of the best docking conformations for NA with the insertion of 14 and 28 amino acids into TfR1, the practical significance of their top docking scores is limited. Among the remaining four groups, we can see that the best docking score is achieved when nine amino acids are deleted from NA, followed by the insertion of 14 amino acids into NA. This suggests that when considering alterations to the stalk region of NA, it may also affect the affinity of NA for TfR1, potentially having implications in real-world scenarios.
Binding Site AnalysisSince the above binding pose was generated using a predicted structure, it inherently contains some level of error compared to the actual situation. Therefore, when delving deeper into the analysis of protein-protein interactions, we utilize experimentally determined structures of NA (PDB ID:3BEQ) and TfR1 (PDB ID:1SUV)for analysis. We employ Rosetta for analyzing protein-protein interactions.
First, we removed unwanted solvent and other molecular components from the obtained PDB files and performed PDB file repairs on NA and TfR1, resulting in input conformations for the docking of the two large molecules. NA was treated as the ligand, while TfR1 was treated as the receptor. Using PyDock, we obtained multiple docking conformations and we selected the one with the highest overall docking score for protein-protein interaction analysis using Rosetta.
Then we performed protein-protein interaction calculations for TfR1 and NA using RosettaDock, and selected the two amino acid residues with the strongest binding interactions for quantum dynamics calculations.
click to download short-range.csvIn this case, NA-Lys111(D) and TfR1-Glu612(B) will be selected for further calculations.
Chemistry, as a discipline that studies various molecular structures and properties, focuses on the electrons outside the atomic nucleus. It is the distinct electronic structures of atoms and molecules that determine their unique properties. Guided by this principle, chemists strive to obtain precise quantitative information about electronic structures to gain fundamental insights into chemical processes. However, it is unfortunate that the behavior of electrons is governed by the wave equation within the framework of quantum mechanics, and for multi-electron systems, the equations derived from quantum mechanics are so complex that they are mathematically intractable. As a result, the scientific community has historically relied on empirical formulas, qualitative methods, and semi-quantitative approaches to describe systems, but the accuracy of these methods is often insufficient for practical applications.
Until the end of the last century, continuous iterations in computer technology led to a significant improvement in computational power. Simultaneously, various approximation methods for the Schrödinger equation were developed, making it possible to perform quantum chemistry calculations with the assistance of computers. Among them, the ab initio methods and density functional theory (DFT) emerged as outstanding representatives of first-principles approaches, showcasing remarkable achievements in quantum chemistry calculations.
As a widely used quantum computational software, Gaussian performs the core part of its calculations by solving the Schrödinger equation using approximation methods of different accuracies. Specifically, in the case of ab initio calculations, the Bohr-Oppenheimer approximation is first employed to simplify the system. The Hartree-Fock self-consistent field equations are then solved, incorporating various levels of electron correlation energy corrections through perturbation theory. This process ultimately yields an approximate solution to the Schrödinger equation.
On the other hand, if density functional theory (DFT) is employed, the solution to the exchange-correlation potential term in the Kohn-Sham equation is obtained using techniques such as local density approximation, generalized gradient approximation, or hybrid functionals. This allows for an approximate solution to the Kohn-Sham equation, which is equivalent to obtaining an approximate solution to the Schrödinger equation.
Once the computational method is determined, various quantitative research methods can be employed for the input system, such as conformational optimization, energy calculations, and transition state search. These methods can be further enhanced by selecting appropriate atomic orbital basis sets, incorporating solvent effects, applying corrections for polarization and dispersion effects, and considering weak interactions such as dispersion forces and long-range electromagnetic interactions.
By considering these factors and applying the necessary modifications, a comprehensive quantitative analysis can be conducted on the system of interest.
Relying on quantum chemistry calculations provides us with a deeper understanding of chemical systems, helping us comprehend the essence of chemical processes and even make quantitative predictions. The results obtained from quantum chemistry calculations allow us to explore the electronic structure, chemical bonding, energetics, and reactivity of molecules and materials with high precision. This knowledge can be utilized to elucidate reaction mechanisms, optimize reaction conditions, design novel compounds, and predict properties and behaviors of chemical systems under different conditions.
Quantum chemistry calculations serve as powerful tools in guiding experimental investigations and unveiling the underlying principles governing chemical phenomena. By leveraging the quantitative insights gained from these calculations, researchers can make informed decisions and contribute to advancements in various fields, including drug discovery, materials science, catalysis, and environmental chemistry.
Based on the molecular docking results obtained from Rosetta Dock, we identified the lowest energy residue interaction pair from the computed interaction energies. It consists of glutamic acid at position 612 in chain B and lysine at position 111 in chain D.
The crystal structure analysis using Pymol indicates that the docking region involves two α-helices, one from chain B and the other from chain D, which are in contact with each other. The critical binding residues within this region are identified as B612E and D111K.
Considering that molecular docking based on Rosetta Dock relies mainly on molecular force fields and empirical formulas to calculate conformational potential energy, its accuracy is limited, and often qualitative analysis is performed. Therefore, in order to accurately characterize the residue status of the docking site, we have undertaken quantitative calculations and research on the core residues using the Gaussian quantum chemistry software.
Specifically, we begin by using Pymol to select the two residues of interest at the docking site. Subsequently, we extract and convert them into GIF format files to serve as the basis for the Gaussian input files.
Taking into account the time complexity of common ab initio and DFT calculation methods, it is typically proportional to the cube of the number of atoms N, denoted as t-N3. To simplify the calculations, we have chosen to input only the outermost fragment of the system, building upon the existing foundation.
To strike a balance between the demands of quantitative research and computational cost, we have conducted thorough investigations and selected the B3LYP calculation method within the density functional theory (DFT) framework. The B3LYP method, known as a hybrid functional, offers a high level of accuracy and applicability, particularly in large molecular systems. Moreover, it incorporates electron correlation energy through the orbital approximation provided by the Kohn-Sham equation, eliminating the need for additional multi-level perturbation system corrections required in various ab initio methods.
To optimize the conformation and hydrogen bonding angles, we have selected the widely used 6-31G atomic orbital basis set. In order to account for higher-order angular momentum (advanced d and p orbital components) and polarization effects, we have incorporated correction factors, such as polarizable continuum models. Additionally, to achieve more accurate calculations of various energy terms, we have considered the influence of electron clouds over a larger spatial range, including the inclusion of dispersion functionals to account for weak intermolecular interactions, as well as dispersion corrections for common macroscopic systems. Finally, the system is configured in a solvent environment, specifically water, and the initial Cartesian coordinates of the system's conformation are imported to commence calculation and optimization.
By iteratively solving the Hartree-Fock self-consistent field equations, the system begins the search for a fixed point. Through the convergence criteria, the iteration process is terminated, and a final computation result is obtained through a stationary calculation. This final result includes the optimized conformation and various energy terms of the system.
convergence criteria
Using the self-consistent iteration process in Gaussian, we have obtained the computationally optimized conformation with the lowest energy after the docking process.
The hydrogen bonding angle in the obtained optimized conformation is 175.37°, and the hydrogen bond length is 1.6 Å.
The quantitative results obtained are consistent with the general understanding of hydrogen bonding interactions in the scientific community. The hydrogen bond length indicates a strong interaction, which aligns with the nature of acid-base residue docking. Furthermore, based on the energy calculations, the single-point energy is -325.0316 Hartree, and the Gibbs free energy after correction is -324.9396 Hartree.
By comparing this result with the energy before docking, the change in Gibbs free energy is determined.
In conjunction with thermodynamic principles, the change in free energy is related to the power of the transformation equilibrium. It can be quantitatively described by a specific formula.
After calculation, the change in free energy leads to an enlargement of the equilibrium constant by a factor of 10^7. This suggests that the docking of the core binding site contributes to the protein's affinity at a similar magnitude.
Our quantum chemistry calculations, research, and optimization are now completed.
Because the predicted structure of NA that we obtained is based on homologous sequences and does not take into account the interactions of different amino acids within the same protein, it does not consider the influence of the protein's environment. In reality, protein structures can undergo changes in different environments, such as solvents, solutes, temperature, pressure, and so on. We want to investigate what kind of alterations the predicted structure we obtained might undergo when it is exposed to conditions that are closer to the real environment.
Solution IdeaMolecular dynamics simulation is a powerful computational technique used in the field of molecular modeling and computational chemistry to study the behavior of atoms and molecules over time. It provides a window into the dynamic world of molecules by numerically solving the equations of motion for individual particles within a simulated system.
In this method, a molecular system is represented as a collection of atoms or molecules, each with defined positions and velocities. These particles interact with each other according to a chosen force field, which describes the potential energy and forces between them. By iteratively updating the positions and velocities of these particles over very short time steps, typically on the order of femtoseconds, one can observe how the system evolves over time.
Molecular dynamics simulations offer valuable insights into various phenomena, such as protein folding, chemical reactions, phase transitions, and material properties. They are widely used in fields ranging from biochemistry and material science to drug discovery and nanotechnology, allowing researchers to gain a deeper understanding of molecular behavior and design novel molecules and materials with specific properties.
Expected ResultsWe will individually introduce the four protein structures obtained into a solvent along with ions to investigate their respective specific changes during the simulation. By simulating their behavior in a more realistic environment, we aim to explore the stability of predicted protein structures through computational methods.
GROMACSis a versatile and high-performance software package used for molecular dynamics simulations. It allows researchers to simulate the behavior of systems with hundreds to millions of particles, primarily focusing on biochemical molecules like proteins, lipids, and nucleic acids. However, it is also used for non-biological systems such as polymers and fluid dynamics[8].
Key features of GROMACS include:
Firstly , We used charmm-gui to Generate Topology , Define box and Solvate , Add Ions . Then We got gromacs input files for each predicted structure .
The following are the input files for GROMACS for insert-14, insert-28, delete-9, and delete-24. As you can see, the solvent and ions are simply mechanically added to the PDB file, without any energy minimization or system equilibration.
Before we can begin dynamics, we must ensure that the system has no steric clashes or inappropriate geometry. The structure is relaxed through a process called energy minimization (EM).
As shown in the figure, as the step size increases, the potential energy of the system gradually decreases, and it stops energy minimization when it reaches a certain value.
Energy minimization (EM) ensures that we start with a reasonable structure concerning geometry and solvent orientation. However, before we can commence actual dynamics simulations, we need to equilibrate the solvent and ions surrounding the protein. Attempting unrestrained dynamics at this stage could lead to system collapse. The main reason is that the solvent is primarily optimized within itself rather than with the solute. We must first bring the solvent to the desired simulation temperature and establish the correct orientation around the solute (the protein). Once we've achieved the correct temperature based on kinetic energies, we'll apply pressure to the system until it reaches the desired density.
After successfully completing the two equilibration phases, the system has reached a stable state at the desired temperature and pressure. We can now proceed by removing the position restraints and initiating production molecular dynamics simulations for data acquisition.
Our physical simulation focuses on two crucial proteins, TfR1 and NA, which are key in experiments. Starting from the sequences used in wet lab experiments, we have conducted structure prediction, protein docking, quantum chemistry calculations, and molecular dynamics simulations. All of these processes rely on computational methods to simulate their specific behaviors in the microscopic world. Furthermore, based on the simulation results obtained, in conjunction with the specifics of wet lab experiments, they assist us in better exploring unknown results and phenomena.
In addition to the content presented on our website, we also explored various other physical simulations during our iGEM modeling process. These included enhancing sampling through temperature-based changes and employing QMMM (Quantum Mechanics/Molecular Mechanics) simulations using CHARMM. However, due to existing constraints, including our understanding of these methods, the computational resources at our disposal, and time limitations, we have decided to leave these aspects for further exploration in our ongoing learning journey or for future teams to use as references.
[1]C. Martin Lawrence et al. ,Crystal Structure of the Ectodomain of Human Transferrin Receptor.Science286,779-782(1999).DOI:10.1126/science.286.5440.779
[2]McAuley Julie L. et al. ,Influenza Virus Neuraminidase Structure and Functions.Frontiers in Microbiology(VOLUME:10,YEAR:2019).DOI:10.3389/fmicb.2019.00039
[3]V. Moules et al. ,In vitro characterization of naturally occurring influenza H3NA− viruses lacking the NA gene segment: Toward a new mechanism of viral resistance?Virology,Volume 404, Issue 2,2010,Pages 215-224.DOI:10.1016/j.virol.2010.04.030.
[4]Colman, P., Varghese, J. & Laver, W. Structure of the catalytic and antigenic sites in influenza virus neuraminidase. Nature 303, 41–44 (1983).DOI:/10.1038/303041a0.
[5]Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). DOI:10.1038/s41586-021-03819-2
[6]Jimenez-Garcia B., Pons C. and Fernandez-Recio J. pyDockWEB: a web server for rigid-body protein-protein docking using electrostatics and desolvation scoring. Bioinformatics (2013) 29(13):1698-1699.
[7]Marcel G. Schaap, Feike J. Leij, Martinus Th. van Genuchten.rosetta: a computer program for estimating soil hydraulic parameters with hierarchical pedotransfer functions.Journal of Hydrology,Volume 251, Issues 3–4,2001,Pages 163-176. DOI:10.1016/S0022-1694(01)00466-8.
[8]Van Der Spoel, D., Lindahl, E., Hess, B., Groenhof, G., Mark, A.E. and Berendsen, H.J.C. (2005), GROMACS: Fast, flexible, and free. J. Comput. Chem., 26: 1701-1718.DOI:10.1002/jcc.20291