Model | UESTC-China

Outline

In recent years, with the rapid development of structural biology, computational biology and artificial intelligence technology, Computer-guided protein design strategy has greatly facilitated protein modification. The method for protein design based on structure simulation and energy calculation.

Developing efficient indole degradation methods has important significance. But the efficiency of wet experiments largely depends on our starting point. If we can narrow down the range of mutants, reduce the workload and cost, then we have a better chance of finding effective enzymes in a limited time^[1].

For this purpose, we developed a multi-step virtual screening model based on molecular docking and molecular dynamics simulation to identify beneficial mutations and predict high-efficiency degradation enzymes.

Theoretical Background of Mutation Scanning

Despite the fact that there are many molecular docking tools and methods, all of them follow a specific workflow that consists of five basic steps:

Receptor Selection and Preparation
Ligand Selection and Preparation
Docking
Molecular Dynamics (MD)
Evaluation of Docking and MD Results

Molecular docking is a computational method that aims to predict the optimal conformation and orientation of a protein and a ligand that result in the lowest free energy of the complex. Molecular docking consists of two main components: a scoring function and a search algorithm. The scoring function evaluates the binding energy and stability of a given protein-ligand pose by considering various factors, such as intermolecular interactions, solvation effects, and entropy. The search algorithm explores the conformational and rotational space of the protein and the ligand to find the best scoring poses^[2].

Molecular docking can also predict how mutations affect the binding energy and stability of the complex by comparing the docked poses of the wild-type and mutant proteins with the same ligand. However, molecular docking has some limitations, such as neglecting the effects of solvation, entropy, and flexibility on binding.

Therefore, molecular docking results need to be validated by experimental data or further refined by more accurate methods, such as molecular dynamics (MD).

MD simulations can capture the dynamic behavior of proteins and ligands in solution and account for their conformational changes, fluctuations, and interactions with solvent molecules^[3].

MD simulations consist of three main steps: initialization, integration, and analysis. The initialization step involves setting up the initial coordinates, velocities, forces, and parameters of the system. The integration step involves advancing the system in time by using numerical methods, such as Verlet or leapfrog algorithms. The analysis step involves extracting relevant information from the trajectories, such as structural properties, thermodynamic properties, kinetic properties, and binding properties.

Receptor Selection and Preparation

The structure of the wild-type ycnE protein was constructed by using AlphaFold2, a state-of-the-art deep learning method that predicts protein structures from amino acid sequences. The predicted structure has a high level of confidence, as indicated by the PLDDT score, which measures the per-residue accuracy of the prediction^[4].

Figure 1. The predicted structure of ycnE

The structure of the mutant ycnE protein was derived from the wild-type structure by using Rosetta, a software suite that performs homology modeling and protein design. Rosetta uses a Monte Carlo algorithm to sample the conformational space of the mutant protein and optimize its energy and sequence compatibility^[5].

Figure 2. Alphafold2 output prediction result graph. It includes PAE diagram, sequence coverage diagram and PLDDT image of 5 best conformations.

The process of protein receptor preparation for AutoDock Vina involves the following steps:

- Add hydrogens to the protein structure using PYMOL. Hydrogens are necessary for AutoDock Vina to recognize the donor and acceptor atoms for hydrogen bonding.

- Remove any water molecules, non-standard residues, or other unwanted molecules from the protein structure.

- Convert the PDB file of the protein to PDBQT format using MGLTools^[6].

Ligand Selection and Preparation

Retrieve the three-dimensional structure of indole from the PubChem database (CID: 798).

Add hydrogens to the indole structure using Chimera or other software.
Convert the SDF file of indole to PDBQT format using MGLTools.

Molecular Docking

The algorithm of AutoDock Vina is based on an empirical scoring function that evaluates the binding energy and stability of a given pose by considering various factors, such as intermolecular interactions, solvation effects, and entropy. The algorithm also uses an iterated local search method that explores the conformational and rotational space of the ligand and receptor by applying random moves and local optimizations. The algorithm employs multiple threads to parallelize the search and improve the speed and diversity of the results^[7].

We wrote a bat file for batch docking with support for starting at any time after an interruption.

Figure 3. Heat map of change in affinity of a single point mutation. Warm colors indicate mutations that increase affinity.

We have performed molecular docking of 1900 mutant structures of ycnE protein with indole as the ligand using AutoDock Vina, a popular and open-source program for predicting protein-ligand binding poses and affinities . We have selected nine mutants that have lower binding affinity than the wild-type ycnE protein, as shown in Table 1.

model-the-conformation-bonding-between-ycne-and-indole-in-wild-type-and-mutant.png — Figure 4. The conformation bonding between ycnE and indole in wild type and mutant.

These mutations demonstrate how different amino acids can affect the binding affinity of ycnE protein with indole by changing the physical and chemical properties of the binding site.

Molecular Dynamics (MD)

Prior to computational analysis, the 3D structures of ycnE protein and indole were subjected to energy minimization using the GROMACS software package version 2022.5^[8]. The steepest descent method was used first, followed by the conjugate gradient method. the TIP3P water model was used to generate the topology of the Aβ peptide, the CHARMM generic force field was used in the energy minimization process, and Cgenff was used to generate the topology of the force field of the ligand^[9-11]. The purpose of this process is to optimize the geometry of the protein-ligand complex, eliminate steric clashes, and minimize the energy of the system, ensuring more stable and reliable results for subsequent docking and MD simulations.

Energy minimization is performed using the steepest descent algorithm, which consists of iteratively moving atoms in the direction of the negative gradient of the potential energy until a local energy minimum is reached. The convergence criteria for the energy minimization were set to a maximum force tolerance of 1000kJ mol-1 nm-1 and a maximum step size of 0.01 nm. The system was equilibrated under NVT and NPT conditions, where the temperature was 300 K and the pressure was 1 bar, and the temperature and pressure were controlled by the Berendsen algorithm. The temperature and pressure were controlled by Berendsen algorithm^[12].

The coupling times were 0.1 ps and 1 ps, respectively, to ensure that the system was stabilized before subsequent analysis. Simulated trajectories (100 ns) in each MD system were analyzed at 300 K and 1 bar atmospheric pressure settings. The Root Mean Square Deviation (RMSD) and Root Mean Square Fluctuation (RMSF) of the protein-ligand complexes, among others, were calculated using the GROMACS tool. In this case, the hydrogen bonding was judged by GROMACS default judgment method with a hydrogen bond length of 0.35 nm and a hydrogen bond angle of 30 degrees^[13].

Coul-SR and LJ-SR, short for Coulombic self-repulsion and Lennard-Jones self-repulsion, respectively, are terms used in molecular dynamics simulations to describe the electrostatic and van der Waals interactions between atoms or molecules within a system.

Figure 5. The results of molecular dynamics simulations compare two important properties between WT (wild type) and seven different wild types: Lennard-Jones short-range interaction (LJ-SR) and Coulomb short-range interaction (Coul-SR).

Experimental Validation

Through a comparative analysis, it was observed that four mutants of ycnE (Phe75Ser, Glu54Ser, Glu54Ala, and Glu54Thr) exhibited a significant enhancement in their functional properties relative to the wild-type ycnE. Specifically, the activity of these mutants was increased by approximately 1.03 to 1.92-fold compared to the wild-type, indicating that targeted mutations at specific amino acid positions in ycnE can significantly improve its functionality.

Table 1: Results of vector construction

ycnE variant	Amino acid substitution position	U/OD	Fold
ycnE（WT）	-	0.1178	1.00
ycnE-1	Phe75Ser	0.1213	1.03
ycnE-2	Phe75Ala	0.1170	0.69
ycnE-3	Phe75Thr	0.1272	0.75
ycnE-4	Glu54Ser	0.2261	1.92
ycnE-5	Glu54Ala	0.1348	1.14
ycnE-6	Glu54Thr	0.1802	1.53
ycnE-7	His65Trp	0.1535	0.91

model-isatin-production-and-affinity-of-wild-type-and-mutant-ycne.png — Figure 6. Isatin production and affinity of wild type and mutant ycnE. The concentration of isatin was determined after 1mM of exogenous indole was added and cultured at 37°C for 24h.

Our iGEM team, dedicated to pioneering advancements in synthetic biology, has successfully navigated the patent application process and has been granted a patent. This recognition pertains to the mathematical modeling component of our research, with a specific emphasis on the computer-guided protein design of the ycnE protein. This achievement not only underscores our commitment to innovation but also positions us at the forefront of this particular field of study.