Model

MOLECULAR SIMULATION

H-fibroin + L-fibroin preliminary simulations. Blue: L-fibroin; green: truncated H-fibroin; red: Cysteine (H-fibroin AA422, L-fibroin AA164)

Model of our engineered sequence is consistent with prior measurements of a disulfide bond

While the AlphaFold [1] predictions are low confidence of the H-fibroin structure as a whole, the use of energy minimization and molecular dynamics [2],[3],[4] builds confidence in the quality of this output. Most importantly, our model suggests that key residue-specific interactions between L- and H-fibroin are preserved in our engineered sequences, relative to their native counterparts. A strong cysteine-cysteine disulfide interaction between h- and L-fibroin is predicted to be preserved—for instance, the model suggests that the cysteine ~20 residues from the end of our engineered H-fibroin and the cysteine between residues 150 and 200 of the L-fibroin are consistent with their expected relative positions and that they maintain the strong cysteine-cysteine disulfide interaction observed in existing caddisfly silk proteins.[5]

In prior biochemical experiments on homologous l- and H-fibroin subunits in the silkworm Bombyx mori, performed by Tanaka et al., Cys-172 of L-chain and Cys-c20 of H-chain had been experimentally validated as the residues to form a disulfide bond in that species. These measurements and validations guided our identification of similar interactions proposed by our model—between cysteine residues in the same regions.

In short, Tanaka et al. identified the disulfide bond by chemical detection. They separated fractions of the whole peptide chains by chromatography, which facilitated a search for the specific residues involved in disulfide bonding (divide and conquer). The fragments containing cysteine residues were deduced by sequencing, and peptide sequence analysis indicated which fractions formed disulfide bonds. Two iterations of chromatography were enough to deduce that H-fibroin Cys-c1 and Cys-c4 forms an intramolecular disulfide bond, while Cys-c20 forms an intermolecular disulfide linkage with Cys-172 of the L-chain. [7]

Chromatography of the L-fibroin fractions and peptide sequence analysis, identifying the specific fractions containing disulfide interactions (L2 and L7, which was further narrowed down to L7-2 in a re-chromatography). With the known amino acid sequence, the exact cysteine residues involved in disulfide bonding can be deduced. [7]

A retained correlation between secondary structures and sequence composition

We can also observe that the beta-sheets in our folded secondary structures have a high serine content, another expectation of our model organism (beta-sheets highlighted in red and orange):

Sequence of homologous H-fibroins and visualization of the beta-sheets (red, orange nodes). Note the high proportion of serine (S) residues. [8]

Serine-rich beta-sheets, reflective of native caddisfly silk proteins. Blue: L-fibroin; green: truncated H-fibroin; red: Cysteine (H-fibroin AA422, L-fibroin AA164); orange: Serine.

Further refinements are achievable

The model can be improved through the use of Monte Carlo protein folding and docking as provided in software such as pyRosetta, and through finer refinement with full-atom simulations using AMBER force fields. Neither of these tools was used for our preliminary simulations due to hardware constraints and software limitations; however, they are ideal next steps to more rigorously develop these silk protein models. One current plan of model development involves the inclusion of divalent ions (Ca2+) in the simulation to assess how these synthetic caddisfly silk protein structures may change as the interaction with ions is expected to cause the silk to harden into tough fibers. [6]

METHOD AND SPECIFICATIONS

We first folded the H-fibroin and L-fibroin amino acid sequences using AlphaFold2.

  • AlphaFold2 had low confidence in the folding for H-fibroin, but moderate confidence on L-fibroin
  • We proceeded to coarse-grain simulation to assess the stability of AlphaFold2 predictions

We then used Gromacs + Martini3 Force Field to perform a coarse-grained simulation.

  • Coarse-grained models approximate the amino acid backchains as pseudo-atoms and different side-chains as different pseudo-atoms
    • We converted the system using martinize2
    • The system system was filled with explicit water and ions (Na+, Cl-) using insane.py
  • Energy minimization of the AlphaFold2 prediction using the following parameters:
    • Integrator: steepest descent minimization
    • Energy step-size: 0.001 kJ/mol
  • Structure equilibration with NPT ensemble:
    • Timestep: 20 femtoseconds
    • Simulation Steps: 10^6 (20 nanosecond total)
    • Cutoff Scheme: Verlet
    • Temperature Coupling: v-rescale
    • Pressure Coupling: Berendson Barostat
    • Periodic Boundary Conditions: XYZ
    • Simulation temperature: 400K. 400K was used to simulate a higher energy system to allow for faster protein structure changes
  • Production simulation with NVT ensemble:
    • Timestep: 20 femtoseconds
    • Simulation Steps: 10^6 (20 nanosecond total)
    • Cutoff Scheme: Verlet
    • Temperature Coupling: v-rescale
    • Pressure Coupling: Berendson Barostat
    • Periodic Boundary Conditions: XYZ
    • Simulation temperature: 400K

The source code for our model of protein-folding and docking between L- and H-fibroin may be found on our GitLab.

NEXT STEPS

PyRosetta will be set up to perform more rigorous protein-docking and folding, where the coarse-grained molecular dynamics step would be repeated for longer durations. Atomic resolution fine-tuning would be done using AMBER Force Fields. Divalent ions will be included to better emulate a "natural" environment for caddisfly silk proteins.

REFERENCES