While the AlphaFold [1] predictions are low confidence of the H-fibroin structure as a whole, the use of energy minimization and molecular dynamics [2],[3],[4] builds confidence in the quality of this output. Most importantly, our model suggests that key residue-specific interactions between L- and H-fibroin are preserved in our engineered sequences, relative to their native counterparts. A strong cysteine-cysteine disulfide interaction between h- and L-fibroin is predicted to be preserved—for instance, the model suggests that the cysteine ~20 residues from the end of our engineered H-fibroin and the cysteine between residues 150 and 200 of the L-fibroin are consistent with their expected relative positions and that they maintain the strong cysteine-cysteine disulfide interaction observed in existing caddisfly silk proteins.[5]
In prior biochemical experiments on homologous l- and H-fibroin subunits in the silkworm
In short, Tanaka et al. identified the disulfide bond by chemical detection. They separated fractions of the whole peptide chains by chromatography, which facilitated a search for the specific residues involved in disulfide bonding (divide and conquer). The fragments containing cysteine residues were deduced by sequencing, and peptide sequence analysis indicated which fractions formed disulfide bonds. Two iterations of chromatography were enough to deduce that H-fibroin Cys-c1 and Cys-c4 forms an intramolecular disulfide bond, while Cys-c20 forms an intermolecular disulfide linkage with Cys-172 of the L-chain. [7]
We can also observe that the beta-sheets in our folded secondary structures have a high serine content, another expectation of our model organism (beta-sheets highlighted in red and orange):
The model can be improved through the use of Monte Carlo protein folding and docking as provided in software such as pyRosetta, and through finer refinement with full-atom simulations using AMBER force fields. Neither of these tools was used for our preliminary simulations due to hardware constraints and software limitations; however, they are ideal next steps to more rigorously develop these silk protein models. One current plan of model development involves the inclusion of divalent ions (Ca2+) in the simulation to assess how these synthetic caddisfly silk protein structures may change as the interaction with ions is expected to cause the silk to harden into tough fibers. [6]
We first folded the H-fibroin and L-fibroin amino acid sequences using AlphaFold2.
We then used Gromacs + Martini3 Force Field to perform a coarse-grained simulation.
The source code for our model of protein-folding and docking between L- and H-fibroin may be found on our GitLab.
PyRosetta will be set up to perform more rigorous protein-docking and folding, where the coarse-grained molecular dynamics step would be repeated for longer durations. Atomic resolution fine-tuning would be done using AMBER Force Fields. Divalent ions will be included to better emulate a "natural" environment for caddisfly silk proteins.