Contribution

Protein Engineering

We used the machine learning tools ProteinMPNN and RFdiffusion to design new arsenic-binding proteins. Proteins designed with RFdiffusion had very high GC content, which made them difficult to synthesize and clone. As a result, none of the RFdiffusion-designed binders progressed to the expression phase.
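GC content is the fraction of G and C bases in a DNA sequence. The short Python sketch below shows how designs could be screened for it before ordering synthesis. It is not the pipeline we used. Besides overall GC, it reports the highest GC fraction in any sliding window, because locally GC-rich stretches can also complicate synthesis. The 65% limit and 50 bp window are illustrative assumptions, not a provider's documented thresholds, and the design names and sequences are hypothetical toy examples.

```python
def gc_content(seq: str) -> float:
    """Fraction (0.0-1.0) of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def max_window_gc(seq: str, window: int = 50) -> float:
    """Highest GC fraction found in any sliding window of `window` bases."""
    seq = seq.upper()
    if len(seq) <= window:
        return gc_content(seq)
    return max(gc_content(seq[i:i + window]) for i in range(len(seq) - window + 1))


# Illustrative thresholds (assumptions, not synthesis-provider specifications).
GC_LIMIT = 0.65
WINDOW = 50


def flag_for_synthesis(dna: str) -> bool:
    """True if the overall or the local (windowed) GC content exceeds GC_LIMIT."""
    return gc_content(dna) > GC_LIMIT or max_window_gc(dna, WINDOW) > GC_LIMIT


# Hypothetical toy sequences, not real designs from this project.
designs = {"design_a": "ATGGCGCGCGGCGCC", "design_b": "ATGAAAGCTTTAGAA"}
for name, dna in designs.items():
    status = "FLAG" if flag_for_synthesis(dna) else "ok"
    print(name, f"GC={gc_content(dna):.2f}", status)
```

In practice the input would be the codon-optimised DNA sequence of each design, since that is what the synthesis provider receives.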

 

Our project was made possible by the knowledge of others, and we in turn want to help drive synthetic biology solutions that address real-world problems. We designed new metal-binding proteins using the neural network ProteinMPNN. These designs may have improved activity and thermostability compared with naturally occurring metal-binding proteins. Previous research projects at the ANU have shown that such neural networks can be incorporated into a protein engineering workflow to improve the thermostability and expression of natural proteins for downstream synthetic biology applications. Indeed, preliminary results suggest that our newly engineered arsenic-binding proteins may have improved soluble expression in commonly used E. coli overexpression strains such as BL21(DE3).

 
 
Figure: The design process behind our ProteinMPNN protein structures.
 

We also wanted to assess whether a newly developed protein diffusion model, RFdiffusion, could be used to “hallucinate” entirely de novo protein sequences that retain a conserved binding site taken from a natural ArsR protein (PDB: 6J05). These structures, entirely new to nature, illustrate a future of synthetic biology in which binders and protein interactions are designed completely in silico. We present these RFdiffusion structures as contributed parts to iGEM so that future teams can incorporate our binders into related synthetic biology projects.

In addition, our project used a novel class of bacterial protein nanocompartments known as encapsulins. We specifically used the encapsulin from the thermotolerant bacterium Quasibacillus thermotolerans because its chemical and physical properties make it an ideal synthetic biology tool. These include its size (it is the largest known encapsulin, with a diameter of more than 40 nm), its high melting temperature (above 85 °C), and its well-characterized targeting peptide, which can be appended to heterologous cargo to direct its encapsulation. Furthermore, preliminary research at the ANU suggests that the Q. thermotolerans encapsulin tolerates mutation and is therefore a suitable target for protein engineering at locations such as the pore-forming residues. We have provided the sequences of the Q. thermotolerans encapsulin and its associated targeting peptide to iGEM in the hope that others will help unlock the relatively unexplored synthetic biology potential of this unique protein nanocage and apply it to novel biological systems and contexts.

Synthetic protein design is complex. Much remains to be done to test and further optimize these proteins and to confirm that they have retained their desired metal-binding function. We have shared our de novo metal-binding proteins in the hope that others will join our engineering efforts. We hope future iGEM teams and others can learn from our work and use rapidly improving technologies such as machine learning to advance their protein engineering goals, and we are excited to see the synthetic biology solutions that emerge.

 

Parts

ProteinMPNN-Designed Binding Proteins Based on ArsR

RFdiffusion-Designed Arsenic-Binding Proteins

Encapsulins

Metallothioneins

gRNAs

gRNA designs

vB_BsuP_Goe1 NCBI Reference Sequence: NC_049975.1
No. gRNA sequence PAM Strand Targeted gene: gene code number
1 TACTTGCCCATTCTTTCCAA CGG rev Head fibre protein: Goe1_c00180
2 GATGTGAAAGCTTTTCTCAA TGG fw Head fibre protein: Goe1_c00180
3 CTTGCGGCGATGAGAACAGC TGG fw Head fibre protein: Goe1_c00180
4 TGATAGGTTTGAATGCATCA AGG rev Hypothetical protein 1: Goe1_c00030
5 TCAAATCGTTAAATTGAATC TGG rev Hypothetical protein 2: Goe1_c00040
6 TCTGCTCTCAACACTTCTAA CGG fw Hypothetical protein 3: Goe1_c00050
7 TGGTTAGAGACAAATAAACG AGG rev Hypothetical protein 4: Goe1_c00060
8 GACAAACACATGCGGAAACA GGG rev Hypothetical protein 5: Goe1_c00300
9 TTGGAATAATCGAGAGAAGG AGG rev Hypothetical protein 6: Goe1_c00310
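Each row pairs a 20 nt spacer with a PAM and a strand. Every PAM listed follows the NGG pattern recognised by Streptococcus pyogenes Cas9, so a quick consistency check on tables like this one can be sketched in Python. This is a hypothetical helper, not a tool from this project. `GuideEntry`, `check_guide` and `duplicate_spacers` are illustrative names, and the NGG rule assumes an SpCas9-type nuclease.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# NGG PAM recognised by S. pyogenes Cas9 (assumed nuclease; matches every PAM listed).
NGG = re.compile(r"[ACGT]GG")


@dataclass
class GuideEntry:
    number: int   # row number in the table
    spacer: str   # 20 nt gRNA sequence, 5'->3'
    pam: str      # PAM immediately 3' of the protospacer
    strand: str   # "fw" or "rev", as listed in the table


def check_guide(g: GuideEntry, spacer_len: int = 20) -> list:
    """Return a list of problems for one table row; an empty list means it passed."""
    problems = []
    spacer = g.spacer.upper()
    if len(spacer) != spacer_len:
        problems.append(f"spacer is {len(spacer)} nt, expected {spacer_len}")
    if set(spacer) - set("ACGT"):
        problems.append("spacer contains non-ACGT characters")
    if not NGG.fullmatch(g.pam.upper()):
        problems.append(f"PAM {g.pam!r} is not NGG")
    if g.strand not in ("fw", "rev"):
        problems.append(f"unknown strand {g.strand!r}")
    return problems


def duplicate_spacers(guides) -> dict:
    """Map any spacer used in more than one row to the row numbers that use it."""
    rows = defaultdict(list)
    for g in guides:
        rows[g.spacer.upper()].append(g.number)
    return {spacer: nums for spacer, nums in rows.items() if len(nums) > 1}


# Usage with the first three rows of the vB_BsuP_Goe1 table.
goe1 = [
    GuideEntry(1, "TACTTGCCCATTCTTTCCAA", "CGG", "rev"),
    GuideEntry(2, "GATGTGAAAGCTTTTCTCAA", "TGG", "fw"),
    GuideEntry(3, "CTTGCGGCGATGAGAACAGC", "TGG", "fw"),
]
for g in goe1:
    print(g.number, check_guide(g) or "OK")
print("duplicates:", duplicate_spacers(goe1))
```

The same two checks apply unchanged to the Φ29 table. A spacer that appears in more than one row would be reported by `duplicate_spacers` for manual review.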
 
Φ29 NCBI Reference Sequence: NC_011048.1
No. gRNA sequence PAM Strand Targeted gene: gene code number
1 ACGTGCAAAGTCTAATGTTA TGG fw Head fibre protein: 6446512
2 AGCAACCGACGTGGCGACTG CGG fw Head fibre protein: 6446512
3 GCCGTTAAAGACGTTGACAA GGG fw Head fibre protein: 6446512
4 CCATTGTCATGTGTATGTTG GGG fw Hypothetical protein 1: 6446513
5 CCATTGTCATGTGTATGTTG GGG fw Hypothetical protein 2: 6446500
6 ATCTTCTCTATTAACGACAA TGG fw Hypothetical protein 3: 6446504
7 ATGTTGCTTCAAGATGATGA CGG fw Hypothetical protein 4: 6446503
8 TTATTTGGAGGTGCTCCACA TGG fw Hypothetical protein 5: 6446501
9 ACGCCAATAAAAAGGGGACA AGG fw Hypothetical early protein 1: 6446502
10 ATCTTGATAAAGTCGTTGAG GGG fw Hypothetical early protein 2: 6446515
11 GACGTGCGAAGTACTATATT GGG rev Non-coding region
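The fw/rev column records which strand of the NCBI reference sequence carries the protospacer. For a rev guide, spacer + PAM read on the bottom strand. On the top strand of the reference, the site therefore appears as its reverse complement: CCN followed by the reverse-complemented spacer. The hypothetical helper below locates a guide on the listed strand and reports where the cut falls in top-strand coordinates. Homology arms are designed around this position. The sketch assumes SpCas9-style blunt cleavage 3 bp upstream of the PAM and uses 0-based positions. The toy genomes are synthetic flanks around guides from the tables, not the phage reference sequences.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def locate_guide(genome: str, spacer: str, pam: str, strand: str) -> tuple:
    """Find spacer+PAM on the listed strand of `genome` (top-strand coordinates).

    Returns (start, cut): `start` is the 0-based top-strand index where the
    spacer+PAM site begins; `cut` is the index of the first base to the right
    of the cut, assuming SpCas9-style cleavage 3 bp upstream (5') of the PAM.
    Only the listed strand is searched, so this is not an off-target check.
    """
    genome = genome.upper()
    site = (spacer + pam).upper()
    if strand == "rev":
        site = reverse_complement(site)  # top strand reads CCN + revcomp(spacer)
    elif strand != "fw":
        raise ValueError(f"unknown strand {strand!r}")
    # Zero-width lookahead so overlapping occurrences are also counted.
    hits = [m.start() for m in re.finditer(f"(?={re.escape(site)})", genome)]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one match, found {len(hits)}")
    start = hits[0]
    if strand == "fw":
        cut = start + len(spacer) - 3  # between spacer positions 17 and 18
    else:
        cut = start + len(pam) + 3     # the same cut, mirrored onto the top strand
    return start, cut


# Toy usage with synthetic flanks (not the real phage genome).
fw_spacer = "ACGTGCAAAGTCTAATGTTA"   # Φ29 gRNA no. 1 (fw, PAM TGG)
toy_fw = "AAAA" + fw_spacer + "TGG" + "CCCC"
print(locate_guide(toy_fw, fw_spacer, "TGG", "fw"))

rev_spacer = "GACGTGCGAAGTACTATATT"  # Φ29 gRNA no. 11 (rev, PAM GGG)
toy_rev = "TTTT" + reverse_complement(rev_spacer + "GGG") + "AAAA"
print(locate_guide(toy_rev, rev_spacer, "GGG", "rev"))
```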
 

Plasmid construction: Φ29 gRNAs

All gRNAs except gRNA no. 6, which could not be cloned, were cloned into pRH030 plasmids [1].

Sequence design for homologous recombination at the Φ29 gRNA no. 11 cut site

 
Component Sequence Description
Homology arm CTTTAAACATATTCGCAATATGTTCTTCTAACTCTGCACGCTTCATGATA Complementary 50 bp sequence upstream of gRNA cut site
Luminescent reporter ATGGTCTTCACACTCGAAGATTTCGTTGGGGACTGGCGACAGACAGCCGGCTACAACCTGGACCAAGTCCTTGAACAGGGAGGTGTGTCCAGTTTGTTTCAGAATCTCGGGGTGTCCGTAACTCCGATCCAAAGGATTGTCCTGAGCGGTGAAAATGGGCTGAAGATCGACATCCATGTCATCATCCCGTATGAAGGTCTGAGCGGCGACCAAATGGGCCAGATCGAAAAAATTTTTAAGGTGGTGTACCCTGTGGATGATCATCACTTTAAGGTGATCCTGCACTATGGCACACTGGTAATCGACGGGGTTACGCCGAACATGATCGACTATTTCGGACGGCCGTATGAAGGCATCGCCGTGTTCGACGGCAAAAAGATCACTGTAACAGGGACCCTGTGGAACGGCAACAAAATTATCGACGAGCGCCTGATCAACCCCGACGGCTCCCTGCTGTTCCGAGTAACCATCAACGGAGTGACCGGCTGGCGGCTGTGCGAACGCATTCTGGCGTTGACAACTATTACAGAGTATGCTATAAT Reverse-complemented, codon-optimised luciferase2 under the 29 bp Φ29 A1 promoter
Homology arm TAACTCTCCTTATCAGATGTCAAATATAGTTTTCTCACGGCTCAACTCAA Complementary 50 bp sequence downstream of gRNA cut site
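Reading 5'→3', the donor is upstream arm + reporter cassette + downstream arm. The minimal Python sketch below assembles it and checks that both arms are 50 bp. `build_donor` and `clean` are hypothetical helpers. The arm sequences are copied from the table. The insert is a placeholder (only the first 30 bp of the reporter), not the full cassette. The sketch also strips whitespace, because sequences copied out of wrapped table cells often pick up stray spaces.

```python
UPSTREAM_ARM = "CTTTAAACATATTCGCAATATGTTCTTCTAACTCTGCACGCTTCATGATA"
DOWNSTREAM_ARM = "TAACTCTCCTTATCAGATGTCAAATATAGTTTTCTCACGGCTCAACTCAA"


def clean(seq: str) -> str:
    """Upper-case a DNA sequence and drop any whitespace picked up from tables."""
    return "".join(seq.split()).upper()


def build_donor(upstream: str, insert: str, downstream: str, arm_len: int = 50) -> str:
    """Join upstream arm + insert + downstream arm (5'->3') after basic checks."""
    parts = {
        "upstream arm": clean(upstream),
        "insert": clean(insert),
        "downstream arm": clean(downstream),
    }
    for name, seq in parts.items():
        if not seq or set(seq) - set("ACGT"):
            raise ValueError(f"{name} is empty or contains non-ACGT characters")
    for name in ("upstream arm", "downstream arm"):
        if len(parts[name]) != arm_len:
            raise ValueError(f"{name} is {len(parts[name])} bp, expected {arm_len}")
    return parts["upstream arm"] + parts["insert"] + parts["downstream arm"]


# Placeholder insert: the first 30 bp of the reporter, split by a stray space.
insert_excerpt = "ATGGTCTTCACACTCG AAGATTTCGTTGGG"
donor = build_donor(UPSTREAM_ARM, insert_excerpt, DOWNSTREAM_ARM)
print(len(donor))  # -> 130 (50 + 30 + 50)
```

With the full reporter cassette as the insert, the same call gives the complete donor. The length checks catch an arm truncated during copying.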
 

References

  1. Otte, K., Kühne, N.M., Furrer, A.D., Baena Lozada, L.P., Lutz, V.T., Schilling, T., Hertel, R. A CRISPR-Cas9 tool to explore the genetics of Bacillus subtilis phages. Lett Appl Microbiol 71(6) (2020). https://doi.org/10.1111/lam.13349
  2. Alonzo, L.F., Jain, P., Hinkley, T. et al. Rapid, sensitive, and low-cost detection of Escherichia coli bacteria in contaminated water samples using a phage-based assay. Sci Rep 12, 7741 (2022). https://doi.org/10.1038/s41598-022-11468-2