Synthetic biology essentially combines the tools and techniques of molecular biology and microbiology with typical engineering approaches and laboratory procedures (Cameron, Bashor & Collins, 2014).
Based on this concept, the iGEM USP EEL Brazil team decided to develop a project aimed at producing cannabidiol (CBD), a phytocannabinoid derived from the secondary metabolism of the Cannabis sativa plant, using the yeast Saccharomyces cerevisiae as a chassis. This year, we focused on using mathematical modeling tools to consolidate our proof of concept and on designing our parts and wet-lab strategy, so that in 2024 we can return with the CBDynamics project and complement it with wet-lab experiments.
In terms of methodology, the CBDynamics project is based on two main metabolic pathways: the mevalonate pathway, which generates geranyl pyrophosphate (GPP), and the olivetolic acid pathway (Zirpel et al., 2017). One reason for choosing this chassis is that yeast already synthesizes GPP naturally through the mevalonate pathway, so it is only necessary to insert the olivetolic acid pathway, together with the enzymes that combine olivetolic acid with GPP (CsPT4) and convert the product into CBDA (CBDAS).
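To make this reasoning concrete, the sketch below represents the pathway as a list of enzymatic steps and computes which metabolites a cell can reach with a given set of expressed enzymes. It is a simplified illustration, not our model: stoichiometry is ignored, the OS product is given the generic (hypothetical) name "tetraketide intermediate", and we assume that GPP and malonyl-CoA come from native yeast metabolism while hexanoic acid is supplied in the medium.

```python
# Simplified CBDA pathway: (enzyme, substrates, products). Stoichiometry omitted;
# "tetraketide intermediate" is a generic, hypothetical label for the OS product.
PATHWAY = [
    ("CsHCS1", {"hexanoic acid"}, {"hexanoyl-CoA"}),
    ("OS", {"hexanoyl-CoA", "malonyl-CoA"}, {"tetraketide intermediate", "olivetol"}),
    ("OAC", {"tetraketide intermediate"}, {"olivetolic acid"}),
    ("CsPT4", {"olivetolic acid", "GPP"}, {"CBGA"}),
    ("CBDAS", {"CBGA"}, {"CBDA"}),
]

# Assumption: GPP and malonyl-CoA are native; hexanoic acid is fed in the medium.
NATIVE = {"GPP", "malonyl-CoA", "hexanoic acid"}


def reachable(expressed, available=NATIVE, pathway=PATHWAY):
    """Return every metabolite the cell can make with the expressed enzymes."""
    pool = set(available)  # copy, so the default set is never modified
    changed = True
    while changed:  # repeat until no step adds a new product (fixed point)
        changed = False
        for enzyme, substrates, products in pathway:
            if enzyme in expressed and substrates <= pool and not products <= pool:
                pool |= products
                changed = True
    return pool
```

Removing an enzyme from `expressed` shows where the pathway would stall: without CsPT4, for example, the cell still reaches olivetolic acid but neither CBGA nor CBDA.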
To achieve efficient synthesis of the five enzymes in the biological circuit, we linked their coding sequences with T2A sequences. This keeps the circuits simple: several enzymes are encoded in a single coding sequence and released as separate proteins during translation. The technique exploits the inability of the eukaryotic ribosome to form a peptide bond between the glycine and the final proline of the 2A peptide (the ...NPG|P motif). The ribosome therefore "skips" this bond and continues translating without interruption, so several individual proteins are generated from a single monocistronic mRNA molecule. T2A sequences can be easily obtained from bioinformatics databases (Mansouri & Berger, 2014). The details of how we carried out this step can be found in the Parts section of this Wiki.
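The skipping mechanism can be illustrated with a short sketch that predicts the peptides released from a T2A-linked polyprotein. We assume the commonly used T2A sequence EGRGSLLTCGDVEENPGP (from Thosea asigna virus) and treat every NPGP motif as a fully efficient skip site. Real skipping efficiency is below 100% and depends on the full 2A context, so this only shows where the products are expected to separate. The enzyme sequences in the test are hypothetical placeholders.

```python
T2A = "EGRGSLLTCGDVEENPGP"  # assumed T2A peptide (Thosea asigna virus 2A)
SKIP_MOTIF = "NPGP"         # no peptide bond forms between G and the final P


def skip_products(polyprotein: str) -> list[str]:
    """Split a polyprotein into the peptides expected after 2A ribosomal skipping.

    Simplification: every NPGP motif is treated as a fully efficient skip site.
    The upstream peptide keeps the 2A residues up to ...NPG; the downstream
    peptide starts with the final proline.
    """
    products, start = [], 0
    pos = polyprotein.find(SKIP_MOTIF)
    while pos != -1:
        cut = pos + 3  # index just after the glycine (N-P-G|P)
        products.append(polyprotein[start:cut])
        start = cut
        pos = polyprotein.find(SKIP_MOTIF, cut)
    products.append(polyprotein[start:])
    return products
```

The result shows why enzymes in the middle of a construct carry extra residues at both ends: each product keeps the 2A tail (ending in ...NPG) at its C-terminus, and every product after the first starts with the proline released by the skip.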
Starting from this premise, we first built structural models of all five enzymes and then carried out phylogenetic analyses. We also performed molecular docking, interaction and expression analyses, and our entire proof of concept rests on these pillars. In parallel with these analyses, we designed the biological circuits and optimized them for our chosen chassis.
Figure 1. Synthesis of olivetol, catalyzed by olivetol synthase.
Source: KEGG Pathway Database, 2023, adapted.
Figure 2. Synthesis of olivetolic acid, catalyzed by olivetolic acid cyclase.
Source: KEGG Pathway Database, 2023, adapted.
Figure 3. Synthesis of cannabigerolic acid, catalyzed by prenyltransferase 4.
Source: KEGG Pathway Database, 2023, adapted.
Figure 4. Synthesis of cannabidiolic acid, catalyzed by cannabidiolic acid synthase.
Source: KEGG Pathway Database, 2023, adapted.
To develop our strategy, we began by retrieving the sequences of the five enzymes from databases such as NCBI. Using these FASTA sequences, we built homology models of the five native enzymes with SWISS-MODEL (Figure 5).
Figure 5. Homology models of our five enzymes.
Source: Authors, 2023.
In the figure, A represents the CsHCS1 model; B, the dimeric OS; C, the dimeric OAC; D, CsPT4; and E, the CBDAS model.
After obtaining these models, we repeated the structural modeling with the T2A sequences incorporated, using MODELLER together with PyMOL (PyMOL Molecular Graphics System). This allowed us to add the new amino acids to the initial structure in three different ways: at the beginning of the original sequence, at its end, or at both ends. We then used ProCheck and MolProbity to assess the stereochemical quality of each model and determine which constructs gave the most reliable structures. This allowed us to observe the structural differences caused by inserting the new sequences and to identify which T2A constructs had the best energetic, structural and geometric quality (Figure 6). Once this stage was completed, we used the best results to assemble the necessary biological circuits.
Figure 6. Models of our proteins with the T2A sequences.
Source: Authors, 2023.
In the figure, A represents the CsHCS1 T2A model with the sequence at the end; B, the monomeric OS T2A model with the sequence at both the beginning and the end; C, the monomeric OAC T2A model with the sequence at the beginning; D, CsPT4 without its transit peptide and with T2A at the beginning; and E, the CBDAS T2A model with the sequence at the end. All T2A sequences are shown in red.
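The placement comparison described above can be sketched as follows. `t2a_variants` builds the three sequence variants (T2A at the start, at the end, or at both ends), and `best_variant` picks one construct from model-quality metrics. The T2A sequence is assumed, and the selection rule (lowest MolProbity score, ties broken by the higher share of Ramachandran-favored residues) and the metric values in the test are hypothetical: they illustrate the idea, not the criteria or results we actually used.

```python
from dataclasses import dataclass

T2A = "EGRGSLLTCGDVEENPGP"  # assumed T2A sequence; check against our Parts page


def t2a_variants(seq: str, t2a: str = T2A) -> dict[str, str]:
    """Build the three placements modeled for each enzyme."""
    return {"start": t2a + seq, "end": seq + t2a, "both": t2a + seq + t2a}


@dataclass
class ModelQuality:
    molprobity_score: float  # MolProbity overall score: lower is better
    rama_favored_pct: float  # % residues in favored Ramachandran regions: higher is better


def best_variant(quality: dict[str, ModelQuality]) -> str:
    """Hypothetical rule: lowest MolProbity score, ties broken by Ramachandran-favored %."""
    return min(quality, key=lambda name: (quality[name].molprobity_score,
                                          -quality[name].rama_favored_pct))
```

In practice, each variant would be modeled, validated with ProCheck and MolProbity, and the resulting metrics passed to a rule like `best_variant`.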
Expression vectors were designed based on the plasmids pRS425 and pRS426, to be assembled by Gibson Assembly, a method that joins multiple DNA fragments with overlapping ends in a single isothermal reaction. Around 72 biological circuits compatible with the expression system were designed in SnapGene, combining the enzymes with the T2A sequences. The details of this construction can be found on the Model page (Biological Circuits section).
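Gibson Assembly depends on each fragment sharing a homologous end with its neighbour. The sketch below adds such overlaps to a list of fragments for a circular plasmid and then checks in silico that the fragments join back into the expected sequence. The 25 bp overlap length is an assumption (typical designs use roughly 15 to 40 bp), and real designs also check the melting temperature of each overlap and avoid repeats or secondary structure at the junctions. The function names are hypothetical.

```python
def add_overlaps(fragments: list[str], n: int = 25) -> list[str]:
    """Prefix each fragment with the last n bp of the previous one.

    The plasmid is circular, so the first fragment overlaps the last.
    Every fragment must be at least n bp long.
    """
    return [fragments[i - 1][-n:] + frag for i, frag in enumerate(fragments)]


def assemble(fragments: list[str], n: int = 25) -> str:
    """Join fragments through their n-bp overlaps and close the circle.

    Returns the circular product written from the start of the first fragment.
    """
    product = fragments[0][n:]  # drop the overlap shared with the last fragment
    for frag in fragments[1:]:
        if not product.endswith(frag[:n]):
            raise ValueError("adjacent fragments do not share the expected overlap")
        product += frag[n:]
    # closing junction: the first fragment's overlap must match the product's end
    if not product.endswith(fragments[0][:n]):
        raise ValueError("plasmid cannot be closed")
    return product
```

Passing the fragments in the wrong order raises an error, which mirrors how a mismatched overlap would prevent the pieces from annealing.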
The team's criterion for selecting the circuits was how easily the synthesis could be split into two main modules, olivetolic acid synthesis and CBDA synthesis from olivetolic acid, so that each module could be carried out independently.
Figure 7. pRS425 plasmid developed for the expression of the enzymes CsPT4 (prenyltransferase 4) and CsCBDAS (cannabidiolic acid synthase).
Source: Authors, 2023.
Figure 8. pRS426 plasmid developed for the expression of the enzymes CsHCS (hexanoyl-CoA synthetase), OS (olivetol synthase) and OAC (olivetolic acid cyclase).
Source: Authors, 2023.
Finally, we conducted a phylogenetic analysis: we aligned the sequences with MUSCLE and analyzed the alignment in Jalview, comparing the Cannabis sativa enzymes with similar proteins from organisms of other families and genera. This allowed us to check whether the protein domains had been maintained, confirming that the catalytic sites had not been altered and that there were no problems with the structural integrity and function of the enzymes. The analysis also revealed features of these enzymes that had not been catalogued, such as some binding and catalytic positions.
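A simple way to flag conserved positions in a multiple alignment is to compute, for each column, the fraction of sequences that share the most common residue. The sketch below does this for an alignment given as gapped strings (for example, exported after MUSCLE alignment). This is a plain identity measure swapped in for illustration, not Jalview's own conservation score, and the toy alignment in the test is hypothetical.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of sequences sharing the most common residue in each column.

    Gaps ('-') never count as a match, so gap-rich columns score low.
    """
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(len(alignment[0])):
        residues = [s[col] for s in alignment if s[col] != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(alignment))
    return scores


def conserved_columns(alignment: list[str], threshold: float = 1.0) -> list[int]:
    """0-based indices of columns whose conservation reaches `threshold`."""
    return [i for i, s in enumerate(column_conservation(alignment)) if s >= threshold]
```

Lowering `threshold` (for example to 0.8) would also flag positions that are conserved in most, but not all, homologs.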
To complement our analyses, we performed molecular docking, a computational technique that models the interaction between a protein and a substrate, predicting how the two molecules fit together and how they interact in space. We used this technique to study how different substrates interact with our five proteins of interest, taking conserved residues into account. By looking at which residues are involved in substrate binding and how these residues have evolved, we can gain insight into how each protein works and how its activity may have evolved to play different biological roles.
Combining molecular docking with phylogenetic analysis gives a deeper understanding of how each protein interacts with its substrate and how this interaction has evolved over time (see the Docking section of the Modeling page for more information).
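The link between docking and phylogeny can be sketched in two steps: find the residues in contact with the docked ligand, then check which of them sit at conserved alignment columns. Here atoms are given as plain coordinates (in practice they would be parsed from the docked structure file), the 4 Å contact cutoff is an assumption, and all helper names are hypothetical.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    residue: int                     # residue number in the protein chain
    xyz: tuple[float, float, float]  # coordinates in angstroms


def contact_residues(protein: list[Atom],
                     ligand: list[tuple[float, float, float]],
                     cutoff: float = 4.0) -> set[int]:
    """Residue numbers with at least one atom within `cutoff` of any ligand atom."""
    return {
        atom.residue
        for atom in protein
        if any(math.dist(atom.xyz, lig) <= cutoff for lig in ligand)
    }


def column_to_residue(aligned_seq: str, first_residue: int = 1) -> dict[int, int]:
    """Map 0-based alignment columns to residue numbers; gap columns are skipped."""
    mapping, residue = {}, first_residue
    for col, char in enumerate(aligned_seq):
        if char != "-":
            mapping[col] = residue
            residue += 1
    return mapping


def conserved_contacts(conserved_cols: list[int], aligned_seq: str,
                       contacts: set[int], first_residue: int = 1) -> list[int]:
    """Residue numbers that are both conserved in the alignment and in contact."""
    mapping = column_to_residue(aligned_seq, first_residue)
    conserved = {mapping[c] for c in conserved_cols if c in mapping}
    return sorted(conserved & contacts)
```

The column-to-residue step matters because alignment columns include gaps, so column indices and residue numbers generally differ.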
To analyze and validate the docking results, we followed a few procedures. First, phylogenetics helped us understand the structural conformation of the enzymes by comparing proteins from the same protein family, which therefore play similar roles in reaction catalysis. In this way, we could identify catalytic, binding, and H+ acceptor or donor sites and their physicochemical characteristics, based on the conservation of certain domain regions in the primary sequences of the enzymes evaluated.
Second, checking whether the catalytic pockets coincided with the most likely cavities predicted by the CavityPlus server (Wang et al., 2023) helped us assess how likely the ligand was to fit in the given conformation. Finally, the ten best docking results on the chosen server, HDOCK (Yan et al., 2020), ranked by energy score and occurrence, were compared with existing information, paying attention to the description of each molecule and each chemical space involved, in order to find the best model for each enzyme-substrate complex in the metabolic pathway.
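The pose-selection step can be sketched as a filter over the docking output: keep the ten best-scored poses, discard those whose contacts fall mostly outside the predicted cavity, and rank the rest by how many conserved residues they touch. We assume, as for HDOCK's docking score, that lower scores are better. The 50% cavity threshold, the conserved-residue ranking and the example poses are hypothetical, and the sketch does not reproduce the occurrence criterion we used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    pose_id: int
    score: float              # docking score; assumed lower = better
    contacts: frozenset[int]  # residue numbers in contact with the ligand


def plausible_poses(poses: list[Pose], cavity: set[int], conserved: set[int],
                    top_n: int = 10, min_in_cavity: float = 0.5) -> list[Pose]:
    """Filter the top-scored poses by cavity overlap, then rank by conserved contacts."""
    top = sorted(poses, key=lambda p: p.score)[:top_n]
    kept = [
        p for p in top
        if p.contacts and len(p.contacts & cavity) / len(p.contacts) >= min_in_cavity
    ]
    # more conserved contacts first; equal counts fall back to the better score
    return sorted(kept, key=lambda p: (-len(p.contacts & conserved), p.score))
```

Ranking by conserved contacts rather than score alone reflects the idea that a plausible binding mode should involve residues that phylogeny marks as functionally important.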
Figures 9a and 9b. Docking of the enzymes with the substrates of the metabolic pathway.
Source: Authors, 2023.
In this figure, “A” represents CsHCS docked with hexanoic acid; “B”, “C”, “D” and “E”, OS docked with different substrates (respectively, hexanoyl-CoA, and malonyl-CoA in three conformations); “F”, OAC docked with olivetol; “G” and “H”, CsPT4 sequentially docked with olivetolic acid and GPP; and finally, CBDAS docked with CBGA, the substrate it converts into CBDA.
To model gene expression and the mechanism for obtaining CBDA (cannabidiolic acid), which is subsequently decarboxylated to CBD (cannabidiol), the final product of this research, our team searched for different gene expression models applicable to yeast. From this, we found the mixed-effects (ME) model, a statistical representation of gene expression in a population of transformed cells in which each cell's parameters are treated as drawn from a common population distribution (Llamosi et al., 2016).
With the ME model, we examine a simplified dynamic scenario of gene expression in which the translation parameter depends on the transcription parameter, which in turn is associated with the different factors that affect transcription initiation (see the Expression Modeling section of the Modeling page for more information). This understanding is important because several parameters will have to be evaluated in the wet lab, and the model lets us catalog how well each of them represents the cellular process.
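As a sketch of the kind of dynamics the ME model describes, assume the standard two-stage form below, in which translation depends on transcription through the term $k_{p,i}\,m_i$. Here $m_i$ and $p_i$ are the mRNA and protein levels in cell $i$, $u(t)$ is the promoter activity, $k$ denotes synthesis rates and $g$ degradation rates, and the mixed-effects layer draws each cell's parameter vector from a shared log-normal population distribution (logarithm taken element-wise). The subscripts, the meaning given to $k$ and $g$, and the log-normal choice are our assumptions; the equations actually used are in the Expression Modeling section.

```latex
\begin{aligned}
\frac{dm_i}{dt} &= k_{m,i}\,u(t) - g_{m,i}\,m_i,\\
\frac{dp_i}{dt} &= k_{p,i}\,m_i - g_{p,i}\,p_i,\\
\log\theta_i &\sim \mathcal{N}(\mu,\,\Omega), \qquad
\theta_i = (k_{m,i},\, g_{m,i},\, k_{p,i},\, g_{p,i}).
\end{aligned}
```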
In this manner, we developed a model to guide our experimental analysis. It allows us to examine different conditions at each sampling point and to understand the expression patterns of our yeast strain, so that we can identify potential bottlenecks in the modified cellular metabolism. Using regression analysis on the observed sample values, we can then derive the constants k and g of each equation, improving our understanding of the CBDA production process.
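Estimating k and g by regression can be sketched for an equation of the form dx/dt = k - g·x (for example, an mRNA equation under constant promoter activity, assuming k is a synthesis rate and g a degradation rate). Because the derivative is linear in x, regressing finite-difference derivatives on x gives k as the intercept and -g as the slope. `population_fit` mimics the mixed-effects setting by drawing per-cell parameters from a log-normal distribution and fitting each cell separately. This two-stage shortcut is simpler than the full ME estimation of Llamosi et al. (2016), and the simulated, noise-free data stand in for real measurements.

```python
import math
import random


def simulate(k: float, g: float, times: list[float]) -> list[float]:
    """Analytic solution of dx/dt = k - g*x with x(0) = 0."""
    return [(k / g) * (1.0 - math.exp(-g * t)) for t in times]


def estimate_k_g(times: list[float], values: list[float]) -> tuple[float, float]:
    """Least-squares fit of dx/dt = k - g*x; returns (k, g).

    Derivatives are central differences, so sampling should be dense and even.
    """
    xs, ys = [], []
    for i in range(1, len(times) - 1):
        xs.append(values[i])
        ys.append((values[i + 1] - values[i - 1]) / (times[i + 1] - times[i - 1]))
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                       # slope = -g
    return mean_y - slope * mean_x, -slope  # intercept = k


def population_fit(n_cells: int, times: list[float], mu_log_k: float,
                   mu_log_g: float, sd: float, seed: int = 0):
    """Draw per-cell (k, g) from log-normal distributions, simulate each cell and
    fit it on its own. Returns [((k, g), (k_hat, g_hat)), ...]."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_cells):
        k = math.exp(rng.gauss(mu_log_k, sd))
        g = math.exp(rng.gauss(mu_log_g, sd))
        results.append(((k, g), estimate_k_g(times, simulate(k, g, times))))
    return results
```

With real, noisy measurements, finite differences amplify noise, so smoothing the data or fitting the analytic solution directly by nonlinear least squares would usually be preferable.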
Furthermore, we used a concept that is very present in synthetic biology: logic gates. At the interface between electrical engineering and genetic engineering, gates emerged as an analogy for visualizing and understanding the cell, our chassis, as a production process. Thus, applying Boolean logic to biomolecules as described by Miyamoto et al. (2013), we defined the inputs of our gates: a, the promoter of the CDS that encodes all the other enzymes of the pathway, together with each of the subsequent proteins, since they are interdependent.
As shown on the Modeling page (Expression Modeling section), different circuits were designed for different experimental cases: the first covers the complete CBDA metabolic pathway (Figure 10), and the second is fragmented into an olivetolic acid production circuit and a circuit that produces CBDA from olivetolic acid (Figure 11).
Figure 10. Complete Boolean-based logic circuit for the CBDA metabolic pathway.
Source: Authors, 2023.
Figure 11. Boolean-based logic circuits for olivetolic acid production and for CBDA production from olivetolic acid.
Source: Authors, 2023.
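The two designs can be expressed as AND gates in a few lines of Python. The sketch assumes that each enzyme is an input that is either expressed or not, that GPP is always available from native metabolism, and that both designs are driven by the same promoter state. The function names and the independence assumption in `p_output` are ours, not the model on the Expression Modeling page. Logically, both designs produce CBDA under the same conditions, but the fragmented one exposes an intermediate output (olivetolic acid) that can be measured on its own.

```python
from itertools import product
from math import prod

OA_ENZYMES = ("CsHCS1", "OS", "OAC")
CBDA_ENZYMES = ("CsPT4", "CBDAS")


def complete_circuit(promoter: bool, enzymes: dict[str, bool]) -> bool:
    """Figure 10 analogue: one promoter drives all five T2A-linked enzymes."""
    return promoter and all(enzymes[e] for e in OA_ENZYMES + CBDA_ENZYMES)


def olivetolic_acid_module(promoter: bool, enzymes: dict[str, bool]) -> bool:
    """First module of Figure 11: promoter AND CsHCS1 AND OS AND OAC."""
    return promoter and all(enzymes[e] for e in OA_ENZYMES)


def cbda_module(promoter: bool, olivetolic_acid: bool, enzymes: dict[str, bool]) -> bool:
    """Second module of Figure 11: promoter AND olivetolic acid AND CsPT4 AND CBDAS."""
    return promoter and olivetolic_acid and all(enzymes[e] for e in CBDA_ENZYMES)


def truth_table(circuit, names=OA_ENZYMES + CBDA_ENZYMES):
    """Evaluate a circuit for every combination of promoter and enzyme states."""
    rows = []
    for bits in product([False, True], repeat=len(names) + 1):
        promoter, enzymes = bits[0], dict(zip(names, bits[1:]))
        rows.append((bits, circuit(promoter, enzymes)))
    return rows


def p_output(p_inputs: list[float]) -> float:
    """Probability that an AND gate is on, assuming independent inputs."""
    return prod(p_inputs)
```

Only the row in which every input is on yields CBDA, which is why each additional required input lowers the overall probability of a positive output under the independence assumption.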
After analyzing both designs, we evaluated the probability of each case (see Expression Modeling) and concluded, as a first result, that the second design could give better responses, even though it takes longer than the first.
In summary, the CBDynamics project, developed by the iGEM USP EEL Brazil team, draws on various theoretical foundations of synthetic biology. Its use of advanced structural modeling, phylogenetic analysis and molecular docking reflects thorough, multidisciplinary research toward producing cannabidiol (CBD) with the yeast Saccharomyces cerevisiae as the chassis.
Adding T2A sequences between the enzymes to ensure their efficient synthesis applies established gene engineering concepts. In addition, the careful selection of biological circuits and the appropriate choice of expression vectors required good planning backed by a great deal of study. Phylogenetic analysis and molecular docking deepen our understanding of protein-substrate interactions, providing valuable information on enzyme function and evolution. This integrated approach is crucial to the success of the project.
All in all, the work done so far lays a solid and promising foundation for the next phase of the project in 2024. This research exemplifies excellence in the application of synthetic biology to produce relevant compounds with the potential to positively impact the pharmaceutical industry and the field of biotechnology.
Synthetic biology is a relatively recent science, based on applying engineering principles to biological problems; among many other things, it allows researchers to build biological systems with the help of various software programs. It therefore gives us a tool with powerful potential for finding solutions to real problems, such as issues linked to the environmental crisis or the development of drugs for diseases.
The previous paragraphs (modeling part) show how we have used bioinformatics tools to pursue excellence in synthetic biology. However, when we try to solve real problems that affect people, engaging directly with the affected community is an indispensable part of the project; seeking solutions with the support of academia alone can be very ineffective. Therefore, throughout the development of the CBDynamics project, we sought to break down walls and stay in close contact with the population.
To have direct contact with the local community, we collected data in the central square of Lorena (the city where our university campus is located) on different days and at different times, to reach distinct audiences. During these collections we did not only ask questions about common knowledge of CBD, but also publicized the project and shared research information. These data collections showed us that there is a great lack of knowledge about the subject, and that prejudice against cannabidiol usually comes from misinformation rather than from any negative characteristic of the compound.
Unfortunately, there is still a huge stigma attached to the cannabis plant. Out of a lack of information (and we do not mean to blame the population for this), many people do not associate the plant with its medicinal properties. For this to stop being a "taboo" subject, the population needs correct guidance, so that people affected by diseases for which THC-based treatment is recommended can feel safe seeking and adhering to treatment.
One of the main motivations of our team and the Synthetic Biology Club is to spread the word about synthetic biology. With this in mind, throughout the development of our project, we have organized classes and discussion groups to ensure that the information we have is passed on and that any doubts are cleared up through debate.
Our team also helped prepare questions for the Brazilian Synthetic Biology Olympiad (OBBS), a competition that aims to spread the word about synbio and engage young people in this new and promising science, promoting scientific dissemination more broadly.
Cameron, D. E., Bashor, C. J., & Collins, J. J. (2014). A brief history of synthetic biology. Nature Reviews Microbiology, 12(5), 381–390. https://doi.org/10.1038/nrmicro3239
Mansouri, M., & Berger, P. (2014). Strategies for multigene expression in eukaryotic cells. Plasmid, 75, 12–17. https://doi.org/10.1016/j.plasmid.2014.07.001
Zirpel, B., Degenhardt, F., Martin, C., Kayser, O., & Stehle, F. (2017). Engineering yeasts as platform organisms for cannabinoid biosynthesis. Journal of Biotechnology, 259, 204–212. https://doi.org/10.1016/j.jbiotec.2017.07.008
Llamosi, A., Gonzalez-Vargas, A. M., Versari, C., Cinquemani, E., Ferrari-Trecate, G., Hersen, P., & Batt, G. (2016). What population reveals about individual cell identity: single-cell parameter estimation of models of gene expression in yeast. PLoS Computational Biology, 12(2), e1004706.
Miyamoto, T., Razavi, S., DeRose, R., & Inoue, T. (2013). Synthesizing biomolecule-based Boolean logic gates. ACS Synthetic Biology, 2(2), 72–82.
SnapGene software (www.snapgene.com), used to design the biological circuits.
Kanehisa, M., Furumichi, M., Sato, Y., Kawashima, M., & Ishiguro-Watanabe, M. (2023). KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Research, 51(D1), D587–D592. https://doi.org/10.1093/nar/gkac963
Yan, Y., Tao, H., He, J., & Huang, S.-Y. (2020). The HDOCK server for integrated protein-protein docking. Nature Protocols. https://doi.org/10.1038/s41596-020-0312-x
Wang, S., Xie, J., Pei, J., & Lai, L. (2023). CavityPlus 2022 update: An integrated platform for comprehensive protein cavity detection and property analyses with user-friendly tools and cavity databases. Journal of Molecular Biology, 435(14), 168141. https://doi.org/10.1016/j.jmb.2023.168141
Schrödinger, LLC. The PyMOL Molecular Graphics System, Version 2.0.