Our first steps

Before the project was even written, we set out to understand how the CBD synthesis pathway works in the Cannabis sativa plant. This research gave us a solid foundation: the metabolic pathway required for CBD production, along with an understanding of each stage and of the mechanism of each enzyme involved, as shown below.

Figure 1: Biosynthetic pathway of cannabinoids in C. sativa.

Source: 2017, Zirpel et al.

Note. Image from “Engineering yeasts as platform organisms for cannabinoid biosynthesis”, by B. Zirpel, F. Degenhardt, C. Martin, O. Kayser, and F. Stehle, 2017, Journal of Biotechnology, 259, p. 205 (https://doi.org/10.1016/j.jbiotec.2017.07.008).

Although we could have considered various fungi or plants for the project, the yeast Saccharomyces cerevisiae was the obvious choice. Not only does it have the machinery needed to synthesize and modify the required enzymes, it also already carries out the mevalonate pathway, one of the two key pathways necessary for CBD production, which was a big step towards building our project (Zirpel et al., 2017). In addition, we already had easy access to a very competent S. cerevisiae strain with several auxotrophies we could explore, and iGEM teams from previous editions had deposited sequences for all of the needed enzymes, optimized for S. cerevisiae (see iGEM22 Waterloo). Although the iGEM22 Waterloo team had some success with their project, we felt there was still room for improvement and for a simpler approach to synthesizing cannabidiol in yeast cells. What we ended up with is an elegant, modifiable and easy-to-use strategy for cloning and expressing the metabolic pathway needed for CBD synthesis, built from two “blocks” (see Biological Circuits in this section) that can be studied individually or in concert, improved further, and exchanged into other circuits, cassettes and chassis.

With the project written around the metabolic pathways we had found, the next step was to review the current literature on each stage and each enzyme of the pathway, in order to understand the structure and function of each enzyme. The characteristics we looked for included the presence or absence of signal peptides, the need for cofactors, how the catalytic site works, where the enzyme is located in the cell and, in general, any information that could guide the modeling of these proteins.

In this case, the following characteristics were found for each enzyme researched:

Olivetol synthase (OS, OLS or CsOS): no signal peptide (Taura et al., 2009).
Cannabidiolic acid synthase (CBDAS or CsCBDAS): has a signal peptide and is glycosylated at several sites, specifically at residues 45, 65, 168, 296, 304, 328 and 498 (Taura et al., 2007).
Prenyltransferase 4 (CsaPT4 or CsPT4): no signal peptide, no glycosylation, Mg2+ ion as a cofactor, membrane protein (de Bruijn et al., 2020); has a transit peptide directing it to the plastid.
Olivetolic acid cyclase (OAC): no signal peptide; catalyzes an intramolecular C2 → C7 aldol condensation without decarboxylation (Tahir et al., 2021).
Hexanoyl-CoA synthetase (CsHCS1): no signal peptide, Mg2+ ion as a cofactor; its highest enzymatic activity was detected at 40 °C and pH 9, and it can be inhibited by high CoA concentrations (Degenhardt et al., 2017; Stout et al., 2012).

References

de Bruijn, W. J. C., Levisson, M., Beekwilder, J., van Berkel, W. J. H., & Vincken, J.-P. (2020). Plant Aromatic Prenyltransferases: Tools for Microbial Cell Factories. Trends in Biotechnology, 38(8), 917–934. https://doi.org/10.1016/j.tibtech.2020.02.006

Degenhardt, F., Stehle, F., & Kayser, O. (2017). The Biosynthesis of Cannabinoids. Handbook of Cannabis and Related Pathologies, 13–23. https://doi.org/10.1016/b978-0-12-800756-3.00002-8

Stout, J. M., Boubakir, Z., Ambrose, S. J., Purves, R. W., & Page, J. E. (2012). The hexanoyl-CoA precursor for cannabinoid biosynthesis is formed by an acyl-activating enzyme in Cannabis sativa trichomes. The Plant Journal, 71(3), 353–365. https://doi.org/10.1111/j.1365-313x.2012.04949.x

Tahir, M. N., Shahbazi, F., Rondeau-Gagné, S., & Trant, J. F. (2021). The biosynthesis of the cannabinoids. Journal of Cannabis Research, 3(1). https://doi.org/10.1186/s42238-021-00062-4

Taura, F., Sirikantaramas, S., Shoyama, Y., Yoshikai, K., Shoyama, Y., & Morimoto, S. (2007). Cannabidiolic-acid synthase, the chemotype-determining enzyme in the fiber-type Cannabis sativa. FEBS Letters, 581(16), 2929–2934. https://doi.org/10.1016/j.febslet.2007.05.043

Taura, F., Tanaka, S., Taguchi, C., Fukamizu, T., Tanaka, H., Shoyama, Y., & Morimoto, S. (2009). Characterization of olivetol synthase, a polyketide synthase putatively involved in cannabinoid biosynthetic pathway. FEBS Letters, 583(12), 2061–2066. https://doi.org/10.1016/j.febslet.2009.05.024

Zirpel, B., Degenhardt, F., Martin, C., Kayser, O., & Stehle, F. (2017). Engineering yeasts as platform organisms for cannabinoid biosynthesis. Journal of Biotechnology, 259, 204–212. https://doi.org/10.1016/j.jbiotec.2017.07.008

Structural modeling

OVERVIEW

Structural modeling of the enzymes was carried out to review their function and properties. We started with a literature review of each enzyme and its function, then moved on to its kinetic properties, FASTA sequence, signal peptide evaluation and structural parameters. We then generated our first native protein models by homology modeling, checking alternatives, improving stability and inspecting the backbone angles of amino acids that appeared less stable. Finally, we verified whether the models were fit for use.

We gathered the FASTA sequences for all of our enzymes from several sources, as shown in the following table:

Protein name / Protein code / Database	FASTA sequence
CsHCS1 / AFD33345.1 / GenBank	MGKNYKSLDSVVASDFIALGITSEVAETLHGRLAEIVCNYGAATPQTWINIANHILSPDLPFSLHQMLFYGCYKDFGPAPPAWIPDPEKVKSTNLGALLEKRGKEFLGVKYKDPISSFSHFQEFSVRNPEVYWRTVLMDEMKISFSKDPECILRRDDINNPGGSEWLPGGYLNSAKNCLNVNSNKKLNDTMIVWRDEGNDDLPLNKLTLDQLRKRVWLVGYALEEMGLEKGCAIAIDMPMHVDAVVIYLAIVLAGYVVVSIADSFSAPEISTRLRLSKAKAIFTQDHIIRGKKRIPLYSRVVEAKSPMAIVIPCSGSNIGAELRDGDISWDYFLERAKEFKNCEFTAREQPVDAYTNILFSSGTTGEPKAIPWTQATPLKAAADGWSHLDIRKGDVIVWPTNLGWMMGPWLVYASLLNGASIALYNGSPLVSGFAKFVQDAKVTMLGVVPSIVRSWKSTNCVSGYDWSTIRCFSSSGEASNVDEYLWLMGRANYKPVIEMCGGTEIGGAFSAGSFLQAQSLSSFSSQCMGCTLYILDKNGYPMPKNKPGIGELALGPVMFGASKTLLNGNHHDVYFKGMPTLNGEVLRRHGDIFELTSNGYYHAHGRADDTMNIGGIKISSIEIERVCNEVDDRVFETTAIGVPPLGGGPEQLVIFFVLKDSNDTTIDLNQLRLSFNLGLQKKLNPLFKVTRVVPLSSLPRTATNKIMRRVLRQQFSHFE
OLS / BAG14339.1 / GenBank	MNHLRAEGPASVLAIGTANPENILLQDEFPDYYFRVTKSEHMTQLKEKFRKICDKSMIRKRNCFLNEEHLKQNPRLVEHEMQTLDARQDMLVVEVPKLGKDACAKAIKEWGQPKSKITHLIFTSASTTDMPGADYHCAKLLGLSPSVKRVMMYQLGCYGGGTVLRIAKDIAENNKGARVLAVCCDIMACLFRGPSESDLELLVGQAIFGDGAAAVIVGAEPDESVGERPIFELVSTGQTILPNSEGTIGGHIREAGLIFDLHKDVPMLISNNIEKCLIEAFTPIGISDWNSIFWITHPGGKAILDKVEEKLHLKSDKFVDSRHVLSEHGNMSSSTVLFVMDELRKRSLEEGKSTTGDGFEWGVLFGFGPGLTVERVVVRSVPIKY
OAC / I6WU39 / UniProt	MAVKHLIVLKFKDEITEAQKEEFFKTYVNLVNIIPAMKDVYWGKDVTQKNKEEGYTHIVEVTFESVETIQDYIIHPAHVGFGDVYRSFWEKLLIFDYTPRK
CsPT4 / A0A455ZJC3 / UniProt	MGLSLVCTFSFQTNYHTLLNPHNKNPKNSLLSYQHPKTPIIKSSYDNFPSKYCLTKNFHLLGLNSHNRISSQSRSIRAGSDQIEGSPHHESDNSIATKILNFGHTCWKLQRPYVVKGMISIACGLFGRELFNNRHLFSWGLMWKAFFALVPILSFNFFAAIMNQIYDVDIDRINKPDLPLVSGEMSIETAWILSIIVALTGLIVTIKLKSAPLFVFIYIFGIFAGFAYSVPPIRWKQYPFTNFLITISSHVGLAFTSYSATTSALGLPFVWRPAFSFIIAFMTVMGMTIAFAKDISDIEGDAKYGVSTVATKLGARNMTFVVSGVLLLNYLVSISIGIIWPQVFKSNIMILSHAILAFCLIFQTRELALANYASAPSRQFFEFIWLLYYAEYFVYVFI
CsCBDAS / NP_001384865.1 / NCBI	MKCSTFSFWFVCKIIFFFFSFNIQTSIANPRENFLKCFSQYIPNNATNLKLVYTQNNPLYMSVLNSTIHNLRFTSDTTPKPLVIVTPSHVSHIQGTILCSKKVGLQIRTRSGGHDSEGMSYISQVPFVIVDLRNMRSIKIDVHSQTAWVEAGATLGEVYYWVNEKNENLSLAAGYCPTVCAGGHFGGGGYGPLMRNYGLAADNIIDAHLVNVHGKVLDRKSMGEDLFWALRGGGAESFGIIVAWKIRLVAVPKSTMFSVKKIMEIHELVKLVNKWQNIAYKYDKDLLLMTHFITRNITDNQGKNKTAIHTYFSSVFLGGVDSLVDLMNKSFPELGIKKTDCRQLSWIDTIIFYSGVVNYDTDNFNKEILLDRSAGQNGAFKIKLDYVKKPIPESVFVQILEKLYEEDIGAGMYALYPYGGIMDEISESAIPFPHRAGILYELWYICSWEKQEDNEKHLNWIRNIYNFMTPYVSKNPRLAYLNYRDLDIGINDPKNPNNYTQARIWGEKYFGKNFDRLVKVKTLVDPNNFFRNEQSIPPLPRHRH

Any signal (as in CsCBDAS) or transit peptides (as in CsPT4) are underscored.

According to the NCBI annotation, only one of our enzymes is glycosylated: cannabidiolic acid synthase (CBDAS), with seven predicted glycosylation sites at amino acid residues 45, 65, 168, 296, 304, 328 and 498 (Onofri, de Meijer, & Mandolino, 2015).
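As a rough cross-check of annotated positions like these, candidate N-linked glycosylation sites can be located by scanning a sequence for the N-X-S/T sequon, where X is any residue except proline. The sketch below is only an illustration and not the procedure we followed: the function name `find_sequons` and the toy sequence are assumptions made for this example, and a sequon match marks a candidate site, not a confirmed glycosylation.

```python
import re
from typing import List

# N-X-S/T sequon with X != P. The lookahead keeps matches zero-width after
# the Asn, so overlapping sequons (e.g. "NNST") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence: str) -> List[int]:
    """Return the 1-based positions of Asn residues that start an N-X-S/T sequon."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    return [match.start() + 1 for match in SEQUON.finditer(seq)]


# Toy sequence (hypothetical, not one of our enzymes).
toy = "MKNATNPSNNSTA"
print(find_sequons(toy))  # [3, 9, 10]
```

Positions are reported as 1-based residue numbers so they can be compared directly with database annotations.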

INITIAL MODELS

After gathering the FASTA sequences, we checked each protein for signal peptides using the SignalP 6.0 web server (Teufel et al., 2022). The sequences were then submitted to the Swiss-Model server (Waterhouse et al., 2018). We chose models based on the highest sequence identity and query coverage available, and prioritized the most accurate templates, ideally those obtained from previous crystallographic studies. The Ramachandran plots (Hollingsworth & Karplus, 2010) and the MolProbity parameters (Williams et al., 2018) were then evaluated as a first model validation.

The following figures show, for each enzyme, the signal peptide evaluation results from SignalP 6.0 and the chosen model from Swiss-Model:

CsHCS1:

No signal peptide was predicted in its structure (Figure 1).

Figure 1: CsHCS1 probabilistic plot for signal peptide evaluation generated by SignalP 6.0.

Source: 2023, Authors.

The CsHCS1 model was built from the best Swiss-Model template according to the parameters above (Figure 2).

Figure 2: CsHCS1 identity based model obtained from Swiss-Model.

Source: 2023, Authors.

The same modeling procedure was carried out for each of our enzymes. We obtained reasonable models for all of them and found some interesting features along the way, as shown below for olivetol synthase (OS).

OS:

No signal peptide was predicted in its structure (Figure 3).

Figure 3: OS probabilistic plot for signal peptide evaluation generated by SignalP 6.0.

Source: 2023, Authors.

The OS model obtained from Swiss-Model (Figure 4) shows its dimeric structure, with the two chains mirrored and held together by hydrogen bonds and van der Waals interactions.

Figure 4: OS identity based model obtained from Swiss-Model.

Source: 2023, Authors.

Olivetolic acid cyclase (OAC) is likewise dimeric. Because it is so small, we knew we would have to handle it carefully in our circuits so that its structure-function relationship would not be disturbed too much. The following figures show the signal peptide evaluation and the model we obtained from the Swiss-Model server, in its cute heart-shaped dimerized state.

OAC:

No signal peptide was predicted in its structure (Figure 5).

Figure 5: OAC probabilistic plot for signal peptide evaluation generated by SignalP 6.0.

Source: 2023, Authors.

The OAC model we obtained from Swiss-Model (Figure 6) is based on a crystal structure. Even so, as the image shows (reddish regions) and as our further model evaluation confirmed (see the sections below), there are uncertainties around this enzyme. These may be related to its small size and its dimeric structure, both of which can affect the enzyme’s three-dimensional conformation.

Figure 6: OAC model obtained from Swiss-Model.

Source: 2023, Authors.

CsaPT4:

No signal peptide was predicted in its structure (Figure 7).

Figure 7: (A) CsaPT4 probabilistic plot for signal peptide evaluation generated by SignalP 6.0. (B) Transit peptide evaluation performed on the TargetP 2.0 server. The figure was truncated.

Source: 2023, Authors.

The aforementioned transit peptide can be clearly seen in the native CsaPT4 model obtained from Swiss-Model (Figure 8A). We removed these first 77 amino acids from the protein using the MODELLER extension for PyMOL (The PyMOL Molecular Graphics System) (Figure 8B), and carried the same modification over to our part and circuit design. We did this to avoid problems in our chassis caused by unrecognized signal sequences, which can trigger premature protein degradation.

Figure 8: CsaPT4 model obtained from Swiss-Model. (A) Structure with the transit peptide. (B) New model built by removing the transit peptide.
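At the sequence level, this truncation amounts to dropping a fixed number of N-terminal residues. The sketch below illustrates only that step, using the length of 77 stated above as a default; the function name `split_transit_peptide` and the toy sequence are assumptions for this example, and the structural rebuilding done in MODELLER is not reproduced here.

```python
from typing import Tuple


def split_transit_peptide(sequence: str, peptide_length: int = 77) -> Tuple[str, str]:
    """Split a protein sequence into (N-terminal peptide, remaining protein).

    peptide_length is the number of N-terminal residues to remove; it must
    leave at least one residue in the remaining protein.
    """
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    if not 0 <= peptide_length < len(seq):
        raise ValueError("peptide_length must be >= 0 and shorter than the sequence")
    return seq[:peptide_length], seq[peptide_length:]


# Toy example (hypothetical sequence): remove the first 3 residues.
peptide, mature = split_transit_peptide("MAAAKLV", peptide_length=3)
print(peptide, mature)  # MAA AKLV
```

Returning the removed peptide alongside the truncated protein makes it easy to confirm that exactly the intended residues were cut.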

Source: 2023, Authors.

CBDAS:

CBDAS was the only one of our enzymes in which we found true post-translational modifications (PTMs) and a possible secretion signal peptide. The evaluation predicted a signal peptide corresponding to the first 28 residues (Figure 9).

Figure 9: CBDAS probabilistic plot for signal peptide evaluation generated by SignalP 6.0.

Source: 2023, Authors.

The CBDAS model we obtained from Swiss-Model (Figure 10) was really interesting: we could already see possible catalytic pockets even before developing our phylogenetic analysis (see Docking section). We felt quite confident about this model, since a crystallized and well-annotated structure would make some of our work easier.

Figure 10: CBDAS model obtained from Swiss-Model.

Source: 2023, Authors.

MODELS EVALUATION

Based on the files generated by Swiss-Model, we carried out evaluations with other software, such as ProCheck from the UCLA-DOE LAB SAVES v6.0 server (Laskowski et al., 1993), to view the results in summarized and graphical form. We also tested energy refinement using the CHARMM-GUI server (Jo et al., 2008; Park et al., 2023), checking the refined models with ProCheck and QMEAN (Benkert, Biasini, & Schwede, 2011) and evaluating the generated Ramachandran plots.

It is worth recalling that the Ramachandran plot was originally developed by G. N. Ramachandran, C. Ramakrishnan and V. Sasisekharan. It represents the psi (ψ) and phi (φ) backbone dihedral angles of the amino acids in a peptide, and is used to determine the “permitted” (i.e. energetically and geometrically favorable) regions for these backbone dihedral angles (Coumar, 2021).
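To make the two angles concrete: phi for residue i is the dihedral defined by the atoms C(i-1), N(i), CA(i), C(i), and psi is the dihedral defined by N(i), CA(i), C(i), N(i+1). The sketch below shows one standard way to compute a dihedral angle from four atomic coordinates using only the Python standard library. It is an illustration rather than the code used by ProCheck or MolProbity; the helper names and the toy coordinates are assumptions.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Sequence[float], p1: Sequence[float],
             p2: Sequence[float], p3: Sequence[float]) -> float:
    """Dihedral angle in degrees (-180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    k0, k2 = _dot(b0, axis), _dot(b2, axis)
    v = _sub(b0, (k0 * axis[0], k0 * axis[1], k0 * axis[2]))
    w = _sub(b2, (k2 * axis[0], k2 * axis[1], k2 * axis[2]))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (hypothetical): four atoms in a planar trans arrangement.
print(abs(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))))  # 180.0
```

For phi, the four points would be the coordinates of C(i-1), N(i), CA(i) and C(i); for psi, those of N(i), CA(i), C(i) and N(i+1). Plotting psi against phi for every residue gives the Ramachandran plot.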

QMEAN, in turn, stands for Qualitative Model Energy ANalysis. It describes the major geometrical aspects of protein structures using five different structural descriptors. Local geometry is analyzed by a torsion angle potential over three consecutive amino acids.

The Ramachandran plot, QMEAN and energetic parameters are shown below, in that order, for each enzyme.

CsHCS1:

First, we have the Ramachandran plot for this structure, which evaluates the phi and psi angles of the amino acids in the structure.

Figure 11: CsHCS1 general Ramachandran plot from Procheck.

Source: 2023, Authors.

As Figure 11 shows, more than 94% of all residues fall on the reddish islands known as “Ramachandran favored”. This means that the vast majority of residues lie in the most thermodynamically favorable regions of the plot. A further almost 6% fall in the additional allowed regions, so the model has no significant outliers, and we conclude that it is a good model by this criterion.

We then submitted the PDB file to Swiss-Model to evaluate another kind of score: the global energetic quality of the model (global QMEAN, as explained above).

Figure 12: Energy scores and geometry for CsHCS1 obtained from Swiss-Model.

Source: 2023, Authors.

Here we obtained a score slightly below the ideal, but still reasonable. As shown in Figure 12, the blue regions (more conserved, implying better local energetic quality) are concentrated in the protein domain, while the red regions (lower energetic quality) are concentrated at the termini.

Finally, we submitted our PDB file to MolProbity (Williams et al., 2018), added hydrogens and ran its tests, as summarized below.

Figure 13: CsHCS1 energetic analysis from MolProbity.

Source: 2023, Authors.

Figure 13 shows some of the parameters discussed above. CsHCS1 scored well (green) or reasonably (yellow) on almost all tests; only two came out poor (red), and even those by a small margin.

OS:

Repeating the procedure used for CsHCS1, we first obtained the Ramachandran plot shown below.

Figure 14: OS general Ramachandran plot from Procheck.

Source: 2023, Authors.

In Figure 14, we observed a pattern similar to that of the previous protein, with most amino acids in the red region (the most thermodynamically favored area). Almost all of the remaining residues were in the yellow or pale yellow areas, so essentially none of our residues count as outliers, indicating a good model with respect to the phi-psi parameter evaluated.

As expected, the energetic parameters for OS evaluated by QMEAN showed mostly excellent scores. The regions with the most structural uncertainty were the termini, as one would expect, along with some regions internal to the protein, as shown in Figure 15.

Figure 15: OS energetically refined QMEAN obtained from Swiss-Model.

Source: 2023, Authors.

Given the excellent QMEAN results, we also expected the MolProbity scores to be reasonably good, which our analysis confirmed (Figure 16). Some parameters were below average or not ideal, but they should not be major issues for our models.

Figure 16: OS energetic analysis from MolProbity.

Source: 2023, Authors.

Both of these results corroborate the Ramachandran plot analysis: if most of the enzyme’s amino acids are in favorable positions and have mostly ideal bond angles, the energetic parameters follow the same trend.

OAC:

For OAC, we submitted the model to the MolProbity server to obtain the Ramachandran plot. This protein has fewer amino acids (only 101 residues), so a single outlier weighs considerably on the final analysis.

Figure 17: OAC general Ramachandran obtained from MolProbity.

Source: 2023, Authors.

In Figure 17, happily, we found almost all residues in the most favored area (outlined by the inner contour lines) and a few others in the additional favored area. Despite the small size of the model, this is a good result overall.

QMEAN was also evaluated and gave a good final result. Most of the molecule, including the domain regions, appears in blue, while only a region between the two chains of the dimer shows some red, as seen in Figure 18.

Figure 18: OAC energetically refined QMEAN evaluation obtained from Swiss-Model.

Source: 2023, Authors.

Finally, the MolProbity summary result was also evaluated as seen in Figure 19.

Figure 19: OAC energetic analysis from MolProbity.

Source: 2023, Authors.

In line with the Ramachandran analysis, the structure was reasonable, with only one outlier identified. This is the smallest model we worked with, and therefore the one where a single outlier most easily drags the scores down, yet the overall result was still reasonable.

CsaPT4:

As we mentioned before, this was by far our most problematic enzyme. First, we evaluated the model with the transit peptide to check whether the structure would be severely compromised. The Ramachandran plot we generated for it (Figure 20) already showed a rough start.

Figure 20: CsaPT4 with the transit peptide general Ramachandran obtained from Swiss-Model.

Source: 2023, Authors.

In the Ramachandran plot above, we highlighted a few of the amino acids that make up the transit peptide. They lie far from the favorable Ramachandran islands and cause great instability in the model. This is due to the size of the transit peptide: it is really long, 77 amino acids in total, in a loop-like conformation, which yields a highly unstable and unpredictable structure. On top of that, there was no crystal structure for CsPT4 or similar proteins in the Cannabaceae family, so we had to press forward and work with the model we had.

Since the Ramachandran analysis with the transit peptide was already poor, we expected the QMEAN parameters to be poor as well, and we confirmed this when we submitted the model for analysis (Figure 21).

Figure 21: CsaPT4 with the transit peptide QMEAN evaluation generated by Swiss-Model.

Source: 2023, Authors.

Not only does the model as a whole lack a good energetic evaluation, but there is also a stretch of very poor quality estimates in the first 77 amino acids, the residues corresponding to the transit peptide. A predicted model already calls for caution, and this transit peptide would only cause us further problems.

Pressing on, we performed the MolProbity analysis (Figure 22) to check what specifically was wrong with our model, and the results were clear: the amino acids in the transit peptide were indeed the problem. This shows in what MolProbity flagged: poor protein geometry and a number of problematic amino acid residues. We cross-checked these results against our Ramachandran and QMEAN analyses and concluded, with high confidence, that the transit peptide should be removed for this project.

Figure 22: CsaPT4 with the transit peptide energetic analysis from MolProbity.

Source: 2023, Authors.

Having confirmed the main issue with this model, we repeated the process with the model built without the transit peptide. The Ramachandran plots we generated using ProCheck (Figure 23) were already much more promising: most of the amino acid residues were now in favorable positions, which filled us with confidence.

Figure 23: CsaPT4 without the transit peptide general Ramachandran obtained from Swiss-Model.

Source: 2023, Authors.

These better results let us conclude that the transit peptide was indeed a major cause of the poor model quality we had seen before. With a better Ramachandran plot in hand, we submitted the new model to QMEAN (Figure 24) and again got much better results.

Figure 24: QMEAN evaluation generated by QMEAN for the model of CsaPT4 without the transit peptide.

Source: 2023, Authors.

The quality of the energetic model improved markedly compared with what we had before. The score was still considerably lower than for our other models, but it was reasonable enough for us to proceed, so we submitted the model to MolProbity (Figure 25) to make sure it had improved.

Figure 25: CsaPT4 without the transit peptide energetic analysis from MolProbity.

Source: 2023, Authors.

With the transit peptide removed, we obtained a mostly ideal protein geometry, which assured us that the model we were working with would give valid results.

CBDAS:

Our last protein was evaluated in the same way. Submitting CBDAS to the MolProbity server, we obtained the Ramachandran plot shown below.

Figure 26: CBDAS general Ramachandran obtained from MolProbity.

Source: 2023, Authors.

It gave a very good result overall, with almost all amino acids in the favored areas.

Global QMEAN was also evaluated, as seen below, with very good parameters. Only the terminus corresponding to the signal peptide shows a larger red region; all the domains can be considered to have good energetic quality.

Figure 27: CBDAS global QMEAN evaluation generated by Swiss-Model.

Source: 2023, Authors.

Finally, we took the MolProbity summary results, as seen below.

Figure 28: CBDAS energetic analysis from MolProbity.

Source: 2023, Authors.

As expected from the visual Ramachandran analysis and the QMEAN score, the results were very good. Almost all parameters in Figure 28 passed well; only one score was red, and even that one was close to the ideal value.

T2A SEQUENCES

To ensure that all five enzymes are synthesized in the biological circuit and function as expected, we identified a possible route: the use of 2A sequences. These are short, high-efficiency sequences that allow multiple individual proteins to be synthesized from a single mRNA molecule.

The technique exploits the inability of the eukaryotic ribosome to form a peptide bond between the final glycine and proline residues of the 2A sequence, without the ribosome detaching from the mRNA molecule. This failure causes a “skip” between the two residues without stopping protein synthesis, which generates multiple individual proteins from a single mRNA. The T2A and P2A sequences are the most efficient at this, and their sequences can easily be obtained from bioinformatics databases. Despite the ease of working with 2A sequences, the function of the resulting proteins must be evaluated, since the N- and C-terminal regions of the separated proteins will carry fragments left over from the cleavage of the 2A sequence used (Mansouri & Berger, 2014).

In our project, we needed to evaluate three different positions for each enzyme in order to implement this technique. We used the following T2A sequence:

EGRGSLLTCGDVEENPGP

This gives three possible situations for our enzymes:

The protein carries the EGRGSLLTCGDVEENPG fragment on its C-terminus;
The protein carries the initial proline (P) on its N-terminus and the EGRGSLLTCGDVEENPG fragment on its C-terminus; or
The protein carries only the single proline (P) on its N-terminus.
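The three situations above follow directly from where the skip occurs, between the last glycine and the last proline of the T2A sequence. The sketch below derives the expected products from a polyprotein sequence under the idealized assumption that skipping happens at every T2A site; the function name `split_at_2a` and the toy protein sequences are assumptions for this example and do not represent our actual constructs.

```python
from typing import List

T2A = "EGRGSLLTCGDVEENPGP"  # T2A sequence used in the text


def split_at_2a(polyprotein: str, peptide_2a: str = T2A) -> List[str]:
    """Return the products expected if skipping occurs at every 2A site.

    The skip happens between the final G and P of the 2A sequence, so the
    upstream protein keeps everything up to G on its C-terminus and the
    downstream protein starts with the single P.
    """
    upstream_tag, downstream_tag = peptide_2a[:-1], peptide_2a[-1]
    parts = polyprotein.split(peptide_2a)
    products = []
    for index, part in enumerate(parts):
        prefix = downstream_tag if index > 0 else ""
        suffix = upstream_tag if index < len(parts) - 1 else ""
        products.append(prefix + part + suffix)
    return products


# Toy polyprotein (hypothetical sequences): three proteins joined by T2A.
toy = "MAAA" + T2A + "MBBB" + T2A + "MCCC"
for product in split_at_2a(toy):
    print(product)
# MAAAEGRGSLLTCGDVEENPG
# PMBBBEGRGSLLTCGDVEENPG
# PMCCC
```

The first, middle and last products correspond to the first, second and third situation in the list, respectively.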

We therefore had to build new models, based on the initial evaluated ones, that reflect the new size and conformation of the enzymes, and observe how the changes interfere with the molecular docking, the central point of our metabolic pathway.

In evaluating all of the models obtained with the proposed modifications (15 models in total!), we considered two key points: the quality of the native protein model and the size of the protein, both of which bear directly on how much the added residues may interfere with protein function. We also repeated the evaluation methods described above to get the best protein models possible. The table below presents the sequences of the best models containing the T2A sequence fragments, which we underlined.

Table 2: sequences with T2A.

CsHCS+EGRGSLLTCGDVEENPG	MGKNYKSLDSVVASDFIALGITSEVAETLHGRLAEIVCNYGAATPQTWINIANHILSPDLPFSLHQMLFYGCYKDFGPAPPAWIPDPEKVKSTNLGALLEKRGKEFLGVKYKDPISSFSHFQEFSVRNPEVYWRTVLMDEMKISFSKDPECILRRDDINNPGGSEWLPGGYLNSAKNCLNVNSNKKLNDTMIVWRDEGNDDLPLNKLTLDQLRKRVWLVGYALEEMGLEKGCAIAIDMPMHVDAVVIYLAIVLAGYVVVSIADSFSAPEISTRLRLSKAKAIFTQDHIIRGKKRIPLYSRVVEAKSPMAIVIPCSGSNIGAELRDGDISWDYFLERAKEFKNCEFTAREQPVDAYTNILFSSGTTGEPKAIPWTQATPLKAAADGWSHLDIRKGDVIVWPTNLGWMMGPWLVYASLLNGASIALYNGSPLVSGFAKFVQDAKVTMLGVVPSIVRSWKSTNCVSGYDWSTIRCFSSSGEASNVDEYLWLMGRANYKPVIEMCGGTEIGGAFSAGSFLQAQSLSSFSSQCMGCTLYILDKNGYPMPKNKPGIGELALGPVMFGASKTLLNGNHHDVYFKGMPTLNGEVLRRHGDIFELTSNGYYHAHGRADDTMNIGGIKISSIEIERVCNEVDDRVFETTAIGVPPLGGGPEQLVIFFVLKDSNDTTIDLNQLRLSFNLGLQKKLNPLFKVTRVVPLSSLPRTATNKIMRRVLRQQFSHFEEGRGSLLTCGDVEENPG
P+OS+EGRGSLLTCGDVEENPG	PMNHLRAEGPASVLAIGTANPENILLQDEFPDYYFRVTKSEHMTQLKEKFRKICDKSMIRKRNCFLNEEHLKQNPRLVEHEMQTLDARQDMLVVEVPKLGKDACAKAIKEWGQPKSKITHLIFTSASTTDMPGADYHCAKLLGLSPSVKRVMMYQLGCYGGGTVLRIAKDIAENNKGARVLAVCCDIMACLFRGPSESDLELLVGQAIFGDGAAAVIVGAEPDESVGERPIFELVSTGQTILPNSEGTIGGHIREAGLIFDLHKDVPMLISNNIEKCLIEAFTPIGISDWNSIFWITHPGGKAILDKVEEKLHLKSDKFVDSRHVLSEHGNMSSSTVLFVMDELRKRSLEEGKSTTGDGFEWGVLFGFGPGLTVERVVVRSVPIKYEGRGSLLTCGDVEENPG
P+OAC	PMAVKHLIVLKFKDEITEAQKEEFFKTYVNLVNIIPAMKDVYWGKDVTQKNKEEGYTHIVEVTFESVETIQDYIIHPAHVGFGDVYRSFWEKLLIFDYTPRK
P+CsPT4	PHHESDNSIATKILNFGHTCWKLQRPYVVKGMISIACGLFGRELFNNRHLFSWGLMWKAFFALVPILSFNFFAAIMNQIYDVDIDRINKPDLPLVSGEMSIETAWILSIIVALTGLIVTIKLKSAPLFVFIYIFGIFAGFAYSVPPIRWKQYPFTNFLITISSHVGLAFTSYSATTSALGLPFVWRPAFSFIIAFMTVMGMTIAFAKDISDIEGDAKYGVSTVATKLGARNMTFVVSGVLLLNYLVSISIGIIWPQVFKSNIMILSHAILAFCLIFQTRELALANYASAPSRQFFEFIWLLYYAEYFVYVFI
CBDAS+EGRGSLLTCGDVEENPG	MKCSTFSFWFVCKIIFFFFSFNIQTSIANPRENFLKCFSQYIPNNATNLKLVYTQNNPLYMSVLNSTIHNLRFTSDTTPKPLVIVTPSHVSHIQGTILCSKKVGLQIRTRSGGHDSEGMSYISQVPFVIVDLRNMRSIKIDVHSQTAWVEAGATLGEVYYWVNEKNENLSLAAGYCPTVCAGGHFGGGGYGPLMRNYGLAADNIIDAHLVNVHGKVLDRKSMGEDLFWALRGGGAESFGIIVAWKIRLVAVPKSTMFSVKKIMEIHELVKLVNKWQNIAYKYDKDLLLMTHFITRNITDNQGKNKTAIHTYFSSVFLGGVDSLVDLMNKSFPELGIKKTDCRQLSWIDTIIFYSGVVNYDTDNFNKEILLDRSAGQNGAFKIKLDYVKKPIPESVFVQILEKLYEEDIGAGMYALYPYGGIMDEISESAIPFPHRAGILYELWYICSWEKQEDNEKHLNWIRNIYNFMTPYVSKNPRLAYLNYRDLDIGINDPKNPNNYTQARIWGEKYFGKNFDRLVKVKTLVDPNNFFRNEQSIPPLPRHRHEGRGSLLTCGDVEENPG

Source: 2023, Authors.

Finally, we obtained the new structures using the MODELLER extension in the PyMOL Molecular Graphics System software, building a new homology model (or, more precisely, a comparative model) with our native-protein model as the template. The structures below are shown in a surface-cartoon view, with the T2A sequence in red.

Figure 29: T2A CsHCS1 obtained from PyMOL.

Source: 2023, Authors.

Figure 30: T2A OS obtained from PyMOL.

Source: 2023, Authors.

Figure 31: T2A OAC obtained from PyMOL.

Source: 2023, Authors.

Figure 32: T2A CsaPT4 obtained from PyMOL.

Source: 2023, Authors.

Figure 33: T2A CBDAS obtained from PyMOL.

Source: 2023, Authors.

Structural modeling

References

Onofri, C., de Meijer, E. P., & Mandolino, G. (2015). Sequence heterogeneity of cannabidiolic-and tetrahydrocannabinolic acid-synthase in Cannabis sativa L. and its relationship with chemical phenotype. Phytochemistry, 116, 57-68.

Luo, X., Reiter, M. A., d’Espaux, L., Wong, J., Denby, C. M., Lechner, A., ... & Keasling, J. D. (2019). Complete biosynthesis of cannabinoids and their unnatural analogues in yeast. Nature, 567(7746), 123-126.

Mansouri, M., & Berger, P. (2014). Strategies for multigene expression in eukaryotic cells. Plasmid, 75, 12-17.

Teufel, F., Almagro Armenteros, J.J., Johansen, A.R. et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat Biotechnol 40, 1023–1025 (2022). https://doi.org/10.1038/s41587-021-01156-3

Waterhouse, A., Bertoni, M., Bienert, S., Studer, G., Tauriello, G., Gumienny, R., Heer, F.T., de Beer, T.A.P., Rempfer, C., Bordoli, L., Lepore, R., Schwede, T. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 46(W1), W296-W303 (2018).

Bienert, S., Waterhouse, A., de Beer, T.A.P., Tauriello, G., Studer, G., Bordoli, L., Schwede, T. The SWISS-MODEL Repository - new features and functionality. Nucleic Acids Res. 45, D313-D319 (2017).

Guex, N., Peitsch, M.C., Schwede, T. Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: A historical perspective. Electrophoresis 30, S162-S173 (2009).

Studer, G., Rempfer, C., Waterhouse, A.M., Gumienny, G., Haas, J., Schwede, T. QMEANDisCo - distance constraints applied on model quality estimation. Bioinformatics 36, 1765-1771 (2020).

Bertoni, M., Kiefer, F., Biasini, M., Bordoli, L., Schwede, T. Modeling protein quaternary structure of homo- and hetero-oligomers beyond binary interactions by homology. Scientific Reports 7 (2017).

Hollingsworth SA, Karplus PA. A fresh look at the Ramachandran plot and the occurrence of standard structures in proteins. Biomol Concepts. 2010 Oct;1(3-4):271-283. doi: 10.1515/BMC.2010.022. PMID: 21436958; PMCID: PMC3061398.

Almagro Armenteros, J. J., Salvatore, M., Winther, O., Emanuelsson, O., von Heijne, G., Elofsson, A., & Nielsen, H. (2019). Detecting sequence signals in targeting peptides using deep learning. Life Science Alliance, 2(5), e201900429. doi:10.26508/lsa.201900429

Laskowski R A, MacArthur M W, Moss D S, Thornton J M (1993). PROCHECK - a program to check the stereochemical quality of protein structures. J. App. Cryst., 26, 283-291.

S. Jo, T. Kim, V.G. Iyer, and W. Im (2008). CHARMM-GUI: A Web-based Graphical User Interface for CHARMM. J. Comput. Chem. 29:1859-1865.

S.J. Park, N.R. Kern, T. Brown, J. Lee, and W. Im (2023). CHARMM-GUI PDB Manipulator: Various PDB Structural Modifications for Biomolecular Modeling and Simulation. J. Mol. Biol. 167995.

Williams, C. J., Headd, J. J., Moriarty, N. W., Prisant, M. G., Videau, L. L., Deis, L. N., Verma, V., Keedy, D. A., Hintze, B. J., Chen, V. B., Jain, S., Lewis, S. M., Arendall, W. B., 3rd, Snoeyink, J., Adams, P. D., Lovell, S. C., Richardson, J. S., & Richardson, D. C. (2018). MolProbity: More and better reference data for improved all-atom structure validation. Protein science : a publication of the Protein Society, 27(1), 293–315. https://doi.org/10.1002/pro.3330.

Coumar, M. S. (2021). Molecular Docking for Computer-Aided Drug Design. Academic Press.

The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.

Molecular docking

DOCKING

We used molecular docking to identify where in each enzyme the substrates are most likely to bind, with the aim of optimizing the system. We tested several tools (ClusPro, HADDOCK, and HDOCK) and docked the enzymes both in their natural form and after the addition of a T2A sequence, so that we could compare the results and check how similar they were.

HADDOCK (High Ambiguity Driven protein-protein DOCKing) drives the docking process by encoding information from predicted or identified protein interfaces as ambiguous interaction restraints (AIRs), instead of using an ab initio method (“HADDOCK Web Server,” n.d., para. 2). ClusPro is a web server for protein-protein docking (Kozakov et al., 2017), and HDOCK uses a hybrid algorithm that combines template-based modeling and ab initio free docking for protein-protein and protein-RNA/DNA docking (Yan, Zhang, Zhou, Li, & Huang, 2017). After studying all three, we chose HDOCK as our primary docking tool.

We began with the molecular docking between the enzymes, in their natural form, and the substrates of the metabolic pathway described by Zirpel, Degenhardt, Martin, Kayser, and Stehle (2017, p. 205), shown in Figure 1. During our research, we noted that hexanoyl-CoA synthase is an acyl-activating enzyme, responsible for attaching the coenzyme A group to the carboxyl end of a hexanoic acid molecule, which makes the molecule able to react in subsequent steps of the metabolic pathway (Stout, Boubakir, Ambrose, Purves, & Page, 2012). It is also a transmembrane protein, so instead of analyzing each atom's bonds to determine the chemical cavity, which was impractical, we opted to approximate the results.

Figure 1: Biosynthetic Pathway of Cannabinoids in C. sativa.

Source: 2017, Zirpel et al.

As seen in Figure 1, hexanoyl-CoA synthase (CsHCS1) first activates hexanoic acid. The docking for this interaction is presented in Figure 2.

Figure 2: Hexanoyl-CoA Synthase With Hexanoic Acid Surface

Note. This is the best among the 10 results obtained. The light purple part represents the hexanoyl-CoA synthase, and the green the hexanoic acid. Docking from HDOCK and image obtained on PyMOL.

Source: 2023, Authors.

During the second step of the olivetolic acid pathway, olivetol is generated in a reaction catalyzed by olivetol synthase (OS), an enzyme with transferase function (“UniProt,” n.d.-a). The substrates are the hexanoyl-CoA created in the first step and malonyl-CoA, which is produced by metabolic pathways native to the chassis. The products of the reaction are olivetol, carbon dioxide, and the CoA group, as shown in Figure 3.

Figure 3: Reaction Catalyzed by Olivetol Synthase

Source: 2023, KEGG Pathway Database (Adapted).

The docking between OS and hexanoyl-CoA, and between OS and each of the three malonyl-CoA molecules, can be seen in Figures 4 to 7.

Figure 4: Olivetol Synthase with Hexanoyl-CoA Surface

Note. This is the best among the 10 results obtained. The light blue part represents the olivetol synthase, and the red and dark blue part, the hexanoyl-CoA. Docking from HDOCK and image obtained on PyMOL.

Source: 2023, Authors.

Figure 5: Olivetol Synthase with Malonyl-CoA 1 Surface

Note. This is the best among the 10 results obtained. The light blue part represents the olivetol synthase, and the red, yellow and dark blue part, the Malonyl-CoA 1. Docking from HDOCK and image obtained on PyMOL.

Source: 2023, Authors.

Figure 6: Olivetol Synthase with Malonyl-CoA 2 Surface

Note. This is the best among the 10 results obtained. The light blue part represents the olivetol synthase, and the red and dark blue part, the Malonyl-CoA 2. Docking from HDOCK and image obtained on PyMOL.

Source: 2023, Authors.

Figure 7: Olivetol Synthase with Malonyl-CoA 3 Surface

Note. This is the best among the 10 results obtained. The light blue part represents the olivetol synthase, and the red, yellow, and dark blue part, the Malonyl-CoA 3. Docking from HDOCK and image obtained on PyMOL.

Source: 2023, Authors.

In the third step, the olivetol-CoA produced in the reaction shown in Figure 3 is converted to olivetolic acid by olivetolic acid cyclase (OAC), a lyase that cyclizes olivetol-CoA, releasing the CoA group and generating olivetolic acid (UniProt, n.d.-b). The docking of OAC with olivetol-CoA as the substrate is shown in Figure 8.

Figure 8: Olivetolic Acid Cyclase with Olivetol-CoA Surface

Note. This is the best among the 10 results obtained. The green part represents the olivetolic acid cyclase, and the red part, the Olivetol-CoA. Docking from HDOCK and image obtained on PyMOL.

Source: 2023, Authors.

The fourth step consists of the prenylation of the olivetolic acid produced during the third step. This reaction is carried out by cannabigerolic acid synthase, also known in the literature as prenyltransferase 4 (CsPT4), an enzyme with prenyltransferase function. The substrates are olivetolic acid and geranyl diphosphate (GPP), both of which are naturally produced within the chassis. The product of this reaction is cannabigerolic acid, with the release of a diphosphate group, as can be seen in Figure 9. The docking between the enzyme and both substrates can be seen in Figures 10 and 11.

Note. Image from KEGG Pathway Database, 2023, adapted.

Figure 9: Cannabigerolic Acid Synthesis Reaction, Catalyzed by a Prenyltransferase

Source: 2023, KEGG Pathway Database (Adapted)

Figure 10: CsPT4 with Olivetolic Acid Surface

Note. This is the best among the 10 results obtained. The purple part represents the CsPT4, and the green the olivetolic acid. Docking from HDOCK and image obtained on PyMOL.

Source: 2023, Authors

Figure 11: CsPT4 with Geranyl Diphosphate Surface

Note. This is the best among the 10 results obtained. The purple part represents the CsPT4, and the red the GPP. Docking from HDOCK and image obtained on PyMOL.

Source: 2023, Authors

Finally, during the fifth and final step of the olivetolic acid pathway, the cannabigerolic acid (CBGA) is converted to cannabidiolic acid (CBDA) in a reaction catalyzed by cannabidiolic acid synthase (CBDAS), which is an oxidoreductase. For this reaction, the substrate is the CBGA, and the products are CBDA and hydrogen peroxide. The docking between CBDAS and CBGA can be seen in Figure 12.

Figure 12: Cannabidiolic Acid Synthase with Cannabigerolic Acid Surface

Note. This is the best among the 10 results obtained. The green part represents the CBDAS, and the red the cannabigerolic acid. Docking from HDOCK and image obtained on PyMOL.

Source: 2023, Authors

Our next step was to analyze the enzyme-substrate docking of the proteins carrying the T2A sequence, which gave the same optimal positions as the analysis of the proteins in their natural form.

The five enzymes were then analyzed again using CavityPlus, a tool that detects cavities and potential binding sites from the proteins’ 3D models (“CavityPlus,” n.d.). Using the molecular docking plugin for PyMOL, we were able to analyze these binding sites further (Seeliger & De Groot, 2010). For each enzyme, we obtained around 10 models of each ligand and between 6 and 15 cavities, which were later combined and compared with the phylogenetic analyses.

We also interpreted some results from HDOCK and CavityPlus, shown in Appendix A, whose parameters corroborated our analysis.
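As a rough illustration of how a "best among 10" pose can be picked, the sketch below ranks candidate models by docking score, assuming the HDOCK convention that a more negative score indicates a more likely binding mode. The score values and field names are hypothetical placeholders, not results from this project.

```python
# Hypothetical docking output: one dict per candidate pose.
# The scores below are made-up placeholders, not values from this project.
poses = [
    {"model": 1, "score": -98.4},
    {"model": 2, "score": -121.7},
    {"model": 3, "score": -110.2},
]


def rank_poses(candidates):
    """Return poses sorted from most to least favourable (lowest score first)."""
    return sorted(candidates, key=lambda p: p["score"])


best = rank_poses(poses)[0]
print(f"best model: {best['model']} (score {best['score']})")
```

A score-based ranking is only a first filter; the poses still need to be checked against known binding sites, as done in the validation step.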

PHYLOGENETIC ANALYSIS

In parallel, our team ran a protein BLAST search at NCBI to find FASTA sequences of proteins similar to each enzyme. These sequences were then compared using the T-COFFEE MUSCLE alignment server to evaluate how well the proteins align, and we used Jalview to get an overview of the essential parts of the alignments. With the alignments visualized in Jalview, we gathered as much information as we could find in the literature about the binding sites, active sites, and catalytic pockets of each enzyme. The results are shown in Figures 13 to 17:

Figure 13: CsHCS1 binding sites, active sites and catalytic pockets

Note. Binding sites indicated in the picture (UniProt, n.d.-c). Picture from Jalview, adapted.

Source: 2023, Authors

Figure 14: OS binding sites, active sites and catalytic pockets

Note. Active site indicated in the picture (UniProt, n.d.-a). Picture from Jalview, adapted.

Source: 2023, Authors

Figure 15: OAC binding sites, active sites and catalytic pockets

Note. Binding sites and active site indicated in the picture (UniProt, n.d.-b). Picture from Jalview, adapted.

Source: 2023, Authors

Figure 16: CsPT4 binding sites, active sites and catalytic pockets

Note. Binding sites and active site indicated in the picture (UniProt, n.d.-d). Picture from Jalview, adapted.

Source: 2023, Authors

Figure 17: CBDAS binding sites, active sites and catalytic pockets

Note. Binding sites and active site indicated in the picture (UniProt, n.d.-e). Picture from Jalview, adapted.

Source: 2023, Authors
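One way to picture what the alignment view highlights is a per-column identity score. The sketch below is a minimal illustration on a hypothetical toy alignment. It uses a plain identity fraction, which is an assumption made here and not Jalview's own conservation measure, and the function names are illustrative.

```python
from collections import Counter


def column_conservation(aligned):
    """Fraction of sequences sharing the most common residue in each column.

    `aligned` is a list of equal-length strings; '-' marks a gap and is
    never counted as a residue.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_positions(aligned, threshold=1.0):
    """1-based alignment columns whose identity score reaches the threshold."""
    scores = column_conservation(aligned)
    return [i + 1 for i, s in enumerate(scores) if s >= threshold]


# Hypothetical toy alignment, not real enzyme sequences.
toy_alignment = ["MKHLA", "MKH-A", "MRHLA"]
print(conserved_positions(toy_alignment))
```

Fully conserved columns are natural candidates to compare against the binding and active sites reported in the literature.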

DOCKING VALIDATION

Finally, we compared the phylogenetic analysis with the results from the docking and cavity-detection tools and moved on to assessing the cavities and catalytic pockets present in the proteins. We studied the parameters and compared the results from the docking servers, the catalytic-pocket servers, and the literature, and then filtered them down to the best results, shown in Figures 18 to 37.

Figure 18: CsHCS1 with Phylogenetic Analysis Results

Note. Enzyme in purple and phylogenetic analysis in red. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 19: CsHCS1 with Phylogenetic Analysis Results and Cavities Indicated

Note. Enzyme in purple, phylogenetic analysis in red and cavities in light yellow. Many of the active and binding sites are within the cavity. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 20: CsHCS1 with Phylogenetic Analysis Results, Cavities Indicated and Hexanoic Acid Positioned

Note. Enzyme in purple, phylogenetic analysis in red, cavities in light yellow and ligand in green. The ligand is well positioned within the cavity and near an active site. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 21: OS with Phylogenetic Analysis Results

Note. Enzyme in light blue and phylogenetic analysis in red. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 22: OS with Phylogenetic Analysis Results and Cavities Indicated

Note. Enzyme in light blue, phylogenetic analysis in red and cavities in light yellow. The active site is within the cavity. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 23: OS with Phylogenetic Analysis Results, Cavities Indicated and Hexanoyl-CoA Positioned

Note. Enzyme in light blue, phylogenetic analysis in red, cavities in light yellow and ligand multicolored. The ligand is well positioned within the cavity and near an active site. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 24: OS with Phylogenetic Analysis Results, Cavities Indicated and Malonyl-CoA 1 Positioned

Note. Enzyme in light blue, phylogenetic analysis in red, cavities in light yellow and ligand multicolored. The ligand is well positioned within the cavity and near an active site. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 25: OS with Phylogenetic Analysis Results, Cavities Indicated and Malonyl-CoA 2 Positioned

Note. Enzyme in light blue, phylogenetic analysis in red, cavities in light yellow and ligand multicolored. The ligand is well positioned and near an active site. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 26: OS with Phylogenetic Analysis Results, Cavities Indicated and Malonyl-CoA 3 Positioned

Note. Enzyme in light blue, phylogenetic analysis in red, cavities in light yellow and ligand multicolored. The ligand is well positioned and near an active site. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 27: OS with Phylogenetic Analysis Results, Cavities Indicated and All Substrates Positioned

Note. Enzyme in light blue, phylogenetic analysis in red, cavities in light yellow and ligand multicolored. The ligands are well positioned and near an active site. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 28: OAC with Phylogenetic Analysis Results

Note. Enzyme in green and phylogenetic analysis in red. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 29: OAC with Phylogenetic Analysis Results and Cavities Indicated

Note. Enzyme in green, phylogenetic analysis in red and cavities in light yellow. Many of the active sites are not within the cavity. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 30: OAC with Phylogenetic Analysis Results, Cavities Indicated and Olivetol-CoA Positioned

Note. Enzyme in green, phylogenetic analysis in red, cavities in light yellow and ligand in pink. The ligand is not well positioned within the cavity, but it is near the cavity and the active site. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 31: CsPT4 with Phylogenetic Analysis Results

Note. Enzyme in purple and phylogenetic analysis in red. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 32: CsPT4 with Phylogenetic Analysis Results and Cavities Indicated

Note. Enzyme in purple, phylogenetic analysis in red and cavities in light yellow. Many of the active sites are within the cavity. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 33: CsPT4 with Phylogenetic Analysis Results, Cavities Indicated and Olivetolic Acid Positioned

Note. Enzyme in purple, phylogenetic analysis in red, cavities in light yellow and ligand in green. The ligand is well positioned within the cavity and near an active site. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 34: CsPT4 with Phylogenetic Analysis Results, Cavities Indicated and GPP Positioned

Note. Enzyme in purple, phylogenetic analysis in red, cavities in light yellow and ligand in orange. The ligand is well positioned within the cavity and near an active site. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 35: CBDAS with Phylogenetic Analysis Results

Note. Enzyme in green and phylogenetic analysis in red. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 36: CBDAS with Phylogenetic Analysis Results and Cavities Indicated

Note. Enzyme in green, phylogenetic analysis in red and cavities in light yellow. Many of the active sites are near and within the cavity. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

Figure 37: CBDAS with Phylogenetic Analysis Results, Cavities Indicated and CBGA Positioned

Note. Enzyme in green, phylogenetic analysis in red, cavities in light yellow and ligand in red. The ligand is well positioned within the cavity and near an active site. Data acquired through the methods described in previous text, and image obtained on PyMOL.

Source: 2023, Authors

As seen in Figures 18 to 37, the three methods gave mostly compatible results, which validated our work. The only issue was with OAC, because it is the smallest protein we worked with, at only 101 amino acids.

Through the phylogenetic analysis and HDOCK, we were able to position olivetol-CoA near a catalytic pocket, but the CavityPlus results did not identify that spot as part of a cavity with pharmacological properties. Since the cavity was near this spot, we kept these results and searched for other sources that could validate them. In the RCSB Protein Data Bank we found a crystallized OAC structure with a bound ligand, in this case olivetolic acid, located around residue 74, as can be seen in Figure 38 (Yang et al., 2016). Since our results placed the substrate near residue 77, we concluded that they were valid.

Figure 38: Crystallized OAC Molecule with Olivetolic Acid

Note. Enzyme multicolored and ligand in red. Data from RCSB Protein Data Bank, and image obtained on PyMOL.

Source: 2023, Authors
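The cross-check between the three sources of evidence can be pictured as a set operation on residue numbers. In this sketch the residue sets are hypothetical placeholders (only 74 and 77 come from the text above), and both the two-method support threshold and the five-residue window are assumptions chosen for illustration. Closeness in sequence is only a proxy for closeness in the folded structure.

```python
def consensus_residues(*methods, min_support=2):
    """Residues reported by at least `min_support` of the given methods."""
    counts = {}
    for residues in methods:
        for residue in set(residues):
            counts[residue] = counts.get(residue, 0) + 1
    return sorted(r for r, n in counts.items() if n >= min_support)


def within_window(residue_a, residue_b, window=5):
    """True when two residue numbers are at most `window` positions apart."""
    return abs(residue_a - residue_b) <= window


# Hypothetical residue sets for one enzyme (placeholders, not project data).
docking_contacts = {72, 74, 77, 80}
cavity_residues = {60, 61, 74, 77}
literature_sites = {74, 78}

print(consensus_residues(docking_contacts, cavity_residues, literature_sites))
print(within_window(74, 77))
```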

REFERENCES

Berman, H. M. (2000). The Protein Data Bank. Nucleic Acids Research, 28(1), 235–242. https://doi.org/10.1093/nar/28.1.235

BLAST: Basic Local Alignment Search Tool. (n.d.). Retrieved from https://blast.ncbi.nlm.nih.gov/Blast.cgi

CavityPlus. (n.d.). Retrieved from http://www.pkumdl.cn:8000/cavityplus/index.php#/

HADDOCK Web Server. (n.d.). Retrieved from https://wenmr.science.uu.nl/haddock2.4/

Kozakov, D., Hall, D. R., Xia, B., Porter, K. A., Padhorny, D., Yueh, C., . . . Vajda, S. (2017). The ClusPro web server for protein–protein docking. Nature Protocols, 12(2), 255–278. https://doi.org/10.1038/nprot.2016.169

Larkin, M. A., Blackshields, G., Brown, N. P., Ramu, C., McGettigan, P. A., McWilliam, H., . . . Higgins, D. G. (2007). Clustal W and Clustal X version 2.0. Bioinformatics, 23(21), 2947–2948. https://doi.org/10.1093/bioinformatics/btm404

Martí‐Renom, M. A., Stuart, A. C., Fiser, A., Sánchez, R., Melo, F. S., & Šali, A. (2000). Comparative protein structure modeling of genes and genomes. Annual Review of Biophysics and Biomolecular Structure, 29, 291–325. https://doi.org/10.1146/annurev.biophys.29.1.291

Notredame, C., Higgins, D. G., & Heringa, J. (2000). T-Coffee: a novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology, 302(1), 205–217. https://doi.org/10.1006/jmbi.2000.4042

Pearson, W. R., & Lipman, D. J. (1988). Improved tools for biological sequence comparison. Proceedings of the National Academy of Sciences of the United States of America, 85(8), 2444–2448. https://doi.org/10.1073/pnas.85.8.2444

Remmert, M., Biegert, A., Hauser, A., & Söding, J. (2011). HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nature Methods, 9(2), 173–175. https://doi.org/10.1038/nmeth.1818

Seeliger, D., & De Groot, B. L. (2010). Ligand docking and binding site analysis with PyMOL and Autodock/Vina. Journal of Computer-aided Molecular Design, 24(5), 417–422. https://doi.org/10.1007/s10822-010-9352-6

Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., . . . Higgins, D. G. (2011). Fast, scalable generation of high‐quality protein multiple sequence alignments using Clustal Omega. Molecular Systems Biology, 7(1). https://doi.org/10.1038/msb.2011.75

Stout, J., Boubakir, Z., Ambrose, S. J., Purves, R. W., & Page, J. E. (2012). The hexanoyl-CoA precursor for cannabinoid biosynthesis is formed by an acyl-activating enzyme in Cannabis sativa trichomes. The Plant Journal. https://doi.org/10.1111/j.1365-313x.2012.04949.x

T-COFFEE Multiple Sequence Alignment Server. (n.d.). Retrieved from https://tcoffee.crg.eu/

The PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC. (n.d.).

UniProt. (n.d.-a). Retrieved from https://www.uniprot.org/uniprotkb/B1Q2B6/entry

UniProt. (n.d.-b). Retrieved from https://www.uniprot.org/uniprotkb/I6WU39/entry

UniProt. (n.d.-c). Retrieved from https://www.uniprot.org/uniprotkb/H9A1V3/entry

UniProt. (n.d.-d). Retrieved from https://www.uniprot.org/uniprotkb/Q9PP99/entry

UniProt. (n.d.-e). Retrieved from https://www.uniprot.org/uniprotkb/Q8GTB6/entry#function

Wang, S., Xie, J., Pei, J., & Lai, L. (2023). CavityPlus 2022 Update: An Integrated Platform for Comprehensive Protein Cavity Detection and Property Analyses with User-friendly Tools and Cavity Databases. Journal of Molecular Biology, 435(14), 168141. https://doi.org/10.1016/j.jmb.2023.168141

Waterhouse, A., Procter, J. B., Martin, D., Clamp, M., & Barton, G. J. (2009). Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics, 25(9), 1189–1191. https://doi.org/10.1093/bioinformatics/btp033

Xu, Y., Wang, S., Hu, Q., Gao, S., Ma, X., Zhang, W., . . . Pei, J. (2018). CavityPlus: a web server for protein cavity detection with pharmacophore modelling, allosteric site identification and covalent ligand binding ability prediction. Nucleic Acids Research, 46(W1), W374–W379. https://doi.org/10.1093/nar/gky380

Yan, Y., Zhang, D., Zhou, P., Li, B., & Huang, S. (2017). HDOCK: a web server for protein–protein and protein–DNA/RNA docking based on a hybrid strategy. Nucleic Acids Research, 45(W1), W365–W373. https://doi.org/10.1093/nar/gkx407

Yang, X., Matsui, T., Kodama, T., Mori, T., Zhou, X., Taura, F., . . . Morita, H. (2016). Structural basis for olivetolic acid formation by a polyketide cyclase from Cannabis sativa. FEBS Journal, 283(6), 1088–1106. https://doi.org/10.1111/febs.13654

Zirpel, B., Degenhardt, F., Martin, C., Kayser, O., & Stehle, F. (2017). Engineering yeasts as platform organisms for cannabinoid biosynthesis. Journal of Biotechnology, 259, 204–212. https://doi.org/10.1016/j.jbiotec.2017.07.008

Biological Circuits

The biological circuits for the genes of interest can be assembled in any expression vector that has origins of replication for both E. coli and yeast, as well as a strong inducible promoter for yeast. With Saccharomyces cerevisiae as the chassis, the olivetolic acid and CBDA synthesis pathways must be inserted by means of biological circuits compatible with the expression system.

To ensure that the five enzymes are synthesized with the expected function in the circuit, two possibilities were tested: the use of fusion proteins and the use of 2A sequences. The activity of the candidate proteins was assessed by structural and kinetic modeling in order to find the proteins most efficient at producing CBDA.

The genes for CBDA synthesis were cloned into two separate biological circuits, so that the resulting inserts and expression vectors are of a reasonable size and, when translated, are more efficient at converting substrates and producing CBDA.

For the fusion-protein approach, pairs of constructs containing two or three genes, in different combinations, were evaluated for the biosynthesis of two fusion proteins. The genes were fused into a single insert and separated by flexible linkers of 10 amino acids to preserve the activity and structure of the individual proteins. Each insert was cloned into pRS vectors.

For the 2A approach, two inserts were constructed, one with two genes and one with three, separated by the T2A sequence (gagggcaggcagtctgctgacatgcggtgacgtggaagagaatcccggccct). The genes were combined so as to produce two inserts of approximately the same size.
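Before ordering or cloning a 2A spacer, it can help to confirm that the sequence keeps the downstream gene in frame. The following sketch, with hypothetical helper names, translates a DNA string with the standard genetic code and reports its length, whether that length is a multiple of three, and whether a stop codon appears before the last codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate complete codons from position 0; trailing bases are ignored."""
    dna = dna.lower()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def frame_report(dna):
    """Summarise length, frame and internal stop codons for a spacer sequence."""
    protein = translate(dna)
    return {
        "length": len(dna),
        "in_frame": len(dna) % 3 == 0,
        "internal_stop": "*" in protein[:-1],
        "protein": protein,
    }


# The spacer exactly as printed in the text above.
T2A_AS_PRINTED = "gagggcaggcagtctgctgacatgcggtgacgtggaagagaatcccggccct"
print(frame_report(T2A_AS_PRINTED))
```

A length that is not a multiple of three, or an internal stop codon, would point to a transcription slip worth checking against the deposited part sequence.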

SWISS-MODEL and AlphaFold2 servers were used for structural modeling of the native proteins with T2A appendages and of the fusion proteins. The three-dimensional structures and energy parameters of the proteins were evaluated using SWISS-MODEL, the PyMOL software, and the QMean server.

For kinetic modeling, the CHARMM-GUI NMR structure calculator was used to obtain energetically refined protein models for molecular docking with HADDOCK, ClusPro 2, and HDOCK, with the appropriate substrates and cofactors supplied to these servers (see the Structural Modeling section above).

Structural and kinetic modeling of the proteins separated using 2A sequences or fusion proteins allowed the construction of expression cassettes already suitable for the chassis, which will be cloned into the pRS425 and pRS426 vectors using the Gibson Assembly method.

The biological circuits were constructed from the cassettes developed previously, with GAL1 promoters, ADH1 terminators that are efficient in the chassis, the insert with genes separated by 2A sequences, and selection markers.

Figure 1: Diagram of the device for CsHCS, OS, and OAC expression.

Source: 2023, Authors.

Figure 2: Diagram of the device for CBDAS and CsPT4 expression.

Source: 2023, Authors.

The expression cassettes were cloned into the pRS vectors mentioned beforehand.

Several biological circuits were tested before we settled on the ones best suited to the project. The plasmids pRS423, pRS424, pRS425, pRS426, pYES2, and pET26b were used for testing.

These vectors required expression cassettes, each consisting of a promoter, a CDS, and a terminator. The following sequences were therefore tested for the assembly of each cassette:

Table 1: Sequences tested for the cassettes’ assembly.

Promoter	CDS	Terminator
GAL1	CSHCS OS OAC	ADH1
GAL1	CSHCS OS CsaPT4	ADH1
GAL1	CBDAS OS OAC	ADH1
GAL1	CBDAS OS CsaPT4	ADH1
GAL1	PT4 CBDAS -	ADH1
GAL1	PT4 OAC -	ADH1

Source: 2023, Authors.

The pYES2 vectors and plasmids for E. coli do not need cassettes, since they already possess a promoter-terminator pair.

The assembled plasmids were visualized using the SnapGene Viewer software, so we could analyze their sequence, features and genetic constructs.

After all of our cassettes were constructed, the best plasmid candidates for our project were the following:

Figure 3: pRS423

Source: 2023, Authors.

Figure 4: pRS424

Source: 2023, Authors.

Figure 5: pRS425

Source: 2023, Authors.

Figure 6: pRS426

Source: 2023, Authors.

Figure 7: pYES2

Source: 2023, Authors.

Figure 8: pET-26b(+)

Source: 2023, Authors.

With these plasmids, it was possible to build a total of 72 different circuits from different combinations of the enzymes’ coding sequences. We evaluated all of them, optimized them, and chose two biological circuits, built on the pRS425 and pRS426 plasmids, shown in the following figures.

Figure 9: pRS426 biological circuit - CBDSub.

Source: 2023, Authors.

Figure 10: pRS425 biological circuit - CBDSyn.

Source: 2023, Authors.

References

Biological circuits designed with SnapGene software (www.snapgene.com)

Gene Expression Modeling

To represent both the gene expression of our biological circuits and the action of the metabolic pathway, which relies on five interdependent enzymes, we looked in the literature for models that could describe cell behavior. In this context, mixed-effects (ME) models are a class of statistical models introduced to describe the response of different individuals in a population to known or predicted stimuli (Llamosi et al., 2016).

As described by Llamosi et al. (2016), we consider a simplified dynamic model of gene expression in which the translational term depends on the transcriptional term, which in turn is linked to the various factors that influence the start of transcription. Once the cell has been transformed with the plasmid assembled by Gibson Assembly, u(t) represents the activity of the transcriptional factors (in our case it could be, for example, the kinetics of RNA polymerase). In addition, the constants k_i and g_i describe the production and decay rates of either the mRNA, written m(t) in the equations, or the proteins, written p(t). Thus, we have:

$$\frac{dm(t)}{dt} = k_m\,u(t) - g_m\,m(t)$$

$$\frac{dp(t)}{dt} = k_p\,m(t) - g_p\,p(t)$$

This gives us a model to follow in the experimental analysis. We can observe the conditions at each sample point, understand the expression profile of our yeast, and from it identify possible bottlenecks in the metabolism of the transformed cells. The values of the constants k and g in each equation can then be obtained by regression on the measured sample values (m(t), the mRNA concentration; p(t), the protein concentration; and u(t), the concentration of transcriptional factors), which completes the model and helps us understand the production of the desired product (CBDA).
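The regression step can be sketched for a single equation of the form dx/dt = k·drive(t) − g·x(t), which covers the mRNA equation with u(t) as the driving term. This is a minimal sketch under stated assumptions: the forward-Euler integrator, the finite-difference least-squares fit, the function names, and the parameter values k = 2.0 and g = 0.5 are all illustrative choices, not measured values or the method of Llamosi et al.

```python
def simulate(k, g, drive, x0=0.0, dt=0.01, steps=1000):
    """Forward-Euler integration of dx/dt = k*drive(t) - g*x(t)."""
    xs = [x0]
    for i in range(steps):
        t = i * dt
        xs.append(xs[-1] + dt * (k * drive(t) - g * xs[-1]))
    return xs


def fit_k_g(xs, us, dt):
    """Least-squares estimate of (k, g) from sampled x(t) and drive values.

    Uses finite differences y_i = (x[i+1] - x[i]) / dt and solves the
    2x2 normal equations for y = k*u + (-g)*x.
    """
    suu = sux = sxx = suy = sxy = 0.0
    for i in range(len(xs) - 1):
        y = (xs[i + 1] - xs[i]) / dt
        u, x = us[i], xs[i]
        suu += u * u
        sux += u * x
        sxx += x * x
        suy += u * y
        sxy += x * y
    det = suu * sxx - sux * sux
    k = (suy * sxx - sux * sxy) / det
    minus_g = (suu * sxy - sux * suy) / det
    return k, -minus_g


# Hypothetical constants with a constant driving term u(t) = 1.
trace = simulate(k=2.0, g=0.5, drive=lambda t: 1.0, dt=0.01, steps=2000)
k_hat, g_hat = fit_k_g(trace, [1.0] * len(trace), dt=0.01)
print(round(trace[-1], 3), round(k_hat, 3), round(g_hat, 3))
```

With a constant drive the trajectory settles near k/g, so the ratio of the two constants can be read from the plateau while the approach to the plateau separates them. Real measurements are noisy and unevenly sampled, so a proper fit would need a more robust estimator than this one.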

Having described the profile considered for gene expression, we used biomolecule-based Boolean logic to model our metabolic pathway as a set of logic gates (Miyamoto et al., 2013). Two substrates will be added: galactose, to activate the GAL1 promoter (see the Biological Circuits section), and hexanoic acid, which is already present in the cell's metabolism but whose supplementation will be evaluated. Other substrates already supplied by the mevalonate pathway and by other secondary metabolism of Saccharomyces cerevisiae are considered to be in excess and therefore do not determine the state of the Boolean logic gates.

For the first logic circuit, we consider a cell from the yeast population that has been transformed with both plasmids. The only substrate added, apart from galactose, is hexanoic acid (labeled "Hexanoic acid A" in the circuit). Note that GPP (geranyl diphosphate), malonyl-CoA, and cofactors such as Mg2+ ions were considered to be in excess in the yeast intracellular medium.

Figure 1: Complete Boolean-based logic circuit for CBDA metabolic pathway.

Source: 2023, Authors (made with VisualParadigm).

It is important to remember that the intermediate substrates are produced within the reaction system. The added substrates are represented by the inputs: galactose, which activates the GAL1 promoter (responsible for driving expression of the CDSs that encode our enzymes), and Hexanoic Acid A, the CsHCS substrate. We also have the alternative input Hexanoic Acid B, which comes from S. cerevisiae secondary metabolism and enters the “OR” gate shown above. All other logic gates are “YES” gates, each representing an enzyme docking with its corresponding ligand.
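The gate layout described above can be written as a small Boolean function. The sketch below is one reading of it and rests on an assumption that is not stated explicitly in the text: that without galactose no enzyme is expressed, so the output is 0 regardless of the hexanoic acid inputs. The function names are illustrative.

```python
from itertools import product

# Enzyme steps after CsHCS, each modelled as a YES gate.
DOWNSTREAM_STEPS = ["OS", "OAC", "CsPT4", "CBDAS"]


def or_gate(a, b):
    """CsHCS input: added hexanoic acid (A) or endogenous hexanoic acid (B)."""
    return int(bool(a) or bool(b))


def yes_gate(x):
    """A YES gate passes its single input through unchanged."""
    return int(bool(x))


def cbda_output(galactose, hex_a, hex_b):
    """Return 1 when CBDA is produced under this simplified logic."""
    # Assumption: without galactose the GAL1 promoter is off and no enzyme is made.
    if not galactose:
        return 0
    signal = or_gate(hex_a, hex_b)
    for _step in DOWNSTREAM_STEPS:
        signal = yes_gate(signal)
    return signal


def truth_table():
    """All eight input combinations as (galactose, hex_a, hex_b, output)."""
    return [(g, a, b, cbda_output(g, a, b)) for g, a, b in product([0, 1], repeat=3)]


for row in truth_table():
    print(row)
```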

We also evaluated the circuits as binary models based on this Boolean logic, assigning 0 to “false” and 1 to “true” for the input and output responses. Figure 2 below compiles the binary logic information for the metabolic pathway shown.

Figure 2: Binary inputs and outputs for the CBDA metabolic pathway.

Source: 2023, Authors.

Or, in compiled form:

Figure 3: Compiled binary inputs for the CBDA metabolic pathway.

Source: 2023, Authors.

In the “CsHCS - OR gate”, input “A” represents the added hexanoic acid and input “B” represents the hexanoic acid already present in the cell's metabolism. This allowed us to carry out a probabilistic study, assigning 0.5 to each “YES” gate and 0.75 to the “OR” gate, as seen in Figure 4.

Figure 4: Probability for each substrate.

Source: 2023, Authors.

The numbers identify the gates in the order shown in Figure 1. This study simply shows how much each gate depends on the one before it, which is why it is important to follow the biological order of the pathway.
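The values 0.5 and 0.75 are what one gets by counting truth-table rows, assuming every input combination is equally likely. The sketch below makes that counting explicit; the uniform-input assumption and the helper names are illustrative, and the “AND” gate is included because it appears in the olivetolic acid to CBDA circuit.

```python
from itertools import product

# Each entry: gate name -> (number of inputs, Boolean function).
GATES = {
    "YES": (1, lambda a: bool(a)),
    "OR": (2, lambda a, b: bool(a) or bool(b)),
    "AND": (2, lambda a, b: bool(a) and bool(b)),
}


def probability_true(gate_name):
    """Fraction of input combinations for which the gate outputs 1."""
    n_inputs, func = GATES[gate_name]
    rows = list(product([0, 1], repeat=n_inputs))
    hits = sum(1 for row in rows if func(*row))
    return hits / len(rows)


for name in GATES:
    print(name, probability_true(name))
```

Counting rows in this way treats the inputs as independent coin flips, which is a modelling convenience and not a statement about actual substrate availability in the cell.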

The second part of this study observes each circuit individually, in which we consider a population "S. cerevisiae A” of Saccharomyces transformed with the pRS426-CBDSub and one “S. cerevisiae B'' transformed with pRS425-CBDSyn. To the first of these, the same substrates discussed are added. However, we intend extract the Olivetolic Acid (OA) after carrying out the successive reactions, as shown at the Figure 5 below.

Figure 5: Boolean-based logic circuit for Olivetolic Acid metabolic pathway.

Source: 2023, Authors (made with VisualParadigm).

As in the previous circuit, the alternative Hexanoic Acid B, coming from the yeast metabolism, accounts for the “OR” gate, and the remaining gates are “YES” gates. In this case, however, we intend to secrete the Olivetolic Acid, the determinant substrate for producing CBGA, which is the immediate precursor of CBDA.

In binary mode, we obtain the following results:

Figure 6: Binary inputs and outputs for the OA metabolic pathway.

Source: 2023, Authors.

Here the “OR” gate works exactly as described before. We also carried out the probabilistic study for this case, presented in Figure 7.

Figure 7: Probability for each substrate in the OA metabolic pathway.

Source: 2023, Authors.

As expected, with fewer gates, we observed a higher probability of obtaining the product, in this case Olivetolic Acid.

The cells carrying the pRS425-CBDSyn plasmid differ in one respect: instead of hexanoic acid entering as one of the substrates, the olivetolic acid obtained from the previous population becomes the decisive substrate, as shown in the following figure.
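
The effect of the number of gates can be illustrated numerically. The sketch below is a hedged Python example, not a reproduction of Figures 4 and 7: it assumes that the gates act in series and that the per-gate values (0.75 for “OR”, 0.5 for “YES”) can simply be multiplied along the chain. The gate counts in `long_chain` and `short_chain` are hypothetical and serve only to show the trend.

```python
from fractions import Fraction

P_OR = Fraction(3, 4)   # per-gate value for the OR gate, from the text
P_YES = Fraction(1, 2)  # per-gate value for each YES gate, from the text


def cumulative_probabilities(gate_probs):
    """Running product of per-gate values along a series of gates."""
    running = Fraction(1)
    result = []
    for p in gate_probs:
        running *= p
        result.append(running)
    return result


# Hypothetical gate counts, chosen only to illustrate the trend.
long_chain = [P_OR] + [P_YES] * 4   # a longer pathway
short_chain = [P_OR] + [P_YES] * 2  # a shorter pathway

print([float(p) for p in cumulative_probabilities(long_chain)])
print([float(p) for p in cumulative_probabilities(short_chain)])
```

Under this multiplication assumption, each extra “YES” gate halves the value at the end of the chain, so the shorter chain always ends with the larger number.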

Figure 8: Boolean-based logic circuit from Olivetolic Acid to CBDA pathway.

Source: 2023, Authors (made with VisualParadigm).

In this case, Galactose activates our promoter, which enables the subsequent protein expression, while Olivetolic Acid is the determinant substrate for CsPT4. This is why a new gate appears here, an “AND” gate, which requires both inputs.

The binary mode of this circuit is shown below:

Figure 9: Binary inputs and outputs for the metabolic pathway from the OA to CBDA.

Source: 2023, Authors.

Unlike the previous circuits, this one contains another kind of gate, an “AND”, in which input “A” stands for both promoter activation and the consequent enzyme production, and input “B” stands for Olivetolic Acid. Figure 10 compiles the probabilistic study, as shown below.

Figure 10: Probability for each substrate in the metabolic pathway from OA to CBDA.

Source: 2023, Authors.

Note that at the “AND” gate, numbered 5, we found a probability of 0.25, since only one of the four input combinations gives a positive response.

Taken together, this study gives us a better understanding of how expression can be measured and evaluated at the Wet Lab stage.
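
The 0.25 value can be checked by listing the four input cases of the “AND” gate. This is a minimal Python sketch, assuming the four combinations are equally likely; the names `and_gate`, `rows` and `positive` are illustrative only.

```python
from itertools import product


def and_gate(a, b):
    """AND gate: the output is 1 only when both inputs are 1."""
    return a & b


# Input A: promoter activation by galactose and the resulting enzyme production.
# Input B: olivetolic acid supplied to the cells.
rows = [(a, b, and_gate(a, b)) for a, b in product((0, 1), repeat=2)]
for a, b, out in rows:
    print(f"A={a} B={b} -> {out}")

positive = sum(out for _, _, out in rows)  # number of rows with output 1
print(positive / len(rows))  # 0.25
```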

References

Llamosi, A., Gonzalez-Vargas, A. M., Versari, C., Cinquemani, E., Ferrari-Trecate, G., Hersen, P., & Batt, G. (2016). What population reveals about individual cell identity: single-cell parameter estimation of models of gene expression in yeast. PLoS computational biology, 12(2), e1004706.

Miyamoto, T., Razavi, S., DeRose, R., & Inoue, T. (2013). Synthesizing biomolecule-based Boolean logic gates. ACS synthetic biology, 2(2), 72-82.