To make our engineering as refined as possible, we divided our modeling into six parts: Native modeling, Self-cleaving protein design, Docking, Catalytic and binding site analysis, Biological circuit design and Expression modeling. Since we would not be carrying out wet-lab activities this year, we worked through each of these goals one at a time, with the whole team involved, so that each produced well-developed results.

There was nothing trivial about this project. Most of our enzymes were still poorly characterized, and some did not even have a crystal structure deposited in databases. Each of them also had some peculiarity: some were very small, others dimeric, glycosylated or membrane-embedded. So we first had to find out what we were dealing with, and then how to deal with it. This is the main reason the engineering cycle was so useful: it took a lot of trial and error before we had models we could be proud of.

Native Modeling

DESIGN

In the process of developing our modeling approach, we conducted a comprehensive evaluation of proteins from both structural and dynamic perspectives. This involved identifying the most closely related crystal structure and predicting optimal protein stability in vials.

To engineer a microorganism for cannabidiolic acid (CBDA) production, we chose the yeast Saccharomyces cerevisiae SC9721 (genotype MATα his3-Δ200 ura3-52 leu2-Δ1 lys2-Δ202 trp1-Δ63) as our chassis. It already produces geranyl diphosphate (GPP) naturally and carries several auxotrophies we can take advantage of (Zirpel et al. 2017). This lets us introduce two different biological circuits into our cells. Each circuit carries a marker that complements a different auxotrophy, and each contains one key part of the metabolic pathway: one has the enzymes needed for olivetolic acid synthesis, and the other has the enzymes needed for cannabidiolic acid synthesis.

Figure 1: Biosynthetic pathway of cannabinoids in C. sativa.

Source: 2017, Zirpel et al.

BUILD

To model all native proteins in the metabolic pathway, we used the homology modeling server SWISS-MODEL (Waterhouse et al., 2018). It builds three-dimensional protein models from experimentally determined structures of closely related family members. When a suitable template is available, homology modeling is generally regarded as the most accurate computational approach for generating reliable structural models. The modeled proteins, shown in figures 2 to 6, were Hexanoyl-CoA synthetase (CsHCS1), Olivetol Synthase (OS), Olivetolic Acid Cyclase (OAC), Cannabis sativa prenyltransferase 4 (CsaPT4) and Cannabidiolic Acid Synthase (CBDAS).

Figure 2: Native model of CsHCS1.

Source: 2023, Authors.

Figure 3: Native model of OS.

Source: 2023, Authors.

Figure 4: Native model of OAC.

Source: 2023, Authors.

Figure 5: Native model of CsaPT4.

Source: 2023, Authors.

Figure 6: Native model of CBDAS.

Source: 2023, Authors.

To increase the diversity and reliability of our models, we used several modeling techniques on different servers. We also used CHARMM-GUI to refine our models, so we could compare them with their unrefined counterparts (Jo et al., 2008; Park et al., 2023).

TEST

We evaluated our models' energetic and geometric quality with the QMEAN (Benkert, Biasini & Schwede, 2011) and MolProbity (Williams et al., 2018) servers. The Ramachandran plots generated by these servers gave further information on each model's geometry.

LEARN

The initial QMEAN scores for CsHCS1 (0.63) and CsaPT4 (0.45) were not ideal. All other models scored above 0.7, which we considered acceptable.

For CsPT4 (CsaPT4), shown in figure 5, we hypothesized that most of the instability and the poor energetic and geometric parameters came from the large loop-like region at the model's N-terminus. Its Ramachandran plot (see Structural Modeling on our Model page) showed many of this region's amino acids outside the favored regions. When we generated this first model, we could not find any characterization of that region. A signal peptide analysis with SignalP 6.0 (Teufel et al., 2022) also indicated that it was not a secretory signal peptide. Still suspicious of its role, we analyzed the sequence with TargetP 2.0 (Armenteros et al., 2019). This revealed that the protein's first 77 amino acids form a transit peptide that targets the protein to the plant cell's plastids (chloroplasts). To avoid problems with our yeast's cellular machinery and premature protein degradation, we removed this sequence from both the protein's amino acid sequence and its coding sequence. We then redid the modeling for CsPT4-NoTP (figure 7), the modified protein without the transit peptide residues.
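The truncation described above has to be applied identically to the protein and to its coding sequence, or the two fall out of frame. The minimal Python sketch below assumes the CDS encodes the protein plus a single stop codon, and that the 77-residue transit peptide predicted by TargetP starts at residue 1. Re-adding an initiator methionine (ATG) after trimming is our assumption, not a step stated above. The function name `remove_transit_peptide` is hypothetical.

```python
from __future__ import annotations


def remove_transit_peptide(protein: str, cds: str, tp_length: int = 77,
                           add_start_met: bool = True) -> tuple[str, str]:
    """Trim an N-terminal transit peptide from a protein and its CDS, in frame.

    Assumes `cds` encodes `protein` followed by a single stop codon.
    """
    if len(cds) != 3 * (len(protein) + 1):
        raise ValueError("CDS length must be 3 x (protein length + 1 stop codon)")
    mature_protein = protein[tp_length:]
    mature_cds = cds[3 * tp_length:]  # each residue corresponds to one 3-nt codon
    # Assumption: restore an initiator Met/ATG lost together with the transit peptide.
    if add_start_met and not mature_protein.startswith("M"):
        mature_protein = "M" + mature_protein
        mature_cds = "ATG" + mature_cds
    return mature_protein, mature_cds


# Toy example with a 2-residue "transit peptide" (real use: tp_length=77 for CsPT4).
print(remove_transit_peptide("MKLVST", "ATGAAACTTGTTTCTACTTAA", tp_length=2))
```

Checking the length invariant up front catches a CDS that still includes UTRs or lacks its stop codon before the trim silently shifts the frame.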

Figure 7: Model for CsPT4-NoTP. Transit peptide residues removed.

Source: 2023, Authors.

With the new model generated, we submitted it again to MolProbity and QMEAN to obtain its energetic and geometric parameters, and the results had indeed improved (see Structural Modeling on our Model page). This iteration of the engineering cycle was crucial: it gave us a better model, even though it is a predicted model rather than one based on a crystal structure. We then submitted this model to CHARMM-GUI for refinement, in the same way as our other models.

After evaluating the MolProbity scores for CsPT4-NoTP, we expected the QMEAN results to improve as well. A protein's structural geometry is closely related to its surface, internal and overall stability energies. Comparing the QMEAN analyses of CsPT4 and CsPT4-NoTP (see Structural Modeling on our Model page), CsPT4-NoTP showed a considerable improvement: its score rose from 0.45 to 0.56. This is still not ideal, but considerably better than what we would otherwise have had to work with.

With all of our models, refined and unrefined, in hand, we compared their QMEAN, MolProbity and Ramachandran results (see Structural Modeling). None of the refined models had better QMEAN or Ramachandran results, so we used the unrefined initial SWISS-MODEL models for the next steps.

Self-cleaving proteins

DESIGN

2A sequences are an elegant, simple way to synthesize multiple individual proteins from a single monocistronic mRNA: they are short, highly efficient and keep circuit designs simple. Among them, T2A and P2A have shown the highest efficiency, and their sequences are readily available in bioinformatics databases. We chose T2A because it is more effective than P2A at promoting the separation of the proteins during translation by the ribosome. Note that T2A leaves residual amino acids on the proteins it separates. The upstream protein keeps a C-terminal 2A fragment, and the downstream protein gains an N-terminal proline. Depending on its position in the cassette, each protein therefore carries T2A-derived residues at its N-terminus, its C-terminus or both. For each protein, we had to build three different models, one for each possibility.
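The sketch below makes concrete why each protein needs up to three models. It assumes the commonly used T2A peptide EGRGSLLTCGDVEENPGP (without the GSG linker some designs add). It also assumes that ribosomal skipping occurs between the final Gly and Pro of the peptide. The function `t2a_products` is hypothetical and only predicts which extra residues each released protein carries.

```python
from __future__ import annotations

# Commonly used T2A peptide (assumption: no N-terminal GSG linker).
T2A = "EGRGSLLTCGDVEENPGP"


def t2a_products(proteins: list[str], t2a: str = T2A) -> list[str]:
    """Predict the polypeptides released from proteins joined by 2A peptides.

    Skipping occurs between the final Gly and Pro of the ...NPGP motif: the
    upstream protein keeps the 2A peptide minus that Pro as a C-terminal tail,
    and the downstream protein starts with the Pro.
    """
    if not t2a.endswith("GP"):
        raise ValueError("expected a 2A peptide ending in ...GP")
    c_tail, n_head = t2a[:-1], t2a[-1]
    released = []
    for i, protein in enumerate(proteins):
        n_term = n_head if i > 0 else ""                   # every protein but the first
        c_term = c_tail if i < len(proteins) - 1 else ""   # every protein but the last
        released.append(n_term + protein + c_term)
    return released


# Placeholder sequences standing in for three enzymes in one cassette.
for position, seq in zip(["first", "middle", "last"], t2a_products(["AAA", "CCC", "DDD"])):
    print(position, seq)
```

The first protein only gains a C-terminal tail, a middle one gains both, and the last one only gains an N-terminal Pro. These correspond to the three variants modeled for each enzyme.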

BUILD

With the native models ready, we modeled the proteins again, this time incorporating the T2A sequences. Using homology modeling on SWISS-MODEL, however, we ended up with the same native proteins as in the previous step. After further investigation, we found that the server was leaving the T2A sequences out of the models.

Our modeling was failing at this stage because SWISS-MODEL is a homology modeling server. It fits the submitted sequence onto templates already deposited in databases with sufficient identity and coverage. Fragments with no counterpart in those templates may not be modeled at all, and the T2A fragments we were adding were exactly such a case. As a result, SWISS-MODEL returned exactly the same models as in our previous modeling stage, and we had to find another way to obtain our modified protein models.

We therefore turned to MODELLER, which we used through PyMOL. MODELLER aligns our modified protein sequences to a given template. It then builds a model that also includes the added fragment, based on structural homology to other proteins found in databases. For the template, we used the native protein models obtained and evaluated in the previous stage, and MODELLER predicted how to incorporate the new amino acids. We inserted T2A fragments at the N-terminus, the C-terminus and both termini of each protein's FASTA sequence, aligned them to the chosen models and built the modified protein models. The modified sequences can be found on the Model page, in the Structural Modeling - T2A Sequences section.
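MODELLER reads target-template alignments in PIR format. Aligning the added T2A residues against gaps in the template is what makes it build those residues rather than copy the native model. The sketch below writes such an alignment for one protein. The header lines follow MODELLER's ten-field, colon-separated PIR layout. The assumption that the template's residues are numbered from 1 on chain A is hypothetical and should be checked against the actual model files, as are the codes `native` and `t2a`.

```python
def _pir_entry(code: str, header: str, aligned_seq: str, width: int = 75) -> str:
    seq = aligned_seq + "*"  # PIR sequences are terminated by an asterisk
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">P1;{code}\n{header}\n" + "\n".join(lines) + "\n"


def t2a_alignment(native: str, n_frag: str = "", c_frag: str = "",
                  template_code: str = "native", target_code: str = "t2a") -> str:
    """Two-entry PIR alignment: native template vs. T2A-extended target.

    The added T2A residues are aligned to gaps in the template, so MODELLER
    has to build them without structural template coordinates.
    """
    template_aln = "-" * len(n_frag) + native + "-" * len(c_frag)
    target_aln = n_frag + native + c_frag
    # Assumption: template residues are numbered 1..len(native) on chain A.
    template_header = f"structureX:{template_code}:1:A:{len(native)}:A:::-1.00:-1.00"
    target_header = f"sequence:{target_code}:::::::0.00: 0.00"
    return (_pir_entry(template_code, template_header, template_aln)
            + _pir_entry(target_code, target_header, target_aln))


# Toy example: a short native sequence with a C-terminal T2A tail.
print(t2a_alignment("MKTAYIAK", c_frag="EGRGSLLTCGDVEENPG"))
```

The resulting file would then be passed to MODELLER's `AutoModel` class (`automodel` in older releases), with `knowns` set to the template code and `sequence` set to the target code.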

TEST

We used PROCHECK (Laskowski et al., 1993) from the UCLA-DOE LAB SAVES v6.0 server, together with MolProbity and QMEAN, to obtain geometric and energetic parameters for our modified protein models. Ramachandran plots helped us determine which of the new models of each protein was the most structurally stable. The most consistent T2A models are presented in Figure 8, where A, B, C, D and E represent T2A-modified CsHCS1, OS, OAC, CsPT4 and CBDAS, respectively.

Figure 8: Models from our proteins with T2A sequence.

Source: 2023, Authors.

LEARN

At this point, we had already decided to build expression cassettes with the GAL1 promoter and ADH1 terminator, both efficient and functional in our chassis. From our results, we defined several possible cassettes. We took into account the best position for each T2A fragment on each enzyme, so that each protein's structure and stability were preserved as much as possible:

GAL1 - CsHCS - OS - OAC - ADH1
GAL1 - CsHCS - OS - CsaPT4 - ADH1
GAL1 - CBDAS - OS - OAC - ADH1
GAL1 - CBDAS - OS - CsaPT4 - ADH1
GAL1 - CBDAS - CsaPT4 - ADH1
GAL1 - CsaPT4 - OAC - ADH1

We chose the first and fifth cassettes for two main reasons. First, with the enzymes positioned as they are, these cassettes have the best structural, geometric and docking properties, as evaluated through our modeling process (see the Structural Modeling - T2A section on our Model page). Second, they are practically versatile. They can easily be moved into different plasmids, re-optimized, rearranged into different cassettes or set up for different chassis, which lets future teams and groups build on our circuits. In addition, these cassettes separate the synthesis of olivetolic acid (cassette 1) from that of cannabidiolic acid (cassette 5). This means the production of our precursor (olivetolic acid) can be optimized first, followed by the optimization of our product's (CBDA) biosynthesis.
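Fusing the chosen enzymes into a single open reading frame requires some bookkeeping. Every internal stop codon has to be removed so translation runs through each 2A site, and only the last CDS keeps its stop. The sketch below shows this step. `linker_cds` is a placeholder for whatever codon-optimized T2A coding sequence is used, and the GAL1 promoter and ADH1 terminator would flank the result. The function name `join_cds_with_2a` is hypothetical.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def join_cds_with_2a(cds_list: list[str], linker_cds: str) -> str:
    """Fuse several CDSs into one open reading frame separated by a 2A linker.

    Internal stop codons are dropped so the ribosome reads through each 2A
    site; only the final CDS keeps its stop codon.
    """
    if len(linker_cds) % 3:
        raise ValueError("linker must be a whole number of codons")
    pieces = []
    for i, cds in enumerate(cds_list):
        cds = cds.upper()
        if len(cds) % 3:
            raise ValueError(f"CDS {i} length is not a multiple of 3")
        if i < len(cds_list) - 1:
            if cds[-3:] in STOP_CODONS:
                cds = cds[:-3]  # remove the internal stop codon
            pieces.append(cds + linker_cds.upper())
        else:
            pieces.append(cds)  # last CDS keeps its stop codon
    return "".join(pieces)


# Toy CDSs and a toy 2-codon linker standing in for the real T2A coding sequence.
print(join_cds_with_2a(["ATGAAATAA", "ATGCCCTGA"], "GGGCCC"))
```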

One last useful property of our CBDA cassette is that the last enzyme of the cannabidiolic acid pathway, CBDAS, can be swapped for other cannabinoid synthases that also act on cannabigerolic acid (for example, THCA synthase). Our devices and circuits could therefore be used to synthesize several cannabinoids in a simplified system.

Biological circuit design

DESIGN

Biological circuits are at the heart of synthetic biology. They can be used to control cells in a wide variety of ways, from silencing a gene to making art with cells. Here, we wanted to do something very intricate: replicate a plant metabolic pathway inside yeast cells. This is no small feat. The pathway is complex, interacts with metabolic processes that occur naturally in yeast, and involves highly specialized enzymes. Our biological circuit design was the culmination of our modeling efforts. Several independent lines of evidence indicate that our expression cassettes keep the pathway enzymes in their most stable conformations. They also indicate that each enzyme's structure and its binding and catalytic sites stay intact. Through bioinformatics, we even identified amino acid residues with a high probability of forming the binding and catalytic sites of enzymes that had not been characterized before!

One major consideration in our circuit design was the applicability of our constructs. We developed two cassettes, one for each key step of the metabolic pathway. We designed them to be easy to move between chassis, circuits and cassettes, so that other iGEM teams can test several expression platforms with proper optimization. We even deposited a few primers in the Registry for this exact reason (see Parts)! Note, however, that these constructs are intended for eukaryotic hosts only.

To construct our expression vectors, we designed and tested circuits on several plasmids: pRS423, pRS424, pRS425, pRS426 and pYES2 (see Biological Circuits on the Model page). We considered so many because of the logistics of obtaining reagents and accessing the plasmids we needed. We knew our lab already had pRS426, but we would need two plasmids, and we were not keen on using bidirectional promoters such as GAL1-10. We also already knew we would assemble our circuits by Gibson Assembly (Gibson et al., 2009), since our lab has some expertise with this technique. This allowed something we were not expecting: converting pRS426 into plasmids equivalent to other members of the pRS series. pRS426 carries the URA3 auxotrophic marker, while pRS425 carries LEU2, pRS424 carries TRP1 and pRS423 carries HIS3. By replacing the marker on pRS426 with one of these, we could use the new plasmid for cloning and transformation. So we designed primers to replace URA3 on pRS426 with either LEU2 or HIS3. The HIS3 marker is even available in the iGEM distribution kit, which makes it a perfect fit for our use case.
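The marker logic behind choosing two compatible plasmids can be sketched as follows. Both plasmids must complement an auxotrophy of the host, and their markers must differ so each can be selected independently. The host defects are taken from the SC9721 genotype given earlier. `compatible_pairs` and the name `pRS426_LEU2` for a marker-swapped pRS426 are hypothetical.

```python
from itertools import combinations

# Selection markers of the pRS vectors named above.
VECTOR_MARKERS = {"pRS423": "HIS3", "pRS424": "TRP1", "pRS425": "LEU2", "pRS426": "URA3"}

# Genes defective in SC9721 (his3, ura3, leu2, lys2, trp1), i.e. markers it can select for.
HOST_DEFECTS = {"HIS3", "URA3", "LEU2", "LYS2", "TRP1"}


def compatible_pairs(vector_markers: dict, host_defects: set) -> list:
    """Vector pairs that can be co-selected in the host.

    Both markers must complement a host auxotrophy, and the two markers must
    differ so that each plasmid can be selected independently.
    """
    usable = {v: m for v, m in vector_markers.items() if m in host_defects}
    return [(a, b) for a, b in combinations(sorted(usable), 2) if usable[a] != usable[b]]


# Hypothetical pRS426 derivative whose URA3 marker was swapped for LEU2.
swapped = {**VECTOR_MARKERS, "pRS426_LEU2": "LEU2"}
print(("pRS426", "pRS426_LEU2") in compatible_pairs(swapped, HOST_DEFECTS))  # True
print(("pRS425", "pRS426_LEU2") in compatible_pairs(swapped, HOST_DEFECTS))  # False
```

The swapped plasmid pairs with the original pRS426, since LEU2 and URA3 differ. It does not pair with pRS425, because both carry LEU2.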

With our plasmid sequences in hand, we could finally design and build our biological circuits.

BUILD

To build on the pRS vectors, we needed expression cassettes. Each cassette consists of a promoter (GAL1 in our case), a coding sequence (CDS) as described earlier, and a terminator (ADH1 in our case). We inserted the cassettes into the lacZα region because the insertion disrupts the gene's sequence, which allows blue-white screening in E. coli. This simplifies our screening stages: colonies carrying the construct inside the lacZα region can be identified by selecting only the white ones.

For the pYES2 vector, however, cassettes were not required, since the plasmid already contains a promoter-terminator pair. We only inserted the CDS (the same one placed between the promoter and terminator in our cassettes) between pYES2's promoter and terminator. Although this circuit was simple, there was a major issue: we did not have ready access to this plasmid and would have had to obtain it elsewhere if we chose to use it.

Next, we built our new plasmids, pRS423s and pRS425s, carrying the HIS3 and LEU2 auxotrophic markers. We designed two sets of primers. The first amplifies the entire marker cassettes, either from the iGEM distribution kit or from other sources such as genomic DNA from HIS3- or LEU2-positive yeast. The second amplifies pRS426 around the URA3 marker, which linearizes the plasmid for Gibson Assembly and removes URA3 at the same time. We built the plasmids' sequences with the new marker cassettes exactly as we expect them to be assembled in our wet-lab stages next year. We visualized them in SnapGene® (Dotmatics; available at snapgene.com), where we could also compare them with the original pRS423 and pRS425.
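The two primer sets described above follow a common Gibson Assembly pattern. The insert primers carry 5' tails homologous to the ends of the linearized vector, while the backbone primers simply anneal outward from the region being replaced. The sketch below shows that pattern on plain sequence strings. The 25-nt overlap and 20-nt annealing lengths are illustrative assumptions; real primers are usually tuned by melting temperature, for example in SnapGene. The function names are hypothetical.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def insert_primers(left_arm: str, insert: str, right_arm: str,
                   overlap: int = 25, anneal: int = 20) -> tuple[str, str]:
    """Primers that amplify `insert` (e.g. a LEU2 or HIS3 cassette) with 5' tails
    homologous to the vector ends. Arms are top-strand, 5'->3', flanking the insert."""
    forward = left_arm[-overlap:] + insert[:anneal]
    reverse = reverse_complement(insert[-anneal:] + right_arm[:overlap])
    return forward, reverse


def backbone_primers(left_arm: str, right_arm: str, anneal: int = 20) -> tuple[str, str]:
    """Primers that amplify the circular vector outward from the replaced region
    (e.g. URA3), producing a linear backbone without that region."""
    forward = right_arm[:anneal]
    reverse = reverse_complement(left_arm[-anneal:])
    return forward, reverse


# Toy sequences with short overlaps so the result is easy to read.
print(insert_primers("AAAACCCC", "GGGGTTTT", "CCCCAAAA", overlap=4, anneal=4))
print(backbone_primers("AAAACCCC", "CCCCAAAA", anneal=4))
```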

TEST

As a precaution, we built each of the 7 different cassettes on each of the 6 different vectors. This gives us options: if we need to optimize, or to test different sequences, cassettes and assemblies, the plasmids are already built. Besides being convenient in practice, these circuits turned out to be of similar size, so we can "normalize" the burden on our cell cultures.

LEARN

We selected circuits that split the synthesis pathway into two main steps: olivetolic acid synthesis and cannabidiolic acid synthesis. This gives flexibility in the choice of chassis, allowing us to synthesize either the substrate or the product independently. It also keeps the enzymes' coding sequences (CDS) separate and allows promoters and terminators to be exchanged as required for different chassis.

Figure 9: pRS425-CBDSyn and pRS426-CBDSub, carrying the cassettes for CBDA and olivetolic acid synthesis, respectively.

Source: 2023, Authors.

Catalytic and binding site analysis

DESIGN

Some of our proteins had been modeled on templates that were themselves predicted models from existing databases (mainly SWISS-MODEL and AlphaFold2) rather than crystallographic structures. We therefore performed a phylogenetic analysis to evaluate and identify domains and binding and catalytic pockets.

To do so, we first searched bioinformatics databases for sequences of proteins similar to the ones we were studying, choosing at least five protein sequences to compare with each of our enzymes. We ran protein-protein BLAST (BLASTp) searches on NCBI and UniProt to find similar sequences for each enzyme. We identified good sequences to align ours with, even when they were not closely related to Cannabis. Comparing our enzymes with bacterial and even tomato proteins led to some surprising findings.

BUILD

We aligned the sequences retrieved from the databases using the T-COFFEE MUSCLE server (Notredame, Higgins & Heringa, 2000), which showed which sections of the aligned sequences were conserved. Through a literature review of each protein in each alignment, we gathered as much information as possible on the described binding sites, active sites, catalytic pockets and probable family domains. We visualized the alignments in Jalview (Waterhouse et al., 2009), as shown in figures 13 to 17 of the Molecular Docking - Phylogenetic Analysis section on our Model page.
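The conserved sections seen in Jalview can also be listed programmatically. The sketch below uses a simple identity criterion, not the scoring used by T-COFFEE or Jalview. It reports alignment columns where the most frequent residue reaches a chosen fraction of the sequences. It also maps a column back to a residue number in one sequence, which is the step needed to compare conserved columns with docking contacts. The function names and the default threshold are assumptions.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


def conserved_columns(alignment: list[str],
                      min_identity: float = 1.0) -> list[tuple[int, str, float]]:
    """Return (column, residue, identity) for columns whose most frequent residue
    occurs in at least `min_identity` of the sequences.

    Columns are 1-based; gaps ('-') count against conservation.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    hits = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        if not counts:
            continue  # all-gap column
        residue, count = counts.most_common(1)[0]
        identity = count / len(alignment)
        if identity >= min_identity:
            hits.append((col + 1, residue, identity))
    return hits


def column_to_residue(aligned_seq: str, col: int) -> Optional[int]:
    """1-based residue number in `aligned_seq` at 1-based column `col` (None at a gap)."""
    if aligned_seq[col - 1] == "-":
        return None
    return col - aligned_seq[:col - 1].count("-")


toy_alignment = ["MKT-A", "MRT-A", "MKTGA"]
print(conserved_columns(toy_alignment))
print(column_to_residue("MKT-A", 5))
```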

TEST

To test these predictions, we combined the current literature with the sites predicted from homology studies to infer which amino acids form the binding sites. With the phylogenetic analysis results in hand, we cross-checked the newfound catalytic and binding sites against our docking and cavity results. These three analyses worked as a scientific Ouroboros. Each one needs the others' results to make sense and be confirmed, forming a cycle in which each result feeds into the next.

LEARN

Two ideas were crucial for this work: proteins of the same family tend to be structurally similar, and each protein's conformation determines its function. The 3D structure defines the chemical spaces inside the protein. For example, it can bring hydrophobic amino acids close together, creating a hydrophobic cavity that serves as a pocket for similar ligands. We therefore reasoned that regions conserved across a protein family are likely to define that family's function.

Our phylogenetic analysis revealed very interesting features, mainly for the proteins that were not well characterized. Using bioinformatics alone, we identified catalytic and binding sites for both substrates and cofactors with reasonable confidence. These were in enzymes whose sites had not yet been elucidated in the literature! Granted, we compared CsPT4 with proteins from tomatoes and bacteria. Still, these alignments showed conserved regions at sites characterized as catalytic or binding sites in those proteins. This allows us to infer that the corresponding regions in our proteins, even though they differ slightly, are also catalytic and binding regions. These data are shown in detail on our Molecular Docking page.

Docking

DESIGN

For interaction modeling, our goal was to provide a structural representation of how each enzyme-substrate interaction unfolds. We carefully reviewed both the gene sequences and the enzymes' active interfaces to guide our simulations.

We researched each of the necessary enzymes to determine their catalytic sites, ideal substrates, enzymatic activity and other information relevant to the project.

BUILD

For the docking simulations, we used both the native proteins and their self-cleaving (T2A) variants to compare their performance. Using HDOCK (Yan et al., 2020), we ran 10 simulations for each enzyme-substrate pair in our pathway. We did this both for the native enzymes and for the T2A variants we had previously modeled.

The enzyme-substrate pairs of the CBDA pathway are listed in the following table:

Table 1: Enzyme-substrate pairs of the CBDA pathway.

Enzyme	Substrate(s)
Hexanoyl-CoA Synthetase	Hexanoic Acid
Olivetol Synthase	Hexanoyl-CoA + 3 Malonyl-CoA
Olivetolic Acid Cyclase	Olivetol-CoA
CsPT4	Olivetolic Acid + Geranyl Diphosphate
Cannabidiolic Acid Synthase	Cannabigerolic Acid

Source: 2023, Authors.

We then obtained 3D structures of all ligand molecules involved in the process, in PDB format, to use as inputs for the docking servers. Note that for proteins with more than one substrate, we performed multiple docking. We also docked the dimeric proteins (OS and OAC) both as monomers and as dimers.

TEST

After running all ten simulations for each pair, for both the native and the T2A variants, we selected the best result and visualized it in PyMOL, as shown in the following figures.

Figure 10: Hexanoyl-CoA Synthetase with Hexanoic Acid.

Source: 2023, Authors.

Figure 11: Olivetol Synthase with Hexanoyl-CoA.

Source: 2023, Authors.

Figure 12: Olivetol Synthase with Malonyl-CoA 1.

Source: 2023, Authors.

Figure 13: Olivetol Synthase with Malonyl-CoA 2.

Source: 2023, Authors.

Figure 14: Olivetol Synthase with Malonyl-CoA 3.

Source: 2023, Authors.

Figure 15: Olivetolic Acid Cyclase with Olivetol-CoA.

Source: 2023, Authors.

Figure 16: CsPT4 with Olivetolic Acid.

Source: 2023, Authors.

Figure 17: CsPT4 with Geranyl Diphosphate.

Source: 2023, Authors.

Figure 18: Cannabidiolic Acid Synthase with Cannabigerolic Acid.

Source: 2023, Authors.

After the docking simulations, we analyzed the same enzymes with CavityPlus (Wang et al., 2023). It identified the most likely cavities based on catalytic site parameters and also flagged sites of potential pharmacological relevance. With these pockets selected, we used our earlier phylogenetic analysis to examine the cavities in more depth. We then compared these results with theoretical considerations and the software simulations to determine the most favorable ligand positions within each cavity. Finally, for each enzyme, we assessed the top 10 poses ranked by the docking parameters. We selected the pose that best agreed with all three analyses: CavityPlus, phylogenetics and docking.
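The final selection step can be sketched as a ranking. It favors poses whose contact residues agree with both the CavityPlus pocket and the phylogenetically conserved residues, and uses the docking score only to break ties. This is one possible way to combine the three sources, not necessarily the exact criterion applied. It assumes that lower (more negative) HDOCK scores are better and that contact residues have already been extracted for each pose. The `Pose` class, `select_consensus_pose` and the residue numbers in the example are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Pose:
    rank: int
    score: float  # docking score; assumption: lower (more negative) is better
    contacts: set[int] = field(default_factory=set)  # receptor residues near the ligand


def select_consensus_pose(poses: list[Pose], cavity_residues: set[int],
                          conserved_residues: set[int]) -> Pose:
    """Pick the pose whose contacts best agree with the cavity and conservation
    data, breaking ties by the docking score."""
    def sort_key(pose: Pose) -> tuple[int, float]:
        agreement = (len(pose.contacts & cavity_residues)
                     + len(pose.contacts & conserved_residues))
        return (-agreement, pose.score)
    return min(poses, key=sort_key)


poses = [Pose(1, -250.0, {10, 11, 12}),
         Pose(2, -240.0, {45, 46, 80}),
         Pose(3, -230.0, {45, 80, 99})]
print(select_consensus_pose(poses, cavity_residues={45, 46, 80}, conserved_residues={45, 80}))
```

In this toy example, the best-scoring pose (rank 1) is rejected because none of its contacts fall in the predicted pocket. This reflects the idea that agreement across analyses matters more than the raw docking score.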

LEARN

These analyses and simulations gave us a new understanding of how each enzyme behaves toward its ligand within the metabolic pathway. Beyond this, our results indicate that ligand binding at the catalytic sites is more favorable, which is consistent with the described reactions taking place there. See the Docking section on our Model page for a more in-depth analysis.

Expression modeling

DESIGN

We wanted to represent both the dynamics of gene expression in our biological circuits and the functioning of the metabolic pathway, which are interdependent and involve five enzymes. To do so, we looked for existing models in the scientific literature capable of describing cellular behavior. Mixed-effects (ME) models stand out here. They are a class of statistical models originally introduced to characterize how different individuals in a population respond to anticipated or observed stimuli (Llamosi et al., 2016). To model the output of the metabolic pathway, we used a Boolean logic approach grounded in biomolecular principles, representing the pathway's steps as logic gates, as outlined by Miyamoto et al. (2013).

The Boolean model accounts for the planned addition of essential substrates. It treats substrates already supplied by the mevalonate pathway and other secondary metabolic processes of Saccharomyces cerevisiae as present in excess. These surplus substrates are therefore not treated as determinants of the Boolean logic gates.

BUILD

Following Llamosi et al. (2016), we use a simplified dynamic model of gene expression. In it, translation depends on transcription, which in turn is influenced by the factors governing transcription initiation. Once the cells have been transformed with our plasmids (assembled by Gibson Assembly), we introduce the variable u(t) to represent the activity of transcription factors. In our case, this could relate, for instance, to RNA polymerase kinetics. We also include the constants k_i and g_i, which describe the production and decay rates of mRNA, m(t), and of protein, p(t), in the equations.
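A standard two-stage form consistent with this description, in the spirit of Llamosi et al. (2016), writes the mRNA and protein dynamics with i ∈ {m, p}. The exact parameterization is our assumption. Under a constant input u, setting both derivatives to zero gives the steady-state levels:

```latex
\begin{aligned}
\frac{dm}{dt} &= k_m\,u(t) - g_m\,m(t), \\
\frac{dp}{dt} &= k_p\,m(t) - g_p\,p(t), \\
m^{*} &= \frac{k_m\,u}{g_m}, \qquad p^{*} = \frac{k_p\,m^{*}}{g_p} = \frac{k_p\,k_m\,u}{g_m\,g_p}.
\end{aligned}
```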

With this gene expression profile defined, we applied Boolean logic. Here we account for the planned addition of two essential substrates. The first is galactose, which activates the GAL1 promoter (as explained in the Biological Circuits section). The second is hexanoic acid: although it is also present naturally in the metabolic pathway, we model its addition as a substrate. As before, substrates supplied by yeast metabolism are assumed to be in excess and are not treated as determinants of the logic gates. We considered two logic circuit configurations: a single combined circuit and two separate circuits.

In the first logic circuit, both plasmids are introduced into the same yeast cell. The only substrate added besides galactose is therefore hexanoic acid, referred to here as "Hexanoic acid A". We assume that GPP (geranyl diphosphate), malonyl-CoA and other cofactors, including Mg²⁺ ions, are abundant in the yeast's intracellular environment.

Figure 19: Complete Boolean-based logic circuit for CBDA metabolic pathway.

Source: 2023, Authors (made with VisualParadigm).

Note that the intermediate substrates are generated within the reaction system itself. The added substrates are shown as inputs. Galactose activates the GAL1 promoter driving expression of the enzyme CDSs, and Hexanoic Acid A is the substrate for CsHCS. We also consider an alternative substrate, Hexanoic Acid B, produced by the secondary metabolism of S. cerevisiae, which makes the CsHCS step an "OR" gate. All other logic gates are configured as "YES" gates, which produce their output whenever their corresponding ligand is present.

The second configuration examines each circuit individually. Here we consider two populations: "S. cerevisiae A", transformed with the pRS426-CBDSub plasmid, and "S. cerevisiae B", transformed with pRS425-CBDSyn. For population A, we add the same substrates discussed above, but the goal is to obtain olivetolic acid (OA) after a series of consecutive reactions, as illustrated in Figure 20.

Figure 20: Boolean-based logic circuit for Olivetolic Acid metabolic pathway.

Source: 2023, Authors (made with VisualParadigm).

As in the combined circuit, we include the alternative substrate Hexanoic Acid B from yeast metabolism as a second input to the CsHCS "OR" gate, and the subsequent steps are "YES" gates. In this case, however, our aim is the secretion of olivetolic acid. It is the key substrate for producing CBGA, the step just before CBDA formation.

For cells carrying the pRS425-CBDSyn plasmid, there is one difference. Instead of adding hexanoic acid, we supply the olivetolic acid obtained from the first population, which becomes the key substrate, as illustrated in Figure 21.

Figure 21: Boolean-based logic circuit from Olivetolic Acid to CBDA pathway.

Source: 2023, Authors (made with VisualParadigm).

In this scenario, galactose activates our promoter, enabling enzyme expression, while olivetolic acid is the essential substrate for CsPT4. We therefore introduce an "AND" gate, which requires both inputs to be present.

TEST

We tested the circuits with binary models based on Boolean logic, representing "false" as 0 and "true" as 1 for both inputs and outputs. Figures 22 A and 22 B summarize the binary logic data of our metabolic pathway for the two cases we proposed.

Figure 22 A: Binary inputs and outputs for the combined circuit (case 1).

Source: 2023, Authors.

Figure 22 B: Binary inputs and outputs for the separated circuits (case 2).

Source: 2023, Authors.

For the CsHCS "OR" gate, input "A" is the added hexanoic acid and "B" is the hexanoic acid produced by the cell's own metabolism. For the "AND" gate, input "A" represents promoter activation and the resulting enzyme production, and "B" represents the presence of olivetolic acid. This configuration allowed a probabilistic analysis. Each gate is assigned the fraction of equally likely input combinations that produce an output of 1: 0.5 for each "YES" gate, 0.75 for the "OR" gate and 0.25 for the "AND" gate, as illustrated in Figure 23.

Figure 23 A: Probability for each substrate for the combined circuit (case 1).

Source: 2023, Authors.

Figure 23 B: Probability for each substrate for the separated circuits (case 2).

Source: 2023, Authors.

This analysis illustrates how each gate depends on the preceding one, highlighting the importance of following the natural biological sequence. Since the gate probabilities multiply along the pathway, the probability of product formation decreases as the number of gates increases.
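The gate values used above can be reproduced by counting truth-table rows, assuming every input is independently 0 or 1 with equal probability. The decline with pathway length then follows from multiplying gate probabilities along a chain. In the sketch below, the five-gate chain is a hypothetical example, not the exact layout of Figures 19 to 21.

```python
from itertools import product
from math import prod

# Gate name -> (number of inputs, Boolean function)
GATES = {
    "YES": (1, lambda a: a),
    "OR": (2, lambda a, b: a or b),
    "AND": (2, lambda a, b: a and b),
}


def gate_probability(name: str) -> float:
    """Fraction of equally likely input combinations for which the gate outputs 1."""
    n_inputs, fn = GATES[name]
    rows = list(product((0, 1), repeat=n_inputs))
    return sum(1 for row in rows if fn(*row)) / len(rows)


def chain_probability(gates: list) -> float:
    """Probability that a serial chain of independent gates outputs 1."""
    return prod(gate_probability(g) for g in gates)


for g in GATES:
    print(g, gate_probability(g))  # YES 0.5, OR 0.75, AND 0.25

# Hypothetical chain: one OR gate (CsHCS) followed by four YES gates.
print(chain_probability(["OR", "YES", "YES", "YES", "YES"]))  # 0.046875
```

Each additional "YES" gate halves the chain probability, which is the diminishing trend described above.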

The mixed-effects (ME) model gives us a framework to follow in the experimental analysis planned for next year. By examining various conditions at each sampling point, we can characterize the expression profile across our yeast population and identify potential bottlenecks in the altered cellular metabolism. We can then estimate the constants k and g of each equation by regression analysis. From the measured values, we can build a model of m(t) (mRNA concentration), p(t) (protein concentration) and u(t) (transcription factor concentration). This modeling approach improves our understanding of how the desired product, CBDA, is produced.
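A minimal sketch of the regression step for the mRNA equation, dm/dt = k·u(t) − g·m(t), is shown below. It assumes evenly spaced samples, approximates the derivative by forward differences and solves the two-parameter least-squares problem directly. It fits a single trajectory only; a full mixed-effects model would additionally estimate how k and g vary across cells in the population. The synthetic step input and parameter values are hypothetical. The same function applies to the protein equation by passing p(t) as the trajectory and m(t) in place of u(t).

```python
from __future__ import annotations


def simulate_mrna(k: float, g: float, u: list[float], dt: float, m0: float = 0.0) -> list[float]:
    """Forward-Euler trajectory of dm/dt = k*u(t) - g*m(t), one value per sample of u."""
    m = [m0]
    for u_n in u[:-1]:
        m.append(m[-1] + dt * (k * u_n - g * m[-1]))
    return m


def fit_rates(m: list[float], u: list[float], dt: float) -> tuple[float, float]:
    """Least-squares (k, g) from forward differences: (m[n+1]-m[n])/dt ~ k*u[n] - g*m[n]."""
    y = [(m[n + 1] - m[n]) / dt for n in range(len(m) - 1)]
    x1 = u[:len(y)]                # regressor multiplying k
    x2 = [-v for v in m[:len(y)]]  # regressor multiplying g
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("data do not vary enough to identify both k and g")
    k = (t1 * s22 - s12 * t2) / det  # Cramer's rule on the 2x2 normal equations
    g = (s11 * t2 - s12 * t1) / det
    return k, g


# Hypothetical step input (e.g. induction switched on at t = 1.0) and synthetic data.
dt = 0.1
u = [0.0] * 10 + [1.0] * 90
m = simulate_mrna(k=2.0, g=0.3, u=u, dt=dt)
print(fit_rates(m, u, dt))  # approximately (2.0, 0.3)
```

With real measurements, noise makes the forward-difference estimate less reliable. Smoothing the data or fitting the integrated model directly would then be natural refinements.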

LEARN

These studies gave us a deeper understanding of the dynamics of gene expression and of product formation along the metabolic pathway. This insight will guide measurement and evaluation in the upcoming wet-lab stage.

References

Zirpel, B., Degenhardt, F., Martin, C., Kayser, O., & Stehle, F. (2017). Engineering yeasts as platform organisms for cannabinoid biosynthesis. Journal of Biotechnology, 259, 204–212. https://doi.org/10.1016/j.jbiotec.2017.07.008

Waterhouse, A., Bertoni, M., Bienert, S., Studer, G., Tauriello, G., Gumienny, R., Heer, F. T., de Beer, T. A. P., Rempfer, C., Bordoli, L., Lepore, R., & Schwede, T. (2018). SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Research, 46(W1), W296–W303. https://doi.org/10.1093/nar/gky427

Benkert, P., Biasini, M., & Schwede, T. (2011). Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics, 27(3), 343–350. https://doi.org/10.1093/bioinformatics/btq662

Llamosi, A., Gonzalez-Vargas, A. M., Versari, C., Cinquemani, E., Ferrari-Trecate, G., Hersen, P., & Batt, G. (2016). What population reveals about individual cell identity: single-cell parameter estimation of models of gene expression in yeast. PLoS Computational Biology, 12(2), e1004706.

Miyamoto, T., Razavi, S., DeRose, R., & Inoue, T. (2013). Synthesizing biomolecule-based Boolean logic gates. ACS Synthetic Biology, 2(2), 72–82.


Bienert, S., Waterhouse, A., de Beer, T. A. P., Tauriello, G., Studer, G., Bordoli, L., & Schwede, T. (2017). The SWISS-MODEL Repository: new features and functionality. Nucleic Acids Research, 45, D313–D319.

Guex, N., Peitsch, M. C., & Schwede, T. (2009). Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: A historical perspective. Electrophoresis, 30, S162–S173.

Studer, G., Rempfer, C., Waterhouse, A. M., Gumienny, R., Haas, J., & Schwede, T. (2020). QMEANDisCo: distance constraints applied on model quality estimation. Bioinformatics, 36, 1765–1771.

Bertoni, M., Kiefer, F., Biasini, M., Bordoli, L., & Schwede, T. (2017). Modeling protein quaternary structure of homo- and hetero-oligomers beyond binary interactions by homology. Scientific Reports, 7.

Hollingsworth, S. A., & Karplus, P. A. (2010). A fresh look at the Ramachandran plot and the occurrence of standard structures in proteins. Biomolecular Concepts, 1(3–4), 271–283. https://doi.org/10.1515/BMC.2010.022

Almagro Armenteros, J. J., Salvatore, M., Winther, O., Emanuelsson, O., von Heijne, G., Elofsson, A., & Nielsen, H. Life Science Alliance, 2(5), e201900429. https://doi.org/10.26508/lsa.201900429

Laskowski, R. A., MacArthur, M. W., Moss, D. S., & Thornton, J. M. (1993). PROCHECK: a program to check the stereochemical quality of protein structures. Journal of Applied Crystallography, 26, 283–291.

Jo, S., Kim, T., Iyer, V. G., & Im, W. (2008). CHARMM-GUI: A web-based graphical user interface for CHARMM. Journal of Computational Chemistry, 29, 1859–1865.

Park, S.-J., Kern, N. R., Brown, T., Lee, J., & Im, W. (2023). CHARMM-GUI PDB Manipulator: Various PDB structural modifications for biomolecular modeling and simulation. Journal of Molecular Biology, 167995.

Williams, C. J., Headd, J. J., Moriarty, N. W., Prisant, M. G., Videau, L. L., Deis, L. N., Verma, V., Keedy, D. A., Hintze, B. J., Chen, V. B., Jain, S., Lewis, S. M., Arendall, W. B., III, Snoeyink, J., Adams, P. D., Lovell, S. C., Richardson, J. S., & Richardson, D. C. (2018). MolProbity: More and better reference data for improved all-atom structure validation. Protein Science, 27(1), 293–315. https://doi.org/10.1002/pro.3330

Coumar, M. S. (2021). Molecular Docking for Computer-Aided Drug Design. Academic Press.

Biological circuits designed with SnapGene software (www.snapgene.com).

The PyMOL Molecular Graphics System (Version 1.2r3pre). Schrödinger, LLC.

Gibson, D. G., Young, L., Chuang, R. Y., Venter, J. C., Hutchison, C. A., & Smith, H. O. (2009). Enzymatic assembly of DNA molecules up to several hundred kilobases. Nature Methods, 6(5), 343–345. https://doi.org/10.1038/nmeth.1318

Waterhouse, A. M., Procter, J. B., Martin, D. M., Clamp, M., & Barton, G. J. (2009). Jalview Version 2: a multiple sequence alignment editor and analysis workbench. Bioinformatics, 25(9), 1189–1191. https://doi.org/10.1093/bioinformatics/btp033

Wang, S., Xie, J., Pei, J., & Lai, L. (2023). CavityPlus 2022 update: An integrated platform for comprehensive protein cavity detection and property analyses with user-friendly tools and cavity databases. Journal of Molecular Biology, 435(14), 168141. https://doi.org/10.1016/j.jmb.2023.168141

Yan, Y., Tao, H., He, J., & Huang, S.-Y. (2020). The HDOCK server for integrated protein-protein docking. Nature Protocols. https://doi.org/10.1038/s41596-020-0312-x

Notredame, C., Higgins, D. G., & Heringa, J. (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology, 302(1), 205–217. https://doi.org/10.1006/jmbi.2000.4042