Notebook | USP-EEL-Brazil

This is a two-year project, and its first year was dedicated to the Dry Lab, so this notebook covers the development of our "computational laboratory" so far. We divided the work this way because of the legal difficulties of handling and synthesizing cannabinoids in the laboratory, even without cultivating the plant or extracting them from it.

Week 17/04 to 21/04

We outlined a logical path for the Dry Lab research this year. First, we would research and characterize the metabolic pathway, with a special focus on the enzymes involved, and search the literature for their structural and kinetic parameters. We would then build and validate comparative models of these biomolecules. With the structures ready and the biosynthesis well defined, we would perform docking to evaluate and validate the catalytic sites and to help elucidate the biochemical reaction mechanisms. Finally, molecular dynamics simulations would be run to confirm the proposed mechanism. In parallel, a mathematical model of gene expression would be proposed as the theoretical model to be followed by, and compared with, the Wet Lab.

To begin our research, we first trained ourselves in bioinformatics tools for protein structural modeling, both comparative and ab initio, as described by Verli (2014). We carried out protein modeling, characterization and validation with several resources: scientific repositories such as ScienceDirect, NCBI and UniProt; general tools from DTU Bioinformatic Tools and Services; servers and software for building and visualizing models, such as SWISS-MODEL, AlphaFold and PyMOL (The PyMOL Molecular Graphics System); and, for this first part, servers for evaluating models by parameters such as identity, coverage and Ramachandran statistics (MolProbity, PROCHECK and QMEAN), together with the signal peptide predictors PrediSi and SignalP 6.0.

Week 24/04 to 28/04

We presented the metabolic pathways for the biosynthesis of cannabidiolic acid (CBDA), the direct precursor of cannabidiol (CBD), in a Saccharomyces cerevisiae chassis, based on Luo et al. (2020) and Zirpel et al. (2017).

In addition, we discussed two mechanisms to optimize the transformation and the subsequent synthesis. The first was the use of fusion proteins, a coding-sequence construction technique that allows the synthesis of multifunctional proteins from a single mRNA molecule by inserting linkers, short segments that space one catalytic domain from another so that the protein domains do not interfere structurally with each other (Jadaun et al., 2020). The second was the use of 2A sequences, short and highly efficient sequences that allow the synthesis of multiple individual proteins from a single mRNA molecule. They exploit the inability of the eukaryotic ribosome to form a peptide bond between the glycine at the end of the 2A peptide and the proline that follows it, so that the upstream protein is released ("self-cleavage") while translation continues into the next protein (Mansouri & Berger, 2014). We decided to use the T2A sequence for this second approach.

Week 02/05 to 05/05

We carried out a literature search on the stability parameters of CBD (Jaidee et al., 2022). We then studied the CHARMM-GUI server, specifically its NMR Structure Calculator and Solution Builder modules, with the aim of evaluating the total internal energy of the enzyme structures generated by the modeling tools mentioned above and refining them energetically.

Regarding the chimeric-protein and self-cleaving-peptide strategies, we discussed the possible arrangements of the five enzymes of the metabolic pathway (the first three are involved in the production of olivetolic acid and the last two in the production of cannabidiolic acid). Since the possible gene orders give more than 30 different combinations, we decided to search the literature to define the order most consistent with our proposal. Regarding the addition of T2A, we discussed the possible expression products, paying attention to the residual peptides left at the cleavage points, since these additional residues could affect the biochemical function of the enzymes. Finally, we discussed the presence of both the signal peptide and the glycosylation sites in CBDAS (cannabidiolic acid synthase).
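To give a sense of the size of this search space, the short Python sketch below enumerates gene orders with the standard library. It assumes a single construct carrying all five genes, so every order is a permutation (5! = 120). The optional filter, which keeps the olivetolic-acid genes (HCS, OS, OAC) together and the CBDA genes (PT4, CBDAS) together, is an illustrative assumption based on the grouping used later for the circuits, not a rule taken from the literature.

```python
from itertools import permutations
from math import factorial

ENZYMES = ("HCS", "OS", "OAC", "PT4", "CBDAS")
# Illustrative grouping (assumption): olivetolic-acid genes, then CBDA genes
GROUPS = (("HCS", "OS", "OAC"), ("PT4", "CBDAS"))


def groups_contiguous(order, groups=GROUPS):
    """True if the genes of every group sit next to each other in `order`."""
    for group in groups:
        positions = sorted(order.index(gene) for gene in group)
        if positions[-1] - positions[0] != len(group) - 1:
            return False
    return True


all_orders = list(permutations(ENZYMES))
grouped_orders = [order for order in all_orders if groups_contiguous(order)]

print(len(all_orders), factorial(len(ENZYMES)))  # 120 120
print(len(grouped_orders))                       # 24 = 2! * 3! * 2!
```

Even with the grouping constraint, 24 orders remain, which is still too many to model exhaustively.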

Week 08/05 to 12/05

Before modeling the enzymes in their native forms, we compiled, from existing databases, the enzymatic characterization of the five proteins used in the biosynthesis (Hexanoyl-CoA Synthase - HCS, Olivetol Synthase - OS, Olivetolic Acid Cyclase - OAC, Prenyltransferase 4 - PT4 and Cannabidiolic Acid Synthase - CBDAS). We then obtained the protein sequences, along with the necessary cofactors and some structural and kinetic properties, and began modeling the protein structures.

Week 15/05 to 19/05

We obtained the first structures of the proteins we were working on. After the first round of score evaluation, we discussed two models in particular: PT4 and HCS. PT4, a transmembrane protein, had a "tail" region with low QMEAN scores, while HCS had a low overall QMEAN. We discussed evaluating the structure of the prenyltransferase with the "tail" removed. As the next step, we would evaluate the energy refinement performed by CHARMM-GUI for all the proteins, which implied a new battery of tests to validate the refined models and compare them with the existing ones. For the other proteins, the comparative modeling database already contained crystallographic information with good evaluation parameters. It is worth noting that CBDAS has 7 glycosylation sites and a signal peptide, and that OS and OAC are dimeric proteins whose probable catalytic site is the cavity formed where the two monomers meet.
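A minimal sketch of how a low-scoring terminal "tail" could be located before trimming it. It assumes that per-residue local quality scores (0 to 1) are stored in the B-factor column of the model's PDB file, as in SWISS-MODEL output; the 0.6 threshold and the 5-residue minimum are arbitrary illustrative values, and only fixed-width ATOM records for Cα atoms are read.

```python
def ca_scores(pdb_text):
    """Return (residue number, score) for each C-alpha atom, read from the B-factor column."""
    scores = []
    for line in pdb_text.splitlines():
        # PDB fixed-width columns: atom name 13-16, residue number 23-26, B-factor 61-66
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append((int(line[22:26]), float(line[60:66])))
    return scores


def low_quality_tails(scores, threshold=0.6, min_len=5):
    """Find N- and C-terminal runs whose scores are all below `threshold`.

    Returns {"N": (first, last) or None, "C": (first, last) or None}.
    """
    n = 0
    while n < len(scores) and scores[n][1] < threshold:
        n += 1
    c = 0
    # Stop before reaching residues already counted in the N-terminal run
    while c < len(scores) - n and scores[-1 - c][1] < threshold:
        c += 1
    n_tail = (scores[0][0], scores[n - 1][0]) if n >= min_len else None
    c_tail = (scores[-c][0], scores[-1][0]) if c >= min_len else None
    return {"N": n_tail, "C": c_tail}
```

Typical use would be `low_quality_tails(ca_scores(open("pt4_model.pdb").read()))`, where the file name is hypothetical; the reported range can then be deleted from the sequence and the shortened model re-evaluated.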

Week 22/05 to 26/05

When evaluated, the CHARMM-GUI results showed some errors and, overall, poorer structural parameters than those obtained previously. We therefore decided to continue validating the structures as initially modeled.

We were also introduced to some servers and software to study and download: the docking servers HDOCK (Yan et al., 2020), ClusPro 2.0 (Desta et al., 2020; Vajda et al., 2017; Kozakov et al., 2017; Kozakov et al., 2013) and HADDOCK (Honorato et al., 2021; van Zundert et al., 2016), as well as PyMOL.

Week 02/06 to 09/06

We received training on modifying the proteins in silico, including an explanation of the T2A sequence (EGRGSLLTCGDVEENPG-P) to be added by editing the FASTA sequence of each protein. Because the ribosome skips the bond before the final proline, the upstream protein keeps the EGRGSLLTCGDVEENPG residues and the downstream protein starts with P, giving the following variants:

sequence+EGRGSLLTCGDVEENPG (first position in the construct),

P+sequence+EGRGSLLTCGDVEENPG (internal position) and

P+sequence (last position).

Together with the native sequence, this gives a total of 4 structures for each enzyme. We would then characterize each structure, evaluating the structural and energetic parameters of the models, in order to find the gene order in the mRNA that minimizes the interference of the T2A residues with the catalytic site of the molecules.
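Building these variants is plain string editing, so it can be scripted. The sketch below builds the three T2A variants from a native protein sequence, writes them as FASTA and reports the fraction of added residues (the roughly 5% noted for the modeling step). The function names and the 60-character line width are our own choices, and the sequence in the usage block is a placeholder, not one of the real enzymes.

```python
T2A = "EGRGSLLTCGDVEENPG"  # residues that stay on the upstream protein
PROLINE = "P"              # residue that starts the downstream protein


def t2a_variants(native):
    """Return the native sequence plus the three T2A variants used for modeling."""
    native = native.strip().upper()
    return {
        "native": native,
        "first": native + T2A,               # sequence+T2A
        "internal": PROLINE + native + T2A,  # P+sequence+T2A
        "last": PROLINE + native,            # P+sequence
    }


def to_fasta(name, seq, width=60):
    """Format one record as FASTA, wrapping the sequence every `width` residues."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines)


def added_fraction(native, variant):
    """Fraction of the variant's residues that were not in the native sequence."""
    return (len(variant) - len(native)) / len(variant)


if __name__ == "__main__":
    native = "M" + "A" * 349  # placeholder 350-residue sequence, not a real enzyme
    for label, seq in t2a_variants(native).items():
        print(to_fasta(f"enzyme_{label}", seq))
        print(f"# added: {added_fraction(native, seq):.1%}")
```

For a 350-residue enzyme the 17 or 18 extra residues make up just under 5% of the variant, consistent with the estimate used when judging the server models.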

Week 12/06 to 16/06

First, a new model was obtained for HCS, with better structural validation parameters.

Modeling the T2A variants directly was unsuccessful in most cases: servers such as AlphaFold and SWISS-MODEL returned essentially the same structure as the native protein with the extra residues added, since the identity and query coverage were only slightly affected (the addition corresponded to only around 5% of the total number of amino acids in the enzymes, which still yields a coherent model). However, this gave no way of evaluating the possible interference of the T2A residues with the protein's active site, such as a fold of the extra residues toward it, or even the creation of a new cavity offering a favorable chemical space for the substrates used (generally hydrophobic). Thus, as had been done for CBDAS, PyMOL with the PyMod and MODELLER extensions was used to carry out comparative modeling based on the structural information of the native enzyme.

Weeks 19/06 to 23/06 and 26/06 to 30/06

We received docking training and carried out the first enzyme-substrate dockings with the native proteins.

In addition, the T2A variants were modeled, and the 3 resulting structures of each enzyme were evaluated against the native structure.

Week 10/07 to 14/07

The dockings for OS were carried out for the first substrate, malonyl-CoA (three molecules of which are condensed with one hexanoyl-CoA in the OS reaction), but the second substrate, hexanoyl-CoA, still needed to be added. Meanwhile, the other dockings were reviewed with a view to repeating them, in order to find the structural model that best agrees with the chemical space of the binding site.

In parallel, the study of catalytic sites and the design of biological circuits began.

Week 24/07 to 28/07

The plasmids on which the circuits would be built were defined: initially pRS425, pRS426 and pET426. Research began into expression modeling.

New dockings of the substrates were performed with the T2A-containing enzyme variants.

Week 07/08 to 11/08

For the study of catalytic sites, training was given on phylogenetic alignment, with the aim of finding, among proteins of the same family and similar functions, conserved regions that indicate possible catalytic domains, which are fundamental for the theoretical validation of docking.
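As an illustration of how conserved columns can be flagged in an alignment, the sketch below scores each column by Shannon entropy (0 means a single residue type) and skips gap-rich columns. It assumes the sequences are already aligned to equal length, for example as exported by an alignment tool; the thresholds and the toy alignment are hypothetical, and entropy is only one of several possible conservation measures.

```python
from collections import Counter
from math import log2


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps."""
    residues = [r for r in column if r != gap]
    if not residues:
        return float("inf")  # an all-gap column carries no conservation signal
    n = len(residues)
    return -sum((c / n) * log2(c / n) for c in Counter(residues).values())


def conserved_columns(alignment, max_entropy=0.5, max_gap_fraction=0.2, gap="-"):
    """Return (1-based column, consensus residue) for low-entropy, low-gap columns."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for i in range(length):
        column = "".join(seq[i] for seq in alignment)
        if column.count(gap) / len(column) > max_gap_fraction:
            continue
        if column_entropy(column, gap) <= max_entropy:
            consensus = Counter(r for r in column if r != gap).most_common(1)[0][0]
            hits.append((i + 1, consensus))
    return hits


# Toy alignment of four hypothetical homologs (not real enzyme sequences)
TOY_ALIGNMENT = ["MKT-AHG", "MRTVAHG", "MKSLAHG", "MKTIAYG"]
print(conserved_columns(TOY_ALIGNMENT))  # [(1, 'M'), (5, 'A'), (7, 'G')]
```

The column numbers returned are alignment positions; they still have to be mapped back to residue numbers of the modeled enzyme before comparing them with docking contacts.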

The possibilities of using Boolean models or models based on differential equations for expression modeling were discussed.

Week 14/08 to 18/08

The expected phylogenetic results are the location of the catalytic domain, both in the FASTA sequence and in the 3D structure of each protein.

The circuits were completed: a total of 72 biological circuits were designed across different plasmids (five for yeast: pRS423, pRS424, pRS425, pRS426 and pYES2; two for E. coli: pET-26b and pET-28a) and different arrangements of two gene groups, HCS-OS-OAC and PT4-CBDAS. The grouping follows the pathway: the first group forms olivetolic acid, the precursor substrate that the second group converts to CBDA.

In parallel, a new server, CavityPlus (Wang et al., 2023), was introduced to corroborate the docking and phylogenetic results by analyzing all the cavities in each protein. It characterizes the chemical space of each cavity, its predicted interaction strength and its pharmacological potential (druggability).

Week 21/08 to 24/08

Once the circuits were finalized, the PCR primers were designed.

The first comparison of results between docking and cavity analysis was carried out.

Week 28/08 to 01/09

First, the sequences to be synthesized and the primers were optimized. For each part, we aimed for a melting temperature (Tm) difference of at most 1 °C between primers, with Tm as close to 72 °C as possible, to optimize the amplification.
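A rough way to screen primer pairs against these two criteria is sketched below. It uses the basic GC-content approximation Tm = 64.9 + 41(nGC - 16.4)/N for primers of 14 nt or longer and the Wallace rule for shorter ones. These are coarse estimates that ignore salt and primer concentration, unlike the nearest-neighbor calculations used by dedicated design tools, so the numbers are illustrative only; the primer sequences in the usage block are hypothetical.

```python
def gc_count(primer):
    """Number of G and C bases in the primer."""
    primer = primer.upper()
    return primer.count("G") + primer.count("C")


def tm_estimate(primer):
    """Approximate melting temperature (°C) of a primer."""
    n = len(primer)
    gc = gc_count(primer)
    if n < 14:
        return 2.0 * (n - gc) + 4.0 * gc  # Wallace rule for short oligos
    return 64.9 + 41.0 * (gc - 16.4) / n  # basic GC-content formula


def check_pair(forward, reverse, target=72.0, max_delta=1.0):
    """Report whether a primer pair meets the Tm difference and target criteria."""
    tm_f, tm_r = tm_estimate(forward), tm_estimate(reverse)
    return {
        "tm_forward": round(tm_f, 1),
        "tm_reverse": round(tm_r, 1),
        "delta_ok": abs(tm_f - tm_r) <= max_delta,
        "off_target": round(abs((tm_f + tm_r) / 2 - target), 1),
    }


if __name__ == "__main__":
    # Hypothetical primers, for illustration only
    print(check_pair("ATGCGTCTGGCGGCTCTGCTG", "TTACAGCGCCGCCAGGCTGCC"))
```

In practice a pair failing `delta_ok` would be adjusted by extending or trimming the primer with the higher or lower Tm, one base at a time.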

Week 04/09 to 08/09

Comparison of the dockings with the detected cavities and with the catalytic sites indicated by the phylogenetic analysis.

Week 11/09 to 15/09

Analysis and validation of the dockings. We checked which specific amino acids each ligand interacted with, evaluating the chemical space involved and its compatibility with the ligand (hydrophobicity, for example). We also worked on the texts describing our modeling project.
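Part of this residue check can be automated. The sketch below lists the residues with any atom within a distance cutoff of the docked ligand, computes the hydrophobic fraction of that contact shell with the Kyte-Doolittle scale, and intersects the contacts with conserved residue numbers (for example, from the phylogenetic analysis). It assumes coordinates have already been extracted from the docked complex; the 4 Å cutoff and the toy coordinates are illustrative assumptions.

```python
import math

# Kyte-Doolittle hydropathy index (positive = hydrophobic)
KYTE_DOOLITTLE = {
    "ILE": 4.5, "VAL": 4.2, "LEU": 3.8, "PHE": 2.8, "CYS": 2.5,
    "MET": 1.9, "ALA": 1.8, "GLY": -0.4, "THR": -0.7, "SER": -0.8,
    "TRP": -0.9, "TYR": -1.3, "PRO": -1.6, "HIS": -3.2, "GLU": -3.5,
    "GLN": -3.5, "ASP": -3.5, "ASN": -3.5, "LYS": -3.9, "ARG": -4.5,
}


def contact_residues(protein_atoms, ligand_atoms, cutoff=4.0):
    """Residues with at least one atom within `cutoff` Å of any ligand atom.

    protein_atoms: iterable of (residue name, residue number, (x, y, z))
    ligand_atoms: iterable of (x, y, z)
    """
    ligand_atoms = list(ligand_atoms)
    contacts = set()
    for resname, resnum, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            contacts.add((resname, resnum))
    return contacts


def summarize_pose(contacts, conserved_resnums):
    """Hydrophobicity of the contact shell and overlap with conserved residues."""
    if not contacts:
        return {"n_contacts": 0, "hydrophobic_fraction": 0.0,
                "mean_hydropathy": 0.0, "conserved_hits": []}
    values = [KYTE_DOOLITTLE[name] for name, _ in contacts]
    return {
        "n_contacts": len(contacts),
        "hydrophobic_fraction": sum(v > 0 for v in values) / len(values),
        "mean_hydropathy": sum(values) / len(values),
        "conserved_hits": sorted({num for _, num in contacts} & set(conserved_resnums)),
    }


# Toy example with hypothetical coordinates (one atom per residue for brevity)
TOY_PROTEIN = [("LEU", 10, (0.0, 0.0, 0.0)), ("SER", 11, (10.0, 0.0, 0.0)),
               ("PHE", 12, (1.0, 1.0, 0.0)), ("ASP", 13, (0.0, 3.5, 0.0))]
TOY_LIGAND = [(0.0, 0.0, 1.5)]
example = summarize_pose(contact_residues(TOY_PROTEIN, TOY_LIGAND), conserved_resnums=[12, 13, 40])
print(example)
```

A pose whose contacts are mostly hydrophobic and overlap the conserved residues would be the more plausible one for the generally hydrophobic substrates discussed above.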

Week 18/09 to 21/09

Working on expression modeling, we researched different approaches to modeling transcription and translation, as well as the metabolic reactions, using Boolean logic.

Week 25/09 to 05/10

We finalized our expression modeling. In the literature, we found a statistical approach, the mixed-effects model, as described by Llamosi et al. (2016). With it, we could describe how transcriptional and translational parameters are closely linked.
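To illustrate the mixed-effects idea (population-level parameters shared by all cells, plus cell-specific random deviations), the sketch below simulates a simple two-step model, dm/dt = k_tx - g_m·m and dp/dt = k_tl·m - g_p·p, in which each cell draws its transcription and translation rates from log-normal distributions. All parameter values are hypothetical, Euler integration is used for simplicity, and the sketch only generates data; it does not reproduce the model or the estimation procedure of Llamosi et al. (2016).

```python
import random
import statistics


def simulate_cell(k_tx, k_tl, g_m, g_p, t_end=300.0, dt=0.1):
    """Euler integration of dm/dt = k_tx - g_m*m and dp/dt = k_tl*m - g_p*p.

    Returns the protein level p at t_end (arbitrary units), starting from m = p = 0.
    """
    m = p = 0.0
    for _ in range(int(round(t_end / dt))):
        dm = k_tx - g_m * m
        dp = k_tl * m - g_p * p
        m += dm * dt
        p += dp * dt
    return p


def steady_state(k_tx, k_tl, g_m, g_p):
    """Analytic steady-state protein level: transcription and translation act as a product."""
    return k_tx * k_tl / (g_m * g_p)


def sample_population(n_cells, k_tx_pop=0.5, k_tl_pop=2.0, sigma=0.3,
                      g_m=0.1, g_p=0.02, seed=0):
    """Mixed-effects style sampling: population medians (fixed effects)
    multiplied by per-cell log-normal factors (random effects)."""
    rng = random.Random(seed)
    levels = []
    for _ in range(n_cells):
        k_tx = k_tx_pop * rng.lognormvariate(0.0, sigma)
        k_tl = k_tl_pop * rng.lognormvariate(0.0, sigma)
        levels.append(simulate_cell(k_tx, k_tl, g_m, g_p))
    return levels


if __name__ == "__main__":
    levels = sample_population(200)
    print(f"median protein level: {statistics.median(levels):.1f}")
    print(f"coefficient of variation: {statistics.stdev(levels) / statistics.mean(levels):.2f}")
```

The steady-state expression shows why the two layers cannot be treated separately: a cell's protein level depends on the product of its transcription and translation rates, so variability in either propagates to the final enzyme amount.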

We also applied Boolean logic to the metabolic pathway, showing how our promoter and enzymes can be represented as binary logic gates, as presented in the Modeling section.
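A minimal Boolean sketch of the pathway logic follows. Each enzyme is present only if its promoter is ON, and each product is TRUE only if its enzyme(s) and all substrates are present, so the chain behaves as a series of AND gates. The steps follow the pathway described by Luo et al. (2020): HCS activates hexanoate to hexanoyl-CoA; OS and OAC form olivetolic acid from hexanoyl-CoA and malonyl-CoA; PT4 prenylates olivetolic acid with geranyl pyrophosphate (GPP) to cannabigerolic acid (CBGA); and CBDAS converts CBGA into CBDA. The two-promoter layout, one promoter per gene group, is an illustrative assumption.

```python
from itertools import product


def pathway(promoter_oa, promoter_cbda, hexanoate=True, malonyl_coa=True, gpp=True):
    """Boolean state of each species; every step is an AND gate.

    promoter_oa drives HCS, OS and OAC; promoter_cbda drives PT4 and CBDAS
    (a hypothetical two-promoter layout).
    """
    hcs = os_ = oac = promoter_oa
    pt4 = cbdas = promoter_cbda
    hexanoyl_coa = hcs and hexanoate
    olivetolic_acid = os_ and oac and hexanoyl_coa and malonyl_coa
    cbga = pt4 and olivetolic_acid and gpp
    cbda = cbdas and cbga
    return {"hexanoyl_coa": hexanoyl_coa, "olivetolic_acid": olivetolic_acid,
            "cbga": cbga, "cbda": cbda}


if __name__ == "__main__":
    # Truth table over the two promoters, with all precursors available
    for p_oa, p_cbda in product([False, True], repeat=2):
        state = pathway(p_oa, p_cbda)
        print(int(p_oa), int(p_cbda), "->", int(state["cbda"]))
```

With all precursors available, CBDA is produced only when both promoters are ON, which is the two-input AND gate at the core of the pathway.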

References

Verli, H. (2014). Bioinformática: da biologia à flexibilidade molecular.

Luo, X., Reiter, M. A., d’Espaux, L., Wong, J., Denby, C. M., Lechner, A., ... & Keasling, J. D. (2020). Author Correction: Complete biosynthesis of cannabinoids and their unnatural analogues in yeast. Nature, 580(7802), E2-E2.

Zirpel, B., Degenhardt, F., Martin, C., Kayser, O., & Stehle, F. (2017). Engineering yeasts as platform organisms for cannabinoid biosynthesis. Journal of Biotechnology, 259, 204-212.

Jadaun, J. S., Narnoliya, L. K., Srivastava, A., & Singh, S. P. (2020). Chimeric enzyme designing for the synthesis of multifunctional biocatalysts. In Biomass, Biofuels, Biochemicals (pp. 119-143). Elsevier.

Mansouri, M., & Berger, P. (2014). Strategies for multigene expression in eukaryotic cells. Plasmid, 75, 12-17.

Jaidee, W., Siridechakorn, I., Nessopa, S., Wisuitiprot, V., Chaiwangrach, N., Ingkaninan, K., & Waranuch, N. (2022). Kinetics of CBD, Δ9-THC degradation and cannabinol formation in cannabis resin at various temperature and pH conditions. Cannabis and Cannabinoid Research, 7(4), 537-547.

Llamosi, A., Gonzalez-Vargas, A. M., Versari, C., Cinquemani, E., Ferrari-Trecate, G., Hersen, P., & Batt, G. (2016). What population reveals about individual cell identity: single-cell parameter estimation of models of gene expression in yeast. PLOS Computational Biology, 12(2), e1004706.

Yan, Y., Tao, H., He, J., & Huang, S.-Y. (2020). The HDOCK server for integrated protein-protein docking. Nature Protocols. https://doi.org/10.1038/s41596-020-0312-x

Wang, S., Xie, J., Pei, J., & Lai, L. (2023). CavityPlus 2022 update: An integrated platform for comprehensive protein cavity detection and property analyses with user-friendly tools and cavity databases. Journal of Molecular Biology, 435(14), 168141. https://doi.org/10.1016/j.jmb.2023.168141

Schrödinger, LLC. The PyMOL Molecular Graphics System, Version 2.0.

Desta, I. T., Porter, K. A., Xia, B., Kozakov, D., & Vajda, S. (2020). Performance and its limits in rigid body protein-protein docking. Structure, 28(9), 1071-1081.

Vajda, S., Yueh, C., Beglov, D., Bohnuud, T., Mottarella, S. E., Xia, B., Hall, D. R., & Kozakov, D. (2017). New additions to the ClusPro server motivated by CAPRI. Proteins: Structure, Function, and Bioinformatics, 85(3), 435-444.

Kozakov, D., Hall, D. R., Xia, B., Porter, K. A., Padhorny, D., Yueh, C., Beglov, D., & Vajda, S. (2017). The ClusPro web server for protein-protein docking. Nature Protocols, 12(2), 255-278.

Kozakov, D., Beglov, D., Bohnuud, T., Mottarella, S., Xia, B., Hall, D. R., & Vajda, S. (2013). How good is automated protein docking? Proteins: Structure, Function, and Bioinformatics, 81(12), 2159-2166.

Honorato, R. V., Koukos, P. I., Jimenez-Garcia, B., Tsaregorodtsev, A., Verlato, M., Giachetti, A., Rosato, A., & Bonvin, A. M. J. J. (2021). Structural biology in the clouds: The WeNMR-EOSC ecosystem. Frontiers in Molecular Biosciences, 8, 729513.

van Zundert, G. C. P., Rodrigues, J. P. G. L. M., Trellet, M., Schmitz, C., Kastritis, P. L., Karaca, E., Melquiond, A. S. J., van Dijk, M., de Vries, S. J., & Bonvin, A. M. J. J. (2016). The HADDOCK2.2 web server: User-friendly integrative modeling of biomolecular complexes. Journal of Molecular Biology, 428, 720-725.