Software
Updated on 2023-11-5: we won the Best Software Tool (Undergrad) award.
A live demo of our software RAP is available at http://54.169.242.254:5000/. To run your own instance with Docker, please follow these three links: 1, 2, 3.
# Overview
# Highlights
- Intuitive WebUI with APIs for flexible development
- Compatible with the GenBank format and easy to integrate with SnapGene
- Full DBTL cycle support
- Experimentally validated results
- Up-to-date documentation and tutorial videos
Improving chassis metabolism is crucial for applications in synthetic biology, and most synthetic biology applications involve multiple enzymes working together in a cascade reaction. Flux imbalance frequently occurs in unoptimized cascade reactions and may lead to problems such as an unexpected rate-limiting step, metabolic stress on the cells, and accumulation of cytotoxic intermediates. Fudan iGEM 2022 demonstrated the success of the Ribozyme-Assisted Polycistronic co-expression (pRAP) system and showed that replacing the upstream RBS of individual CDSs improves chassis metabolism. However, the previous team's work has two limitations:
- Lack of quantitative regulation of the pRAP system. The expression of different proteins in pRAP was only changed in relative terms, by qualitatively swapping in ribosome binding sites (RBS) of different strengths.
- A focus on ribozyme cleavage and ribosome binding only, without attention to degradation, which can be a problem for many cleaved mRNAs.
This year, we developed RAP, a software tool for quantitatively designing pRAP systems that covers all possible regulatory elements and fully supports the Design-Build-Test-Learn (DBTL) cycle.
RAP consists of three parts: KineticHub (docker image), RAP Builder (docker image) and PartHub 2 (docker image). The functions of RAP include:
- Parts searching
- Calculation of optimal concentrations of enzymes
- Sequence assembly for a pRAP system
We have made a number of efforts to improve the user experience of RAP. To make RAP user-friendly, we designed an intuitive Web UI with tutorials; for advanced usage, we provide a series of RESTful APIs. To stay compatible with existing sequence manipulation tools, we adopted the GenBank format, an existing standard, to save sequences, so the sequences can be used in external tools such as SnapGene. To get feedback from external users, we collaborated with other iGEM teams (SCUT-China iGEM 2023 and Nanjing-China iGEM 2023) to improve the software. Meanwhile, sequences predicted and generated by RAP have been validated by experiments.
RAP has gone through several DBTL cycles, leading to the concise workflow shown below (Figure 1).
Figure 1: Workflow of RAP
In summary, RAP offers new perspectives and strategies for metabolic flux optimization. The source code is freely available at the repository https://gitlab.igem.org/2023/software-tools/fudan.
# Principle of pRAP
A major problem of polycistronic vectors, which contain two or more target genes under one promoter, is that the downstream genes are expressed at much lower levels than the first gene next to the promoter[1]. In the pRAP system (Figure 2), the RNA sequence of a ribozyme self-cleaves, so the polycistronic mRNA transcript is post-transcriptionally converted into individual monocistrons in vivo[2]. Compared with a multiple-promoter system, self-interaction of the polycistron can be avoided and each cistron can initiate translation with comparable efficiency.
Figure 2: Principle of pRAP system
In the pRAP system, there are two regulatory elements: the RBS and the stem-loop. The RBS influences the translation initiation rate by affecting ribosome binding to the mRNA, which in turn impacts protein concentration. The stem-loop affects protein concentration by regulating the rate of mRNA degradation[3]. By tuning these two regulatory elements, we can control protein expression. For more details on building a pRAP system, please refer to RAP Builder.
Compared to the multi-promoter strategy, the pRAP system has three advantages:
- Minimal changes to the genetic circuit. Since the regulatory elements are not in the CDS, designing a pRAP system only requires adding regulatory elements upstream and downstream of the CDS.
- Reproducible ratios of the synthesized proteins. Polycistronic expression mitigates the effect on protein expression of fluctuating transcription caused by interference between multiple promoters[4].
- Shorter cargo DNA is required to deliver multiple genes in synthetic biology applications.
For pRAP parts we have built, please visit our Part Collection.
# Web UI
The need for user interface improvements came up in several Learn phases of the DBTL cycles during the development of RAP, so we developed a Web UI for RAP. It is compatible with the following browsers (Table 1).
Table 1: Browser support of the Web UI (as of Oct 2023)
IE / Edge | Firefox | Chrome | Safari | Opera |
---|---|---|---|---|
IE9, IE10, IE11, Edge version≥116 | version≥116 | version≥116 | version≥16.6 | version≥101 |
RAP provides detailed online and offline documentation, and the source code of the Web UI is also available. The Web UI is based on Vue.js, using Ant Design Vue as the front-end component library and Axios to access and process the APIs. Neovis.js is used to visualize the network structure in PartHub 2.
# Tutorials
In this tutorial, we demonstrate how to use RAP to construct a pRAP sequence similar to BBa_K4765121 or BBa_K4765122. The required information is listed in Table S1. Alternatively, you can directly access our pre-built demo to generate the sequence.
Table S1: Information for the tutorials of Web UI
Name | Part ID | EC Number | kcat | Sequence |
---|---|---|---|---|
galU | BBa_K3331001 | 2.7.7.9 | 28.63 | sequence on Registry |
pgmA | BBa_K3331002 | 5.4.2.2 | 109.27 | sequence on Registry |
To use RAP, please follow these four steps:
Visit Home Page of RAP
Upon visiting our live demo or the Home page of RAP deployed through Docker, you will see the page below (Figure 3):
Figure 3: Home page of RAP
Step 1: KineticHub
The first step in constructing a pRAP system with RAP is to calculate the optimal enzyme concentrations with KineticHub. This step involves two tasks: searching for the target reactions and their enzyme records, and constructing a cascade reaction. After these two tasks, KineticHub returns the optimal ratio of enzymes based on the data.
First, on the KineticHub > Search Enzyme page, please enter the query and select the type of keyword you want to retrieve. Here are some examples of queries and the corresponding keyword types (Table 2):
Table 2: Examples of queries and corresponding keyword types
Keyword Type | Meaning & Format | Query |
---|---|---|
EC number | EC number of the reaction (e.g. x.y.z.w). It is fine to enter only part of the EC number | 1.1.1.101 |
Name | The name of the reaction | Nadh peroxidase |
Type | Reaction type | Oxidation |
Substrate | One of the possible substrates of the reaction | alcohol |
Product | One of the possible products of the reaction | NADPH |
To search for the target enzyme more accurately, we recommend searching by EC number. You will be presented with the interface illustrated in Figure 4:
Figure 4: Search result of KineticHub for reactions
After clicking Get Kcat, the interface will look like Figure 5:
Figure 5: Search result of KineticHub for kcat
Here you can see the species from which the enzyme is derived and its kinetic parameters in different environments, with references. Check the appropriate record for your project and click on the plus icon. After that, return to the search results page, and you will see that a little red dot has been added to the flask icon in the upper right corner.
You have successfully imported one reaction! Now add the remaining reactions by following the same process.
If you haven't found a proper kcat value, you can add relevant experimental data to KineticHub on the KineticHub > Add Enzyme page. You can also use one of the machine learning based prediction methods available online or the Colab ipynb scripts we provide!
After that, on the KineticHub > Build reactions page (Figure 6), please enter the ratio of product to substrate stoichiometry for your reactions according to EC number and then click the Submit button.
Figure 6: Building reactions in KineticHub
Step 2: RAP Builder
On the RAP Builder page (Figure 7), you can select the design mode (RBS or stem-loop mode) and input the sequence of the enzyme. Then click the Submit button to run RAP Builder! After RAP Builder completes, the results will be downloaded automatically, containing a sequence file in GenBank format and an annotation file in csv format.
Figure 7: Running RAP Builder
Search Part in PartHub 2
To search existing parts documented in the iGEM Registry with our software, navigate to the PartHub 2 page (Figure 8). Then input your search query and select the appropriate keyword type.
Figure 8: PartHub 2
The meaning and examples of the search terms for each keyword type are as follows (Table 3):
Table 3: Meaning and examples of the search terms for each keyword type
Keyword type | Meaning & Format | Example |
---|---|---|
ID | The id of the part (e.g. BBa_xxxxxxxx). It is fine to enter a search term with or without BBa_ | K3790012 |
Name | The name of the part in Registry of Standard Biological Parts | GFP |
Sequence | The sequence of the part (if it exists). The search term can only be the combination of [a, t, g, c, A, T, G, C] | TTAACTTTAAGAAGGAG |
Designer | The name of the person who designed the part | Weiwen Chen |
Team | The name of the team which designed the part | Fudan |
Content | The content of the part in Registry of Standard Biological Parts | carotene |
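The format rules in Table 3 can be illustrated with a short sketch. The function names below are hypothetical and this is not PartHub 2's actual implementation; it only shows how a client might normalize an ID (with or without the BBa_ prefix) and check a sequence query before sending it.

```python
import re

def normalize_part_id(term: str) -> str:
    """Return the part ID with the BBa_ prefix, whether or not the user typed it."""
    term = term.strip()
    return term if term.startswith("BBa_") else "BBa_" + term

def is_valid_sequence_query(term: str) -> bool:
    """A sequence query may only contain the letters a, t, g, c (either case)."""
    return re.fullmatch(r"[atgcATGC]+", term) is not None

# Examples taken from Table 3
part_id = normalize_part_id("K3790012")
sequence_ok = is_valid_sequence_query("TTAACTTTAAGAAGGAG")
```

Normalizing on the client side keeps both spellings of an ID equivalent, which matches the behaviour described for the ID keyword type.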
You have the option to reuse the search results and access the sequence you are interested in (Figure 9).
Figure 9: Search results of PartHub 2
By clicking on a search result, you will be presented with the network structure of this part. You can scroll to zoom the canvas and drag to move the nodes (Figure 10). Furthermore, clicking on a node displays detailed information about this part, while double-clicking leads you to the dedicated Part page. This year, as an improvement over 2022, users can also directly download the sequence in GenBank format.
Figure 10: Parts network of PartHub 2
# KineticHub
To optimize intracellular metabolism, the user needs to determine the optimal enzyme concentrations. For this purpose, we designed a model for calculating the optimal values, based on the assumptions below. We provide a summary of the notation in Table 4.
Table 4: A summary of the notation for KineticHub, with the corresponding descriptions
Variable | Description |
---|---|
flux of | |
concentration of the substrate of | |
velocity value of | |
ratio of product to substrate stoichiometry of | |
optimal concentration of the enzyme of |
# Assumption 1
We expect the optimal state of intracellular metabolism to be flux-balanced, so we should take steady-state constraint into consideration, which can be expressed as:
This assumption prevents metabolic stress caused by flux imbalance and keeps the cell in a state of flux balance.
# Assumption 2
The substrate should not accumulate excessively and should remain close to stationary, which we denote as follows:
From the Michaelis-Menten equation we obtain:
and
Combining these equations and Assumption 1 we can conclude:
Therefore, we can calculate the optimal enzyme concentration based on the
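As a minimal sketch of how such a calculation can look, assume a linear cascade in which every enzyme is saturated, so that the velocity of reaction i is kcat_i times the enzyme concentration E_i. This saturation assumption is ours for illustration and is not necessarily the one KineticHub makes. Flux balance then requires that the product formed by reaction i equals the substrate consumed by reaction i+1: n_i * kcat_i * E_i = kcat_(i+1) * E_(i+1), where n_i is the ratio of product to substrate stoichiometry. The function name and the kcat values below are hypothetical.

```python
def relative_enzyme_levels(kcats, stoich_ratios):
    """Relative enzyme concentrations for a linear cascade under flux balance.

    kcats[i]         -- turnover number of the enzyme of reaction i
    stoich_ratios[i] -- product-to-substrate stoichiometric ratio of reaction i
    The first enzyme is used as the reference (level 1.0).
    Assumes saturated enzymes, i.e. v_i = kcat_i * E_i (illustrative assumption).
    """
    levels = [1.0]
    for i in range(1, len(kcats)):
        # Product output of the upstream reaction must match downstream capacity.
        upstream_output = stoich_ratios[i - 1] * kcats[i - 1] * levels[i - 1]
        levels.append(upstream_output / kcats[i])
    return levels

# Hypothetical two-step cascade: the second enzyme is four times slower,
# so four times more of it is needed to keep the flux balanced.
levels = relative_enzyme_levels(kcats=[100.0, 25.0], stoich_ratios=[1.0, 1.0])
```

Only ratios are meaningful in this sketch; absolute concentrations would require additional information such as the desired flux.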
# Implementation
We downloaded the openly available version of BRENDA (licensed under the Creative Commons Attribution License 4.0) as the data source for KineticHub. BRENDA enables KineticHub to include more than 70,000 records and more than 7,000 EC numbers[5]. However, BRENDA provides its enzyme database as a text file, which we need to parse and convert to our database format. We therefore used and contributed to a Python package, BRENDApyrser[6] (Figure 11), to parse and manipulate the BRENDA database, and loaded the BRENDA data into MySQL.
Figure 11: Our contribution to BRENDApyrser
The Unified Modeling Language (UML) diagram of KineticHub is illustrated in Figure 12.
Figure 12: UML diagram of KineticHub
For KineticHub, we created a set of RESTful APIs using Flask, which users can access either through our Web UI or by making direct API requests. An example notebook is available as an ipynb file. To facilitate the use of KineticHub, we have also created a corresponding docker image.
# RAP Builder
After the optimal enzyme concentrations are calculated with KineticHub, the corresponding regulatory elements need to be designed with RAP Builder to form the sequence of a pRAP system. We developed a thermodynamic model and employed a Monte Carlo algorithm to design the regulatory elements. The designed elements are fully synthetic, so there is no need to worry about potential patent barriers. There are two modes in RAP Builder: RBS design mode and stem-loop design mode. We provide a summary of the notation in Table 5.
Table 5: A summary of the notation for RAP Builder, with the corresponding descriptions
Variable | Description |
---|---|
gene concentration of | |
mRNA concentration of | |
protein concentration of | |
translation initiation rate of | |
transcription initiation rate | |
mRNA degradation rate of | |
growth rate of the cell | |
optimal concentration of the enzyme of | |
total free energy change of | |
free energy change when 16S rRNA cofolds and hybridizes with the mRNA subsequence at the 16S rRNA-binding site of | |
free energy change when anticodon hybridizes to the start codon of | |
energetic penalty for a nonoptimal distance between the 16S rRNA-binding site and the start codon of | |
free energy change when the standby site is folded of | |
the free energy change when mRNA is folded of | |
the free energy change when stem-loop is folded of | |
gas constant | |
culture temperature |
Protein concentration is affected by both the translation initiation rate and the mRNA degradation rate. We adopted a cell-scale dynamic model (Figure 13) of transcription and translation[7]. RAP Builder is based on the following assumptions.
Figure 13: Model of transcription and translation
# Assumption 1
The dynamics of transcription and translation can be expressed as follows:
If we consider steady-state constraint, both
Combining these two equations we can represent the relationship between protein concentration and transcription as well as translation below:
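One common way to write such a model is sketched below. The symbols are assumptions chosen to match the quantities listed in Table 5: g_i, m_i and p_i for the gene, mRNA and protein concentrations, alpha_m and alpha_{p,i} for the transcription and translation initiation rates, delta_i for the mRNA degradation rate, and lambda for the growth rate. The sketch neglects dilution of mRNA by growth, and the exact form used by RAP Builder may differ.

```latex
\begin{aligned}
\frac{\mathrm{d} m_i}{\mathrm{d} t} &= \alpha_m\, g_i - \delta_i\, m_i, \\
\frac{\mathrm{d} p_i}{\mathrm{d} t} &= \alpha_{p,i}\, m_i - \lambda\, p_i, \\
% steady state: both derivatives are zero
m_i = \frac{\alpha_m\, g_i}{\delta_i}
\quad &\Longrightarrow \quad
p_i = \frac{\alpha_{p,i}\, m_i}{\lambda} = \frac{\alpha_m\, \alpha_{p,i}\, g_i}{\delta_i\, \lambda}.
\end{aligned}
```

In this form the steady-state protein concentration grows with the translation initiation rate and shrinks with the mRNA degradation rate, which is the dependence that the two design modes exploit.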
# Assumption 2
Since the enzymes under pRAP system are all expressed under the same promoter, the gene concentration and transcription initiation rate are the same between reactions:
Thus the concentration of protein is determined by
For RBS design mode:
Based on Assumption 1, we notice:
For stem-loop design mode:
Based on Assumption 1, we notice:
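The two design modes can be illustrated with a short sketch. It assumes a steady-state relation in which protein concentration is proportional to the translation initiation rate divided by the mRNA degradation rate, with gene concentration, transcription initiation rate and growth rate shared by all genes. This relation and the function names are illustrative assumptions, not RAP Builder's actual code.

```python
def relative_translation_rates(target_protein_ratios):
    """RBS design mode: with equal mRNA degradation rates, protein level is
    proportional to the translation initiation rate, so the required rates
    follow the target protein ratios directly (first gene = reference)."""
    reference = target_protein_ratios[0]
    return [ratio / reference for ratio in target_protein_ratios]

def relative_degradation_rates(target_protein_ratios):
    """Stem-loop design mode: with equal translation initiation rates, protein
    level is inversely proportional to the mRNA degradation rate, so the
    required degradation rates are the inverse of the target ratios."""
    reference = target_protein_ratios[0]
    return [reference / ratio for ratio in target_protein_ratios]

# Hypothetical target: the second enzyme should be four times as abundant.
rbs_mode = relative_translation_rates([1.0, 4.0])
stem_loop_mode = relative_degradation_rates([1.0, 4.0])
```

In RBS mode the second gene needs a four times stronger RBS; in stem-loop mode its mRNA needs to be degraded four times more slowly.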
# Assumption 3
Let's start with the RBS design mode. We employ a Gibbs free energy based thermodynamic model to predict the RBS strength[8]. If we decompose translation into individual processes, we can calculate
And the translation initiation rate could be presented by Boltzmann weight (
where
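For reference, the thermodynamic model of Salis et al.[8] is commonly written as below. The symbol names are assumptions chosen to match the free energy terms listed in Table 5, and r denotes the translation initiation rate. The original work uses an empirically fitted apparent Boltzmann factor; writing it as 1/(RT) here is an assumption based on the gas constant and culture temperature appearing in the notation table.

```latex
\begin{aligned}
\Delta G_{tot} &= \Delta G_{mRNA:rRNA} + \Delta G_{start} + \Delta G_{spacing}
                 - \Delta G_{standby} - \Delta G_{mRNA}, \\
r &\propto \exp\!\left(-\frac{\Delta G_{tot}}{RT}\right), \\
% ratio between two RBS designs 1 and 2
\frac{r_1}{r_2} &= \exp\!\left(-\frac{\Delta G_{tot,1} - \Delta G_{tot,2}}{RT}\right).
\end{aligned}
```

The last line shows why a target ratio of translation initiation rates translates into a target difference in total free energy, which is the quantity a sequence search can optimize.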
# Assumption 4
Then comes the stem-loop design mode, similar to the RBS design mode, where we develop a model based on the Gibbs free energy to predict the stem-loop intensity.
We investigate and validate the correlation between
Support Vector Regression (SVR) learns a non-linear function through the utilization of the kernel trick. In other words, it learns a linear function within the space created by the specific kernel, which corresponds to a non-linear function in the original space.
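The idea that a kernel SVR is linear in the kernel space can be made concrete with a small sketch of how a fitted model evaluates a prediction: a weighted sum of kernel values between the input and the support vectors, plus an intercept. The support vectors, coefficients, intercept and gamma below are hypothetical numbers, not values fitted to our data, and the RBF kernel is one possible kernel choice.

```python
import math

def rbf_kernel(x, z, gamma):
    """Radial basis function kernel between two scalar inputs."""
    return math.exp(-gamma * (x - z) ** 2)

def svr_predict(x, support_vectors, dual_coefs, intercept, gamma):
    """Evaluate f(x) = sum_i coef_i * K(sv_i, x) + b.

    The function is linear in the kernel values K(sv_i, x) but non-linear in x.
    """
    weighted = (c * rbf_kernel(sv, x, gamma)
                for sv, c in zip(support_vectors, dual_coefs))
    return sum(weighted) + intercept

# Hypothetical fitted model: inputs are stem-loop free energies (kcal/mol).
support_vectors = [-12.0, -8.0, -4.0]
dual_coefs = [0.5, -0.2, -0.3]
intercept = 1.0
gamma = 0.1

prediction = svr_predict(-8.0, support_vectors, dual_coefs, intercept, gamma)
```

Far away from all support vectors the kernel values vanish and the prediction falls back to the intercept, which is one reason predictions should be restricted to the free energy range covered by the training data.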
We designed our experiments by placing GFP (stayGold) in the first position of the pRAP system and RFP (mScarlet) in the second and last position. The stem-loop after the 3' end of GFP was then varied: the plasmids were constructed, bacteria were transformed, individual clones were cultured, and the fluorescence intensity ratio of GFP to RFP was measured to reflect the mRNA degradation corresponding to different stem-loops. For more information about our experimental validation, please visit our Part BBa_K4765129. The experimental results are shown in Figure 14.
Figure 14: The relationship between the free energy of the stem-loop and GFP/OD and RFP/OD.
The dataset we used is publicly accessible. The prediction results are shown in Figure 15.
Figure 15: The relationship between the free energy of the stem-loop and the mRNA degradation rate.
In conclusion, our experiment validates the accuracy of RAP, and demonstrates that mRNA degradation regulated by the 3' stem-loop can be accurately predicted using SVR.
# Implementation
RAP Builder uses the open-source ViennaRNA Package 2 for the necessary RNA secondary structure and free energy calculations[10]. To generate synthetic RBSs, we follow the Monte Carlo algorithm (Figure 16) from the open-source version of the RBS Calculator[11].
Figure 16: Monte Carlo algorithm for generating synthetic RBS
We built RAP Builder with Python 3.10. RAP Builder's Monte Carlo algorithm searches efficiently and can find a suitable sequence within 1,000 iterations (Figure 17).
Figure 17: The efficiency of RBS searching by the Monte Carlo algorithm. ddG is the difference between the target dG and the current dG.
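The search loop can be sketched as a Metropolis-style Monte Carlo optimization that mutates one base at a time and accepts moves that reduce ddG, or occasionally moves that increase it. This is a simplified illustration, not RAP Builder's code: `toy_dg` is a hypothetical stand-in for the real free energy calculation (which RAP Builder obtains from ViennaRNA), and the parameter values are arbitrary.

```python
import math
import random

BASES = "ACGU"

def toy_dg(seq):
    """Hypothetical energy function: G/C count twice as much as A/U.
    A real design would call an RNA folding library here instead."""
    return -sum(2.0 if base in "GC" else 1.0 for base in seq)

def monte_carlo_design(target_dg, length=12, iterations=1000, temperature=1.0, seed=0):
    """Search for a sequence whose toy_dg is as close as possible to target_dg."""
    rng = random.Random(seed)
    seq = [rng.choice(BASES) for _ in range(length)]
    ddg = abs(toy_dg(seq) - target_dg)
    best_seq, best_ddg = seq[:], ddg
    for _ in range(iterations):
        if best_ddg == 0.0:
            break  # target reached exactly
        candidate = seq[:]
        candidate[rng.randrange(length)] = rng.choice(BASES)  # point mutation
        cand_ddg = abs(toy_dg(candidate) - target_dg)
        # Metropolis criterion: always accept improvements, sometimes accept worse moves.
        if cand_ddg <= ddg or rng.random() < math.exp(-(cand_ddg - ddg) / temperature):
            seq, ddg = candidate, cand_ddg
            if ddg < best_ddg:
                best_seq, best_ddg = seq[:], ddg
    return "".join(best_seq), best_ddg

designed_seq, final_ddg = monte_carlo_design(target_dg=-18.0)
```

Accepting some worse moves lets the search escape local minima, while tracking the best sequence seen so far guarantees that the returned ddG never exceeds that of the starting sequence.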
The Monte Carlo algorithm of the stem-loop design mode is similar to that of the RBS design mode. We use scikit-learn to implement support vector regression; see our notebook. RAP Builder is also packaged as a Python package and supports RESTful APIs developed with Flask. For more details, please try the example notebook for generating synthetic stem-loops. Similar to KineticHub, we provide a docker image for easier installation. RAP Builder uses Biopython to generate GenBank-formatted sequence files, together with an annotation file that describes each sequence in the project; the sequence files can be opened directly in SnapGene (Figure 18).
Figure 18: Integration with SnapGene
# PartHub 2
Our PartHub was well received in Paris last year, so this year we gathered feedback from PartHub users (2022 iGEMers and some judges at the 2022 Grand Jamboree), resolved known issues, and launched the improved PartHub 2! Compared to the first release, PartHub 2 has several new features:
- Updated with the 2022 iGEM Registry of Standard Biological Parts. PartHub 2 contains over 60,000 Parts.
- Improved user interface that further simplifies usage. PartHub 2 incorporates feedback and suggestions from PartHub users, removes unnecessary options and begins to develop recommendation algorithms.
- Diversified recommendation based on graph algorithms. We developed a diversified recommendation algorithm based on PageRank and the Louvain method to prioritize more significant parts in the search results and avoid the recurrence of similar parts.
The core of PartHub 2 is a graph model based on the Registry of Standard Biological Parts. For each Part, in addition to the attributes it carries (such as name and sequence), there are two types of relationships with other nodes: "twins" and "refers to". The "twins" relationships are undirected, while the "refers to" relationships are directed; each relationship type forms a graph that can be represented by an adjacency matrix. PageRank and the Louvain method operate on this graph model. We use PageRank to reflect how actively a Part is used, and the Louvain method to detect communities of similar Parts.
We provide a summary of the notation in Table 6.
Table 6: A summary of the notation for PartHub 2, with the corresponding descriptions
Variable | Description |
---|---|
damping factor which can be set between 0 (inclusive) and 1 (exclusive) | |
modularity of a graph | |
the sum of the weights of the edges attached to | |
sum of all of the edge weights in the graph | |
communities of the nodes |
# PageRank
The PageRank algorithm[12] evaluates the relevance of each node in the network based on the number of incoming relationships and the importance of the associated source nodes. In general, the underlying notion is that a node is only as important as the nodes that link to it. We assume that a node
where
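In the formulation of Brin and Page[12], the score of a node A is PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1 to Tn are the nodes that point to A, C(T) is the number of outgoing relationships of T, and d is the damping factor. The sketch below iterates this update on a tiny directed graph. The part names are hypothetical, and a production implementation such as the one in Neo4j GDS may differ in details such as convergence checks and the handling of nodes without outgoing relationships.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over all T linking to A.

    out_links maps each node to the list of nodes it refers to.
    Nodes without outgoing links simply pass on no score in this sketch.
    """
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    rank = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        new_rank = {node: 1.0 - damping for node in nodes}
        for source, targets in out_links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical "refers to" graph: two parts refer to part_c.
ranks = pagerank({"part_a": ["part_c"], "part_b": ["part_c"], "part_c": []})
```

In this example part_c is referred to by both other parts and therefore ends up with the highest score, reflecting how actively it is used.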
# Louvain Method
The Louvain method[13] is a community detection method suitable for huge networks. It maximizes a modularity score for each community, where modularity measures the quality of node assignment. This entails determining how much more densely connected nodes within a community are in comparison to how connected they would be in a random network.
In the Louvain method, modularity
where
The Louvain algorithm alternates between two phases to greedily maximize modularity. Phase 1 puts each node of the graph into its own community and then optimizes modularity by allowing only local changes to node-community memberships. Phase 2 aggregates the communities into super-nodes to build a new network, on which Phase 1 is run again. The two phases are repeated until no move yields a gain in modularity.
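The quantity being maximized can be illustrated with a short sketch that computes the modularity of a given partition for an unweighted, undirected graph, using the standard definition from Blondel et al.[13]: for each community, the fraction of edges inside it minus the squared fraction of edge ends attached to it. The function name and the example graph are hypothetical, and the sketch only evaluates a partition; it does not perform the two Louvain phases.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an unweighted, undirected graph.

    edges     -- list of (u, v) pairs, each undirected edge listed once
    community -- dict mapping every node to its community label
    Q = sum over communities of [ internal_edges / m - (degree_sum / (2 m))^2 ]
    """
    m = len(edges)
    internal_edges = {}
    degree_sum = {}
    for u, v in edges:
        for node in (u, v):
            label = community[node]
            degree_sum[label] = degree_sum.get(label, 0) + 1
        if community[u] == community[v]:
            label = community[u]
            internal_edges[label] = internal_edges.get(label, 0) + 1
    return sum(
        internal_edges.get(label, 0) / m - (total / (2 * m)) ** 2
        for label, total in degree_sum.items()
    )

# Two triangles joined by a single edge, split into their natural communities.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, community)
```

Putting all six nodes into one community gives a modularity of zero, so the split into two triangles is the better partition, which is the kind of comparison Phase 1 performs for each local move.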
# Implementation
We used a web crawler to obtain the Parts in the 2022 iGEM Registry of Standard Biological Parts. We used Neo4j as the graph database for PartHub 2, and Flask to develop its RESTful APIs. PageRank and the Louvain method were run with the Neo4j Graph Data Science (GDS) library. PartHub 2 also has a docker image available for installation. Like RAP Builder, PartHub 2 supports exporting sequence files in GenBank format via Biopython. To achieve diversified recommendation, we prioritize displaying Parts with a higher PageRank while ensuring that they do not belong to the same Louvain community.
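One simple way to realize such a ranking is sketched below: sort the search hits by PageRank, show the best hit of each Louvain community first, and append the remaining hits afterwards. The record layout, part names and scores are hypothetical, and PartHub 2's actual ordering rule may differ.

```python
def diversified_ranking(parts):
    """Order search hits so that each community's top-ranked part comes first.

    parts -- list of dicts with keys "id", "pagerank" and "community"
    Returns the part IDs: first the highest-PageRank part of every community
    (in descending PageRank), then all remaining parts in descending PageRank.
    """
    ordered = sorted(parts, key=lambda part: part["pagerank"], reverse=True)
    seen_communities = set()
    first_of_community, remaining = [], []
    for part in ordered:
        if part["community"] in seen_communities:
            remaining.append(part["id"])
        else:
            seen_communities.add(part["community"])
            first_of_community.append(part["id"])
    return first_of_community + remaining

# Hypothetical search hits
hits = [
    {"id": "part_a", "pagerank": 3.0, "community": 1},
    {"id": "part_b", "pagerank": 2.5, "community": 1},
    {"id": "part_c", "pagerank": 2.0, "community": 2},
    {"id": "part_d", "pagerank": 1.0, "community": 3},
]
ranking = diversified_ranking(hits)
```

Here part_b has the second-highest PageRank but shares a community with part_a, so it is moved behind the top parts of the other communities.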
# DBTL cycles of RAP
RAP is related to the DBTL cycle on two levels: RAP can aid users throughout the DBTL cycle, and the development of RAP itself follows the DBTL cycle.
RAP can provide support throughout the DBTL cycle for users. During the Design phase of the DBTL cycle, the user can utilize PartHub 2 to create a Part network in order to find the proper Part and its sequence. Then users can obtain the
Meanwhile, RAP's development has itself gone through DBTL cycles. RAP strictly follows Conventional Commits and Semantic Versioning 2.0.0. The changelog captures the changes made in each commit during the development of RAP, and the tags released on GitLab show how RAP improved over the DBTL cycles. It is worth noting that RAP received feedback from both internal (Fudan iGEM 2023) and external (SCUT-China iGEM 2023 and Nanjing-China iGEM 2023) users during the DBTL cycles. For more information on the DBTL cycles of RAP, please visit our Engineering page.
# References
1. Kim, K.-J., Kim, H.-E., Lee, K.-H., Han, W., Yi, M.-J., Jeong, J., & Oh, B.-H. (2004). Two-promoter vector is highly efficient for overproduction of protein complexes. Protein Science: A Publication of the Protein Society, 13(6), 1698–1703. https://doi.org/10.1110/ps.04644504
2. Liu, Y., Wu, Z., Wu, D., Gao, N., & Lin, J. (2023). Reconstitution of Multi-Protein Complexes through Ribozyme-Assisted Polycistronic Co-Expression. ACS Synthetic Biology, 12(1), 136–143. https://doi.org/10.1021/acssynbio.2c00416
3. Newbury, S. F., Smith, N. H., & Higgins, C. F. (1987). Differential mRNA stability controls relative gene expression within a polycistronic operon. Cell, 51(6), 1131–1143. https://doi.org/10.1016/0092-8674(87)90599-x
4. Sokolenko, S., George, S., Wagner, A., Tuladhar, A., Andrich, J. M. S., & Aucoin, M. G. (2012). Co-expression vs. co-infection using baculovirus expression vectors in insect cell culture: Benefits and drawbacks. Biotechnology Advances, 30(3), 766–781. https://doi.org/10.1016/j.biotechadv.2012.01.009
5. Chang, A., Jeske, L., Ulbrich, S., Hofmann, J., Koblitz, J., Schomburg, I., Neumann-Schaal, M., Jahn, D., & Schomburg, D. (2021). BRENDA, the ELIXIR core data resource in 2021: New developments and updates. Nucleic Acids Research, 49(D1), D498–D508. https://doi.org/10.1093/nar/gkaa1025
6. Estévez, S. R. (2022). Robaina/BRENDApyrser: Zenodo [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.7026555
7. Balakrishnan, R., Mori, M., Segota, I., Zhang, Z., Aebersold, R., Ludwig, C., & Hwa, T. (2022). Principles of gene regulation quantitatively connect DNA to RNA and proteins in bacteria. Science (New York, N.Y.), 378(6624), eabk2066. https://doi.org/10.1126/science.abk2066
8. Salis, H. M., Mirsky, E. A., & Voigt, C. A. (2009). Automated design of synthetic ribosome binding sites to control protein expression. Nature Biotechnology, 27(10), 946–950. https://doi.org/10.1038/nbt.1568
9. Drucker, H., Burges, C. J. C., Kaufman, L., Smola, A., & Vapnik, V. (1996). Support vector regression machines. Proceedings of the 9th International Conference on Neural Information Processing Systems, 155–161.
10. Lorenz, R., Bernhart, S. H., Höner zu Siederdissen, C., Tafer, H., Flamm, C., Stadler, P. F., & Hofacker, I. L. (2011). ViennaRNA Package 2.0. Algorithms for Molecular Biology, 6(1), 26. https://doi.org/10.1186/1748-7188-6-26
11. Salis, H. (2023). Hsalis/Ribosome-Binding-Site-Calculator-v1.0 [Python]. https://github.com/hsalis/Ribosome-Binding-Site-Calculator-v1.0 (Original work published 2009)
12. Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1), 107–117. https://doi.org/10.1016/S0169-7552(98)00110-X
13. Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008. https://doi.org/10.1088/1742-5468/2008/10/P10008