"If you can't explain it simply, you don't understand it well enough." - Albert Einstein

Introduction

Plasmid copy number (PCN) refers to the number of plasmid copies that can be found in a single cell (figure 1). In synthetic biology, plasmids are commonly used to transfer and express genes of interest in various organisms. PCN can be correlated with gene expression, but the relationship is complex and depends on various factors. For example, overexpressing a gene can lead to higher levels of protein production, but too much expression can be toxic. Therefore, controlling PCN is important for designing and optimizing synthetic biology systems to achieve the desired outcomes. For instance, if a synthetic biology system involves the expression of a toxic protein, it may be necessary to tightly control the PCN to prevent overexpression and toxicity. On the other hand, if the system requires high levels of expression of a particular gene, it may be beneficial to increase the PCN to maximize the number of copies of the gene.
Plasmid copy number is relevant to many biotechnological objectives, such as metabolic pathway engineering, DNA vaccines, efficient alternative protein production for food tech, biosensors, and more.

Figure 1: Illustration of bacterial DNA - plasmids and genome.

What is Ctrl C?

Everyone knows Ctrl C is the keyboard shortcut for copy. Well, for us, it’s also about controlling the copy number of plasmids.


Our Journey - how did we choose our project?

After our team formed, we spent a few weeks searching for an important topic for our project. To this end, each pair of students presented iGEM projects from past years' competitions.
As we studied previous iGEM projects, we realized that to develop a useful iGEM project, we would have to focus on a fundamental biological issue and solve it using our knowledge of molecular biology, combined with bioinformatic tools and an engineering background. Throughout this period, we also took lessons and developed expertise in synthetic biology, taught by our instructor and PI and by other accomplished researchers from Tel Aviv University.
The next step was to come up with project ideas independently. Still working in pairs, we proposed a wide range of projects, such as enhancing CRISPR specificity using RNA toeholds, suppressing bacterial communication with aptamers, addressing lab evolution challenges with near-IR proteins, and even using pain to stimulate sensory receptors for malaria detection. However, as our discussions progressed, we understood that we would rather have a broad effect on the laboratory and research community than solve one specific problem. Therefore, we chose the idea proposed by Eden and Tzlil: using machine learning tools to predict and engineer plasmid copy numbers, with the goal of making a significant impact on the foundational advancement of synthetic biology technologies.

Figure 2: iGEM TAU team members lecturing in a synthetic biology class.

Motivation

Our iGEM project was inspired by the need to address plasmid incompatibility within bacterial cells. As a first step, we examined the factors that contribute to plasmid incompatibility and realized that plasmid copy number plays a crucial role. However, no tool was available to predict the copy number of a given plasmid in a host. Therefore, we focused on engineering plasmid copy numbers using computational models. We were inspired by the Vilnius – Lithuania (Undergrad) 2017 SynORI iGEM team. Our project aims to expand on their idea by creating a tool (including software) to predict and engineer plasmid copy numbers, with a wide range of applications in biotechnology and synthetic biology.

Current synthetic plasmids often offer a limited choice of copy numbers, which can constrain their potential applications. Our tool aims to widen the range of copy numbers available for synthetic plasmids, using computational modeling and machine learning to describe the interactions at the origin of replication (ORI) in different organisms.

Importance of controlling the copy number

Tuning plasmid copy number can be used for different applications, including:

Optimization of protein yield.

Metabolic engineering.

DNA vaccine generation.

Synthetic analog computation in living cells.

Alternative proteins for food tech.

Biosensors.

Our goal is to develop computational models that can simulate multi-plasmid systems, where each plasmid in use has a different copy number. By incorporating the different copy numbers of each plasmid, our model will enable us to investigate how the gene expression of each plasmid is affected and how this impacts the overall behavior of the host. The project has the potential to enhance the optimization of plasmid-related work in both academic and industrial settings.

Figure 3: Illustration of protein expression levels as a function of plasmid copy number.

The figure above (figure 3) demonstrates the importance of controlling the plasmid copy number. A very high plasmid copy number doesn’t necessarily mean the highest protein expression levels. Other factors, such as metabolic strain and host-related elements, also influence protein expression, leading to decreased levels when the plasmid count is exceedingly high. The figure illustrates that the optimal range for achieving high protein expression does not coincide with the maximum attainable plasmid copy number.
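
The shape of the curve in figure 3 can be illustrated with a toy calculation. The sketch below is only an illustration, not the team's model or measured data: it assumes that each plasmid copy contributes a fixed amount of expression and that a burden term, chosen here as an exponential decay with a hypothetical `burden_scale` parameter, reduces per-copy output as the copy number grows. All names and parameter values are assumptions made for the example.

```python
import math


def expression(copy_number, per_copy_rate=1.0, burden_scale=100.0):
    """Toy total expression: copies * per-copy output, damped by host burden.

    ASSUMPTION: the exponential burden term is an illustrative choice,
    not a fitted or published relationship.
    """
    return per_copy_rate * copy_number * math.exp(-copy_number / burden_scale)


def best_copy_number(max_copy_number, per_copy_rate=1.0, burden_scale=100.0):
    """Scan integer copy numbers from 1 to max_copy_number for the highest output."""
    return max(
        range(1, max_copy_number + 1),
        key=lambda n: expression(n, per_copy_rate, burden_scale),
    )
```

Under these assumptions, scanning copy numbers up to 500 with `best_copy_number(500)` places the optimum at `burden_scale` copies, well below the highest copy number scanned, which is the qualitative behavior the figure describes.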

Biological background

ColE1 Replication Regulation

We focus on ColE1-like plasmids, which are the most popular vectors for recombinant protein expression [1]. As a result, their replication regulation system has been studied extensively. ColE1 plasmid replication requires the formation of a primer that is complementary to the ORI. The plasmid encodes RNAp, a pre-primer complementary to the ORI. The 3′ end of RNAp forms a stable hybrid with the origin of replication, guided by complementary GC-rich stretches. This type of hybridization is known as an R-loop. The pre-primer matures into a primer when E. coli's RNase H processes the R-loop, creating a free 3′-OH terminus (figure 4). DNA polymerase I can then extend this terminus, initiating leading-strand synthesis, the first step in the plasmid's replication [2,3].

Replication of ColE1 plasmids is regulated by the interaction between RNAi, RNAp, and ROM (also known as "rop"). RNAi and RNAp are complementary to one another and form a complex known as a "kissing complex", which keeps RNAp from forming the R-loop and so blocks replication initiation. The kissing complex can be stabilized by interacting with the ROM protein dimer. Regulation is also affected by E. coli's metabolic burden: RNAp and RNAi can interact with uncharged tRNA, and such interactions prevent the formation of a kissing complex, allowing RNAp to initiate plasmid replication. pUC19 is a high-copy-number plasmid; its high copy number results from the lack of the rom gene and a single point mutation in the ORI [4].
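
The negative feedback described above can be sketched as a one-variable toy model. This is a simplified illustration under stated assumptions, not the model used in the project: it assumes the inhibitor (RNAi, helped by ROM) is produced in proportion to the copy number N, so that replication per plasmid falls as N rises, and that plasmids are diluted by cell growth. The equation dN/dt = r·N / (1 + N/K) − μ·N and the parameter names `replication_rate`, `dilution_rate`, and `inhibition_constant` are hypothetical choices for the example. A larger `inhibition_constant` stands for weaker inhibition, for instance when ROM is absent.

```python
def net_growth(n, replication_rate, dilution_rate, inhibition_constant):
    """dN/dt of the toy model: inhibited replication minus dilution by growth."""
    inhibited = replication_rate * n / (1.0 + n / inhibition_constant)
    return inhibited - dilution_rate * n


def simulate_copy_number(n0, replication_rate, dilution_rate,
                         inhibition_constant, dt=0.01, steps=20000):
    """Integrate the toy model with explicit Euler steps and return the final N."""
    n = n0
    for _ in range(steps):
        n += dt * net_growth(n, replication_rate, dilution_rate,
                             inhibition_constant)
    return n


def steady_state(replication_rate, dilution_rate, inhibition_constant):
    """Closed-form steady state N* = K * (r / mu - 1), or 0 if the plasmid is lost."""
    if replication_rate <= dilution_rate:
        return 0.0
    return inhibition_constant * (replication_rate / dilution_rate - 1.0)
```

In this sketch the steady-state copy number scales linearly with `inhibition_constant`, which mirrors the qualitative point that weakening or removing the inhibitory components raises the copy number.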

Figure 4: Illustration of ColE1 Replication Regulation mechanism.

The Project

Increasing the plasmid copy number in cells can improve protein yield. Conversely, reducing the copy number to a minimum can reduce the metabolic burden and stabilize plasmid maintenance. For a maximal copy number, RNAp should have a high affinity for the DNA strand, and RNAi and ROM should be repressed or removed. The opposite is required for a minimal copy number. Our tool generates a plasmid sequence designed to replicate in the cell at the desired copy number, based on free energy calculations, evolutionary considerations, and codon bias optimization.

We used data from Rouches et al. [5], which contains 366 inhibitory RNA promoter variants and 833 priming RNA promoter variants with their corresponding copy numbers. The variants were edited at positions (-33,-30), (-11,-8), and +1 for priming RNA and (-33,-30), (-10,-7), and +1 for inhibitory RNA. Based on this data, we engineered more than 6000 features (see more details on the Model page). With additional research, we were able to add more features to the model and gain a better understanding of the plasmid replication mechanism.
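
One simple way to turn such promoter variants into model inputs is one-hot encoding of the bases at the edited positions. The sketch below is an illustration under assumptions, not the feature set described on the Model page: it reads each pair such as (-33,-30) as an inclusive range of positions, and the names `PRIMING_POSITIONS`, `INHIBITORY_POSITIONS`, and `one_hot_variant` are hypothetical.

```python
# ASSUMPTION: each pair in the text, e.g. (-33,-30), is an inclusive range.
PRIMING_POSITIONS = [-33, -32, -31, -30, -11, -10, -9, -8, 1]
INHIBITORY_POSITIONS = [-33, -32, -31, -30, -10, -9, -8, -7, 1]
BASES = "ACGT"


def one_hot_variant(bases_at_positions, positions):
    """Encode a promoter variant as a flat list of 0/1 features.

    bases_at_positions maps a promoter position (int) to its base (str).
    Each position contributes four features, one per base in BASES.
    """
    features = []
    for position in positions:
        base = bases_at_positions[position].upper()
        if base not in BASES:
            raise ValueError(f"unexpected base {base!r} at position {position}")
        features.extend(1 if base == candidate else 0 for candidate in BASES)
    return features
```

With nine edited positions this gives 36 binary features per variant, so an encoding like this could only be one small part of a feature set as large as the one mentioned above.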

The stages below (figure 5) were followed to achieve the goals of the project:

  1. Modeling interactions.
  2. Calculating free energy and affinity of the interactions.
  3. Modifying affinity.
  4. Generating sequences.
  5. Experimental validation.
  6. Using the validated data as input to the model, and repeating the stages.

Figure 5: The engineering cycle of our project - modeling and biological stages.

We created a user-friendly web tool that allows scientists and lab workers to easily obtain a modified plasmid sequence for their desired copy number (figure 6). With our tool, we aim to reach broad communities such as academic researchers, industrial biologists, and anyone who wishes to work with plasmids.

Figure 6: Example of our web tool (see more details on the Software page).

References

  1. Camps, M. Modulation of ColE1-Like Plasmid Replication for Recombinant Gene Expression. Recent Patents DNA Gene Sequences 4, 58–73 (2010).
  2. Standley, M. S., Million-Weaver, S., Alexander, D. L., Hu, S. & Camps, M. Genetic control of ColE1 plasmid stability that is independent of plasmid copy number regulation. Curr Genet 65, 179–192 (2019).
  3. Itoh, T. & Tomizawa, J. Formation of an RNA primer for initiation of replication of ColE1 DNA by ribonuclease H. Proc National Acad Sci 77, 2450–2454 (1980).
  4. Grabherr, R. & Bayer, K. Impact of targeted vector design on ColE1 plasmid replication. Trends Biotechnol 20, 257–260 (2002).
  5. Rouches, M. V., Xu, Y., Cortes, L. B. G. & Lambert, G. A plasmid system with tunable copy number. Nat Commun 13, 3908 (2022).