Abstract

On this page, we present our software tool for predicting and controlling plasmid copy number. Through a friendly UI, users can input a plasmid sequence or select an existing one and receive FASTA files edited to yield their desired copy numbers. The tool can generate multiple sequences in one run, streamlining experimentation. It also provides plasmid maps with clear annotations, which facilitate data interpretation and sharing. The software is compatible with synthetic biology standards, validated through experimental work, and useful for various projects, including bioproduction optimization and genetic circuit development. It is accessible from a range of devices, which makes it easy for geneticists and biotechnologists to use. This page describes the tool's functionality and usage instructions and shows how it can support genetic engineering work.

Join us as we explore the possibilities and advantages that our software tool brings to the world of plasmid design and copy number prediction.

Introduction

In the rapidly expanding field of synthetic biology, researchers are continuously developing innovative tools and techniques to design and engineer biological systems with unprecedented precision. One key challenge in this endeavor is the precise control of plasmid copy numbers. Plasmid copy number refers to the number of copies of a plasmid that are present in a cell, and it has a significant impact on gene expression levels. Therefore, the ability to precisely control plasmid copy numbers is essential for optimizing the performance of synthetic biological systems.

Existing methods for plasmid copy number control are often complex and time-consuming, requiring the use of specialized strains or multiple plasmids. This can be a barrier for researchers who are new to synthetic biology or who do not have access to sophisticated laboratory equipment.

To address this challenge, we have developed a software tool that enables researchers to design and predict plasmid copy numbers in a simple and efficient manner. Our tool is based on a machine learning model that has been trained on a large dataset of plasmid sequences and copy number measurements. This allows our tool to predict the copy number of a supported plasmid (currently ColE1-like plasmids) from its sequence and the host strain in which it will be used. At its core, our software tool offers a set of essential features that were assembled through consultations with biologists:

Key Features	Description
Copy number adaptation	The tool allows users to input a plasmid sequence (FASTA or GenBank files), specify the desired copy number, and receive a sequence with ORI modifications that yields the desired copy number.
Versatility	The tool can modify a given plasmid so that it has the desired copy number, or create a plasmid with the desired copy number from scratch.
Tailored FASTA files	The tool can generate FASTA files with precise adjustments to achieve the desired copy number.
Efficiency through multiplicity	The tool allows users to request multiple copy numbers simultaneously and receive a dedicated sequence for each.
Comprehensive annotations	The tool presents clear and concise plasmid maps with annotations that are integrated into the sequence itself.
Accessible anywhere	The tool is accessible on both computers and mobile devices.
API	The tool is available for developers to integrate into their projects via this link.

Table 1: Key features of the software tool, with short descriptions.

These features make the software valuable for plasmid design and copy number prediction. Generating tailored FASTA files and requesting multiple copy numbers simultaneously can save researchers considerable time and effort. The comprehensive annotations simplify data interpretation, and the tool's accessibility lets researchers work on their plasmid designs and copy number predictions from anywhere.

In addition to these features, the software tool has the potential to improve the accuracy and reduce the cost of plasmid design and copy number prediction. Using machine learning to predict plasmid copy numbers could lead to more accurate predictions than traditional methods, and the tool could save researchers money by reducing the need for specialized laboratory equipment and reagents.

Contributing to the synthetic biology community

Our iGEM team's software tool addresses several key aspects of synthetic biology standards and utility. First, it is compatible with existing synthetic biology standards, such as FASTA and GenBank, and uses these formats to read and write data. Additionally, the software uses several popular bioinformatics tools, including:

DIAMOND [1] - A sequence aligner for protein and translated DNA searches. It is similar to BLAST but considerably faster.
SeqViz [2] - A DNA, RNA, and protein sequence viewer. It can be used to display annotated sequences, for example as circular or linear plasmid maps.
Infernal [3] - A tool that searches sequences for RNA homologs using models of RNA secondary structure and sequence. It is often used to identify non-coding RNAs, such as microRNAs and tRNAs.
pLannotate [4] - A tool for automatically annotating engineered plasmids. It identifies common plasmid features in DNA sequences.

All of these are widely used in the synthetic biology community, and their inclusion in our software tool makes it more compatible with other tools and resources.

Second, the software has been validated through experimental work, demonstrating its accuracy and reliability in predicting plasmid copy numbers. This is important for ensuring the success of synthetic biology experiments, which often rely on precise control of gene expression.

Third, the software is designed to be useful to a wide range of synthetic biology projects, beyond the scope of our team's own project. Its customization options, efficiency, comprehensive annotations, and accessibility make it valuable for different research objectives. Additionally, its potential for improved accuracy and cost savings further enhances its applicability to a wide range of projects, such as:

Bioproduction Optimization: In a project focused on optimizing the production of a specific bio-based product, such as a biofuel or enzyme, researchers can use the software tool to design plasmids with tailored copy numbers. By precisely controlling the plasmid copy numbers in the host organisms, they can fine-tune gene expression levels, thereby increasing the yield of the desired product. This customization can significantly enhance the efficiency of the bioproduction process.
Genetic Circuit Development: For a project involving the construction of complex genetic circuits for synthetic biology applications, the software tool can aid in the design of plasmids with specific copy numbers. Researchers can create plasmids that balance the expression of various genes within the circuit to achieve desired behaviors or functions. The tool's ability to generate multiple sequences at once is particularly advantageous in constructing and testing different circuit configurations efficiently.

In both examples, the software tool's flexibility in designing plasmids with precise copy numbers can accelerate research progress and help achieve project objectives effectively.

Fourth, the software is user-friendly and accessible to researchers across a variety of backgrounds. Its intuitive, streamlined user interface lets researchers obtain their desired output plasmid with just a few simple interactions. Comprehensive documentation, clear instructions, and code comments also make the software user-friendly and accessible.

Finally, the software is well-written and easy for future contributors to understand and build upon. This is due to the investment in comprehensive documentation, as well as the use of best practices in software development.

Instructions

How to use the Ctrl-C tool?

Choose between (1) using the provided plasmids (currently supporting pUC19 and pBR322), (2) uploading your own plasmid's FASTA file, or (3) providing the plasmid's sequence as text.

Figure 1: The tool's supported inputs, from left to right: (1) working with the pUC19 or pBR322 plasmids; (2) working with your own FASTA file; (3) working with your own sequence, provided as text.

Choose your desired copy number. If you want a grid (range) of copy numbers, first click on the “grid” option, then enter the 5 copy number values of your desired grid.

Figure 2: From left to right: copy number input box for a single output; grid copy number input box for multiple outputs.

Choose whether to visualize the annotated plasmid map or not.

Figure 3: Plasmid map visualization activation and deactivation button.

Click “Get Modified Plasmid” and wait for the results to appear on the screen.

Figure 4: The software's "Run" button. While the modifications are being made, the loading page (right) will appear.

If the “visualize the plasmid map” checkbox was marked, clicking on a site in the plasmid map illustration will take you to that site's sequence.

Figure 5: The interactive visualizations created by the software.
Click “Download Modified Sequence” to download the plasmid modified sequence as a FASTA file.

Figure 6: Download button. The modified sequence will be saved to your computer as a FASTA file.
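
For readers who want to script the same input and output steps, the following is a minimal Python sketch of reading a plasmid from FASTA text and writing one FASTA record per requested copy number. It is not the tool's actual code: the function names and the `_copy_number_` header suffix are assumptions made for illustration, and the modified sequences themselves are assumed to come from the tool.

```python
from typing import Dict, List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, uppercasing the sequence."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def format_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one FASTA record, wrapping the sequence at `width` characters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


def grid_to_fasta(name: str, variants: Dict[int, str]) -> str:
    """Write one record per copy number (hypothetical header naming scheme)."""
    return "".join(
        format_fasta(f"{name}_copy_number_{copy_number}", seq)
        for copy_number, seq in sorted(variants.items())
    )
```

With a grid request, `variants` would hold five entries, one modified sequence per requested copy number.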

What does the Ctrl-C tool do?

Given the input sequence, the tool annotates the sequence using DIAMOND.
After identifying the ORI’s (origin of replication) location, our tool modifies the plasmid's ORI sequence.
The modified ORI sequence affects the plasmid's replication system, controlling the plasmid's copy number.
The sequence is determined using our computational predictive model of the replication system of ColE1 plasmids.
If you choose to visualize the plasmid, DIAMOND will add annotations for the modified plasmid sequence.
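
The ORI replacement step can be sketched as a coordinate-based substitution on a circular sequence. This is a minimal illustration and not the tool's implementation: the function name is hypothetical, the ORI coordinates are assumed to come from the annotation step, and the replacement sequence is assumed to come from the predictive model, which is not shown here.

```python
def replace_region(plasmid: str, start: int, end: int, new_region: str) -> str:
    """Replace a region of a circular plasmid with a new sequence.

    Coordinates are 0-based and end-exclusive. If start > end, the region
    wraps around the end of the stored sequence (plasmids are circular).
    """
    n = len(plasmid)
    if not (0 <= start < n and 0 <= end <= n):
        raise ValueError("coordinates out of range")
    if start <= end:
        # Simple case: the region lies inside the stored linear string.
        return plasmid[:start] + new_region + plasmid[end:]
    # Wrap-around case: the region is plasmid[start:] + plasmid[:end].
    # Keep the untouched middle part and append the new region; the result
    # is a rotation of the modified circle, which is the same molecule.
    return plasmid[end:start] + new_region


# Example with a toy sequence: replace the four C bases.
modified = replace_region("AAAACCCCGGGG", 4, 8, "TT")
```

Handling the wrap-around case matters because the stored start position of a circular plasmid is arbitrary, so the ORI may span the point where the FASTA sequence begins and ends.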

Typical Ctrl-C tool applications

Controlling the copy number has a wide range of applications:

Any project that involves working with plasmids can benefit from controlling the plasmid copy number.
It offers an additional way to optimize and control recombinant protein expression.
Biosensors are innovative devices that detect and convert biological or chemical signals into measurable data. Adjusting plasmid copy numbers can help optimize the signal-to-noise ratio of biosensors.
Pathways containing multiple steps often require synchronization and optimization to avoid bottlenecks in the production of the final product. Controlling copy number was shown to be essential in such processes, to get better production and avoid cell toxicity.
Plasmid DNA vaccines represent an innovative vaccination approach, utilizing plasmids as both a carrier for antigens and a scalable means for vaccine production. Controlling plasmid copy number can help the production process.

FAQ

Why do I get the error "the promoter of the RNA-p sequence does not match the original promoter"?
Our tool supports only ColE1-like plasmids that contain the wild-type RNAp promoter sequence. Please make sure that the plasmid you are using contains the correct RNAp promoter.
How accurate is the modification made by the Ctrl-C tool to the plasmid's ORI sequence?
Our model reached high performance, with a Pearson correlation of 0.96 (p-value = 7.92 x 10^-70).
Can I cite or reference the Ctrl-C tool in my research or publications?
Yes. We plan to submit a paper about the tool. In the meantime, you can cite our website.
Is the Ctrl-C tool compatible with all plasmid sequences?
Our tool currently supports ColE1-like plasmids only.
What are the system requirements for running the Ctrl-C tool?
None. You only need internet access.
What if I don’t know which plasmid to use?
First, you can simply choose one of the default, recommended plasmids under ‘Selected Plasmids’. Second, you can enter any ColE1-like plasmid you find into the tool and run it with plasmid map visualization enabled to get the plasmid's annotations, which will show you what the plasmid contains.
How to choose the desired plasmid copy number?
If you don’t know which plasmid copy number is optimal for your needs, no worries. By selecting ‘Grid’ in the desired plasmid copy number section, you can request a range of 5 different copy numbers, which can then be used to find the optimal copy number for you.
Can I input multiple plasmids at once?
Unfortunately, right now we only support one plasmid at a time, but hey, you can run it as many times as your heart desires.
Can I integrate this tool as part of my own project?
Yes, we have a public API available. For more information please visit the API documentation at this link
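
The promoter error described in the first FAQ entry suggests that the input is checked for the wild-type promoter before modification. A check of that kind could look like the sketch below. This is an assumption about the approach, not the tool's code: the function names are hypothetical, and the motif used in the example is a placeholder, not the real promoter sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def contains_on_circular(plasmid: str, motif: str) -> bool:
    """Check whether a motif occurs on either strand of a circular plasmid."""
    plasmid, motif = plasmid.upper(), motif.upper()
    if not motif or len(motif) > len(plasmid):
        return False
    # Append the first len(motif) - 1 bases so that matches spanning the
    # junction between the end and the start of the stored sequence are found.
    extended = plasmid + plasmid[:len(motif) - 1]
    return motif in extended or reverse_complement(motif) in extended


# PLACEHOLDER motif for illustration only; not the wild-type promoter.
PLACEHOLDER_PROMOTER = "CCCGGG"
has_promoter = contains_on_circular("GGGTTTACACCC", PLACEHOLDER_PROMOTER)
```

Running such a check locally before uploading can tell you early whether a plasmid is likely to be rejected.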

Under the hood

The Ctrl-C tool makes use of several bioinformatics tools. First, we use seqparse [5] to read the plasmid input from the uploaded FASTA file. Then we find the plasmid annotations with pLannotate [4], a Python library for automatically annotating engineered plasmids. Finally, to visualize the annotated plasmid map, we use seqviz [2], a JavaScript library that serves as a DNA, RNA, and protein sequence viewer.

Validation

The tool's output is based on our predictive model, which estimates pUC19 plasmid copy numbers from the ORI and the regulatory sequences within this region. The model achieved a Pearson correlation of up to 0.91 (p-value = 4.06 x 10^-50) and a Spearman correlation of 0.81 (p-value = 1.42 x 10^-30) between its predictions and a test set (read more about our model on the model page).

The model predictions were validated in the wet lab: the Pearson correlation with the biological measurements we generated and set aside for validation was 0.652 (p-value = 1.6 x 10^-2) (read more about our results on the results page).
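
For readers who want to repeat this kind of comparison on their own data, the sketch below shows how Pearson and Spearman correlations between predicted and measured copy numbers can be computed. It is a plain-Python illustration, not the analysis code used for the figures above; the function names are our own and the example lists are made-up numbers, not our measurements.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up illustrative values (NOT real data from this project).
predicted = [10.0, 40.0, 90.0, 160.0]
measured = [12.0, 35.0, 100.0, 150.0]
r_pearson = pearson(predicted, measured)
r_spearman = spearman(predicted, measured)
```

Pearson measures linear agreement, while Spearman only measures whether the ordering of plasmids by copy number is preserved, which is why the two values can differ.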

Summary

Overall, our iGEM team's software tool represents a significant contribution to the synthetic biology community. It is a well-designed, well-implemented, and easy-to-use tool that addresses several key aspects of synthetic biology standards and utility. Try our software tool

Our tool can be used anywhere and anytime, whether you are in the lab, on the bus back home, or even on a trip to the desert. Our tool to the rescue

References

[1] Buchfink B. et al., 2021, DIAMOND, GitHub, https://github.com/bbuchfink/diamond
[2] Kevin LeShane, 2022, seqviz v3.9.1, GitHub, https://github.com/Lattice-Automation/seqviz
[3] E. P. Nawrocki and S. R. Eddy, 2013, Infernal, GitHub, https://github.com/EddyRivasLab/infernal
[4] Matt McGuffie, 2021, pLannotate v1.2.1, GitHub, https://github.com/mmcguffi/pLannotate
[5] Joshua Timmons, 2022, seqparse, GitHub, https://github.com/Lattice-Automation/seqparse