Software Contribution 1: The RSModifier
Problem
As shown in the linear sequence above, three BsaI restriction sites are present. If the sequence must be cleaved at only two of these sites, the third site causes an unwanted cut. This is especially problematic when the third site lies within a gene part that is essential to the desired construct. This was one of the problems we faced while developing our DNA circuit.
General Solution
In general, several different three-base codons can code for the same amino acid, as shown below:
Applying this redundancy to the problem above, some bases within the unwanted recognition site are substituted so that each affected codon still codes for the same amino acid. The recognition site is eliminated, while the amino acid sequence encoded by the part, and therefore its behavior, is preserved. Making these changes manually is tedious, especially when there are several unwanted recognition sites. Because each amino acid is encoded by three bases, every substitution must also respect the reading frame and codon boundaries, which makes manual editing complex and error-prone.
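The substitution described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the RSModifier source code: it assumes the input is an in-frame coding sequence containing only A, C, G and T, it uses the standard genetic code, and it treats both GGTCTC (the BsaI recognition sequence) and its reverse complement GAGACC as sites. The names `find_sites`, `remove_sites` and the `keep` parameter are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# BsaI recognition sequence and its reverse complement (assumed targets).
SITES = ("GGTCTC", "GAGACC")


def translate(seq):
    """Translate complete codons of an in-frame sequence into amino acids."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def find_sites(seq):
    """Return the sorted 0-based start positions of every recognition site."""
    return sorted(
        i
        for site in SITES
        for i in range(len(seq) - len(site) + 1)
        if seq.startswith(site, i)
    )


def _fix_site(seq, pos):
    """Try one synonymous codon change that removes exactly the site at pos."""
    before = set(find_sites(seq))
    # Visit each codon that overlaps the six bases of the site.
    for start in range(pos - pos % 3, pos + 6, 3):
        codon = seq[start:start + 3]
        if len(codon) < 3:
            break
        for alt, aa in CODON_TABLE.items():
            if alt == codon or aa != CODON_TABLE[codon]:
                continue
            candidate = seq[:start] + alt + seq[start + 3:]
            # Accept only if this site disappears and nothing else changes.
            if set(find_sites(candidate)) == before - {pos}:
                return candidate
    return None


def remove_sites(cds, keep=()):
    """Remove every site except those starting at the positions in keep."""
    seq = cds.upper()
    for pos in find_sites(seq):
        if pos in keep:
            continue
        fixed = _fix_site(seq, pos)
        if fixed is None:
            raise ValueError(f"no synonymous change removes the site at {pos}")
        seq = fixed
    return seq


# Toy example: two sites, and only the second one should remain cuttable.
example = "ATGGGTCTCAAAGGTCTCTAA"
modified = remove_sites(example, keep={12})
print(find_sites(example), find_sites(modified))
print(translate(example) == translate(modified))
```

Because every change replaces one codon with a synonymous codon of the same length, site positions stay stable, so the positions passed in `keep` remain valid throughout. A production tool would likely also weigh codon usage in the host organism, which this sketch ignores.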
Our Solution
In our efforts to contribute to Synthetic Biology and to the work of future iGEM teams who may face similar challenges when developing their DNA circuits, we developed a web application (RSModifier) that eliminates the unwanted recognition sites automatically, while also preserving the behavior of the part.
Follow the link below to get a feel for the software:
RSModifier User Menu
Software Contribution 2: Similarity Calculation for DNA Sequences
Overview
The code calculates the similarity between a user-supplied DNA sequence and the sequences in the iGEM (International Genetically Engineered Machine) database. It uses the Longest Common Subsequence (LCS) to measure how closely the user's sequence matches each database sequence. It then generates a tabular report of the database sequences with high similarity scores, so that users can find parts similar to their input.
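As a rough illustration of the LCS idea, the sketch below computes the length of the longest common subsequence of two sequences with the usual dynamic-programming recurrence and turns it into a percentage. The function names and the normalization (dividing by the longer sequence's length) are assumptions made for this example; the actual software may score differently.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    # prev[j] holds the LCS length of the processed part of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(query, target):
    """Percentage similarity, assuming normalization by the longer length."""
    if not query or not target:
        return 0.0
    return 100.0 * lcs_length(query, target) / max(len(query), len(target))


print(similarity("ACGT", "AGGT"))  # LCS is "AGT" (3 of 4 bases) -> 75.0
```

Keeping only two rows of the table limits memory to the length of one sequence, which matters when database entries are long.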
To use the software, enter your DNA sequence; the software then searches the iGEM database for similar sequences. If no matching sequence is found, the software returns a message saying so. Otherwise, all matching sequences are returned in a table with six columns: 1st Base Pair Position, Last Base Pair Position, Part Name, Similarity Score, DNA Sequence, and Short Description. Each column is explained below.
Output | Explanation |
1st Base Pair Position | The start position, within the input DNA sequence, of the section that was compared to the database sequence to produce the similarity score. |
Last Base Pair Position | The end position, within the input DNA sequence, of the section that was compared to the database sequence to produce the similarity score. |
Part Name | The name of the database sequence as a part in the iGEM repository. |
Similarity Score | The similarity score between the section of the input DNA sequence and the database sequence. |
DNA Sequence | The DNA code for the database sequence. |
Short Description | May show the actual name of the database sequence or any other important information about it. |
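One way the six report columns could be produced is sketched below. Everything here is an assumption made for illustration: the `Part` record, the sliding-window strategy used to find the start and end positions, the 1-based numbering, the normalization by the database sequence's length, and the `threshold` parameter (set to the 85% default). A compact LCS routine is included only so that the block runs on its own.

```python
from dataclasses import dataclass


@dataclass
class Part:
    """Hypothetical record for one database entry."""
    name: str
    sequence: str
    description: str


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def best_window(query, target):
    """Slide a target-sized window over query; return (first, last, score)."""
    width = min(len(query), len(target))
    best = (0, 0, 0.0)
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        score = 100.0 * lcs_length(window, target) / len(target)
        if score > best[2]:
            best = (start + 1, start + width, score)  # 1-based, inclusive
    return best


def search(query, parts, threshold=85.0):
    """Return one report row per part whose best score meets the threshold."""
    rows = []
    for part in parts:
        if not part.sequence:
            continue
        first, last, score = best_window(query.upper(), part.sequence.upper())
        if score >= threshold:
            rows.append({
                "1st Base Pair Position": first,
                "Last Base Pair Position": last,
                "Part Name": part.name,
                "Similarity Score": round(score, 1),
                "DNA Sequence": part.sequence,
                "Short Description": part.description,
            })
    return sorted(rows, key=lambda row: row["Similarity Score"], reverse=True)


# Toy in-memory "database"; a real run would query the iGEM parts data.
demo_parts = [
    Part("PART_A", "ACGTACGT", "toy entry that matches"),
    Part("PART_B", "GGGGGGGG", "toy entry that does not match"),
]
report = search("TTACGTACGTTT", demo_parts)
if report:
    for row in report:
        print(row)
else:
    print("No matching sequence was found.")
```

Scoring every window against every part is slow for a large database, so a real implementation would probably pre-filter candidates before running LCS.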
How It's Helpful:
This code plays a crucial role in a software system designed for genetic research and analysis. It assists users in finding DNA sequences in the iGEM database that closely match their input. The benefits of this code include:
Sequence Similarity Assessment: It provides a quantitative measure of similarity between the user's sequence and database sequences, enabling researchers to quickly identify potentially relevant genetic parts.
Quick and Automated Process: The code automates the process of sequence comparison, saving researchers time and effort. It generates a clear and organised report for easy reference.
Database Exploration: Users can explore the iGEM database and identify sequences of interest by entering their own sequences. This functionality is valuable for genetic engineers, biologists, and bioinformaticians.
Customizable Threshold: The code uses an 85% similarity threshold by default, which can be adjusted to fit specific research requirements.
Documentation: The code is well-documented, making it accessible and understandable for users and developers.