After the project began, we contacted the experimental team, "iGEM-USTC", to learn the relevant synthetic biology background. From these discussions we learned that protein optimization and expression is a time-consuming process. One reason is that the currently popular protein design platforms spread their functions across separate tools and are not beginner-friendly, so we decided to contribute in this area. Combining life science with computer science, we integrated several commonly used protein design support platforms. Under the guidance of the wet lab team, we also incorporated and implemented cutting-edge codon optimization algorithms. The result is our product, Prot-DAP, which we offer to the iGEM community. Our contributions are as follows:
Codon optimization module based on a frontier algorithm
Prot-DAP includes a codon optimization module that we developed independently around a di-codon (adjacent codon pair) optimization algorithm. We found this algorithm in an article in BMC Systems Biology, which reports that di-codon optimization greatly outperforms single-codon optimization, so we decided to incorporate it into Prot-DAP. However, di-codon frequency data are not readily available for different species. After consulting with iGEM-USTC and the biology team, our backend team wrote scripts to count di-codon frequencies in data from the National Library of Medicine and built our di-codon optimization algorithm on these counts.
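The two steps described above, counting di-codons and then choosing codons from those counts, can be sketched as follows. This is a minimal illustration and not the Prot-DAP implementation or the exact algorithm from the article: the function names `count_dicodons` and `optimize_dicodons`, the `pseudocount` parameter, and the scoring rule (maximize the summed log counts of adjacent codon pairs with dynamic programming) are all assumptions made for the example.

```python
from collections import Counter
from math import log

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO))
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def count_dicodons(coding_sequences):
    """Count in-frame pairs of adjacent codons across coding sequences."""
    counts = Counter()
    for cds in coding_sequences:
        cds = cds.upper().replace("U", "T")
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        codons = [cds[i:i + 3] for i in range(0, usable, 3)]
        for first, second in zip(codons, codons[1:]):
            if first in CODON_TO_AA and second in CODON_TO_AA:
                counts[first + second] += 1
    return counts


def optimize_dicodons(protein, counts, pseudocount=1.0):
    """Pick synonymous codons that maximize the summed log di-codon counts.

    Assumed scoring rule for illustration only; an unknown residue raises KeyError.
    """
    if not protein:
        return ""

    def pair_score(first, second):
        return log(counts.get(first + second, 0) + pseudocount)

    # best[codon] = (score, codon path) for the best prefix ending in that codon
    best = {c: (0.0, [c]) for c in AA_TO_CODONS[protein[0]]}
    for aa in protein[1:]:
        step = {}
        for codon in AA_TO_CODONS[aa]:
            prev = max(best, key=lambda p: best[p][0] + pair_score(p, codon))
            score, path = best[prev]
            step[codon] = (score + pair_score(prev, codon), path + [codon])
        best = step
    return "".join(max(best.values(), key=lambda entry: entry[0])[1])


if __name__ == "__main__":
    # Toy reference set; real counts would come from a species' coding sequences.
    reference = ["ATGGCTGCTAAA", "ATGGCTGCTAAG"]
    print(optimize_dicodons("MAA", count_dicodons(reference)))
```

Because each choice depends only on the neighbouring codon, the dynamic program finds the best-scoring sequence under this rule without enumerating every synonymous combination.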
Integration of commonly used protein design modules:
Jalview: Visualization of mutation sites
We chose Jalview to display multiple sequence alignment (MSA) results intuitively. After importing a FASTA file, users can see how each mutation site differs from the corresponding position in the original sequence. Because the similarity and score of each site are shown as statistics, users can readily identify evolutionary homology and evolutionarily conserved sites.
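To make the per-site comparison concrete, here is a small sketch of the kind of statistics an alignment viewer summarizes. It is not Jalview's own conservation calculation: the helper names `parse_fasta` and `column_report` are hypothetical, the first record is assumed to be the original sequence, and Shannon entropy is used as a simple stand-in for a conservation score (lower means more conserved).

```python
from collections import Counter
from math import log2


def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line.upper()
    return [(name, seq) for name, seq in records]


def column_report(records, gap="-"):
    """Compare every alignment column against the first (reference) sequence."""
    _, ref_seq = records[0]
    if any(len(seq) != len(ref_seq) for _, seq in records):
        raise ValueError("sequences must be aligned to the same length")
    report = []
    for i, ref_residue in enumerate(ref_seq):
        column = [seq[i] for _, seq in records]
        counts = Counter(r for r in column if r != gap)
        total = sum(counts.values())
        # Shannon entropy of the non-gap residues in this column
        entropy = sum(-(n / total) * log2(n / total) for n in counts.values())
        report.append({
            "position": i + 1,
            "reference": ref_residue,
            "mismatches": [name for name, seq in records[1:] if seq[i] != ref_residue],
            "entropy": entropy,
        })
    return report


if __name__ == "__main__":
    alignment = ">ref\nMKV\n>seq2\nMRV\n>seq3\nMKV\n"
    for row in column_report(parse_fasta(alignment)):
        print(row)
```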
PSSM: Protein Feature Description
Compared with other sequence feature extraction methods, the position-specific scoring matrix (PSSM) effectively captures both the sequence characteristics of a protein and the conservation of its amino acids. This efficient representation of the evolutionary information in the primary structure makes a variety of biological studies easier to carry out. Prot-DAP supports online generation of PSSMs from protein sequences, which is convenient for future teams with similar requirements.
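For readers new to PSSMs, the sketch below shows how position-specific log-odds scores can be derived from a set of aligned sequences. It is a simplified illustration and not how Prot-DAP generates its matrices: the function name `build_pssm`, the uniform background frequency of 1/20, and the add-one `pseudocount` are assumptions, whereas production tools typically use empirical background frequencies and sequence weighting.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_sequences, pseudocount=1.0):
    """Return one {amino acid: log-odds score} dict per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")
    pssm = []
    for i in range(length):
        # Gaps and non-standard residues are skipped for this column
        counts = Counter(seq[i] for seq in aligned_sequences if seq[i] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


if __name__ == "__main__":
    matrix = build_pssm(["MKV", "MRV", "MKV"])
    for position, scores in enumerate(matrix, start=1):
        top = max(scores, key=scores.get)
        print(position, top, round(scores[top], 3))
```

A positive score means the residue appears at that position more often than the background would predict, which is how the matrix encodes conservation.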
User-friendliness
User-friendliness is a primary goal for Prot-DAP, and the interface has gone through multiple rounds of optimization. We added flow advancement arrows, input prompts, and loading animations. We also provide checkboxes for site mutations so that users can modify the corresponding sites directly. Together these guide users through the whole process and make Prot-DAP a user-friendly, integrated protein design assistance platform suitable for general use. For future iGEM teams working on topics similar to ours, Prot-DAP can significantly shorten the learning curve and improve work efficiency: they can grasp the basic workflow more quickly and complete their projects sooner.
Educational Significance
For junior students and others interested in synthetic biology, Prot-DAP serves as a competent introductory teaching platform. It offers them a chance to participate firsthand in protein design and optimization, nurturing their problem-solving abilities and allowing a broader audience to appreciate the allure of synthetic biology.
Troubleshooting: from PROSS to EVmutation
We encountered several challenges when choosing a tool for predicting mutation sites. Our initial choice, PROSS, had a limited application scenario (it is restricted to enhancing protein stability), could not be deployed directly on servers, and offered a subpar user experience after integration. To improve the versatility and user-friendliness of Prot-DAP, we began exploring other protein optimization models. After several attempts, we settled on EVmutation, a protein optimization model based on unsupervised learning. Compared with the other models we tried, it was more efficient and gave more stable results. Based on our team's experience, we strongly recommend that future teams with similar needs consider EVmutation as their primary method for predicting mutation sites.