Project Test Examples and Output Results
1. Query GSTP1 as the test target protein in the Protein Search module.
2. Import the Protein Search output file into the MSA module.
3. Download the MSA result, move the target sequence GFP_AEQVI to the top, then copy all sequences into the Mutation Site Prediction module to obtain the heat map and the table of recommended mutation sites. Import the same sequence file into Jalview for a 2-D visualization of the mutation sites.
4. Copy all sequences into the PSSM module to obtain the corresponding position-specific scoring matrix (PSSM).
5. Select site 38 for mutation from V to T (V38T); the mutant sequence is automatically filled into the Structure Prediction and Codon Optimization modules.
6. In the Codon Optimization module, select the Rattus norvegicus option, submit, and download the codon-optimized DNA sequence file.
7. Copy the DNA sequence into the Primer Design module, select pUC19 as the vector, and set the insertion position to 70.
8. Full results:
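Steps 3 and 4 above reorder the alignment so the target comes first and then derive a PSSM from it. The sketch below illustrates that data handling under stated assumptions: it parses FASTA text, moves the record whose header contains the target ID (e.g. GFP_AEQVI) to the top, and builds a simple log-odds PSSM with a uniform 1/20 background and a pseudocount of 1. The function names, the uniform background, and the pseudocount are our own illustrative choices; the PSSM module on the platform may weight sequences or use different background frequencies.

```python
import math
from typing import Dict, List, Tuple

# Standard 20 amino acids; gaps ("-", ".") and other symbols are skipped per column.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, keeping gap characters."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def move_target_to_top(records: List[Tuple[str, str]], target_id: str) -> List[Tuple[str, str]]:
    """Move the first record whose header contains target_id to index 0."""
    for i, (header, _) in enumerate(records):
        if target_id in header:
            return [records[i]] + records[:i] + records[i + 1:]
    raise ValueError(f"target {target_id!r} not found in alignment")


def build_pssm(aligned: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log-odds PSSM: log2(observed frequency / background) for each column.

    Assumes a uniform background of 1/20 and adds `pseudocount` to every
    amino acid, so residues never seen in a column get a finite negative score.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must all have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        residues = [s[col].upper() for s in aligned if s[col].upper() in AMINO_ACIDS]
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((residues.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


# Toy alignment (not real protein data): the target starts in second place.
msa = ">seq1\nMK-V\n>GFP_AEQVI toy target\nMKTV\n>seq3\nMRTV\n"
records = move_target_to_top(parse_fasta(msa), "GFP_AEQVI")
pssm = build_pssm([seq for _, seq in records])
```

Positive scores mark residues that are enriched at a column relative to the background; in the toy alignment, M scores positively at column 1 while every other amino acid there scores negatively.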
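Steps 5 and 7 above apply a single substitution (V38T) and place the codon-optimized DNA into pUC19 at position 70. The sketch below shows the sequence bookkeeping involved, assuming a 1-based residue index and that "insertion position 70" means the insert goes right after the 70th vector nucleotide. The primer helper follows one common homology-arm scheme (as used in Gibson- or In-Fusion-style cloning) with hypothetical default lengths of a 15 nt arm and a 20 nt annealing region; we do not claim the Primer Design module uses this scheme. All function names are hypothetical, and the sequences are toy strings rather than GSTP1, GFP, or pUC19.

```python
# Hypothetical helpers; toy sequences only (not GSTP1, GFP, or pUC19).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def apply_point_mutation(protein: str, position: int, wild_type: str, mutant: str) -> str:
    """Substitute the residue at a 1-based position after checking the wild type."""
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    found = protein[position - 1]
    if found != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {found}")
    return protein[:position - 1] + mutant + protein[position:]


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return dna.translate(COMPLEMENT)[::-1]


def insert_into_vector(vector: str, insert: str, position: int) -> str:
    """Place the insert right after the `position`-th vector nucleotide (1-based)."""
    if not 0 <= position <= len(vector):
        raise ValueError("insertion position is outside the vector")
    return vector[:position] + insert + vector[position:]


def homology_arm_primers(vector: str, insert: str, position: int,
                         arm: int = 15, anneal: int = 20) -> tuple:
    """Forward/reverse primers = vector homology arm + insert-annealing region.

    Forward: `arm` nt of vector upstream of the site + first `anneal` nt of insert.
    Reverse: reverse complement of the last `anneal` nt of insert + `arm` nt downstream.
    """
    upstream = vector[max(0, position - arm):position]
    downstream = vector[position:position + arm]
    forward = upstream + insert[:anneal]
    reverse = reverse_complement(insert[-anneal:] + downstream)
    return forward, reverse


# Toy usage; on real data this would be e.g. apply_point_mutation(seq, 38, "V", "T")
# and insert_into_vector(puc19_seq, optimized_dna, 70).
mutant = apply_point_mutation("MAVK", 3, "V", "T")
construct = insert_into_vector("AAAACCCCGGGGTTTT", "ATGGCC", 8)
fwd, rev = homology_arm_primers("AAAACCCCGGGGTTTT", "ATGGCC", 8, arm=4, anneal=3)
```

Checking the wild-type residue before substituting guards against off-by-one errors between alignment columns and sequence positions, which is easy to get wrong when a site number comes from an MSA.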
Statistical Evaluation of Users
To better understand users' needs and make our platform more practical and reliable, we ran a trial at three universities: USTC, HFUT, and AHU. After trying the platform, users could fill out our questionnaire, give feedback, and suggest changes. Over three days, from October 10 to October 12, we collected 146 questionnaires. A statistical analysis of the results follows.
In terms of participant composition, 136 of the 146 participants were from USTC and the remaining 10 were from HFUT and AHU. Most were first- and second-year students, mainly from biology-related majors.
Fig.1 Personal Information of Participants
Regarding the trial experience, most users rated our interface design and feature guidance highly, with scores of 4.64 and 4.68, respectively.
Fig.2 Evaluation of Interface Design and Guidance
As for specific functionality, most users felt that our platform could help them, at least to some extent, learn synthetic biology and design protein sequences. However, many also felt that the platform still has room for improvement.
Fig.3 Benefit Evaluation
We recognize that a brief online trial may not capture expert-level feedback. We therefore also invited several peers to trial and evaluate our platform in depth. For details, see Website Usage Experience.
The raw data behind our statistics are available in Platform_usage_feedback_survey.xlsx.
Earlier Failed Attempts
At first, we considered using machine learning and deep learning to model biological metabolic pathways and then verify the models experimentally. However, for some time we could not find a suitable metabolic pathway, so we shelved this direction. Later, a senior student described problems he had encountered in his research: the upstream design work for protein solubility optimization required switching back and forth between tools over multiple iterations, which cost a great deal of time. We therefore decided to design a comprehensive, user-friendly platform for this task.
After our preliminary design, we implemented the basic framework, but problems soon emerged. Calling PROSS from our server was unreliable, so we could not offer users a stable and convenient way to access it. To ensure a good user experience, we ultimately dropped PROSS. After consulting a review published in Cell Systems this year, we kept the basic design workflow but generalized the solubility optimization module into rational protein design or, put simply, protein sequence design.
We surveyed protein sequence design models extensively. Among them, the semi-supervised model stood out for its superior performance. However, another challenge arose: the semi-supervised model required downstream wet-lab validation and the construction of a dedicated database. Although we contacted USTC's iGEM wet-lab team, scheduling conflicts prevented a collaboration, so we had to abandon this option.