Project Test Examples and Output Results

1. In Protein Search, query GSTP1 as the test target protein.

First Step

2. Import the Protein Search file into MSA.

Second Step

3. Download the MSA result and move the target sequence GFP_AEQVI to the top. Copy all sequences into Mutation Site Prediction to get the heat map and the table of recommended mutation sites. Import the sequence file into Jalview for a 2-D visualization of the mutation sites.

Third Step

4. Copy all sequences into the PSSM module to obtain the corresponding PSSM.

Fourth Step
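For readers unfamiliar with what a PSSM contains, here is a minimal sketch of how a position-specific scoring matrix can be derived from aligned sequences. It is not the platform's implementation, which is not described on this page. The function name `compute_pssm`, the uniform background frequency of 1/20, and the pseudocount of 1 are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def compute_pssm(aligned_seqs, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score.

    Assumptions (illustrative only): uniform background frequency of 1/20,
    additive pseudocounts, and gap or unknown characters are ignored.
    """
    if not aligned_seqs:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Count only standard residues in this column (skips '-' and others).
        counts = Counter(seq[col] for seq in aligned_seqs if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


# Toy alignment, not real project data.
toy_alignment = ["MKV", "MKT", "MRV"]
toy_pssm = compute_pssm(toy_alignment)
```

A positive score means the residue appears at that column more often than the background would predict; a negative score means it appears less often.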

5. Site No. 38 was selected for mutation from V to T (V38T), and the mutant sequence was automatically filled into Structure Prediction and Codon Optimization.

Fifth Step
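The substitution in step 5 amounts to replacing one residue at a 1-based position. A minimal sketch follows, assuming a hypothetical helper `apply_point_mutation` and a toy sequence rather than the real target protein. The wild-type check guards against off-by-one mistakes in the position.

```python
def apply_point_mutation(sequence, position, wild_type, mutant):
    """Return a copy of `sequence` with the residue at 1-based `position` replaced.

    Raises ValueError if the position is out of range or if the residue found
    there does not match the expected wild-type residue.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    index = position - 1  # convert 1-based site number to 0-based index
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at site {position}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


# Toy sequence with V at site 38 (placeholder, not the real protein).
toy_sequence = "A" * 37 + "V" + "G" * 5
toy_mutant = apply_point_mutation(toy_sequence, 38, "V", "T")
```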

6. In the Codon Optimization module, select the Rattus norvegicus option and submit to get the codon-optimized DNA sequence file.

Sixth Step

7. Copy and paste the DNA sequence into Primer Design. Select pUC19 as the vector and set the insert position to 70.

Seventh Step
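To illustrate what step 7 involves, the sketch below derives a naive primer pair from an insert and splices the insert into a vector at a given position. It does not reproduce the Primer Design module, whose algorithm is not described on this page. The function names, the fixed primer length, and the reading of "insert position 70" as "after the first 70 bases of the vector" are assumptions. Real primer design would also check melting temperature, GC content, and secondary structure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def naive_primers(insert, length=20):
    """Return (forward, reverse) primers taken from the two ends of the insert.

    Fixed-length primers are an illustrative simplification.
    """
    insert = insert.upper()
    if len(insert) < length:
        raise ValueError("insert is shorter than the requested primer length")
    forward = insert[:length]
    reverse = reverse_complement(insert[-length:])
    return forward, reverse


def splice_insert(vector, insert, position):
    """Place `insert` after the first `position` bases of `vector` (assumed convention)."""
    if not 0 <= position <= len(vector):
        raise ValueError("insert position is outside the vector")
    return vector[:position] + insert + vector[position:]


# Toy sequences, not the real pUC19 or the optimized gene.
toy_insert = "ATGAAACCCGGGTTTTAA"
toy_vector = "A" * 100
fwd, rev = naive_primers(toy_insert, length=6)
construct = splice_insert(toy_vector, toy_insert, 70)
```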

8. Full results:

Eighth Step

Statistical Evaluation of Users

To better understand users' real needs and make our platform more practical and reliable, we conducted a trial activity at three universities: USTC, HFUT, and AHU. After trying out the platform, users could fill out our questionnaire, provide feedback, and suggest modifications. Over the three days from October 10th to October 12th, we collected a total of 146 questionnaires. Here is a statistical analysis of the questionnaire results:

In terms of participant composition, 136 of the 146 participants were from USTC, and the remaining 10 were from HFUT and AHU. Most were first- and second-year students, mainly from biology-related majors.


Fig.1 Personal Information of Participants

Regarding the trial experience, the majority of users found our interface design and feature guidance to be excellent, scoring 4.64 and 4.68, respectively.


Fig.2 Evaluation of Interface Design and Guidance

In terms of specific functionality, most users believed that our platform could help them in learning synthetic biology and designing protein sequences to some extent. However, many also felt that there is room for improvement in our platform.


Fig.3 Benefit Evaluation

Of course, we understand that simple online trials may not yield in-depth, professional feedback. Therefore, we also reached out to several peers, who conducted in-depth trials and evaluations of our platform. For more details, please click on the link: Website Usage Experience

If you wish to access the raw data of our statistical table, please click Platform_usage_feedback_survey.xlsx.

Previous Failed Attempts

At first, we considered using machine learning and deep learning to model some biological metabolic pathways and verify them experimentally. However, for a period of time, we couldn't find suitable metabolic pathways, so we temporarily shelved this direction. Later, when a senior student described some issues he had encountered in his research, we learned that the upstream design for solubility optimization required jumping back and forth between tools over multiple iterations, which cost significant time. We therefore decided to design a comprehensive and user-friendly platform to accomplish this task.

After our preliminary design, we successfully implemented the basic framework. However, challenges soon emerged. There were issues with invoking PROSS on the server, which meant we could not provide a stable and convenient channel for users. In the end, to ensure a good user experience, we decided to abandon PROSS. After referring to a review in this year's Cell Systems, we kept the basic design process but changed the solubility optimization module to a more general rational design of proteins, or simply put, protein sequence design.

We extensively researched protein sequence design models. Among them, the semi-supervised model caught our attention due to its superior performance. However, another challenge arose: the semi-supervised model required downstream lab validation and database construction. Although we contacted the USTC iGEM experimental team, the cooperation could not go ahead because our schedules did not align. Consequently, we had no choice but to abandon this option.