Application and Prospect of DARWINS
In building the DARWINS model, we aimed to provide the following functions:
Mapping of protein sequences to specific structural information.
Characterization of the structure of KmAgo mutants.
Regression of the melting temperature of mutants.
End users, however, want a more convenient way to use the model for tasks such as the following:
Use the model to narrow the space of candidate mutation sites and reduce the number of experiments that need to be attempted.
Use the model to verify the stability of a designed protein structure.
Taking KmAgo as an example, use the model to achieve rapid directed evolution of mutants under specific conditions, reducing production costs.
When studying the role of enzymes in pathways, use the model to reduce the experimental cost of testing mutations.
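As an illustration of narrowing the mutation space, the sketch below enumerates every single-site substitution at a few chosen positions and writes the candidates as FASTA text that could then be submitted for prediction. The function names, the naming scheme (for example K2A) and the short toy sequence are assumptions made for this example; they are not part of DARWINS.

```python
from typing import Dict, Iterable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_site_mutants(wild_type: str, positions: Iterable[int]) -> Dict[str, str]:
    """Return {mutation name: mutant sequence} for 1-based positions."""
    variants: Dict[str, str] = {}
    for pos in positions:
        if not 1 <= pos <= len(wild_type):
            raise ValueError(f"position {pos} is outside the sequence")
        original = wild_type[pos - 1]
        for residue in AMINO_ACIDS:
            if residue == original:
                continue  # skip the wild-type residue itself
            name = f"{original}{pos}{residue}"  # e.g. K2A
            variants[name] = wild_type[:pos - 1] + residue + wild_type[pos:]
    return variants


def to_fasta(variants: Dict[str, str]) -> str:
    """Serialise the variants as FASTA text, one record per mutant."""
    return "".join(f">{name}\n{seq}\n" for name, seq in variants.items())


# Toy sequence for illustration only; it is not the KmAgo sequence.
toy_wild_type = "MKTAY"
toy_variants = single_site_mutants(toy_wild_type, positions=[2])
toy_fasta = to_fasta(toy_variants)
```

Restricting `positions` to a handful of sites keeps the candidate list small: each position contributes 19 mutants.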
In studying the directed evolution of KmAgo, the researchers' goal is to obtain KmAgo mutants that are active at room temperature. Researchers provide the sequences of candidate mutants, and our model predicts the thermal stability of the protein corresponding to each input sequence. By ranking the predictions, selecting the mutants with high thermal stability, and carrying those into the next round of site-directed mutagenesis, researchers can save considerable material cost and time in the search for feasible mutation sites.
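The selection step described above can be sketched as a simple ranking of predicted melting temperatures. This is a minimal example, not the DARWINS implementation: the function name `rank_by_tm`, the mutant names and the Tm values are all made up for illustration.

```python
from typing import Dict, List, Tuple


def rank_by_tm(predictions: Dict[str, float], top_k: int) -> List[Tuple[str, float]]:
    """Return the top_k (name, predicted Tm) pairs, highest Tm first."""
    ordered = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    return ordered[:top_k]


# Hypothetical predicted Tm values in degrees Celsius (illustrative only).
example_predictions = {
    "mutant_A": 48.2,
    "mutant_B": 55.1,
    "mutant_C": 51.7,
    "mutant_D": 46.9,
}

# Keep the two most stable candidates for the next round of mutagenesis.
shortlist = rank_by_tm(example_predictions, top_k=2)
```

The shortlisted mutants would then serve as starting points for the next round of site-directed mutations.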
The accuracy of the predicted melting temperature depends heavily on the quality of the training set. Because our research target is KmAgo, the model has only been trained to predict accurately the melting temperature of KmAgo and members of the same protein family.
Mapping protein sequences to structures reliably requires a large amount of memory.
Running pre-trained models with a large number of parameters takes a long time.
Melting temperature is only one of several indicators of protein stability, which makes it difficult to benchmark DARWINS against similar software.
Users can click the "Upload" button on the software page to submit the sequence of the protein they want to predict. Please note that the software only accepts files in FASTA format (the file must start with ">"). After uploading, the file content is displayed in the text box below, where users can check that the correct file has been submitted.
Next, when the user clicks the "Prediction" button, the server receives the protein sequence information and runs the model automatically. Please note that because memory on the server is limited, we recommend against uploading proteins longer than 100 amino acids. Longer sequences will cause the server to run out of memory and return an error.
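Before uploading, a file can be checked locally against the two constraints described above: FASTA format and the recommended 100-residue limit. The sketch below is a minimal stand-alone check written for this guide. The function names are assumptions and it is not the validation code used by the server.

```python
from typing import Dict, List

MAX_LENGTH = 100  # recommended residue limit stated in the text above


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record name: sequence}."""
    records: Dict[str, str] = {}
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is None:
            raise ValueError("FASTA input must start with a '>' header line")
        else:
            records[name] += line  # sequences may span several lines
    return records


def too_long(records: Dict[str, str], limit: int = MAX_LENGTH) -> List[str]:
    """Return the names of records whose sequence exceeds the limit."""
    return [name for name, seq in records.items() if len(seq) > limit]


# Toy input for illustration: one short record and one over the limit.
example_text = ">short\nMKV\nLLA\n>long\n" + "A" * 101 + "\n"
example_records = parse_fasta(example_text)
oversized = too_long(example_records)
```

Records listed in `oversized` should be shortened or left out before the file is submitted.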
If the page does not respond immediately after clicking, please be patient: the server needs some time to run the model. When the run finishes, the software returns a downloadable file called result.txt, which contains the protein name and the predicted melting temperature.
Can't wait to use DARWINS? Please click here.
DARWINS is a tool that uses a thermal stability prediction model to make screening faster and easier. It consists of two main components: a feature extraction module that extracts latent representations of protein sequences, and a Tm (melting temperature) prediction module that predicts the thermal stability of proteins from those representations.
The feature extraction module is implemented mainly with ESM-2, a protein language model trained with masked language modelling (MLM) that captures dependencies between amino acids. Features of this kind can be used to predict the structure of a protein. We use ESM-2 to obtain the relationships between amino acids as features of a specific protein sequence for subsequent stability prediction. With DARWINS, we expect to achieve the following:
1. Training a general model for high-accuracy thermal stability prediction and fine-tuning it for Ago proteins and their mutants.
2. Developing a thermal stability prediction platform that outputs and ranks Tm values for user-submitted sequences.
3. Assisting the thermal stability engineering of KmAgo: mutants are obtained by directed evolution and evaluated with the predictive model to obtain more stable KmAgo proteins.
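To make the masked language modelling objective mentioned for ESM-2 more concrete, the sketch below shows how a training input is typically prepared: a fraction of residues is hidden, and the model is asked to recover them from the remaining context. It is a simplified illustration only. The 15% masking rate and the `<mask>` token string are assumptions here, and real BERT-style training additionally replaces some selected positions with random or unchanged residues, which is omitted.

```python
import random
from typing import Dict, List, Tuple

MASK_TOKEN = "<mask>"  # assumed mask symbol for this illustration


def mask_sequence(
    sequence: str, mask_fraction: float = 0.15, seed: int = 0
) -> Tuple[List[str], Dict[int, str]]:
    """Hide a random subset of residues and return (tokens, targets).

    targets maps each masked position (0-based) to the residue that the
    model would be trained to predict at that position.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    rng = random.Random(seed)  # seeded for reproducibility
    tokens = list(sequence)
    n_mask = max(1, round(len(tokens) * mask_fraction))
    positions = rng.sample(range(len(tokens)), n_mask)
    targets = {pos: tokens[pos] for pos in positions}
    for pos in positions:
        tokens[pos] = MASK_TOKEN
    return tokens, targets


# Toy sequence for illustration only.
toy_sequence = "MKTAYIAKQRQISFVKSHFS"
masked_tokens, masked_targets = mask_sequence(toy_sequence)
```

Because a hidden residue can only be predicted from its neighbours, a model trained this way has to learn which amino acids depend on one another, and those learned relationships are what the feature extraction module reuses.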