Validation on Our Project
In this project, we used protein language models (PLMs) to extract sequence features for thermal stability prediction. The effectiveness of PLM-derived features has been widely validated [1-5]. ESM2 is one example: when its features are passed through a structure module, they yield high-confidence protein structure predictions, as reported in Science in 2023 [1]. After an extensive literature review and discussion, we therefore concluded that using PLMs to assist thermal stability prediction is a feasible approach.
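To make the feature-extraction step more concrete, here is a minimal sketch of turning per-residue PLM embeddings into one fixed-length vector per sequence by masked mean pooling. The pooling strategy is an assumption: the text above does not say how per-residue features were aggregated. `mean_pool` is a hypothetical helper, and small hand-written arrays stand in for real ESM2 outputs.

```python
import numpy as np


def mean_pool(residue_embeddings: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average per-residue embeddings over real (non-padding) positions.

    residue_embeddings: array of shape (batch, length, dim)
    mask:               array of shape (batch, length); 1 = residue, 0 = padding
    Returns an array of shape (batch, dim).
    """
    # Broadcast the mask over the embedding dimension: (batch, length, 1).
    weights = mask.astype(float)[..., None]
    summed = (residue_embeddings * weights).sum(axis=1)
    # Guard against division by zero for an all-padding row.
    counts = np.clip(weights.sum(axis=1), 1.0, None)
    return summed / counts


if __name__ == "__main__":
    # Assumed toy input: one sequence with two real residues and one padding slot.
    embeddings = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
    padding_mask = np.array([[1, 1, 0]])
    print(mean_pool(embeddings, padding_mask))
```

Masking matters because sequences in a batch are padded to a common length; averaging over padding positions would otherwise distort the features of shorter proteins.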
We collected protein sequence data from Kaggle and PremPS. We trained and validated models built on several PLMs, including TempL and ESM2, and found that the ESM2-based model gave the best predictive performance. During training, we used mean squared error (MSE) as the supervision loss; the loss curve is depicted below:
The MSE converged toward the end of training, and no gradient anomalies occurred during the training process. On the test set, the trained model achieved a coefficient of determination (R²) of 0.71 between predicted and actual values, indicating that the model could capture the thermal stability properties inherent in the sequences.
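The two metrics used here, MSE as the training loss and R² on a held-out test set, can be illustrated with a short sketch. It assumes a plain linear regression head trained by gradient descent on fixed-length feature vectors. The project's actual architecture, optimizer, and hyperparameters are not described in this section, so `train_linear_head`, the learning rate, and the synthetic data are all hypothetical.

```python
import numpy as np


def mse(y_true, y_pred) -> float:
    """Mean squared error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r2_score(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def train_linear_head(X, y, lr=0.05, epochs=500):
    """Fit y ~ X @ w + b by gradient descent on MSE.

    Returns (w, b, history), where history holds the MSE at each epoch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    history = []
    for _ in range(epochs):
        err = X @ w + b - y
        history.append(float(np.mean(err ** 2)))
        # Gradients of mean((X @ w + b - y)^2) with respect to w and b.
        w -= lr * (2.0 / n) * (X.T @ err)
        b -= lr * (2.0 / n) * err.sum()
    return w, b, history


if __name__ == "__main__":
    # Synthetic stand-in for (sequence features, stability values).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))
    y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=200)

    # Hold out the last 40 samples as a test set.
    w, b, history = train_linear_head(X[:160], y[:160])
    test_pred = X[160:] @ w + b
    print("first/last training MSE:", history[0], history[-1])
    print("test MSE:", mse(y[160:], test_pred))
    print("test R^2:", r2_score(y[160:], test_pred))
```

An R² of 1 means the predictions match the actual values exactly, while 0 means the model does no better than always predicting the mean of the test values. A score of 0.71 therefore corresponds to explaining roughly 71% of the variance in the test-set values.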
We then assessed the model's performance against experimental values for KmAgo mutants obtained from our collaborating wet lab, and iterated on the model based on the results.
For specific results, please refer to https://2023.igem.wiki/sjtu-software/engineering.
1. Zeming Lin et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123-1130 (2023).
2. Ruidong Wu et al. High-resolution de novo structure prediction from primary sequence. bioRxiv 2022.07.21.500999.
3. Minjie Mou et al. A Transformer-Based Ensemble Framework for the Prediction of Protein–Protein Interaction Sites. Research 6 (2023).
4. Jiawei Xing et al. Origin and functional diversification of PAS domain, a ubiquitous intracellular sensor. Science Advances 9, 35 (2023).
5. Ruth Nussinov et al. AlphaFold, allosteric, and orthosteric drug discovery: Ways forward. Drug Discovery Today 28, 6 (2023).