An Overview of Our Project
Since its outbreak in 2019, COVID-19 has caused a large number of deaths worldwide, in countries at every level of economic development. Its high transmissibility is a stark reminder of the importance of precise and rapid disease detection.
Argonaute proteins are a conserved protein family whose functional domains include the PAZ domain, the MID domain, and the PIWI domain.
The PAZ domain binds the 3' end of the small RNA, interacting specifically with its terminal 2-3 nucleotides, mainly through hydrogen bonds. This interaction helps Argonaute proteins recognize small RNAs such as miRNAs and siRNAs and participate in subsequent RNA interference or RNA regulation processes.
The MID domain lies between the PAZ and PIWI domains. Its conserved amino acid residues recognize and anchor the 5' end of the guide RNA, primarily by forming hydrogen bonds and hydrophobic interactions with it.
The PIWI domain typically consists of approximately 350 amino acids, and its three-dimensional structure shows a characteristic RNase H-like fold with a central binding site for small RNAs. It also contains several conserved residues that play key roles in RNA binding and catalytic activity.
Argonaute proteins are key players in RNA-mediated post-transcriptional gene regulation. They provide anchor sites for small non-coding RNAs, which guide them to degrade target transcripts or inhibit their translation. In eukaryotes, small RNAs bind to Ago proteins to form the RNA-induced silencing complex (RISC), which is involved in regulating a wide range of biological processes. In antiviral immunity, for example, Argonaute proteins recognize and degrade viral RNA, thereby inhibiting viral infection. Studying Argonaute proteins improves our understanding of the molecular mechanisms of RNA interference and post-transcriptional gene silencing, and of their roles in physiological and pathological processes. With Argonaute proteins and RNA interference technology, targeted gene silencing can be used to explore the function of specific genes and to develop potential therapeutic approaches.
Directed evolution is an important tool in molecular biology research. It simulates the evolutionary process at the molecular level in the laboratory: a large number of mutants are created artificially by random mutation and recombination, and selection pressure is then applied to pick out proteins with the desired characteristics. The general process of directed evolution is:
1) Mutation, i.e. the introduction of random mutations into the gene to be modified by error-prone PCR, DNA shuffling, etc.;
2) Expression, i.e. the transfer of the modified gene into a host cell for expression;
3) Screening, i.e. the rapid assessment of properties such as turbidity, colour, fluorescence, or chemiluminescence.
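The mutate-and-screen cycle above can be sketched in silico as follows. This is a minimal illustration, not our pipeline: the function names `mutate`, `evolve`, and `toy_score`, the mutation rate, and the library size are all assumptions made for the example, and `toy_score` is a placeholder standing in for a real screening readout.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(seq, rate, rng):
    """Step 1 (mutation): substitute each residue with probability `rate`.

    A protein-level stand-in for error-prone PCR; real mutagenesis acts on DNA.
    """
    out = []
    for aa in seq:
        if rng.random() < rate:
            out.append(rng.choice([a for a in AMINO_ACIDS if a != aa]))
        else:
            out.append(aa)
    return "".join(out)

def evolve(parent, score, rounds=5, library_size=50, rate=0.05, seed=0):
    """Run several rounds of mutation and screening, keeping the best variant."""
    rng = random.Random(seed)
    best = parent
    for _ in range(rounds):
        library = [mutate(best, rate, rng) for _ in range(library_size)]
        # Step 2 (expression) has no counterpart in this sketch.
        # Step 3 (screening): keep the highest-scoring variant; the current
        # best stays in the pool, so the score never decreases.
        best = max(library + [best], key=score)
    return best

def toy_score(seq):
    """Hypothetical screening readout: fraction of leucine residues."""
    return seq.count("L") / len(seq)

if __name__ == "__main__":
    start = "MKTAYIAKQR"  # arbitrary example sequence, not KmAgo
    end = evolve(start, toy_score)
    print(start, toy_score(start))
    print(end, toy_score(end))
```

Passing the scoring function in as a parameter means a wet-lab readout and a predictive model are interchangeable in the screening step.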
In current applications of directed evolution, the screening step is limited by complex experimental protocols and high time costs, and it is susceptible to environmental factors.
KmAgo (Kurthia massiliensis Argonaute; Lixin Ma, 2021) is a programmable and allosteric Argonaute nuclease derived from the mesophilic bacterium Kurthia massiliensis. Although it efficiently cleaves most types of nucleic acids, its thermal stability is unsatisfactory.
Directed evolution of KmAgo toward higher thermal stability could significantly extend its potential application areas, particularly to industrial processes involving high temperatures. In areas such as biofuel production or industrial enzyme catalysis, a more thermostable KmAgo could remain functional in high-temperature environments. This would allow scientists to further explore and develop the potential of KmAgo in a variety of high-temperature applications.
DARWINS is a tool that uses a thermal stability prediction model as the screening step, making screening faster and easier. It consists of two main components: a feature extraction module, which extracts latent representations of protein sequences, and a Tm (melting temperature) prediction module, which predicts the thermal stability of proteins from those representations. The feature extraction module is built mainly on ESM-2, a protein language model trained with masked language modelling (MLM) to capture dependencies between amino acids. Such features can also be used to predict protein structure. We use ESM-2 to obtain the relationships between amino acids as features of a given protein sequence for downstream stability prediction. With DARWINS, we expect to achieve the following:
1. Training a general model for high-accuracy thermal stability prediction and fine-tuning it for Ago proteins and their mutants.
2. Developing a thermal stability prediction platform that takes user-submitted sequences, outputs their predicted Tm values, and sorts them.
3. Assisting the thermal stability engineering of KmAgo: mutants are generated by directed evolution and evaluated with the predictive model to obtain more stable KmAgo proteins.
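The two-module design and the sorting behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the DARWINS implementation: mean pooling and a linear prediction head are assumed here for simplicity, and `toy_embed` is a hypothetical two-feature placeholder for the per-residue embeddings that ESM-2 would supply. The weights and bias are arbitrary numbers, not trained parameters.

```python
from typing import Callable, List, Sequence, Tuple

Embedding = List[List[float]]  # one feature vector per residue

def mean_pool(per_residue: Embedding) -> List[float]:
    """Average per-residue vectors into one fixed-length sequence vector."""
    n = len(per_residue)
    dim = len(per_residue[0])
    return [sum(vec[i] for vec in per_residue) / n for i in range(dim)]

def predict_tm(seq: str,
               embed: Callable[[str], Embedding],
               weights: Sequence[float],
               bias: float) -> float:
    """Feature extraction followed by a linear Tm head (assumed architecture)."""
    features = mean_pool(embed(seq))
    return sum(w * x for w, x in zip(weights, features)) + bias

def rank_by_tm(seqs: Sequence[str],
               embed: Callable[[str], Embedding],
               weights: Sequence[float],
               bias: float) -> List[Tuple[str, float]]:
    """Return (sequence, predicted Tm) pairs, highest predicted Tm first."""
    scored = [(s, predict_tm(s, embed, weights, bias)) for s in seqs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def toy_embed(seq: str) -> Embedding:
    """Hypothetical placeholder embedding: [is_hydrophobic, is_charged]."""
    return [[1.0 if aa in "AILMFVW" else 0.0,
             1.0 if aa in "DEKR" else 0.0] for aa in seq]

if __name__ == "__main__":
    for seq, tm in rank_by_tm(["AAAA", "DDDD", "AADD"], toy_embed, [10.0, -5.0], 50.0):
        print(seq, tm)
```

Because the embedding function is a parameter, the placeholder can be swapped for a real protein language model without changing the ranking logic.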
On one hand, we have used DARWINS to screen all potential mutants of KmAgo. In the near future, we will continue our collaboration with the wet lab and select several mutants for experimental validation of their thermal stability.
On the other hand, DARWINS is not exclusively developed for KmAgo. We have already created an online platform to support researchers worldwide in predicting protein thermal stability and assisting them with relevant protein modification tasks. Moving forward, we will update and iterate on the backend model, training a more robust and precise model on a larger dataset. Additionally, we will perform regular server maintenance to ensure the model can provide services more reliably.
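An in silico mutant screen of the kind mentioned above for KmAgo might look like the sketch below. It assumes the screen covers single-site substitutions only, which the text does not specify, and the names `single_site_mutants` and `top_mutants` are hypothetical. The `predict` argument stands for any function that maps a sequence to a predicted Tm.

```python
from typing import Callable, Iterator, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_site_mutants(seq: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) for every single substitution, e.g. ('M1A', ...)."""
    for pos, wild_type in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                name = f"{wild_type}{pos + 1}{aa}"  # 1-based position
                yield name, seq[:pos] + aa + seq[pos + 1:]

def top_mutants(seq: str,
                predict: Callable[[str], float],
                k: int = 5) -> List[Tuple[str, float]]:
    """Rank mutants by predicted Tm gain over the wild type; return the top k."""
    baseline = predict(seq)
    gains = [(name, predict(mutant) - baseline)
             for name, mutant in single_site_mutants(seq)]
    gains.sort(key=lambda item: item[1], reverse=True)
    return gains[:k]

if __name__ == "__main__":
    # Placeholder predictor for demonstration only: counts leucine residues.
    print(top_mutants("MKTAY", lambda s: float(s.count("L")), k=3))
```

A sequence of length n has 19n single-site mutants, so an exhaustive screen stays cheap for a predictive model even when it would be impractical in the wet lab.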
[1] Liu Y et al. A programmable omnipotent Argonaute nuclease from mesophilic bacteria Kurthia massiliensis. Nucleic Acids Res. 2021 Feb 22;49(3):1597-1608.
[2] Lingel A et al. Structure and nucleic-acid binding of the Drosophila Argonaute 2 PAZ domain. Nature. 2003 Nov 27;426(6965):465-9.
[3] Lingel A et al. Novel modes of protein-RNA recognition in the RNAi pathway. Curr Opin Struct Biol. 2005 Feb;15(1):107-15.
[4] Carmell MA et al. The Argonaute family: tentacles that reach into RNAi, developmental control, stem cell maintenance, and tumorigenesis. Genes Dev. 2002;16(21):2733-2742.
[5] Vaswani A et al. Attention is all you need. Adv Neural Inf Process Syst. 2017;30.
[6] Tan P et al. TemPL: A Novel Deep Learning Model for Zero-Shot Prediction of Protein Stability and Activity Based on Temperature-Guided Language Modeling. arXiv preprint arXiv:2304.03780. 2023.
[7] Lin Z et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. 2023;379(6637):1123-1130.
[8] Qin Y et al. Emerging Argonaute-based nucleic acid biosensors. Trends Biotechnol. 2022 Aug;40(8):910-914.