Contribution


Overview

From the outset of our participation in iGEM, our team has been committed to applying our expertise to medicine and health, and not only to human health. Inspired by the humanization of mouse-derived antibodies, we aimed to extend antibody drug design to a wider range of species. We collected antibody sequences from hundreds of species through BLAST, annotated their framework regions (FR) and complementarity-determining regions (CDR), and constructed a multi-species FR homology database. Building on this database, we developed an artificial intelligence-based antibody drug design system that integrates antibody sequence generation, sequence scoring, structure scoring, and result visualization in a single system. We believe our work can help address the overuse of antibiotics and the limited treatment options for pets and livestock, draw more attention to healthcare across species, and contribute to the broader fields of medicine and health. Our database and design system can themselves serve future iGEM teams, and we have also summarized several lessons from our experience that future teams can adopt.

Figure 1: Contribution overview

Animal Drug Development and Societal Health

With the continuous development of society, new drugs and medical technologies such as antibiotics and vaccines have emerged, and many human diseases have been overcome. Animals, however, still suffer from many diseases, and drug development for the various animal species progresses slowly. Both companion pets and farmed cattle and sheep generally rely on large doses of antibiotics, given orally or by injection, for treatment. This not only promotes antibiotic resistance in pathogenic strains but also poses a risk to human health when antibiotic residues remain in animal products consumed by people.

pet hospital

Pets, as companions in people's lives, and livestock, as a main source of dietary protein, directly affect the physical and mental health of human populations. Given this, we are eager to contribute to animal drug development and to raise awareness of its importance among more people.

During our research, antibody drugs caught our attention. Their high specificity and relatively few side effects make them a potential breakthrough in animal drug development. Inspired by the humanization of mouse-derived antibodies, we set out to design antibodies for multiple species. We engaged with external experts, sought their advice, and worked to raise awareness of the importance of animal drug development among a wider audience. We also hope that more iGEM teams will turn their attention to animal medicine and health in the future.

pet healing

Data Collection

BLAST is a widely used bioinformatics tool for comparing nucleotide or protein sequences against a database of known sequences. It reports sequence homology, which helps determine the source of a query sequence or its evolutionary relationship to known sequences. We used BLAST to collect antibody sequences from hundreds of different species.
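
As an illustration of how BLAST hits might be turned into a list of candidate sequences, the sketch below parses BLAST+ tabular output (`-outfmt 6`, whose twelve default columns are listed in the code) and keeps hits that pass an identity cutoff and an e-value cutoff. The cutoff values and the function name `filter_hits` are hypothetical and are not the settings our team used.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text, min_identity=50.0, max_evalue=1e-5):
    """Return subject IDs of hits passing the (hypothetical) cutoffs.

    Each subject is kept once, in the order it first appears.
    """
    kept = []
    seen = set()
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        passes = (float(hit["pident"]) >= min_identity
                  and float(hit["evalue"]) <= max_evalue)
        if passes and hit["sseqid"] not in seen:
            seen.add(hit["sseqid"])
            kept.append(hit["sseqid"])
    return kept


example = (
    "query1\tsubjA\t85.0\t110\t16\t0\t1\t110\t1\t110\t1e-60\t210\n"
    "query1\tsubjB\t42.0\t100\t58\t2\t1\t100\t3\t102\t1e-10\t80\n"
    "query1\tsubjC\t70.5\t105\t31\t1\t1\t105\t1\t104\t0.01\t40\n"
)
print(filter_hits(example))  # ['subjA']
```

Here `subjB` fails the identity cutoff and `subjC` fails the e-value cutoff. In practice the cutoffs would need tuning per species, since distantly related species give lower identities.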

Figure 4: BLAST workflow

We also annotated the framework regions (FR) and complementarity-determining regions (CDR) of the collected sequences and constructed a multi-species FR homology database. This database can support other researchers in related fields and make antibody drug development for multiple species more accessible.
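
The FR/CDR annotation step can be sketched as slicing a sequence by region boundaries. This sketch assumes the IMGT unique numbering scheme (FR1 1-26, CDR1 27-38, FR2 39-55, CDR2 56-65, FR3 66-104, CDR3 105-117, FR4 118-128) and assumes each sequence has already been aligned to a fixed 128-position IMGT-gapped string with `.` for gaps; the page does not say which numbering scheme we used, and long CDR3 insertions are not handled here. The names `split_regions` and `build_fr_database` are hypothetical.

```python
# IMGT region boundaries (1-based, inclusive) for a variable domain.
# Assumption: sequences are pre-aligned to 128 IMGT positions, "." = gap.
IMGT_REGIONS = [
    ("FR1", 1, 26), ("CDR1", 27, 38), ("FR2", 39, 55), ("CDR2", 56, 65),
    ("FR3", 66, 104), ("CDR3", 105, 117), ("FR4", 118, 128),
]


def split_regions(gapped_seq):
    """Split a 128-position IMGT-gapped sequence into FR/CDR segments, gaps removed."""
    if len(gapped_seq) != 128:
        raise ValueError("expected a 128-position IMGT-gapped sequence")
    return {
        name: gapped_seq[start - 1:end].replace(".", "")
        for name, start, end in IMGT_REGIONS
    }


def build_fr_database(records):
    """Group framework regions by species: {species: [(FR1, FR2, FR3, FR4), ...]}."""
    database = {}
    for species, gapped_seq in records:
        regions = split_regions(gapped_seq)
        frs = tuple(regions[name] for name in ("FR1", "FR2", "FR3", "FR4"))
        database.setdefault(species, []).append(frs)
    return database


# Toy sequence: one letter per region, with a gapped CDR1.
toy = ("A" * 26 + "C" * 5 + "." * 7 + "D" * 17 + "E" * 10
       + "F" * 39 + "G" * 13 + "H" * 11)
print(split_regions(toy)["CDR1"])  # CCCCC
```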

Sequence Generation Paradigm

We provide a reliable strategy for generating sequence mutations that can be applied to similar protein sequence prediction tasks. Combining the features of the input antibody with the statistical patterns of antibodies from the target species, we use a conservative mutation strategy to determine the most likely amino acids at each position. We then generate a library of candidate antibody sequences by enumerating combinations of these amino acids. Through repeated experiments, we established a reliable threshold range and, using the final scoring rules, confirmed the stability and reliability of our mutation generation method. We hope this will contribute to other protein design work in the iGEM community.
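
The generation paradigm can be sketched as follows, under simplifying assumptions: all sequences are pre-aligned to equal length, the "statistical pattern" is a per-position residue frequency over target-species sequences, and a residue becomes a candidate when its frequency reaches a threshold. The input residue is always kept so the unmodified sequence stays in the library. The threshold of 0.3, the `max_library` cap, and the function names are hypothetical; this is a frequency-threshold stand-in, not our exact conservative mutation rule.

```python
from collections import Counter
from itertools import islice, product


def position_candidates(input_seq, target_seqs, threshold=0.3):
    """Per position: the input residue plus any target-species residue whose
    frequency at that position is >= threshold (hypothetical rule)."""
    candidates = []
    for i, residue in enumerate(input_seq):
        counts = Counter(seq[i] for seq in target_seqs)
        total = sum(counts.values())
        frequent = sorted(aa for aa, c in counts.items() if c / total >= threshold)
        # Input residue first, then frequent target residues without duplicates.
        candidates.append([residue] + [aa for aa in frequent if aa != residue])
    return candidates


def generate_library(input_seq, target_seqs, threshold=0.3, max_library=1000):
    """Enumerate combinations of per-position candidates, capped at max_library."""
    candidates = position_candidates(input_seq, target_seqs, threshold)
    combos = product(*candidates)
    return ["".join(c) for c in islice(combos, max_library)]


target = ["QVKL", "QVQL", "EVQL", "QVQL"]
library = generate_library("EIKL", target)
print(library)
# ['EIKL', 'EIQL', 'EVKL', 'EVQL', 'QIKL', 'QIQL', 'QVKL', 'QVQL']
```

Because the library grows multiplicatively with the number of mutable positions, the threshold effectively controls library size; the cap is a crude safeguard, and the resulting candidates would still need to be ranked by scoring.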

Sequence Scoring Approach

We divided the scoring into two parts: sequence scoring and structure scoring. Sequence scoring, currently the mainstream method for evaluating humanized mouse antibodies, serves as the basic ranking metric that guides our model training. However, because antibody structures may differ between species, we added a structure score computed after predicting structures with IgFold. This structure score measures whether the folded spatial structure retains its original features, which indicates its stability. Scoring both sequence and structure makes the generated results more credible and interpretable, avoiding the high trial-and-error costs and long development cycles associated with black-box results. Our system helps drug developers quickly identify potentially effective antibody sequences, which can accelerate the development of antibody-based animal drugs, reduce the impact of disease on animals, and help create a better world for both humans and animals.
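
The two kinds of score can be illustrated with simple stand-ins. For the sequence score, the sketch uses the highest fractional identity between a candidate and pre-aligned target-species references. For the structure score, it uses C-alpha RMSD after optimal (Kabsch) superposition between the predicted structures of the original and the mutated antibody; coordinates could be read from predicted PDB files, but the IgFold call itself is not shown. Both metrics and the function names are assumptions for illustration, not the exact scores in our system.

```python
import numpy as np


def sequence_identity_score(candidate, references):
    """Highest fraction of identical positions between the candidate and any
    equal-length (pre-aligned) reference sequence of the target species."""
    best = 0.0
    for ref in references:
        matches = sum(a == b for a, b in zip(candidate, ref))
        best = max(best, matches / len(candidate))
    return best


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition."""
    P = P - P.mean(axis=0)  # center both point sets
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P @ R.T  # rotate P onto Q
    return float(np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1))))


print(sequence_identity_score("EVQL", ["QVQL", "EIKL"]))  # 0.75

coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0],
                   [1.5, 1.5, 0.0], [0.0, 1.5, 1.0]])
theta = np.pi / 6
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
moved = coords @ rot.T + np.array([2.0, -1.0, 0.5])
print(round(kabsch_rmsd(coords, moved), 6))  # 0.0
```

A rigidly rotated and translated copy gives an RMSD of essentially zero, so a low RMSD between original and mutant predictions suggests the fold is preserved.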

Model Contribution

One-hot encoding is currently a mainstream and simple encoding method in bioinformatics. It makes the data highly separable, but it carries no information about the physicochemical properties of amino acids or about the substitution patterns observed in nature. To capture these characteristics of the data, we investigated mainstream amino acid substitution matrices, including the BLOSUM and PAM matrices. We hope that more iGEM teams will consider these encodings as candidates to improve their projects.
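
The difference between the two encodings can be shown in a few lines. The sketch hard-codes the standard BLOSUM62 matrix and encodes each residue either as a one-hot vector or as its BLOSUM62 row. Under one-hot encoding, isoleucine and valine are as different as any two residues; under BLOSUM62 their substitution score is positive (3) while isoleucine and aspartate score negatively (-3). Only BLOSUM62 is shown here; a PAM matrix would be used the same way. The function names are hypothetical.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Standard BLOSUM62 scores; rows and columns follow AMINO_ACIDS order.
BLOSUM62 = [
    #A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
    [ 4, -1, -2, -2,  0, -1, -1,  0, -2, -1, -1, -1, -1, -2, -1,  1,  0, -3, -2,  0],  # A
    [-1,  5,  0, -2, -3,  1,  0, -2,  0, -3, -2,  2, -1, -3, -2, -1, -1, -3, -2, -3],  # R
    [-2,  0,  6,  1, -3,  0,  0,  0,  1, -3, -3,  0, -2, -3, -2,  1,  0, -4, -2, -3],  # N
    [-2, -2,  1,  6, -3,  0,  2, -1, -1, -3, -4, -1, -3, -3, -1,  0, -1, -4, -3, -3],  # D
    [ 0, -3, -3, -3,  9, -3, -4, -3, -3, -1, -1, -3, -1, -2, -3, -1, -1, -2, -2, -1],  # C
    [-1,  1,  0,  0, -3,  5,  2, -2,  0, -3, -2,  1,  0, -3, -1,  0, -1, -2, -1, -2],  # Q
    [-1,  0,  0,  2, -4,  2,  5, -2,  0, -3, -3,  1, -2, -3, -1,  0, -1, -3, -2, -2],  # E
    [ 0, -2,  0, -1, -3, -2, -2,  6, -2, -4, -4, -2, -3, -3, -2,  0, -2, -2, -3, -3],  # G
    [-2,  0,  1, -1, -3,  0,  0, -2,  8, -3, -3, -1, -2, -1, -2, -1, -2, -2,  2, -3],  # H
    [-1, -3, -3, -3, -1, -3, -3, -4, -3,  4,  2, -3,  1,  0, -3, -2, -1, -3, -1,  3],  # I
    [-1, -2, -3, -4, -1, -2, -3, -4, -3,  2,  4, -2,  2,  0, -3, -2, -1, -2, -1,  1],  # L
    [-1,  2,  0, -1, -3,  1,  1, -2, -1, -3, -2,  5, -1, -3, -1,  0, -1, -3, -2, -2],  # K
    [-1, -1, -2, -3, -1,  0, -2, -3, -2,  1,  2, -1,  5,  0, -2, -1, -1, -1, -1,  1],  # M
    [-2, -3, -3, -3, -2, -3, -3, -3, -1,  0,  0, -3,  0,  6, -4, -2, -2,  1,  3, -1],  # F
    [-1, -2, -2, -1, -3, -1, -1, -2, -2, -3, -3, -1, -2, -4,  7, -1, -1, -4, -3, -2],  # P
    [ 1, -1,  1,  0, -1,  0,  0,  0, -1, -2, -2,  0, -1, -2, -1,  4,  1, -3, -2, -2],  # S
    [ 0, -1,  0, -1, -1, -1, -1, -2, -2, -1, -1, -1, -1, -2, -1,  1,  5, -2, -2,  0],  # T
    [-3, -3, -4, -4, -2, -2, -3, -2, -2, -3, -2, -3, -1,  1, -4, -3, -2, 11,  2, -3],  # W
    [-2, -2, -2, -3, -2, -1, -2, -3,  2, -1, -1, -2, -1,  3, -3, -2, -2,  2,  7, -1],  # Y
    [ 0, -3, -3, -3, -1, -2, -2, -3, -3,  3,  1, -2,  1, -1, -2, -2,  0, -3, -1,  4],  # V
]
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Each residue becomes a 20-dim vector with a single 1 and no similarity info."""
    return [[1 if aa == col else 0 for col in AMINO_ACIDS] for aa in seq]


def blosum_encode(seq):
    """Each residue becomes its BLOSUM62 row, so similar residues get similar vectors."""
    return [list(BLOSUM62[INDEX[aa]]) for aa in seq]


print(one_hot("I")[0].index(1))                   # 9
print(blosum_encode("I")[0][INDEX["V"]])          # 3
print(blosum_encode("I")[0][INDEX["D"]])          # -3
```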

Figure 5. BLOSUM62 matrix
Figure 6. PAM matrix
Figure 7. Relationship between the matrices

Figure 8. Encoding example

Moreover, we adapted the Deep One-Class (DOC) model, originally developed for image anomaly detection, to multi-genus scoring. By combining the one-class concept with the strong feature extraction capabilities of deep learning, we achieved reasonable species scoring of antibody data and addressed the problem of imbalanced sample data. One-class models have so far received relatively little attention. Our project provides an example of applying one-class models, especially the Deep One-Class model, which we hope will make similar problems easier for future iGEM teams to solve.
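
To make the one-class idea concrete, the plain-Python sketch below shows two pieces a DOC-style scorer could use. The first is a compactness loss, which, as we understand the DOC formulation, compares each feature vector in a batch with the mean of the other vectors in that batch and averages the squared differences over samples and feature dimensions; in DOC it is combined with a descriptiveness loss while training the feature extractor, and that training loop and network are not shown. The second is a per-species score given by the negative squared distance from a new embedding to that species' centroid, which is a simple stand-in for the one-class decision step. The embeddings, species labels, and function names are hypothetical.

```python
def compactness_loss(batch):
    """Average squared difference between each feature vector and the mean of
    the other vectors in the batch, over n samples and k feature dimensions."""
    n, k = len(batch), len(batch[0])
    totals = [sum(vec[d] for vec in batch) for d in range(k)]
    loss = 0.0
    for vec in batch:
        for d in range(k):
            others_mean = (totals[d] - vec[d]) / (n - 1)
            loss += (vec[d] - others_mean) ** 2
    return loss / (n * k)


def centroid(vectors):
    k = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(k)]


def species_scores(embedding, reference_embeddings):
    """Negative squared distance to each species centroid (higher = more typical)."""
    scores = {}
    for species, vectors in reference_embeddings.items():
        c = centroid(vectors)
        scores[species] = -sum((x - m) ** 2 for x, m in zip(embedding, c))
    return scores


# Hypothetical 2-D embeddings produced by a trained feature extractor.
references = {
    "dog": [[1.0, 0.0], [1.2, 0.1], [0.9, -0.1]],
    "cattle": [[-1.0, 0.5], [-1.1, 0.4], [-0.9, 0.6]],
}
scores = species_scores([1.0, 0.1], references)
print(max(scores, key=scores.get))  # dog
```

Because each species is modeled only from its own samples, a species with few sequences does not have to compete against a larger class during training, which is the motivation for a one-class setup under imbalanced data.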

Figure 9. Training frameworks of the proposed DOC method