Notebook


2022

September 2022:

Yuman Li, Jingfei Hou, Yuan Chiang, Ziqian Wang, and Zhan Shi took a course called "Interdisciplinary Research and Practice in Dept. of Automation". The theme of the course was synthetic biology. Under the guidance of Professor Gu Jin and Professor Zhen Xie, the course enriched our knowledge of biology. We conducted extensive literature research and explored the intersection of synthetic biology and informatics.

November 2022:

Yuman Li, Jingfei Hou, and Yuan Chiang developed a keen interest in protein structure prediction. During a literature sharing session in class, we chose to discuss the paper "Highly accurate protein structure prediction with AlphaFold," focusing on the algorithmic details of AlphaFold: the MSA and pair representations, invariant point attention (IPA) and the residue gas, self-distillation training, and recycling.

March to June 2023

March 2023:

Yuman Li, Jingfei Hou, Yuan Chiang, Ziqian Wang, and Zhan Shi became interested in protein optimization and protein design. We decided to team up and participate in the iGEM competition.

Focusing on protein optimization, Yuman Li, Jingfei Hou, and Yuan Chiang found that enzyme optimization offered significant room for exploration. They proposed an enzyme optimization pipeline: predict mutation sites based on MSA and HotSpot Wizard, construct a sequence space of mutations, and predict the activity and stability of the candidate sequences. Ziqian Wang and Zhan Shi tried various algorithms, using BLOSUM encoding to construct a mutation sequence space for a specific enzyme and deep models to fit and predict activity and stability.

Figure 1: Preliminary pipeline of mutation sequence space and scoring
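The sequence-space step of this pipeline can be sketched roughly as follows. This is a minimal illustration, not our actual code: the function names `enumerate_mutants` and `encode`, the wild-type string, the chosen sites, and the three-letter `TOY_MATRIX` are all assumptions made for the example. A real run would use a full 20x20 BLOSUM matrix loaded from a trusted source and sites proposed by the MSA and HotSpot Wizard analysis.

```python
from itertools import product

# Toy substitution scores over a 3-letter alphabet (illustrative excerpt only;
# a real run would load a full 20x20 BLOSUM matrix from a trusted source).
TOY_MATRIX = {
    "A": {"A": 4, "G": 0, "S": 1},
    "G": {"A": 0, "G": 6, "S": 0},
    "S": {"A": 1, "G": 0, "S": 4},
}


def enumerate_mutants(wild_type, sites):
    """Yield every sequence obtained by combining the allowed residues
    at the chosen 0-based positions (all other positions stay wild type)."""
    positions = sorted(sites)
    for combo in product(*(sites[p] for p in positions)):
        seq = list(wild_type)
        for pos, residue in zip(positions, combo):
            seq[pos] = residue
        yield "".join(seq)


def encode(seq, matrix):
    """Encode a sequence by concatenating each residue's row of the matrix."""
    alphabet = sorted(matrix)
    return [matrix[residue][other] for residue in seq for other in alphabet]


wild_type = "AGSA"                 # hypothetical toy sequence
sites = {1: "GS", 3: "AG"}         # hypothetical sites; wild-type residue included
space = list(enumerate_mutants(wild_type, sites))
features = [encode(seq, TOY_MATRIX) for seq in space]
print(space)
```

Each encoded vector has one matrix row per residue, so it can be fed to whatever regression model is used to fit activity and stability. The size of the space grows as the product of the number of allowed residues per site, which is why the site-prediction step matters.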

April 6, 2023:

We visited QiTan Tech and had an in-depth discussion with the company. We toured its bioinstrumentation and gained a deeper understanding of the structure, properties, and applications of nanopore proteins. During the exchange, we realized that enzyme optimization is challenging. We also discussed progress in de novo protein design with the company and learned that only a few research groups worldwide can accomplish it.

April 22, 2023:

Focusing on protein optimization, Yuman Li, Jingfei Hou, and Yuan Chiang found that enzyme optimization offered significant room for exploration. They proposed an enzyme optimization pipeline: predict mutation sites based on MSA and HotSpot Wizard, construct a sequence space of mutations, and predict the activity and stability of the candidate sequences. Ziqian Wang and Zhan Shi tried various algorithms, using BLOSUM encoding to construct a mutation sequence space for a specific enzyme and deep models to fit and predict activity and stability.

May 2023:

Inspired by Professor Zhen Xie, we decided to focus on the theme of "humanization of antibodies" and began designing algorithms for antibodies, building on our previous work.

May 28, 2023:

We visited the Zhongguancun Life Science Park, which focuses on cutting-edge fields such as cell and gene therapy, innovative pharmaceuticals, AI-driven drug development, and AI-assisted diagnostics. We visited several companies in the park and held discussions with Beijing Syngentech and BeiCell Biotechnology in particular. Through this exchange, we gained a better understanding of antibody drugs.

June 2023:

We tried several algorithmic approaches to antibody drug design, including studying machine learning methods for antibody humanization such as Hu-mAb and BioPhi. We proposed two technical routes: "CDR transplantation" and "fully human antibody design." After multiple rounds of discussion, we concluded that the minimal response mutation method still has significant room for improvement. We also learned about the antibody structure prediction tool IgFold and deployed the relevant software environments on supercomputers.

Figure 2: Minimal response mutation
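The "CDR transplantation" route (often called CDR grafting) can be illustrated with a short sketch: the framework regions come from a human acceptor antibody and the CDRs come from the non-human donor. The function name `graft_cdrs`, the region dictionaries, and the short strings below are placeholders made up for this example, not real antibody sequences or our project code. In practice the regions would first be delimited with a numbering scheme.

```python
REGION_ORDER = ("FR1", "CDR1", "FR2", "CDR2", "FR3", "CDR3", "FR4")


def graft_cdrs(donor, acceptor):
    """Build a grafted variable domain: CDRs from the donor antibody,
    framework regions (FR1-FR4) from the acceptor antibody."""
    parts = []
    for name in REGION_ORDER:
        source = donor if name.startswith("CDR") else acceptor
        parts.append(source[name])
    return "".join(parts)


# Placeholder regions (hypothetical, far shorter than real ones).
donor = {"FR1": "QVKL", "CDR1": "GYT", "FR2": "WVKQ", "CDR2": "INP",
         "FR3": "KATL", "CDR3": "ARD", "FR4": "WGQG"}
acceptor = {"FR1": "EVQL", "CDR1": "GFT", "FR2": "WVRQ", "CDR2": "ISG",
            "FR3": "RFTI", "CDR3": "AKD", "FR4": "WGQG"}

grafted = graft_cdrs(donor, acceptor)
print(grafted)
```

Plain grafting like this changes every framework position at once, so follow-up refinement is normally needed to check which framework residues the CDRs depend on.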

July to October 2023

July 8-10, 2023:

Yuman Li participated in the CCiC (10th Conference of China iGEMer Community) in Hainan, China, and gained a lot of inspiration through exchanges with other iGEM teams.

July 18, 2023:

We visited VJTbio, a company whose research focuses on animal antibodies, for the first time. We identified the users of our project and established a collaboration. In our subsequent design work, we kept the practical needs of the industry in mind. Our iGEM project expanded from the humanization of antibodies to the species-specification of antibodies, aiming to provide suitable antibody drugs for a variety of species.

July 25, 2023:

The final team roster was confirmed: Yuman Li, Jingfei Hou, Yuan Chiang, Ziqian Wang, Zhan Shi, and Xiaomeng Chen.

August 2, 2023:

We collected tens of thousands of antibody sequences, filtered FR1-FR4 to their typical lengths, and used classification algorithms for species prediction, achieving an accuracy of approximately 95%. This preliminary validation demonstrates that antibodies can be classified by species from their FR region sequences alone, which indicates that the FR regions of antibodies from different species are sufficiently specific. Our concept of giving more species access to appropriate antibody drugs is therefore viable.

Figure 3: Confusion matrix for seventeen-class classification
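The filtering and evaluation steps around the classifier can be sketched as follows. This is an illustration under stated assumptions: "typical length" is interpreted here as the most common length of each FR region in the dataset, the record layout and function names are hypothetical, and the short strings and label lists are made-up placeholders rather than our data or results. The classifier itself is left out, since any model that outputs a species label fits between the two steps.

```python
from collections import Counter

FR_NAMES = ("FR1", "FR2", "FR3", "FR4")


def typical_lengths(records):
    """Most common length of each framework region across the dataset."""
    return {fr: Counter(len(r[fr]) for r in records).most_common(1)[0][0]
            for fr in FR_NAMES}


def filter_typical(records):
    """Keep only records whose FR1-FR4 all have the typical length."""
    lengths = typical_lengths(records)
    return [r for r in records
            if all(len(r[fr]) == lengths[fr] for fr in FR_NAMES)]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true species, columns are predicted species."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(map(sum, matrix))
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Placeholder records (hypothetical, far shorter than real FR regions).
records = [
    {"species": "human", "FR1": "EVQL", "FR2": "WVR", "FR3": "RFTIS", "FR4": "WG"},
    {"species": "mouse", "FR1": "QVKL", "FR2": "WVK", "FR3": "KATLT", "FR4": "WG"},
    {"species": "mouse", "FR1": "QV", "FR2": "WVK", "FR3": "KATLT", "FR4": "WG"},
]
kept = filter_typical(records)

# Placeholder labels standing in for a classifier's output.
y_true = ["human", "human", "mouse", "mouse"]
y_pred = ["human", "mouse", "mouse", "mouse"]
matrix = confusion_matrix(y_true, y_pred, labels=["human", "mouse"])
print(len(kept), matrix, accuracy(matrix))
```

Fixing the FR lengths before training gives every sequence the same dimensionality, which simplifies position-wise encoding.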

August 9, 2023:

We participated in the Synbio Plus AI Conference and shared some insights regarding generalization capabilities, dataset quality, interpretability, and security.

August 10, 2023:

We finished building the Deep One Class model, which extracts a feature vector for each species and assigns species-specific sequence scores based on similarity measurements.

Figure 4: Species-specific sequence scores based on the Deep One Class model
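The scoring idea can be sketched in a few lines. This is a simplified stand-in, not the Deep One Class model itself: the learned encoder is replaced by a fixed amino-acid composition vector, the per-species feature vector is assumed to be the mean embedding of that species' sequences, and cosine similarity is assumed as the similarity measure. All function names and the short sequences are hypothetical.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def embed(seq):
    """Stand-in for the learned encoder: amino-acid composition vector."""
    n = len(seq) or 1
    return [seq.count(residue) / n for residue in ALPHABET]


def center(seqs):
    """Feature vector of one species: mean embedding of its sequences."""
    vectors = [embed(s) for s in seqs]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def species_scores(seq, centers):
    """Score a sequence against every species center."""
    vector = embed(seq)
    return {species: cosine(vector, c) for species, c in centers.items()}


# Placeholder training sequences (hypothetical toy strings).
centers = {
    "human": center(["EVQLVESGG", "EVQLLESGG"]),
    "mouse": center(["QVQLKESGP", "QVQLKQSGP"]),
}
scores = species_scores("EVQLVESGG", centers)
print(scores)
```

Because each species has its own center, a candidate sequence receives one score per species, and the target species' score can be tracked while the sequence is being modified.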

August 22, 2023:

We completed the structural scoring part of the project, which is based on IgFold.

Figure 5: Structural scoring based on IgFold

September 2023:

We began packaging the software, summarizing the project's advantages, and reflecting on areas for improvement to make a greater impact on the world.

October 2023:

We organized the data and code from the entire project and improved the wiki.