
Software

Overview

Introduction

Over the past few years, the applications of bioinformatics and information technology have expanded rapidly, especially in immunology. Immunological bioinformatics provides new methods for analyzing large datasets and building on them with machine learning and deep learning algorithms. This has enabled the characterization of different immune cells and the prediction of their target epitopes. One of the most important applications of this field is T-cell repertoire analysis, which identifies the gene sequences of the different T-cell receptors (TCRs) in an extracted population of T cells. Analyzing the T-cell repertoire can also reveal the history of antigens to which an individual has been exposed, whether these were foreign antigens or self-antigens targeted in an autoimmune reaction. Each condition can involve many different TCRs that recognize different epitopes; however, the presentation of these epitopes varies between individuals according to each person’s unique genetic background.

That’s why we’re looking to use the potential of the T-cell repertoire to personalize the design of our Syn-Notch receptor and make it compatible with each individual’s distinctive genetic variations. This approach enables the user to identify the top-ranking epitope for their condition. Our Syn-Notch receptor is then designed to target this specific epitope, so that targeting is as precise and specific as possible and overall efficacy improves.

The theme of this year’s project is developing a SUPER approach. And since every superhero needs an arsenal of cool gadgets, we’ve decided to give our SUPER-Cell design one of its own.

Introducing our ‘SUPER-Gadget’, this year’s software tool: using the concept of the T-cell repertoire, it personalizes the features of our Syn-Notch receptor so that it targets each patient’s epitopes with the highest possible specificity. Personalized medicine has become increasingly prominent because treatment response depends partly on each person’s genetic makeup, which is why some patients respond to a given treatment while others don’t.

The goal of our SUPER-Gadget is to identify which Rheumatoid Arthritis (RA) epitopes correspond to the user’s input Complementarity-Determining Region 3-β (CDR3-β) sequences of TCRs. This matters because the epitopes expressed differ between individuals, so in each patient some RA-associated CDR3-β sequences become enriched at the expense of others.

Our Pipeline for this year’s software tool:

  • Classifying the input CDR3-β protein sequence as RA-associated or non-RA-associated.

  • Predicting the binding between the CDR3-β sequence and each epitope in a list of known RA epitopes.

  • Ranking the resulting CDR3-β/epitope complexes by their binding score.

  • Isolating the sequence of the highest-ranking epitope and incorporating it into the external domain of the Syn-Notch receptor.
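The four steps above can be sketched as a small driver function. This is a minimal sketch, not the team’s notebook code: `run_pipeline`, `classify` and `score_binding` are hypothetical names, the two callables stand in for the trained models described in the sections below, and stopping early for Non-RA sequences is our assumption.

```python
from typing import Callable, List, Tuple


def run_pipeline(
    cdr3b: str,
    epitopes: List[str],
    classify: Callable[[str], str],               # hypothetical: CDR3-beta class model
    score_binding: Callable[[str, str], float],   # hypothetical: binding model, higher = stronger
) -> Tuple[str, List[Tuple[str, float]]]:
    """Classify a CDR3-beta sequence, then rank known RA epitopes by predicted binding."""
    # Step 1: RA vs. Non-RA classification.
    label = classify(cdr3b)
    # Assumption: only RA-associated sequences are carried on to epitope ranking.
    if label != "RA":
        return label, []
    # Step 2: score every CDR3-beta/epitope complex.
    scored = [(epitope, score_binding(cdr3b, epitope)) for epitope in epitopes]
    # Step 3: rank complexes from highest to lowest score.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return label, ranked
```

Step 4 would then take `ranked[0][0]`, the top-ranked epitope, as the sequence to place in the Syn-Notch external domain. Passing the models in as callables keeps the driver independent of which binding model the user picks.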

Figure 1. The pipeline for our SUPER-Gadget

In addition, we’re introducing a Hand X-ray Classification Model that can be used to assess the impact of RA on the hand joints, which are the joints most often targeted by the disease. The model is most valuable in clinical practice, where it could help physicians identify early joint changes and take the necessary measures to prevent their progression.

Combined with our SUPER-Gadget, this model can provide a follow-up method for monitoring the effectiveness of our therapeutic approach.

All the notebooks for our software tools, including our pipeline and instructions on how to use it, are available in our team’s GitLab repository: https://gitlab.igem.org/2023/software-tools/afcm-egypt

1) CDR3-β prediction Model:

To implement our pipeline, we first had to develop a model that predicts whether the user’s input CDR3-β sequence is associated with RA or with another condition. We began by collecting CDR3-β datasets for training the model to distinguish CDR3-β sequences from different conditions. The datasets were derived from established public databases, including the Immune Epitope Database (IEDB), the Pathology-associated TCR database (McPAS-TCR), and the Rheumatoid Arthritis Synovial TCR Repertoire Database. We then analyzed and preprocessed these datasets for training.

We chose to contrast RA-associated CDR3-β sequences with sequences associated with other conditions, including other autoimmune diseases and infections.

Next, we built the model itself, which uses deep neural networks to process and analyze the training data. We used 3 layers of one-dimensional Convolutional Neural Networks (1D-CNNs) so that the model learns the sequence patterns that distinguish RA-associated CDR3-β sequences from those associated with other conditions ("Non-RA"). We used the Rectified Linear Unit (ReLU) activation function, which is computationally cheap and tends to speed up training.
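To make the 1D-CNN step concrete, here is what a single convolutional layer with ReLU computes on a one-hot encoded CDR3-β sequence. This is a minimal pure-Python sketch for illustration only: the team’s notebooks presumably use a deep-learning framework, and the one-hot encoding, filter weights, and function names (`one_hot`, `conv1d_relu`) are our assumptions. The real model stacks 3 such layers with learned weights.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one channel each
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> List[List[float]]:
    """Encode a protein sequence as a (length x 20) one-hot matrix (KeyError on non-standard residues)."""
    rows = []
    for aa in seq:
        row = [0.0] * len(AMINO_ACIDS)
        row[INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def relu(x: float) -> float:
    """Rectified Linear Unit: keeps positive values, zeroes negative ones."""
    return max(0.0, x)


def conv1d_relu(x: List[List[float]], kernels: List[List[List[float]]], biases: List[float]) -> List[List[float]]:
    """'Valid' 1D convolution along the sequence, followed by ReLU.

    x:       length x channels
    kernels: filters x width x channels
    returns: (length - width + 1) x filters
    """
    width = len(kernels[0])
    out = []
    for start in range(len(x) - width + 1):
        window = x[start:start + width]
        row = []
        for kernel, bias in zip(kernels, biases):
            total = bias
            for k in range(width):
                for c in range(len(window[k])):
                    total += kernel[k][c] * window[k][c]
            row.append(relu(total))
        out.append(row)
    return out


# Usage: one hypothetical width-2 filter that fires only on the motif "AC".
ac_filter = [[[1.0 if c == 0 else 0.0 for c in range(20)],
              [1.0 if c == 1 else 0.0 for c in range(20)]]]
print(conv1d_relu(one_hot("ACA"), ac_filter, [-1.0]))  # [[1.0], [0.0]]
```

Each filter acts as a learned motif detector sliding along the sequence; ReLU passes only positive matches, which is why stacked layers can combine short motifs into longer patterns.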

The model reached a training accuracy of nearly 95%. To validate this result, we evaluated it on a test dataset that the model had not seen during training, where it predicted the classes of the sequences with an accuracy of 89%.
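The test accuracy above and the confusion matrix in Figure 2 are computed from the same four counts. A minimal sketch, assuming the classes are encoded as the strings "RA" and "Non-RA" (our assumption); the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "RA") -> Dict[str, int]:
    """Count true/false positives and negatives, treating `positive` as the positive class."""
    counts: Counter = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "FN", "TN")}


def accuracy(cm: Dict[str, int]) -> float:
    """Fraction of correct predictions: (TP + TN) / total."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())
```

Reporting the matrix alongside accuracy shows whether errors are concentrated in one class, which a single accuracy figure hides when the classes are imbalanced.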

Figure 2. The structure of our CDR3-β class prediction model. The curves show the model’s accuracy and loss during training and validation. The confusion matrix shows how many sequences of each class the model predicted correctly and incorrectly.
Figure 3. Illustrates the results of our model when tested against a different dataset.

2) CDR3-Epitope Binding Prediction Model:

After the class of the input CDR3-β sequence is determined, the sequence is paired with each epitope in a dataset of known RA epitopes. Our CDR3-β-Epitope Binding Prediction Model then scores the resulting complexes and ranks them by predicted binding, so that the epitope in the highest-ranking complex can be isolated.

The datasets used to train this model were derived from VDJdb and McPAS-TCR. First, the data was processed to pair each epitope with its corresponding CDR3-β sequences. We then constructed neural networks similar to those of the previous model, but with a sigmoid output so that each prediction is a probability between 0 and 1. The predictions can then be ranked in descending order, and the epitope in the highest-ranking complex is passed on to the next step of the pipeline.
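The sigmoid output and the descending ranking can be sketched as follows. This is a minimal illustration, assuming the network produces one raw score (logit) per CDR3-β/epitope complex; the epitope names and `rank_complexes` are hypothetical placeholders.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping any real number into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for very negative z
    return e / (1.0 + e)


def rank_complexes(logits: Dict[str, float]) -> List[Tuple[str, float]]:
    """Convert each complex's raw network output to a probability and sort from highest to lowest."""
    probs = {epitope: sigmoid(z) for epitope, z in logits.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_complexes({"EPI_A": -1.0, "EPI_B": 2.0, "EPI_C": 0.0})
print([name for name, _ in ranked])  # ['EPI_B', 'EPI_C', 'EPI_A']
```

Because the sigmoid is monotonic, sorting by probability gives the same order as sorting by the raw scores; the probability is still useful for reporting and for applying a decision threshold.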

The model reached a training accuracy of up to 95% and, when tested against a different dataset, an accuracy of 91%.

Figure 4. The structure of the CDR3-Epitope prediction model. The learning curves show the model’s accuracy and loss during training and validation, which reached accuracies of 95% and 90%, respectively. The confusion matrix shows the model’s correct and incorrect predictions for each class.
Figure 5. The testing accuracy of our model against a different dataset, used to evaluate its performance.

3) Using our team’s protein binding affinity model:

To build on the software our team has developed throughout our iGEM journey, we reused one of the models from last year’s project. This model predicts the binding affinity between two protein sequences from binding free energy (dG) calculations. It gives a second estimate of the binding between the CDR3-β and each epitope in the list, which can help improve prediction accuracy. Users can choose either of the two binding prediction models according to their preference.

Figure 6. The training and validation results of the binding affinity model.

The prediction results of the model are ranked according to the dG score of each complex, where the complex showing the most negative score is considered the one with the highest binding affinity.
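Ranking by dG is an ascending sort. To make "more negative means tighter binding" concrete, this sketch also converts dG to a dissociation constant with the standard relation ΔG = RT ln Kd. We assume here that the model reports dG in kcal/mol at 25 °C; the units of last year’s model are not stated on this page, so treat the conversion as illustrative. The epitope names and scores are placeholders.

```python
import math
from typing import Dict, List, Tuple

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def kd_from_dg(dg_kcal: float, temp_k: float = 298.15) -> float:
    """Dissociation constant (mol/L, 1 M standard state) from dG = RT ln(Kd)."""
    return math.exp(dg_kcal / (R_KCAL * temp_k))


def rank_by_dg(dg_scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort complexes so the most negative dG (highest predicted affinity) comes first."""
    return sorted(dg_scores.items(), key=lambda kv: kv[1])


for epitope, dg in rank_by_dg({"EPI_A": -6.2, "EPI_B": -9.1, "EPI_C": -7.5}):
    print(f"{epitope}: dG = {dg} kcal/mol, Kd ~ {kd_from_dg(dg):.1e} M")
```

At 25 °C, every additional 1.36 kcal/mol (RT ln 10) of negative dG corresponds to a roughly tenfold lower Kd, which is why small differences in dG can separate epitopes clearly.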

Figure 7. Shows the binding prediction results of the binding affinity model, ranked according to their dG score.

4) Selecting the highest ranking epitope and visualizing the final structure of the Syn-Notch:

After the CDR3-Epitope complexes are ranked, the highest-ranking epitope is isolated and incorporated into the sequence of our Syn-Notch receptor, specifically into its external domain. This allows the receptor to bind the auto-reactive cells that recognize the epitope, enabling our SUPER-Cell to specifically recognize these cells and deliver its therapeutic cargo.

Finally, the 3D structure of the resulting receptor can be predicted using AlphaFold2 and visualized using PyMOL.
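Incorporating the epitope and preparing the receptor for structure prediction can be sketched as string assembly followed by FASTA output, the input format AlphaFold2 accepts. The domain order, the part names in `parts`, and the function names are hypothetical placeholders for illustration, not the team’s actual Syn-Notch construct.

```python
from typing import Dict

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def assemble_receptor(epitope: str, parts: Dict[str, str]) -> str:
    """Place the epitope in the external region of a Syn-Notch-like chain.

    Hypothetical layout: signal peptide + epitope + linker + Notch core + intracellular domain.
    """
    if not epitope or set(epitope) - STANDARD_AA:
        raise ValueError(f"epitope must use the 20 standard amino acids: {epitope!r}")
    order = ["signal_peptide", "epitope", "linker", "notch_core", "intracellular"]
    segments = dict(parts, epitope=epitope)  # copy of parts with the epitope added
    return "".join(segments[name] for name in order)


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format a single sequence as FASTA text, wrapping lines at `width` residues."""
    lines = [f">{name}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"
```

The FASTA text can be saved to a file and given to an AlphaFold2 run; the predicted model can then be opened in PyMOL, as in Figure 8. Validating the epitope first avoids wasting a structure-prediction run on a malformed sequence.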

Figure 8. The 3D structure of the generated Syn-Notch Receptor, visualized using PyMOL.

X-Ray Classification Model:

In addition to this year’s software tool, we’ve developed a classification tool that analyzes a user’s input hand X-ray and evaluates whether it shows pathological changes related to RA. These changes include bone erosion, joint space narrowing, and reduced quality of the bone forming the joint.

We trained the model to detect these changes using datasets of normal hand X-rays and X-rays from RA patients, 529 X-rays in total. The aim is for the model to detect changes in joint quality as early as possible, which can support the early management of these cases.
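This page does not say how the 529 images were divided into training and test sets. A common choice, sketched here purely as an assumption, is a stratified split that keeps the normal/RA ratio the same in both sets. The class names, file names, 20% test fraction, and `stratified_split` are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def stratified_split(
    paths_by_label: Dict[str, List[str]], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split image paths per class so each class keeps the same train/test proportion."""
    rng = random.Random(seed)  # fixed seed -> reproducible split
    train: List[Tuple[str, str]] = []
    test: List[Tuple[str, str]] = []
    for label in sorted(paths_by_label):
        shuffled = list(paths_by_label[label])
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend((path, label) for path in shuffled[:n_test])
        train.extend((path, label) for path in shuffled[n_test:])
    return train, test
```

Stratifying matters most when one class is smaller than the other: a purely random split of a few hundred images can leave the test set with too few RA cases to give a reliable accuracy estimate.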

The model reached a test accuracy of nearly 98% in distinguishing normal from RA-affected joints.

Figure 9. A sample of the X-ray images that were used in the training process of the software tool. The figure to the right includes RA-affected hand X-rays, while the figure to the left includes healthy hand X-rays.
Figure 10. An illustration of the testing results of our model against a different dataset, which showed an exceptional accuracy of 98%.

Our model can also serve as a follow-up tool for assessing the response to therapeutic agents, because disease severity is reflected in the damage to the patient’s joints. The better the patient’s condition is controlled, the fewer changes should appear in their X-ray findings.

We’re now looking to extend the scope of our model beyond RA changes in the hand joints, so that it can identify pathological changes in any of the patient’s joints. The model could then be configured to recommend the next course of action for the patient in order to avoid unwanted adverse events.

Figure 11. An example of the performance of the model when the user inputs a normal X-ray image
Figure 12. An example of the performance of the model when the user inputs an X-ray image that shows rheumatoid changes

Adapting our X-ray model to mobile devices:

To make our X-ray classification model more user-friendly, we used Android Studio to build a mobile application that predicts the class of an input X-ray image.

Figure 13. The interface of the mobile application

We used the TensorFlow Lite framework so that our model runs efficiently on Android devices while maintaining its high prediction accuracy. The same neural network structure and training datasets were used to train this version of the model, which reached a training accuracy of almost 100%.

Figure 14. The results of the training accuracy and loss of the software
Figure 15. Illustrates an example of the software's predictions

Recommendations for future improvements:

Our SUPER-Gadget software tool and X-ray Classification Model are available to anyone interested in the field of immunoinformatics. For developers interested in improving the performance of our software tool, we recommend incorporating Major Histocompatibility Complex (MHC) class I and II molecules into the formed complexes to further improve the model’s prediction accuracy. We also recommend using a regression model to predict the binding affinity between the epitope and the CDR3-β sequence as a continuous value.

For our X-ray Classification Model, we suggest adding recognition of X-rays of other joints, not only the hands. Although RA mainly targets the small joints of the hand, it can affect joints throughout the body.

References

  • Mazzotti L, Gaimari A, Bravaccini S, Maltoni R, Cerchione C, Juan M, et al. T-Cell Receptor Repertoire Sequencing and Its Applications: Focus on Infectious Diseases and Cancer. Int J Mol Sci. 2022 Aug 2;23(15).
  • Pasetto A, Buggert M. T-Cell Repertoire Characterization. Methods Mol Biol. 2022;2574:209–19.
  • Vita R, Mahajan S, Overton JA, Dhanda SK, Martini S, Cantrell JR, et al. The Immune Epitope Database (IEDB): 2018 update. Nucleic Acids Res. 2019 Jan 8;47(D1):D339–43.
  • Tickotsky N, Sagiv T, Prilusky J, Shifrut E, Friedman N. McPAS-TCR: a manually curated catalogue of pathology-associated T cell receptor sequences. Bioinformatics. 2017 Sep 15;33(18):2924–9.
  • Zheng Z, Chang L, Mu J, Ni Q, Bing Z, Zou Q-H, et al. Database of synovial T cell repertoire of rheumatoid arthritis patients identifies cross-reactive potential against pathogens including unencountered SARS-CoV-2. Ann Rheum Dis. 2022 Oct 19;
  • Goncharov M, Bagaev D, Shcherbinin D, Zvyagin I, Bolotin D, Thomas PG, et al. VDJdb in the pandemic era: a compendium of T cell receptors specific for SARS-CoV-2. Nat Methods. 2022 Sep;19(9):1017–9.
  • Jasim SA, Yumashev AV, Abdelbasset WK, Margiana R, Markov A, Suksatan W, et al. Shining the light on clinical application of mesenchymal stem cell therapy in autoimmune diseases. Stem Cell Res Ther. 2022 Mar 7;13(1):101.
  • Smirnov AS, Rudik AV, Filimonov DA, Lagunin AA. TCR-Pred: A new web-application for prediction of epitope and MHC specificity for CDR3-β TCR sequences using molecular fragment descriptors. Immunology. 2023 Mar 16;
  • Montemurro A, Schuster V, Povlsen HR, Bentzen AK, Jurtz V, Chronister WD, et al. NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRα and β sequence data. Commun Biol. 2021 Sep 10;4(1):1060.
  • Sidhom J-W, Larman HB, Pardoll DM, Baras AS. DeepTCR is a deep learning framework for revealing sequence concepts within T-cell repertoires. Nat Commun. 2021 Mar 11;12(1):1605.
  • Wang Z, Liu J, Gu Z, Li C. An Efficient CNN for Hand X-Ray Overall Scoring of Rheumatoid Arthritis. Complexity. 2022 Feb 23;2022:1–9.
  • Fung DL, Liu Q, Islam S, Lac L, O’Neil L, Hitchon CA, et al. Deep learning-based joint detection in Rheumatoid arthritis hand radiographs. AMIA Jt Summits Transl Sci Proc. 2023 Jun 16;2023:206–15.