Overview

Cancer Sniper is a new cancer therapy that aims to overcome the current dilemma in the clinical treatment of lung cancer by establishing a more efficient treatment with low side effects and developing a new microbial-based tumor treatment system. We plan to develop a precise treatment for lung cancer based on genetically engineered bacteria selected by machine learning. Our results are divided into three parts: the machine-learning selection of bacteria for lung cancer, the test of three promoters, and the construction of the anti-tumor bacteria system.

Machine Learning Selection of Bacteria

Accuracy Validation

We verified the accuracy and precision of the model in two ways: mathematically and biologically.

To verify the model accuracy and precision from a mathematical perspective, we use the ROC AUC plot and the confusion matrix. The ROC (Receiver Operating Characteristic) is a graphical representation showcasing the performance of a binary classification model across all classification thresholds. Specifically, the y-axis of the ROC represents the True Positive Rate (TPR, also known as sensitivity), and the x-axis represents the False Positive Rate (FPR, or 1-specificity). The curve illustrates how the model's TPR and FPR change with varying classification thresholds. The AUC (Area Under Curve) is essentially the area beneath the ROC curve. AUC values range between 0 and 1, where 1 signifies a perfect classifier, and 0.5 indicates a model that performs no better than random guessing.
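As a rough illustration of how a ROC curve and its AUC are obtained from classifier scores, the sketch below sweeps the threshold over a small set of scores and integrates the resulting curve with the trapezoidal rule. The function names `roc_points` and `auc`, and the example labels and scores, are made up for illustration; this is not our actual evaluation code or data.

```python
from typing import List, Tuple


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points obtained by lowering the threshold step by step.

    Assumes labels are 0/1 and that both classes are present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Highest score first: each distinct score acts as one threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        last_of_group = i == len(ranked) - 1 or ranked[i + 1][0] != score
        if last_of_group:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Illustrative data only: 1 = positive class, 0 = negative class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(round(auc(roc_points(labels, scores)), 4))
```

In this toy example one negative sample is ranked above one positive sample, so the AUC falls slightly below 1; a classifier that ranks every positive above every negative would reach exactly 1.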

The confusion matrix, on the other hand, is a specific table layout useful for visualizing and evaluating the performance of supervised learning algorithms. For binary classification problems, this matrix is of size 2x2, representing the actual versus predicted categories. The four primary components of a confusion matrix are True Positives (TP), False Negatives (FN), True Negatives (TN), and False Positives (FP). These components help delineate the types of errors made by the classifier.
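The four components can be counted directly from paired actual and predicted labels. The following sketch, with hypothetical helper names `confusion_counts` and `summary` and invented example labels, shows the counting and how overall accuracy, TPR, and FPR follow from the counts.

```python
from typing import Dict, List


def confusion_counts(actual: List[int], predicted: List[int]) -> Dict[str, int]:
    """Count TP, FN, FP, and TN for binary labels (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 1 and p == 0:
            counts["FN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        else:
            counts["TN"] += 1
    return counts


def summary(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive accuracy, TPR, and FPR; assumes both classes are present."""
    total = sum(counts.values())
    return {
        "accuracy": (counts["TP"] + counts["TN"]) / total,
        "tpr": counts["TP"] / (counts["TP"] + counts["FN"]),
        "fpr": counts["FP"] / (counts["FP"] + counts["TN"]),
    }


# Illustrative labels only, not taken from our datasets.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(actual, predicted)
print(counts)
print(summary(counts))
```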

The rationale behind choosing the ROC AUC and confusion matrix for model evaluation is multi-fold. The ROC AUC provides a comprehensive metric of the model's performance, independent of a specific classification threshold. This is especially beneficial when comparing the performance of multiple models, allowing for an intuitive understanding of which model outperforms across various thresholds. The confusion matrix offers a clear depiction of the types of classification errors made by the model. These metrics offer insights into the model's performance on both positive and negative classes and guide potential model refinement and optimization.

Furthermore, to validate our model biologically, we performed a differential expression analysis using RNA-seq results from bacteria that were not included in the training set. From our prediction data, we identified all genes marked as present in a specific environment; these are the genes that machine learning associates with that environment. Our DEG (differentially expressed gene) dataset, in turn, contains the genes that show differential expression in bacteria in that specific environment. We performed an overlap analysis between these two gene sets: the higher the overlap value, the greater the biological accuracy and reliability of our model.
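The overlap analysis can be sketched as a set intersection. The text does not fix a formula for the overlap value, so the example below assumes it is the fraction of predicted genes that also appear in the DEG set; other definitions (for example a Jaccard index) are equally possible. The function name `overlap_fraction` and the gene names are placeholders.

```python
from typing import Iterable


def overlap_fraction(predicted_genes: Iterable[str], deg_genes: Iterable[str]) -> float:
    """Fraction of predicted genes that are also differentially expressed.

    Assumption: the overlap is normalized by the size of the predicted set.
    """
    predicted = set(predicted_genes)
    degs = set(deg_genes)
    if not predicted:
        return 0.0  # nothing was predicted, so there is nothing to overlap
    return len(predicted & degs) / len(predicted)


# Placeholder gene names, not real identifiers from our data.
predicted_genes = ["geneA", "geneB", "geneC", "geneD"]
deg_genes = ["geneB", "geneD", "geneE"]
overlap = overlap_fraction(predicted_genes, deg_genes)
print(overlap)
```

Here two of the four predicted genes are found among the DEGs, so the overlap under this definition is 0.5.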

Lung Data

The ROC curve lies close to the top-left corner, which is characteristic of strong classifier performance, and gives an AUC of 0.99 for the RandomForestClassifier. The confusion matrix further shows high rates of both true positives and true negatives. Finally, when compared with the actual DEG results, the model's predictions for "Acinetobacter" align closely with the empirical data, showing marked consistency between the model's predictions and the real DEGs.

Fig 1A. ROC AUC plot of the model with lung data

Fig 1B. Confusion matrix of the model with lung data

Fig 1C. DEG match result of the model with lung data

Tumor Data

The ROC curve again lies close to the upper-left corner, with an AUC of 0.99 for the RandomForestClassifier, indicating strong classification performance. The confusion matrix confirms high rates of true positives and true negatives (98.17% overall). When compared with the actual DEG results, the model's predictions for 'Acinetobacter' show substantial consistency (55%) with the real DEGs.

Fig 2A. ROC AUC plot of the model with tumor data

Fig 2B. Confusion matrix of the model with tumor data

Fig 2C. DEG match result of the model with tumor data

From a mathematical perspective, our model has very high accuracy. From a biological perspective, the gene overlap demonstrates a certain level of accuracy, but it is lower than the mathematical validation suggests. To address this, we plan to use a language model to study the description of each gene further, which may reveal more biological links between the genes of different bacteria. This should improve the accuracy and biological reliability of our model.

Matching Results

For the matching of engineered bacteria, we selected three bacteria, S. typhimurium, Pseudomonas aeruginosa, and Escherichia coli, that have been reported as engineered bacteria for cancer immunotherapy.

Lung Data

Fig 3A. Identification of lung data

Tumor Data

Fig 3B. Identification of tumor data

Based on a comprehensive analysis of these results, E. coli and P. aeruginosa perform best on the lung environment dataset, while E. coli and S. typhimurium perform best on the tumor environment dataset. These observations guide both the selection of the engineered bacterium and our wet lab experiments. Taking all results into consideration, we conclude that E. coli is the most suitable chassis bacterium and should be used in our wet lab experiments.

The test of three promoters

To verify the initiation ability of the three promoters, pPepT (BBa_K4776001), LldR (BBa_K4776000), and pCadC (BBa_K4776002), we cloned each of them into pUC19-LUC (BBa_K4776018), upstream of LUC (Figure 3 A, B, C), and transformed the constructs into Escherichia coli DH5α. The cultures were incubated at 37 ℃ for 24 hours under aerobic and anaerobic conditions, across a lactate gradient, and across a pH gradient. Afterward, we measured the fluorescence intensity with a spectrophotometer. The results showed that the initiation ability of the pPepT promoter was significantly higher in the anaerobic environment than in the aerobic environment (Figure 4A), and that the initiation ability of the LldR promoter increased significantly with increasing lactate concentration (Figure 4B). The pCadC promoter had the strongest initiation ability in a slightly acidic environment at pH 6.4 (Figure 4C).

Fig 3A. pPepT Promotes LUC Expression Plasmid

Fig 3B. LldR Promotes LUC Expression Plasmid

Fig 3C. pCadC Promotes LUC Expression Plasmid

Fig 4A. pPepT promotes LUC expression in anaerobic and aerobic environments

Fig 4B. LldR promotes LUC expression at different lactate concentrations

Fig 4C. pCadC promotes LUC expression at different pH

To further validate the ability of these three promoters to express the anti-tumor protein CDD-iRGD, we removed the LUC portion of the plasmids in Figure 3 A, B, C by enzyme digestion, replaced it with a CDD-iRGD protein sequence carrying a 6×His tag, and confirmed its expression by Western blot. The results showed that the initiation ability of the pPepT promoter was significantly higher in the anaerobic environment than in the aerobic environment (Figure 5A), and that the initiation ability of the LldR promoter increased significantly with increasing lactate concentration (Figure 5B). The pCadC promoter showed higher expression ability when the pH was lower than 7 (Figure 5C).

Fig 5A. pPepT promoted expression of CDD-iRGD protein at different oxygen concentrations

Fig 5B. LldR promoted expression of CDD-iRGD protein at different lactate concentrations (mol/L)

Fig 5C. pCadC promoted expression of CDD-iRGD protein at different pH

Construction of the anti-tumor bacteria system

Based on the results of our promoter initiation ability tests, we have achieved a significant milestone in our project by successfully constructing two essential plasmids. These plasmids have been meticulously designed to facilitate precise control and targeted expression of our anti-tumor protein CDD-iRGD, and notably, these plasmids were co-transformed into the same bacterial strain, Escherichia coli DH5α, to further advance our research.

We first engineered a single pUC19-LUC plasmid that incorporates all three promoters, LldR, pCadC, and pPepT, with triggers A1, A2, and A3, respectively. This plasmid, named “PUC19-trigger”, was designed to ensure tight regulation of promoter-driven initiation within a single genetic construct. After assembly, this multi-promoter plasmid was co-transformed into Escherichia coli DH5α alongside the CDD-iRGD expression plasmid, as described below.

Concurrently, we developed an element that includes a switch sequence linked to the CDD-iRGD anti-tumor protein sequence. This element was integrated into the pACYC177-T7-EGFP vector, and the resulting plasmid was named “PT7-CDD-iRGD”. It was then co-transformed into Escherichia coli DH5α along with the “PUC19-trigger” plasmid (Figure 6).

Fig 6. Co-transformation of the “PUC19-trigger” and “PT7-CDD-iRGD” plasmids into Escherichia coli DH5α

Following the successful assembly of these plasmids and their co-transformation into the same Escherichia coli DH5α strain, we have entered the phase of evaluating the expression of CDD-iRGD in response to the trigger elements and exploring the potential therapeutic implications. We have co-cultured the double-transformed bacteria with 3D-cultured tumor cell spheres. The relevant experimental results will be presented at the Jamboree.