Model | Guangxi-U-China

Model

Overview

In 2023, we completed one mathematical model for investigating the adaptive evolution of the metabolic and proteomic networks of bacteria living on lactic acid.

Reprogramming of cellular metabolism is an emerging hallmark of triple-negative breast cancer (TNBC) that exacerbates the proliferation, metastasis and angiogenesis of tumor cells.¹ The high glycolytic activity of TNBC cells usually causes accumulation of the metabolic byproduct lactic acid and a hypoxic tumor microenvironment (TME).² On our Project page we demonstrated that bacteria can be genetically modified with advanced synthetic biology techniques to attack tumor cells. In contrast to the intensively studied genetic mechanisms, much less attention has been paid to the evolution of proteomic and metabolic networks in response to environmental stresses. With regular bioinformatic tools, we discovered many proteomic and metabolomic changes (see the Measurement and Results pages for more information). Here we describe an unsupervised machine learning model that can mine the data more deeply than regular bioinformatic tools. It reveals how bacteria can naturally adapt to lactic acid and live on it through proteomic and metabolic reprogramming.

Introduction

Because of their good biocompatibility and targeting ability, Escherichia coli Nissle 1917 bacteria have become an emerging biomimetic platform that retains its innate biological functions for in situ therapy of cancers in the acidic and hypoxic tumor microenvironment.³ Biological systems can adapt to continuous environmental changes by invoking different strategies to survive a specific condition. In this model, Escherichia coli Nissle 1917 cells were cultured in a series of culture media containing different amounts of lactic acid. Proteins and metabolites were extracted in parallel from three batches of cells grown without lactic acid (control) or with lactic acid at a concentration of 3 g/L (sample). Before analysis, proteins were digested into peptides with trypsin. The tryptic peptides and the metabolites were then each subjected to liquid chromatographic separation and mass spectrometric analysis. With advances in proteomic and metabolomic tools, the molecular mechanisms behind evolved phenotypes such as growth rate or product secretion rate can be dissected in detail. Although a number of technologies provide genome-wide measurements of adaptive changes in gene expression or genome sequences,^4-7 it remains unknown how gene expression relates to the improved phenotypes and cellular properties that arise during the evolutionary process. Our mathematical model reveals proteomic and metabolic network responses to laboratory evolution, which help in understanding proteomic and metabolic functions and pathway fluxes. Because glucose is the preferred carbon and energy source for most bacteria and eukaryotic cells,^8-10 the evolution of bacteria on glucose has been studied by many groups.^11-15 Our project indicates that microorganisms might also be capable of adapting to various other carbon compounds in the environment.

Mathematical modeling

The regular bioinformatic volcano plot is used as an example to demonstrate our algorithm. The volcano plot is an efficient tool for visualizing complex datasets generated by genomic screening, proteomic and metabolomic approaches. It is essentially a scatter plot that shows the log2 of the fold change (FC) on the x-axis and the negative log10 of the p-value on the y-axis. Each dot represents one measured protein or metabolite, and its position is defined by these two coordinates.^16-18 Threshold values for the fold change and the significance are set to classify each data point as ‘unchanged’, ‘decreased’ or ‘increased’. We implement a K-means clustering algorithm for the presentation of the volcano plot. K-means is an unsupervised learning algorithm used to solve clustering problems in machine learning and data analysis.^19-21 The unlabeled data points are divided into k clusters so that each data point belongs to one group whose members have similar properties. It is a centroid-based algorithm that minimizes the sum of squared distances between the data points and the centroids of their clusters.
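As a rough illustration of the threshold-based classification described above, the following sketch computes the volcano-plot coordinates of a data point and labels it. The function names, the example values and the thresholds (a two-fold change and p < 0.05) are assumptions chosen for illustration; they are not taken from the team's programs.

```python
import math

def volcano_point(fold_change, p_value):
    """Return the volcano-plot coordinates (log2 FC, -log10 p)."""
    return math.log2(fold_change), -math.log10(p_value)

def classify(fold_change, p_value, fc_threshold=2.0, p_threshold=0.05):
    """Label a data point 'increased', 'decreased' or 'unchanged'.

    fc_threshold and p_threshold are illustrative defaults (assumptions).
    """
    x, y = volcano_point(fold_change, p_value)
    x_cut = math.log2(fc_threshold)    # vertical threshold lines at +/- x_cut
    y_cut = -math.log10(p_threshold)   # horizontal significance line
    if y > y_cut and x >= x_cut:
        return "increased"
    if y > y_cut and x <= -x_cut:
        return "decreased"
    return "unchanged"

# Hypothetical (fold change, p-value) pairs
examples = {
    "protein_A": (4.0, 0.001),
    "protein_B": (0.25, 0.01),
    "protein_C": (1.1, 0.5),
}
labels = {name: classify(fc, p) for name, (fc, p) in examples.items()}
print(labels)
```

Fixed thresholds cut the plot into rectangular regions, whereas clustering groups points by their proximity in the same coordinate space.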

§ Fold change
The fold change (FC) is defined by equation 1.

$$FC = \frac{\bar{x}_{\mathrm{sample}}}{\bar{x}_{\mathrm{control}}} \qquad (1)$$

in which $\bar{x}_{\mathrm{sample}}$ represents the average expression level of the sample group and $\bar{x}_{\mathrm{control}}$ represents the average expression level of the control group. The fold change shows by what multiple the level of a protein has changed relative to the control group, and it can be used to screen out proteins with large changes in expression.
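A minimal sketch of the fold change calculation, assuming FC is the ratio of the sample group mean to the control group mean as described above. The intensity values are hypothetical.

```python
import math
from statistics import mean

def fold_change(sample, control):
    """FC = mean(sample) / mean(control)."""
    return mean(sample) / mean(control)

# Hypothetical intensities for one protein, three replicates per group
sample = [820.0, 790.0, 790.0]
control = [410.0, 390.0, 400.0]

fc = fold_change(sample, control)
log2_fc = math.log2(fc)   # x-coordinate of the volcano plot
print(f"FC = {fc:.2f}, log2(FC) = {log2_fc:.2f}")
# prints "FC = 2.00, log2(FC) = 1.00"
```

Taking log2 makes increases and decreases symmetric around zero: a doubling gives +1 and a halving gives -1.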

§ P-value
The significance of a difference (p-value) is calculated with a statistical test and shows how statistically significant the difference between datasets is. In this case, we used the t-test to screen for proteins or metabolites whose expression differs significantly between cells grown without and with lactic acid. The datasets are defined as the sample group and the control group, and each group has 3 replicates. To verify whether a given protein changes significantly between the sample group and the control group, an independent two-sample t-test is used, which tests whether the means of two independent datasets differ. The two datasets are assumed to be independent and to follow a normal or approximately normal distribution with homogeneity of variance.

Main steps:
✓ State the original (null) hypothesis and the alternative hypothesis. The original hypothesis H0 is that the means of the two groups are equal (x_s=x_c); the alternative hypothesis H1 is that the means of the two groups are unequal (x_s≠x_c).
✓ Check whether the dataset meets the assumptions of the t-test: the data follow a normal distribution, and the two groups have equal variances and are independent of each other.
✓ Calculate the t-statistic.
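Define the sample group dataset as X_S and the control group dataset as X_C. The t-statistic is given by equation 2.

$$t = \frac{\bar{X}_S - \bar{X}_C}{\sqrt{S^2\left(\frac{1}{n_s} + \frac{1}{n_c}\right)}} \qquad (2)$$

in which $\bar{X}_C$ represents the average value of the control group and $\bar{X}_S$ represents the average value of the sample group. n_c and n_s represent the number of replicates in the control group and the sample group, respectively. When the variances are equal, the common variance S² is estimated by the pooled variance of the two groups as shown in equation 3.

$$S^2 = \frac{(n_s - 1)S_s^2 + (n_c - 1)S_c^2}{n_s + n_c - 2} \qquad (3)$$

in which $S_s^2$ and $S_c^2$ are the sample variances of the sample group and the control group, respectively.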

✓ Find t-critical value. Based on the selected significance level, find the corresponding t-critical value in the t-distribution table.
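✓ Compare the t-value with the t-critical value. For a two-sided test of whether the means of two populations are equal, equation 4 is applied to the absolute value of the calculated t-statistic. If the inequality holds, reject the original hypothesis and conclude that the population means are unequal; otherwise, do not reject the original hypothesis.

$$|t| > t_{\alpha/2}(n_c + n_s - 2) \qquad (4)$$

in which $t_{\alpha/2}(n_c + n_s - 2)$ is the t-critical value at significance level α with n_c+n_s-2 degrees of freedom.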

✓ Calculate the p-value. Based on the t-value and the degrees of freedom, calculate the p-value. The p-value is the probability, assuming the original hypothesis is true, of observing a result at least as extreme as the actual data. When H0 holds, the t-statistic follows a t-distribution with df=n_c+n_s-2 degrees of freedom, so the p-value can be obtained from the t-distribution table at that df.
✓ Judge statistical significance. When the p-value is less than the selected significance level α, the original hypothesis is rejected and the difference in means is considered significant. If it is greater than the selected significance level, the original hypothesis is not rejected, and it is considered that there is no significant difference in the mean between the two datasets.
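The steps above can be sketched in pure Python as follows. This is an illustrative sketch, not the team's downloadable program. It assumes the standard pooled-variance form of the two-sample t-test, takes the difference as sample minus control (the sign does not affect a two-sided test), and obtains the p-value by numerically integrating the t-distribution density instead of reading a table. The replicate values and α = 0.05 are hypothetical. In practice, `scipy.stats.ttest_ind(sample, control, equal_var=True)` performs the same test.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=10_000):
    """Two-sided p-value: 1 - P(-|t| < T < |t|), via Simpson's rule on [0, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3        # probability mass inside [-|t|, |t|]
    return max(0.0, 1 - central)

def pooled_t_test(sample, control):
    """Equal-variance two-sample t-test. Returns (t, df, two-sided p)."""
    n_s, n_c = len(sample), len(control)
    df = n_s + n_c - 2
    # Pooled variance S^2 from the two sample variances
    s2 = ((n_s - 1) * variance(sample) + (n_c - 1) * variance(control)) / df
    # t-statistic: difference of means over its standard error
    t = (mean(sample) - mean(control)) / math.sqrt(s2 * (1 / n_s + 1 / n_c))
    return t, df, two_sided_p(t, df)

# Hypothetical expression levels, three replicates per group
sample = [12.1, 11.8, 12.4]
control = [10.2, 10.5, 9.9]
alpha = 0.05                           # assumed significance level

t, df, p = pooled_t_test(sample, control)
reject_h0 = p < alpha
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}, reject H0: {reject_h0}")
```

With three replicates per group, df = 4 and the two-sided critical value at α = 0.05 is about 2.776, so comparing |t| with that value and comparing p with α lead to the same decision.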

§ K-means clustering algorithm
Given the dataset D={x₁,x₂,…,x_m}, the k-means algorithm divides the data into clusters C={C₁,C₂,…,C_k} so as to minimize the squared error shown in equation 5.

$$E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert_2^2 \qquad (5)$$

in which $\mu_i = \frac{1}{|C_i|}\sum_{x \in C_i} x$ represents the mean vector of cluster C_i. The smaller the E-value, the higher the similarity of the data points within each cluster.
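To make the role of E concrete, the sketch below evaluates the squared error for two different groupings of the same four points, assuming E is the sum of squared Euclidean distances from each point to the mean vector of its cluster. The points are hypothetical volcano-plot coordinates.

```python
def squared_error(clusters):
    """Sum over clusters of squared distances from each point to the cluster mean.

    `clusters` is a list of clusters, each a list of equal-length tuples.
    """
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        # Mean vector of this cluster
        mu = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        total += sum((p[d] - mu[d]) ** 2 for p in points for d in range(dim))
    return total

# Two ways of grouping the same four hypothetical points (log2 FC, -log10 p)
tight = [[(0.0, 0.0), (0.0, 2.0)], [(4.0, 0.0), (4.0, 2.0)]]
loose = [[(0.0, 0.0), (4.0, 0.0)], [(0.0, 2.0), (4.0, 2.0)]]
print(squared_error(tight), squared_error(loose))
# prints "4.0 16.0"
```

The grouping that keeps nearby points together gives the smaller E, which is the grouping k-means prefers.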

Main steps
✓ Choose k initial cluster centers within the dataset.
✓ Calculate the distance between each data point and every cluster center, and assign the point to the cluster whose center is nearest.
✓ For each cluster, recalculate its center as the mean of the points assigned to it.
✓ Repeat the previous two steps until a stopping condition is reached (such as a maximum number of iterations or a minimum change in error).
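The main steps can be sketched as a small pure-Python routine. This is an illustrative sketch, not the team's downloadable program. The names `k_means`, `max_iter` and `seed`, the random choice of initial centers, and the stopping rule (assignments no longer change) are assumptions. The data points are hypothetical. scikit-learn's `sklearn.cluster.KMeans` provides a production implementation.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, max_iter=100, seed=0):
    """K-means on a list of equal-length tuples. Returns (centers, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k data points as the initial centers
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center
        new_labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                      for p in points]
        # Step 4: stop once the assignments no longer change
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each center as the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

# Hypothetical volcano-plot points (log2 FC, -log10 p): three near the
# origin ("unchanged") and three far from it ("increased")
points = [(0.1, 0.2), (-0.2, 0.4), (0.0, 0.1),
          (3.0, 4.0), (3.2, 4.5), (2.9, 3.8)]
centers, labels = k_means(points, k=2)
print(labels)
```

Cluster indices are arbitrary, so only the grouping is meaningful. K-means can also converge to different local optima from different starting centers, which is one reason to compare results across several K values and initializations.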

Results

Two computer programs have been written, one for the t-test and one for K-means clustering; both can be downloaded from our team wiki. The programs are written in Python and use the scikit-learn (sklearn) and Matplotlib libraries for K-means clustering and visualization, respectively. Figures 1 and 2 show the volcano plots generated by the algorithm with different K values. Compared with the regular bioinformatic analysis described on our Project page, the clustering reveals the proteomic and metabolomic network changes in E. coli Nissle 1917 living in lactic acid in finer detail.

Figure 1. Volcano plots of proteomic network changes in E. coli Nissle 1917 living in lactic acid

Figure 2. Volcano plots of metabolomic network changes in E. coli Nissle 1917 living in lactic acid

Conclusions

One preliminary mathematical model has been established for discovering dynamic changes in the proteome and metabolome of E. coli Nissle 1917 cells living in lactic acid. This preliminary model is our first step towards more comprehensive metabolic and proteomic network models that can explore the dynamics of bacterial evolution in response to environmental changes. We hope that it will be helpful to the iGEM community.

Click here for the download of T-test computer program
Click here for the download of K-means clustering computer program

References

1. Hay, N. (2016). Reprogramming glucose metabolism in cancer: can it be exploited for cancer therapy? Nature Reviews Cancer, 16, 635-649.
2. Deepak, K. G. K., Vempati, R., Nagaraju, G. P., Dasari, V. R., Nagini, S., Rao, D. N., & Malla, R. R. (2020). Tumor microenvironment: Challenges and opportunities in targeting metastasis of triple negative breast cancer. Pharmacological Research, 153, 104683.
3. Duong, M. T. Q., Qin, Y., You, S. H., & Min, J. J. (2019). Bacteria-cancer interactions: bacteria-based cancer therapy. Experimental & molecular medicine, 51(12), 1-15.
4. López-Maury, L., Marguerat, S., & Bähler, J. (2008). Tuning gene expression to changing environments: from rapid responses to evolutionary adaptation. Nature Reviews Genetics, 9(8), 583-593.
5. Fletcher, E., Feizi, A., Bisschops, M. M., Hallström, B. M., Khoomrung, S., Siewers, V., & Nielsen, J. (2017). Evolutionary engineering reveals divergent paths when yeast is adapted to different acidic environments. Metabolic engineering, 39, 19-28.
6. Ju, S. Y., Kim, J. H., & Lee, P. C. (2016). Long-term adaptive evolution of Leuconostoc mesenteroides for enhancement of lactic acid tolerance and production. Biotechnology for biofuels, 9, 1-12.
7. Shi, A., Fan, F., & Broach, J. R. (2022). Microbial adaptive evolution. Journal of Industrial Microbiology and Biotechnology, 49(2), kuab076.
8. Bergmann, R. (1989). Thiolutin inhibits utilization of glucose and other carbon sources in cells of Escherichia coli. Antonie van Leeuwenhoek, 55, 143-152.
9. Winkler, H. H., & Wilson, T. H. (1967). Inhibition of β-galactoside transport by substrates of the glucose transport system in Escherichia coli. Biochimica et Biophysica Acta (BBA)-Biomembranes, 135(5), 1030-1051.
10. Aguilar, C., Escalante, A., Flores, N., de Anda, R., Riveros-McKay, F., Gosset, G., & Bolívar, F. (2012). Genetic changes during a laboratory adaptive evolution process that allowed fast growth in glucose to an Escherichia coli strain lacking the major glucose transport system. BMC genomics, 13(1), 385.
11. Balderas-Hernández, V. E., Hernández-Montalvo, V., Bolívar, F., Gosset, G., & Martínez, A. (2011). Adaptive evolution of Escherichia coli inactivated in the phosphotransferase system operon improves co-utilization of xylose and glucose under anaerobic conditions. Applied biochemistry and biotechnology, 163, 485-496.
12. Jiang, L., Li, S., Hu, Y., Xu, Q., & Huang, H. (2012). Adaptive evolution for fast growth on glucose and the effects on the regulation of glucose transport system in Clostridium tyrobutyricum. Biotechnology and Bioengineering, 109(3), 708-718.
13. Wannier, T. M., Kunjapur, A. M., Rice, D. P., McDonald, M. J., Desai, M. M., & Church, G. M. (2018). Adaptive evolution of genomically recoded Escherichia coli. Proceedings of the National Academy of Sciences, 115(12), 3090-3095.
14. LaCroix, R. A., Sandberg, T. E., O'Brien, E. J., Utrilla, J., Ebrahim, A., Guzman, G. I., & Feist, A. M. (2015). Use of adaptive laboratory evolution to discover key mutations enabling rapid growth of Escherichia coli K-12 MG1655 on glucose minimal medium. Applied and environmental microbiology, 81(1), 17-30.
15. Shi, A., Fan, F., & Broach, J. R. (2022). Microbial adaptive evolution. Journal of Industrial Microbiology and Biotechnology, 49(2), kuab076.
16. Li, W. (2012). Volcano plots in analyzing differential expressions with mRNA microarrays. Journal of bioinformatics and computational biology, 10(06), 1231003.
17. Goedhart, J., & Luijsterburg, M. S. (2020). VolcaNoseR is a web app for creating, exploring, labeling and sharing volcano plots. Scientific reports, 10(1), 20560.
18. Laplaza, R., Das, S., Wodrich, M. D., & Corminboeuf, C. (2022). Constructing and interpreting volcano plots and activity maps to navigate homogeneous catalyst landscapes. Nature Protocols, 17(11), 2550-2569.
19. Likas, A., Vlassis, N., & Verbeek, J. J. (2003). The global k-means clustering algorithm. Pattern recognition, 36(2), 451-461.
20. Pham, D. T., Dimov, S. S., & Nguyen, C. D. (2005). Selection of K in K-means clustering. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, 219(1), 103-119.
21. Ghazal, T. M. (2021). Performances of K-means clustering algorithm with different distance metrics. Intelligent Automation & Soft Computing, 30(2), 735-742.