Modeling

Metabolic modeling

In the second part of our project, the aim was to produce single-cell proteins from microbial biomass grown on the PET monomers ethylene glycol (EG) and terephthalic acid (TPA). While residual PET and the choice of microbes grown are obviously hugely important considerations when envisioning the end product as food1, we wanted to begin by simply assessing the feasibility of such a process.

The modeling aimed to provide both qualitative and quantitative insight, and give us some baseline predictions before starting wet lab work. Firstly, we were interested in seeing if our chosen organisms were predicted to be able to grow on EG or TPA as the sole carbon source. We were furthermore interested in seeing how growth rates and yields compare against other carbon sources such as glucose. Secondly, as we planned to conduct our experiments on the M9 minimal medium, we wanted to make sure that our medium would not be lacking in important metabolites. Thirdly, we wanted to compare the synthetic assimilation pathways defined by our parts to standard glycolysis. Lastly, from the sustainability point of view, we wanted to see how much of the carbon from PET we could expect to turn into biomass, and how much would be secreted as CO2 or other compounds.

These questions were explored using a technique called flux balance analysis (FBA). Below, the method is briefly explained. We refer readers more interested in the technique to an excellent paper by Orth, JD. et al.16 and to the paper on COBRApy17, which we also used in our modeling.

Modeled organisms

Before modeling, we had to select the most promising organisms. We built a shortlist through literature searches and meetings with experts such as Stephen Techtmann.

As our wild-type species, we chose Comamonas testosteroni, due to the presence of the tph operon in a closely related strain Comamonassp. E62, which codes for a TPA assimilation pathway. The same operon has also been found in strains of Rhodococcus opacus3, and we actually investigated the presence of this pathway in our exact strain of R. opacus. Lastly, we also found that Clostridium glycolicum can utilize EG as a carbon source4,5.

We could not find metabolic models for the specific natural strains we had access to through the microbial domain biological resource center HAMBI. Instead, we used gapseq, which is a pipeline for generating genome-scale metabolic models from genome sequences through several steps including a comprehensive database search of genes corresponding to known reactions6. The model-building of gapseq can also be guided by providing a minimal medium along with the genome sequence. We utilized the metabolic model iGR1773 for R. opacus PD630 to build a minimal medium based on the M9 minimal salts, trace metals and a PET monomer carbon source, and supplied it to gapseq. While all three strains were modeled, later laboratory experiments would begin with just R. opacus and C. testosteroni.

We omitted C. glycolicum from wet lab experiments as a more thorough literature search reveals that C. glycolicum is known to have adverse health effects7,8. Furthermore, despite some recent papers still using the name C. glycolicum, it was actually reclassified as Terrisporobacter glycolicus in 20149, a search of which provides more sources citing adverse health effects.

For engineering, we chose Escherichia coli, and Pseudomonas putida as the hosts. E. coli was chosen as there is existing literature on how the species can be engineered to grow with EG as the main carbon source4,10. As for P. putida, there are reports of certain strains such as JM37, which can naturally utilize EG11. Instead, we had access to strain KT2440, which reportedly cannot grow on EG as the sole carbon source (even though our modeling with iJN1462 indicates otherwise). However, KT2440 is widely regarded as a robust host for metabolic engineering, and it had been modified to utilize TPA previously12. Genome-scale metabolic models iAF126013, and iJN146214 were augmented with the engineered assimilation pathways for E. coli K-12 and P. putida KT2440 respectively, and used in the analysis.

You can read more about the parts and pathways involved in engineering here.

The code for reproducing the results, models generated with gapseq, and further discussion can be found in our GitLab.

Defining the growth medium

In our growth experiments, we used M9 Minimal Salts (2X) from ThermoFisher Scientific as the basis for our growth mediums to which essential trace metals and a primary carbon source would be added (See protocols). As the results of flux balance analysis are dependent on the nutrient concentrations and/or uptake rates, we began by calculating the amounts of each nutrient in our growth medium. Some larger macromolecules were split into constituent molecules as advised in literature15. This was necessary as many of the genome-scale metabolic models did not contain exchange reactions for the macromolecules, but accepted the constituent molecules. Table 1 shows the derived concentrations of the various molecules when M9 Minimal Salts (2X) is prepared following recommendations of the manufacturer, thus using water to dilute 0.5L of minimal salts into 1L of media, and adding a calcium and magnesium source.

Table 1. Compounds included in the modeled M9 minimal medium before the addition of trace metals and a carbon source.

Compound Formula Moles per 1L of medium
Sodium phosphate dibasic heptahydrate H15Na2O11P ≈ 47.7mmol
Monopotassium phosphate KH2PO4 ≈ 22.0mmol
Sodium chloride NaCl ≈ 8.6mmol
Ammonium chloride NH4Cl ≈ 18.7mmol
Magnesium sulfate MgSO4 ≈ 2.0mmol
Calcium dichloride CaCl2 ≈ 0.1mmol

Defining the bounds for exchange reactions

Flux balance analysis relies on defining microbial metabolism as a set of reactions, each of which consumes a set of metabolites while producing another set of metabolites. The goal then is to find a set of fluxes through all reactions such that mass balance is retained for all metabolites. From the perspective of modeling, the two most important reactions are the biomass reaction and the exchange reactions.

The biomass reaction uptakes metabolites needed for cellular growth thus depleting them from the metabolic network. On the other hand, exchange reactions define what metabolites enter the metabolic network. In our case, we are interested in finding a steady flux state that maximizes the biomass function. The value of the biomass function is often limited by bounds placed on the exchange reactions, in other words, the availability or uptake rate of some metabolites can be identified as the factor limiting growth rate. There are two ways of defining the bounds for the exchange reactions that change the interpretation of the modeling results.

First, concentrations of each metabolite in the growth medium can be calculated in mmol/L, and these can be set as the bounds to the respective exchange reactions15. Now as the biomass function is maximized, its value will also be in mmol/L, which can be turned into grams dry weight (gDW) by summing up all of the molecular weights of metabolites included in the biomass function. This result can be interpreted as the ability of the metabolic network to turn some amounts of metabolites into some corresponding amount of microbial mass. This result cannot however be used to make any predictions about the rate of growth, only the feasibility.

While the first approach would be easy to approach in our case, as we know the specific amounts of different metabolites in our growth mediums, a more common strategy is to set the bounds in terms of fluxes in mmol/(gDW·h)16. The documentation of COBRApy, a popular Python library for FBA, even suggests that the approach of using concentrations alone is flat-out wrong17. The challenge of using flux rates as bounds is that the maximum uptake rate is unique for each metabolite, and depends on the context and organism, and would thus need to be determined experimentally15. We make one assumption that allows us to convert metabolite concentrations into maximum uptake rates:

Given that metabolite y is present in the medium with a concentration of x mmol/L, we assume that the maximum uptake rate for metabolite y is given by x/24 mmol/(gDWh)

That is, we assume that at the beginning of an experiment, we have 1 gDW of bacteria per 1 liter of solution, and that the maximum uptake rate of a metabolite is the rate at which the starting bacterial mass would uptake all of the available metabolite in 24 hours if the uptake rate was constant.

There are some issues with this assumption. Firstly, in the real world, the uptake rate of a metabolite is not constant over time, but changes as metabolite availability and the number of microbes changes. Secondly, just because we assume that microbes can utilize some amount of metabolites in 24 hours does not mean it is the case in the real world, as metabolites might fail to enter the cytoplasm at the desired rate. While our medium is quite minimal with no extreme amounts of any single metabolite, a lower uptake rate of some of the metabolites will likely be one of the key reasons growth rates in the lab could be slower than predicted by the modeling.

Despite these limitations, the following modeling results will give us insight into qualitative questions such as whether or not an organism can grow given some set of metabolites. The differences in predicted growth rates will also enable us to compare TPA and EG as a carbon source to a more traditional one like glucose. Furthermore, the modeled mass balances still hold even if in reality uptake fluxes were smaller, meaning that in the lab we might still reach the predicted biomass quantities, but in a longer time than 24 hours.

Interpreting growth rates

Before presenting our results, it is also useful to mention how the growth rates given by the models were converted into growth predictions.

The biomass function of a genome-scale metabolic model is scaled such that when fluxes in the network are represented as mmol/(gDW·h), its value corresponds to the exponential growth rate of the organism16. Thus the amount of biomass after t hours could be represented as

where r is the exponential growth rate, and a is the starting biomass, which in our case is always one.

However, there is a problem with the above representation as microbial growth in the real world is not strictly exponential, but rather sigmoidal. Furthermore, by plugging some of the following results in the above formula, we can get biomass yields that are higher than the total mass of metabolites uptaken by the model, which is obviously nonsense.

Instead, we opt to model biomass growth with a logistic curve, where the amount of biomass after t hours is given by

where r is the exponential growth rate, and P0 is the starting biomass, which in our case is always one. Furthermore, we have carrying capacity K which represents the maximum amount of biomass possible, thus letting us place a limit on exponential growth.

We estimate the carrying capacity K for each result separately. Intuitively, it makes sense that the maximum possible amount of biomass would be limited by the total mass of metabolites available. Let c be the total mass of our carbon source in grams, n be the total mass of ammonium in grams (for reasons that will become apparent later), xi be the ith secretion flux, and wi be the molecular mass for the corresponding metabolite. We approximate carrying capacity K with:

as we have 17.3g of dry metabolites in our 1L of prepared M9 medium, from which we subtract the mass of base ammonium (0.337g). We add the final mass of our carbon source and ammonium, and subtract the total mass secreted by the model over 24 hours as this is mass that cannot contribute toward biomass accumulation. We assume that the masses of trace metals are insignificant.

Single microbe modeling

Table 2 summarizes the modeling setups. Besides the base M9 minimal salts, models were supplied with minimal amounts of trace metals, and a PET monomer as a carbon source to allow growth. The details, code, and values along with the generated gapseq models can be found on our GitLab.

Table 2. Summary of simulations and models used. For T. glycolicus, the model was generated with acetaldehyde as the carbon source, as EG is not a viable modelseed compound. The reaction from EG to acetaldehyde5 was added after the fact.

EG TPA Engineered Model used
P. putida KT2440 x x Yes iJN1462 augmented with a TPA assimilation pathway
E. coliK-12 x Yes iAF1260 augmented with an EG assimilation pathway
R. opacus DSM 43205 x No Generated with gapseq from genome assembly ASM91059154v1
T. glycolicus DSM 1288 x No Generated with gapseq from genome assembly ASM43910v1
C. testosteroni DSM 50244 x No Generated with gapseq from genome assembly ASM24152v2

Growth rates on TPA

We began by adding TPA as the carbon source, varying the concentration between 1g/L to 20g/L. Figure 1 shows the predicted growth rates. R. opacus is predicted to be the best at utilizing TPA out of the three microbes. Figure 2 subsequently shows the predicted growth curves over 24 hours when enough TPA is supplied for each strain to maximize growth rates.

Figure 1. Growth rates of P. putida, C. testosteroni, and R. opacus on M9 minimal media supplemented with trace metals and varying concentrations of TPA.

Figure 2. Predicted growth curves of P. putida, C. testosteroni, and R. opacus on M9 minimal salts supplemented with trace metals and enough TPA to reach maximum growth rate given that other fluxes stay constant.

Note that the growth curves in Figure 2 still exhibit an exponential shape suggesting that not all available metabolites are being converted into biomass. Through analyzing the reduced costs of the solved flux states, it became evident that the utilization of TPA as a carbon source was being bottlenecked by too small nitrogen fluxes. The natural next step was to model increasing the exchange flux of ammonium, as it acts as the main source of nitrogen in our medium.

TPA concentration at t=0 was fixed at 10g/L so that carbon availability would not bottleneck the growth rate. Next, ammonium concentrations were varied between 0.1g/L and 3.0g/L, for reference, the prepared M9 minimal medium has around 0.337g/L of ammonium. Figure 3 shows greatly improved growth rates for all of the modeled organisms. The improved growth curves are presented along the earlier curves in Figure 4. As can be seen, the curves now exhibit more of the expected sigmoidal shape suggesting that most metabolites are being utilized, with carbon influx beginning to be a significant bottleneck.

Figure 3. Growth rates of P. putida, C. testosteroni, and R. opacus on M9 minimal media supplemented with trace metals and TPA at 10g/L. Increasing available nitrogen through an increased ammonium exchange flux increases growth rates.

Figure 4. Predicted growth curves of P. putida, C. testosteroni, and R. opacus before and after increasing the available nitrogen through the ammonium exchange flux.

Growth rates on EG

Next, the simulations were repeated, but this time using ethylene glycol as the carbon source, once again varying its concentration between 1g/L to 20g/L. Figure 5 shows the predicted growth rates for P. putida, E. coli, and T. glycolicus, while Figure 6 shows the subsequent growth curves.

Figure 5. Growth rates of P. putida, E. coli, and T. glycolicus on M9 minimal media supplemented with trace metals and varying the concentration of EG.

Figure 6. Predicted growth curves of P. putida, E. coli, and T. glycolicus on M9 minimal media supplemented with trace metals and enough EG to reach maximum growth rate given that other fluxes stay constant.

As with TPA growth simulations, the scarcity of nitrogen proved to be a bottleneck in EG bioconversion as well. We thus repeated the earlier approach, this time fixing EG at 15g/L, and varying ammonium concentrations between 0.1g/L and 3.0g/L. Figure 7 shows the greatly improved growth rates with Figure 8 showing the improved growth curves.

Figure 7. Growth rates of P. putida, E. coli, and T. glycolicus on M9 minimal media supplemented with trace metals and EG at 15g/L. Increasing available nitrogen through an increased ammonium exchange flux increases growth rates with E. coli overtaking T. glycolicus.

Figure 8. Predicted growth curves of P. putida, E. coli, and T. glycolicus before and after increasing available nitrogen through the ammonium exchange flux. Note how E. coli, and T. glycolicus initially have similar growth rates, but the more efficient metabolite utilization of T. glycolicus allows it to accumulate more biomass over time.

Carbon utilization

Besides predicting biomass growth rates, we were also interested in biomass yield. That is, how much biomass in gDW can we accumulate per grams of TPA or EG. This data is presented in Table 3. To reproduce the results, refer to our Jupyter notebook. Another interesting aspect from the sustainability perspective is the carbon flux, including CO2 emissions. Table 4 goes over the carbon fluxes and CO2 emissions for each simulation. Besides CO2, some of the flux states secrete carbon in other forms, such as fumarate, citrate, and oxalate among others. As a reference point, the same statistics are also provided when glucose is used as the carbon source at 15g/L which results in a similar carbon flux as with EG at 15g/L or TPA at 10g/L.

Table 3. Biomass yields of the various simulations. The biomass amount after 24 hours is calculated with the logistic equation as explained earlier. Carbon source yield is the total gDW of biomass per gram of carbon source. The overall yield is gDW of biomass per grams of dry nutrients, so 17.3g of nutrients from the M9 minimal medium to which the mass of the carbon source is added. Note that all metabolites present in the medium are not necessarily uptaken in 24 hours. (GLC: D-glucose)

Organism Carbon Nitrogen Biomass after 24h Carbon source yield Overall yield
P. putida 4g/L TPA 0.337g/L NH4 4.458 gDW 1.115 gDW/g TPA 0.209 gDW/g
P. putida 10g/L TPA 0.9g/L NH4 11.408 gDW 1.141 gDW/g TPA 0.418 gDW/g
P. putida 5g/L EG 0.337g/L NH4 4.362 gDW 0.872 gDW/g EG 0.196 gDW/g
P. putida 15g/L EG 1.1g/L NH4 11.904 gDW 0.794 gDW/g EG 0.369 gDW/g
P. putida 15g/L GLC 1.1g/L NH4 20.943 gDW 1.376 gDW/g GLC 0.633 gDW/g
C. testosteroni 4g/L TPA 0.337g/L NH4 4.688 gDW 1.172 gDW/g TPA 0.220 gDW/g
C. testosteroni 10g/L TPA 1.1g/L NH4 12.476 gDW 1.248 gDW/g TPA 0.457 gDW/g
C. testosteroni 15g/L GLC 1.1g/L NH4 24.234 gDW 1.616 gDW/g GLC 0.733 gDW/g
R. opacus 7g/L TPA 0.337g/L NH4 6.308 gDW 0.901 gDW/g TPA 0.260 gDW/g
R. opacus 10g/L TPA 2.1g/L NH4 12.809 gDW 1.289 gDW/g TPA 0.472 gDW/g
R. opacus 15g/L GLC 2.1g/L NH4 24.311 gDW 1.621 gDW/g GLC 0.714 gDW/g
E. coli 5g/L EG 0.337g/L NH4 4.324 gDW 0.865 gDW/g EG 0.194 gDW/g
E. coli 15g/L EG 1.5g/L NH4 13.621 gDW 0.908 gDW/g EG 0.422 gDW/g
E. coli 15g/L GLC 1.5g/L NH4 17.094 gDW 1.140 gDW/g GLC 0.511 gDW/g
T. glycolicus 4g/L EG 0.337g/L NH4 6.151 gDW 1.538 gDW/g EG 0.289 gDW/g
T. glycolicus 15g/L EG 1.1g/L NH4 22.248 gDW 1.483 gDW/g EG 0.689 gDW/g
T. glycolicus 15g/L GLC 1.1g/L NH4 22.297 gDW 1.486 gDW/g GLC 0.674 gDW/g

Table 4. Analysis of carbon fluxes. The carbon flux is calculated by multiplying the flux of a carbon-containing molecule by its carbon number. For instance, a flux of 0.9217 mmol/gDWh TPA corresponds to a flux of 7.3737 mmol/gDWh carbon atoms. Carbon utilization is the percentage of carbon atoms that flow towards the biomass function. The rest is secreted out of the cell, mostly as CO2. Note the discrepancy between carbon utilization and CO2 emissions, this is because some of the solved flux states also secrete small amounts of carbon as part of other molecules. See the Jupyter notebook for details. (GLC: D-glucose)

Organism Carbon Nitrogen Carbon flux Carbon secretion Carbon utilization rate CO2 emissions
P. putida 4g/L TPA 0.337g/L NH4 7.376 mmol/gDWh 4.426 mmol/gDWh 40.0% 56.7%
P. putida 10g/L TPA 0.9g/L NH4 20.304 mmol/gDWh 12.629 mmol/gDWh 37.8% 51.1%
P. putida 5g/L EG 0.337g/L NH4 6.588 mmol/gDWh 3.639 mmol/gDWh 44.8% 55.2%
P. putida 15g/L EG 1.1g/L NH4 17.956 mmol/gDWh 9.366 mmol/gDWh 45.8% 54.2%
P. putida 15g/L GLC 1.1g/L NH4 15.054 mmol/gDWh 5.444 mmol/gDWh 63.8% 34.3%
C. testosteroni 4g/L TPA 0.337g/L NH4 6.992 mmol/gDWh 3.880 mmol/gDWh 44.5% 39.1%
C. testosteroni 10g/L TPA 1.1g/L NH4 20.304 mmol/gDWh 11.157 mmol/gDWh 45.1% 39.5%
C. testosteroni 15g/L GLC 1.1g/L NH4 13.116 mmol/gDWh 2.967 mmol/gDWh 77.4% 22.6%
R. opacus 7g/L TPA 0.337g/L NH4 14.208 mmol/gDWh 10.584 mmol/fDWh 25.5% 32.8%
R. opacus 10g/L TPA 2.1g/L NH4 20.304 mmol/gDWh 12.290 mmol/gDWh 39.5% 45.6%
R. opacus 15g/L GLC 2.1g/L NH4 18.780 mmol/gDWh 4.136 mmol/gDWh 78.0% 22.0%
E. coli 5g/L EG 0.337g/L NH4 6.392 mmol/gDWh 3.433 mmol/gDWh 46.3% 53.7%
E. coli 15g/L EG 1.5g/L NH4 20.140 mmol/gDWh 7.970 mmol/gDWh 60.4% 39.6%
E. coli 15g/L GLC 1.5g/L NH4 20.820 mmol/gDWh 8.515 mmol/gDWh 59.1% 40.9%
T. glycolicus 4g/L EG 0.337g/L NH4 4.594 mmol/gDWh 0.981 mmol/gDWh 78.6% 15.7%
T. glycolicus 15g/L EG 1.1g/L NH4 14.388 mmol/gDWh 3.073 mmol/gDWh 78.6% 15.8%
T. glycolicus 15g/L GLC 1.1g/L NH4 18.426 mmol/gDWh 7.099 mmol/gDWh 61.5% 16.8%

Figure 9 shows the modeled growth curves of the organisms on glucose and on the PET monomers. Unsurprisingly, growth rates and yields on glucose are better than with PET monomers, with the exception being T. glycolicus, for which the final yield is more or less the same.

Figure 9. Growth curves of all simulated organisms on glucose and the respective PET monomer carbon source.

As a bonus, we also briefly investigated modeling co-cultures growing on both TPA and EG at the same time, as in our proof-of-concept, we would have both metabolites as a carbon source simultaneously. In principle, it seems plausible that our engineered P. putida KT2440 and E. coli K-12 would be able to concurrently convert TPA and EG into biomass. See the brief modeling here.

Key takeaways

Like all modeling, flux balance analysis also makes many simplifications. The set of modeled reactions is most likely a subset of the reactions really present in each of the organisms. Furthermore, we assumed that maximum flux rates were limited by metabolite concentration, while in reality the flux rates are also limited by the rate at which metabolites can penetrate the cellular membrane. Nevertheless, the metabolites used are quite ordinary and should be able to enter the modeled organisms. TPA should be able to enter as all three organisms modeled have the tph operon either naturally or engineered. EG on the other hand should diffuse freely given its smaller size.

Even with the limitations, it is possible to give some answers to the questions posed at the beginning, bearing in mind that growth rates in reality will most likely be slower than modeled. For example, in a recent study where E. coli was engineered as done here, 12g/L of EG was experimentally determined to be completely uptaken in 47 hours4, which is slower than our modeling predicts.

Based on the modeling of iJN1462 and iAF1260 augmented with our assimilation pathways, biomass growth is predicted to occur. The growth rates with the PET monomers are smaller than with glucose, but seem somewhat comparable to natural options. However, the assimilation pathway for TPA could be improved, as CO2 emissions are higher than that of C. testosteroni, and R. opacus, even though all organisms initially use the same pathway for TPA assimilation.

Generally speaking, the prepared M9 minimal medium with trace metals should be sufficient for growing our microbes on alternate carbon sources, but additional nitrogen should be added. However, it is good to remember that especially when it comes to the models generated by gapseq, a minimal medium was provided, which affects the model building process. Additionally, in literature, it has been found that often small amounts of yeast extract or amino acids are required to kickstart growth on PET monomers4,10.

While growth is predicted for PET monomers, it is however clear that they are not as optimally utilized as glucose. This can be seen in Table 3, where the yields on glucose are better than the yields achieved with our alternate carbon sources. In Table 4, it furthermore is clear that less carbon from PET monomers is directed toward central metabolism, and instead gets secreted as CO2 when compared to glucose.

An exception to this rule is T. glycolicus which produces significantly less CO2 than other examined strains due to its anaerobic nature5. Furthermore, the biomass yields of T. glycolicus grown on EG are comparable to yields given by glucose. Despite these advantages, the general pathogenicity of T. glycolicus makes it unsuitable for our goals and thus was not experimented with in the wet lab. However, taking a closer look at the pathways it uses may be of value when considering future engineering plans.

References

  1. Schaerer LG, Wu R, Putman LI, Pearce JM, Lu T, Shonnard DR, Ong RG, Techtmann SM. Killing two birds with one stone: chemical and biological upcycling of polyethylene terephthalate plastics into food. Trends in Biotechnology. 2023;41(2):184–196. doi:10.1016/j.tibtech.2022.06.012
  2. Sasoh M, Masai E, Ishibashi S, Hara H, Kamimura N, Miyauchi K, Fukuda M. Characterization of the Terephthalate Degradation Genes of Comamonas sp. Strain E6. Applied and Environmental Microbiology. 2006;72(3):1825–1832. doi:10.1128/AEM.72.3.1825-1832.2006
  3. Salvador M, Abdulmutalib U, Gonzalez J, Kim J, Smith AA, Faulon J-L, Wei R, Zimmermann W, Jimenez JI. Microbial Genes for a Circular and Sustainable Bio-PET Economy. Genes. 2019;10(5):373. doi:10.3390/genes10050373
  4. Pandit AV, Harrison E, Mahadevan R. Engineering Escherichia coli for the utilization of ethylene glycol. Microbial Cell Factories. 2021;20(1):22. doi:10.1186/s12934-021-01509-2
  5. Hartmanis MGN, Stadtman TC. Diol metabolism and diol dehydratase in Clostridium glycolicum. Archives of Biochemistry and Biophysics. 1986;245(1):144–152. doi:10.1016/0003-9861(86)90198-0
  6. Zimmermann J, Kaleta C, Waschina S. gapseq: informed prediction of bacterial metabolic pathways and reconstruction of accurate metabolic models. Genome Biology. 2021;22(1):81. doi:10.1186/s13059-021-02295-1
  7. Jiang W, Abrar S, Romagnoli M, Carroll KC. Clostridium glycolicum Wound Infections: Case Reports and Review of the Literature. Journal of Clinical Microbiology. 2009;47(5):1599–1601. doi:10.1128/JCM.02411-08
  8. Kim SG, Summage-West CV, Reyna M, Feye KM, Foley SL. Complete Genome Sequence of Terrisporobacter glycolicus Strain WW3900, Isolated from Influent Wastewater at a Research Center with Multiple-Species Research Animal Facilities Newton ILG, editor. Microbiology Resource Announcements. 2022;11(11):e00859-22. doi:10.1128/mra.00859-22
  9. Gerritsen J, Fuentes S, Grievink W, Van Niftrik L, Tindall BJ, Timmerman HM, Rijkers GT, Smidt H. Characterization of Romboutsia ilealis gen. nov., sp. nov., isolated from the gastro-intestinal tract of a rat, and proposal for the reclassification of five closely related members of the genus Clostridium into the genera Romboutsia gen. nov., Intestinibacter gen. nov., Terrisporobacter gen. nov. and Asaccharospora gen. nov. International Journal of Systematic and Evolutionary Microbiology. 2014;64(Pt_5):1600–1616. doi:10.1099/ijs.0.059543-0
  10. Panda S, Fung VYK, Zhou JFJ, Liang H, Zhou K. Improving ethylene glycol utilization in Escherichia coli fermentation. Biochemical Engineering Journal. 2021;168:107957. doi:10.1016/j.bej.2021.107957
  11. Mückschel B, Simon O, Klebensberger J, Graf N, Rosche B, Altenbuchner J, Pfannstiel J, Huber A, Hauer B. Ethylene Glycol Metabolism by Pseudomonas putida. Applied and Environmental Microbiology. 2012;78(24):8531–8539. doi:10.1128/AEM.02062-12
  12. Werner AZ, Clare R, Mand TD, Pardo I, Ramirez KJ, Haugen SJ, Bratti F, Dexter GN, Elmore JR, Huenemann JD, et al. Tandem chemical deconstruction and biological upcycling of poly(ethylene terephthalate) to β-ketoadipic acid by Pseudomonas putida KT2440. Metabolic Engineering. 2021;67:250–261. doi:10.1016/j.ymben.2021.07.005
  13. Feist AM, Henry CS, Reed JL, Krummenacker M, Joyce AR, Karp PD, Broadbelt LJ, Hatzimanikatis V, Palsson BØ. A genome‐scale metabolic reconstruction for Escherichia coli K‐12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Molecular Systems Biology. 2007;3(1):121. doi:10.1038/msb4100155
  14. Nogales J, Mueller J, Gudmundsson S, Canalejo FJ, Duque E, Monk J, Feist AM, Ramos JL, Niu W, Palsson BO. High‐quality genome‐scale metabolic modelling of Pseudomonas putida highlights its broad metabolic capabilities. Environmental Microbiology. 2020;22(1):255–269. doi:10.1111/1462-2920.14843
  15. Marinos G, Kaleta C, Waschina S. Defining the nutritional input for genome-scale metabolic models: A roadmap Pagnani A, editor. PLOS ONE. 2020;15(8):e0236890. doi:10.1371/journal.pone.0236890
  16. Orth JD, Thiele I, Palsson BØ. What is flux balance analysis? Nature Biotechnology. 2010;28(3):245–248. doi:10.1038/nbt.1614
  17. Ebrahim A, Lerman JA, Palsson BO, Hyduke DR. COBRApy: COnstraints-Based Reconstruction and Analysis for Python. BMC Systems Biology. 2013;7(1):74. doi:10.1186/1752-0509-7-74 (Documentation on flux bounds: https://cobrapy.readthedocs.io/en/latest/media.html)
  18. Zachary A. King, Andreas Dräger, Ali Ebrahim, Nikolaus Sonnenschein, Nathan E. Lewis, and Bernhard O. Palsson (2015) Escher: A web application for building, sharing, and embedding data-rich visualizations of biological pathways, PLOS Computational Biology 11(8): e1004321. doi:10.1371/journal.pcbi.1004321
The header was generated using Escher.

tph operon in R. opacus DSM 43205

In addition to designing synthetic assimilation pathways for the bioconversion of PET monomers, we also sought to explore natural strains that could utilize EG and TPA. Here, we will focus on TPA assimilation, and more specifically, the tph operon from Comamonas sp. E6, which has been shown to allow for TPA assimilation1. This operon, together with a TPA transporter tpaK from Rhodococcus jostii RHA1 is what we also used in our synthetic operon design, inspired by recent literature2.

We had access to a strain of Comamonas testosteroni (HAMBI: 403) through the Microbial Domain Biological Resource Centre HAMBI at the University of Helsinki. Additionally, we were interested in experimenting with a strain of Rhodococcus Opacus likewise found at the collection (HAMBI: 2006), as some strains of R. opacus such as 1CP reportedly have the tph genes3.

The Rhodococcus strain we had access to was R. opacus DSM 43205, and luckily, a whole genome shotgun sequencing of the specific strain had very recently been completed. (GenBank: CAJUXZ000000000). tBLASTn was used to locate sequences from R. opacus DSM 43205 corresponding to the catalytic tph genes and the tpaK transporter. The matches are presented in Table 1.

Table 1. tBLASTn searches for the tph genes and tpaK all returned decent matches to the same contig in the WGS project.

Gene Protein NCBI no. % identity (query coverage) Contig
tphA1 TPADO reductase AAX18944 42.06% (100%) CAJUXZ010000001.1
tphA2 TPADO oxygenase large subunit AAX18941 68.41% (97%) CAJUXZ010000001.1
tphA3 TPADO oxygenase small subunit AAX18942 50.65% (99%) CAJUXZ010000001.1
tphB DCD dehydrogenase AAX18943 46.74% (97%) CAJUXZ010000001.1
tpaK Probable terephthalate transporter ABG99225 87.80% (100%) CAJUXZ010000001.1

Next, ORFs around the hits were extracted, and alignments were created on the amino acid level as shown in Figure 1. As can be seen, tpaK is unsurprisingly highly conserved, as it originates from a closely related strain. As for the catalytic tph genes, there are clearly similarities, and highly conserved regions, but quite a bit of divergence too.

Figure 1. The catalytic tph genes, and the tpaK transporter from Comamonas E6, and R. jostii RHA1 respectively, aligned against tBLASTn hits from R. opacus DSM 43205 on the amino acid level.

At this point, we had gained enough confidence in R. opacus DSM 43205 to include it in our experiments. As a final test, we took the ORFs around our tBLASTn hits in R. opacus, and folded the sequences with ColabFold4 on computing resources provided to us by CSC. The predicted folds were then compared to structures of the tph products, and given the small root mean square deviation (RMSD), we found it likely that they would have a similar function to the genes in Comamonas sp. E6. Table 2 lists the reference structures against which the predicted folds were aligned with PyMOL, while Figures 2 to 5, show the alignments.

Table 2. Protein structures against which the predicted structures of the tph products from R. opacus DSM 43205 were aligned.

Protein UniProt id Structure id Structure souce
tphA1II Q3C1D2 AF-Q3C1D2-F1 AlphaFold
tphA2II Q3C1D5 7Q06 domain e7q06D(1-3) PDB
tphA3II Q3C1D4 7Q06 domain e7q06A1 PDB
tphBII Q3C1D3 AF-Q3C1D3-F1 AlphaFold

Figure 2. 3D alignment of predicted tphA1II structure from R. opacus DSM 43205 (green) against Comamonas sp. E6 tphA1II AlphaFold prediction (cyan). RMSD: 1.172Å.

Figure 3. 3D alignment of predicted tphA2II structure from R. opacus DSM 43205 (green) against Comamonas sp. E6 tphA2II from crystal structure 7Q06. Domain D1 (cyan), domain D2 (magenta), domain D3 (orange). RMSD: D1: 0.461Å, D2: 0.622Å, D3: 0.595Å

Figure 4: 3D alignment of predicted tphA3II structure from R. opacus DSM 43205 (green) against Comamonas sp. E6 tphA3II from crystal structure 7Q06 chain A (cyan). RMSD: 0.568Å

Figure 5: 3D alignment of predicted tphBII structure from R. opacus DSM 43205 (green) against Comamonas sp. E6 tphBII AlphaFold prediction (cyan). RMSD: 1.141Å

While the RMSD values for the single-chain alignments are indeed small, it is good to remember that for example the protein products from tphA2 and tphA3 form a heterohexamer5, and thus solely examining the single-chain structure only tells one part of the story. Due to time constraints, we were not able to continue exploring this aspect by conducting docking or molecular dynamics simulations.

References

  1. Sasoh M, Masai E, Ishibashi S, Hara H, Kamimura N, Miyauchi K, Fukuda M. Characterization of the Terephthalate Degradation Genes of Comamonas sp. Strain E6. Applied and Environmental Microbiology. 2006;72(3):1825–1832. doi:10.1128/AEM.72.3.1825-1832.2006
  2. Werner AZ, Clare R, Mand TD, Pardo I, Ramirez KJ, Haugen SJ, Bratti F, Dexter GN, Elmore JR, Huenemann JD, et al. Tandem chemical deconstruction and biological upcycling of poly(ethylene terephthalate) to β-ketoadipic acid by Pseudomonas putida KT2440. Metabolic Engineering. 2021;67:250–261. doi:10.1016/j.ymben.2021.07.005
  3. Salvador M, Abdulmutalib U, Gonzalez J, Kim J, Smith AA, Faulon J-L, Wei R, Zimmermann W, Jimenez JI. Microbial Genes for a Circular and Sustainable Bio-PET Economy. Genes. 2019;10(5):373. doi:10.3390/genes10050373
  4. Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold: making protein folding accessible to all. Nature Methods. 2022;19(6):679–682. doi:10.1038/s41592-022-01488-1
  5. Kincannon WM, Zahn M, Clare R, Lusty Beech J, Romberg A, Larson J, Bothner B, Beckham GT, McGeehan JE, DuBois JL. Biochemical and structural characterization of an aromatic ring–hydroxylating dioxygenase for terephthalic acid catabolism. Proceedings of the National Academy of Sciences. 2022;119(13):e2121426119. doi:10.1073/pnas.2121426119
  6. Zachary A. King, Andreas Dräger, Ali Ebrahim, Nikolaus Sonnenschein, Nathan E. Lewis, and Bernhard O. Palsson (2015) Escher: A web application for building, sharing, and embedding data-rich visualizations of biological pathways, PLOS Computational Biology 11(8): e1004321. doi:10.1371/journal.pcbi.1004321
The header was generated using Escher.