Software

I. FSEOF Modification

The existing implementation of FSEOF leaves room for improvement, both in its inaccurate mathematical interpretation of the original research paper and in its limited flexibility for tailoring the algorithm to our specific needs.

The FSEOF algorithm is based on an important presupposition about the enforced objective: the enforced reaction is assumed to compete with the objective (biomass) reaction.

The enforced objectives are calculated from the following optimization problem:

Maximize $v_{\text{biomass}}$

Subject to, for a fixed $n \in \mathbb{N}$ and each $k \in \mathbb{N}$ with $k < n$: $v_{\text{product}}^{\text{enforced}} = v_{\text{product}}^{\text{initial}} + \frac{k}{n}(v_{\text{product}}^{\text{max}} - v_{\text{product}}^{\text{initial}})$

$\sum_{j = 1}^{|N|} S_{ij}v_j = 0 \quad \forall i \in M$

We have the following:

If $v_{\text{product}}^{\text{max}} = v_{\text{product}}^{\text{initial}}$, then the constraint reduces to $v_{\text{product}}^{\text{enforced}} = v_{\text{product}}^{\text{initial}}$ because the second term vanishes. The "enforced" flux thus becomes meaningless, since we are simply enforcing it to equal itself. Furthermore, if we try to enforce the flux not up to the maximal possible flux through the enforced reaction, but only up to a realistic upper bound (approximately 90\% of the maximum, according to ^[1] ), the algorithm fails:

If we constrain $v_{\text{product}}^{\text{enforced}} = v_{\text{product}}^{\text{initial}} + \frac{k}{n}(v_{\text{product}}^{\text{max}} \cdot 0.9 - v_{\text{product}}^{\text{initial}})$ and $v_{\text{product}}^{\text{max}} = v_{\text{product}}^{\text{initial}}$, then for $k > 0$ and a positive flux we have $\frac{k}{n}(v_{\text{product}}^{\text{max}} \cdot 0.9 - v_{\text{product}}^{\text{initial}}) < 0$.

Therefore, the algorithm would inadvertently enforce a decreasing flux instead of an increasing one. The predicted targets for over-expression would thus not only be biologically irrelevant, but also misleading.
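The two failure modes above can be reproduced numerically. The following is a minimal sketch, not the actual FSEOF code: the function name `enforced_levels`, the `scale` parameter and the flux value of 10 are hypothetical, and we assume $k$ runs over $0, \dots, n-1$.

```python
def enforced_levels(v_initial, v_max, n, scale=1.0):
    """Enforced fluxes v_initial + (k/n) * (scale * v_max - v_initial), k = 0..n-1.

    `scale` is a hypothetical parameter: 1.0 reproduces the classical scan,
    0.9 caps the scan at 90% of the maximum flux.
    """
    return [v_initial + (k / n) * (scale * v_max - v_initial) for k in range(n)]


# Synergistic case: the initial flux already equals the maximum (made-up value).
flat = enforced_levels(v_initial=10.0, v_max=10.0, n=5)
capped = enforced_levels(v_initial=10.0, v_max=10.0, n=5, scale=0.9)

print(flat)    # every level equals the initial flux
print(capped)  # levels fall below the initial flux as k grows
```

In the first case the scan never moves away from the initial flux, and in the second it moves in the wrong direction.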

If the maximum flux through the enforced reaction is the same as the initial flux, the enforced reaction does not compete against growth but rather complements it. In other words, the enforced reaction is synergistic with biomass. This makes sense especially in our case, since the reaction we are enforcing is a consumption reaction: specifically, the consumption of a carbon source that is used in E. coli's central carbon metabolism. An antagonistic enforced objective would have the opposite effect. We consider this a significant shortcoming of the classical FSEOF, as it fails to take synergistic objectives into consideration.

To address the synergistic enforced objective, we set $v_{\text{product}}^{\text{initial}}$ to zero in the case where $v_{\text{product}}^{\text{initial}} = v_{\text{product}}^{\text{max}}$. Therefore, the flux is gradually increased from 0 to the maximum. This is rationalized as follows: Even if methanol consumption and biomass are synergistic with each other, there are several reactions in the metabolic network whose flux necessarily increases as methanol consumption increases, but not necessarily as biomass increases. These are the reactions we want to uncover with this fix. For example, flux through the Ribose-5-phosphate isomerase reaction is associated with high methanol consumption (more negative methanol exchange fluxes); nevertheless, high fluxes through this reaction are also associated with smaller biomass fluxes.
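A minimal sketch of this fix is shown below. The function name `synergy_aware_levels`, the tolerance `tol` used to decide that the two fluxes are equal, and the example flux values are all assumptions made for illustration.

```python
import math


def synergy_aware_levels(v_initial, v_max, n, tol=1e-9):
    """Enforced flux levels for k = 0..n-1, with the synergistic case handled.

    If the initial flux already equals the maximum (within the hypothetical
    tolerance `tol`), the scan starts from zero instead.
    """
    if math.isclose(v_initial, v_max, rel_tol=0.0, abs_tol=tol):
        v_initial = 0.0
    return [v_initial + (k / n) * (v_max - v_initial) for k in range(n)]


# Synergistic consumption reaction (made-up uptake flux of -8):
# the scan now runs from 0 toward the most negative flux.
print(synergy_aware_levels(v_initial=-8.0, v_max=-8.0, n=4))

# Ordinary competing case: the schedule is unchanged.
print(synergy_aware_levels(v_initial=2.0, v_max=10.0, n=4))
```

Because the schedule is written in terms of $v_{\text{product}}^{\text{max}}$ itself, the same function covers consumption reactions once the maximum is taken to be the most negative feasible flux.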

Another aspect of our particular case that the classical FSEOF fails to consider is that, in genome-scale metabolic models (GSMMs), consumption reactions are usually set to have negative fluxes. The more negative the flux through a consumption reaction, the more metabolite is consumed. Hence, the way $v_{\text{product}}^{\text{max}}$ is calculated must change: the maximum flux is the most negative one, so the optimization problem must be phrased differently.

To address this issue, our algorithm changes the calculation of $v_{\text{product}}^{\text{max}}$ if $v_{\text{product}}$ is a consumption reaction. In that case, $v_{\text{product}}^{\text{max}}$ is calculated as follows:

Minimize $v_{\text{product}}$

Subject to:

$\sum_{j = 1}^{|N|} S_{ij}v_j = 0 \quad \forall i \in M$

$v_j^\alpha \leq v_j \leq v_j^\beta$

where $v_j^\alpha$ and $v_j^\beta$ are the lower and upper bounds on the flux $v_j$ through reaction $j$. For irreversible consumption reactions, $v_j^\beta = 0$, and for irreversible production reactions, $v_j^\alpha = 0$.

Lastly, the classical FSEOF does not provide a very fine ranking of overexpression targets. We therefore implemented a rough measure of how much the flux through a given reaction increases or decreases as we scan through the enforced objectives: a simple linear regression of the reaction's flux against the enforced flux, together with its associated Pearson correlation coefficient ($r$). The steeper the slope and the higher the $r$ value, the better a candidate the reaction is for overexpression.
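The scoring can be sketched in plain Python as follows. This is an illustration rather than our exact implementation: the function names, the reaction identifiers `R1` to `R3`, the flux values, and the choice to sort by $r$ first and slope second are all assumptions. The enforced levels are given as magnitudes so that a positive slope means "more flux as more substrate is consumed".

```python
import math


def slope_and_r(x, y):
    """Least-squares slope of y on x and the Pearson correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    # A constant flux has no defined correlation; report 0.0 in that case.
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, r


def rank_targets(enforced, flux_by_reaction):
    """Rank reactions by (r, slope), highest first (an illustrative choice)."""
    scored = {rid: slope_and_r(enforced, fluxes)
              for rid, fluxes in flux_by_reaction.items()}
    return sorted(scored.items(),
                  key=lambda item: (item[1][1], item[1][0]),
                  reverse=True)


# Made-up scan: four enforced levels and the flux of three reactions at each.
enforced = [0.0, 1.0, 2.0, 3.0]
fluxes = {
    "R1": [0.0, 2.0, 4.0, 6.0],  # rises with the enforced flux
    "R2": [5.0, 5.0, 5.0, 5.0],  # unaffected
    "R3": [3.0, 2.0, 1.0, 0.0],  # falls with the enforced flux
}
ranking = rank_targets(enforced, fluxes)
for reaction_id, (slope, r) in ranking:
    print(reaction_id, slope, r)
```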

II. Knockout Algorithm

To search for gene knockout candidates, we begin with the following motivation. Suppose there are two individuals of T-B18 with growth rates $\rho$ and $\rho^*$, where $\rho < \rho^*$. For a reaction $r$, let $v_{r}$ and $v_{r}^*$ denote the fluxes through $r$ in the slower-growing and the faster-growing individual, respectively. If

\[ \left|v_{r}\right| \gg 0 \quad\text{and}\quad \left|v_{r}^*\right| \approx 0 \]

then the reaction $r$ might be a good knockout target. Intuitively, $r$ exhibits high flux in the slower-growing individual but low flux in the faster-growing one. Then, knocking it out may encourage the bacteria to adopt a more optimal metabolic flux distribution. We can turn this into gene-level scores by considering for each gene $g$:

\[ s(g) = \max\{|v_r| \mid r \text{ is a reaction associated with } g\} \]

and defining $s^*(g)$ analogously. That is, we desire that $s(g) \gg 0$ and $s^*(g) \approx 0$.
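The gene-level scores can be sketched as below. The function names, the gene and reaction identifiers, the flux values, and the cut-offs `high` and `low` that stand in for "$\gg 0$" and "$\approx 0$" are hypothetical; the text does not fix numerical thresholds.

```python
def gene_scores(fluxes, gene_to_reactions):
    """s(g): the largest absolute flux among the reactions associated with g."""
    return {
        gene: max((abs(fluxes.get(rid, 0.0)) for rid in reactions), default=0.0)
        for gene, reactions in gene_to_reactions.items()
    }


def knockout_candidates(slow_fluxes, fast_fluxes, gene_to_reactions,
                        high=1.0, low=1e-6):
    """Genes with s(g) >= high in the slow grower and s*(g) <= low in the fast one.

    `high` and `low` are hypothetical cut-offs chosen for this example.
    """
    s = gene_scores(slow_fluxes, gene_to_reactions)
    s_star = gene_scores(fast_fluxes, gene_to_reactions)
    hits = [g for g in s if s[g] >= high and s_star[g] <= low]
    return sorted(hits, key=lambda g: s[g], reverse=True)


# Made-up gene-reaction associations and flux distributions.
gene_to_reactions = {"geneA": ["R1", "R2"], "geneB": ["R3"], "geneC": ["R4"]}
slow = {"R1": -5.0, "R2": 0.1, "R3": 2.0, "R4": 0.0}  # growth rate rho
fast = {"R1": 0.0, "R2": 0.0, "R3": 1.5, "R4": 3.0}   # growth rate rho*

print(gene_scores(slow, gene_to_reactions))
print(knockout_candidates(slow, fast, gene_to_reactions))
```

Only `geneA` qualifies here: its reactions carry substantial flux in the slower-growing individual and none in the faster-growing one.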

Implementation. We use the same T-B18 model and media as described above. We set $\rho = 0.7\mu$ and $\rho^* = 0.9\mu$, where $\mu$ is the optimal simulated growth rate under FBA. An important technicality is that the preceding discussion presupposes a single flux distribution for each individual, whereas the flux distribution satisfying the FBA constraints at a given growth rate is rarely unique. Hence, we require another heuristic: the metabolic flux of T-B18 growing on methanol should be minimally rerouted from that of native E. coli, or of T-B18 growing on glucose. Specifically, consider the individual with growth rate $\rho$. Using parsimonious enzyme usage FBA ^[2] , we first obtain the flux distribution $\mathbf{v}_0$ of T-B18 growing at rate $\rho$ on the same media, except with glucose in place of methanol. Then, we solve for the fluxes $\mathbf{v}$ of T-B18 on the original media, taking the solution that minimizes the flux distance $\|\mathbf{v} - \mathbf{v}_0\|_1$. This is done with the linear minimization of metabolic adjustment (linear MOMA) algorithm ^[3] . We perform the same process for the faster-growing individual.
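For reference, minimizing an $\ell_1$ distance can be written as a linear program using the standard absolute-value reformulation. The version below is a sketch under two assumptions that the text does not state explicitly: the growth rate is fixed by an equality constraint, and auxiliary variables $d_j$ (our notation) bound the deviation of each flux from its reference value $v_{0,j}$, the $j$-th component of $\mathbf{v}_0$.

\[ \min_{\mathbf{v},\, \mathbf{d}} \; \sum_{j = 1}^{|N|} d_j \quad \text{subject to} \quad \sum_{j = 1}^{|N|} S_{ij} v_j = 0 \;\; \forall i \in M, \quad v_j^\alpha \leq v_j \leq v_j^\beta, \quad v_{\text{biomass}} = \rho, \quad -d_j \leq v_j - v_{0,j} \leq d_j \;\; \forall j \in N \]

At the optimum each $d_j$ equals $|v_j - v_{0,j}|$, so the objective value is exactly $\|\mathbf{v} - \mathbf{v}_0\|_1$.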

[1] Hyung Seok Choi, Sang Yup Lee, Tae Yong Kim, and Han Min Woo (2010). In Silico Identification of Gene Amplification Targets for Improvement of Lycopene Production. Applied and Environmental Microbiology, 76(10), 3097–3105.

[2] Nathan E Lewis, Kim K Hixson, Tom M Conrad, Joshua A Lerman, Pep Charusanti, Ashoka D Polpitiya, Joshua N Adkins, Gunnar Schramm, Samuel O Purvine, Daniel Lopez-Ferrer, Karl K Weitz, Roland Eils, Rainer König, Richard D Smith, and Bernhard Ø Palsson (2010). Omic data from evolved E. coli are consistent with computed optimal growth from genome-scale models. Molecular Systems Biology, 6(1). doi: 10.1038/msb.2010.47.

[3] Jan Schellenberger, Richard Que, Ronan M T Fleming, Ines Thiele, Jeffrey D Orth, Adam M Feist, Daniel C Zielinski, Aarash Bordbar, Nathan E Lewis, Sorena Rahmanian, Joseph Kang, Daniel R Hyduke, and Bernhard Ø Palsson (2011). Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nature Protocols, 6(9), 1290–1307. doi: 10.1038/nprot.2011.308.