Although immunoglobulins are produced by B cells, Ig synthesis and release depends on the reciprocal interaction of many different cell types of the immune system. Induction of the innate response is essential, as well as the activation and proliferation of T cells and B cells.
However, to evaluate the Ig response, we focused on T cells and B cells, mainly because the innate cellular response counteracts pathogens at a predominantly local level, and leads to the eventual elimination of them in a time frame probably longer compared to the fast neutralization of the humoral response.
The software suggested by professor Viola[1] required us to get in touch with the owner company, explaining our needs and not being sure if they could be satisfied on time. For this reason, we opted for other tools more quick to use, but still as reliable as possible.
For this purpose, we used the Immune Epitope Database & Tools (IEDP)[2] and the epitope prediction algorithm associated.
First of all, we researched if there were any known linear epitopes of any M13 deposited protein in the database. The IEDB database had only G3P and G8P proteins registered, i.e. we could not assess the immunogenicity of the other proteins. The specified parameters of our query are listed below.
According to the software, the only immunogenic linear epitope known of the virus is TFMYVFSTF[3], which is a fragment (407-415aa) of the G3P protein.
After the research among known data, we proceeded with a new query using prediction softwares listed in the IEDB website to look for the most probable not-known epitopes of the viral proteins. Since there were many different software, we selected a few of them based on our purpose and interests.
Firstly, we looked into how the phage proteins would be processed by the immune system.
The MHC class I complex is present on the surface of almost any human cell, except for immune cells. It binds intracellular pathogen peptides and exposes them to CD8+ T cells. Although MHC-I analysis is not very relevant for our project since the phage does not infect any human cell (i.e. it is very unlikely that a phage peptide would be exposed on MHC-I), we decided to look for MHC-I epitopes anyway because of the much higher accuracy of those predictions, compared to the MHC-II epitopes predictions.
NetCTL[4][5] is a predictor of T cell epitopes along a protein sequence. This software requires a protein sequence as input data, giving in output any possible linear epitope with relative MHC binding affinity, cleavage affinity, TAP transport score and an overall prediction score. Thus, we use the software to select the most probable epitopes exposed on MHC-I proteins on the cell surface. Since the algorithm required to select an HLA supertype family, here is a table that lists HLA supertype classification[6]. We used the default A1 supertype value.
Parameters used in the queries:
Explaining output values:
G3P protein
G6P protein
G7P protein
G8P protein
G9P protein
The MHC class II complex is present on the surface of antigen presenting cells. It binds internalized pathogen peptides and exposes them to CD4+ T cells. Since the phage tropism spares with high confidence any human cells, MHC-II antigen processing and binding affinity is the focus of our interest.
We analyzed the MHC-II processing of the protein, using MHC-II binding prediction software[7-17].
Since the MHC-II binding epitopes prediction tool is a bit more complex than the previous one, we add some more explanation on how it works for a more comprehensive reading the tool:
Parameters used in the queries:
The predicted output is given in units of IC50 (nM) for the combinatorial library and SMM_align. Therefore a lower number indicates higher affinity. As a rough guideline, peptides with IC50 values <50 nM are considered high affinity, <500 nM intermediate affinity and <5000 nM low affinity. Most known epitopes have high or intermediate affinity. Some epitopes have low affinity, but no known T-cell epitope has an IC50 value greater than 5000.
For each peptide, a percentile rank for each of the three methods (combinatorial library, SMM_align and Sturniolo) is generated by comparing the peptide's score against the scores of five million random 15 mers selected from SWISSPROT database. And the adjusted percentile rank is the percentile rank adjusted based on the frequency of peptide lengths.
A small numbered percentile rank indicates high affinity. The median percentile rank of the three methods were then used to generate the rank for consensus method. The data for percentile rank was updated in the release 2.22 by recalculating the percentile data with a consistent sample peptides datasets for all the methods.
The cutoff of percentile rank was set on 10.0 (percentile rank ≤ 10.0), as the tool instructions suggested. Since there are several possible alleles for MHC-II complex, the number of results is very high: each predicted epitope can bound more than one couple of alleles, and vice versa.
Proteins analyzed, with no. of results in brackets:
We then used another software to assess what linear epitopes of the phage proteins are immunogenic for T cells. For this purpose, we used the CD4 T cell immunogenicity prediction software[18][19], mainly because it allowed us to assess the immunogenicity for all seven most common alleles in the population[20] for each query. The software requires the protein sequence as input; the user can predict the T cell immunogenicity using the 7-allele method[20], immunogenicity method and combined method (IEDB recommended). The combined method predicts the final score that combines the predictions from the 7-allele method and immunogenicity method. In the results, the "Immunogenicity Score" ranges from 0 to 100, with low values identifying more immunogenic peptides and high values non-immunogenic peptides. And the "Combined Score" is given by combining both HLA binding and the immunogenicity prediction which presumably incorporate the capacity of being recognized by TCR. And low value identifying high capacity of being recognized by TCR.
Parameters used in the queries:
Proteins analyzed, with no. of results in brackets:
The G7P protein query was negative, as it did not show any results.
In order to reach the highest reliability possible, we then compared the results of the two softwares (MHC-Il binding prediction tool and CD4 T cell immunogenicity prediction tool), and thus the predicted epitope through their respective parameter (MHC-II binding prediction and T-cell immunogenicity prediction).
Our assumption was that the two software evaluate two different processes of the human immune response. On one hand we have MHC processing, on the other hand we have the binding affinity between epitopes and T cell receptors, thus the grade of probability to activate the T cells. In fact, it was our belief that a comparative analysis between the results of two software could lead to a more reliable prediction.
The comparative analysis was conducted as follows:
G3P protein
G6P protein
G8P protein
G8P protein
We used two different softwares to predict B cell epitopes namely DiscoTope and Elliscope. Whereas the former predicts only three dimensional epitopes, with the latter we were able to predict linear epitopes too.
DiscoTope[21][22] is a method for predicting discontinuous epitopes from 3D structures of proteins in PDB format. The chain ID for all the viral proteins was the same: A. The used version of the software was 2.0.
Since the 3D protein structure is needed, we could analyze G3P and G8P only.
Explaining output values:
G3P protein
G8P protein
ElliPro[23] predicts linear and discontinuous antibody epitopes based on a protein antigen's 3D structure. ElliPro accepts as an input protein structures in PDB format.
Parameters used for the queries:
G3P protein
Predicted linear epitopes:
Predicted discontinuous epitopes:
Linearized epitopes:
We also tried to analyze G8P, but the tool did not work for still unknown reasons, even after several attempts.
Allergic reactions can cause significant damage in individuals, especially if occure outside the hospital. For this reason, we designed an allergic reaction analysis of the phage proteins. Our idea was to use the data obtained from the immunogenicity essay to predict the probability of an allergic reaction to the phage.
To do so, we first search in the IEDB database for any known allergies to the phage, i.e. if the only certain epitope of the phage (G3P). The query had no positive results.
At this point, we tried to analyze the predicted epitopes with the previous softwares.
For T cell epitopes, we decided to search for the one we found in both the MHC-II binding prediction tool and the CD4+ T cell binding prediction tool, i.e. the ones resulted from the comparative analysis between the two softwares.
As for the parameters, we applied the following:
G3P protein
G6P protein
G8P protein
G9P protein
Since DiscoTope results are shown as a DiscoTope score assigned to each amino acid, we could not rebuild the discontinuous epitopes easily. Thus, for B cell epitopes, we started analyzing the linear epitopes obtained from ElliPro. As a matter of fact, the prediction principle relies on two possible assumptions.
Following the composition theory assumption, we search in the database every linear epitope obtained from ElliPro analysis of G3P protein.
G3P protein
Linear epitopes:
Following the position theory assumption, we search in the database every discontinuous epitope obtained from ElliPro analysis of G3P protein. Since the IEDB database can only get linear epitopes as input, we treated discontinuous epitopes as they were, precisely, linear. We based our research on the belief that in 3-dimensional space the discontinuous epitopes behave like linear ones due to the proximity (d≤8Å) of their residues.
Linearized discontinuous epitopes:
Finally, we repeated the same process for the autoimmunity risk. One other important risk to evaluate when developing a therapeutic agent is, in fact, the phenomenon of molecular mimetism, i.e. the fact that the therapeutic agent can be similar (in terms of chemical composition, 3D structure, overall charge distribution) to a human protein, thus carrying the risk of generating an autoimmune disease.
We made a research of both peptidic and non-peptidic material of M13 for homology in any human autoimmune disease epitopes registered in the database, with no results. We applied the same parameter as for the allergic reaction query, selecting “autoimmune” instead of “allergy”. The result was still negative.
We searched in the database if there were any known autoimmune disease epitopes with a certain homology to our T and B cell predicted epitopes.
G3P protein
G6P protein
G8P protein
G9P protein
Linear epitopes:
Linearized discontinuous epitopes:
Professor Viola made another important statement. Even if the phage is immunogenic, there is the possibility to make it less visible to the immune system. Besides the many different delivery methods to hide phage particles before they reach the target site, there is another kind of mimetism which depends directly on the amino acid composition of the coat proteins. As a matter of fact, it is possible to slightly change the amino acid sequence of the most immunogenic epitopes to make them less immunogenic. One tool to achieve this is Deimmunization[24]. Due to a time limit of the project, we decided not to look into this software.
We have tried to draft a mathematical model which recapitulates the expected mechanism of infection that the system should undergo. Namely, a pool of engineered non-lytic phages infecting a population of AMR bacteria, transfecting a CRISPRi machinery, in turn restoring susceptibility to antibiotics. We have divided the modeling aspects into 3 parts:
This has allowed us to create a nested model focusing on single decoupled aspects; this approach will enable future teams to re-use and adapt the model at their will, changing modeling strategies and structure without re-drafting the whole system.
In a general environment, we can assume that bacteria grow following a chemostat-like abstraction: the number of bacteria x depends on the nutrients abundance N and on a washout w. The latter can arise from washing, defecating, coughing or any other action or external source exerted on the colony or culture which can lead to the removal of part of the bacterial population. For the sake of simplicity, any other effect associated with bacterial basal death rate - apart from antimicrobial actions - will be assumed as included in this constant.
While the washout w can be fairly modeled as a first order kinetic, the dependence on nutrients can be modeled in several ways; neglecting possible space constraints, the most suitable model that we can adopt is probably through the Monod equation:
In this framework the growth is limited by the amount of nutrients through the parameter N (and the half growth constant Ks which correspond to the amount of nutrient for which the growth is halved). It is worth noting that under the assumption of nutrients overabundance (N→∞) this equation collapses in a classical exponential growth.
This result in a final form:
with ɑ(t) generic function describing the nutrients supply Nin distribution over time and ε conversion constant from bacterial to nutrient concentration.
Once the AMR bacterial population evolution is described, it is now necessary to model the delivery of the CRISPRi system and its effect on the bacterial cells. Different approaches and possible similar systems are reported in literature [1-3]. However, we decided to explore a different interesting modeling approach based on the exploitation of delayed differential equations (DDE). This representation could catch realistic interesting scenarios since, once infected by phages, the cells require time to produce the CRISPRi system and thus become sensitive. However, not all the cells turn to be sensitive at the same time; during this delay, mutations could occur, leading to a decrease in the efficiency of the system. This type of modeling can be implemented through transit compartmental models (TCM), a type of modeling inherited from pharmacokinetics [4]. Through these models, the population can be divided into different subpopulations, depending on their state (or stage of infection) and the passage from a state to another one can be modeled as rates [5].
Briefly, assuming the existence of n intermediate transitory possible states xi and considering the average time τ to reach the final state xn+1, the rates k (also called latency, assumed equal between states) can be computed as:
Conveniently, depending on the number of intermediate states, the system can evolve with a first order dynamic (n=0) up to a step-like one (n→∞).
Moving to a general version of our model, (neglecting for now the cell growth and the washout, for the sake of notation readability), we can call:
with δ absorption rate of phages in the bacterium (a second order dynamic with unit [mL/(#*h)] as reported in [6]) and bs burst size, which is the number of phages released by an infected cell during the lytic cycle. Since our system does not involve lytic phages but only engineered temperate ones with a single lysogenic cycle, the system can be be simplified in two versions:
Administration of isolated phages
Administration of probiotics P producing phages
Since the second version could lead to safety concerns, from now on we will take into consideration only the model 1, which is how PASTA has been conceived. By adopting the TCM formulation, the system can be expanded as:
with:
For the sake of clarity, the model 2 would have required additional division in more stages also for P, with phages burst only at the final stage.
Assuming the possibility that both the CRISPRi system passed by the phage or the target DNA can mutate, we can describe the following possible states:
where:
The whole system can be therefore written as:
with
This system allows the designer to recapitulate the desired system in a biologically plausible framework; moreover, its structure enables a detailed analysis for fine tuning or exploration of the designed platform boundaries by varying the parameters and the number of intermediate states.
For the sake of simplicity, we have posed some assumption to our model: Saturation of nutrients (N→∞)
Additionally, we have introduced a first order kinetic A to describe the action of an antibiotic (constantly administered) on pathogens with restored susceptibility.
Under this assumptions, the simplified model can be written as:
To simulate our system, we have set the following parameters from literature:
while the initial amount of target pathogen T0 was set to the effect of the antibiotic A and the initial amount of phages F0 were the actual input of the simulations, thus were explored over a wide range of possible values, as reported in the following figures; all the other initial conditions for the different states were set to zero. Additionally, the amount of killed bacteria were shown and computed as A•Txr.
Simulations were carried out over a time frame of 10h, which was the maximum time at which all the phages were almost transfected (or washed out). Indeed, longer time would have the bias of a mixed population of mutants and non-transfected cells growing again without any therapeutic action.
We developed a script in MatLab to understand how PASTA could be affected by tunable aspects such as the dosage of phages F(t0) and the killing effect of the antibiotic A. We set the initial amount of bacteria to be targeted equal to 5•107; meanwhile, for the two variables, we covered a wide range of possible values, namely:
In the simulations that we have run, we calculated the number of killed bacteria using the MatLab function trapz, integrating the vector that describes the trend of Txr over time (i.e., the result from the ODE solver ode15s), multiplied by the antibiotic killing rate A (called “Killed”). We also calculate the amount of the bacteria that remain resistant to the antibiotic after the same time span (due to mutations in either the target DNA or the CRISPRi machinery) and so are still alive (called “Escapers”); to do this we integrated the sum of the variables M, Ty, Mx, My, TRy, MRx, MRy.
We can see the trend of these two lumped variables in the heatmap at the top of the figure. First of all, we can observe that the TCM ODE model was implemented properly, indeed a shift in the curves that were supposed to have a delay during the evolution of the system can be clearly observed. This is proof that this model is a valuable and easy to use modeling methodology for this kind of system.
As envisaged, the model demonstrates that there is an increase of killed bacteria and a decrease of escapers for increasing antibiotical killing rate and the amount of initial phage is rising. More interestingly, it can also be observed that when the antibiotical rate of killing is bigger than 10 h-1 and the initial number of phages is bigger than 106 #/mL we can appreciate that there is a high number of killed bacteria and only a few escapers, which suggest the existence of minimal thresholds for the design of the treatment.
In the model we also compute the number of mutants that can be formed, we consider a possible mutation in the gene target in the CRISPRi circuitry, or both. From the graphs in the figure, it is clear that the number of mutants is not negligible. As we can see in the Ty, Mx, My and M graphs, at the end of the simulation there is a rapid increase of these variables. It is worth noting that we considered mM = my , this means that the mutation rate of the plasmid and the genome are the same. Actually, this can be considered a worst-case scenario, since a single point mutation in the CRISPRi system (as well as in the target DNA) does not necessarily imply a complete disruption of the repression action [21].
Here the developed model implemented in Matlab:
An important aspect in the characterization of novel parts, is the choice of a reference to adopt for normalizing data, thus results can be shared and compared reliably between laboratories with different instrumentations. As proposed in Kelly et al.[1], genetic circuits (more precisely, promoter strengths) can be compared by normalizing the computed Scell for a reference strain bearing:
Additionally, we also hypothesized that an additional aspect to be considered could have been a measurement bias rising from the possible different relationship between OD600 measured and the actual CFU of the culture; indeed, some of the strains used in our study displayed a slightly mucoid phenotype.
For this reason, liquid cultures of 3 different bacterial species used in this study were analyzed by measuring samples OD600 and plating proper dilutions to perform colony counting. The results of the analysis are reported in the following table:
The graph above represents the fitting of the linear model. There are the values of the measurements, the line that describes the trend of the OD vs CFU, and the area between the dashed lines that describe the confidence interval. The confidence interval is the area where there is a high probability of finding unknown values.
The equation describing the linear relationship between OD600 and CFU (imposing a null intercept) for a certain species x is given by:
This means that two species having the same CFU are in a relationship through the following equation:
Considering an E. coli strain as a reference, the equivalent OD600 (on which then compute Scell or RFP/OD and thus RPUs units) can be obtained through the conversion constant between strains as:
so:
For the strains analyzed, we computed βK.pneumoniae=1.37 and βA.baumannii=0.60.
While the linearity between OD600 and CFU seems verified and a conversion coefficient has been computed, data gathered for this analysis are too poor and noisy to be reliably used. Indeed, due to time constraints, no additional replicates were performed; therefore, we decided not to include this additional normalization step in the analysis reported in the wiki, despite further investigation to assess the reliability of this normalization will be performed in the future.