Modeling: Structural optimization of the transmembrane biosensing module

Introduction

In our design strategy, we identified two tumor biomarkers, folate receptors and HER2 (Human Epidermal Growth Factor Receptor 2), which are overexpressed in breast, colorectal and lung cancers. We therefore aimed to induce a therapeutic response only after detection of these proteins, which is essential for a cancer-specific treatment. We devised a way to initiate the production of 5-FU upon recognition of the HER2 receptor.

We designed a T7 RNA polymerase (T7 RNAP) split into two subunits (or fragments), each linked to an anti-HER2 antibody through a transmembrane segment (Figure 1). Binding of these antibodies to HER2 brings the two subunits of the T7 RNAP closer together, so that they can assemble into a functional enzyme and initiate transcription of a target gene inside liposomes. The success of our strategy thus requires optimal linker sequences that enable complementation of the T7 RNAP once the antibodies have recognized HER2. Modeling this Anti-HER2-linked split T7 RNAP is a challenging task because the heterodimer complex involves both extra- and intraliposomal moieties, as well as a transmembrane domain.

Figure 1: Overview of our biosensing and signal transduction strategy. Recognition of the HER2 extracellular domain induces functional assembly of the split T7 RNA polymerase, which enables expression of a target gene under the control of a T7 promoter.
Anchoring and sensing

We started by examining the extraliposomal sensing domain that binds to HER2, a transmembrane glycoprotein overexpressed in different types of cancers. The HER2 extracellular region comprises four domains (I to IV), which allows two antibodies, Pertuzumab and Trastuzumab, to bind the receptor simultaneously, on domains II and IV respectively (Figure 2). The structure of each of these antibodies bound to HER2 was already available in the Protein Data Bank (PDB: 1S78 [1] for Pertuzumab and PDB: 1N8Z [2] for Trastuzumab) and served as the basis for our structural analysis. We combined the 3D structures of each antibody with HER2 to determine the structure of the whole complex. This was necessary to verify that its assembly was possible, i.e., without strong steric constraints that would impair the formation of the complex. The combined structural models indicate that the linker to the transmembrane segment can be conjugated to the antibodies without interfering with the binding to HER2 (Figure 2).
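The steric check described above can be illustrated with a simple distance test between two sets of atoms. This is only a minimal sketch: the function name, the 3.0 Å cutoff and the toy coordinates are assumptions chosen for illustration, and the way clashes were actually assessed in our analysis is not specified here.

```python
import math

def count_clashes(coords_a, coords_b, cutoff=3.0):
    """Count atom pairs (one atom from each set) closer than `cutoff` angstroms.

    The 3.0 A default is an assumed, illustrative threshold.
    """
    clashes = 0
    for a in coords_a:
        for b in coords_b:
            if math.dist(a, b) < cutoff:
                clashes += 1
    return clashes

# Toy coordinates in angstroms (hypothetical, not taken from 1S78 or 1N8Z)
antibody_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
antibody_2 = [(10.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

print(count_clashes(antibody_1, antibody_2))  # 2
```

In practice the coordinates would come from the superposed PDB structures, and a count of zero (or close to zero) between the two antibodies would support the idea that both can bind HER2 at the same time.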

Figure 2: Simultaneous interaction of Pertuzumab (gray) and Trastuzumab (green) with HER2 extracellular domain (red). Zoom-in images of each antibody reveal that the sites for connecting the linker sequences (pink spheres) are not buried in the molecule.
Modeling of the transmembrane segment and its disordered linker regions

Methods

This section summarizes how the three algorithms used in this study work.

AlphaFold2 (AF2) is an artificial intelligence program that predicts protein structures [3]. It uses a neural network trained to predict a structure from an amino acid sequence and can use a Multiple Sequence Alignment (MSA) or structural templates to enrich the prediction. Its default mode is monomer prediction, and it has been extended to protein multimer prediction [4]. As output, the algorithm proposes several structures in PDB format, along with a per-residue confidence score called pLDDT.

We used AF2 to predict the structure of our split T7 RNAP from the sequences of the two subunits. Several AF2 prediction modes were compared using the same sequence input for the two fragments. Single-sequence mode was employed first, followed by MSA refinement using MMseqs2 [5] for the sequence search. Finally, this mode was combined with structural templates identified in the PDB [6].

MoMA-FReSa (MoMA: Molecular Motion Algorithm; FReSa: Flexible Region Sampler) is a loop-sampling algorithm from the MoMA software suite, developed at LAAS-CNRS, Toulouse. It samples protein loops, including disordered regions, and provides an ensemble of conformations for a given loop. In terms of implementation, the loop is represented as a series of tripeptides. Each tripeptide is sampled from a structural database, and the connection between consecutive tripeptides is computed using robotics-based algorithms [7].

We used MoMA-FReSa to sample the disordered regions between the transmembrane segment and the extremities of the proteins. For each simulation, the two connecting loops were simultaneously sampled (one for each subunit of the Anti-HER2-linked split T7 RNAP complex).

Local Structural Propensity Predictor (LS2P) [8] is an algorithm from the MoMA suite which provides structural insights on Intrinsically Disordered Regions. It performs statistical analyses based on a database of three-residue fragments extracted from coil regions of high-resolution protein structures [9]. For each residue, conformations are classified into three main regions: (α) right-handed helical conformations, (β) right-handed extended conformations, and (γ) left-handed conformations. The combination of these three regions over three residues defines 27 structural classes for a tripeptide. LS2P computes the propensity of each tripeptide to be observed in each of these structural classes, considering the local sequence context. Values larger than 1.0 for a given structural class indicate that this class is favored for a given tripeptide in the local sequence context, while values below 1.0 indicate that it is disfavored.
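The count of 27 structural classes follows from assigning one of the three regions to each of the three residues (3 × 3 × 3). The short sketch below enumerates them and applies the 1.0 threshold described above. It does not call LS2P itself; the function names and the ASCII labels "a", "b" and "g" (standing for α, β and γ) are illustrative assumptions.

```python
from itertools import product

# ASCII stand-ins for the alpha, beta and gamma regions (assumed labels)
REGIONS = ("a", "b", "g")

def tripeptide_classes():
    """Enumerate every combination of regions over the three residues."""
    return ["".join(combo) for combo in product(REGIONS, repeat=3)]

def interpret(propensity):
    """Read a propensity value against the 1.0 threshold."""
    if propensity > 1.0:
        return "favored"
    if propensity < 1.0:
        return "disfavored"
    return "neutral"

classes = tripeptide_classes()
print(len(classes))      # 27
print(classes[:3])       # ['aaa', 'aab', 'aag']
print(interpret(1.4))    # favored
```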

As input, LS2P requires a FASTA-type sequence. It provides two output files: a table with the propensities of the 27 structural classes for the middle residue of each tripeptide, and a plot showing propensities for helical and extended conformations.

Structure prediction of the recombined split T7 RNAP

We then focused on the intraliposomal moiety of the biosensor, i.e., the split T7 RNA polymerase and the transmembrane linker. Our split T7 RNAP is composed of two subunits: the N-term part from residue 1 to 180, and the C-term part from residue 181 to 704 [10]. We studied the interaction of those two chains and sought to validate that the complemented split protein was structurally similar to the native T7 RNA polymerase. This is very important, since the structure of the split T7 RNAP is not well known. Yet, we need a realistic and stable structure for the rational engineering of other protein domains linked to the T7 RNAP moieties.
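As a minimal sketch of how the two input chains can be prepared, the function below cuts a sequence after residue 180 (1-based numbering), which gives the N-term and C-term fragments described above. The function name is hypothetical and the sequence used here is a placeholder, not the real T7 RNAP sequence.

```python
def split_at(sequence, last_nterm_residue=180):
    """Split a sequence after the given 1-based residue position."""
    if not 0 < last_nterm_residue < len(sequence):
        raise ValueError("split position must fall inside the sequence")
    n_term = sequence[:last_nterm_residue]   # residues 1..180
    c_term = sequence[last_nterm_residue:]   # residues 181..end
    return n_term, c_term

# Placeholder sequence of 200 residues (NOT the real T7 RNAP sequence)
toy_sequence = "M" + "A" * 199
n_term, c_term = split_at(toy_sequence)
print(len(n_term), len(c_term))  # 180 20
```

The two fragments can then be supplied as separate chains to a multimer prediction.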

Philippe Urban and Sophie Barbe, researchers at the Toulouse Biotechnology Institute (TBI), as well as Younes Bouchiba, PhD student at TBI, explained to us how AlphaFold2 [5] could help us determine the structure of our split T7 RNAP and verify if it corresponds to the natural enzyme. Following their advice, we used AlphaFold2 with MMseqs2 for MSA construction and AMBER relaxation of the best models, to predict the multimerization between both subunits of the split T7 RNAP. To force the interaction into the expected structure, we used a template from PDB (1CEZ [11]).

The following relaxed structure was obtained, corresponding to the template T7 RNA polymerase (Figure 3).

Figure 3: (A) Structure of the complemented split T7 RNAP obtained with AlphaFold2 and PDB 1CEZ of the native protein as a template. The N-term subunit is represented in gray, the C-term subunit in green, and the linker-bound amino acid in red. (B) Comparison of the split T7 RNAP (in green) obtained with AlphaFold2 and the full-length T7 RNAP (in pink). (C) Predicted LDDT (pLDDT) for each position. This per-residue score indicates the confidence of the prediction (high confidence if higher than 90, low confidence if lower than 50).

The pLDDT indicates an overall high confidence in the obtained structure. The confidence score was lower for the N-term subunit (residues 1 to 180) because of the presence of flexible loops, but visual inspection confirmed its resemblance to the template.
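The per-subunit comparison of confidence scores can be reproduced from the predicted structure file, since AlphaFold2 writes the pLDDT of each residue in the B-factor column of its PDB output. The sketch below averages it per chain. It assumes that the two subunits are written as separate chains (here "A" and "B") and uses synthetic ATOM lines built by a hypothetical helper, not our actual prediction.

```python
def mean_plddt_per_chain(pdb_text):
    """Average the B-factor column (pLDDT) of CA atoms for each chain."""
    values = {}
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name 13-16, chain 22, B-factor 61-66
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.setdefault(line[21], []).append(float(line[60:66]))
    return {chain: sum(v) / len(v) for chain, v in values.items()}

def confidence_label(plddt):
    """Thresholds quoted in the Figure 3 caption."""
    if plddt > 90:
        return "high"
    if plddt < 50:
        return "low"
    return "intermediate"

def make_ca_line(serial, chain, resseq, plddt):
    """Hypothetical helper: build one correctly aligned CA ATOM record."""
    return ("ATOM  %5d  CA  ALA %s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
            % (serial, chain, resseq, 0.0, 0.0, 0.0, 1.0, plddt))

# Synthetic records (illustrative values only)
toy_pdb = "\n".join([
    make_ca_line(1, "A", 1, 80.0),
    make_ca_line(2, "A", 2, 90.0),
    make_ca_line(3, "B", 1, 95.0),
    make_ca_line(4, "B", 2, 97.0),
])

for chain, score in mean_plddt_per_chain(toy_pdb).items():
    print(chain, score, confidence_label(score))
```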

Having validated both the extraliposomal and intraliposomal structures, we next optimized the transmembrane linker. This is of utmost importance: the disordered linker must be long enough not to restrict movement, but short enough to avoid unintended interactions. We first had to choose the transmembrane segment.

Determining the transmembrane domain

From the Membranome database, we chose a bacterial protein that forms a single transmembrane (TM) helix in the inner membrane, to ensure compatibility with the lipid composition of our liposomes. Among all the candidates, we chose the shortest sequence to prevent the TM part from tilting and blocking interactions. The chosen TM segment was therefore that of ZipA (ID: ZIPA_ECOLI), a cell division protein localized in the inner membrane of Gram-negative bacteria [12].

Figure 4: AlphaFold2 [5] prediction of the transmembrane segment of ZipA (helix). Protein sequence: Nterm-RLILIIVGAIAIIALLVHGF-Cterm

Determining the complete linker between each domain

The antibodies and the split T7 RNAP cannot directly be linked to ZipA without supplementary residues as it would cause a lack of mobility due to the spatial proximity of the membrane. A flexible and disordered sequence of residues was added on each side of the TM segment. To determine the correct length, we collaborated with the LAAS-CNRS, as they have modeling and statistical tools to study the molecular mechanics of proteins.

In 3D space, we placed each domain at a certain distance from the others, taking into account the thickness of the liposome membrane, as shown in Figure 5. If the domains are too close, motion and therefore interaction might be blocked. If they are too far apart, the complex might move too much, which could interfere with its stability or promote non-specific interactions. This is an important consideration because the HER2-bound antibodies are perpendicular to each other.

Figure 5: Positioning of the 3D protein complex around the TM part of the linker to optimize the structure of the disordered linkers. Red, HER2; Pink, ZipA; Green, Trastuzumab-T7C-term (from top to bottom); Grey, Pertuzumab-T7N-term (from top to bottom).

This 3D model was used as a basis to study different disordered linkers. The aim was to test different sequence lengths and see which conformation is the most relevant.

A standard disordered linker is (GGGS)n: glycine (G) and serine (S) are small amino acids that provide flexibility and allow mobility of the connected functional domains [13]. Another advantage is that the copy number “n” can be adjusted to optimize the length without losing mobility, which makes it a tunable linker sequence to start with when determining the appropriate size.

Juan Cortés and Ilinka Clerc, researchers at the LAAS, taught us how a data-driven method could provide a first solution. We decided to perform a first simulation using MoMA-FReSa to scan different lengths of the linker (GGGS)n and predict the entire conformation of our complex. The algorithm iterated simultaneously from each end and steered them to meet in the middle. The length was chosen according to the algorithm's success rate. The more successful the algorithm was in finding a conformation for the linker, the more likely that conformation was to exist in nature.
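The selection criterion can be written down as a simple ratio: the number of sampling attempts in which the two loop halves were successfully connected, divided by the total number of attempts. The sketch below is a hedged illustration of that bookkeeping only. It does not run MoMA-FReSa, the function name is hypothetical, and the counts are placeholders that are NOT the values of Table 1.

```python
def success_rate(n_closed, n_attempts):
    """Fraction of sampling attempts that produced a connected linker."""
    if n_attempts <= 0:
        raise ValueError("n_attempts must be positive")
    return n_closed / n_attempts

# Placeholder counts per copy number n: (closed conformations, attempts).
# These numbers are hypothetical and only illustrate the calculation.
toy_runs = {4: (10, 100), 8: (60, 100), 12: (45, 100)}

rates = {n: success_rate(*counts) for n, counts in toy_runs.items()}
best_n = max(rates, key=rates.get)
print(rates)    # {4: 0.1, 8: 0.6, 12: 0.45}
print(best_n)   # 8
```

The highest success rate is a starting point rather than a final answer, since other constraints (such as conformations crossing the membrane region) still have to be checked by inspecting the sampled structures.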

For the linker (GGGS)n, the simulation’s results are visible in Table 1.

Table 1: Success rate of the algorithm for different lengths of the (GGGS)n linker.

For the residues between ZipA and the T7 RNAP, a linker of 32 amino acids had the highest success rate, suggesting that it corresponded to the optimal distance. However, for the residues between ZipA and the antibodies, the linker with the highest success rate did not seem optimal. Longer linkers might be more probable and stable, but in the sampled conformations they penetrated into the membrane (which is not explicitly considered in the model). As a compromise, we chose 52 amino acids, which gave an acceptable success rate and conformations that seemed long enough without being too flexible.
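The two lengths quoted above correspond to n = 8 and n = 13 copies of the four-residue GGGS unit. The sketch below builds the sequences and adds a rough upper bound on the distance each linker could span if fully stretched. The 3.8 Å value is the usual distance between consecutive Cα atoms and is used here as an assumption; a disordered linker spans much less on average, so this is only a feasibility bound, not a prediction.

```python
CA_CA_DISTANCE = 3.8  # angstroms between consecutive C-alpha atoms (assumed bound)

def ggs_linker(n):
    """Build the (GGGS)n linker sequence."""
    return "GGGS" * n

def max_span(n_residues):
    """Upper bound on the end-to-end distance of a fully stretched chain."""
    return n_residues * CA_CA_DISTANCE

for n in (8, 13):
    linker = ggs_linker(n)
    print(n, len(linker), round(max_span(len(linker)), 1))
```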

As our goal was not to determine the most probable conformation, we decided to focus on the choice of length. The modeling strategy that we developed nevertheless allowed us to predict the linker properties that enable functional assembly of the split T7 RNAP in the event of HER2 recognition. The different conformations that are possible for the lengths we chose are visible in Figure 6.

Figure 6: Possible conformations of the (GGGS)n linker. n= 13 for the upper part, n= 8 for the lower part. Red: HER2. Gray from top to bottom: Pertuzumab linked to the Nterm subunit of the T7 RNAP. Green from top to bottom: Trastuzumab linked to the Cterm subunit of the T7 RNAP.

We realized that this linker includes many repetitions, which can be an issue for cell-free gene expression because of the unusually high abundance of glycine and serine residues. In the PURE system, this may lead to the depletion of the corresponding tRNAs. We therefore sought a compromise that would account for both the genetic and the structural constraints. The optimal sequence would be rich in glycine and serine residues but complemented with other amino acids to limit repetitions and add diversity, in order to enhance gene expression. We therefore came up with the new linker sequence below.

New sequence Linker_V2: GAAASEGGGSGGPGSGGEGSAGGGSAGGGS
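The difference in composition between the two linkers can be quantified with a simple residue count. This is a minimal sketch; the function name is hypothetical, and (GGGS)13 is used as the reference only because it is one of the lengths considered above.

```python
from collections import Counter

LINKER_V2 = "GAAASEGGGSGGPGSGGEGSAGGGSAGGGS"
GGGS_13 = "GGGS" * 13

def gly_ser_fraction(sequence):
    """Fraction of residues that are glycine or serine."""
    counts = Counter(sequence)
    return (counts["G"] + counts["S"]) / len(sequence)

for name, seq in (("(GGGS)13", GGGS_13), ("Linker_V2", LINKER_V2)):
    print(name, len(seq), sorted(set(seq)), round(gly_ser_fraction(seq), 2))
# (GGGS)13 52 ['G', 'S'] 1.0
# Linker_V2 30 ['A', 'E', 'G', 'P', 'S'] 0.73
```

Linker_V2 keeps glycine and serine as the dominant residues while spreading part of the demand over alanine, glutamate and proline.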

To predict the secondary structure propensities in IDPs (Intrinsically Disordered Proteins), we used the tool LS2P [8]. This is a way to estimate if the chosen sequence is disordered and does not form any secondary structure. We applied this tool to the Linker_V2, and compared it to the (GGGS)n linker:

Figure 7: (A) Prediction of the secondary structures of the new linker. (B) Prediction of the secondary structure of the linker (GGGS)n.

The LS2P prediction for (GGGS)n, which is known to be disordered, is used as a reference to study the new linker. The two profiles show similar and consistent trends, despite a tendency of the new linker to adopt a helical character at its N-terminus, which does not prevent its disordered nature.

As Linker_V2 was validated as disordered, we performed the same simulations as for the (GGGS)n linker. As we already knew the ideal length from the first simulations, we created longer linkers following the sequence of Linker_V2 and determined a few conformations (Figure 8). The sequences are provided below:

Optimal sequence of the transmembrane linker for T7Nterm-SL-Pertuzumab:

  • T7Nterm-SGGGASGGGASGEGGSGPGGSGGGESAAAGSGRLILIIVGAIAIIALLVHGFGASGEGGSGPGGSGGGESAAAGSGGGASGGGASGEGGSGPGGSGGGESAAAG-Pertuzumab

Optimal sequence of the transmembrane linker for Trastuzumab-SL-T7Cterm:

  • Trastuzumab-GAAASEGGGSGGPGSGGEGSAGGGSAGGGSGAFGHVLLAIIAIAGVIILILRGSGAAASEGGGSGGPGSGGEGSAGGGSAGGGS-T7Cterm
Figure 8: Possible conformations of the Linker_V2. Red: HER2. Gray from top to bottom: Pertuzumab linked to the Nterm subunit of the T7 RNAP. Green from top to bottom: Trastuzumab linked to the Cterm subunit of the T7 RNAP.
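The final constructs can be checked programmatically by locating the ZipA transmembrane segment in each sequence and measuring the length of the flanking linkers. The sketch below is only an illustrative sanity check with a hypothetical function name. It searches for the segment in both orientations, because the two constructs are written in opposite directions relative to the T7 RNAP fragment, and it strips any whitespace from the pasted sequences.

```python
ZIPA_TM = "RLILIIVGAIAIIALLVHGF"

T7N_PERTUZUMAB = ("SGGGASGGGASGEGGSGPGGSGGGESAAAGSGRLILIIVGAIAIIALLVHGF"
                  "GASGEGGSGPGGSGGGESAAAGSGGGASGGGASGEGGSGPGGSGGGESAAAG")
TRASTUZUMAB_T7C = ("GAAASEGGGSGGPGSGGEGSAGGGSAGGGSGAFGHVLLAIIAIAGVIILILR"
                   "GSGAAASEGGGSGGPGSGGEGSAGGGSAGGGS")

def flank_lengths(construct, tm=ZIPA_TM):
    """Return (residues before TM, residues after TM, TM orientation)."""
    construct = "".join(construct.split())  # drop stray whitespace
    for segment, orientation in ((tm, "forward"), (tm[::-1], "reversed")):
        before, found, after = construct.partition(segment)
        if found:
            return len(before), len(after), orientation
    raise ValueError("transmembrane segment not found")

print(flank_lengths(T7N_PERTUZUMAB))   # (32, 52, 'forward')
print(flank_lengths(TRASTUZUMAB_T7C))  # (32, 32, 'reversed')
```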

The results of the simulation showed that the chosen sequence formed disordered regions, allowing a certain degree of mobility of the whole system without interfering with the multimerization of the split T7 RNAP or the binding of the antibodies to HER2. Having identified suitable linkers through modeling, these sequences could then be tested experimentally, which would validate our model. However, due to lack of time, these experiments could not be carried out.

Conclusion
Probabilistic and molecular models allowed us to determine the optimal length of our linker sequences and to predict their disordered behavior. We have designed an Anti-HER2-linked split T7 RNA polymerase biosensor using modeling methods that are easily accessible to other teams. This methodology could be adopted by iGEM teams working on the rational engineering of chimeric proteins.

Modeling helped us optimize the structure of the split T7 RNAP and the HER2 sensing domain that will be embedded in our liposome. The design of a suitable linker gives the split T7 RNAP enough flexibility and mobility through the membrane to reassemble once the antibodies recognize HER2, triggering expression of the gene of interest.

For the purpose of our project, and given the time available, we decided to carry out a static analysis rather than a dynamic and energetic one. This sampling-based method allowed us to find a satisfactory solution rapidly, but it remains limited. It does not take into account the actual flexibility of our system, as it depends on the distance we chose between each structure before simulating. In the future, a dynamic study of the whole complex could help us verify its stability in the liposome membrane when anchored to cancerous cells.