General Introduction

Aptamers are single-stranded DNAs and RNAs that can be designed to bind to target molecules with high affinity (Dunn, 2017). They are generally found through a tedious and expensive experimental procedure named Systematic Evolution of Ligands by Exponential Enrichment (SELEX). This experimental selection process is iterative where random sequence libraries are used to find and enrich aptamers that bind to the target molecule (Darmostuk, 2015).

Several established in silico methods for aptamer-ligand interactions are currently used in the literature (Bulgak, 2020). These include folding predictions, docking simulations, and molecular dynamics. These methods can be used to further understand how the aptamer binds to the ligand and can aid in optimizing aptamer sequences.

Some in silico methods have also been developed by past iGEM teams. For example, the Heidelberg iGEM team in 2015 developed an in silico alternative to SELEX called MAWS (Making Aptamers Without SELEX), where de novo aptamer sequences could be generated through an entropy minimization algorithm. This software was then updated in 2017 to include further functionality and adaptability. However, in 2023 the code that this team created was non-functional and outdated.

Our team mission was to take the main in silico methods for aptamer-ligand dynamics (folding, docking, and MD) and integrate them into a user-friendly pipeline. Furthermore, we wanted to improve the MAWS software by updating the code, adding new functionality, making the software streamlined for future iGEM teams, and integrating it into our pipeline.

To accomplish these objectives, we created a pipeline named AptaLoop which consists of 4 modules (more information on our gitlab). This pipeline allows the user to: design DNA or RNA aptamers to target a specific molecule (protein, organic or lipid), predict the secondary and tertiary structure of the aptamer, predict the interaction position between the two molecules and simulate their molecular dynamics.

There are 2 possibilities to run the pipeline depending on whether the user already has an aptamer sequence or not:

  1. The user already has an aptamer sequence. Then, the user needs to provide the DNA or RNA sequence as text and run Modules 1a, 2 and 3.
  2. The user does not have an aptamer sequence yet. Then, the user needs to provide the PDB file of their target molecule and run Modules 1b, 2 and 3.

General idea diagram
Figure 1: Schematic representation of the AptaLoop pipeline. This image was created with BioRender.com.

The ultimate aim of our software is to provide researchers with a holistic view of aptamer-ligand dynamics in order to guide wet lab efforts in finding the best possible aptamer sequence for the desired target molecule.

MAWS

MAWS functions by using an entropy minimization algorithm. This algorithm is based on a paper published in 2011 by Tseng et al. where entropic minimization is used as a metric for binding stability in aptamer-ligand binding dynamics (Tseng, 2011). This approach has been named the entropic fragment-based approach (EFBA) and was adapted by the Heidelberg 2015 and 2017 iGEM teams to create the software known as MAWS.

The following mathematical explanation of the MAWS algorithm selection process is adapted from the Heidelberg iGEM team software page (Heidelberg, 2015).

We can begin by taking the equation for relative entropy (Kullback-Leibler divergence), which is a measure of how a probability distribution ( P ) diverges from another distribution ( Q ):

$$ S[P|Q] = -\int P \cdot \log\left(\frac{P}{Q}\right) d\chi $$

Distribution (P), in this case, would be the probability of an aptamer to bind to our target molecule, and (Q) would be an uniform distribution to compare our aptamer to.

To adapt this concept to include relevant physical parameters, we can replace our theoretical relative entropy equation with a novel entropy-like distance metric based on thermodynamic parameters.

$$ P(\chi) = \frac{e^{-\beta \cdot \Delta G}}{\mathcal{Z}} $$

Here we define the probability P of observing a certain conformation of the aptamer-ligand complex and introduce the inverse temperature β=1/T and Gibbs free energy Δ=G. We also introduce 𝒵, which is defined as:

$$ \mathcal{Z} = \int e^{-\beta \Delta G(\chi)} d\chi $$

𝒵, or the partition function, is a key member of the canonical ensemble, which is a way to describe the thermodynamic properties of a system in thermal equilibrium. We can use the canonical ensemble to describe the probability distribution of aptamers in different conformations to assess their thermodynamic feasibility.

Specifically, 𝒵 is the sum over all possible states i that the system can be in, and acts as a normalization factor for the expression as it ensures that the probabilities P(χ) sum to 1 across all states. This makes P(χ) a valid probability distribution, allowing us to compare the likelihood of different states directly.

Thus, given:

$$ S[P|Q] = -\int P \cdot \log \left( \frac{P}{Q} \right) d{\chi} $$

We can replace Q with our uniform distribution and use a natural logarithm to yield:

$$ \overline{S}[P] := -\int -P \cdot \beta \Delta G(\chi) + \ln(QZ) $$

From these equations we can derive the following distance metric:

$$ g[P,P'] = \left| \bar{S}[P'] - \bar{S}[P] \right| $$

This distance metric serves to give basis for aptamer selection, as the conformation with the higher probability P(χ) is selected through its entropic favorability.

In essence the selection through entropic criteria can be explained through the example of a rock and a magnet.

If a rock is thrown against a magnet, the resultant collision will be highly disordered, which is to say the entropy of this system will be high. If another magnet is thrown instead of a rock, then the resultant system will be less disordered than the rock, as there is a degree of attraction between the two objects.

Thus, the selection criteria for the aptamer sequence is the system calculated to have the minimal amount of entropy. What this means is that a lower degree of disorder is taken as a metric for binding stability, as lower entropy indicates more stability in aptamer ligand binding.

In summary, lower entropy increases the probability of binding P(𝓍), which through the distance metric, is compared to alternative conformations and the best overall conformation is selected for.

MAWS Algorithm

The MAWS algorithm functions through iterative nucleotide sampling over a target molecule within a bounded box. The process begins with the addition of one of four nucleotides (e.g. A, T, C, G) and iterates each nucleotide over a series of binding positions around the target molecule. The energies calculated for each position are saved, and parameters Z, P, and S are all computed and the minimal entropy aptamer selection criteria is run to choose the aptamer (figure 1).

General idea diagram
Figure 2: Flowchart outlining MAWS algorithm as generated by the Heidelberg iGEM team (2015). (https://2015.igem.org/Team:Heidelberg/software/maws)

This results in a nucleotide being chosen based on its overall thermodynamic favorability. This process is repeated by adding another nucleotide module to the previous nucleotide and repeating the process, such that the adjoined 2 nucleotide aptamer has its thermodynamic favorability assessed in each location for each base pair, with the best overall nucleotide being chosen to be the second nucleotide in the aptamer. It is important to note that this method of selection does not yield a binding location, as it selects nucleotides based on a generalized thermodynamic favorability.

General idea diagram
Figure 3: PFOA-specific aptamer generated using our updated MAWS script. (https://2015.igem.org/Team:Heidelberg/software/maws)

Our improvements to the code

We first attempted to run the MAWS algorithm from the Heidelberg 2017 iGEM team (github repo) following the guide that the iGEM NU Kazakhstan 2022 team made last year. However, we could not manage to make it work as it was, so we had to do some modifications to make it functional again. Mainly, the problem was related to package version incompatibilities, which was a bit difficult to debug since the original software did not contain a requirements.txt file. Also, some minor changes were needed to some path variables and functions.

When we had the first version running, we decided to first improve it by translating the code from python2 to python3 and correct some indentation issues to make the software easier to use and maintain in the future. We then proceeded by implementing the possibility of generating DNA aptamers, so that the user was not limited to RNA. And we also added more types of ligand molecules, specifically lipids and small molecules, so that the user was not limited to choosing a protein for the aptamer target. In order to properly add all these new capabilities to the software, we also incorporated the appropriate force fields to use depending on each case.

Other improvements include correcting the output naming, fixing compatibility issues within some functions and updating the argument parser to add more options (type of aptamer, type of ligand and conda environment name). Importantly, we generated an environment.yml file in order to make the environment reproducible and compatible in any other machine. Finally, we created a Jupyter implementation to make the software easier to use and to prove the reproducibility of our conda environment.

Future work

Although we were capable of making the MAWS software viable again and we provided some improvements, there are still some functionalities we could not implement due to lack of time and resources. Here, we provide insights on what we think it would be a good idea to add in the future.

An added functionality we are missing is the ability to truncate a given aptamer. This could be useful if the user already has a long aptamer and they want to make it shorter without losing stability or affinity for the target molecule. Another idea is to make MAWS capable of improving a given aptamer by exploring single nucleotide changes that would favor the overall thermodynamic favorability.

Additionally, we found out that the program lacks a way of resuming a long analysis. It happened to us that the analysis crashed in the middle and we had to start over again. So instead, it would be nice to have a kind of cache that allowed the user to resume the analysis if anything happened.

Finally, we identified a problem related to the threads the program uses. We believe it has something to do with the openMM package, but we are not sure. This issue makes the program run slower on a queuing system, because it is trying to use more threads than the limit imposed by the user.

We hope that the insights we provide here can guide efforts of future iGEM teams that have a project with aptamers.

Docking

Molecular docking is a computational technique used in structural biology and drug discovery to predict how two or more molecules interact with each other and form a stable complex (Morris, 2008). Docking algorithms aim to predict the most energetically favorable binding mode or conformation of the ligand within the protein's binding site (Meng, 2011). In most cases, docking is used to study the binding site and binding affinity of proteins and ligands, but other molecules can also be explored. In the case of AptaLoop, we performed docking between RNA molecules (our aptamers) and a specific ligand (PFOA).

Several software tools have been developed to perform docking. For AptaLoop, we chose AutoDock Vina, one of the most widely used docking engines, because of its ease in implementation, known accuracy, and it is open source, allowing for all the users of AptaLoop to perform docking without restrictions. In order to test the accuracy of the results we obtained with AutoDock Vina, we also performed docking between our aptamers and PFOA using Haddock. We deemed it possible that Haddock was more reliable, since it allows for the tuning of multiple parameters depending on the input, but it is more complex to configure and requires the users to ask for licenses. Thus, we decided not to include Haddock in AptaLoop, but use it to validate the AutoDock Vina results.

AutoDock Vina

As stated above, AutoDock Vina was chosen to be a part of AptaLoop because of its simple implementation. In contrast with other docking software tools, it only requires three inputs:

  • A structural file for the receptor
  • A structural file for the ligand
  • The search space on the receptor in which to look for the ligand’s binding site.

Moreover, even if AutoDock Vina is mostly used for protein-ligand docking, it was specifically designed for receptor-ligand docking (Eberhardt, 2021) (Trott, 2009), so it is also applicable to aptamer-ligand docking.

Scoring function Autodock Vina

All docking methods use scoring functions that aim to estimate the binding affinity between the ligand and the receptor. They are designed to capture the intermolecular forces and interactions that affect ligand binding, and their goal is to identify the position and orientation of the ligand with respect to the receptor that allows for more favorable binding (Li, 2019).

AutoDock Vina calculates the energy associated with each ligand pose, considering factors like steric clashes, hydrogen bonds, and electrostatic interactions. Then, it ranks these poses based on their calculated energy scores, looking into the most likely binding modes and estimating the binding affinity between the ligand and protein. To do so, it uses an empirical scoring function that aims to estimate the binding affinity of a receptor-ligand complex (Trott, 2009). The software takes into account Van der Waals interactions (attractive and repulsive forces between atoms), hydrogen bond interactions, and electrostatic interactions (Eberhardt, 2021). Thus, AutoDock Vina combines the advantages of force filed-based and empirical scoring functions, meaning that it uses physical principals and parameters to calculate energy terms, but it also relies on statistical analysis of experimentally determined binding data to derive scoring terms. For more detailed information about the scoring function, refer to Autodock Vina’s official documentation (AutodockVina, 2023).

Interestingly, AutoDock Vina ignores the partial charges supplied by the users in the input files. Instead, it addresses electrostatic interactions through the hydrophobic and hydrogen bonding terms (AutodockVina, 2023).

The software first places the ligand in an initial position and uses an optimization algorithm (Broyden-Fletcher-Goldfarb-Shanno, or BFGS) to refine its position and orientation within the binding site. For each ligand pose, AutoDock Vina uses the scoring function to calculate the energy. The sites with highest binding affinity get the lower scores (Trott, 2009).

Running Autodock Vina

Before running the software, it is important to ascertain that both the receptor and ligand files contain all the hydrogen atoms present in the molecule. While the positions of hydrogens in the output are arbitrary because AutoDock Vina uses a scoring function that only involves the heavy atoms (united-atom scoring function), the hydrogens in the input files are used to determine which atoms can be hydrogens bond donors or acceptors (AutodockVina, 2023). Hence, it is important to reduce the input files.

AutoDock Vina requires PDBQT files for both the receptor and the ligand as input. These files can be obtained both from PDB (more suitable for the receptors) or SDF (more suitable for small ligands) files using different methods. For AptaLoop, we chose the Open Babel software because its efficiency and ease to implement in a pipeline (O'Boyle, 2011) (Open Babel, 2023). AptaLoop takes PDB files for the receptor, and allows both PDB or SDF files for the ligand.

AutoDock Vina searches for possible sites and positions of docking within a defined grid that the user must provide as input. The more reduced the volume of the grid is, the less the computational effort and the more accurate the results will be (Trott, 2009). However, as was the case with our aptamers, if the user cannot determine the approximate area where the ligand will bind, it is necessary to define a grid that covers the whole receptor. A very large search space can lead to increased computational demands (and therefore slower docking) but, most importantly, it implies a risk of missing potential binding sites and reduced accuracy. Thus, if possible, it is always recommended to constrain the searching space as much as possible. A good approach for the large grid problem is to first run AutoDock Vina a few times searching for the binding site across the totality of the receptor, and then selecting the best binding site and running the software again, narrowing down the searching space to the area of the predicted binding site.

How to define the grid? What values to use? AutoDock Vina asks for “X center”, “Y center”, and “Z center”, the coordinates of the center of the center of the grid in the three-dimensional space; and for “X size”, “Y Size”, and “Z size”, which determine the dimensions of the grid box along each axis. All these values can be determined using Chimera (Tools > Surface/Binding analysis > AutoDock Vina).

Haddock

Haddock is a computational software that stands out for its ability to predict and model biomolecular interactions with exceptional accuracy. Haddock allows customization of the docking run to a particular scenario, such as protein-ligand, protein-protein, protein-nucleic acid, or, as in our case, nucleic acid-ligand.

This software works in three main stages. In the first step, it treats the parts that interact with each other as stiff elements (rigid bodies) that don't move. In the second stage, it allows these parts to be flexible, allowing them to move, rather than keeping them "frozen" in space. Finally, there's an optional third stage that refines the results by considering the space and interactions involving water molecules (Van Zundert, 2016) (HADDOCK, 2023) (Honorato, 2021). For more information, refer to Haddock's documentation.

For evaluating the binding affinity between receptor and ligand, Haddock calculates a score using a linear combination of various energies and surface area. For more information, refer to Haddock’s official documentation (HADDOCK2.4, 2023). The scoring function assigns the lowest value to the best structure.

Haddock’s official website contains multiple explanations, detailed tutorials, and examples for different docking situations. Furthermore, the users can post any questions on BioExcel, a forum in which experts give detailed feedback promptly (HADDOCK2.4, 2023). Haddock provides free license for non-profit purposes.

After asking through BioExcel, Haddock’s researchers enlightened us on the best way to tune parameters for RNA-ligand docking. Table 1 shows the way in which to adjust the parameters according to the molecules to dock. When docking aptamers and ligands, the recommended parameters are displayed in the last column.

Table 1 : Best values for Haddock's parameters in different docking scenarios.

Parameter Haddock's abbreviation default value Protein/Small molecule Protein/DNA DNA/small molecule
Dieletric constant for it0 dielec_0 rdie cdie rdie cdie
Dieletric constant for it1 dielec_1 rdie cdie rdie cdie
Number of MD steps for rigid body high temperature TAD initiosteps 500 0 500 0
Number of MD steps during first rigid body cooling stage cool1_steps 500 0 500 0
Initial temperature for second TAD cooling step with flexible side-chain at the interface tadinit2_t 1000 500 1000 500
Initial temperature for third TAD cooling step with fully flexible interface tadinit3_t 1000 300 1000 300
Weight of the intermolecular van der Waals energy for scoring at the rigid-body docking stage Evdw 1 0.01 1 0.01 1
Weight of the intermolecular electrostatic energy for scoring at the final stage Evdw 2 0.2 0.1 0.2 0.1
Epsilon dielectric constant for rigid-body docking epsilon_0 10 10 78 78
Epsilon dielectric constant for the semi-flexible refinement epsilon_1 10 10 78 10

As stated above, even though Haddock is a very powerful and promising tool for aptamer-ligand docking, we did not include it in AptaLoop. We chose AutoDock Vina instead because of its better accessibility . However, we used Haddock to validate AutoDock Vina's results (see Simulation).

Molecular Dynamics

Molecular dynamics(MD) was performed to understand the behavior of our aptamer in water. It was performed on the aptamer generated by MAWS.

For conducting MD simulations, a variety of software programs are available, including both licensed options like MAPS and Discovery Studio, as well as free alternatives like AMBER, CHARMM, and GROMACS. We chose GROMACS for its accessibility, extensive documentation, and compatibility with other tools like PyMol. Moreover, evidence of its effectiveness in aptamer optimization (Song, 2020) (Thevendran, 2023) further supported our choice.

In our iGEM project focused on building a PFAS biosensor, we employed MD simulations to model aptamer-small molecule interactions. The simulation workflow begins with the utilization of a 3D model of both the PFAS molecule and the aptamer sequence of interest. Critical components like the molecule's topology, position restraint files, and processed structure files are generated at this stage.

For our specific setup, the parameters chosen are highly curated to match our system:

  • Force Field: We employ the amber14sb_OL15 force field, known for its suitability for RNA or DNA aptamers.
  • Water Model: The TIP3P model is used to replicate water in the system, known for its balanced accuracy and computational efficiency.
  • Box Size: A cubic simulation box with dimensions 0.5 nm X 0.5 nm X 0.5 nm is initialized.
  • Ions: To maintain charge neutrality, ions like Na+ or Cl- are added based on the net charge of the system.

Following model construction, the editconf module is utilized to set the box dimensions, and the solvate module adds water molecules to the box, conforming to the chosen TIP3P model.

To ensure the system is free from any clashes or geometry-related issues, energy minimization (EM) is carried out. The EM process consolidates the aptamer structure, topology, and other simulation parameters into an input file, thus preparing the system for dynamic simulations.

Temperature and pressure equilibration steps are subsequently carried out to rule out any potential anomalies in the system. Given that our objective is to analyze the stability of aptamers in water, we run the MD simulations for a duration of 2 ns.

After these extensive preliminary steps, the MD simulation is executed. The resulting data, which includes key observables like aptamer conformation and PFAS binding affinity, is then analyzed to draw conclusions relevant to our biosensor project. The results of this specific analysis can be seen in the simulation tab.

AptaLoop's development

When creating AptaLoop, our intention was to develop a software tool that was useful and accesible to everybody. For such purposes, we developed Jupyter notebooks, which are interactive and are appropiate to clarify each step and give instructions to the users. We created Jupyter notebooks to:

  • Obtain a .pdb file to go from a DNA or RNA sequence to a 3D structure, using tleap, viennaRNA, and 3dRNA/DNA.
  • Generate an aptamer for a determined target molecule, using the optimized version of MAWS.
  • Perform Docking, using AutoDock Vina.
  • Perform Molecular Dynamics, using GROMACS.

After creating the Jupyter notebooks, we realized that plenty of dependencies were needed for the whole pipeline to run. In our experience with other software tools installations, this often leads to compatibility issues and other problems that prevent the users from using a software tool effectively and with no complications.

Since we wanted to make AptaLoop as accessible and user-friendly as possible, we decided to package the entire AptaLoop pipeline within a Docker container. We created a Docker image that contains all the necessary dependencies and libraries required for AptaLoop, thereby eliminating the often challenging task of managing and installing multiple dependencies on various operating systems. By implementing Docker, we eliminated compatibility issues, version conflicts, and complex instalation procedures.

Interested in using AptaLoop? Visit our gitlab, and start exploring all its functions.

References

  • AutoDock Vina Documentation. (2023). Frequently asked questions. https://autodock-vina.readthedocs.io/en/latest/faq.html

  • Buglak, A. A., Samokhvalov, A. V., Zherdev, A. V., & Dzantiev, B. B. (2020). Methods and applications of in silico aptamer design and modeling. International Journal of Molecular Sciences, 21(22), 8420. https://doi.org/10.3390/ijms21228420

  • Darmostuk, M., Rimpelova, S., Gbelcova, H., & Ruml, T. (2015). Current approaches in Selex: An update to Aptamer Selection Technology. Biotechnology Advances, 33(6), 1141–1161. https://doi.org/10.1016/j.biotechadv.2015.02.008

  • Dunn, M. R., Jimenez, R. M., & Chaput, J. C. (2017). Analysis of Aptamer Discovery and Technology. Nature Reviews Chemistry, 1(10). https://doi.org/10.1038/s41570-017-0076

  • Eberhardt, J., Santos-Martins, D., Tillack, A. F., & Forli, S. (2021). AutoDock Vina 1.2.0: New docking methods, expanded force field, and Python bindings. Journal of Chemical Information and Modeling, 61, 3891–3898.

  • HADDOCK 2.4. (2023). https://wenmr.science.uu.nl/haddock2.4/

  • HADDOCK 2.4 scoring function. (2023). https://www.bonvinlab.org/software/haddock2.4/scoring/

  • Heidelberg University iGEM Team. (2015). MAWS software. https://2015.igem.org/Team:Heidelberg/software/maws

  • Honorato, R. V., et al. (2021). Structural biology in the clouds: The WeNMR-EOSC ecosystem. Frontiers in Molecular Biosciences, 8, 729513.

  • Li, J., Fu, A., & Zhang, L. (2019). An overview of scoring functions used for protein–ligand interactions in molecular docking. Interdisciplinary Sciences, Computational Life Sciences, 11, 320–328.

  • Meng, X.-Y., Zhang, H.-X., Mezei, M., & Cui, M. (2011). Molecular docking: A powerful approach for structure-based drug discovery. Current Computer-Aided Drug Design, 7, 146–157.

  • Morris, G. M., & Lim-Wilby, M. (2008). Molecular docking. In A. Kukol (Ed.), Molecular modeling of proteins (Vol. 443, pp. 365–382). Humana Press.

  • O’Boyle, N. M., et al. (2011). Open Babel: An open chemical toolbox. Journal of Cheminformatics, 3, 33.

  • Open Babel. (2023). https://github.com/openbabel/openbabel

  • Park, J., Yang, K.-A., Choi, Y., & Choe, J. K. (2022). Novel ssDNA aptamer-based fluorescence sensor for perfluorooctanoic acid detection in water. Environmental International, 158, 107000.

  • Song, M., Li, G., Zhang, Q., Liu, J., & Huang, Q. (2020). De novo post-SELEX optimization of a G-quadruplex DNA aptamer binding to marine toxin gonyautoxin 1/4. Computational and Structural Biotechnology Journal, 18, 3425–3433. https://doi.org/10.1016/j.csbj.2020.10.041

  • Thevendran, R., Tang, T., & Citartan, M. (2023). in‐silico selection employing rigid docking and molecular dynamic simulation in selecting DNA aptamers against androgen receptor. Biotechnology Journal, 18(4). https://doi.org/10.1002/biot.202200092

  • Trott, O., & Olson, A. J. (2009). AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of Computational Chemistry. https://doi.org/10.1002/jcc.21334

  • Van Zundert, G. C. P., et al. (2016). The HADDOCK2.2 web server: User-friendly integrative modeling of biomolecular complexes. Journal of Molecular Biology, 428, 720–725.