Team Athens 2023 Software Tool

Introduction

In the early stages of our project, we realized that our mission was straightforward: to conduct extensive trials in order to arrive at the most suitable peptide for each of our selected volatile organic compounds (VOCs), namely the one with the highest binding affinity. However, rather than resorting to conventional and resource-intensive laboratory techniques such as phage display, we chose to embark on a journey into the realm of computational research.

This journey entailed the creation of a comprehensive computational pipeline, encompassing critical stages such as peptide generation, peptide and ligand preparation, evaluation of search space, and finally, molecular docking. To streamline and automate these processes, we developed a set of Python scripts.

Our choice to embrace in silico methods over phage display underscores our commitment to efficiency, cost-effectiveness, and innovative research practices. By bypassing extensive laboratory work, we not only conserve valuable time and resources but also contribute to the evolution of computational biology as a powerful tool in molecular research.

Our dedication to efficiency extends further. We seamlessly transformed these individual scripts into a versatile Python library, designed with user-friendliness as a priority. This library empowers fellow scientists to effortlessly conduct their own in silico phage display experiments.

Beyond computational aspects, we compiled an extensive database, housing over 290,000 peptides tested for their binding affinity to the four selected VOCs. This dataset stands as a valuable resource for future research and exploration.

Furthermore, our software endeavors encompassed the development of image processing and analysis algorithms, along with machine learning techniques. These components converged to create an innovative application intended to aid in the diagnosis of specific medical conditions.

In our software-focused journey, we bridge the gap between experimental and computational biology. We provide a robust platform for peptide discovery and analysis, driven by our commitment to scientific progress and our deliberate choice to forgo the challenges of traditional phage display techniques.

Software I: PanSimPy

In our quest to find the right genetic modifications of the coat protein of our M13 bacteriophage, we ran into a daunting challenge. The task was to find the best candidate amino-acid sequence: the right peptide, following two simple rules:

• We must maximize the affinity of our receptors for their corresponding ligands.

• We must minimize the affinity of our receptors for any other ligand in the sample.

Simple enough, right? Just perform docking. That’s what we thought. Well, not quite… There are 20 amino acids that can occupy each position of the sequence, so the search space of 3-peptides is 20^3 = 8,000. Manageable, but we didn’t get satisfactory results. The next search space volume, 20^4 = 160,000, was a problem.

Surely we could find someone to lend us their cluster, right? Well, unfortunately not. We had to adapt. To improve. To overcome. Besides, we are engineers, and that is what we do. In the process, we developed an automated Python docking library with integrated search-space algorithms to significantly improve our chances. And it was a success. Let’s now delve into PanSimPy, our biopanning simulation library, and its instructions.

Software II: PanSimPy - Python Library for Biopanning and Molecular Docking

Description

PanSimPy is a versatile Python library designed for researchers and scientists working in the field of biopanning and molecular docking. It simplifies the process of peptide generation, conversion between different molecular file types, ligand and receptor preparation for docking, and docking of peptides to ligands using either AutoDock Vina or a strategic algorithm.

The library's capabilities are grounded in both scientific insights and practical utility, making it an indispensable asset for those delving into biopanning and molecular docking studies. Whether you're a seasoned researcher seeking innovative tools to expedite your work or a newcomer looking to harness the power of molecular docking, PanSimPy is your gateway to unlocking the potential of biopanning and molecular docking.

Features

PanSimPy offers the following key features:

Peptide Generation

Easily generate peptides of specific lengths for biopanning experiments. This feature allows you to explore a wide range of peptide candidates for a given target, making it an essential tool for screening potential binding sequences.

Molecular File Conversion

Convert between different molecular file formats, such as mol files, PDB files, or SMILES, effortlessly. This flexibility ensures seamless integration with other molecular modeling tools and databases.
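
PanSimPy handles these conversions internally; as a rough illustration of the kind of step involved, here is a minimal sketch using RDKit directly (not the PanSimPy API). The SMILES string and output filename are arbitrary examples.

# Illustrative only: a SMILES-to-PDB conversion with RDKit,
# the kind of step PanSimPy automates behind the scenes.
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("CCO")        # toy ligand (ethanol)
mol = Chem.AddHs(mol)                  # add explicit hydrogens
AllChem.EmbedMolecule(mol)             # generate a 3D conformer
Chem.MolToPDBFile(mol, "ligand.pdb")   # write out a PDB file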

Ligand and Receptor Preparation

Streamline the preparation of ligands and receptors for molecular docking. PanSimPy automates the necessary steps to ensure proper input files for docking simulations, saving you valuable time and effort.

Docking Grid Generation

Automatically generate docking grids for efficient molecular docking. The grid generation process is optimized for accuracy and speed, making it suitable for large-scale docking experiments.

Molecular Docking

Perform molecular docking of peptides to ligands using two different methods:

  • Exhaustive Docking with AutoDock Vina: Utilize AutoDock Vina, a widely-used software for molecular docking, to perform a comprehensive search of the docking space and identify potential binding poses.
  • Strategic Docking Algorithm: Employ a strategic docking algorithm developed in-house to efficiently explore the docking space and identify promising peptide-ligand interactions.

Installation

1. Setup Environment

Before installing PanSimPy, you need to ensure that your environment is properly configured. The following sections outline the steps required to set up your environment on different operating systems.

Windows

To make use of PanSimPy on Windows, we recommend using the Windows Subsystem for Linux (WSL). To install WSL, follow the instructions on the Microsoft website.

Once you have WSL installed, you can follow the instructions for Linux to install PanSimPy.

Linux

To use PanSimPy on Linux, you need to install the following dependencies:

  • Python 3
  • AutoDock Vina
  • PyMOL
  • Open Babel
  • RDKit
  • NumPy

To install these dependencies on Ubuntu, you can use the following commands:

$ sudo apt update
$ sudo apt install python3 python3-pip autodock-vina pymol openbabel
$ pip3 install rdkit numpy

MacOS

To install PanSimPy on MacOS, follow these steps:

Step 1: Install Homebrew

To install Homebrew, follow the instructions on the Homebrew website.

Step 2: Install Pip

To install Pip, follow the instructions on the Pip website.

Step 3: Install AutoDock Vina

To install AutoDock Vina, follow the instructions on the AutoDock Vina website.

Step 4: Install Open Babel, RDKit, and PyMOL
brew install open-babel rdkit pymol
pip install rdkit numpy

2. Clone repository

Clone the PanSimPy repository to your local machine:

git clone https://gitlab.igem.org/2023/software-tools/athens.git

3. Navigate to the Repository

Change your working directory to the cloned repository:

cd athens

4. Install PanSimPy

You can install PanSimPy using the following command:

pip install .

This will install the package and make it available for import in your Python environment.

5. Verify Installation

To verify that PanSimPy was successfully installed, you can run the following command:

python -c "import pansimpy; print(pansimpy.__version__)"

If the installation was successful, you should see the version number of the installed package printed to the console.

Usage

The following sections provide a detailed overview of PanSimPy's features and how you can use them in your research.

Example 1: Generating Random Peptides

from pansimpy import generate_random_peptide

amino_acids = ["A", "R", "N", "D", "C", "Q", "E", "G", "H", "I", "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V"]
peptide_length = 5

random_peptide = generate_random_peptide(peptide_length, amino_acids)
print("Random Peptide:", random_peptide)

Example 2: Docking Peptides with Vina

from pansimpy import run_docking

receptor_directory = "receptors"      # directory containing the prepared receptor files
ligand_file = "ligand.pdbqt"
ligand_center = (10.0, 15.0, 20.0)    # (x, y, z) centre of the docking grid
grid_size_x = 20.0
grid_size_y = 20.0
grid_size_z = 20.0
exhaustiveness = 32
no_of_peptides_to_search = 3

run_docking("vina", receptor_directory, ligand_file, ligand_center, grid_size_x, grid_size_y, grid_size_z, exhaustiveness, no_of_peptides_to_search)

Example 3: Parallel Docking using Threads

from pansimpy import run_docking_parallel

no_of_threads = 4
receptor_directory = "receptors"
ligand_file = "ligand.pdbqt"
ligand_center = (10.0, 15.0, 20.0)
grid_size_x = 20.0
grid_size_y = 20.0
grid_size_z = 20.0
exhaustiveness = 32
no_of_peptides_to_search = 10

run_docking_parallel(no_of_threads, "vina", receptor_directory, ligand_file, ligand_center, grid_size_x, grid_size_y, grid_size_z, exhaustiveness, no_of_peptides_to_search)

Example 4: Genetic Algorithm for Peptide Optimization

from pansimpy import genetic_algorithm_peptide_optimization

output_folder = "peptide_optimization"
peptide_length = 7
ligand_file = "ligand.pdbqt"
ligand_center = (10.0, 15.0, 20.0)
grid_size_x = 20.0
grid_size_y = 20.0
grid_size_z = 20.0
amino_acids = ["A", "R", "N", "D", "C", "Q", "E", "G", "H", "I", "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V"]
exhaustiveness = 32
num_generations = 10
vina_score_threshold = -4

best_peptide = genetic_algorithm_peptide_optimization(output_folder, peptide_length, ligand_file, ligand_center, grid_size_x, grid_size_y, grid_size_z, amino_acids, exhaustiveness, num_generations, vina_score_threshold)
print("Best Peptide:", best_peptide)

Example 5: Full Docking Workflow

from pansimpy import full_docking_workflow

output_folder = "full_docking_workflow"
peptide_length = 7
ligand_file = "ligand.pdbqt"
ligand_center = (10.0, 15.0, 20.0)
grid_size_x = 20.0
grid_size_y = 20.0
grid_size_z = 20.0
amino_acids = ["A", "R", "N", "D", "C", "Q", "E", "G", "H", "I", "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V"]
exhaustiveness = 32
num_generations = 10
vina_score_threshold = -4
no_of_threads = 4
no_of_peptides_to_search = 10

full_docking_workflow(output_folder, peptide_length, ligand_file, ligand_center, grid_size_x, grid_size_y, grid_size_z, amino_acids, exhaustiveness, num_generations, vina_score_threshold, no_of_threads, no_of_peptides_to_search)

Genetic algorithms

The idea of using genetic algorithms to navigate our vast peptide search space was game-changing. Brute-force docking wasn’t yielding significant results, and then we incorporated genetic algorithms. It was a breakthrough. Below, we briefly touch on the topic, encouraging fellow iGEM teams to strategically explore their problems’ search spaces:

Genetic algorithms (GAs) are optimization and search techniques inspired by the process of natural selection. Drawing parallels to biological evolution, GAs work by initializing a population of potential solutions, then iteratively selecting, crossing over, and mutating individuals based on a fitness function that evaluates their quality. Over successive generations, this evolutionary process promotes the proliferation of higher-quality solutions and gradually converges towards an optimal or near-optimal solution to a given problem. The adaptability and robustness of GAs make them effective for a wide range of complex problems where the search space is large or not well-understood.
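
To make the idea concrete, here is a bare-bones sketch of such a loop for peptides. It is illustrative only: the fitness function below is a dummy placeholder standing in for the docking score (e.g., a negated AutoDock Vina affinity) that would be used in practice, and all parameters are arbitrary.

import random

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def random_peptide(length):
    return "".join(random.choice(AMINO_ACIDS) for _ in range(length))

def fitness(peptide):
    # Dummy placeholder: in practice this would be a docking score.
    return sum(aa == "W" for aa in peptide)

def crossover(p1, p2):
    cut = random.randint(1, len(p1) - 1)          # single-point crossover
    return p1[:cut] + p2[cut:]

def mutate(peptide, rate=0.1):
    return "".join(random.choice(AMINO_ACIDS) if random.random() < rate else aa
                   for aa in peptide)

def evolve(length=4, pop_size=20, generations=10):
    population = [random_peptide(length) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)    # rank by fitness
        parents = population[: pop_size // 2]         # selection
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children               # next generation
    return max(population, key=fitness)

print("Best peptide found:", evolve())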

Software III: App, Algorithms and Models

The creation of the app was of utmost importance to conclude the SCENTIPD project. We built the prototype making sure it was extremely easy to use. Our app, which contains an advanced image processing and deep learning pipeline, can serve as a valuable point of reference for future iGEM teams in need of such an MVP.

Let’s first examine the diagnosis-extraction pipeline:

Image 1: The phone is placed on top of the kit’s glass lid. The image is taken, the app processes it and extracts the diagnosis.

Image 2: An auxiliary box is displayed in the app’s camera view to serve as a point of reference when taking the image. Still, the image processing pipeline can handle irregularities and anomalies, or even images that go outside the box’s boundaries.

The Prototype:

Image 3: The initial login/sign-up screen. Here, crucial demographic data are gathered as the users’ profiles are created.

Image 4: The app’s Home Page. The design is simple and minimal; the users are provided with 5 options: a) Open Camera, b) Upload Image, c) Show Results, d) Help, e) the start-up’s site.

Image 5: The results are provided in a straightforward (indication of PD) but soft way (a suggestion to contact a neurologist).

Image 6: Help pages to aid with the kit’s usage. These must be updated to match the final version of the re-engineered kit.

APIs and DB schema:

The APIs implemented in our app carry the logic of its functionalities. Our APIs, and hence the app’s functionalities, are the following (a minimal sketch of one such controller follows the list):

@image_processing.route('/scentipd/image_processing', methods=['POST'])

This controller processes the image and extracts its attributes, storing them in the database. It also applies the deep learning model to the image and classifies it, extracting the diagnosis.

@patient_handler.route('/api/patients', methods=['POST'])

This controller handles the sign-up process of a user and stores their data in the Database.

@patient_handler.route('/api/patients/login', methods=['POST'])

This controller handles user log-in.

@patient_handler.route('/api/patients/', methods=['GET'])

This controller retrieves and stores the email of the logged-in user, making it possible to fetch their past results from the database when the ‘Results’ page is opened.
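
For illustration, here is a minimal sketch of how a controller like the image-processing endpoint can be wired up with Flask blueprints and pymongo. The helper functions are hypothetical placeholders, not the actual app code.

from flask import Blueprint, Flask, jsonify, request
from pymongo import MongoClient

image_processing = Blueprint("image_processing", __name__)
db = MongoClient("mongodb://localhost:27017")["scentipd"]

def extract_image_attributes(image_bytes):
    # Hypothetical placeholder for the image processing pipeline described below.
    return {"dominant_colors": []}

def classify_image(image_bytes):
    # Hypothetical placeholder for the deep learning classifier.
    return "negative"

@image_processing.route('/scentipd/image_processing', methods=['POST'])
def process_image():
    image_bytes = request.files["image"].read()         # the uploaded kit photo
    attributes = extract_image_attributes(image_bytes)  # colorimetric attributes
    prediction = classify_image(image_bytes)            # extracted diagnosis
    db.images.insert_one({"imageAnalysis": attributes, "prediction": prediction})
    return jsonify({"prediction": prediction})

app = Flask(__name__)
app.register_blueprint(image_processing)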

Our database is NoSQL (MongoDB). Our app’s database schema is the following (an illustrative sketch of matching documents follows the schema):

ImageSchema

- id

- patientId

- imageAnalysis

- prediction

- date

PatientSchema

- _id

- fullName

- age

- gender

- images

- username

- password
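
As a rough illustration of how these schemas translate into MongoDB documents, here is a sketch using pymongo; the field values are made up for the example.

from datetime import datetime, timezone
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["scentipd"]

# Illustrative documents following PatientSchema and ImageSchema above.
patient_id = db.patients.insert_one({
    "fullName": "Jane Doe",
    "age": 62,
    "gender": "female",
    "images": [],
    "username": "jane",
    "password": "<hashed password>",   # always store a hash, never plain text
}).inserted_id

db.images.insert_one({
    "patientId": patient_id,
    "imageAnalysis": {"dominant_colors": []},
    "prediction": "negative",
    "date": datetime.now(timezone.utc),
})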

Image Processing Pipeline: The algorithm

Let’s now follow the steps of our image processing algorithm (a condensed code sketch follows the list below):

1. Load the image.

2. Filter pixels where the RGB values don't differ more than 15 from each other (|R1-B1|<15, |R1-G1|<15, |B1-G1|<15)

3. Apply the filter to the image making the filtered pixels black

4. Convert the filtered image to grayscale

5. Perform morphological operations to enhance the rectangles.

6. Find contours in the eroded (edited) image

7. Keep track of the bounding rectangles

8. Handle noise outside of the rectangles (shapes that haven’t been filtered out) by iterating over the contours:

a) Calculate the area of the contour

b) Check if the contour area exceeds the threshold (1000 pixels):

i) Find the convex hull of the contour

ii) Find the bounding rectangle of the hull

iii) Shrink the current rectangle by a buffer (e.g., 10 pixels)

iv) Append the shrunken rectangle to the list of bounding rectangles

9. Convert the filtered image back to PIL Image format

10. Create a new image to hold the side-by-side rectangles

11. Iterate over the rectangles and copy them to the new image:

a) Crop the rectangle from the filtered image

b) Paste the rectangle onto the new image

12. Handle noise inside of the rectangles:

a) Convert the PIL image to an OpenCV image

b) Create the mask for black pixels

c) Perform Telea inpainting with a radius of 3

d) Convert the inpainted image to PIL format

13. Adjust the size to match the model’s input

14. Display the inpainted image
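
The condensed sketch below walks through the core of these steps with OpenCV, NumPy, and PIL. The 15-unit RGB threshold, the 1000-pixel area cut-off, the 10-pixel buffer, and the Telea radius of 3 come from the list above; everything else (kernel size, file name) is an illustrative assumption.

import cv2
import numpy as np
from PIL import Image

img = cv2.imread("kit.jpg")                                   # BGR, uint8
b, g, r = [img[..., i].astype(np.int16) for i in range(3)]

# Steps 2-3: black out near-grey pixels (|R-G|, |R-B|, |G-B| all below 15).
grey_mask = (np.abs(r - g) < 15) & (np.abs(r - b) < 15) & (np.abs(g - b) < 15)
filtered = img.copy()
filtered[grey_mask] = 0

# Steps 4-6: grayscale, morphological clean-up, contour detection.
gray = cv2.cvtColor(filtered, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY)
kernel = np.ones((5, 5), np.uint8)
closed = cv2.erode(cv2.dilate(binary, kernel), kernel)
contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Steps 7-8: keep large contours and shrink their bounding boxes by the buffer.
rects = []
for cnt in contours:
    if cv2.contourArea(cnt) > 1000:
        x, y, w, h = cv2.boundingRect(cv2.convexHull(cnt))
        rects.append((x + 10, y + 10, w - 20, h - 20))

# Steps 9-11: paste the cropped rectangles side by side into one strip.
pil_filtered = Image.fromarray(cv2.cvtColor(filtered, cv2.COLOR_BGR2RGB))
strip = Image.new("RGB", (max(sum(w for _, _, w, _ in rects), 1),
                          max((h for *_, h in rects), default=1)))
offset = 0
for x, y, w, h in rects:
    strip.paste(pil_filtered.crop((x, y, x + w, y + h)), (offset, 0))
    offset += w

# Step 12: inpaint remaining black noise inside the strip (Telea, radius 3).
strip_cv = cv2.cvtColor(np.array(strip), cv2.COLOR_RGB2BGR)
noise_mask = cv2.inRange(strip_cv, (0, 0, 0), (0, 0, 0))
inpainted = cv2.inpaint(strip_cv, noise_mask, 3, cv2.INPAINT_TELEA)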

The Image Processing Pipeline Visualized:

STEP 1

Image 7: Filter out the case of the kit

STEP 2

Image 8: Draw a red outline around the rectangles that contain the colorimetric results of the kit.

STEP 3

Image 9: Using a buffer, shrink the red rectangular outlines to avoid the rough edges of the rectangles. Every red rectangular outline is of the same size.

STEP 4

Image 10: Put the rectangles side by side, creating the colorimetric result image.

STEP 5

Image 11: Handle noise inside of the rectangles. The image is ready to be fed into the deep learning models.

Deep Learning

But why do we use deep learning to extract the diagnosis from the colorimetric results?

Let C1 = (R1, G1, B1) and C2 = (R2, G2, B2). We define ΔColor to be the vector difference of the two, ΔC = (R1 - R2, G1 - G2, B1 - B2), in order to capture their proximity in each RGB dimension individually. Scalar distance metrics such as the Euclidean distance would not work here, because they fail to distinguish differences along each dimension.

The RGB channels were, however, an arbitrary choice. To differentiate the phage colorimetric results of PD patients from those of healthy participants, it is important to study the colour boundaries of healthy and PD results, so other colour spaces, such as YIQ or CMYK, should be examined as well; the conversion formulas between colour spaces will facilitate this boundary study. To our knowledge, no colorimetric analysis of phage responses to PD-related VOCs exists in the literature, so the colorimetric boundaries may not be well captured by any of the traditional colour spaces, and linear or non-linear transformations of the colour coordinates may be required. This, combined with the fact that our colorimetric results consist of four lanes of independent colour gradients, makes it very hard for the decision boundary to be simple enough to compute manually, or for the necessary linear or non-linear dimensionality transformations to be derived by hand.

For these reasons, we propose a deep learning pipeline to uncover the hidden transformations and boundaries separating PD patients from healthy participants. Deep learning models are well suited to the boundary and dimensionality problems we encountered, because they are composed of many layers of non-linear functions that aim to make the samples linearly separable. When trained correctly, they are adept at identifying intricate patterns in large amounts of data and can automatically learn the transformations best suited to the problem at hand.

Yet, without real-world data, we cannot build a robust model. For completeness, we present a simple CNN below.
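
A minimal Keras sketch of such a CNN is given below. The input size, layer widths, and dropout rate are illustrative assumptions; the trained diagnosis notebook described later uses a ResNet50 backbone instead.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),        # assumed input size
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),    # PD vs. healthy
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()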

Synthetic colorimetric result image creation

Because clinical data will initially be limited, making it hard to train our deep learning models effectively, we have developed a synthetic image pipeline that simulates the kit’s colorimetric results, so that we can expand our datasets after the initial clinical studies:

Image 12: Synthetic image of colorimetric results.
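
A hedged sketch of how such a synthetic image can be generated is shown below: four vertical lanes, each a linear gradient between two random endpoint colours. The dimensions and colour sampling are illustrative, not the exact parameters of our pipeline.

import numpy as np
from PIL import Image

rng = np.random.default_rng()
height, lane_width, n_lanes = 200, 50, 4

lanes = []
for _ in range(n_lanes):
    start = rng.integers(0, 256, size=3)           # colour at the top of the lane
    end = rng.integers(0, 256, size=3)             # colour at the bottom of the lane
    t = np.linspace(0, 1, height)[:, None]         # vertical interpolation factor
    lane = (1 - t) * start + t * end               # linear gradient, shape (H, 3)
    lane = np.repeat(lane[:, None, :], lane_width, axis=1)
    lanes.append(lane)

synthetic = np.concatenate(lanes, axis=1).astype(np.uint8)
Image.fromarray(synthetic).save("synthetic_result.png")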

PepDB - A peptide-VOC binding affinity database

Description

PepDB is a powerful tool built by Team Athens in 2023. The database, named peptides_database.db, is specifically designed to store and manage critical information about peptides and their binding affinities for four different volatile organic compounds (VOCs): Hippuric Acid, Perillaldehyde, Octadecanal, and Icosane. This information is invaluable for researchers and scientists studying various fields, including molecular biology, chemistry, and drug development.

Installation

To use the PepDB tool, you need to download the database file (peptides_database.db) and place it in a directory of your choice. You can then use the database file to interact with the data using the SQLite command-line shell, a GUI tool, or a Python script.

Another option is to download the CSV files (hippuric_acid.csv, perillaldehyde.csv, octadecanal.csv, and icosane.csv) and use them as a reference for creating your own database. This method is useful if you want to customize the database schema or add more data to it.

Schema

The database schema is structured to provide a comprehensive storage solution for peptide data. It consists of a single table named peptides with several essential columns:

- id (INTEGER, PRIMARY KEY, AUTOINCREMENT): This column serves as a unique identifier for each record in the table. It ensures that every peptide entry can be distinguished easily.

- peptide (TEXT, UNIQUE): The peptide column stores the unique sequences of peptides. This column is marked as UNIQUE, meaning that no two peptides with the same sequence can exist in the database.

- binding_affinity_hippuric_acid (REAL): This column holds the binding affinity data for the VOC named ‘hippuric_acid.’ It records the strength of the interaction between the peptide and hippuric acid, providing valuable insights into their compatibility.

- binding_affinity_perillaldehyde (REAL): Similar to the ‘hippuric_acid’ column, this column stores the binding affinity information for ‘perillaldehyde,’ another VOC. It helps researchers assess the peptide’s interaction with perillaldehyde.

- binding_affinity_octadecanal (REAL): The ‘octadecanal’ column is dedicated to recording the binding affinity between peptides and octadecanal, offering crucial data for understanding their relationship.

- binding_affinity_icosane (REAL): Lastly, the ‘icosane’ column tracks the binding affinity between peptides and icosane, providing insights into their compatibility with this specific VOC.
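
For reference, the schema above corresponds to a table that could be created as in the sketch below (using Python’s sqlite3; this is only needed if you rebuild the database yourself, e.g. from the CSV files).

import sqlite3

connection = sqlite3.connect("peptides_database.db")
connection.execute("""
CREATE TABLE IF NOT EXISTS peptides (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    peptide TEXT UNIQUE,
    binding_affinity_hippuric_acid REAL,
    binding_affinity_perillaldehyde REAL,
    binding_affinity_octadecanal REAL,
    binding_affinity_icosane REAL
)
""")
connection.commit()
connection.close()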

Usage

Team Athens has designed the PepDB tool to be user-friendly and flexible, offering multiple methods for interacting with the database:

1. SQLite Command-Line Shell: If you prefer a command-line interface, you can use the SQLite command-line shell. Simply navigate to the directory containing the database file (peptides_database.db) and run sqlite3 peptides_database.db to start the shell. This allows you to execute SQL queries and interact with the data interactively.

2. SQLite GUI Tools: For those who prefer graphical interfaces, there are excellent tools available, such as "DB Browser for SQLite" or "DBeaver". These tools provide a user-friendly environment for managing and querying the database. You can use them to explore the database schema, run SQL queries, and view data in a convenient tabular format.

3. Python Scripting: To interact with the database using Python, the sqlite3 library is your ally. This method allows you to automate database operations and integrate them into your research or application.

Examples

SQLite Command-Line Shell

sqlite3 peptides_database.db
SQLite version 3.40.1 2022-12-28 14:03:47
Enter ".help" for usage hints.
sqlite> SELECT * FROM peptides WHERE peptide = 'WWW';
290841|WWW||-3.555||
sqlite> .quit

Python Scripting

import sqlite3

connection = sqlite3.connect('peptides_database.db')

cursor = connection.cursor()

cursor.execute("SELECT * FROM peptides WHERE peptide = 'WWW'")

results = cursor.fetchall()

print(results)

connection.close()

Notes

It's important to note that when no binding affinity data is available for a particular peptide from a specific source, the corresponding column contains NULL. In Python, this is represented as None. Researchers should be aware of these null values when analyzing the data.

For optimal performance and resource management, remember to close the database connection after use. Failing to do so can lead to resource leaks, which may affect the stability of your application or research environment.
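
The short sketch below illustrates both points: missing affinities come back as None and are skipped, and contextlib.closing guarantees the connection is released. The query and aggregation are arbitrary examples.

import sqlite3
from contextlib import closing

with closing(sqlite3.connect("peptides_database.db")) as connection:
    rows = connection.execute(
        "SELECT peptide, binding_affinity_hippuric_acid FROM peptides"
    ).fetchall()

# Drop peptides with no recorded affinity (NULL in SQLite, None in Python).
scored = [(pep, score) for pep, score in rows if score is not None]
best = min(scored, key=lambda item: item[1])   # more negative score = stronger binding
print("Strongest hippuric acid binder:", best)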

SCENTIPD Notebooks

Color Extraction

This Jupyter notebook, Color_Extraction.ipynb, demonstrates how to extract the dominant colors from an image using OpenCV and K-Means clustering. The notebook includes code for loading an image from a file, converting the image to the HSV color space, applying a color mask to extract the desired colors, and then computing the dominant colors using K-Means clustering. The resulting colors are printed out, along with the original image and the masked image. The notebook also includes code for displaying the colors as a color palette.

Getting Started

To run this notebook, you will need to have the following dependencies installed:

- OpenCV

- NumPy

- Matplotlib

The notebook assumes that you have an image file called kit.jpg in the notebook’s directory. You can download the kit.jpg file from the notebook’s directory and run the cells in the notebook in order. The notebook will load the image from the file, convert it to the HSV color space, apply a color mask to extract the desired colors, and then compute the dominant colors using K-Means clustering. The resulting colors will be printed out, along with the original image and the masked image.

Usage

To use this notebook, download the kit.jpg file in the notebook’s directory and run the cells in the notebook in order. You can specify the path to your image file in the img_path variable, and modify the color mask and K-Means clustering parameters as needed.
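
A condensed sketch of the notebook’s approach is shown below; the HSV mask bounds and the number of clusters are illustrative defaults, not the notebook’s exact values.

import cv2
import numpy as np

img = cv2.imread("kit.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (0, 50, 50), (179, 255, 255))    # keep saturated pixels
pixels = img[mask > 0].reshape(-1, 3).astype(np.float32)

# K-Means over the masked pixels; the cluster centres are the dominant colours.
k = 4
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, centers = cv2.kmeans(pixels, k, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
print("Dominant colours (BGR):", centers.astype(int))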

Diagnosis Algorithm

This Jupyter notebook, Diagnosis_Alg.ipynb, demonstrates how to train a deep learning model to diagnose medical images using TensorFlow and the ResNet50 architecture. The notebook includes code for loading and preprocessing image data, defining and training a deep learning model, and evaluating the model’s performance.

Dependencies

To run this notebook, you will need to have the following dependencies installed:

- TensorFlow

- scikit-learn

- NumPy

- PIL

Data

The notebook assumes that you have a dataset of medical images in a directory called data. The images should be organized into subdirectories based on their class labels, with each subdirectory containing images of a single class. The notebook uses the ImageDataGenerator class from TensorFlow to load and preprocess the image data.

Model

The notebook uses the ResNet50 architecture, a pre-trained convolutional neural network, as the base model for the diagnosis algorithm. The notebook loads the pre-trained ResNet50 model without the top (classification) layer, freezes the layers in the base model so they won’t be trained, and then defines a new model that starts with the base model and ends with a Dense classification layer. The new model includes several additional layers, including a Flatten layer, BatchNormalization layers, and Dropout layers, to improve the model’s performance.
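
A condensed Keras sketch of this construction is shown below; the input size, layer widths, and dropout rate are illustrative, not the notebook’s exact values.

from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                         # freeze the pre-trained backbone

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.BatchNormalization(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.BatchNormalization(),
    layers.Dense(1, activation="sigmoid"),     # binary diagnosis output
])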

Training

The notebook splits the data into a training set and a test set using the train_test_split function from scikit-learn. The notebook then compiles the model using the Adam optimizer and binary cross-entropy loss, and trains the model for 10 epochs using the fit method. The notebook saves the trained model to a file called model.h5.

Evaluation

The notebook evaluates the performance of the trained model using the evaluate method, which calculates the loss and accuracy of the model on the test set. The notebook prints the test accuracy to the console.

To use this notebook, download and extract kit_colors.zip in the notebook’s directory and run the cells in order. You can specify the path to your image data in the data_dir variable, and modify the model architecture and training parameters as needed.

Contributing

We welcome contributions from the community that can help improve PanSimPy and make it even more useful for researchers and scientists in the field of biopanning and molecular docking. If you're interested in contributing, please follow these steps to get started:

1. Fork the Repository: Start by forking the PanSimPy repository to your GitLab account.

2. Clone the Forked Repository: Clone the forked repository to your local machine using the following command:

git clone https://gitlab.igem.org/2023/software-tools/athens.git

3. Create a New Branch: Create a new branch for your contribution. Make sure to choose a descriptive name that reflects the nature of your contribution.

git checkout -b feature/new-feature

4. Make Changes: Make the necessary changes and improvements to the codebase.

5. Test Your Changes: Before submitting a pull request, ensure that your changes are tested and do not introduce any regressions.

6. Commit and Push: Commit your changes and push them to your forked repository.

git commit -m "Add your commit message here"
git push origin feature/new-feature

7. Submit a Pull Request: Go to the PanSimPy repository on GitLab and click on the “New Pull Request” button. Choose your branch and provide a clear description of your changes.

8. Code Review: Your pull request will be reviewed by the maintainers. Be prepared to address any feedback or suggestions.

Authors and Acknowledgment

The Software Tools have been developed by a dedicated team of four individuals who poured their expertise and efforts into every aspect of the project. We extend our heartfelt thanks and recognition to these core members for their substantial contributions:

Core Development Team

The Dry Lab members of the iGEM Athens 2023 team are responsible for the development of PanSimPy. Their collective efforts have resulted in a powerful tool that can be used by researchers and scientists to streamline their biopanning and molecular docking studies. Their hands-on usage of PanSimPy has been instrumental in the development of the tool, and has played a crucial role in the creation of PepDB.

Furthermore, we extend our appreciation to the authors of the following pivotal papers that have significantly influenced our work:

- Eberhardt, J., Santos-Martins, D., Tillack, A.F., Forli, S. (2021). “AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings.” Journal of Chemical Information and Modeling.

- Trott, O., & Olson, A. J. (2010). “AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading.” Journal of Computational Chemistry, 31(2), 455-461.

We also acknowledge the importance of the following software tools that have been instrumental in our development process:

- Open Babel: N M O’Boyle, M Banck, C A James, C Morley, T Vandermeersch, and G R Hutchison. “Open Babel: An open chemical toolbox.” Journal of Cheminformatics, 3, 33 (2011). DOI:10.1186/1758-2946-3-33

- PyMOL: Schrödinger, LLC, ‘The PyMOL Molecular Graphics System, Version 1.8’. Nov-2015.

- RDKit: Open-source cheminformatics. RDKit Website

The functionalities and insights provided by these papers and software tools have played a crucial role in the creation of PanSimPy.

Should you be interested in contributing to PanSimPy or PepDB and supporting their continued growth, please refer to the “Contributing” section above for more information on how you can participate.