Hypothesis 2.2 - Drug molecule generation for Alzheimer’s disease

Organization


Drug discovery [2] can be described as the process of identifying chemical entities that have the potential to become therapeutic agents. The key goal of drug discovery campaigns is the recognition of new molecular entities that may be of value in treating diseases with unmet medical needs. Drug-target interaction [3], the identification of interactions between drugs and their target proteins, is a key step in drug discovery. It not only aids in understanding the disease mechanism, but also helps to identify unexpected therapeutic activity or adverse side effects of drugs. Hence, drug-target interaction prediction is an essential tool in the field of drug repurposing. We will use various modeling approaches to identify possible drugs that target and slow down one form of dementia, Alzheimer’s disease.

The page is organized as below:

  1. Description

  2. Modeling

  3. Results

Note: The background information on deep learning techniques and quantum computing principles is added HERE!

Description


A suitable generative model is also trained to create newer drug structures targeting the disease. This model is trained to predict possible drug compounds that slow down the progression of the disease, based on data from the ChEMBL database [1]. The bioactive compounds for acetylcholinesterase (AChE) inhibition are taken and used to train the generative model.

The database is queried for single-protein targets based on their bioactivity toward inhibiting AChE. The bioactivity data for human acetylcholinesterase with reported target values has been selected for model training. The main parameter used here for training the model is the IC50 value.

Half maximal inhibitory concentration (IC50) is a measure of the potency of a substance in inhibiting a specific biological or biochemical function. IC50 is a quantitative measure that indicates how much of a particular inhibitory substance (e.g. a drug) is needed to inhibit, in vitro, a given biological process or biological component by 50%. The lower the IC50 value, the more potent the drug. IC50 is expressed as a molar concentration.

The bioactivity data is the IC50 value. Compounds with IC50 values below 1,000 nM are considered active, those above 10,000 nM inactive, and those between 1,000 and 10,000 nM intermediate. The dataset is prepared [2] with the molecule ID, the canonical SMILES representation of the molecule, and the corresponding IC50 value.
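The labelling rule above can be sketched as a small helper. This is a minimal illustration, assuming IC50 values are given in nM; the molecule IDs and SMILES strings below are hypothetical placeholders, not real ChEMBL entries.

```python
# Label compounds by IC50 (nM), following the thresholds in the text:
# < 1,000 nM -> active, > 10,000 nM -> inactive, otherwise intermediate.

def bioactivity_class(ic50_nm: float) -> str:
    """Return the bioactivity class for an IC50 value in nM."""
    if ic50_nm < 1000:
        return "active"
    if ic50_nm > 10000:
        return "inactive"
    return "intermediate"

# Hypothetical rows: (molecule ID, canonical SMILES, IC50 in nM)
dataset = [
    ("MOL_A", "CCO", 750.0),
    ("MOL_B", "c1ccccc1", 5200.0),
    ("MOL_C", "CC(=O)O", 25000.0),
]
labelled = [(mid, smi, ic50, bioactivity_class(ic50))
            for mid, smi, ic50 in dataset]
# e.g. MOL_A -> "active", MOL_B -> "intermediate", MOL_C -> "inactive"
```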


Modeling


Quantum Model

We use a generative modeling approach based on quantum computing to create newer molecules that can be possible drug candidates for Alzheimer's disease. The pre-processed ChEMBL dataset is again used here for training, but only the molecules with bioactivity toward the target are taken. We use the patch method for the quantum GAN as described in reference [3]. This method uses several quantum generators, with each sub-generator responsible for constructing a small patch of the final output. The training procedure is shown in the figure below.

[Figure: Training procedure of the patch quantum GAN]


Training uses a generator and a discriminator. The generator is the quantum model (together with its sub-models), which creates the patches of the molecular features. The discriminator is a classical model trained to predict whether the generated molecular features are real or fake. The quantum generator uses the same four stages: initialization, transformation, trainable ansatz, and measurement.

  1. Initial state preparation - the all-zeros state.

  2. Transformation - angle embedding (Y rotations).

  3. Trainable ansatz - the ansatz described in the figure below.

  4. Measurement - Z measurement on all the qubits.

[Figure: Trainable ansatz circuit of the quantum generator]
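The four stages above can be sketched with a tiny pure-Python statevector simulation, as an illustrative stand-in for the actual quantum framework. The specific ansatz used here (one layer of RY rotations followed by a CNOT ring) is an assumption for the sketch, not necessarily the ansatz in the figure; the angles are arbitrary.

```python
import math

def ry(theta):
    # Real-valued RY rotation matrix
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def apply_ry(state, theta, wire, n):
    g = ry(theta)
    mask = 1 << (n - 1 - wire)
    new = state[:]
    for i in range(len(state)):
        if not i & mask:                 # pair |...0...> with |...1...>
            j = i | mask
            new[i] = g[0][0] * state[i] + g[0][1] * state[j]
            new[j] = g[1][0] * state[i] + g[1][1] * state[j]
    return new

def apply_cnot(state, control, target, n):
    cmask = 1 << (n - 1 - control)
    tmask = 1 << (n - 1 - target)
    new = state[:]
    for i in range(len(state)):
        if i & cmask:                    # flip target bit when control is 1
            new[i ^ tmask] = state[i]
    return new

def sub_generator(noise_angles, ansatz_angles):
    n = len(noise_angles)
    state = [0.0] * (2 ** n)
    state[0] = 1.0                                   # 1. all-|0> initial state
    for w, a in enumerate(noise_angles):             # 2. angle embedding (Y rotations)
        state = apply_ry(state, a, w, n)
    for w, a in enumerate(ansatz_angles):            # 3. trainable ansatz: RY layer...
        state = apply_ry(state, a, w, n)
    for w in range(n):                               # ...plus a CNOT ring (assumed)
        state = apply_cnot(state, w, (w + 1) % n, n)
    z = []                                           # 4. Z expectation on each qubit
    for w in range(n):
        mask = 1 << (n - 1 - w)
        z.append(sum(a * a * (1 if not i & mask else -1)
                     for i, a in enumerate(state)))
    return z

# Patch method: each sub-generator contributes one patch of the feature vector.
patch_1 = sub_generator([0.3, 0.7, 1.1], [0.2, 0.5, 0.9])
patch_2 = sub_generator([0.4, 0.1, 0.8], [1.3, 0.6, 0.2])
features = patch_1 + patch_2         # final features are the concatenated patches
```

With all angles zero the circuit leaves the all-zeros state untouched, so every Z expectation is +1; nonzero angles rotate the qubits and the expectations move toward -1.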


Classical Model

Here we use a purely classical generative recurrent neural network (RNN) model [4] to generate newer molecules. The architecture of the model is shown in the figure below.

[Figure: Architecture of the generative RNN model]


The model is trained in two ways.

In the first, the canonical SMILES sequence is passed in and the next output token is computed by the model; the loss function computes the loss at this stage and back-propagation is performed. In the second, an entire sequence is generated, the total loss value is computed, and back-propagation is then performed.
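The first training scheme (next-token prediction) can be illustrated by how the training pairs are built from a SMILES string. This is a hedged sketch: the start/end tokens ("^", "$") and the framing scheme are assumptions for illustration, not details from the source.

```python
# Build (input prefix, next character) pairs for next-token training:
# the model sees characters 1..t and must predict character t+1.

START, END = "^", "$"    # assumed start/end-of-sequence tokens

def make_training_pairs(smiles: str):
    seq = START + smiles + END
    return [(seq[: t + 1], seq[t + 1]) for t in range(len(seq) - 1)]

pairs = make_training_pairs("CCO")
# -> [("^", "C"), ("^C", "C"), ("^CC", "O"), ("^CCO", "$")]
```

At each step the predicted distribution over the vocabulary is compared against the true next character, the loss is accumulated, and gradients flow back through the recurrent state.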

Results


Quantum Model

Since the simulator supports only around 10 qubits without consuming all available memory, we need 14 different circuits with trainable parameters to generate the 140 output features (14 sub-generators × 10 qubits). This causes a computational overhead, and the simulation could not be performed.

Classical Model

The classical model was trained for 5 epochs with a batch size of 32 using the cross-entropy loss function. It reached a loss value of around 1.65.

Using these weights, whenever a random input canonical SMILES string is given, the model generates up to the sequence length. The newly generated canonical SMILES can be converted to PaDEL descriptors and passed to the model from Hypothesis 2.1 to determine whether the molecule is an active compound or not.
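The generation step can be sketched as an autoregressive sampling loop. The trained RNN is stubbed out here by `next_char_dist`, a hypothetical stand-in that returns a probability distribution over the vocabulary; the toy vocabulary and the maximum length of 80 are assumptions for the sketch.

```python
import random

VOCAB = list("CNO()=1cn$")           # assumed toy SMILES vocabulary; "$" ends a sequence
MAX_LEN = 80                         # assumed maximum generated sequence length

def next_char_dist(prefix):
    # Placeholder for the RNN forward pass; uniform distribution for illustration.
    return [1.0 / len(VOCAB)] * len(VOCAB)

def sample_smiles(seed="C", max_len=MAX_LEN, rng=None):
    rng = rng or random.Random(0)
    out = seed
    while len(out) < max_len:
        probs = next_char_dist(out)                    # model's next-token distribution
        ch = rng.choices(VOCAB, weights=probs)[0]      # sample the next character
        if ch == "$":                                  # stop at the end token
            break
        out += ch
    return out

generated = sample_smiles()
```

The resulting string would then be converted to PaDEL descriptors and scored by the Hypothesis 2.1 classifier.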


References


  1. ChEMBL Database. EMBL-EBI homepage. (n.d.). https://www.ebi.ac.uk/chembl

  2. Dataprofessor. (n.d.). Dataprofessor/bioinformatics_freecodecamp. GitHub. https://github.com/dataprofessor/bioinformatics_freecodecamp.git

  3. Huang, He-Liang, et al. “Experimental quantum generative adversarial networks for Image Generation.” Physical Review Applied, vol. 16, no. 2, 27 Aug. 2021, https://doi.org/10.1103/physrevapplied.16.024051.

  4. Gupta A, Müller AT, Huisman BJH, Fuchs JA, Schneider P, Schneider G. Generative Recurrent Networks for De Novo Drug Design. Mol Inform. 2018 Jan;37(1-2):1700111. doi: 10.1002/minf.201700111. Epub 2017 Nov 2. Erratum in: Mol Inform. 2018 Jan;37(1-2): PMID: 29095571; PMCID: PMC5836943.