Hypothesis 2.1 - Drug Target Prediction for Alzheimer’s disease

Organization


Alzheimer’s disease is caused by the accumulation of plaques between the neural connections of the brain. These plaques block the transmission of signals between neurons, severing their connections and causing memory loss.

The page is organized as follows:

  1. Description

  2. Modeling

  3. Results

Note: Background information on deep learning techniques and quantum computing principles is provided HERE!

Description


Years of research have validated the hypothesis that acetylcholinesterase (AChE) [1] inhibitors are effective in slowing down (or reversing) Alzheimer’s disease and myasthenia gravis. Alzheimer’s disease is characterized by lower levels of acetylcholine than in normal cognition. Acetylcholine is a neurotransmitter; the enzyme acetylcholinesterase breaks it down into acetate and choline, terminating neuronal transmission and signaling between synapses. AChE inhibitors therefore result in higher concentrations of acetylcholine and better communication between neurons. This can temporarily improve or stabilize Alzheimer’s disease symptoms. Current AChE inhibitor treatments available for AD include donepezil, galantamine and rivastigmine.


The dataset is curated from the vast collection of open-source drug-target data available in the ChEMBL database (Link: ChEMBL Database (ebi.ac.uk)). ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs.

The database is queried for single-protein targets with bioactivity data for AChE inhibition. The bioactivity data for human acetylcholinesterase with reported target values was selected for model training. The main parameter used for training the model is the IC50 value.

IC50

Half maximal inhibitory concentration (IC50) is a measure of the potency of a substance in inhibiting a specific biological or biochemical function. It is a quantitative measure indicating how much of a particular inhibitory substance (e.g. a drug) is needed to inhibit a given biological process or component by 50% in vitro. The lower the IC50, the more potent the drug. IC50 is expressed as a molar concentration.

Dataset Preprocessing

The bioactivity data is the IC50 value. Compounds with IC50 below 1,000 nM are considered active, those above 10,000 nM are considered inactive, and values between 1,000 and 10,000 nM are labelled intermediate. The dataset is prepared with the molecule ID, the canonical SMILES representation of the molecule and the corresponding IC50 value, as in the sketch below.
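A minimal labelling sketch, assuming the curated ChEMBL export lives in a pandas DataFrame with molecule_chembl_id, canonical_smiles and standard_value (IC50 in nM) columns; the column names are assumptions for illustration:

```python
import pandas as pd

def label_bioactivity(df: pd.DataFrame) -> pd.DataFrame:
    """Label each compound from its IC50 (nM) in `standard_value` (assumed column)."""
    def to_class(ic50_nm: float) -> str:
        if ic50_nm < 1000:        # IC50 < 1,000 nM -> active
            return "active"
        if ic50_nm > 10000:       # IC50 > 10,000 nM -> inactive
            return "inactive"
        return "intermediate"     # 1,000 - 10,000 nM -> intermediate

    out = df.copy()
    out["bioactivity_class"] = out["standard_value"].apply(to_class)
    return out[["molecule_chembl_id", "canonical_smiles",
                "standard_value", "bioactivity_class"]]
```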

Compound Representation

The canonical SMILES representation is used to denote a molecular structure. Canonical SMILES [8] is a special version of the SMILES format that uniquely identifies a single molecular structure. SMILES is a linear text format that describes the connectivity and chirality of a molecule; the canonical form uses a labeling system to identify each compound uniquely.

Exploratory Data Analysis

RDKit [3] is an open-source software package consisting of a collection of cheminformatics and machine-learning tools. It allows us to compute molecular descriptors for the compounds in the dataset.

The canonical SMILES notation, which contains the information about the chemical structure of the molecule, is used to compute the molecular descriptors. We calculate the Lipinski descriptors using this package and perform statistical analysis on them.

Lipinski descriptors

A set of rules of thumb for evaluating the drug-likeness of compounds. Drug-likeness here is based on Absorption, Distribution, Metabolism and Excretion (ADME), also known as the pharmacokinetic profile. Lipinski analyzed all orally active FDA-approved drugs in formulating what came to be known as the Rule of Five, or Lipinski's Rule [4] (the thresholds are all multiples of five).

An orally active drug-like compound should have no more than one violation. Lipinski's Rule states the following (an RDKit sketch follows the list):

  1. Molecular weight (MW) < 500 Dalton

  2. Octanol-water partition coefficient (LogP) < 5

  3. Hydrogen bond donors (NumHDonors) < 5

  4. Hydrogen bond acceptors (NumHAcceptors) < 10
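A short RDKit sketch of computing the four Lipinski descriptors from a canonical SMILES string and applying the at-most-one-violation check above (the descriptor functions are RDKit's; the helper itself is illustrative):

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

def lipinski(smiles: str) -> dict:
    mol = Chem.MolFromSmiles(smiles)
    desc = {
        "MW": Descriptors.MolWt(mol),          # molecular weight (Da)
        "LogP": Descriptors.MolLogP(mol),      # octanol-water partition coefficient
        "NumHDonors": Lipinski.NumHDonors(mol),
        "NumHAcceptors": Lipinski.NumHAcceptors(mol),
    }
    violations = sum([desc["MW"] >= 500, desc["LogP"] >= 5,
                      desc["NumHDonors"] >= 5, desc["NumHAcceptors"] >= 10])
    desc["drug_like"] = violations <= 1        # at most one violation allowed
    return desc

print(lipinski("CC(=O)OC1=CC=CC=C1C(=O)O"))    # aspirin as a quick example
```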

pIC50

To allow the IC50 data to be more uniformly distributed, we convert IC50 to the negative logarithmic scale, pIC50 = -log10(IC50 in M).

This custom function PIC50() accepts a DataFrame as input and will (a sketch follows the list):

  1. Take the IC50 values from the standard_value column and convert them from nM to M by multiplying by 10^-9.

  2. Take the molar value and apply -log10.

  3. Delete the standard_value column and create a new PIC50 column.
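A sketch of this function, assuming the same standard_value column as above:

```python
import numpy as np
import pandas as pd

def PIC50(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    molar = out["standard_value"] * 1e-9        # nM -> M
    out["PIC50"] = -np.log10(molar)             # pIC50 = -log10(IC50 in M)
    return out.drop(columns=["standard_value"])
```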


Data Analysis

We perform chemical space analysis using the Lipinski descriptors to understand the statistical properties of the dataset. The pre-processed dataset contains 4620 training entries. The distribution of each target class and of the pIC50 values are shown in the figures below.

[Figures: target class distribution and pIC50 distribution]


Statistical analysis | Mann-Whitney U Test

The Mann-Whitney U test [5] is a nonparametric statistical significance test for determining whether two independent samples were drawn from a population with the same distribution.

The default assumption, or null hypothesis, is that there is no difference between the distributions of the data samples. Rejecting this hypothesis suggests that there is likely some difference between the samples. More specifically, the test determines whether it is equally likely that a randomly selected observation from one sample will be greater or less than a randomly selected observation from the other; if not, the distributions differ.

  1. Fail to Reject H0: Sample distributions are equal.

  2. Reject H0: Sample distributions are not equal.

For the test to be effective, it requires at least 20 observations in each data sample.
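A minimal SciPy example of the test, continuing from the labelled DataFrame sketched earlier (column names assumed):

```python
from scipy.stats import mannwhitneyu

active = df.loc[df["bioactivity_class"] == "active", "PIC50"]
inactive = df.loc[df["bioactivity_class"] == "inactive", "PIC50"]

stat, p = mannwhitneyu(active, inactive)
alpha = 0.05
if p > alpha:
    print(f"p = {p:.3g}: fail to reject H0 (same distribution)")
else:
    print(f"p = {p:.3g}: reject H0 (different distributions)")
```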

Applying Mann-Whitney U Test on pIC50 values

[Figure: Mann-Whitney U test result for pIC50]


Looking at the pIC50 values, the active and inactive compounds displayed a statistically significant difference. This is to be expected, since threshold values (IC50 < 1,000 nM = active, IC50 > 10,000 nM = inactive, corresponding to pIC50 > 6 = active and pIC50 < 5 = inactive) were used to define the two classes.

Applying Mann-Whitney U Test on Molecular Weight descriptor

[Figure: Mann-Whitney U test result for molecular weight]


All four Lipinski descriptors exhibited statistically significant differences between the active and inactive compounds. The distributions of these values for both bioactivity classes are shown below.

[Figures: Lipinski descriptor distributions by bioactivity class]


Calculation of Molecular feature vector for model training

We will be calculating molecular descriptors (PaDEL descriptors) that are essentially quantitative descriptions of the compounds in the dataset. PaDEL-Descriptor is a molecular descriptor calculation software that can convert SMILES notation to molecular descriptors. PaDEL supports the calculation of 1D, 2D and 3D descriptors as well as molecular fingerprints.

We convert the canonical SMILES notation into molecular feature vectors using the PaDEL [7] package, yielding a dataset with 881 PaDEL fingerprint features per molecule, as sketched below.
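One way to script this from Python is the padelpy wrapper around PaDEL-Descriptor; a hedged sketch, assuming a molecules.smi input file and padelpy's padeldescriptor entry point (selecting the 881-bit PubChem fingerprint is configured via PaDEL's descriptor-types XML):

```python
from padelpy import padeldescriptor

padeldescriptor(
    mol_dir="molecules.smi",        # input file of SMILES strings (assumed name)
    d_file="padel_features.csv",    # output CSV of fingerprint features
    fingerprints=True,              # compute binary fingerprints
    retainorder=True,               # keep rows in input-molecule order
)
```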

Modeling


We use various classical deep learning techniques along with quantum computation to build the models that produce the desired prediction. The models used are described in the sections below. An introduction to the deep learning techniques and quantum computing principles is provided on the models page HERE!

Deep Neural network model

We will be mainly exploring two deep neural network (DNN) models as shown below.

[Figure: Model 3 architecture]


The first model (Model 3) is shown in the figure above. It is a basic DNN with three layers acting as a feature extractor. The first layer may take an input of 140 features directly from the molecular fingerprint bits, or it can take the grouped decimal features of the molecule. The first layer contains 128 neurons with ReLU activation, followed by second and third layers with 256 and 4 ReLU neurons respectively. The final layer contains 2 neurons with a Softmax activation to identify the two classes, as in the PyTorch sketch below.
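A PyTorch sketch of Model 3 as described (layer sizes from the text; the trailing Softmax is kept to mirror the figure, though in training one would typically pair it with an NLL-style loss):

```python
import torch.nn as nn

class Model3(nn.Module):
    """140 -> 128 -> 256 -> 4 feature extractor (ReLU), then a 2-way Softmax head."""
    def __init__(self, in_features: int = 140):
        super().__init__()
        self.extractor = nn.Sequential(
            nn.Linear(in_features, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, 4), nn.ReLU(),
        )
        self.head = nn.Sequential(nn.Linear(4, 2), nn.Softmax(dim=1))

    def forward(self, x):
        return self.head(self.extractor(x))
```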

[Figure: Model 4 architecture]


The second model (Model 4) is shown in the figure above. It is a slightly deeper model than Model 3. It also takes 140 features directly from the molecule, or the grouped decimal versions, depending on the initial configuration. The model contains 4 hidden layers for feature extraction, with 128, 256, 512 and 4 ReLU neurons respectively. The final layer contains 2 neurons, one per class, with a Softmax activation.

Hybrid Quantum Classical Machine learning model

Classical deep learning models are proven to be good feature extractors for the provided data; they can be trained with various optimization techniques to extract the best features from the input for the prediction at hand. Quantum computing, on the other hand, lets us tap into a higher-dimensional Hilbert space using various gate operations. Combining the two techniques can yield advantages in modeling.

[Figure: hybrid quantum-classical pipeline]


We use the same Model 3 and Model 4 as feature extractors and pass the final set of 4 features to the quantum model. The output of the quantum model is passed to 2 neurons with a Softmax function.

The figure below shows the architecture of the quantum model [1] being used. It comprises 4 sections: initial state preparation, transformation, trainable ansatz and measurement.

Initial state preparation - In this stage, the initial qubit state is prepared. It can be the all-zeros state, or a superposition of all 2^n basis states for an n-qubit circuit.

Transformation - In this stage, the classical data is transformed into Hilbert space. In this hybrid architecture, we use angle embedding: each classical feature from the classical model is taken as an angle of rotation in Hilbert space. The embedding can be done with X, Y or Z rotations.

Trainable ansatz - Here we perform the training. The ansatz contains a set of parameterized gates along with linear entanglement; the gate angles are optimized to converge to the final solution.

Measurement - This is the final stage of the model, where we convert the data from Hilbert space back to the classical world.

[Figure: quantum model architecture]


In our project, the specifications of the quantum model are listed below (a PennyLane sketch follows the list).

  • Initial state preparation - The initial state is the all-zeros state.

  • Transformation - The 4 features from the classical model are passed directly as rotation angles for Y-axis rotations.

  • Trainable ansatz - The ansatz has the structure described in the figure above.

  • Measurement - All the qubits are measured and the values are passed on for post-processing.
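A hedged PennyLane sketch of this quantum head: Y-angle embedding of the 4 classical features on the all-zeros state, a basic entangler ansatz (whose ring of CNOTs includes the final-to-first connection), and Pauli-Z measurement on every qubit, wrapped as a Torch layer:

```python
import pennylane as qml

n_qubits, depth = 4, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def quantum_head(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits), rotation="Y")   # transformation
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))          # trainable ansatz
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]       # measurement

# Torch wrapper: slots in between the classical extractor and the final
# 2-neuron Softmax head of Model 3 / Model 4.
weight_shapes = {"weights": (depth, n_qubits)}
qlayer = qml.qnn.TorchLayer(quantum_head, weight_shapes)
```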

Strongly Entangled Ansatz Model

We also experiment with the hybrid QML model using a strongly entangling ansatz. The architecture of the ansatz [9] is shown in the figure below.

[Figure: strongly entangling ansatz]


The main difference here is that the trainable rotation gates rotate about all 3 axes in Hilbert space. Additionally, the ansatz contains 2 layers of CNOT entanglement: one between pairs of adjacent qubits and the other between each qubit and qubit + 2, as in the sketch below.
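PennyLane's StronglyEntanglingLayers template matches this description; a sketch, with the entangling ranges (1, 2) standing in for the adjacent and qubit + 2 CNOT layers:

```python
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def strong_head(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits), rotation="Y")
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits),
                                 ranges=(1, 2))      # CNOTs at distance 1, then 2
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

weight_shapes = {"weights": (2, n_qubits, 3)}        # (depth, qubits, 3 Euler angles)
```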

Pure Quantum Models

Classical models using deep learning techniques require a lot of computational power; they need a GPU-based environment for optimization and inference. We can reduce this computational complexity with pure quantum models. Instead of a classical feature extractor, we directly transform the classical data into Hilbert space and sweep the space with a trainable ansatz via optimization. This quantum model architecture also comprises 4 sections: initial state preparation, transformation, trainable ansatz and measurement.

Amplitude Embedding with Basic Entangler Ansatz

The model is initialized in the all-zeros state. The key difference is that it uses amplitude-based embedding rather than directly mapping features to angles. This kind of embedding lets us encode a large number of features with few qubits. For example, with 4 qubits we have a 2^4 = 16-dimensional state space in Hilbert space; using amplitude embedding [10] we can directly map 16-dimensional classical data onto the amplitudes of the corresponding basis states.

A detailed mathematical treatment of amplitude embedding is provided in reference [11]. For a 3-qubit system, to embed the classical data [0.447, 0., 0.707, 0., 0., 0., 0.447, 0.707], the angles are computed using the formula below [11].

[Equation: rotation-angle formula for amplitude embedding]
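For reference, the rotation angles in the Möttönen state-preparation scheme take the following form (reproduced from the literature [10][11] as we recall it; the figure's notation may differ slightly):

$$\beta_j^s = 2\arcsin\left(\frac{\sqrt{\sum_{l=1}^{2^{s-1}}\left|\alpha_{(2j-1)2^{s-1}+l}\right|^2}}{\sqrt{\sum_{l=1}^{2^{s}}\left|\alpha_{(j-1)2^{s}+l}\right|^2}}\right)$$

where the α are the target amplitudes, s indexes the recursion level and j the multi-controlled rotation within that level.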


The final angles after performing the computation will be

[Figure: final computed rotation angles]


The basic structure of an amplitude-embedding circuit ([11] and [12]) is shown in the figures below.

[Figures: generalized amplitude-embedding circuits]


Based on the calculated angles for the 3 qubits and the generalized circuit, the amplitude-embedding circuit for this approach is shown in the figure below. The circuit was designed and run with Qiskit [12].

[Figures: Qiskit amplitude-embedding circuit and measurement results]


The measurement result is also shown in the figure above. Note that the square root of each measured probability recovers the original value, since the probability of obtaining a state is the square of its amplitude. A Qiskit sketch of this embedding is shown below.
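A hedged Qiskit sketch of this embedding: the 8 classical values are normalized, loaded as the amplitudes of a 3-qubit state via initialize (which synthesizes the embedding circuit), and sampled (assumes the qiskit-aer simulator is installed):

```python
import numpy as np
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

data = np.array([0.447, 0., 0.707, 0., 0., 0., 0.447, 0.707])
state = data / np.linalg.norm(data)          # amplitudes must have unit norm

qc = QuantumCircuit(3)
qc.initialize(state, [0, 1, 2])              # amplitude embedding
qc.measure_all()

sim = AerSimulator()
counts = sim.run(transpile(qc, sim), shots=4096).result().get_counts()
print(counts)  # P(state) ~ amplitude**2, so sqrt(frequency) recovers |value|
```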

Quantum Convolutional Neural Network

Quantum convolutional neural networks (QCNN) behave similarly to classical convolutional neural networks (CNN). In a classical CNN, the input image is subjected to a series of convolution (feature extraction) operations; as values propagate deeper into the network the spatial size shrinks, and finally a flattened set of features is passed to the classifier.

The QCNN uses a similar principle: the inputs are transformed into Hilbert space, features are extracted by trainable unitary gates, and as the circuit progresses deeper the number of features, and hence the number of qubits carrying trainable unitaries, is reduced. Finally, it reduces to a single qubit from which the result is measured. The architectures of generic QCNN circuits [12][13] are shown in the figures below.

[Figures: generic QCNN architectures]


The unitary circuit [12] in the convolution layer (denoted U1) is shown in the figure below.

[Figure: two-qubit convolution unitary U1]


The circuit above is a reduced version of the optimal circuit for generating a general two-qubit state; it sweeps through a limited subspace of the Hilbert space. The full optimal circuit for generating any general two-qubit state, denoted SU(4) as described in [14], is shown in the figure below.

[Figure: general SU(4) two-qubit circuit]


This SU(4) circuit [14] uses 15 elementary one-qubit gates and 3 CNOT gates to generate any possible state within the two-qubit state space. The reference paper [15] also discusses several other possible circuits and their performance.

The purpose of the pooling layer is to reduce the dimension of the quantum circuit by reducing the number of qubits, while retaining as much information as possible from previously learned data. To 'artificially' [12] reduce the number of qubits in the circuit, we first pair up the N qubits in the system.

After pairing all the qubits, we apply our generalized two-qubit unitary to each pair, as described previously, and then ignore one qubit from each pair for the remainder of the neural network.

This layer therefore has the overall effect of 'combining' the information of two qubits into one: the unitary circuit encodes information from one qubit into the other, after which one of the qubits is disregarded for the remainder of the circuit, with no further operations or measurements performed on it.

We note that one could also use a dynamic circuit to reduce dimensionality in the pooling layers. This would involve measuring certain qubits and feeding the results back classically within the pooling layers; such measurements likewise reduce the dimensionality of the circuit. The architecture of the pooling circuit used is shown in the figure below.

[Figure: pooling circuit]


The idea is that we traverse the accessible subspace of the two-qubit Hilbert space and pass the information to the next layer, where the pooling layer downsizes the two-qubit system to one qubit before the next QCNN layer, as in the sketch below.
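A minimal PennyLane sketch of this conv/pool pattern on 4 qubits, with a small parameterized two-qubit block standing in for the general U1/SU(4) unitary; pooling is realized simply by leaving one qubit of each pair untouched from then on:

```python
import numpy as np
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

def conv_block(params, wires):
    # stand-in for the general two-qubit unitary of the convolution layer
    qml.Rot(*params[0:3], wires=wires[0])
    qml.Rot(*params[3:6], wires=wires[1])
    qml.CNOT(wires=wires)

@qml.qnode(dev)
def qcnn(inputs, params):                     # params: shape (3, 6)
    qml.AngleEmbedding(inputs, wires=range(n_qubits), rotation="Y")
    conv_block(params[0], wires=[0, 1])       # convolution on pairs
    conv_block(params[1], wires=[2, 3])
    # pooling: qubits 0 and 2 are discarded (no further gates on them)
    conv_block(params[2], wires=[1, 3])       # next layer on the survivors
    return qml.expval(qml.PauliZ(3))          # single-qubit readout

print(qcnn(np.zeros(4), np.random.randn(3, 6)))
```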

The QCNN model also requires the input embedding circuit to be a Z feature map for better performance. These are higher-order state-encoding schemes; such feature maps are available in Qiskit [12] and are realized by applying unitary operations on the initial state. The Z feature map with a depth of 2 is shown in the figure below.

[Figure: Z feature map with depth 2]
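The depth-2 Z feature map can be instantiated directly from Qiskit's circuit library; a brief sketch with an assumed feature dimension of 4:

```python
from qiskit.circuit.library import ZFeatureMap

fmap = ZFeatureMap(feature_dimension=4, reps=2)   # reps=2 -> depth 2
print(fmap.decompose().draw())
```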


It has also been shown, via dynamical Lie algebra analysis [16] in reference [17], that this QCNN circuit is not susceptible to the barren plateau problem even when scaled to a large number of qubits.

In our approach, we use this QCNN circuit with two types of loss function: cross entropy (log loss) and hinge loss. The main practical difference between the two in our setup is how the outputs are labelled: with log loss the outputs are labelled 0 or 1, while with hinge loss they are labelled -1 and +1. The characteristics of both loss functions [18] are shown below.

[Figure: cross entropy vs. hinge loss]
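A small PyTorch illustration of the two labelling conventions (the numeric values are made up):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[0.2, 1.1]])                    # model output, one sample
log_loss = F.cross_entropy(logits, torch.tensor([1]))  # labels in {0, 1}

score = torch.tensor([0.7])                            # e.g. a Pauli-Z expectation
target = torch.tensor([1.0])                           # labels in {-1, +1}
hinge = torch.clamp(1 - target * score, min=0).mean()  # max(0, 1 - y * f(x))
```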


Additionally, the model is trained with gradient-free optimization techniques [19], using quasi-Newton methods that approximate the Hessian, because gradient-based optimization becomes very slow as these circuits scale, requiring more computation.

Data reuploading Classifier

It has been shown in reference [20] that a single qubit provides sufficient computational capabilities to construct a universal quantum classifier when assisted with a classical subroutine. This fact may be surprising since a single qubit only offers a simple superposition of two states and single-qubit gates only make a rotation in the Bloch sphere. The key ingredient to circumvent these limitations is to allow for multiple data re-uploading.

The quantum circuit can be organized as a series of data re-uploading layers and single-qubit trainable layers. This can also be extended to multiple qubits with entanglement. The data re-uploading classifier is compared with a neural network in the figure below.

[Figure: data re-uploading classifier vs. neural network]


In our approach, we use the single-qubit [9] and two-qubit [20] data re-uploading classifiers shown in the figures below.

[Figures: single-qubit and two-qubit data re-uploading classifiers]


In the single-qubit version, we downsize the features to 2 and re-upload them to unitary gates that take 3 rotation angles. In the two-qubit version, we use 10 features re-uploaded across 4 layers of unitary gates taking 3 rotation angles each. A minimal sketch of the single-qubit idea follows.
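A hedged PennyLane sketch of the single-qubit re-uploading idea: the 2 downsized features are padded to 3 rotation angles and uploaded anew in every layer, interleaved with trainable rotations:

```python
import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def reuploading_clf(x, weights):       # x: 2 features, weights: (n_layers, 3)
    angles = np.append(x, 0.0)         # pad the 2 features to 3 angles
    for layer in weights:
        qml.Rot(*angles, wires=0)      # data re-upload
        qml.Rot(*layer, wires=0)       # trainable rotation
    return qml.expval(qml.PauliZ(0))
```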

This model is likewise trained with gradient-free optimization techniques [19], using quasi-Newton methods that approximate the Hessian, because gradient-based optimization is very slow for these circuits given their depth and parameter count.
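A sketch of such a training step with SciPy's quasi-Newton L-BFGS-B optimizer, which approximates gradients by finite differences rather than circuit differentiation; reuploading_clf is the sketch above, and X_train / y_train (labels in {-1, +1}) are assumed to exist:

```python
import numpy as np
from scipy.optimize import minimize

n_layers = 4

def cost(flat_weights):
    weights = flat_weights.reshape(n_layers, 3)
    preds = np.array([reuploading_clf(x, weights) for x in X_train])
    return float(np.mean(np.clip(1 - y_train * preds, 0, None)))   # hinge loss

res = minimize(cost, x0=np.random.randn(n_layers * 3), method="L-BFGS-B")
```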

Quantum Approximate Optimization Algorithm (QAOA) Embedding

Sometimes, instead of transforming the input features directly into Hilbert space, we can first apply a trainable function to the features so that the classes become more distinct, making classification easier. A variational embedding ansatz [9] based on QAOA is shown in the figure below.

[Figure: QAOA-based embedding ansatz]
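PennyLane ships a trainable QAOA-style embedding template; a hedged sketch of wiring it up (weight shapes via the template's shape helper):

```python
import pennylane as qml

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def qaoa_embed(features, weights):
    qml.QAOAEmbedding(features=features, weights=weights, wires=range(n_qubits))
    return qml.expval(qml.PauliZ(0))

shape = qml.QAOAEmbedding.shape(n_layers=2, n_wires=n_qubits)   # trainable weights
```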


Results


The dataset contains the molecular features (PaDEL descriptors) extracted from the ChEMBL data targeting bioactivity against acetylcholinesterase (AChE). The classes are active and inactive towards AChE inhibition. We use this dataset to train models that predict the bioactivity of new, unknown molecular compounds towards AChE inhibition. The model training and evaluation code was developed in Python with the PyTorch and PennyLane packages; the model has also been implemented with the Qulacs [21] package. Development was done in Google Colaboratory (Jupyter notebooks) and run on a Google Cloud instance with a T4 GPU and CUDA enabled. The hyperparameters for the hybrid QML model are listed below (a training-loop sketch follows the list).

  • Batch Size - 32

  • Learning Rate - 0.001

  • Epochs - 30

  • Loss (Cost) Function - Weighted Cross Entropy

  • Optimizer - Adam
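A hedged sketch of the training loop with these hyperparameters; the class weights and the train_loader (batch size 32) are assumptions, and since the model ends in Softmax we take logs and use NLLLoss, which together give the weighted cross entropy:

```python
import torch
import torch.nn as nn

model = Model3()                          # or the hybrid variant with qlayer
criterion = nn.NLLLoss(weight=torch.tensor([1.0, 1.5]))   # assumed class weights
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(30):
    for xb, yb in train_loader:           # DataLoader with batch_size=32 assumed
        optimizer.zero_grad()
        log_probs = torch.log(model(xb) + 1e-9)   # Softmax output -> log-probs
        loss = criterion(log_probs, yb)
        loss.backward()
        optimizer.step()
```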

The hyperparameters for pure Quantum Machine learning models are mentioned in their relevant section.

The training results are presented in two sections.

  1. Hybrid QML - In this section, we showcase the results of using classical deep learning models and hybrid quantum machine learning models.

  2. Pure QML - In this section, we showcase the results of using pure quantum computing models and discuss the challenges of using them.

Section 1 - Hybrid QML model results

The results of training the models are summarized in the table below. The hybrid quantum-classical models achieve better accuracy than the corresponding classical model, illustrating the advantage of combining quantum computing with classical deep learning.

Model 3 based hybrid models - Accuracy (%)

  • Classical DNN, 140 feature input, Model 3, Epochs - 15: 76.00%

  • Hybrid QML, Model 3 (140 input binary features), Angle Embedding, Basic Entangler (with final-to-first CNOT), Qubits - 4, Depth - 2, Weighted Cross Entropy loss, Epochs - 18: 74.89%

  • Hybrid QML, Model 3 (140 input binary features), Angle Embedding, Basic Entangler (with final-to-first CNOT), Qubits - 4, Depth - 4, Weighted Cross Entropy loss, Epochs - 15: 77.70%

  • Hybrid QML, Model 3 (140 input binary features), Angle Embedding, Strong Entangler, Qubits - 4, Depth - 2, Weighted Cross Entropy loss, Epochs - 25: 76.08%

The table above shows that the classical model is outperformed by the hybrid quantum model, which converges better and faster, within 15 epochs. Similar results are obtained with Model 4, a slightly deeper model, as shown below.

Model 4 based hybrid models - Accuracy (%)

  • Classical DNN, 140 feature input, Model 4, Epochs - 66: 78.00%

  • Hybrid QML, Model 4 (140 input binary features), Angle Embedding, Basic Entangler (with final-to-first CNOT), Qubits - 4, Depth - 2, Weighted Cross Entropy loss, Epochs - 13: 78.13%

  • Hybrid QML, Model 4 (140 input binary features), Angle Embedding, Basic Entangler (with final-to-first CNOT), Qubits - 4, Depth - 4, Weighted Cross Entropy loss, Epochs - 2 (oscillated between 40.8% and 59.19%, predicting all 0s or all 1s): 59.20%

  • Hybrid QML, Model 4 (140 input binary features), Angle Embedding, Strong Entangler, Qubits - 4, Depth - 2, Weighted Cross Entropy loss, Epochs - 26: 76.19%

Here the hybrid QML model converges in 13 epochs to the same accuracy the classical model reaches only at 66 epochs. This indicates that adding quantum layers helps classical models train better and converge faster.

Both tables also show that using strongly entangling layers reduces the required ansatz repetition depth (each layer has 3 times the parameters of a basic ansatz layer), matching the parameter count of a higher-depth basic-entangler circuit without compromising final accuracy.

Section 2 - Pure QML model results

The results of training the pure QML models are summarized in the tables below. Although pure QML models have the advantages of needing less GPU computation and exploiting the dimensionality of Hilbert space, they suffer from trainability problems, especially in the NISQ era.

[Tables: pure QML model configurations and accuracies]


[Figure: pure QML training results]


The figure above illustrates the trainability issue of pure QML. The accuracy of the pure QML models peaked at around 60%, while the classical models reached around 75%. One major cause is the barren plateau problem. Models 4 and 10 attained the maximum accuracy of around 60% but could not be trained further, since the depth of amplitude embedding grows with the number of qubits, on top of which a trainable ansatz is needed (for more trainable parameters).

On the other hand, models 5, 6, 7 and 11 do not use gradients and are proven to scale without the barren plateau issue, yet achieve only around 57% accuracy. Models 12 and 13 resemble a neural network in quantum computing and are proven to train better, yet achieve 55% accuracy. One possible reason is the embedding used to reduce the molecular features from 140 dimensions down to the number of qubits. We support this with the inference graphs below from the classical models, Model 3 and Model 4.

Using model 3 - inference

[Figures: Model 3 inference - accuracy and epochs to converge vs. feature grouping]


The two graphs above show that as more molecular feature bits are grouped into a single decimal feature (say, around 20 bits into one decimal value of up to 2^20), the accuracy of the classical DNN decreases and the number of epochs required to converge increases. A similar trend is observed in the Model 4 inference graphs.

Using model 4 - inference

[Figures: Model 4 inference - accuracy and epochs to converge vs. feature grouping]


The two graphs above confirm the same trend for Model 4: grouping more molecular feature bits into a single decimal feature decreases the accuracy of the classical DNN and increases the number of epochs required to converge.

Pure QML Model Inference

Thus, as we reduce the number of decimal input features by grouping more molecular feature bits together, a large amount of the information the classical DNN needs is lost, increasing training time and reducing accuracy. Since NISQ-era quantum devices are limited in qubit count and circuit depth, we cannot use the 140 molecular features directly, which makes this lossy embedding necessary. The trainability of pure quantum models is therefore limited, creating a bottleneck for QML.

Finally, the test-set confusion matrix of the best model is shown below. Note that the vertical axis denotes the true label and the horizontal axis the predicted label.

Final Best Model - Confusion Matrices

Hybrid QML
Model 4 - 140 input binary features
Angle Embedding
Basic Entangler (with final to first CNOT)
Qubits - 4
Depth - 2
Weighted Cross Entropy loss
Epochs - 13
Accuracy - 78.13%

[Figure: confusion matrix of the best model]



References


  1. McGleenon BM, Dynan KB, Passmore AP. Acetylcholinesterase inhibitors in Alzheimer's disease. Br J Clin Pharmacol. 1999 Oct;48(4):471-80. doi: 10.1046/j.1365-2125.1999.00026.x. PMID: 10583015; PMCID: PMC2014378.

  2. Vogt, M. Algorithms in chemoinformatics: canonical representations and ... https://bigchem.eu/sites/default/files/Martin_Vogt_algorithms_in_cheminformatics_150519.pdf

  3. Rdkit. (n.d.). Rdkit/rdkit: The official sources for the RDKit Library. GitHub. https://github.com/rdkit/rdkit.git

  4. Pollastri, M.P. (2010), Overview on the Rule of Five. Current Protocols in Pharmacology, 49: 9.12.1-9.12.8. https://doi.org/10.1002/0471141755.ph0912s49.

  5. Machine Learning Mastery. https://machinelearningmastery.com/

  6. Dataprofessor. (n.d.). Dataprofessor/bioinformatics_freecodecamp. GitHub. https://github.com/dataprofessor/bioinformatics_freecodecamp.git

  7. Dataprofessor. (n.d.-a). Dataprofessor/bioinformatics: Bioinformatics stuff. GitHub. https://github.com/dataprofessor/bioinformatics.git

  8. Thatbiochemistryguy. (2021, June 23). Tutorial to smiles and canonical smiles explained with examples. Medium. https://luis-vollmers.medium.com/tutorial-to-smiles-and-canonical-smiles-explained-with-examples-fbc8a46ca29f

  9. Pennylane. PennyLane. (n.d.). https://pennylane.ai/

  10. Mottonen, M., Vartiainen, J. J., Bergholm, V., & Salomaa, M. M. (2005). Transformation of quantum states using uniformly controlled rotations. Quantum Information and Computation, 5(6), 467–473. https://doi.org/10.26421/qic5.6-5.

  11. Schuld, M., & Petruccione, F. (2019). Supervised learning with Quantum Computers. Springer.

  12. Qiskit. (n.d.). https://qiskit.org/

  13. Cong, I., Choi, S. & Lukin, M.D. Quantum convolutional neural networks. Nat. Phys. 15, 1273–1278 (2019). https://doi.org/10.1038/s41567-019-0648-8

  14. Optimal quantum circuits for general two-qubit gates. arXiv:quant-ph/0308006. https://arxiv.org/pdf/quant-ph/0308006.pdf

  15. arXiv:2108.00661v2 [quant-ph], 11 Feb 2022. https://arxiv.org/pdf/2108.00661.pdf

  16. Goh, M. L., Larocca, M., Cincio, L., Cerezo, M., & Sauvage, F. (2023, August 2). Lie-algebraic classical simulations for variational quantum computing. arXiv.org. https://arxiv.org/abs/2308.01432

  17. Absence of barren plateaus in quantum convolutional neural networks. arXiv:2011.02966. https://arxiv.org/pdf/2011.02966.pdf

  18. Karimi, A. (2023, March 16). Differences between hinge loss and logistic loss. Baeldung on Computer Science. https://www.baeldung.com/cs/hinge-loss-vs-logistic-loss

  19. How does the L-BFGS work? Cross Validated. https://stats.stackexchange.com/questions/284712/how-does-the-l-bfgs-work

  20. Pérez-Salinas, A., Cervera-Lierta, A., Gil-Fuster, E., & Latorre, J. I. (2020, June 4). Data re-uploading for a universal quantum classifier. arXiv.org. https://arxiv.org/abs/1907.02085

  21. Qulacs. (n.d.). Qulacs/qulacs: Variational Quantum Circuit Simulator for Quantum Computation Research. GitHub. https://github.com/qulacs/qulacs