Hypothesis 1.1 - Early detection of Dementia from structural MRI scan

Organization


Dementia is a broad umbrella term for a range of neurocognitive disorders, including dementia caused by Parkinson's disease, Huntington's disease and Alzheimer’s disease. Using the Reisberg Scale [1], life expectancy after a dementia diagnosis has been estimated to depend on the stage of dementia the person is in. Based on the results from [1], life expectancy is more than 10 years for very mildly demented patients and about 10 years for mildly demented patients. It then drops sharply for the later stages: moderate dementia (3 to 8 years), moderately severe dementia (1.5 to 6.5 years), severe dementia (less than 4 years) and very severe dementia (less than 2.5 years). Early detection of dementia is therefore very beneficial, because treatment started early can slow the progression of the disease toward the severe stages, where treatment options become very limited. We focus primarily on early detection of the mildly demented stage, so that effective treatment can be started to slow down the progression.

The rest of the page is organized as follows:

  1. Description

  2. Modeling

  3. Results

Note: Background information on deep learning techniques and quantum computing principles is provided on the models page.


Description


Early detection of dementia

We use structural MRI images of the brain from an open-source Kaggle dataset to train a computational model that predicts whether a patient is demented or not. The model then classifies the MRI scan into the very mildly demented, mildly demented or moderately demented stage according to its severity.

Data Sources

The project uses a Kaggle dataset to train the computational model. This open-source dataset [2] contains hand-collected images from various websites, each with a verified label. It provides separate train and test sets with a total of 6400 images, each labeled as mildly demented, very mildly demented, moderately demented or non-demented. The images are a pre-processed version of the structural MRI scans in 2D image form, with labels for these 4 classes.

The dataset is split into train and test sets in an 80% to 20% ratio. The train set contains 717 mildly demented, 52 moderately demented, 2560 non-demented and 1792 very mildly demented images. The test set contains 179, 12, 640 and 448 images of these 4 classes, respectively.
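The counts above can be cross-checked with a few lines of Python. This sketch only redoes the arithmetic on the numbers quoted in this section; the dictionary keys are our own labels, not necessarily the folder names used in [2].

```python
# Image counts per class as reported above: (train, test)
counts = {
    "mildly demented": (717, 179),
    "moderately demented": (52, 12),
    "non-demented": (2560, 640),
    "very mildly demented": (1792, 448),
}

n_train = sum(train for train, _ in counts.values())
n_test = sum(test for _, test in counts.values())
total = n_train + n_test

print(f"train = {n_train}, test = {n_test}, total = {total}")
print(f"split = {n_train / total:.1%} train / {n_test / total:.1%} test")

# Share of each class over the whole dataset (train + test)
for label, (train, test) in counts.items():
    print(f"{label:>22}: {(train + test) / total:.1%}")
```

The output confirms the 80/20 split (5121 train and 1279 test images) and shows the class imbalance: 50% non-demented, 35% very mildly demented, 14% mildly demented and only 1% (64 images) moderately demented.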


Modeling


We combine classical deep learning techniques with quantum computation to build the model that produces the desired prediction. The models used are described below. An introduction to deep learning techniques and quantum computing principles is given on the models page.

Deep Residual Learning - ResNet34

ResNet34 is a classical model that uses deep learning techniques for training and inference on the dataset. The main advantage of deep residual learning [3] comes from its architecture. In a plain deep network, as the depth increases to obtain more trainable parameters, the gradient propagated from the final layer back to the initial layers can either vanish (the vanishing gradient problem) or explode (the exploding gradient problem). Deep residual networks address this issue with skip connections between the network layers, as shown in the figure below.

[Figure: skip connections in a deep residual network]


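During forward propagation, the input of each residual block passes through the block's layers and is also added to their output through the skip connection, and the prediction is produced at the end. During backpropagation, the gradient can therefore flow directly through the skip connections as well as through the layers, which helps mitigate the vanishing and exploding gradient problems during training.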
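A scalar toy model makes the gradient argument concrete. For a plain layer y = F(x) the local derivative is F'(x), while for a residual block y = x + F(x) it is 1 + F'(x); by the chain rule, the gradient reaching the first layer is the product of these factors. The sketch assumes, purely for illustration, 34 layers whose own derivative is 0.01 each. Real ResNet blocks act on tensors and have Jacobian matrices rather than scalar derivatives, so this is only a simplification of the idea in [3], and `chain_gradient` is a hypothetical helper.

```python
def chain_gradient(layer_derivatives, residual):
    """d(output)/d(input) for a stack of scalar layers, by the chain rule.

    Plain layer:    y = F(x)      ->  dy/dx = F'(x)
    Residual block: y = x + F(x)  ->  dy/dx = 1 + F'(x)
    """
    grad = 1.0
    for d in layer_derivatives:
        grad *= (1.0 + d) if residual else d
    return grad


# Hypothetical example: 34 layers whose own derivative F'(x) is 0.01 each.
derivatives = [0.01] * 34

print(f"plain stack:    {chain_gradient(derivatives, residual=False):.2e}")
print(f"residual stack: {chain_gradient(derivatives, residual=True):.2e}")
```

The plain stack's gradient collapses to about 1e-68, while the residual stack's stays near 1.4, because every factor 1 + F'(x) stays close to 1.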

Hybrid Quantum Classical Machine learning model

Classical deep learning models have proven to be good feature extractors. They can be trained with various optimization techniques to extract the most useful features from the input data for the prediction task. Quantum computing, on the other hand, lets us work in a high-dimensional Hilbert space through gate operations. By combining the two techniques, we aim to benefit from the strengths of both in modeling.

The figure below shows the architecture of the hybrid quantum-classical model. The ResNet34 classical model acts as a feature extractor for the dataset. Its final 512 features are downsized to the number of qubits used in the quantum model. The measurement outputs of the quantum model are then passed to a layer of linear neurons, one per output class.

[Figure: hybrid quantum-classical model architecture]


The figure below shows the architecture of the quantum model [4]. It comprises 4 stages: initial state preparation, transformation, trainable ansatz and measurement.

Initial state preparation - The qubits are prepared in their initial state. This can be the all-zero state or a uniform superposition of all 2^n basis states of an n-qubit circuit.

Transformation - The classical data is mapped into the Hilbert space. In this hybrid architecture, we use angle embedding to do this: each (downsized) ResNet34 feature is used as the angle of a qubit rotation. The embedding can use an X, Y or Z rotation.

Trainable ansatz - This stage holds the trainable parameters. The ansatz consists of a set of parameterized gates combined with linear entanglement, and the angles of these gates are optimized during training until the model converges.

Measurement - In the final stage, the qubits are measured, converting the quantum state back into classical values.

[Figure: quantum model architecture]
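The choice of rotation axis in the transformation stage interacts with the initial state. The single-qubit sketch below (plain Python with complex numbers, no quantum library; all function names are hypothetical) embeds an example feature value x with RX, RY or RZ and reports the Pauli-Z expectation value. Starting from |0>, RZ only adds a global phase, so a Z measurement cannot see the feature, while RX and RY both give cos(x). Starting from the superposition state |+>, RY gives -sin(x) while RX and RZ leave the Z expectation at 0. This illustrates that the embedding axis and the initial state have to be chosen together.

```python
import cmath
import math

def rx(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -1j * s], [-1j * s, c]]

def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]

def rz(t):
    return [[cmath.exp(-0.5j * t), 0], [0, cmath.exp(0.5j * t)]]

def apply(gate, state):
    """Multiply a 2x2 gate matrix with a single-qubit state vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]

def expval_z(state):
    """<Z> = |amplitude of |0>|^2 - |amplitude of |1>|^2."""
    return abs(state[0]) ** 2 - abs(state[1]) ** 2

ZERO = [1, 0]                                 # |0>
PLUS = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # |+> = H|0>

x = 0.7  # hypothetical feature value used as the rotation angle
for name, gate in [("RX", rx), ("RY", ry), ("RZ", rz)]:
    from_zero = expval_z(apply(gate(x), ZERO))
    from_plus = expval_z(apply(gate(x), PLUS))
    print(f"{name}: <Z> from |0> = {from_zero:+.3f}, from |+> = {from_plus:+.3f}")
```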


In the final architecture, the ResNet34 feature extractor outputs 512 features, which are downsized to 5 neurons. The values of these 5 neurons are mapped into the Hilbert space by angle embedding.

The circuit starts in a uniform superposition of all states. This is followed by angle embedding about the Y axis. Next comes a trainable ansatz of Y rotations followed by CNOTs between neighboring pairs of qubits (without a CNOT connecting the last qubit back to the first). Finally, the expectation values of all qubits are measured and passed to the output neurons, which apply the Softmax function.
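The circuit described above can be sketched with a small state-vector simulator in plain Python, which is handy for checking the data flow without PennyLane. This is not the project's code: the function names are hypothetical, 4 qubits are used as in most of the result tables, qubit 0 is taken as the most significant bit, the measured observable is assumed to be Pauli-Z on each qubit, and random numbers stand in for the downsized ResNet34 features, the trained ansatz angles and the output layer.

```python
import math
import random

def _bit(n, q):
    """Bit mask of qubit q in an n-qubit basis index (qubit 0 = most significant)."""
    return 1 << (n - 1 - q)

def apply_h(state, n, q):
    b, h = _bit(n, q), 1 / math.sqrt(2)
    for i in range(len(state)):
        if not i & b:
            a0, a1 = state[i], state[i | b]
            state[i], state[i | b] = h * (a0 + a1), h * (a0 - a1)

def apply_ry(state, n, q, theta):
    b = _bit(n, q)
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    for i in range(len(state)):
        if not i & b:
            a0, a1 = state[i], state[i | b]
            state[i], state[i | b] = c * a0 - s * a1, s * a0 + c * a1

def apply_cnot(state, n, control, target):
    cb, tb = _bit(n, control), _bit(n, target)
    for i in range(len(state)):
        if i & cb and not i & tb:   # control = 1, target = 0: swap with target = 1
            state[i], state[i | tb] = state[i | tb], state[i]

def expval_z(state, n, q):
    b = _bit(n, q)
    return sum(-a * a if i & b else a * a for i, a in enumerate(state))

def quantum_layer(features, weights):
    """Pauli-Z expectation values of the circuit; one feature per qubit.

    weights: one list of RY angles (one per qubit) per ansatz layer.
    H, RY and CNOT have real matrices, so a real state vector suffices.
    """
    n = len(features)
    state = [0.0] * (2 ** n)
    state[0] = 1.0
    for q in range(n):                    # 1. uniform superposition
        apply_h(state, n, q)
    for q, x in enumerate(features):      # 2. angle embedding about the Y axis
        apply_ry(state, n, q, x)
    for layer in weights:                 # 3. ansatz: RY layer, then CNOT chain
        for q, w in enumerate(layer):
            apply_ry(state, n, q, w)
        for q in range(n - 1):            # no CNOT from the last qubit to the first
            apply_cnot(state, n, q, q + 1)
    return [expval_z(state, n, q) for q in range(n)]   # 4. measurement

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical inputs: 4 qubits, depth 2, random stand-ins for the downsized
# ResNet34 features, the trained ansatz angles and a 4 -> 2 output layer.
rng = random.Random(0)
n_qubits, depth, n_classes = 4, 2, 2
features = [rng.uniform(-math.pi / 2, math.pi / 2) for _ in range(n_qubits)]
weights = [[rng.uniform(0, 2 * math.pi) for _ in range(n_qubits)] for _ in range(depth)]
head = [[rng.gauss(0, 1) for _ in range(n_qubits)] for _ in range(n_classes)]

z = quantum_layer(features, weights)
logits = [sum(w * v for w, v in zip(row, z)) for row in head]
print("expectation values:", [round(v, 3) for v in z])
print("class probabilities:", [round(p, 3) for p in softmax(logits)])
```

A state-vector simulation needs 2^n amplitudes in memory, so this approach is only practical for a modest number of qubits.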

Pure Quantum Models

Classical models based on deep learning require a lot of computational power and typically need a GPU-based environment for optimization and inference. Pure quantum models can reduce this computational cost. Instead of using a classical feature extractor, we map the classical data directly into the Hilbert space and explore that space with a trainable ansatz during optimization. This quantum model architecture also comprises 4 stages - initial state preparation, transformation, trainable ansatz and measurement. The key difference lies in the transformation stage.

On a quantum computer with a large number of qubits, angle embedding can map each element of the classical data to its own qubit. Alternatively, the high dimensionality of the Hilbert space allows a much more qubit-efficient encoding: amplitude embedding.

Amplitude Embedding

In amplitude embedding, the classical data is mapped directly onto the amplitudes of the basis states of the state space. For example, 4 qubits span a 2^4 = 16 dimensional Hilbert space, so amplitude embedding [5] can map 16-dimensional classical data, normalized to unit length, onto the amplitudes of the 16 corresponding basis states.
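As a sketch of the bookkeeping involved (plain Python; the helper names are hypothetical), the snippet below zero-pads a flattened feature vector to the next power of two, normalizes it to unit length as amplitude embedding requires, and compares rough resource counts of the two embeddings. The counts are assumptions: angle embedding is counted as one rotation per feature with no CNOTs, and amplitude embedding of non-negative real data is counted with the uniformly controlled rotations of [5], where a rotation with k controls costs 2^k RY gates and 2^k CNOTs, giving 2^n - 1 RY gates and 2^n - 2 CNOTs for n qubits. For the 64x64-pixel images used with the pure QML model in the Results section, the 4096 pixel values fit into 12 qubits.

```python
import math

def prepare_amplitudes(features):
    """Zero-pad to a power of two and L2-normalize, as amplitude embedding requires."""
    n_qubits = max(1, math.ceil(math.log2(len(features))))
    padded = list(features) + [0.0] * (2 ** n_qubits - len(features))
    norm = math.sqrt(sum(v * v for v in padded))
    if norm == 0:
        raise ValueError("an all-zero vector cannot be amplitude-embedded")
    return n_qubits, [v / norm for v in padded]


def embedding_cost(n_features):
    """Rough resource counts under assumed decompositions (see the text)."""
    n = max(1, math.ceil(math.log2(n_features)))
    # Amplitude embedding: one uniformly controlled RY per level k = 0..n-1;
    # level k has 2**k rotations and, for k >= 1, 2**k CNOTs.
    return {
        "angle": {"qubits": n_features, "rotations": n_features, "cnots": 0},
        "amplitude": {
            "qubits": n,
            "rotations": sum(2 ** k for k in range(n)),     # 2**n - 1
            "cnots": sum(2 ** k for k in range(1, n)),      # 2**n - 2
        },
    }


image = [0.5] * (64 * 64)  # stand-in for one flattened 64x64 image
n_qubits, amplitudes = prepare_amplitudes(image)
print(n_qubits, round(sum(a * a for a in amplitudes), 6))  # -> 12 1.0
for name, cost in embedding_cost(64 * 64).items():
    print(name, cost)
```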

A detailed mathematical treatment of amplitude embedding is given in [6]. For a 2-qubit system, the rotation angles needed to encode the classical data [0.5389, 0.7950, 0.2782, 0.0] are computed with the formula below [6].

[Figure: formula for the amplitude-embedding rotation angles [6]]
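In text form, the angle formula of [5, 6] for non-negative real amplitudes can be written as follows. This is our transcription, using 1-based indices with α_1 the amplitude of |00...0> and the most significant qubit first, so it should be checked against [6]. β^s_j is the RY angle at level s (s = n acts on the first qubit without controls, s = 1 on the last qubit) for control pattern j; the numerator is the weight of the half of the block in which the target qubit is 1, and β is taken as 0 when the denominator is 0. The last three lines apply it to the 2-qubit example: β^2_1 rotates the first qubit, and β^1_1 and β^1_2 rotate the second qubit when the first qubit is 0 or 1, respectively.

```latex
\begin{aligned}
\beta^{s}_{j} &= 2\arcsin\left(\frac{\sqrt{\sum_{l=1}^{2^{s-1}} |\alpha_{(2j-1)2^{s-1}+l}|^{2}}}{\sqrt{\sum_{l=1}^{2^{s}} |\alpha_{(j-1)2^{s}+l}|^{2}}}\right),
\qquad s = 1,\dots,n,\quad j = 1,\dots,2^{n-s} \\[6pt]
\beta^{2}_{1} &= 2\arcsin\left(\frac{\sqrt{0.2782^{2}+0^{2}}}{\sqrt{0.5389^{2}+0.7950^{2}+0.2782^{2}+0^{2}}}\right) \approx 0.564 \\
\beta^{1}_{1} &= 2\arcsin\left(\frac{0.7950}{\sqrt{0.5389^{2}+0.7950^{2}}}\right) \approx 1.950 \\
\beta^{1}_{2} &= 2\arcsin\left(\frac{0}{\sqrt{0.2782^{2}+0^{2}}}\right) = 0
\end{aligned}
```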


These angles are used in the circuit shown in the figure below to prepare the required state.

[Figure: 2-qubit amplitude-embedding circuit]


The resulting state vector, simulated with the IBM Qiskit simulator [7], is shown in the figure below.

[Figure: resulting state vector from the Qiskit simulation]
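The same state can be reproduced without Qiskit. The sketch below is plain Python; `mottonen_ry_angles` and `prepare_state` are hypothetical names. It computes the RY angles level by level, where each angle splits the probability of a block of basis states between target-qubit values 0 and 1 (the angle formula of [6], restricted to non-negative real data), applies them as uniformly controlled rotations to |00>, and checks that the resulting amplitudes equal the normalized input. Complex or negative data would need additional phase rotations, which this sketch leaves out.

```python
import math

def mottonen_ry_angles(amplitudes):
    """RY angles for preparing a state with non-negative real amplitudes.

    Returns one list per qubit, most significant qubit first. Level k holds
    2**k angles, one for each value of the k more significant qubits.
    """
    size = len(amplitudes)
    n_qubits = size.bit_length() - 1
    if size < 2 or size != 2 ** n_qubits:
        raise ValueError("length must be a power of two (at least 2)")
    if any(a < 0 for a in amplitudes):
        raise ValueError("this sketch only handles non-negative amplitudes")
    norm_sq = sum(a * a for a in amplitudes)
    if norm_sq == 0:
        raise ValueError("an all-zero vector cannot be embedded")
    probs = [a * a / norm_sq for a in amplitudes]

    levels = []
    for k in range(n_qubits):
        block = size >> k      # basis states sharing one control pattern
        half = block // 2      # the upper half of each block has qubit k = 1
        angles = []
        for j in range(2 ** k):
            total = sum(probs[j * block:(j + 1) * block])
            upper = sum(probs[j * block + half:(j + 1) * block])
            ratio = min(1.0, upper / total) if total > 0 else 0.0
            angles.append(2 * math.asin(math.sqrt(ratio)))
        levels.append(angles)
    return levels


def prepare_state(levels):
    """Apply the uniformly controlled RY rotations to |0...0>."""
    n_qubits = len(levels)
    state = [0.0] * (2 ** n_qubits)
    state[0] = 1.0
    for k, angles in enumerate(levels):
        block = len(state) >> k
        half = block // 2
        for j, theta in enumerate(angles):   # j = value of the control qubits
            c, s = math.cos(theta / 2), math.sin(theta / 2)
            for offset in range(half):
                i0 = j * block + offset       # qubit k = 0
                i1 = i0 + half                # qubit k = 1
                a0, a1 = state[i0], state[i1]
                state[i0], state[i1] = c * a0 - s * a1, s * a0 + c * a1
    return state


x = [0.5389, 0.7950, 0.2782, 0.0]
levels = mottonen_ry_angles(x)
state = prepare_state(levels)
print("angles:", [[round(a, 4) for a in level] for level in levels])
print("state: ", [round(a, 4) for a in state])
```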


These pure quantum models do not need GPU-based computation for training or inference: the forward pass runs on a quantum computer, and only the optimization algorithm runs on a classical computer. Note that in the pure QML model used in our experiments, classical neurons apply the Softmax operation to the measurements from the quantum circuit.

Barren Plateaus

A detailed explanation of barren plateaus is given on the models page. The motivation here is that a large number of qubits offers capabilities that are practically out of reach for classical computers: a 100-qubit quantum computer works in a 2^100-dimensional state space and can optimize in that regime, which is classically infeasible. In the current NISQ era, however, realizing a large number of qubits on a real device is very difficult because of architectural constraints and high susceptibility to noise. Moreover, even in theory, optimization with a large number of qubits can suffer from the barren plateau problem: the cost function landscape flattens with respect to the parameters, making convergence extremely difficult.

The barren plateau problem is more prevalent in circuits with many qubits and large depth. The key observation is that a circuit with fewer qubits and greater depth can have the same number of trainable parameters as a circuit with more qubits and smaller depth. We therefore need to find a balance between the two to gain an advantage from quantum computing.
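The trade-off can be made concrete by counting parameters for the configurations in the Results section. Assuming one trainable RY angle per qubit per ansatz layer (as in the hybrid circuit described above), a circuit with q qubits and depth d has q x d trainable parameters and a 2^q-dimensional state space. The sketch groups the tested configurations by parameter count.

```python
from collections import defaultdict

# (qubits, depth) configurations from the result tables
configs = [(4, 1), (4, 2), (4, 4), (8, 1), (8, 2), (8, 4), (16, 1)]

# Assumption: one trainable RY angle per qubit per ansatz layer.
by_params = defaultdict(list)
for qubits, depth in configs:
    by_params[qubits * depth].append((qubits, depth))

for n_params in sorted(by_params):
    for qubits, depth in by_params[n_params]:
        print(f"{n_params:>2} parameters: {qubits:>2} qubits x depth {depth} "
              f"(state space dimension 2^{qubits} = {2 ** qubits})")
```

For example, 4 qubits at depth 4, 8 qubits at depth 2 and 16 qubits at depth 1 all have 16 trainable parameters, but their state spaces have 16, 256 and 65536 dimensions, respectively.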

There are several advantages to using a large number of qubits instead of increasing the depth of the quantum circuit:

  1. Angle Embedding - Using angle embedding instead of amplitude embedding keeps the circuit depth low. Amplitude embedding can map very high-dimensional data (2^n features onto n qubits), but the depth of its circuit grows exponentially with the number of qubits, which is a huge cost on NISQ devices. Deeper circuits are more prone to errors, because the small error of each gate accumulates until it dominates and corrupts the result of the circuit. Deeper circuits also take longer to run, so more of the qubits' T1 (relaxation) time is used up and the states are more likely to decay to the ground state. Another problem with amplitude embedding is that it needs all the qubits of the circuit to be connected to each other. Physical devices do not always provide this connectivity, so SWAP gates must be added, which further increases the depth of the circuit. Angle embedding avoids these problems with a low-depth circuit, at the cost of requiring a large number of qubits.

  2. Barren Plateaus -

    Barren plateaus [8] are large regions of the cost function’s parameter space where the variance of the gradient is almost 0; in other words, the cost function landscape is flat. A variational circuit initialized in one of these regions is therefore effectively untrainable with gradient-based algorithms. Barren plateaus also arise when the ansatz involves global measurements, deep and highly expressive circuits, noise or entanglement [9]. Cerezo et al. [10] show that global cost functions cause barren plateaus even in shallow circuits. In summary, with a small number of qubits, the problem appears as the depth is increased to obtain more trainable parameters. With a large number of qubits, on the other hand, we can reach a similar number of trainable parameters as a deeper ansatz on fewer qubits, without increasing the circuit depth.
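A toy calculation, in the spirit of the cost-function dependence studied in [10], shows how a global cost can flatten the landscape even for the shallowest possible circuit. Take n qubits, each rotated by RY(θ_i) from |0> with no entangling gates, so that <Z_i> = cos θ_i. The global cost C_G = <Z ⊗ Z ⊗ ... ⊗ Z> = ∏ cos θ_i and the local cost C_L = (1/n) Σ cos θ_i. For angles drawn uniformly from [0, 2π), Var[∂C_G/∂θ_1] = 2^-n shrinks exponentially with n, whereas Var[∂C_L/∂θ_1] = 1/(2n²) shrinks only polynomially. The sketch checks both formulas by Monte Carlo sampling; it is an illustration with hypothetical helper names, not the ansatz used in our models.

```python
import math
import random

def sample_gradients(n_qubits, n_samples, seed=0):
    """Sample dC/d(theta_1) of the global and local costs at random angles.

    Circuit: RY(theta_i)|0> on each qubit, no entangling gates, so <Z_i> = cos(theta_i).
    Global cost C_G = prod_i cos(theta_i); local cost C_L = mean_i cos(theta_i).
    """
    rng = random.Random(seed)
    global_grads, local_grads = [], []
    for _ in range(n_samples):
        thetas = [rng.uniform(0, 2 * math.pi) for _ in range(n_qubits)]
        rest = math.prod(math.cos(t) for t in thetas[1:])
        global_grads.append(-math.sin(thetas[0]) * rest)
        local_grads.append(-math.sin(thetas[0]) / n_qubits)
    return global_grads, local_grads

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

for n in (2, 4, 8, 12):
    glob, loc = sample_gradients(n, 20000)
    print(f"n={n:>2}  global: {variance(glob):.1e} (theory {2.0 ** -n:.1e})  "
          f"local: {variance(loc):.1e} (theory {1 / (2 * n * n):.1e})")
```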

Results


The dataset contains pre-processed structural MRI scans of various patients, labeled by class. We use this dataset to train a model that can detect the stage of dementia early for new patients. The training and evaluation code is written in Python with the PyTorch and PennyLane packages. It was developed in Google Colaboratory (based on Jupyter notebooks) and run on a Google Cloud instance with a CUDA-enabled T4 GPU. The hyperparameters used are listed below:

  • Batch Size - 8

  • Learning Rate - 1E-5

  • Epochs - 25

  • Loss (Cost) Function - Cross Entropy

  • Optimizer - Adam

The project is split into 2 tasks:

  1. Early detection of Dementia - The MRI scan images are classified as demented or non-demented.

  2. Early detection of stage of Dementia - The MRI scan images are classified by stage of dementia as very mildly demented, mildly demented, moderately demented or non-demented.
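For the first task, the four dataset labels are collapsed into two. A minimal sketch of that mapping (the label strings are our own, not necessarily the folder names in [2]), applied to the test-set counts given in the Description section, shows that the binary test set is almost perfectly balanced.

```python
# Hypothetical label strings for the four dataset classes
TO_BINARY = {
    "non-demented": "non-demented",
    "very mildly demented": "demented",
    "mildly demented": "demented",
    "moderately demented": "demented",
}

# Test-set counts from the Description section
test_counts = {
    "mildly demented": 179,
    "moderately demented": 12,
    "non-demented": 640,
    "very mildly demented": 448,
}

binary_counts = {}
for label, count in test_counts.items():
    target = TO_BINARY[label]
    binary_counts[target] = binary_counts.get(target, 0) + count

baseline = max(binary_counts.values()) / sum(binary_counts.values())
print(binary_counts)  # -> {'demented': 639, 'non-demented': 640}
print(f"majority-class baseline: {baseline:.1%}")
```

Because the two classes are almost equally represented, always predicting the majority class would score only about 50% on the binary test set, which is the natural reference point for the accuracies in Result 1.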

The training results are summarized in the tables below. In both tasks, every hybrid quantum-classical model achieves higher accuracy than the classical ResNet34 model, which suggests an advantage in combining quantum computing with classical deep learning techniques.

Result 1 - Early detection of Dementia

Model                                          Qubits  Depth  Accuracy (%)
ResNet34 (classical)                           -       -      96.95
ResNet34 + QML (with learning rate scheduler)  4       1      97.5
ResNet34 + QML                                 4       1      97.57
ResNet34 + QML                                 4       2      98.35
ResNet34 + QML                                 4       4      97.42

Note - The learning rate scheduler lowers the learning rate as training progresses through the epochs. This helps avoid unwanted fluctuations or overfitting in the final stages of training.
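The text does not say which scheduler was used. As an assumption for illustration, a step-decay schedule of the kind provided by PyTorch's `torch.optim.lr_scheduler.StepLR` multiplies the learning rate by a factor gamma every step_size epochs. The plain-Python sketch below applies it to the hyperparameters listed above (learning rate 1E-5, 25 epochs); `step_lr`, step_size = 10 and gamma = 0.1 are hypothetical.

```python
def step_lr(initial_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate at a 0-based epoch under step decay (hypothetical settings)."""
    return initial_lr * gamma ** (epoch // step_size)


INITIAL_LR = 1e-5  # from the hyperparameter list
EPOCHS = 25        # from the hyperparameter list

for epoch in range(0, EPOCHS, 5):
    print(f"epoch {epoch:>2}: lr = {step_lr(INITIAL_LR, epoch):.0e}")
```

In PyTorch, the equivalent would be `StepLR(optimizer, step_size=10, gamma=0.1)` from `torch.optim.lr_scheduler`, with `scheduler.step()` called once per epoch.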


Result 2 - Early detection of stage of Dementia

Model                 Qubits  Depth  Accuracy (%)
ResNet34 (classical)  -       -      95.46
ResNet34 + QML        4       1      96.09
ResNet34 + QML        4       2      95.70
ResNet34 + QML        4       4      97.57
ResNet34 + QML        8       1      96.01
ResNet34 + QML        8       2      96.95
ResNet34 + QML        8       4      96.48
ResNet34 + QML        16      1      97.5

[Figure: convergence of the hybrid quantum models for multi-class classification]


The figure above shows the convergence of the various hybrid quantum models for multi-class classification. There are two possible ways to increase QML accuracy: (1) increase the depth of the parameterized quantum circuit (the ansatz repetition depth), which increases the number of tunable parameters; (2) increase the number of qubits, which increases the size of the state space. Our hypothesis is that the latter is superior, i.e., a larger state space is more effective at increasing QML accuracy than more tunable circuit parameters, because of barren plateaus. The figure above also shows that the hybrid QML model with 16 qubits and depth 1 converges faster than the other models.

Additionally, we gathered results with ResNet18 (a smaller model than ResNet34) and with a pure quantum model. They are summarized below.

Model                                          Qubits  Depth  Classes  Accuracy (%)
ResNet18 + QML (with learning rate scheduler)  4       1      2        91.71
ResNet18 + QML (with learning rate scheduler)  4       1      4        88.75
Pure QML (amplitude embedding, 64x64 pixels)   12      4      2        63.35
ResNet34 + QML                                 4       4      2        97.42

Confusion matrix 1 - Early detection of Dementia

ResNet34 + QML, 4 qubits, depth 2 - accuracy 98.35%

[Figure: confusion matrix for early detection of dementia]


Confusion matrix 2 - Early detection of stage of Dementia

ResNet34 + QML, 4 qubits, depth 4 - accuracy 97.57%

[Figure: confusion matrix for early detection of the stage of dementia]


References


  1. Life after a dementia diagnosis and dementia life expectancy. Age Space. (n.d.). https://www.agespace.org/dementia/life-expectancy.

  2. Dubey, S. (2019, December 26). Alzheimer’s dataset ( 4 class of images). Kaggle. https://www.kaggle.com/datasets/tourist55/alzheimers-dataset-4-class-of-images.

  3. He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv. https://doi.org/10.48550/arXiv.1512.03385

  4. PennyLane. (n.d.). https://pennylane.ai/

  5. Mottonen, M., Vartiainen, J. J., Bergholm, V., & Salomaa, M. M. (2005). Transformation of quantum states using uniformly controlled rotations. Quantum Information and Computation, 5(6), 467–473. https://doi.org/10.26421/qic5.6-5.

  6. Schuld, M., & Petruccione, F. (2019). Supervised learning with Quantum Computers. Springer.

  7. Qiskit. (n.d.). https://qiskit.org

  8. McClean, J. R., Boixo, S., Smelyanskiy, V. N., Babbush, R., & Neven, H. (2018). Barren Plateaus in quantum neural network training landscapes. Nature Communications, 9(1). https://doi.org/10.1038/s41467-018-07090-4

  9. YouTube. (2022). QHack 2022: Marco Cerezo - Barren plateaus and overparametrization in quantum neural networks. YouTube. Retrieved October 6, 2023, from https://www.youtube.com/watch?v=rErONNdHbjg.

  10. Cerezo, M., Sone, A., Volkoff, T., Cincio, L., & Coles, P. J. (2021). Cost function dependent barren plateaus in shallow parametrized quantum circuits. Nature Communications, 12(1). https://doi.org/10.1038/s41467-021-21728-w