Software


Software Questions	Section that answers the question
How compatible is the software with existing synthetic biology standards, and how well does it leverage them? (e.g., SBOL, other RFCs, data formats)	Software Contributions
Was this software validated by experimental work?	Mapping/Surface Plot Software: Interpolating Values Using griddata
Is the software useful to other projects?	Software Contributions
How well can the software be integrated with external tools/software applications? (APIs, packages, etc.)	Software Demonstration (Recognition Site Modifier software)
Is the software user-friendly?	Software Demonstration (Recognition Site Modifier software)
Is the software well written/documented for future contributions? (Code documentation, comments in the code, performance evaluation, architecture diagrams, installation and execution scripts, etc.)	Software Contributions

INTRODUCTION

Introducing our Lithium Exploration Software Suite! The suite integrates two critical components: Biosensor Color Detection and Lithium Deposit Mapping. Each module serves a distinct purpose in our mission of sustainable lithium exploration.

Preamble

Initially, we planned to build software that captures biosensor images, counts the sodium alginate hydrogels, and identifies color changes in order to map lithium deposits. However, data limitations prevented us from realizing this full vision.

HOW THE INTENDED SOFTWARE WILL WORK

The software is designed to accept an image of the biosensor's hydrogels as input.

Figure 1: Sample image of sodium alginate hydrogels

The software will then count the hydrogels in the image and check each one for a color change; Figure 1, for example, contains roughly 95 hydrogels. Our biosensor contains bioindicators that react to lithium and to arsenic (a pathfinder element for lithium), and the color change is a visual indicator of their presence. A red hydrogel indicates that lithium is present, a yellow hydrogel indicates that arsenic is present, and a hydrogel that has not changed color indicates that neither is present.
If there are both red and yellow hydrogels, there is a high chance of lithium being present. If there are only red hydrogels, the chance is medium, and if there are only yellow hydrogels, the chance is low. If there are no red or yellow hydrogels, no lithium is detected.
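The qualitative rules above map directly onto a few conditionals. A minimal Python sketch; the function name `lithium_chance` and its string labels are illustrative, not taken from our code.

```python
def lithium_chance(red: int, yellow: int) -> str:
    """Qualitative chance of lithium from counts of red (Li) and yellow (As) hydrogels."""
    if red > 0 and yellow > 0:
        return "high"    # lithium and its pathfinder, arsenic, both detected
    if red > 0:
        return "medium"  # lithium only
    if yellow > 0:
        return "low"     # arsenic (pathfinder) only
    return "none"        # no red or yellow hydrogels: no lithium detected


print(lithium_chance(red=12, yellow=5))  # high
print(lithium_chance(red=0, yellow=3))   # low
```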
The probability of lithium being present is calculated using the following formula:

$$P = \frac{0.5\,R + 0.5\,Y}{N} = \frac{0.5\,(R + Y)}{N}$$

Where:
R is the number of red hydrogels,
Y is the number of yellow hydrogels,
N is the total number of hydrogels, and
P is the probability of lithium being present.

A probability coefficient of 0.5 is assigned to both lithium and arsenic, so the probability increases when both lithium and arsenic are detected.

(R + Y) represents the total count of hydrogels that have changed color. These are the hydrogels that indicate the presence of either lithium (R) or arsenic (Y).

N is the total number of hydrogels in the image, which provides a baseline for calculating the probability.
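The formula combines the terms defined above. As a minimal sketch, assuming it takes the form P = 0.5 (R + Y) / N (one 0.5 coefficient shared by lithium and arsenic, as described), it can be computed as follows; the function name `lithium_probability` and the counts in the example are hypothetical.

```python
def lithium_probability(red: int, yellow: int, total: int, coeff: float = 0.5) -> float:
    """Assumed form P = coeff * (R + Y) / N, with coeff = 0.5 for lithium and arsenic."""
    if total <= 0:
        raise ValueError("the total number of hydrogels (N) must be positive")
    if red < 0 or yellow < 0 or red + yellow > total:
        raise ValueError("expected 0 <= R + Y <= N")
    return coeff * (red + yellow) / total


# Made-up counts for an image with 95 hydrogels:
print(lithium_probability(red=20, yellow=18, total=95))  # 0.2
```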

Probability-Based Insights: The mapping software then uses this lithium presence probability as input to create detailed lithium deposit maps.

Below, we present a proof-of-concept Color Detection Software that identifies the color of a single sodium alginate hydrogel. It demonstrates the core capability of the software and lays the foundation for our future plans.

THE COLOR DETECTION SOFTWARE

The Color Detection Software is a tool we developed to detect the color of a bioindicator from the RGB values of an image. We trained and compared two machine learning models, Decision Trees and K-Nearest Neighbors (KNN). This documentation gives an overview of the software, its functionality, and how we developed and improved it.

Figure 2: Training Dataset

1.0 GETTING STARTED
1.1 Prerequisites

1.2 Installation

Install the required libraries using pip:

1.3 Using the Model

To use the model, load the `color_detection_model.pkl` file with the `pickle` library, then call the model's `predict` method to predict the color of an image from its RGB values.

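Alternatively, read the RGB values from an image file with the `cv2` library (installed as the `opencv-python` package). Note that `cv2` returns the color channels in BGR order.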
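A minimal sketch of this workflow, assuming the pickled object is a scikit-learn style classifier whose `predict` method takes one [R, G, B] row per sample. The helper names `load_model` and `predict_color` and the file name `hydrogel.jpg` are hypothetical.

```python
import pickle


def load_model(path="color_detection_model.pkl"):
    """Load the pickled classifier; only unpickle files from a trusted source."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_color(model, rgb):
    """Return the predicted color category (0-9) for one (R, G, B) triple.

    Assumes the model was trained on rows ordered [R, G, B]. scikit-learn style
    estimators expect a 2D input (one row per sample), hence the outer list.
    """
    return int(model.predict([list(rgb)])[0])


# Usage (needs the saved model file, plus opencv-python for cv2):
# import cv2
# model = load_model()
# image = cv2.imread("hydrogel.jpg")           # hypothetical file; H x W x 3 array, BGR order
# b, g, r = image.reshape(-1, 3).mean(axis=0)  # average color of a cropped hydrogel
# print(predict_color(model, (r, g, b)))
```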

The model returns a numerical value representing the color category. The color categories are as follows:

0: Grey

1: Black

2: White

3: Orange

4: Brown

5: Blue

6: Yellow

7: Green

8: Violet

9: Red
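A small lookup table turns the model's numeric output into a color name and, following the biosensor rules described earlier (red for lithium, yellow for arsenic), into a reading. The names `COLOR_LABELS`, `color_name`, and `biosensor_signal` are hypothetical, and treating every other color as "no signal" is an assumption of this sketch.

```python
# Category codes returned by the model, as listed above.
COLOR_LABELS = {
    0: "Grey", 1: "Black", 2: "White", 3: "Orange", 4: "Brown",
    5: "Blue", 6: "Yellow", 7: "Green", 8: "Violet", 9: "Red",
}


def color_name(code):
    """Translate a numeric prediction into its color name."""
    return COLOR_LABELS[int(code)]


def biosensor_signal(code):
    """Red means lithium, yellow means arsenic; any other color is treated
    as 'no signal' (an assumption made for this sketch)."""
    return {"Red": "lithium", "Yellow": "arsenic"}.get(color_name(code), "no signal")


print(color_name(9), biosensor_signal(9))  # Red lithium
print(color_name(2), biosensor_signal(2))  # White no signal
```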

2.0 DATA COLLECTION

2.1 Folder Structure

The software expects a structured dataset in a folder called `training_dataset`. Inside this folder, each subfolder represents a color category and contains images of that color.
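A sketch of reading this folder layout with Python's standard `pathlib`; the function name `list_training_images` and the set of accepted image extensions are assumptions.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed image formats


def list_training_images(root="training_dataset"):
    """Return (color_label, image_path) pairs, using each subfolder name as the label."""
    pairs = []
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for image in sorted(folder.iterdir()):
            if image.suffix.lower() in IMAGE_SUFFIXES:  # skip non-image files
                pairs.append((folder.name, image))
    return pairs


# Usage:
# for label, path in list_training_images():
#     print(label, path.name)
```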

2.2 Image Selection

The number of images in each color category should be roughly the same; an imbalanced dataset biases the classifier toward the over-represented colors and lowers its accuracy on the others.
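One way to check this before training is to count the images per category and flag any category well below the largest one. The helper `underrepresented` and its 20% tolerance are illustrative choices, not part of our pipeline.

```python
from collections import Counter


def underrepresented(labels, tolerance=0.2):
    """Return the color categories with fewer than (1 - tolerance) times
    as many images as the largest category."""
    counts = Counter(labels)
    largest = max(counts.values())
    return sorted(label for label, n in counts.items() if n < (1 - tolerance) * largest)


print(underrepresented(["Red"] * 50 + ["Blue"] * 48 + ["Grey"] * 20))  # ['Grey']
```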

3.0 Data Preprocessing

3.1 Extracting RGB Values

The software extracts the RGB values from each image with the `cv2` library and stores them in a pandas DataFrame.
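A minimal sketch of the extraction step, assuming each image is reduced to its average color. `cv2.imread` returns an H x W x 3 NumPy array with channels in B, G, R order, so the channels are reversed here; whether our training code reordered the channels this way is an assumption, and the function name is hypothetical.

```python
import numpy as np


def mean_rgb_from_bgr(image_bgr):
    """Average color of an image array laid out as cv2.imread returns it (BGR)."""
    pixels = np.asarray(image_bgr, dtype=float).reshape(-1, 3)
    b, g, r = pixels.mean(axis=0)  # cv2 order is B, G, R
    return float(r), float(g), float(b)


demo = np.zeros((2, 2, 3))
demo[..., 2] = 200  # channel index 2 is red in BGR layout
print(mean_rgb_from_bgr(demo))  # (200.0, 0.0, 0.0)

# Building the DataFrame (cv2 and pandas assumed installed):
# import cv2
# import pandas as pd
# rows = []
# for label, path in pairs:  # pairs: (color_label, image_path) tuples
#     r, g, b = mean_rgb_from_bgr(cv2.imread(str(path)))
#     rows.append({"R": r, "G": g, "B": b, "color": label})
# df = pd.DataFrame(rows)
```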

3.2 Data Labeling

The color categories are encoded as integers with the pandas `factorize` function.
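With its default `sort=False`, `pandas.factorize` assigns codes in order of first appearance, so the code-to-color table in section 1.3 presumably reflects the order in which colors first appear in the training data. A pure-Python equivalent of that behavior, ignoring pandas' missing-value handling; the function here is a hypothetical stand-in, not pandas itself.

```python
def factorize(labels):
    """Assign integer codes in order of first appearance, like pandas.factorize."""
    index = {}
    uniques = []
    codes = []
    for label in labels:
        if label not in index:
            index[label] = len(uniques)  # next unused code
            uniques.append(label)
        codes.append(index[label])
    return codes, uniques


codes, uniques = factorize(["Grey", "Black", "Grey", "Red"])
print(codes)    # [0, 1, 0, 2]
print(uniques)  # ['Grey', 'Black', 'Red']
```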

4.0 Machine Learning Model

4.1 Decision Tree Classifier

A Decision Tree Classifier is trained on the dataset with different `max_depth` values to determine the optimal tree depth. The accuracy of the model is evaluated using a test dataset.
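The depth search amounts to fitting one model per candidate value and recording its test accuracy. The sketch below is model-agnostic (any object with `fit` and `predict` works); the names `accuracy` and `sweep` are hypothetical, and the commented usage assumes scikit-learn's `DecisionTreeClassifier`, whose `max_depth` parameter is the one named above.

```python
def accuracy(y_true, y_pred):
    """Proportion of correctly classified samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sweep(make_model, param_values, X_train, y_train, X_test, y_test):
    """Fit one model per parameter value; return {value: test accuracy}."""
    scores = {}
    for value in param_values:
        model = make_model(value)
        model.fit(X_train, y_train)
        scores[value] = accuracy(y_test, model.predict(X_test))
    return scores


# With scikit-learn (assumed installed):
# from sklearn.tree import DecisionTreeClassifier
# scores = sweep(lambda d: DecisionTreeClassifier(max_depth=d), range(1, 21),
#                X_train, y_train, X_test, y_test)
# best_depth = max(scores, key=scores.get)
```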

4.2 K-Nearest Neighbors Classifier

A K-Nearest Neighbors (KNN) classifier is trained on the dataset with varying numbers of neighbors. It uses 'distance' weighting, so closer neighbors carry more weight in the vote, and the 'euclidean' distance metric. The accuracy of the KNN model is also evaluated.
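To show what 'distance' weighting with the Euclidean metric does, here is a small from-scratch distance-weighted KNN vote. It illustrates the technique and is not the scikit-learn implementation we trained; `knn_predict` and the toy RGB rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def knn_predict(train_X, train_y, query, k=5):
    """Classify `query` by a vote of its k nearest neighbors (Euclidean distance),
    where each neighbor's vote is weighted by 1 / distance."""
    nearest = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    exact = [label for d, label in nearest if d == 0]
    if exact:
        # Zero distance would mean infinite weight, so exact matches decide alone.
        return Counter(exact).most_common(1)[0][0]
    votes = defaultdict(float)
    for d, label in nearest:
        votes[label] += 1.0 / d
    return max(votes, key=votes.get)


# Toy RGB training rows; 9 = Red and 6 = Yellow, matching the category codes above.
train_X = [(250, 20, 20), (240, 30, 25), (240, 230, 40), (250, 220, 30)]
train_y = [9, 9, 6, 6]
print(knn_predict(train_X, train_y, (245, 40, 30), k=3))  # 9
```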

5.0 Model Comparison

5.1 Performance Metrics

Decision Tree and KNN models are evaluated using the accuracy metric, which measures the proportion of correctly classified samples.

5.2 Accuracy Visualization

The software generates accuracy vs. parameter plots to visualize the performance of both models and help determine the best model for the task.

Figure 3: Decision Tree Accuracy Plot

Figure 4: KNN Accuracy Plot

6.0 Model Deployment

The final KNN model, which had the highest accuracy, is saved with the `pickle` library as `color_detection_model.pkl`.
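Saving the chosen model is a single `pickle.dump` call. A minimal sketch; the helper name `save_model` is hypothetical. A pickled scikit-learn model is best reloaded with the same scikit-learn version it was saved with, and pickle files should only be loaded from trusted sources.

```python
import pickle


def save_model(model, path="color_detection_model.pkl"):
    """Serialize the trained classifier so it can be reloaded with pickle.load."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


# Usage, after training:
# save_model(best_knn_model)  # best_knn_model: the fitted KNN classifier
```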

7.0 Conclusion

After exploring two machine learning models, Decision Trees and K-Nearest Neighbors, the KNN model achieved the highest accuracy (92%) and was chosen for deployment. Future improvements could include fine-tuning the model and expanding the dataset for more accurate detection. This software is a valuable tool for color-based bioindicator detection and can be further enhanced to support a wider range of applications.

MAPPING/ SURFACE PLOT SOFTWARE

Our Lithium Deposit Mapping Software is the second critical component of the suite. It visualizes and analyzes geological data to estimate how likely lithium deposits are across the mapped area, and it generates a surface plot showing where lithium deposits are likely to occur. The software works as follows:

1. Sample Geological Data Generation:

`num_points` defines the number of geological data points to generate.
`x` and `y` are arrays of latitude and longitude coordinates within the mapped area, and `z` holds the depth. The latitude and longitude give the location of each biosensor, which is read from the biosensor's barcode and represents the location of a lithium sample.
`values` holds the lithology values: a probability between 0 and 1 that distinguishes between different lithologies. This probability is calculated by the probability model, which uses the output of the color detection model. The different probability values are interpreted as follows:

N.B. Random values are used in this example for illustration purposes.
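The mapping software itself is written in MATLAB. For readers working in Python, a NumPy sketch of the same data-generation step follows; `num_points = 100`, the coordinate range 0 to 10, and the seed are illustrative assumptions, and treating `z` as depth follows the cross-section description later in this section.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed keeps the illustration reproducible
num_points = 100                      # number of geological data points to generate

# Random stand-ins for real biosensor data (all ranges are assumptions).
x = rng.uniform(0.0, 10.0, num_points)      # first horizontal coordinate (e.g. latitude)
y = rng.uniform(0.0, 10.0, num_points)      # second horizontal coordinate (e.g. longitude)
z = rng.uniform(0.0, 10.0, num_points)      # depth
values = rng.uniform(0.0, 1.0, num_points)  # lithium probability per point, in [0, 1)
```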

2. Grid for interpolation:

`meshgrid` generates a regular grid of query points in three dimensions (`xq`, `yq`, `zq`) that forms a 3D mesh. The lithology values are interpolated onto this grid.
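In NumPy, `np.meshgrid` builds the same kind of query grid. A sketch with assumed grid sizes; NumPy's default `indexing="xy"` reproduces MATLAB's `meshgrid` layout, so each array has shape (len(y), len(x), len(z)).

```python
import numpy as np

# Regularly spaced query coordinates over an assumed 0-10 range.
xg = np.linspace(0.0, 10.0, 21)
yg = np.linspace(0.0, 10.0, 21)
zg = np.linspace(0.0, 10.0, 11)

# x varies along axis 1, y along axis 0, z along axis 2 (MATLAB convention).
xq, yq, zq = np.meshgrid(xg, yg, zg)
print(xq.shape)  # (21, 21, 11)
```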

3. Interpolating Values Using griddata:

interp_values = griddata(x, y, z, values, xq, yq, zq, 'natural');

`griddata` estimates the lithology values (`values`) at the grid points defined by `xq`, `yq`, and `zq`. The 'natural' option selects natural neighbor interpolation.
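SciPy's `scipy.interpolate.griddata` is the closest Python counterpart to the MATLAB call above, but it has no natural-neighbor option. This sketch swaps in `method="linear"` (piecewise-linear interpolation over a Delaunay triangulation), so its output will differ from the 'natural' result. SciPy being installed and the random sample data are assumptions.

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(200, 3))  # scattered (x, y, z) samples, illustrative
values = rng.uniform(0.0, 1.0, size=200)        # lithium probability at each sample

xq, yq, zq = np.meshgrid(np.linspace(0, 10, 21), np.linspace(0, 10, 21), np.linspace(0, 10, 11))

# Query points outside the convex hull of the samples get fill_value (NaN by
# default), just as MATLAB's griddata returns NaN there for 'linear' and 'natural'.
interp_values = griddata(points, values, (xq, yq, zq), method="linear")
print(interp_values.shape)  # (21, 21, 11)
```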

Where are the lithium deposits most probably located?

This code segment creates a 3D scatter plot of the sampled locations.
The color and size of each point represent its lithology value.

3D display of lithium deposit locations

This section generates a 3D voxel plot using the interpolated values, providing a 3D visualization of the geological model.
The color bar shows which probability each color represents.
This plot is similar to the one on the Eyowa website.

Cross-sectional view of the geological model

This code generates a cross-sectional view of the geological model at a specified depth (z = 5) to visualize lithium deposits.
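Taking a cross-section amounts to picking the grid level closest to the requested depth and keeping that 2D slice. A NumPy sketch, assuming the volume uses the meshgrid layout in which z varies along the third axis; `cross_section` is a hypothetical helper, and the toy volume stands in for the interpolated values.

```python
import numpy as np


def cross_section(volume, zq, depth):
    """Return the 2D slice of a meshgrid-shaped volume at the z level nearest `depth`."""
    z_levels = zq[0, 0, :]                        # z varies along axis 2
    k = int(np.argmin(np.abs(z_levels - depth)))  # index of the closest level
    return volume[:, :, k], z_levels[k]


xq, yq, zq = np.meshgrid(np.linspace(0, 10, 21), np.linspace(0, 10, 21), np.linspace(0, 10, 11))
volume = zq / 10.0  # toy volume whose value simply increases with depth
slice_, level = cross_section(volume, zq, depth=5)
print(level, slice_.shape)  # 5.0 (21, 21)
```

The returned slice can then be drawn as a heat map or filled contour plot.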

Comparison with Köhler et al.'s artificial neural networks model:

Our Mapping Software

Köhler et al.'s Li Geochemical anomaly map

Our Mapping Software generates lithium maps that align with geological expectations: its output is consistent with the Li geochemical anomaly map from Köhler et al.'s artificial neural network model, which makes precise predictions based on real-world data. Because it does not rely on a case-specific dataset, our mapping software can be used to assess lithium potential in any region, and this adaptability is a key advantage over approaches built on case-specific datasets. The code combines data generation, interpolation, and several visualization techniques to show the distribution of lithium deposits in a 3D geological model. The surface plot, isosurface plot, and cross-sectional view help mining companies and researchers estimate the chance of finding lithium deposits in an area of interest. This pre-exploratory detection is crucial for lithium exploration.

CONCLUSION

In lithium exploration, our software serves as a dependable companion. It combines data generation, interpolation, and visualization techniques to give a clearer picture of how lithium deposits are distributed. Although synthetic data is used here for illustration, the software is built to offer valuable insights to mining companies and researchers engaged in lithium exploration. It is a crucial tool for pre-exploratory detection, setting the course for sustainable energy solutions.

NOTE:

While working on the gene sequences for our project, we encountered problems that we realized could be solved with software tools. We therefore developed these tools as a contribution to future iGEM teams and synthetic biologists who may face similar issues with their gene sequences. A link to the documentation of these tools can be found here.


REFERENCES

Köhler, M., Hanelli, D., Schaefer, S., Barth, A., Knobloch, A., Hielscher, P., Cardoso-Fernandes, J., et al. (2021). Lithium Potential Mapping Using Artificial Neural Networks: A Case Study from Central Portugal. Minerals, 11(10), 1046. MDPI AG. Retrieved from http://dx.doi.org/10.3390/min11101046