Welcome to the OptogenEYEsis software page!
As you know, our project revolves around the development of light-sensitive microbial opsins designed to rejuvenate the human retina's capacity to perceive and respond to light, ultimately allowing the restoration of vision. To overcome the challenge of performing large numbers of experiments, our multidisciplinary team leverages the power of microfluidics to develop sophisticated digital microfluidics hardware equipped with features such as heating zones, cooling zones, and mixing and dilution capabilities. To automate the system and allow for high-throughput experiments, our software team's task is to build algorithms and use AI capabilities to run experiments on this hardware with as little human intervention as possible, while providing an intuitive interface, including voice commands with our assistant Thalia.
Our team faces the task of finding the optimal combination of algorithms to orchestrate the intricate dance of moving a multitude of droplets concurrently to various zones on the digital microfluidics hardware. These movements are not just simple transfers: they involve complex maneuvers, including collision avoidance, merging, splitting, and pathfinding in a multi-agent, multi-size environment. For reactions that admit multiple options, such as conducting PCR with fewer reagents, expediting the process, or optimizing for the highest-quality results, we are driven by a single goal: achieving the optimal outcome.
On this page, we delve into the software development that underpins our groundbreaking genetic therapy solution. We'll explain the challenges, innovations, and algorithms that drive us closer to an automatic biological laboratory, which combines cost efficiency, high throughput, and intuitive interfacing.
As explained on the Hardware page, our digital microfluidics hardware (Figure 1) is designed to move, merge, split, and perform physical reactions on a large number of droplets in parallel. These operations are the elementary components of the various reactions that we aim to perform: we claim that our automated hardware could potentially be used to perform almost any classic biological analysis. For the sake of design simplicity, the proof of concept we present includes only a few physical reactions, namely heating and cooling droplets. Many more modules could be added to the hardware in future developments, such as mixing assisted by tiny magnets, or laser measurements of transparency at given wavelengths.
Figure 1. Our digital microfluidics hardware.
We take two simple and classic biological experiments as examples to demonstrate the capabilities of our digital microfluidics hardware: a dilution and a PCR reaction. We recall below the detailed protocols we followed for both reactions.
Dilution to the 10th:
Polymerase Chain Reaction:
PCR Program:
The previously mentioned reactions are operations that an experimenter would perform often, which is why they are especially relevant for automation. Here is an overview of the way our software works:
The experimenter expresses their command to our voice assistant Thalia in natural language (that is, in everyday words, without worrying about a specific syntax). The command can be either:
Thalia answers the request depending on what was asked, asks for more details if needed, and requires confirmation before launching the experiment.
The request is decomposed into many elementary operations to be performed either sequentially (first move the droplet, then merge, then move to the heat zone, then wait, etc.) or in parallel (two unrelated samples will follow their respective plans).
These elementary operations are performed on the hardware.
Note that the experimenter can also manually manipulate the droplets on the plate with an intuitive visual interface, if needed. We also developed algorithms that make it easy and convenient to create personal protocols, in a format that is lightweight and easily shareable.
We mentioned previously that droplets are moved on the plate, but also merged together, split, and subjected to module actions. The problem of moving droplets can be directly related to robot motion planning in automated warehouses [1], traffic modeling for cars [2], or even drones [3].
Computationally, each droplet on the plate is considered to occupy one square of a grid, which can be parametrized with coordinates. Its movement is therefore approximated as a discrete evolution from one square to another, using only 4 possible directions: up, down, left, right. No diagonal movement is allowed at the moment. As droplets should not merge unintentionally, a margin distance of one square all around each droplet must be respected (unless a merging action is requested). The movement algorithms detailed in the sections to come therefore solve the problem of moving a droplet from coordinates (x1, y1) to (x2, y2) without violating the safety margin.
Figure 2. Schema of the virtual representation of the grid.
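To make the grid representation above concrete, here is a minimal Python sketch (illustrative names, not our production code) of a droplet occupying one square, the four allowed moves, and the one-square safety margin check.

```python
# Minimal sketch of the grid representation described above; names and
# structures are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class Droplet:
    x: int  # column of the occupied square
    y: int  # row of the occupied square

# The four allowed moves: up, down, left, right (no diagonals).
MOVES = [(0, 1), (0, -1), (-1, 0), (1, 0)]

def respects_margin(a: Droplet, b: Droplet) -> bool:
    """True if droplets a and b keep at least one empty square between them
    (Chebyshev distance > 1), i.e. they will not merge accidentally."""
    return max(abs(a.x - b.x), abs(a.y - b.y)) > 1

def legal_moves(d: Droplet, others: list[Droplet], width: int, height: int):
    """Enumerate the neighbouring squares the droplet can move to without
    leaving the plate or violating the safety margin."""
    for dx, dy in MOVES:
        nxt = Droplet(d.x + dx, d.y + dy)
        if 0 <= nxt.x < width and 0 <= nxt.y < height \
           and all(respects_margin(nxt, o) for o in others):
            yield nxt
```

With a helper of this kind, a movement algorithm only ever considers squares returned by `legal_moves`, so the safety margin cannot be violated by construction.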
The merging of two droplets consists of placing both droplets on adjacent squares, which causes them to merge by coalescence [4]. Note that, because of the glass plate on top of the electrodes, the volume of liquid in one square is fixed. As a result, two merged droplets will occupy two (adjacent) squares on the grid.
Figure 3. Picture of merging.
The splitting process consists of putting the droplet to be split into a “snake configuration” (Figure 4) at the point where the split should occur. Deactivating one square then reintroduces the margin distance and separates the droplet into two.
Figure 4. Picture of splitting.
Merging two droplets will not instantaneously result in a perfectly mixed liquid. An additional mixing step is therefore needed; in digital microfluidics, this is typically achieved by shuttling the merged droplet back and forth across several squares (Figure 5).
Figure 5. Picture of mixing
The A* algorithm [5] is a popular and widely used search algorithm in computer science, specifically designed to find the shortest path or optimal solution in a graph. The algorithm is often applied in contexts where you need to navigate from one point to another, such as in route planning [6], robotics [7], video games [8], and more. It therefore naturally applies to our droplet movement context. We use the A* algorithm to answer the following question:
The algorithm relies on the computation of path costs, which are kept in memory while the algorithm searches for a solution. This cost is composed of the current lowest known cost and a heuristic cost, which we define below. We can decompose A* in the context of droplet movement into the following steps:
If the open set becomes empty and the goal square has not been reached, there is no path to the goal (no solution).
If the goal square is reached, reconstruct the shortest path from the start square to the goal square by following the parent pointers backward from the goal square to the start square. The reconstructed path represents the shortest path from start to goal.
A* guarantees the optimality of the solution, even with obstacles in the droplet's way, while remaining efficient at finding the shortest path on a grid.
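The steps above can be condensed into a short, self-contained A* sketch. The `neighbours` callback is assumed to enumerate the reachable adjacent squares (for instance via the `legal_moves` helper sketched earlier), and the Manhattan distance serves as the heuristic cost for 4-directional movement; this is an illustration under those assumptions, not our exact implementation.

```python
# Minimal A* sketch on (x, y) grid squares.
import heapq

def manhattan(a, b):
    # Admissible heuristic for 4-directional movement on the grid.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(start, goal, neighbours):
    """neighbours(square) -> iterable of reachable adjacent squares.
    Returns the list of squares from start to goal, or None if no path."""
    open_set = [(manhattan(start, goal), 0, start)]
    parent = {start: None}
    g = {start: 0}
    while open_set:
        _, cost, current = heapq.heappop(open_set)
        if current == goal:
            # Reconstruct the path by following parents backwards.
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in neighbours(current):
            tentative = cost + 1
            if tentative < g.get(nxt, float("inf")):
                g[nxt] = tentative
                parent[nxt] = current
                heapq.heappush(
                    open_set,
                    (tentative + manhattan(nxt, goal), tentative, nxt),
                )
    return None  # open set exhausted: no path to the goal
```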
The aim of our software is to act as an intermediary between the digital microfluidics hardware and the user, offering the full range of functions that might be required to carry out a typical protocol encountered in the field of biology. More specifically, we worked on functionalities enabling us to create droplets of a chosen type on our plate, to move them while respecting a certain number of constraints, to merge and mix different droplets, or to send them to a heating zone for a given duration at a chosen temperature. All these functions also had to be adapted so that they could be controlled by Thalia, our voice assistant. While each of these functionalities may seem straightforward at first glance, we faced a number of difficulties in implementing them. In this section, we look at the major elements we worked on in our software, as well as the solutions we found to the problems we encountered.
Figure 6. Preview of the software and its interface.
Goal: Creating different droplets at different locations, of different types and with different colors, in only a few clicks.
The software's backend architecture should be optimized for pathfinding algorithms, and this starts with choosing an appropriate representation for the droplets.
Goal: Being able to move any droplet to any location in a few clicks.
Additionally, it is not only possible to move droplets occupying a 1x1 space on the grid, but also 2x2 or even unusual footprints such as 4x3 or 2x1. This adds to the complexity, considering the different behaviour, collision detection, pathfinding, and more. Avoiding obstacles is not as easy with classical algorithms, which explains the need for Conflict-Based Search (CBS). But CBS is slow (not really usable with 10+ droplets in Python; perhaps a bit more with a C++ implementation). A sketch of the footprint conflict test for such multi-size droplets is given below.
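As a rough illustration of the added complexity, the sketch below (hypothetical names, simplified to axis-aligned rectangles) tests whether two multi-size footprints violate the one-square safety margin; a full CBS implementation builds on many such checks when resolving conflicts between agents.

```python
# Sketch of a footprint-overlap test for rectangular droplets of arbitrary
# size (e.g. 1x1, 2x2, 4x3); names and fields are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class Footprint:
    x: int   # leftmost column
    y: int   # bottom row
    w: int   # width in squares
    h: int   # height in squares

def conflicts(a: Footprint, b: Footprint, margin: int = 1) -> bool:
    """True if the two footprints are closer than the safety margin,
    i.e. placing them here would risk an unintended merge."""
    return not (a.x + a.w + margin <= b.x or b.x + b.w + margin <= a.x or
                a.y + a.h + margin <= b.y or b.y + b.h + margin <= a.y)
```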
Figure 7. Demo of droplet movement.
Figure 8. Merging of droplets of two different types.
Goal: Simulating a heat zone delimited in a specific part of the plate. It should be precisely delimited with obstacles, and the heating temperature must be easily selectable by the user.
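A heat zone can be described as a simple rectangular region of the grid with a user-selectable temperature; the sketch below is illustrative and does not reflect our exact data structures.

```python
# Sketch of a heat-zone description: a rectangular area of the grid with a
# user-selectable temperature; field names are illustrative.
from dataclasses import dataclass

@dataclass
class HeatZone:
    x: int                # leftmost column of the zone
    y: int                # bottom row of the zone
    width: int
    height: int
    temperature_c: float  # target temperature chosen by the user

    def contains(self, square: tuple[int, int]) -> bool:
        sx, sy = square
        return (self.x <= sx < self.x + self.width and
                self.y <= sy < self.y + self.height)

# The squares bordering the zone can be registered as obstacles for the
# pathfinder, so that only droplets sent to the zone on purpose enter it.
```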
Goal: Being able to save and load procedures in an efficient way, representing all procedures in a simple and universal format that can be written to a .txt or .json file. Being able to reuse procedures built only once could save the user a huge amount of time.
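As an illustration of the kind of lightweight format we have in mind, here is a hypothetical example of a procedure serialised to JSON from Python; the step vocabulary and coordinates are invented for the example.

```python
# Hypothetical sketch of how a procedure could be serialised to a small,
# shareable JSON file; the step names below are illustrative only.
import json

dilution_protocol = {
    "name": "dilution_to_the_10th",
    "steps": [
        {"op": "create", "droplet": "sample",  "at": [2, 3]},
        {"op": "create", "droplet": "diluent", "at": [6, 3]},
        {"op": "move",   "droplet": "diluent", "to": [4, 3]},
        {"op": "merge",  "droplets": ["sample", "diluent"]},
        {"op": "mix",    "droplet": "mixture", "back_and_forth": 5},
        {"op": "split",  "droplet": "mixture"},
    ],
}

with open("dilution_to_the_10th.json", "w") as f:
    json.dump(dilution_protocol, f, indent=2)
```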
Goal: Being able to talk to the AI assistant to ask it to perform a specific task, list the running processes, or answer a question.
Figure 9. Demo of Thalia, our voice-activated assistant. Turn the video volume on to hear the voices.
Goal: Ensuring that all automated processes can be stopped in one click by the user if they decide so. In particular, several safety measures should be implemented to stop processes created by the AI assistant if needed, and to deactivate it completely.
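A minimal sketch of such a kill switch, assuming automated processes run in background threads and poll a shared flag between elementary operations; names are illustrative, not our actual implementation.

```python
# Sketch of a one-click emergency stop for automated processes.
import threading

emergency_stop = threading.Event()

def automated_process(steps):
    """Run elementary operations one by one, checking the flag in between."""
    for step in steps:
        if emergency_stop.is_set():
            print("Process halted by the user.")
            return
        step()  # perform one elementary operation on the plate

def stop_everything():
    """Bound to the stop button: halts all running processes, including
    those launched by the AI assistant."""
    emergency_stop.set()
```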
Nowadays, many of the tasks performed in a biology laboratory can be assisted and managed from a computer; they therefore require human-machine interaction. Classically, we use a keyboard and mouse to interact with a computer, but during many protocols, lab assistants have to wear gloves while handling different elements. Removing the gloves to use the computer makes the interaction more delicate and less user-friendly. One solution would be to use a voice assistant, capable not only of interacting with software, but also of keeping track of reactions in progress and their status, of stocks remaining in the lab, or of scheduling a meeting. All this is now possible thanks to recent advances in artificial intelligence. Our ultimate goal is therefore to produce an autonomous, easy-to-use assistant that enables people working in a laboratory to control their tasks on a computer using their voice; in the context of our project, this means being able to control the plate and droplet system automatically and simply by voice. A simple button would be enough to call the voice assistant, without interrupting the other tasks. The main challenges are for the voice assistant to respond in real time and to have access to the various functionalities required.
In line with the rest of our project, a voice assistant based on a Large Language Model (LLM) should be capable of two main types of task. The first type is high-level: answering a biology question, establishing a protocol (such as PCR) and modifying it as the user wishes, or reporting on the progress of a running reaction. In a second phase, it would be ideal for the assistant to be able to perform lower-level actions within the software. In our case, this involves indicating which droplet movement we want to make, sending a droplet to a certain location on a heating zone, for example, while avoiding collisions with other droplets already present. We could also imagine the assistant interacting with any type of software, accessing the lab's databases and taking notes for us, but that is for later! We therefore chose to concentrate initially on an agent capable of chatting with the user in real time and performing a number of predefined high- and low-level tasks. Over time, a greater number of tasks can be added to better meet user needs.
To meet these requirements, we devised the following pipeline for our intelligent voice assistant (Figure 10). First, the user's speech is recorded and transformed into text using a speech-to-text model. Next, the text is interpreted using a Large Language Model such as ChatGPT, with a precise prompt used to introduce the relevant context, and the model provides a text response. Finally, using a text-to-speech algorithm, the text returned by the LLM is spoken back to the user. For speech-to-text, we use Whisper, the open-source Transformer model developed by OpenAI. This model is relatively lightweight and can run on any computer relatively quickly.
Figure 10. Working pipeline of our intelligent voice assistant.
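The pipeline can be sketched in a few lines of Python, assuming the open-source `whisper` package for speech-to-text, the pre-1.0 `openai` client for the LLM call, and `pyttsx3` for speech synthesis; this is a simplified illustration rather than our exact implementation.

```python
# Sketch of the three-stage pipeline: speech to text -> LLM -> text to speech.
import whisper
import openai
import pyttsx3

stt_model = whisper.load_model("base")   # lightweight Whisper model
tts_engine = pyttsx3.init()

def answer_once(audio_path: str, system_prompt: str) -> str:
    # 1. Speech to text with Whisper.
    user_text = stt_model.transcribe(audio_path)["text"]
    # 2. Interpretation and answer generation with the LLM.
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": user_text}],
    )
    answer = completion.choices[0].message["content"]
    # 3. Text to speech: read the answer back to the user.
    tts_engine.say(answer)
    tts_engine.runAndWait()
    return answer
```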
This three-stage operating mode seems relatively straightforward, but it has a number of drawbacks:
These few points have a major impact on the user experience, making the pipeline presented above unusable as it stands. We therefore decided to solve each of these problems separately through a series of improvements.
For a relatively short response, the LLM response time is of the order of 1-2 s, which is acceptable given that the other steps in our pipeline are negligible in time compared with this delay. However, the very nature of LLMs, which generate a response token by token (roughly word by word) with an almost constant delay per token, makes long responses much slower to generate (15 s and more), making the dialogue very unpleasant for the user. It is therefore necessary to optimize this response time.
We chose to use the stream option of ChatGPT's API (gpt-3.5-turbo), which means we don't have to wait for the complete response to be generated by the servers before it is sent to the client (our machine). The response is sent in stream mode, i.e. the tokens are sent as they are generated. As soon as we have collected the first few words of the sentence, our voice assistant can start responding to the user. Subsequent words are collected in the background and spoken once the first part has been read, in a similar way to what we are used to when chatting. This technical improvement reduces the effective response time to around 1 s, whatever the length of the response, resulting in much smoother speech. The main disadvantage of this method is that we don't have the complete response upstream to check it, so it is important to adapt our input prompt accordingly.
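A simplified sketch of this streaming mode, again assuming the pre-1.0 `openai` client: sentences are yielded as soon as they are complete, so the text-to-speech engine can start speaking while the rest of the answer is still being generated.

```python
# Sketch of streaming generation: tokens are consumed as they arrive.
import openai

def stream_answer(messages):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        stream=True,            # tokens are sent as they are generated
    )
    buffer = ""
    for chunk in response:
        buffer += chunk.choices[0].delta.get("content", "")
        # As soon as a sentence is complete, hand it to the text-to-speech
        # engine while the rest keeps streaming in the background.
        if buffer.endswith((".", "?", "!")):
            yield buffer
            buffer = ""
    if buffer:
        yield buffer
```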
When we ask our assistant a question, it tends to present the answer as a list, an enumeration, or even long paragraphs. In order not to limit the quality of the answers, we chose to allow the LLM to produce long answers. However, these answers need not be read aloud in full, which can sometimes take many minutes. We therefore offer the user a "short oral response" option, which modifies the way our text-to-speech algorithm reads the response. In this mode, the assistant responds more naturally (see example below): it speaks only the leading part of the answer, leaving the user to read the full text presented on the screen.
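The "short oral response" option can be sketched as follows (illustrative helper, assuming a text-to-speech engine such as `pyttsx3`): only the leading sentence is spoken, while the full answer remains displayed on screen.

```python
# Illustrative sketch of the "short oral response" option.
def speak_short(answer: str, tts_engine) -> str:
    first_sentence = answer.split(". ")[0]
    tts_engine.say(first_sentence)   # read only the leading part aloud
    tts_engine.runAndWait()
    return answer                    # the full text is still shown on screen
```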
In order for our assistant to be able to launch processes, access data, or take inventory of current processes, it needs to be able to execute functions on its own, from a list of predefined functions. Initially, our approach was to include a sentence in the LLM prompt along the lines of "Provide a clear answer to the following question. You will be able to call functions simply by mentioning their name." This approach proved to be very limited: the LLM could not call the right function systematically, even when we gave it example answers. The assistant would either try to do the task by itself or, on the contrary, respond that it didn't have the capabilities to do so, without calling any of the functions we provided.
We had several options: use a more powerful model (such as GPT-4) that would better follow the instructions provided, fine-tune a small LLM on our answers with our list of available functions, or find a trick to solve the problem. We first looked at the first solution, which was the simplest. The responses were satisfactory, but introduced additional drawbacks: the cost of GPT-4 is significantly higher ($0.06 / 1K tokens for GPT-4 vs. $0.002 / 1K tokens for GPT-3.5, i.e. about 30x more expensive) and responses are slower to generate, which seriously affects our assistant's response times.
The second possible solution was to perform a fine-tuning step, i.e. to train a pre-trained model such as LLaMA (Meta AI) to generate responses on a dataset containing sample answers and calls to the available functions. This second method was perfectly feasible, but it also brought a non-negligible problem, apart from the cost of training the model and the time required to generate a suitable dataset with a few thousand examples: the need for a server powerful enough to run the model continuously to generate the answers, which would make deployment considerably more delicate and prone to bugs. Relying on the OpenAI API avoided all these worries at a very low cost, so we abandoned the idea of fine-tuning our own LLM.
Our last recourse was to be more astute in the way we presented the problem to our beloved ChatGPT (GPT-3.5). Indeed, while ChatGPT was limited in its ability to perform both tasks simultaneously when answering (generating an appropriate response to the user's question and calling one of the available functions if necessary), it is entirely possible to handle the two problems separately. Our final pipeline is therefore divided into two parts: a first call that decides which predefined function (if any) should be executed, and a second call that generates the conversational answer returned to the user.
This latter approach effectively requires two calls to the API, which slightly increases our assistant's response time, but remains far more reasonable than using GPT-4. We also have much greater control over the output and the actions performed by the assistant. A sketch of this two-call approach is given below.
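A sketch of the two-call approach, assuming the pre-1.0 `openai` client; the function registry and the prompts below are invented for the example and are not our exact prompts.

```python
# Sketch of the two-call approach: one call to route the request to a
# predefined function, one call to generate the spoken answer.
import openai

AVAILABLE_FUNCTIONS = {
    "start_pcr": lambda: print("PCR protocol launched"),
    "list_processes": lambda: print("2 processes running"),
    "none": lambda: None,   # no function call needed
}

DISPATCH_PROMPT = (
    "You are a command router. Given the user's request, answer with exactly "
    "one of these function names and nothing else: "
    + ", ".join(AVAILABLE_FUNCTIONS)
)

def handle_request(user_text: str) -> str:
    # First call: decide which predefined function (if any) to execute.
    choice = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": DISPATCH_PROMPT},
                  {"role": "user", "content": user_text}],
    ).choices[0].message["content"].strip()
    AVAILABLE_FUNCTIONS.get(choice, AVAILABLE_FUNCTIONS["none"])()

    # Second call: generate the conversational answer returned to the user.
    answer = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": "You are Thalia, a lab assistant."},
                  {"role": "user", "content": user_text}],
    ).choices[0].message["content"]
    return answer
```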
During our interview with Prof. A. Felner, a renowned expert in pathfinding algorithms and co-author of the seminal CBS paper [9], we discussed the legitimacy of the pathfinding algorithms we currently use. In their current setting, the movement algorithms we presented do not achieve our long-term goal, which is to allow the plate to be used continuously, piling up experiments in a queue. In other words, instead of using an “offline” paradigm, we would like to switch to an “online” paradigm [10, 11].
Although they could be adapted to droplet movement, none of the currently available algorithms takes the merge and split actions into consideration. Consequently, more research needs to be done before such an algorithm can be implemented.
Another possible evolution is to use reinforcement learning and deep learning methods, in a similar way as done in [12], which provides a scalable framework for droplet movement. The benefit would hopefully be a more efficient agent, especially in settings involving a large plate with a large number of droplets. These benefits are, however, conditional on having a computer powerful enough to train the algorithm beforehand, which requires resources typically not found in a classic biology laboratory. For this reason, we did not pursue this direction for now, preferring to make the simpler, resource-efficient software available first.
Throughout the development of our software, we were interested in making it accessible to biologists with little to no programming training. This was done by developing our assistant Thalia, which does not require any programming knowledge. As discussed in the functions section, we deliberately chose a lightweight and transparent storage method, to allow people to easily share their experiments and modules.
We naturally think that this capability should be used to its fullest by building an open community where anyone can share their algorithms, in a similar way to Protocols.io. All the foreseeable benefits as well as the potential negative outcomes of such a community are discussed on the Human Practice page.
For the moment, Thalia is able to understand experiments and actions that are in its database, which can include community-designed and privately designed algorithms. The next step will be to link Thalia to the experiment creation process, during which the experimenter details the various steps of what they want to be performed. This development seems within reach given the emergence of LLMs with coding abilities such as OpenAI's models [13], GitHub Copilot, and Mistral-7B-v0.1, and would make the platform even more inclusive to people without extensive programming training.
Note that, as discussed on the Human Practice page, we never remove responsibility from the experimenter, who remains in charge of the programs they build and run.