Welcome to the OptogenEYEsis software page!
As you know, our project revolves around the development of light-sensitive microbial opsins designed to rejuvenate the human retina's capacity to perceive and respond to light, ultimately allowing the restoration of vision. To overcome the challenge of performing large numbers of experiments, our multidisciplinary team leverages the power of microfluidics to develop sophisticated digital microfluidics hardware equipped with features such as heating zones, cooling zones, and mixing and dilution capabilities. To automate the system and allow for high-throughput experiments, our software team's task is to build algorithms and use AI capabilities to run experiments on this hardware with as little human intervention as possible, while providing an intuitive interface, including voice commands with our assistant Thalia.
Our team faces the task of finding the optimal combination of algorithms to orchestrate the intricate dance of moving a multitude of droplets concurrently to various zones on the digital microfluidics hardware. These movements are not just simple transfers: they involve complex maneuvers, including collision avoidance, merging, splitting, and pathfinding in a multi-agent, multi-size environment. For reactions that admit multiple options, such as conducting PCR with fewer reagents, expediting the process, or optimizing for the highest-quality results, we are driven by a single goal: achieving the optimal outcome.
On this page, we delve into the software development that underpins our groundbreaking genetic therapy solution. We'll explain the challenges, innovations, and algorithms that drive us closer to an automatic biological laboratory, which combines cost efficiency, high throughput, and intuitive interfacing.
As explained on the Hardware page, our digital microfluidics hardware (Figure 1) is designed to move, merge, split, and perform physical reactions on a large number of droplets in parallel. These operations are the elementary components of the various reactions that we aim to perform: we claim that our automated hardware could potentially be used to perform almost any classic biological analysis. For the sake of design simplicity, the proof of concept we present includes only a few physical reactions, namely heating and cooling droplets. Many more modules could be added to the hardware in future developments, such as mixing assisted by tiny magnets, or laser measurements of transparency at given wavelengths.
Figure 1. Our digital microfluidics hardware.
We take two simple and classic biological experiments as examples to demonstrate the capabilities of our digital microfluidics hardware: a dilution and a PCR reaction. We recall below the detailed protocols we followed for both reactions.
Dilution to the 10th:
Polymerase Chain Reaction:
PCR Program:
The previously mentioned reactions are operations that an experimenter would perform often, which is why they are especially relevant for automation. Here is an overview of the way our software works:
The experimenter expresses their command to our voice assistant Thalia in natural language (that is, in everyday words, without worrying about a specific syntax). The command can be either:
Thalia answers the request depending on what was asked, asks for more details if needed, and requires confirmation before launching the experiment.
The request is decomposed into many elementary operations to be performed either sequentially (first move the droplet, then merge, then move to the heat zone, then wait, etc.) or in parallel (two unrelated samples will follow their respective plans).
These elementary operations are performed on the hardware.
Note that the experimenter can also manually manipulate the droplets on the plate with an intuitive visual interface, if needed. We also developed algorithms that make it easy and convenient to create personal protocols, in a format that is lightweight and easily shareable.
We mentioned previously that droplets are moved on the plate, but also merged together, split, and subjected to module actions. The problem of moving droplets can be directly related to robot motion planning in automated warehouses [1], traffic modeling for cars [2], or even drones [3].
Computationally, each droplet on the plate is considered to occupy one square of a grid, which can be parametrized with coordinates. Its movement is therefore approximated as a discrete evolution from one square to another, using only 4 possible directions: up, down, left, right. No diagonal movement is allowed at the moment. As droplets should not merge unintentionally, a margin distance of one square all around each droplet must be respected (unless a merging action is requested). The movement algorithms detailed in the sections to come therefore solve the problem of moving a droplet from coordinates (x1, y1) to (x2, y2) without violating the safety margin.
Figure 2. Schema of the virtual representation of the grid.
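To make the grid representation above concrete, here is a minimal Python sketch (illustrative names, not our production code) of a droplet occupying one square, the four allowed moves, and the one-square safety margin check.

```python
# Minimal sketch of the grid representation described above; names and
# structures are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class Droplet:
    x: int  # column of the occupied square
    y: int  # row of the occupied square

# The four allowed moves: up, down, left, right (no diagonals).
MOVES = [(0, 1), (0, -1), (-1, 0), (1, 0)]

def respects_margin(a: Droplet, b: Droplet) -> bool:
    """True if droplets a and b keep at least one empty square between them
    (Chebyshev distance > 1), i.e. they will not merge accidentally."""
    return max(abs(a.x - b.x), abs(a.y - b.y)) > 1

def legal_moves(d: Droplet, others: list[Droplet], width: int, height: int):
    """Enumerate the neighbouring squares the droplet can move to without
    leaving the plate or violating the safety margin."""
    for dx, dy in MOVES:
        nxt = Droplet(d.x + dx, d.y + dy)
        if 0 <= nxt.x < width and 0 <= nxt.y < height \
           and all(respects_margin(nxt, o) for o in others):
            yield nxt
```

With a helper of this kind, a movement algorithm only ever considers squares returned by `legal_moves`, so the safety margin cannot be violated by construction.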
The merging of two droplets consists of placing both droplets on adjacent squares, which causes them to merge by coalescence [4]. Note that, because of the glass plate on top of the electrodes, the volume of liquid in one square is fixed. As a result, two merged droplets will occupy two (adjacent) squares on the grid.
Figure 3. Picture of merging.
The splitting process consists of putting the droplet to be split into a “snake configuration” (Figure 4) at the point where the split should occur. Deactivating one square then reintroduces the margin distance and separates the droplet into two.
Figure 4. Picture of splitting.
Merging two droplets will not instantaneously result in a perfectly mixed liquid. An additional mixing step is therefore needed; in digital microfluidics, this is typically achieved by shuttling the merged droplet back and forth across several squares (Figure 5).
Figure 5. Picture of mixing
The A* algorithm [5] is a popular and widely used search algorithm in computer science, specifically designed to find the shortest path or optimal solution in a graph. The algorithm is often applied in contexts where you need to navigate from one point to another, such as in route planning [6], robotics [7], video games [8], and more. It therefore naturally applies to our droplet movement context. We use the A* algorithm to answer the following question:
The algorithm relies on the computation of path costs, which are kept in memory while the algorithm searches for a solution. This cost is composed of the current lowest known cost and a heuristic cost, which we define below. We can decompose A* in the context of droplet movement into the following steps:
If the open set becomes empty and the goal square has not been reached, there is no path to the goal (no solution).
If the goal square is reached, reconstruct the shortest path from the start square to the goal square by following the parent pointers backward from the goal square to the start square. The reconstructed path represents the shortest path from start to goal.
A* guarantees the optimality of the solution, even with obstacles in the droplet's way, while remaining efficient at finding the shortest path on a grid.
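The steps above can be condensed into a short, self-contained A* sketch. The `neighbours` callback is assumed to enumerate the reachable adjacent squares (for instance via the `legal_moves` helper sketched earlier), and the Manhattan distance serves as the heuristic cost for 4-directional movement; this is an illustration under those assumptions, not our exact implementation.

```python
# Minimal A* sketch on (x, y) grid squares.
import heapq

def manhattan(a, b):
    # Admissible heuristic for 4-directional movement on the grid.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(start, goal, neighbours):
    """neighbours(square) -> iterable of reachable adjacent squares.
    Returns the list of squares from start to goal, or None if no path."""
    open_set = [(manhattan(start, goal), 0, start)]
    parent = {start: None}
    g = {start: 0}
    while open_set:
        _, cost, current = heapq.heappop(open_set)
        if current == goal:
            # Reconstruct the path by following parents backwards.
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in neighbours(current):
            tentative = cost + 1
            if tentative < g.get(nxt, float("inf")):
                g[nxt] = tentative
                parent[nxt] = current
                heapq.heappush(
                    open_set,
                    (tentative + manhattan(nxt, goal), tentative, nxt),
                )
    return None  # open set exhausted: no path to the goal
```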
The aim of our software is to act as an intermediary between the digital microfluidics hardware and the user, offering the full range of functions that might be required to carry out a typical protocol encountered in the field of biology. More specifically, we worked on functionalities enabling us to create droplets of a chosen type on our plate, to move them while respecting a certain number of constraints, to merge and mix different droplets, or to send them to a heating zone for a given duration at a chosen temperature. All these functions also had to be adapted so that they could be controlled by Thalia, our voice assistant. While each of these functionalities may seem straightforward at first glance, we faced a number of difficulties in implementing them. In this section, we look at the major elements we worked on in our software, as well as the solutions we found to the problems we encountered.
Figure 6. Preview of the software and its interface.
Goal: Creating different droplets at different locations, of different types and with different colors, in only a few clicks.
The software's backend architecture should be optimized for pathfinding algorithms, and this starts with choosing an appropriate representation for the droplets.
Goal: Being able to move any droplet to any location in a few clicks.
Additionally, it is not only possible to move droplets occupying a 1x1 space on the grid, but also 2x2 or even unusual footprints such as 4x3 or 2x1. This adds to the complexity, considering the different behaviour, collision detection, pathfinding, and more. Avoiding obstacles is not as easy with classical algorithms, which explains the need for Conflict-Based Search (CBS). But CBS is slow (not really usable with 10+ droplets in Python; perhaps a bit more with a C++ implementation). A sketch of the footprint conflict test for such multi-size droplets is given below.
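As a rough illustration of the added complexity, the sketch below (hypothetical names, simplified to axis-aligned rectangles) tests whether two multi-size footprints violate the one-square safety margin; a full CBS implementation builds on many such checks when resolving conflicts between agents.

```python
# Sketch of a footprint-overlap test for rectangular droplets of arbitrary
# size (e.g. 1x1, 2x2, 4x3); names and fields are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class Footprint:
    x: int   # leftmost column
    y: int   # bottom row
    w: int   # width in squares
    h: int   # height in squares

def conflicts(a: Footprint, b: Footprint, margin: int = 1) -> bool:
    """True if the two footprints are closer than the safety margin,
    i.e. placing them here would risk an unintended merge."""
    return not (a.x + a.w + margin <= b.x or b.x + b.w + margin <= a.x or
                a.y + a.h + margin <= b.y or b.y + b.h + margin <= a.y)
```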
Figure 7. Demo of droplet movement.
Figure 8. Merging of droplets of two different types.
Goal: Simulating a heat zone delimited in a specific part of the plate. It should be precisely delimited with obstacles, and the heating temperature must be easily selectable by the user.
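A heat zone can be described as a simple rectangular region of the grid with a user-selectable temperature; the sketch below is illustrative and does not reflect our exact data structures.

```python
# Sketch of a heat-zone description: a rectangular area of the grid with a
# user-selectable temperature; field names are illustrative.
from dataclasses import dataclass

@dataclass
class HeatZone:
    x: int                # leftmost column of the zone
    y: int                # bottom row of the zone
    width: int
    height: int
    temperature_c: float  # target temperature chosen by the user

    def contains(self, square: tuple[int, int]) -> bool:
        sx, sy = square
        return (self.x <= sx < self.x + self.width and
                self.y <= sy < self.y + self.height)

# The squares bordering the zone can be registered as obstacles for the
# pathfinder, so that only droplets sent to the zone on purpose enter it.
```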
Goal: Being able to save and load procedures in an efficient way, representing all procedures in a simple and universal format that can be written to a .txt or .json file. Being able to reuse procedures built only once could save the user a huge amount of time.
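As an illustration of the kind of lightweight format we have in mind, here is a hypothetical example of a procedure serialised to JSON from Python; the step vocabulary and coordinates are invented for the example.

```python
# Hypothetical sketch of how a procedure could be serialised to a small,
# shareable JSON file; the step names below are illustrative only.
import json

dilution_protocol = {
    "name": "dilution_to_the_10th",
    "steps": [
        {"op": "create", "droplet": "sample",  "at": [2, 3]},
        {"op": "create", "droplet": "diluent", "at": [6, 3]},
        {"op": "move",   "droplet": "diluent", "to": [4, 3]},
        {"op": "merge",  "droplets": ["sample", "diluent"]},
        {"op": "mix",    "droplet": "mixture", "back_and_forth": 5},
        {"op": "split",  "droplet": "mixture"},
    ],
}

with open("dilution_to_the_10th.json", "w") as f:
    json.dump(dilution_protocol, f, indent=2)
```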
Goal: Being able to talk to the AI assistant to ask it to perform a specific task, list the running processes, or answer a question.
Figure 9. Demo of Thalia, our voice-activated assistant. Turn the video volume on to hear the voices.
Goal: Ensuring that all automated processes can be stopped in one click by the user if they decide so. In particular, several safety measures should be implemented to stop processes created by the AI assistant if needed, and to deactivate it completely.
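A minimal sketch of such a kill switch, assuming automated processes run in background threads and poll a shared flag between elementary operations; names are illustrative, not our actual implementation.

```python
# Sketch of a one-click emergency stop for automated processes.
import threading

emergency_stop = threading.Event()

def automated_process(steps):
    """Run elementary operations one by one, checking the flag in between."""
    for step in steps:
        if emergency_stop.is_set():
            print("Process halted by the user.")
            return
        step()  # perform one elementary operation on the plate

def stop_everything():
    """Bound to the stop button: halts all running processes, including
    those launched by the AI assistant."""
    emergency_stop.set()
```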
Nowadays, many of the tasks performed in a biology laboratory can be assisted and managed from a computer; they therefore require human-machine interaction. Classically, we use a keyboard and mouse to interact with a computer, but during many protocols, lab assistants have to wear gloves while handling different elements. Removing the gloves to use the computer makes the interaction more delicate and less user-friendly. One solution would be to use a voice assistant, capable not only of interacting with software, but also of keeping track of reactions in progress and their status, of stocks remaining in the lab, or of scheduling a meeting. All this is now possible thanks to recent advances in artificial intelligence. Our ultimate goal is therefore to produce an autonomous, easy-to-use assistant that enables people working in a laboratory to control their tasks on a computer using their voice; in the context of our project, this means being able to control the plate and droplet system automatically and simply by voice. A simple button would be enough to call the voice assistant, without interrupting the other tasks. The main challenges are for the voice assistant to respond in real time and to have access to the various functionalities required.
In line with the rest of our project, a voice assistant based on a Large Language Model (LLM) should be capable of two main types of task. The first type is high-level: answering a biology question, establishing a protocol (such as PCR) and modifying it as the user wishes, or reporting on the progress of a running reaction. In a second phase, it would be ideal for the assistant to be able to perform lower-level actions within the software. In our case, this involves indicating which droplet movement we want to make, sending a droplet to a certain location on a heating zone, for example, while avoiding collisions with other droplets already present. We could also imagine the assistant interacting with any type of software, accessing the lab's databases and taking notes for us, but that is for later! We therefore chose to concentrate initially on an agent capable of chatting with the user in real time and performing a number of predefined high- and low-level tasks. Over time, a greater number of tasks can be added to better meet user needs.
To meet these requirements, we devised the following pipeline for our intelligent voice assistant (Figure 10). First, the user's speech is recorded and transformed into text using a speech-to-text model. Next, the text is interpreted using a Large Language Model such as ChatGPT, with a precise prompt used to introduce the relevant context, and the model provides a text response. Finally, using a text-to-speech algorithm, the text returned by the LLM is spoken back to the user. For speech-to-text, we use Whisper, the open-source Transformer model developed by OpenAI. This model is relatively lightweight and can run on any computer relatively quickly.
Figure 10. Working pipeline of our intelligent voice assistant.
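The pipeline can be sketched in a few lines of Python, assuming the open-source `whisper` package for speech-to-text, the pre-1.0 `openai` client for the LLM call, and `pyttsx3` for speech synthesis; this is a simplified illustration rather than our exact implementation.

```python
# Sketch of the three-stage pipeline: speech to text -> LLM -> text to speech.
import whisper
import openai
import pyttsx3

stt_model = whisper.load_model("base")   # lightweight Whisper model
tts_engine = pyttsx3.init()

def answer_once(audio_path: str, system_prompt: str) -> str:
    # 1. Speech to text with Whisper.
    user_text = stt_model.transcribe(audio_path)["text"]
    # 2. Interpretation and answer generation with the LLM.
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": user_text}],
    )
    answer = completion.choices[0].message["content"]
    # 3. Text to speech: read the answer back to the user.
    tts_engine.say(answer)
    tts_engine.runAndWait()
    return answer
```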
This three-stage operating mode seems relatively straightforward, but it has a number of drawbacks:
These few points have a major impact on the user experience, making the pipeline presented above unusable as it stands. We therefore decided to solve each of these problems separately through a series of improvements.
For a relatively short response, the LLM response time is of the order of 1-2 s, which is acceptable given that the other steps in our pipeline are negligible in time compared with this delay. However, the very nature of LLMs, which generate a response token by token (roughly word by word) with an almost constant delay per token, makes long responses much slower to generate (15 s and more), making the dialogue very unpleasant for the user. It is therefore necessary to optimize this response time.
We chose to use the stream option of ChatGPT's API (gpt-3.5-turbo), which means we don't have to wait for the complete response to be generated by the servers before it is sent to the client (our machine). The response is sent in stream mode, i.e. the tokens are sent as they are generated. As soon as we have collected the first few words of the sentence, our voice assistant can start responding to the user. Subsequent words are collected in the background and spoken once the first part has been read, in a similar way to what we are used to when chatting. This technical improvement reduces the effective response time to around 1 s, whatever the length of the response, resulting in much smoother speech. The main disadvantage of this method is that we don't have the complete response upstream to check it, so it is important to adapt our input prompt accordingly.
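A simplified sketch of this streaming mode, again assuming the pre-1.0 `openai` client: sentences are yielded as soon as they are complete, so the text-to-speech engine can start speaking while the rest of the answer is still being generated.

```python
# Sketch of streaming generation: tokens are consumed as they arrive.
import openai

def stream_answer(messages):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        stream=True,            # tokens are sent as they are generated
    )
    buffer = ""
    for chunk in response:
        buffer += chunk.choices[0].delta.get("content", "")
        # As soon as a sentence is complete, hand it to the text-to-speech
        # engine while the rest keeps streaming in the background.
        if buffer.endswith((".", "?", "!")):
            yield buffer
            buffer = ""
    if buffer:
        yield buffer
```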
When we ask our assistant a question, it tends to present the answer as a list, an enumeration, or even long paragraphs. In order not to limit the quality of the answers, we chose to allow the LLM to produce long answers. However, these answers need not be read aloud in full, which can sometimes take many minutes. We therefore offer the user a "short oral response" option, which modifies the way our text-to-speech algorithm reads the response. In this mode, the assistant responds more naturally (see example below): it speaks only the leading part of the answer, leaving the user to read the full text presented on the screen.
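The "short oral response" option can be sketched as follows (illustrative helper, assuming a text-to-speech engine such as `pyttsx3`): only the leading sentence is spoken, while the full answer remains displayed on screen.

```python
# Illustrative sketch of the "short oral response" option.
def speak_short(answer: str, tts_engine) -> str:
    first_sentence = answer.split(". ")[0]
    tts_engine.say(first_sentence)   # read only the leading part aloud
    tts_engine.runAndWait()
    return answer                    # the full text is still shown on screen
```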
In order for our assistant to be able to launch processes, access data, or take inventory of current processes, it needs to be able to execute functions on its own, from a list of predefined functions. Initially, our approach was to include a sentence in the LLM prompt along the lines of "Provide a clear answer to the following question. You will be able to call functions simply by mentioning their name." This approach proved to be very limited: the LLM could not call the right function systematically, even when we gave it example answers. The assistant would either try to do the task by itself or, on the contrary, respond that it didn't have the capabilities to do so, without calling any of the functions we provided.
We had several options: use a more powerful model (such as GPT-4) that would better follow the instructions provided, fine-tune a small LLM on our answers with our list of available functions, or find a trick to solve the problem. We first looked at the first solution, which was the simplest. The responses were satisfactory, but introduced additional drawbacks: the cost of GPT-4 is significantly higher ($0.06 / 1K tokens for GPT-4 vs. $0.002 / 1K tokens for GPT-3.5, i.e. about 30x more expensive) and responses are slower to generate, which seriously affects our assistant's response times.
The second possible solution was to perform a fine-tuning step, i.e. to train a pre-trained model such as LLaMA (Meta AI) to generate responses on a dataset containing sample answers and calls to the available functions. This second method was perfectly feasible, but it also brought a non-negligible problem, apart from the cost of training the model and the time required to generate a suitable dataset with a few thousand examples: the need for a server powerful enough to run the model continuously to generate the answers, which would make deployment considerably more delicate and prone to bugs. Relying on the OpenAI API avoided all these worries at a very low cost, so we abandoned the idea of fine-tuning our own LLM.
Our last recourse was to be more astute in the way we presented the problem to our beloved ChatGPT (GPT-3.5). Indeed, while ChatGPT was limited in its ability to perform both tasks simultaneously when answering (generating an appropriate response to the user's question and calling one of the available functions if necessary), it is entirely possible to handle the two problems separately. Our final pipeline is therefore divided into two parts: a first call that decides which predefined function (if any) should be executed, and a second call that generates the conversational answer returned to the user.
This latter approach effectively requires two calls to the API, which slightly increases our assistant's response time, but remains far more reasonable than using GPT-4. We also have much greater control over the output and the actions performed by the assistant. A sketch of this two-call approach is given below.
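A sketch of the two-call approach, assuming the pre-1.0 `openai` client; the function registry and the prompts below are invented for the example and are not our exact prompts.

```python
# Sketch of the two-call approach: one call to route the request to a
# predefined function, one call to generate the spoken answer.
import openai

AVAILABLE_FUNCTIONS = {
    "start_pcr": lambda: print("PCR protocol launched"),
    "list_processes": lambda: print("2 processes running"),
    "none": lambda: None,   # no function call needed
}

DISPATCH_PROMPT = (
    "You are a command router. Given the user's request, answer with exactly "
    "one of these function names and nothing else: "
    + ", ".join(AVAILABLE_FUNCTIONS)
)

def handle_request(user_text: str) -> str:
    # First call: decide which predefined function (if any) to execute.
    choice = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": DISPATCH_PROMPT},
                  {"role": "user", "content": user_text}],
    ).choices[0].message["content"].strip()
    AVAILABLE_FUNCTIONS.get(choice, AVAILABLE_FUNCTIONS["none"])()

    # Second call: generate the conversational answer returned to the user.
    answer = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": "You are Thalia, a lab assistant."},
                  {"role": "user", "content": user_text}],
    ).choices[0].message["content"]
    return answer
```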
During our interview with Prof. A. Felner, a renowned expert in pathfinding algorithms and co-author of the seminal CBS paper [9], we discussed the legitimacy of the pathfinding algorithms we currently use. In their current setting, the movement algorithms we presented do not achieve our long-term goal, which is to allow the plate to be used continuously, piling up experiments in a queue. In other words, instead of using an “offline” paradigm, we would like to switch to an “online” paradigm [10, 11].
Although they could be adapted to droplet movement, none of the currently available algorithms takes the merge and split actions into consideration. Consequently, more research needs to be done before such an algorithm can be implemented.
Another possible evolution is to use reinforcement learning and deep learning methods, in a similar way as done in [12], which provides a scalable framework for droplet movement. The benefit would hopefully be a more efficient agent, especially in settings involving a large plate with a large number of droplets. These benefits are, however, conditional on having a computer powerful enough to train the algorithm beforehand, which requires resources typically not found in a classic biology laboratory. For this reason, we did not pursue this direction for now, preferring to make the simpler, resource-efficient software available first.
Throughout the development of our software, we were interested in making it accessible to biologists with little to no programming training. This was done by developing our assistant Thalia, which does not require any programming knowledge. As discussed in the functions section, we deliberately chose a lightweight and transparent storage method, to allow people to easily share their experiments and modules.
We naturally think that this capability should be used to its fullest by building an open community where anyone can share their algorithms, in a similar way to Protocols.io. All the foreseeable benefits as well as the potential negative outcomes of such a community are discussed on the Human Practice page.
For the moment, Thalia is able to understand experiments and actions that are in its database, which can include community-designed and privately designed algorithms. The next step will be to link Thalia to the experiment creation process, during which the experimenter details the various steps of what they want to be performed. This development seems within reach given the emergence of LLMs with coding abilities such as OpenAI's models [13], GitHub Copilot, and Mistral-7B-v0.1, and would make the platform even more inclusive to people without extensive programming training.
Note that, as discussed on the Human Practice page, we never remove responsibility from the experimenter, who remains in charge of the programs they build and run.