Engineering Success

WET LAB

In the lab, designing each minor experiment is itself an experience that takes one through the DBTL cycle. As outlined in our Lab Notebook, even staple molecular biology techniques like PCR and electroporation can prove challenging when approached for the first time with new organisms and parts to test.

The first wall we hit in our experimental journey was the amplification of our ordered sequences. After thoroughly checking the primers and matching them against the plasmid we were using, we were dejected to find that we could not obtain our amplified products. Our first thought was to reduce the annealing temperature, since the temperature calculated from the sequence information can deviate slightly from the optimum. After multiple iterations of this, we obtained satisfactory results only by implementing touchdown PCR over a range of annealing temperatures. Finally, we were able to obtain all the amplified products by reducing our reaction volume, which likely improved heat transfer during thermal cycling.

Our next hurdle was electroporation, the standard approach for transforming recombinant DNA into L. lactis. However, we obtained very few colonies after our first attempt. This could have been due to residual disinfectant left in the cuvettes from previous uses, which may have degraded the DNA. Yet even after thorough washing steps, we were able to add only 3 strains to our initial library of 10 strains. We continued to struggle with this issue, as we could not successfully obtain variants of the larger natural RBS sequence that we randomized.

Despite these setbacks, we were able to generate an RBS library with sufficient variation for our further metabolic engineering needs. This also shows that, apart from the large-scale, hypothesis-driven DBTL cycles one goes through in research, there are everyday decisions to contend with about how to run experiments and how to troubleshoot when things don't go your way. Producing results in both the wet lab and the dry lab, and in the process learning how to solve or circumvent the problems of daily research work, is our engineering success.


MODEL TRAINING AND TESTING

DBTL Iteration 1

We designed and trained commonly used regression-based machine learning models: a Linear Regressor, a Support Vector Machine Regressor, a k-Nearest Neighbour Regressor, and a Random Forest Regressor. We did not tune their hyperparameters at this stage, instead measuring their baseline performance. On testing these models, we found that the Random Forest performed best.
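A baseline comparison of this kind can be sketched with scikit-learn as follows. This is an illustrative sketch only: the synthetic data stands in for our actual features (numeric encodings of the chassis, temperature, and RBS/CDS sequences), and the models are left at their default hyperparameters, as in our first iteration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic stand-in for the real feature matrix and expression labels
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Linear": LinearRegression(),
    "SVM": SVR(),
    "kNN": KNeighborsRegressor(),
    "RandomForest": RandomForestRegressor(random_state=0),
}

# Fit each model with default hyperparameters and record its test R^2
scores = {name: m.fit(X_train, y_train).score(X_test, y_test)
          for name, m in models.items()}
for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.3f}")
```

Ranking the models by held-out R^2 like this is how a "best baseline" is typically chosen before any tuning.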

DBTL Iteration 2

We then compared the Random Forest to other ensemble-based models that use "boosting" techniques, namely an Adaptive Boosting Regressor and a Gradient Boosting Regressor. Surprisingly, the Random Forest still performed better, despite the use of boosting in the other models.
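The bagging-vs-boosting comparison can be sketched in the same way, again on synthetic stand-in data, using cross-validated R^2 rather than a single train/test split:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real feature matrix and expression labels
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

results = {}
for name, model in [("RandomForest", RandomForestRegressor(random_state=0)),
                    ("AdaBoost", AdaBoostRegressor(random_state=0)),
                    ("GradientBoosting", GradientBoostingRegressor(random_state=0))]:
    # Mean R^2 over 5 cross-validation folds
    results[name] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean R^2 = {results[name]:.3f}")
```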

DBTL Iteration 3

We finally fine-tuned the hyperparameters of the Random Forest Regressor by performing grid search cross-validation. We found that the combination of parameters that worked best was max_depth=80, max_features=3, min_samples_leaf=3, min_samples_split=8, n_estimators=50, with random_state=23. We compared the performance of our model against already existing ones, which showed very favourable results. This comparison is described in detail under the 'Software' page.
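A grid search of this kind can be sketched with scikit-learn's GridSearchCV. The data is again a synthetic stand-in, and the candidate values shown are an assumed grid around the winning combination reported above, not our full search space:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real feature matrix and expression labels
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=23)

# Assumed candidate grid, including the winning values from the text
param_grid = {
    "max_depth": [40, 80],
    "max_features": [3, 5],
    "min_samples_leaf": [3, 5],
    "min_samples_split": [8, 12],
    "n_estimators": [50, 100],
}

# Exhaustively evaluate every combination with 3-fold cross-validation
search = GridSearchCV(RandomForestRegressor(random_state=23),
                      param_grid, cv=3, scoring="r2")
search.fit(X, y)
print(search.best_params_)
```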

Thus, through the Design-Build-Test-Learn methodology, we identified the best-performing machine learning model for predicting the relative expression of a gene, given data about the chassis, the temperature, and the sequences of the RBS and the coding region.


OPTIMIZATION ALGORITHM

We employed the same strategy to evaluate different optimization algorithms that can be used to modify the RBS sequence to achieve a desired relative expression. We designed, implemented, and tested a Gradient Descent optimizer, a Simulated Annealing optimizer, and a Genetic Algorithm optimizer. Our testing indicated that the Genetic Algorithm yielded the most favorable results among the three models.

This strategy encompassed the following steps:

1. Design: The formulation of each optimization model, such as Gradient Descent, Simulated Annealing, and Genetic Algorithm, tailored to the specific requirements of the protein expression optimization task.

2. Build: The designed models were implemented in code, translating the theoretical constructs into functional algorithms. For Gradient Descent, we adjusted learning rates and convergence criteria, while for Simulated Annealing, temperature schedules and acceptance criteria were carefully optimized. In the case of the Genetic Algorithm, we fine-tuned parameters like population size, mutation rates, and selection strategies.

3. Test: We rigorously tested the optimization algorithms, considering time efficiency and correlation with target expression levels. Simulations were conducted to compare their performance. The Genetic Algorithm emerged as the optimal choice due to its efficient resource utilization, high correlation with target levels, and consistent performance across various scenarios.

4. Learn: The results obtained from this Design-Build-Test process indicated that the Genetic Algorithm produced the most favorable results among the three algorithms. This suggested to us that, in the context of optimizing gene expression, the Genetic Algorithm demonstrated superior performance in reaching the desired target, leading us to select it as the main optimization algorithm.

This thorough and systematic approach allowed us to gain a comprehensive understanding of the strengths and weaknesses of each algorithm in the context of protein expression optimization. It also highlighted the adaptability and robustness of the Genetic Algorithm in achieving the desired protein expression targets.
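The Genetic Algorithm loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `predicted_expression` function here is a toy purine-counting score standing in for our trained Random Forest predictor, and the population size, mutation rate, and truncation selection are illustrative values, not our tuned settings.

```python
import random

BASES = "ACGT"

def predicted_expression(seq):
    # Toy stand-in for the trained Random Forest predictor:
    # fraction of purines (A/G), giving a score in [0, 1].
    return sum(b in "AG" for b in seq) / len(seq)

def fitness(seq, target):
    # Higher is better: penalise distance from the target expression level
    return -abs(predicted_expression(seq) - target)

def mutate(seq, rate=0.1):
    # Replace each base with a random one at the given per-base rate
    return "".join(random.choice(BASES) if random.random() < rate else b
                   for b in seq)

def crossover(a, b):
    # Single-point crossover between two parent sequences
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def optimize_rbs(length=20, target=0.7, pop_size=50, generations=100):
    random.seed(0)  # fixed seed for a reproducible sketch
    pop = ["".join(random.choice(BASES) for _ in range(length))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: fitness(s, target), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the top half
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=lambda s: fitness(s, target))

best = optimize_rbs()
print(best, predicted_expression(best))
```

Because the top half of each generation is carried over unchanged, the best fitness never decreases, which mirrors the consistent convergence behaviour that led us to prefer the Genetic Algorithm.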

Our optimization algorithm was put to a final test by generating different RBS sequences for GFP expression in L. lactis. These were then tested in the wet lab. The results showed good differentiation between high- and low-expressing RBS variants, and are described in detail under the 'Results' page.