Engineering
Silver Medal requirement of 🔗Engineering Success is Accepted! Check our 🔗Awards page for details!
Engineering Cycle 1: A molecular classifier used to distinguish between bacterial and viral infections
Design
Infectious diseases caused by viruses and bacteria are important public health issues of global concern. Accurate diagnosis early in the course of the disease is crucial for choosing an appropriate treatment and for preventing and controlling the spread of infection. Viral infections are often self-limiting and require only supportive care, while bacterial infections require treatment with antibiotics.
Inappropriate use of antibiotics during a viral infection accelerates the development of antibiotic resistance and makes bacterial infections harder to treat.
However, in the clinic, the symptoms of the two kinds of infection are often very similar. Telling them apart usually requires laboratory confirmation with multiple techniques, such as cell culture, microscopy, molecular testing, serology and antigen detection, so diagnostic efficiency is low.
Therefore, in engineering cycle 1, we targeted human genes related to bacterial and viral infection, aiming for an accurate and efficient diagnosis of the two infection types.
Screening of targets
We referred to the literature and used the NCBI database to screen targets suitable for molecular computation of bacterial or viral infection, together with their corresponding weights. To ensure specificity, each selected target sequence is 30 nt long. The sequences of the seven targets are shown in the table below:
Target Name | Classification | Weight | cDNA Sequence (5' to 3') | Part Name |
---|---|---|---|---|
ARG1 | Bacteria | +1 | CAAGAAGAACGGAAGAATCAGCCTGGTGCT | BBa_K4612100 |
CD177 | Bacteria | +4 | CTTCACCGCTGCTGACCACCCACACTCAAC | BBa_K4612101 |
VNN1 | Bacteria | +5 | TCTAGGGGCATTTGACGGACTGCACACTGT | BBa_K4612102 |
TRDV3 | Virus | -1 | CTCTCCAGTAAGGACTGAAGACAGTGCCAC | BBa_K4612106 |
IFIT1 | Virus | -1 | TGAGGAGTCTGGTGACCTGGGGCAACTTTG | BBa_K4612105 |
SIGLEC1 | Virus | -4 | TCTGCCTCTACCTCCACCTACTTTGGGGTC | BBa_K4612104 |
LY6E | Virus | -4 | GGAAAGCCCAGCCCTTTCTGGATCCCACAG | BBa_K4612103 |
The design of the probe
So that each weight probe corresponds one-to-one to its target, we designed a dedicated weight probe for each of the seven selected target genes and its weight. Following the base-pairing principle, the 3' end of the track strand of each weight probe carries a DNA sequence complementary to the target gene.
In addition to this target-specific sequence, each weight probe binds a number of weight strands equal to the weight of its target.
During detection, the weight strands are released by a polymerase-mediated strand displacement reaction, which quantitatively converts the weight of the target into a number of free weight strands.
When designing the weight strand sequence, we divided each weight molecule into two domains. The code1 domain, 18 nt long, binds to the track strand of the weight probe.
The overhanging code2 domain, 5 nt long, does not bind to the track strand. This overhang at the 3' end of the weight molecule is designed to prevent signal leakage and to keep the polymerase from extending the bound code region downstream.
At the 3' end of the track strand there is also a free overhanging TTTTT sequence of 5 nt. Together, these two free 5 nt overhangs prevent reverse DNA replication during the strand displacement phase.
Leaving these two regions unpaired avoids unnecessary DNA polymerization reactions and reduces the loss of polymerase activity.
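The base-pairing step described above can be illustrated with a short Python sketch. It only derives the target-binding domain as the reverse complement of a 30 nt target and records the domain lengths stated in the text. The function and constant names are hypothetical, and the full layout of a track strand (order and spacing of the code1 binding sites) is not specified here, so the sketch does not attempt to assemble a complete probe.

```python
# Illustrative sketch only: names are hypothetical, not taken from the project.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

CODE1_LEN = 18  # nt, domain of the weight molecule bound to the track strand
CODE2_LEN = 5   # nt, free 3' overhang of the weight molecule
WEIGHT_MOLECULE_LEN = CODE1_LEN + CODE2_LEN  # 23 nt in total


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# VNN1 target sequence from the table above (weight +5)
vnn1_target = "TCTAGGGGCATTTGACGGACTGCACACTGT"
binding_domain = reverse_complement(vnn1_target)
print(len(binding_domain), binding_domain)
```

Applying `reverse_complement` twice returns the original target, which is a quick sanity check when sequences are copied between tables.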
Build
We sent the designed DNA sequences to a gene synthesis company. After receiving the synthesized ssDNA, we assembled the weight probes and the fluorescent reporter probes by mixing the component molecules and then cooling the mixture slowly.
Taking the weight probe for the VNN1 target (weight +5) as an example:
VNN1 Track Strand | Weight Molecule | Buffer | Water |
---|---|---|---|
1 | 5 | 10× | Make up to 250 nM |
The final concentration of the assembled weight probes was 250 nM. The fluorescent reporter probes were assembled in a similar manner.
We used a PCR instrument for the cooling process, with the following temperature program:
Step | Temperature | Time |
---|---|---|
1 | 90-85-80-75-70-65-60-55-50-45-40℃ | 1 minute per stage |
2 | 37℃ | 5 minutes |
3 | 25℃ | 5 minutes |
4 | 4℃ | Hold |
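For anyone transcribing this program into an instrument or a protocol script, the table can be written out as an explicit list of steps. The sketch below is only an illustration in Python: the function name is hypothetical, and the final 4℃ hold is represented with `None` because it has no fixed duration.

```python
# Illustrative sketch: the annealing program as (temperature in °C, seconds).
def annealing_program():
    # Step 1: 90 °C down to 40 °C in 5 °C decrements, one minute per stage
    steps = [(temp, 60) for temp in range(90, 35, -5)]
    steps.append((37, 5 * 60))  # Step 2
    steps.append((25, 5 * 60))  # Step 3
    steps.append((4, None))     # Step 4: hold, no fixed duration
    return steps


program = annealing_program()
timed_seconds = sum(sec for _, sec in program if sec is not None)
print(f"{len(program)} steps, {timed_seconds / 60:.0f} min before the hold")
```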
After assembly, we used electrophoresis to verify the construction of the weight probes.
The electrophoresis results showed that all the weight probes were assembled successfully.
Test
Reporter probe response to weight molecules via TMSD reactions
We first verified whether our fluorescent reporter probes could produce a quantitative response to the weight molecules.
We recorded the final fluorescence intensity of the solution after the reaction was complete and established an equation relating fluorescence intensity to weight molecule concentration.
Once the fluorescence curve had stabilized, the fluorescence intensity was linearly related to the concentration of weight molecules. This shows that the fluorescent reporter probe responds linearly to different concentrations of weight molecules and can quantitatively convert the weight molecule signal into a fluorescence intensity signal.
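A linear calibration of this kind can be sketched as an ordinary least-squares fit. The Python example below is a minimal illustration: the function names are hypothetical, and the numbers are placeholders chosen to lie exactly on a line, not measured data from our experiments.

```python
# Illustrative sketch: fit F = slope * c + intercept and invert it.
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_fluorescence(fluorescence, slope, intercept):
    """Invert the calibration line to estimate weight molecule concentration."""
    return (fluorescence - intercept) / slope


# Placeholder values (NOT measured data): concentration in nM, fluorescence in a.u.
conc_nM = [0.0, 10.0, 20.0, 40.0]
fluor_au = [50.0, 250.0, 450.0, 850.0]

slope, intercept = fit_line(conc_nM, fluor_au)
estimate = concentration_from_fluorescence(650.0, slope, intercept)
print(slope, intercept, estimate)
```

With real data, the residuals of the fit would indicate over which concentration range the linear assumption holds.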
Detection and weighting of the target signal by the weight probes
Next, we verified the function of the weight probes, which should bind the target molecules through complementary base pairing and thereby trigger a polymerase-mediated strand displacement reaction that releases a number of weight molecules equal to the target weight.
We reacted 2 nM, 5 nM and 10 nM targets with an excess of weight probe in solutions containing Bst polymerase, dNTPs, etc. The weight molecules released from the track strand were quantified with the fluorescent reporter probe described above by following the change in fluorescence intensity of the solution.
As shown, once the fluorescence signal had stabilized, the fluorescence intensity of the solution depended linearly on both the target weight and the target concentration.
Weight Addition
Next, we tested mixed detection of targets whose weights have the same sign (all positive or all negative).
Each weight probe should recognize only its own target without responding to other targets. The weighted signals of targets with the same sign should add up, and the sum should be output as a fluorescence intensity signal. (Viral and bacterial infections were read out with two different fluorescence signals.)
Fluorescence characterization of the probe reactions for bacteria-related genes
We first prepared mixed solutions containing one or more targets (the concentration of all targets was nM), covering the seven possible combinations of the targets. Each of the seven target solutions was mixed with a solution containing the weight probes, fluorescent reporter probes, Bst polymerase, dNTPs, etc., and the change in fluorescence intensity of the solution was recorded.
The results show that, after the reaction, the fluorescence intensity is linearly related to the sum of the weights of the mixed targets. The individual targets do not interfere with each other when binding to the weight probes, so the weighted signals can be summed.
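The seven combinations and their expected weight sums can be enumerated with a short Python sketch. It assumes, as a simplification, that all targets are present at the same concentration, so that the expected signal is proportional to the plain sum of the weights; the function name is hypothetical.

```python
# Illustrative sketch: enumerate every non-empty combination of the three
# bacteria-related targets and the sum of their weights.
from itertools import combinations

BACTERIAL_WEIGHTS = {"ARG1": 1, "CD177": 4, "VNN1": 5}


def weighted_sums(weights):
    """Map each non-empty target combination to the sum of its weights."""
    names = sorted(weights)
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            result[combo] = sum(weights[name] for name in combo)
    return result


sums = weighted_sums(BACTERIAL_WEIGHTS)
for combo, total in sums.items():
    print("+".join(combo), total)
```

Note that two different combinations (VNN1 alone, and ARG1 with CD177) give the same sum of 5, so under this assumption they would be expected to give the same fluorescence intensity.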
Fluorescence characterization of the probes corresponding to the virus-related genes
We first prepared mixed solutions containing one or more targets (the concentration of all targets was nM), covering the eight possible combinations of the targets. Each of the eight target solutions was mixed with a solution containing the weight probes, fluorescent reporter probes, Bst polymerase, dNTPs, etc., and the change in fluorescence intensity of the solution was recorded.
The results show that, at the end of the reaction, the fluorescence intensity is linearly related to the sum of the weights of the mixed targets. The individual targets do not interfere with each other when binding to the weight probes, so the weighted signals can be summed.
Weight subtraction and judgment
We verified the feasibility of the step "subtracting the weighted results of each target to obtain the judgment result".
We calculated the difference (denoted D) between the two fluorescence signals for each combination described above. D > X indicates a bacterial infection, D < Y a viral infection, and Y < D < X a healthy condition.
The experimental results show that a judgment of "bacterial infection", "viral infection" or "healthy" can be obtained by subtracting the two fluorescence values.
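The decision rule can be summarized in a small Python sketch. It is an idealized model, assuming a perfectly linear response so that each channel equals the weighted sum of target levels. The thresholds X and Y are not given in the text, so the values used in the example are placeholders, and treating D equal to X or Y as "healthy" is also an assumption.

```python
# Illustrative sketch of the weighted-sum classifier from engineering cycle 1.
WEIGHTS = {
    "ARG1": +1, "CD177": +4, "VNN1": +5,                  # bacteria-related
    "TRDV3": -1, "IFIT1": -1, "SIGLEC1": -4, "LY6E": -4,  # virus-related
}


def classify(levels, x, y):
    """Classify from target levels (arbitrary units) using thresholds x > y."""
    bacterial_signal = sum(w * levels.get(name, 0.0)
                           for name, w in WEIGHTS.items() if w > 0)
    viral_signal = sum(-w * levels.get(name, 0.0)
                       for name, w in WEIGHTS.items() if w < 0)
    d = bacterial_signal - viral_signal  # difference of the two channels
    if d > x:
        return "bacterial infection"
    if d < y:
        return "viral infection"
    return "healthy"  # boundary cases treated as healthy (assumption)


# Placeholder thresholds, chosen only for illustration
X, Y = 5.0, -5.0
print(classify({"VNN1": 2.0, "CD177": 1.0}, X, Y))
print(classify({"LY6E": 2.0}, X, Y))
print(classify({}, X, Y))
```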
Learn
In this engineering cycle, we found some aspects that need to be improved to further refine our molecular classifier:
Limitations on the achievable weight
When the track strand is longer than 100 nt, the synthesis cost rises sharply and the track strand and weight molecules are less likely to bind correctly. The system we designed can therefore only apply weights of 1, 2, 3, 4 or 5. However, a target may well require a weight greater than 5, so tools that support higher weights need to be developed.
Expanding the target types
To increase the diversity and applicability of the classifier, we considered expanding the range of target types. This means exploring other kinds of nucleic acid targets to cover a broader spectrum of diseases, which requires extensive literature review and bioinformatics analysis to identify novel targets.
On machine learning algorithms
In order to better screen targets and process signal data, we can explore the use of machine learning algorithms. These algorithms can help us to automate the target selection process while improving the ability to analyze and interpret signal data. Machine learning can also help us optimize weight allocation to improve classifier performance.
Engineering Cycle 2: A molecular classifier used for lung adenocarcinoma detection
Design
Lung cancer has high morbidity and mortality, and non-small cell lung cancer (NSCLC) accounts for more than 85% of lung cancer patients. However, because obvious clinical symptoms are lacking, many patients with early-stage NSCLC miss the optimal window for treatment.
Developing methods for early NSCLC screening is therefore essential to improve patient survival. In the second phase of this study, we chose lung adenocarcinoma, a subtype of NSCLC, to validate the effectiveness of our upgraded molecular classifier.
miRNAs (microRNAs) are a class of non-coding RNA molecules about 20-22 nucleotides long that regulate gene expression by interacting with their target mRNAs. miRNAs play a key role in the cell, taking part in biological processes such as the cell cycle, differentiation and apoptosis. They are also involved in the pathogenesis of various diseases and have become a research hotspot in molecular biology and medicine. Previous studies have shown that miRNAs have potential as biomarkers for the early diagnosis and treatment of cancer. However, current miRNA-based diagnostic methods usually require complex miRNA expression analysis. We therefore aim to build a convenient and efficient diagnostic platform based on molecular computation.
In engineering cycle 2, we selected a set of miRNAs from miRNA expression profiling data as our targets.
Screening of targets and the determination of weights
Data related to lung adenocarcinoma were taken from the UCSC Xena database. We used a 🔗machine learning algorithm to analyze the relationship between the expression levels of the miRNAs in the data and disease status, and selected the optimal targets and their corresponding weights by jointly evaluating ACC (accuracy), AUC (area under the ROC curve) and the F1 score.
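For readers unfamiliar with the three metrics, the Python sketch below computes them for a classifier that thresholds a weighted score. It does not reproduce the algorithm used for target screening; the function names are hypothetical, and the labels and scores are made-up placeholders rather than values from the UCSC Xena data.

```python
# Illustrative sketch of ACC, F1 and AUC for binary labels (1 = tumor, 0 = normal).
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Placeholder labels and weighted scores (NOT real data)
y_true = [1, 1, 1, 0, 0, 0]
scores = [12.0, 3.0, -1.0, 2.0, -6.0, -9.0]
y_pred = [1 if s > 0 else 0 for s in scores]  # decision threshold at 0 (assumed)

print(accuracy(y_true, y_pred), f1(y_true, y_pred), auc(y_true, scores))
```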
Nine targets were selected; they and their corresponding weights are shown in Table 1. (The target directly detected by the weight probe is the cDNA obtained by reverse transcription of the miRNA.)
Table 1. Lung adenocarcinoma-associated targets
miRNA Name | cDNA Sequence | Weight | Part Name |
---|---|---|---|
hsa-mir-192-5p | GGCTGTCAATTCATAGGTCAG | +1 | BBa_K4612108 |
hsa-mir-182-5p | AGTGTGAGTTCTACCATTGCCAAA | +4 | BBa_K4612109 |
hsa-mir-148a-3p | ACAAAGTTCTGTAGTGCACTGA | +10 | BBa_K4612110 |
hsa-mir-21-5p | TCAACATCAGTCTGATAAGCTA | +3 | BBa_K4612111 |
hsa-mir-375-3p | GGTTTGTGCGAGGGGCTCGTCGC | +2 | BBa_K4612112 |
hsa-let-7a-2-5p | AACTATACAACCTACTACCTCA | -25 | BBa_K4612113 |
hsa-let-7b-5p | AACCACACAACCTACTACCTCA | -8 | BBa_K4612114 |
hsa-mir-30a-5p | CTTCCAGTCGAGGATGTTTACA | -5 | BBa_K4612115 |
hsa-mir-143-3p | GAGCTACAGTGCTTCATCTCA | -5 | BBa_K4612116 |
Design of the weight molecules with the probes
Implementation of the weight multiplication
Three miRNAs (hsa-mir-148a-3p, hsa-let-7a-2-5p and hsa-let-7b-5p) have weights whose magnitude exceeds 5. However, the length of ssDNA that can be synthesized easily is limited, and the weight molecules must also bind stably to the probe.
Hence, a single probe should carry no more than 5 weight molecules, which means that a weight greater than 5 cannot be realized by one probe alone.
Since cascades are a common signal amplification mechanism in living systems, we decided, after brainstorming, to overcome this limit on the weight with a cascade design:
CODE 1 is the primary weight strand of a miRNA whose weight exceeds 5. After it is displaced and released by the strand newly synthesized by the DNA polymerase, it serves as a primer for the secondary weight probe, and the polymerase-mediated strand displacement reaction occurs again.
The secondary weight molecule, CODE 2, is then displaced and released, which completes the quantitative conversion of the target signal into a number of weight molecules.
We used 🔗NUPACK to design the first-order weight molecules and the corresponding probe sequences from the known target sequences. The sequences and structure diagram are as follows:
The remaining six miRNA targets have weights of magnitude 5 or less, and their design follows the same idea as in engineering cycle 1. The sequences and the structure schematic are as follows:
Build
As in engineering cycle 1, we sent the designed DNA sequences to a gene synthesis company. After receiving the synthesized ssDNA, we mixed the component molecules in proportion and then cooled the mixture slowly to complete the self-assembly of the weight probes and fluorescent reporter probes.
Test
Detection and weighting of the target signal by the weight probes
We verified the function of the weight probes: a weight probe should bind its target molecule through complementary base pairing and then trigger the polymerase-mediated strand displacement reaction, releasing a number of weight molecules equal to the target weight.
We reacted the 5 nM target with the excess weight probe in solutions containing Bst polymerase, dNTPs, etc., and detected the change in fluorescence intensity with the fluorescence reporter probe described above.
As shown in the figure, the results indicate that after the fluorescence signal stabilizes, the fluorescence intensity in the solution exhibits a linear relationship with the target weight. The weight probes enable the detection and weighting of target signals.
Due to time limitations, we completed the proof of concept only for representative DNA probes in engineering cycle 2 (weights +1, +2, +3, -5 and -8). Apart from their sequences, the other probes follow the same molecular computation principle as the probes verified above. Together with the successful validation in engineering cycle 1 and the kinetic modeling results, this gives us confidence in the feasibility of the system.
Learn
Analysis and outlook for the limitations of the cascade
We adopted a cascade signal amplification mechanism to realize molecular computation with weights higher than 5. During the project, we found that the cascade amplification method has limitations.
First, because the secondary cascade adds one more reaction step, the whole computation is more complex and takes longer. The types and number of probes added during the computation also increase significantly, which raises the possibility of signal leakage.
Additionally, since signal amplification is achieved by multiplication, the primes between 5 and 25 (7, 11, 13, 17, 19 and 23) cannot be represented by this method. The molecular computation system we designed therefore cannot currently apply these prime weights.
| | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| 1 | | | | | |
| 2 | | | | | |
| 3 | | 6 | 9 | | |
| 4 | | 8 | 12 | 16 | |
| 5 | | 10 | 15 | 20 | 25 |
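The multiplication table above can be reproduced with a short Python sketch that lists, for each weight, the possible pairs of primary and secondary probe weights. It assumes exactly two cascade levels with at most 5 weight molecules per probe; the function names are hypothetical. Under this assumption, composite numbers such as 14 or 21 are also out of reach, not only the primes.

```python
# Illustrative sketch: which weights can a two-level cascade represent?
MAX_PER_PROBE = 5  # at most 5 weight molecules on one probe


def cascade_factorizations(weight, max_per_probe=MAX_PER_PROBE):
    """Return (primary, secondary) pairs with primary * secondary == weight.

    Pairs are listed once with primary <= secondary; a pair (1, w)
    corresponds to a weight that a single probe can already provide.
    """
    return [(a, weight // a)
            for a in range(1, max_per_probe + 1)
            if weight % a == 0 and a <= weight // a <= max_per_probe]


def unreachable_weights(limit=25, max_per_probe=MAX_PER_PROBE):
    """Weights in 1..limit with no valid two-level factorization."""
    return [w for w in range(1, limit + 1)
            if not cascade_factorizations(w, max_per_probe)]


for w in (8, 10, 25):  # magnitudes of the three high-weight miRNA targets
    print(w, cascade_factorizations(w))
print(unreachable_weights())
```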
The secondary cascade may be the prototype of a multi-layer perceptron. The weight probe of the secondary cascade can be regarded as a certain signal conversion layer, and the complete signal amplification is only a special manifestation of the function of the signal conversion layer.
The function of the signal conversion layer can be expressed as mapping the input signal to a new representation space to extract higher-level features.
Through rational design, the signal conversion layer could serve as a hidden layer of a neural network. This would allow DNA molecular computation with the network structure of a multilayer perceptron, enabling more complex operations and more intelligent diagnostics.
Nucleic acid amplification is the key problem to be solved in this system
In both engineering cycles, we used the LATE-PCR method to amplify the target nucleic acid molecules in an approximately proportional, linear manner, so that the molecular computing system receives an input signal of sufficient intensity.
This helps with the relatively low sensitivity of molecular computing systems, but it adds steps that conflict with the cost and efficiency advantages of molecular computers.
The Thessaloniki-Meta 2022 project used droplet microfluidics instead of traditional nucleic acid amplification (NAA), which gave us new ideas. In addition, through communication with the CJUH-JLU-China team, we learned more about the use of CRISPR systems for nucleic acid detection. These inspirations provide a reference for the future development of our project.
To avoid the nucleic acid amplification steps, we designed an improved molecular computing system based on droplet microfluidics and CRISPR/Cas14a. The details are shown in the figure below:
Molecular computing module
Identification: Each miRNA target corresponds to one weight probe. As shown in the figure, the 3' end of the track strand is designed to be complementary to the miRNA target, so the weight probe binds specifically to its target and thereby identifies the input signal.
Weighting: Each weight probe carries a number of weight molecules equal to the weight of its target. During extension by the DNA polymerase, the weight molecules are displaced and released by the newly synthesized strand. In this way the target signal is quantitatively converted into a number of weight molecules.
Summation: During the strand displacement reaction, free weight molecules of the same kind are released into the bulk solution. The target signals are thus uniformly converted into weighted weight-molecule signals, which are summed automatically.
Droplet generation module
The sample from the molecular computation reaction (aqueous phase) is mixed with the sample containing the CRISPR reporter system (aqueous phase). Microfluidic technology then forms a segmented flow that is divided into tens of thousands of droplets.
These droplets are separated by an immiscible liquid (e.g. mineral oil). The weight molecules (23 nt ssDNA) in the sample are diluted to the single-molecule level and evenly distributed into tens of thousands of reaction units. Each microdroplet may contain zero, one, or more than one weight molecule to be detected, so the effect is equivalent to enriching the weight molecules.
Fluorescence reporter module
After a weight molecule is recognized by the Cas14a/crRNA complex, the trans-cleavage activity of Cas14a is triggered and the reporter DNA located near the enzyme is cleaved. As a result, the fluorophore is separated from the quencher, which eliminates the fluorescence resonance energy transfer (FRET) effect.
The fluorescence signal is thus released, and the accumulated fluorescence lights up the microdroplet. Because the microdroplets are isolated by the oil phase, the reaction in one droplet does not interfere with the reaction or fluorescence detection in other microdroplets.
Droplet detection module
After the fluorescence reporting step, the fluorescence signal of each microdroplet reaction unit is collected. Each microdroplet may contain zero, one, or multiple weight molecules. A microdroplet is marked 1 if it shows a fluorescence signal and 0 if it does not. Finally, the total number of microdroplets and the number of positive reaction units are counted.
Since the distribution of the template molecules among the droplets follows a Poisson distribution, the original concentration or content of the sample can be calculated using the Poisson distribution equation.
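The Poisson correction can be sketched in Python as follows. If the mean number of weight molecules per droplet is λ, the fraction of empty droplets is exp(-λ), so λ = -ln(1 - positive fraction). The function names, the droplet counts and the droplet volume in the example are hypothetical placeholders, not parameters of our proposed device.

```python
# Illustrative sketch: Poisson estimate of concentration from droplet counts.
import math


def mean_copies_per_droplet(total_droplets, positive_droplets):
    """Estimate lambda from the fraction of fluorescent (positive) droplets."""
    if not 0 <= positive_droplets < total_droplets:
        raise ValueError("need 0 <= positive_droplets < total_droplets")
    return -math.log(1.0 - positive_droplets / total_droplets)


def copies_per_microliter(total_droplets, positive_droplets, droplet_volume_nl):
    """Convert lambda to copies per microliter for a given droplet volume."""
    lam = mean_copies_per_droplet(total_droplets, positive_droplets)
    return lam / (droplet_volume_nl * 1e-3)  # 1 nL = 1e-3 µL


# Placeholder counts and an assumed 1 nL droplet volume
lam = mean_copies_per_droplet(20000, 5000)
conc = copies_per_microliter(20000, 5000, droplet_volume_nl=1.0)
print(lam, conc)
```

The estimate breaks down when every droplet is positive, which is why the sample must be diluted enough that some droplets remain empty.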