Motivation
Designing complementary DNA probes is difficult because the problem space is enormous. For a probe designed to hybridize to a sequence of 40 nucleotides, there are 4^40 possible sequences. That is about 1.2×10^24 combinations, more than the age of the universe in seconds! And this is before considering that we need to design two DNA sequences, the probe and the sink, with contrasting hybridization abilities. Brute-force laboratory testing to find optimal probe and sink combinations is simply not possible, and existing computational tools for probe design have several limitations. We aimed to use software and modeling techniques to build tools that increase probe design efficacy and verify the probe sequences we had previously designed.
Previous Methods
Previous work has developed a computational approach to choosing probe and sink combinations [1]. The authors use thermodynamic models to estimate the probe's fluorescence when exposed to a template strand of DNA. Using these parameters, they design a simulation that starts with a genetic algorithm to explore a broad solution space. Once a range of promising local coordinates is chosen, a hill climbing algorithm performs a final optimization on the probe and sink combination. While this approach is helpful in exploring solutions, there are a few weaknesses we wanted to address:
- The genetic algorithm prohibits the hill climbing algorithm from optimizing probes at a range of genomic coordinates.
- The algorithm doesn't allow for mismatches in the probe-template interaction, which drastically reduces the solution space.
- The simulation doesn't include fail-safe statistics to inform users of potentially false positive probe designs.
Gaps in Knowledge
It has been shown that the location of a mismatch within a DNA sequence strongly influences its binding energy. Recent research suggests that a duplex is least affected by mismatches at the ends of the strands and at the center [3]. However, few DNA modeling techniques account for this information. We developed our modeling strategy with these shortcomings in mind. We wanted our model to be scalable, accurate, and reflective of physical trends.
Machine Learning
Machine learning is a modeling technique that uses algorithms to learn trends not explicitly told to the program. We feel that these techniques are well suited for predicting probe binding energy since the problem represents a complex physical process that may benefit from a learning algorithm.
Machine learning is predicated on data. Without good data, quality machine learning cannot happen. In order to train a machine learning algorithm to predict free binding energies of DNA, we needed to have a dataset of DNA binding energies and corresponding information about the nature of that interaction such as nucleotide matching and sequence similarity. In the following paragraphs, we present data exploration on the learning dataset in order to demonstrate where the model may perform well and where the training data is lacking.
For our training data, we used a dataset with 695 samples [2]. Of those samples, 340 are complementary and 355 are non-complementary. The variance in sequence complementarity allows us to train a machine learning model capable of predicting imperfect complements. In this dataset, the temperatures at which hybridization occurred follow a bimodal distribution, with numerous experiments conducted at 25 or 37 °C (Figure 1). Sequence lengths largely fall between 5 and 20 base pairs (Figure 2). Predictions within these parameter ranges will be more reliable because they are better sampled in the training data.
We hypothesized that capturing sequence parameters about both probe and template DNA could be insightful to machine learning models. Thermodynamic information could be encoded into a lower-dimensional space through sequence information, and machine learning algorithms could thereby learn to be very accurate through a relatively modest dataset. We wanted to include information relating to the content of purines and pyrimidines, as well as the continuity of the sequence.
To this end, our parameters included relative GC content, AT content, GA mismatches, GT mismatches, all other mismatch combinations, relative Hamming distance, and Ratcliff-Obershelp distance. All parameters relating to relative content aimed to encode hydrogen bonding information. Hamming distance counts the positions at which two strings of equal length differ [4]. We used this to compare the reverse complement to the template to understand how different the two sequences were. We also wanted spatially encoded information to understand where mismatches occurred. Ratcliff-Obershelp distance is based on the longest unbroken substring shared by two sequences, relative to their lengths [5]. We believed Ratcliff-Obershelp distance could help the model understand where breaks in sequence matching likely occurred.
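The sequence parameters described above can be sketched with the Python standard library. This is a minimal illustration, not our training code: the function names are hypothetical, and `difflib.SequenceMatcher.ratio()` is used as a stand-in for the Ratcliff-Obershelp measure (it returns a similarity in which 1.0 means identical, so how it is converted to a distance is an assumption left open here). The example sequences are taken from the F2RL3 results table.

```python
from difflib import SequenceMatcher

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def relative_hamming(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def ratcliff_obershelp_similarity(a: str, b: str) -> float:
    """Gestalt pattern matching similarity; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def sequence_features(probe: str, template: str) -> dict:
    """Compare a probe with the perfect-match probe for the given template."""
    perfect = reverse_complement(template)
    return {
        "gc_content": gc_content(probe),
        "at_content": 1.0 - gc_content(probe),
        "relative_hamming": relative_hamming(probe, perfect),
        "ratcliff_obershelp": ratcliff_obershelp_similarity(probe, perfect),
    }


probe = "GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC"
mutant_template = "GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC"
wild_template = "GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC"
print(sequence_features(probe, mutant_template))
print(sequence_features(probe, wild_template))
```

Against the mutant template the probe is a perfect complement, while against the wild-type template it differs at a single position, which is the contrast the mismatch parameters are meant to capture.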
Data is often messy. Considering the wide range of possible errors in the training data we acquired, we needed a robust processing scheme. Quantile transformation is a technique that maps potentially skewed distributions onto a normal distribution. It is a robust preprocessing technique that helps models learn in the presence of outliers [14]. However, it is also frequently misused, specifically when it is applied to the entire dataset before making the train-test split. We took care to exclude any such data leakage from our training pipelines. This ensures our model is independently validated and that we can be more confident in the results we achieved.
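The leakage point can be made concrete with a small sketch. The class below is a simplified, standard-library quantile transform written for illustration only (it is not the implementation used in our pipeline, and the class and function names are hypothetical). The key detail is the order of operations: the data are split first, and the transform learns its ranks from the training portion alone.

```python
import random
from bisect import bisect_right
from statistics import NormalDist


class TrainOnlyQuantileTransformer:
    """Maps values to normal scores using ranks learned from training data only."""

    def fit(self, values):
        self.sorted_ = sorted(values)
        return self

    def transform(self, values):
        n = len(self.sorted_)
        std_normal = NormalDist()
        scores = []
        for v in values:
            rank = bisect_right(self.sorted_, v)  # training values <= v
            p = (rank + 0.5) / (n + 1)            # stays strictly inside (0, 1)
            scores.append(std_normal.inv_cdf(p))
        return scores


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


# Skewed toy data standing in for one input parameter.
rng = random.Random(42)
values = [rng.expovariate(1.0) for _ in range(100)]

# 1. Split first. 2. Fit on the training part only. 3. Apply to both parts.
train, test = train_test_split(values)
transformer = TrainOnlyQuantileTransformer().fit(train)
train_scores = transformer.transform(train)
test_scores = transformer.transform(test)
```

Fitting the transform on all 100 values before splitting would let the test samples influence the ranks seen during training, which is the misuse described above.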
To learn from the processed data, we used a random forest. A random forest is a robust ensemble model that aggregates and averages the predictions of many decision trees [8]. Each decision tree is a simple model that makes predictions through a series of splits on the input parameters, and each tree in the forest is trained on a random sample of the dataset. Since our training parameters were all scaled, we felt the random forest was a good architecture to account for imbalance in the dataset and the variance in target values.
In order to create a generalized model able to predict oligo sequences longer than ~20 base pairs, we needed to scale our target data. Instead of directly predicting the free energy, we predict a free energy value scaled by the sequence length. We call this target value the Nucleotide Efficiency Coefficient (NEC). It is worth noting that we also trained a separate machine learning model that used the same scaled learning parameters plus the length of the sequence. SHAP analysis, a game-theoretic approach to evaluating parameter importance [6, 7], revealed that sequence length was the most important parameter in this model (Figure 3). We therefore believe that the NEC introduces some error, but that it is still the fairest way of evaluating probe sequences.
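As a minimal sketch of the scaling idea, the snippet below assumes the NEC is simply the free energy divided by the number of nucleotides. That exact definition is an assumption made for illustration, and the function names are hypothetical.

```python
def nucleotide_efficiency_coefficient(delta_g: float, sequence: str) -> float:
    """Free energy per nucleotide (assumed definition of the NEC)."""
    return delta_g / len(sequence)


def free_energy_from_nec(nec: float, sequence: str) -> float:
    """Recover a total free energy estimate from a predicted NEC."""
    return nec * len(sequence)


# Illustrative numbers only, not measured values.
short_oligo = "ACGTACGTAC"  # 10 nt
nec = nucleotide_efficiency_coefficient(-20.0, short_oligo)
print(nec)
print(free_energy_from_nec(nec, short_oligo * 3))  # same NEC applied to 30 nt
```

Because the target no longer grows with length, a model trained on 5 to 20 base pair duplexes can be queried for longer probes, at the cost of the error discussed above.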
There are other programs capable of predicting the free binding energy of DNA duplexes, such as MultiRNAFold, UNAFold, and others. Those programs predict free binding energy with a Pearson correlation coefficient (r) ranging from 0.67 to 0.69 among the exact complement samples of our dataset [2]. Tulpan et al. were able to achieve an r value of 0.92 [2]. With our generalized scaled model, using mismatches and no sequence length information, we achieved an r value of 0.84. While this r value is not as high as that of Tulpan et al., we proceeded with our model because it generalizes better across sequences and is less stringent in its DNA complement requirements.
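For reference, the Pearson correlation coefficient used in these comparisons can be computed as follows. This is a plain standard-library sketch with a hypothetical function name and toy numbers, not the evaluation script behind the reported values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either input is constant.
    return cov / math.sqrt(var_x * var_y)


# Toy values: measured vs. predicted free energies.
measured = [-12.1, -9.8, -15.3, -7.4, -11.0]
predicted = [-11.5, -10.2, -14.1, -8.0, -10.4]
print(round(pearson_r(measured, predicted), 3))
```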
One of the reasons we believe our model performs well is that we include information about differences in substrings. Many previous models assume that free energy is independent of where mismatches occur in the sequence. Our model captures the SNP location dependence described in [3]. This is demonstrated in Figure 4.
Simulation
We aimed to design a simulation around our machine learning program that takes advantage of the NEC. Our algorithm is a hill climbing algorithm that simultaneously optimizes a probe and sink combination for a given SNP.
The probe and sink are fundamentally different probe sequences with different goals. However, their combined goal can be identified as producing the greatest difference in fluorescence when being exposed to the mutant vs. the wild type template. To that end, an optimal solution should account for both the probe and sink sequences when determining an optimal combination. We designed a loss function to capture this goal.
We aim to minimize the loss function. However, this raises a question: since the NEC is a negative number, why doesn't the optimization run away toward a degenerate solution in which three of the four terms are driven to zero (no binding) and only one sequence binds to its reference frame?
The loss function exploits the codependency of Pw with Pm and of Sw with Sm. Each pair of terms refers to the same sequence (the probe or the sink) and evaluates its NEC against two very similar templates (wild type and mutant), so the two values are similar. The contrasting nature of the loss function therefore prevents any single term from dominating.
Now, we can start to implement a proper simulation. We start by taking in two reference frames (wild and mutant) and two probes (probe and sink). We mutate the probe and sink randomly a number of times specified by the user. Then, by iteratively mutating and observing the effect of the NEC, we attempt to converge the solutions to the lowest minima. Many samples in the simulation may occupy the same solution. We therefore have a natural false positive safety mechanism built into the simulation. Predicted solutions with a higher degree of convergence are more likely to be valid solutions.
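The convergence-based safety mechanism amounts to counting how many samples end on the same probe and sink pair. A minimal sketch, assuming each sample's final state is stored as a `(probe, sink)` tuple (the function name and toy strings are hypothetical):

```python
from collections import Counter


def rank_by_convergence(final_pairs):
    """Return [((probe, sink), frequency), ...], most frequent first."""
    return Counter(final_pairs).most_common()


# Toy final states for five samples; short strings stand in for real probes.
final_pairs = [
    ("ACGT", "ACGA"),
    ("ACGT", "ACGA"),
    ("ACGT", "ACGA"),
    ("TTGT", "ACGA"),
    ("ACGT", "CCGA"),
]
for (probe, sink), frequency in rank_by_convergence(final_pairs):
    print(frequency, probe, sink)
```

A pair reached by many independent samples is treated as more trustworthy than one reached only once or twice.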
In order to end the simulation, convergence must be defined. Simply put, we say convergence is achieved if no sample, when randomly mutated, yields a better (lower-loss) result than its prior sequence. In other words, each of the samples has gotten stuck at the bottom of a mutation free energy well.
Of course, achieving full convergence for large simulations is extremely time consuming. Therefore, we built a more lenient and practical stopping rule into the simulation, controlled by the threshold parameter. The threshold is a number between 0 and 1: once that proportion of samples fails to improve under mutation in a single step, the simulation can stop. We also offer a second parameter, called patience, to avoid false positive convergence results. Patience is the number of consecutive steps for which the threshold must be reached in order to end the simulation.
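The threshold and patience rule can be sketched as a small helper. The class name and interface are hypothetical and only illustrate the logic described above, not the CLI's actual code.

```python
class ConvergenceMonitor:
    """Stops once enough samples stall for enough consecutive steps."""

    def __init__(self, threshold: float, patience: int):
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        if patience < 1:
            raise ValueError("patience must be at least 1")
        self.threshold = threshold
        self.patience = patience
        self.streak = 0  # consecutive steps that met the threshold

    def update(self, improved_flags) -> bool:
        """Record one step; improved_flags holds one bool per sample.

        Returns True when the simulation may stop.
        """
        stalled = sum(not flag for flag in improved_flags) / len(improved_flags)
        if stalled >= self.threshold:
            self.streak += 1
        else:
            self.streak = 0  # the streak must be consecutive
        return self.streak >= self.patience


monitor = ConvergenceMonitor(threshold=0.75, patience=2)
print(monitor.update([False, False, False, True]))   # 75% stalled, streak = 1
print(monitor.update([True, True, False, False]))    # only 50% stalled, reset
print(monitor.update([False, False, False, False]))  # streak = 1 again
print(monitor.update([False, False, False, True]))   # streak = 2, may stop
```

Setting the threshold to 1 and patience to 1 recovers the strict definition of convergence; lower thresholds trade certainty for run time.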
In order to encourage more convergence and to increase the solution space explored, we also took inspiration from genetic algorithms. In genetic algorithms, there is a small chance unfavorable solutions are chosen for the next generation [9]. Likewise, in our simulation, there is a small chance an unfavorable mutation is chosen. We believe that this may help our simulation explore a discontinuous solution space. This could help our simulation reach non-obvious, but nonetheless optimal solutions.
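A single hill climbing step with this occasional acceptance of unfavorable mutations might look like the sketch below. Everything here is illustrative: the function names, the 50/50 choice between mutating the probe or the sink, and the `accept_worse_prob` parameter are assumptions, and `toy_loss` is a hypothetical stand-in that counts mismatches against fixed targets rather than the NEC-based loss used in the simulation.

```python
import random

BASES = "ACGT"


def mutate(seq: str, rng: random.Random) -> str:
    """Substitute one randomly chosen base with a different base."""
    i = rng.randrange(len(seq))
    new_base = rng.choice([b for b in BASES if b != seq[i]])
    return seq[:i] + new_base + seq[i + 1:]


def climb_step(pair, loss, rng, accept_worse_prob=0.01):
    """Propose one mutation to the probe or the sink and decide whether to keep it.

    Returns (new_pair, improved). A worse candidate is kept with probability
    accept_worse_prob; in that case improved is still False.
    """
    probe, sink = pair
    if rng.random() < 0.5:
        candidate = (mutate(probe, rng), sink)
    else:
        candidate = (probe, mutate(sink, rng))
    improved = loss(*candidate) < loss(*pair)
    if improved or rng.random() < accept_worse_prob:
        return candidate, improved
    return pair, False


def toy_loss(probe: str, sink: str) -> int:
    """HYPOTHETICAL stand-in for the real loss: mismatches to fixed targets."""
    target_probe, target_sink = "ACGTACGT", "ACGAACGT"
    return (sum(a != b for a, b in zip(probe, target_probe))
            + sum(a != b for a, b in zip(sink, target_sink)))


rng = random.Random(0)
pair = ("AAAAAAAA", "CCCCCCCC")
for _ in range(500):
    pair, _improved = climb_step(pair, toy_loss, rng, accept_worse_prob=0.01)
print(pair, toy_loss(*pair))
```

With `accept_worse_prob=0.0` this reduces to a plain hill climb; a small positive value lets a sample step out of a shallow well at the cost of occasionally losing ground.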
Results
We have open sourced our simulation as a command line interface (CLI). We have provided documentation and a GitHub repository so that the community may take advantage of our work. Our CLI is capable of taking user probes and genes, running the simulation, ordering results, visualizing convergence, and doing reverse machine learning analysis to verify results. Preliminary results for F2RL3 can be found in our F2RL3 Probe Design Results table. The link to our CLI can be found at the Michigan iGEM 2023 GitHub repository.
F2RL3 Probe Design Results
Frequency | Loss Value | Probe | Sink | Probe Mutant Reference | Probe Wild Reference | Sink Mutant Reference | Sink Wild Reference |
---|---|---|---|---|---|---|---|
2 | -5.3022242 | GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC | GGGCCGGCAGGAGGTCAGCACCCGCGAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
2 | -5.3022242 | GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC | GGGCCAGCAGGAGGTCAGCAGCCCCGAGGTTCGTC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
2 | -4.896437163 | GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC | GGGCCAGCATGAGGGCAGCAGCCGCGAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
2 | -4.896437163 | GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC | GGGCCAGCAGGAGGTTAGCAGCCGCGAGGTTCAGC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
2 | -4.795679652 | CCAGGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTC | CCAGGGCCAGCAGGAGGTCAGCAGCCGCAAGGTAC | GAACCTCGCGACTGCTGACCTCCTGCTGGCCCTGG | GAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTGG | GAACCTCGCGACTGCTGACCTCCTGCTGGCCCTGG | GAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTGG |
2 | -4.70928649 | GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC | GGGCCAGCAGGAGGTCAGCAGCCGCGAAGTTCGTC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
2 | -4.685741035 | GGGCCAGCAGGAGGTCAGCAGTGGCGAGGTCCATC | GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
2 | -4.685528914 | GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC | GGGCCAGCAGGAGGTCAGCGGCCGAGAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
2 | -4.66198346 | CCAGCAGGAGGTCAGCAGTCGCGCGGTCCATCAGC | CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -4.36202477 | GCCAGCAGGAGGTCAGCAGTCGCGAGTTTCTTCAG | GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
2 | -4.36202477 | CAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGCA | CAGCAGGATGTCAGCAGCCGCGAGGTACATCAGCA | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG |
2 | -4.251653832 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGAGGTCGGCAGCCGCGATGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -4.251653832 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGAAGTCAGCAGCCGCGAGGGTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -4.228108378 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGAGGCCAGCAGCCGCGAGGTTCATCATC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -4.217369952 | GCCAGCAGGAGGTCAGTAGTCGCGAGGTTCAACAG | GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
2 | -4.217369952 | CAGGAGGAGGTCAGCAGTCGCGAGGTTCAACAGCA | CAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCA | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG |
2 | -4.200470962 | CCAGCAGGAGGACAGCAGTCGCGAAGTTCATCAGC | CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -4.153385881 | CCAGCAGGAGCTCAGCAGTCTCGAGGTTCATCAGC | CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -4.136486891 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGAATGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -4.136486891 | GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA | GGCCAACATGAGGTCAGCAGCCGCGAGGTTCATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
2 | -4.1140778 | CCAGCAGGAGGTCAGCAGTCGCGAAGTTGATCAGC | CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -4.1140778 | GCAGGAGGTCAGCAGTCACGAGCTTCATCAGCAGC | GCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCAGC | GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC |
2 | -4.1140778 | GGGCAGCAGGAGGTCAGCAGTCACGAGGTTCATCA | GGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
2 | -4.1140778 | GGCCAGCAGGAGGTCAGGAGTCGCGAAGTTCATCA | GGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
2 | -4.1140778 | CCAGCAGGAGGTCAGCAGTCACCAGGTTCATCAGC | CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -4.114030494 | GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA | GGCCAGCAGGAGGTAAGTAGCCGCGAGGTTCATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
2 | -4.111681338 | GCAGGAGGTCAGCAGTCGCGAGGTTCATCAGCAGC | GCAGGAGGTCAGCAGCCGCGACGTTCAACAGCAGC | GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC |
2 | -4.111681338 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCACGAGGTCAGCAGCCGCGAGGTTCATCTGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -4.092501265 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTAAACAGC | CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -4.086850527 | AGGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCAT | AGGGACAGCAGGAGGTCATCAGCCGCGAGGTTCAT | ATGAACCTCGCGACTGCTGACCTCCTGCTGGCCCT | ATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCT | ATGAACCTCGCGACTGCTGACCTCCTGCTGGCCCT | ATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCT |
2 | -4.086850527 | GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA | GGCCAGCAGGAGGTCATCAGCCGCGAGGTTAATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
2 | -4.064441436 | GCCAGCAGGACGTCAGCAGTCGAGAGGTTCATCAG | GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
2 | -4.064441436 | CCAGCAGGAGGTGAGCAGTCGAGAGGTTCATCAGC | CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -4.064441436 | GGGCCAGCAGGAGGTCAGCAGTTGCGAGGTTAATC | GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
2 | -4.064441436 | CAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGCA | AACCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCA | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG |
2 | -4.04883363 | GGGTCAGCAGGAGGTCAGCAGTCGCGAGGTTCGTC | GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
2 | -4.03727578 | CCGCAGGAAGTCAGCAGTCGCGAGGTTCATCAGCA | CAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCA | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG |
2 | -4.025288176 | GGGCCAGGAGGAGGTCAGCAGTCGCCAGGTTCATC | GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
2 | -4.025288176 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGAGCTGAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -4.025076055 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCGGGAGGTAAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -4.025076055 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGGGGTAAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -4.005496383 | GCAGGAGGTCAGCTGTCACGAGGTTCATCAGCAGC | GCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCAGC | GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC |
2 | -3.728020518 | GGGCCAGCAGGGCGTCAGCAGTCGCGAGGTTCATC | GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
2 | -3.656329843 | GCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAG | ACCAGCAGGAGGTCAGCACCCGCGAGGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
2 | -3.488799552 | CCAGCATGAGGTCAGCAGTCGCGAGGTTCATCATC | CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -3.488799552 | GCCAGCAGGAGGTCATCAGTCTCGAGGTTCATCAG | GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
2 | -3.471900562 | GCCAGCAGGAGGTCAGCAGTCGCGATGTTCATCAA | GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
2 | -3.449638863 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGAGGTCAGCATCCGCGAGGCTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -3.422264198 | CCATAAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -3.422264198 | CCAGCAGGAGGTCAGAATTCGCGAGGTTCATCAGC | CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -3.399855107 | CCAGAAGGAGGTAAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -3.282696929 | CAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGCA | CAGCAGGAGGTCAGCATCGGCGAGGTTCATCAGCA | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG |
3 | -0.303730303 | CAGGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCA | CAGGGCCAGCAGGAGGTCAGCAGACGCGAGGTTCA | TGAACCTCGCGACTGCTGACCTCCTGCTGGCCCTG | TGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTG | TGAACCTCGCGACTGCTGACCTCCTGCTGGCCCTG | TGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTG |
3 | -0.303730303 | GCAGGAGGTCAGCAGTCGCGAGGTTCATCAGCAGC | GCAGGAGGTCAGCAGACGCGAGGTTCATCAGCAGC | GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC |
4 | -0.303730303 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGAGGTCAGCAGACGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -0.303730303 | GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC | GGGCCAGCAGGAGGTCAGCAGACGCGAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
5 | -0.303730303 | GCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAG | GCCAGCAGGAGGTCAGCAGACGCGAGGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
9 | -0.303730303 | GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA | GGCCAGCAGGAGGTCAGCAGACGCGAGGTTCATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
305 | -0.094 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
219 | -0.094 | GCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAG | GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
185 | -0.094 | GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA | GGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
127 | -0.094 | GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC | GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
114 | -0.094 | GCAGGAGGTCAGCAGTCGCGAGGTTCATCAGCAGC | GCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCAGC | GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC |
88 | -0.094 | AGGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCAT | AGGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCAT | ATGAACCTCGCGACTGCTGACCTCCTGCTGGCCCT | ATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCT | ATGAACCTCGCGACTGCTGACCTCCTGCTGGCCCT | ATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCT |
61 | -0.094 | CAGGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCA | CAGGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCA | TGAACCTCGCGACTGCTGACCTCCTGCTGGCCCTG | TGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTG | TGAACCTCGCGACTGCTGACCTCCTGCTGGCCCTG | TGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTG |
3 | -0.094 | GGAGGTCAGCAGTCGCGAGGTTCATCAGCAGCATG | GGAGGTCAGCAGCCGCGAGGTTCATCAGCAGCATG | CATGCTGCTGATGAACCTCGCGACTGCTGACCTCC | CATGCTGCTGATGAACCTCGCGGCTGCTGACCTCC | CATGCTGCTGATGAACCTCGCGACTGCTGACCTCC | CATGCTGCTGATGAACCTCGCGGCTGCTGACCTCC |
3 | -0.094 | CAGGAGGTCAGCAGTCGCGAGGTTCATCAGCAGCA | CAGGAGGTCAGCAGCCGCGAGGTTCATCAGCAGCA | TGCTGCTGATGAACCTCGCGACTGCTGACCTCCTG | TGCTGCTGATGAACCTCGCGGCTGCTGACCTCCTG | TGCTGCTGATGAACCTCGCGACTGCTGACCTCCTG | TGCTGCTGATGAACCTCGCGGCTGCTGACCTCCTG |
6 | -0.094 | CAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGCA | CAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCA | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG |
2 | -0.094 | CCAGGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTC | CCAGGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTC | GAACCTCGCGACTGCTGACCTCCTGCTGGCCCTGG | GAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTGG | GAACCTCGCGACTGCTGACCTCCTGCTGGCCCTGG | GAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTGG |
2 | -0.047 | GCCAGCAGGAGGTCAGTAGTCGCGAGGTTCATCAG | GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
2 | -0.047 | GCCAGCAGTAGGTCAGCAGTCGCGAGGTTCATCAG | GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
2 | -0.047 | GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC | GGGCCAGCAGAAGGTCAGCAGCCGCGAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
3 | -0.047 | GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA | GGCCAGCAGGAGGTCAGCCGCCGCGAGGTTCATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
2 | -0.047 | GCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAG | GCCAGCAGGAGGTCAGCAGCCGCGACGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
2 | -0.047 | GCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAG | GCCAGCAGGAGGTCAGCAGGCGCGAGGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
2 | -0.047 | GCCAGCAGGAGGTCAGCAGTCGCGAGTTTCATCAG | GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
4 | -0.047 | GCCAGCAGGAGGTCAGCAGGCGCGAGGTTCATCAG | GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
3 | -0.047 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGAGGTCAGCAGGCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
3 | -0.047 | GGGCCAGCAGGAGGTCAGCAGTCGCGAGGCTCATC | GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
2 | -0.047 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGAGGTCAGGAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -0.047 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGTGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -0.047 | CCAGCAGGAGGGCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -0.047 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGAAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -0.047 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCACGAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
3 | -0.047 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGAGGTCAGCAGCCGCGAGTTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
3 | -0.047 | GCCAGGAGGAGGTCAGCAGTCGCGAGGTTCATCAG | GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
2 | -0.047 | GCAGGAGGTCAGCAGTCGCGAGGTTCATCAGCAGC | GCAGGAGGTCAGCAGCCGCGAGGTTCATCACCAGC | GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC | GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC |
3 | -0.047 | AGGGCCAGCAGGAGGTCAGCAGGCGCGAGGTTCAT | AGGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCAT | ATGAACCTCGCGACTGCTGACCTCCTGCTGGCCCT | ATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCT | ATGAACCTCGCGACTGCTGACCTCCTGCTGGCCCT | ATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCT |
2 | -0.047 | GGGCCAGCAGGAGGTCAGCAGTCGGGAGGTTCATC | GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
2 | -0.047 | GGCCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC | GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
2 | -0.047 | GTCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAG | GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
2 | -0.047 | GCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAG | GCCAGCAGGAGGTCAGCAGCCGCGAGGTGCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
2 | -0.047 | GGACCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC | GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
2 | -0.047 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGAGGTCAGCCGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -0.047 | AGGGCGAGCAGGAGGTCAGCAGTCGCGAGGTTCAT | AGGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCAT | ATGAACCTCGCGACTGCTGACCTCCTGCTGGCCCT | ATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCT | ATGAACCTCGCGACTGCTGACCTCCTGCTGGCCCT | ATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCT |
2 | -0.047 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCAGGAGGTCAGCAGCCGCGAGGTTGATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -0.047 | CAGGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCA | CAGGGCCAGCAGGAGGTCAGAAGCCGCGAGGTTCA | TGAACCTCGCGACTGCTGACCTCCTGCTGGCCCTG | TGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTG | TGAACCTCGCGACTGCTGACCTCCTGCTGGCCCTG | TGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTG |
2 | -0.047 | CCAGCAGGAGGTCAGCCGTCGCGAGGTTCATCAGC | CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -0.047 | GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCAGCA | GGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
2 | -0.047 | CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC | CCAGCCGGAGGTCAGCAGCCGCGAGGTTCATCAGC | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG | GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG |
2 | -0.047 | GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC | GGGCCAGCAGGAGGTCAGCAGCCGCTAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
2 | -0.047 | GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA | GGCCAGCAGGAGGTCAGCAGCCGCGATGTTCATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
2 | -0.047 | CAGCAGGAGGTCAGCAGTCCCGAGGTTCATCAGCA | CAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCA | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG |
2 | -0.047 | GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA | GGCCAGCAGGAGGTCAGCAGGCGCGAGGTTCATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
2 | -0.047 | GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA | GGCCAGCAGGATGTCAGCAGCCGCGAGGTTCATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
2 | -0.047 | GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCCTCA | GGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
2 | -0.047 | GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA | GGCCAGCAGGAGGTCAGCAACCGCGAGGTTCATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
2 | -0.047 | GGCCAGCAGGAGGTCAGCAGGCGCGAGGTTCATCA | GGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
2 | -0.047 | CAGCAGGAGGTCAGCAGTCGCGAGGTTCATTAGCA | CAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCA | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG |
2 | -0.047 | AGCAGGAGGTCAGCTGTCGCGAGGTTCATCAGCAG | AGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCAG | CTGCTGATGAACCTCGCGACTGCTGACCTCCTGCT | CTGCTGATGAACCTCGCGGCTGCTGACCTCCTGCT | CTGCTGATGAACCTCGCGACTGCTGACCTCCTGCT | CTGCTGATGAACCTCGCGGCTGCTGACCTCCTGCT |
2 | -0.047 | GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA | GACCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
2 | -0.047 | GGGCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA | GGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCA | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC | TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC |
2 | -0.047 | GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC | GGGCCACCAGGAGGTCAGCAGCCGCGAGGTTCATC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC | GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC |
3 | 0.164230303 | CAGCAGGAGGTCAGCAGACGCGAGGTTCATCAGCA | CAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCA | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG | TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG |
2 | 0.209730303 | GCCAGCAGGAGGTCAGCAGACGCGAGGTTCATCAG | GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC | CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC |
Demo
Finally, we produced a demonstration video to guide users through the CLI tool. The video is available on our 'Michigan iGEM' YouTube page. It assumes minimal knowledge of command line interfaces and walks users through every step of the simulation. By following the tutorial, users will be able to download the CLI, run simulations, and analyze the generated solutions.
References
- Hyman, L. B., Christopher, C. R., & Romero, P. A. (2022). Competitive SNP-LAMP probes for rapid and robust single-nucleotide polymorphism detection. Cell Reports Methods, 2(7), 100242. https://doi.org/10.1016/j.crmeth.2022.100242
- Tulpan, D., Andronescu, M., & Leger, S. (2010). Free energy estimation of short DNA duplex hybridizations. BMC Bioinformatics, 11(1). https://doi.org/10.1186/1471-2105-11-105
- de Bruin, D., Bossert, N., Aartsma-Rus, A., & Bouwmeester, D. (2018). Measuring DNA hybridization using fluorescent DNA-stabilized silver clusters to investigate mismatch effects on therapeutic oligonucleotides. Journal of Nanobiotechnology, 16(1). https://doi.org/10.1186/s12951-018-0361-2
- Sun, Y., Aljawad, O., Lei, J., & Liu, A. (2012). Genome-scale ncRNA homology search using a Hamming distance-based filtration strategy. BMC Bioinformatics, 13(S3). https://doi.org/10.1186/1471-2105-13-s3-s12
- Maucec, M., Kacic, Z., & Horvat, B. (2004). Modelling highly inflected languages. Information Sciences, 166(1–4), 249–269. https://doi.org/10.1016/j.ins.2003.12.004
- Lundberg, S., & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. NIPS. https://papers.nips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf
- Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., & Lee, S.-I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 56–67. https://doi.org/10.1038/s42256-019-0138-9
- (May 2023). Random Forest Pruning Techniques: A Recent Review. Operations Research Forum. https://advance-lexis-com.proxy.lib.umich.edu/api/document?collection=news&id=urn:contentItem:689F-6MT1-F0C0-31MJ-00000-00&context=1516831
- Carr, J. (2014). An Introduction to Genetic Algorithms. ts, Walla, Washington.
- Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del Río, J. F., Wiebe, M., Peterson, P., … Oliphant, T. E. (2020). Array programming with NumPy. Nature, 585(7825), 357–362. https://doi.org/10.1038/s41586-020-2649-2
- Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95. https://doi.org/10.1109/mcse.2007.55
- Waskom, M. (2021). Seaborn: Statistical data visualization. Journal of Open Source Software, 6(60). https://doi.org/10.21105/joss.03021
- Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., … Varoquaux, G. (2013). API design for machine learning software: Experiences from the scikit-learn project. arXiv preprint arXiv:1309.0238.
- Sklearn. (n.d.). Sklearn.preprocessing.QuantileTransformer. scikit-learn. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html
- Oda, M., & Nakamura, H. (2000). Thermodynamic and kinetic analyses for understanding sequence-specific DNA recognition. Genes to Cells, 5(5), 319–326. https://doi.org/10.1046/j.1365-2443.2000.00335.x