Motivation

Designing complementary DNA probes is not an easy task, and it presents a huge problem space. For a probe designed to hybridize to a sequence of 40 nucleotides, there are 4^40 possible sequences. That's about 1.2×10^24 combinations, which is more than the age of the universe in seconds! This is even before considering that we need to design two DNA sequences, the probe and the sink, with contrasting hybridization abilities. Brute-force laboratory testing to find optimal probe and sink combinations is simply not possible, and existing computational tools for probe design have several limitations. We aimed to use software and modeling techniques to build tools that make probe design more effective and to verify the probe sequences we had previously designed.
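The size comparison above is easy to check with a few lines of arithmetic. This is only a back-of-the-envelope sketch; the age of the universe (13.8 billion years) is an assumed round figure, not a value from our data.

```python
# Number of distinct sequences for a 40-nucleotide probe (4 bases per position).
PROBE_LENGTH = 40
n_sequences = 4 ** PROBE_LENGTH

# Assumed age of the universe: roughly 13.8 billion years, converted to seconds.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
universe_age_s = 13.8e9 * SECONDS_PER_YEAR

print(f"{n_sequences:.2e} candidate sequences")
print(f"{universe_age_s:.2e} seconds since the Big Bang")
print(f"sequences per second of cosmic time: {n_sequences / universe_age_s:.1e}")
```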

Previous Methods

Previous work developed a computational approach to choosing probe and sink combinations [1]. The authors use thermodynamic models to estimate the probe's fluorescence when exposed to a template strand of DNA. Using these parameters, they design a simulation that starts with a genetic algorithm to explore a broad solution space. Once a range of ideal local coordinates is chosen, a hill climbing algorithm performs a final optimization on the probe and sink combination. While this approach is helpful in exploring solutions, there are a few weaknesses we wanted to address:

  1. The genetic algorithm prohibits the hill climbing algorithm from optimizing probes at a range of genomic coordinates.
  2. The algorithm doesn't allow for mismatches in the probe-template interaction, which drastically reduces the solution space.
  3. The simulation doesn't include fail-safe statistics to inform users of potentially false positive probe designs.

Gaps in Knowledge

It has been shown that the location of mismatches in a DNA sequence strongly influences its binding energy. Recent research suggests that binding is least affected by mismatches at the ends and at the center of a DNA strand [3]. However, few DNA modeling techniques account for this information. We developed our modeling strategy with these shortcomings in mind. We wanted our model to be scalable, accurate, and reflective of physical trends.

Machine Learning

Machine learning is a modeling technique that uses algorithms to learn trends that are not explicitly programmed. We feel that these techniques are well suited to predicting probe binding energy, since the problem represents a complex physical process that may benefit from a learning algorithm.

Machine learning is predicated on data. Without good data, quality machine learning cannot happen. In order to train a machine learning algorithm to predict free binding energies of DNA, we needed to have a dataset of DNA binding energies and corresponding information about the nature of that interaction such as nucleotide matching and sequence similarity. In the following paragraphs, we present data exploration on the learning dataset in order to demonstrate where the model may perform well and where the training data is lacking.

For our training data, we used a dataset with 695 samples [2]. Of those samples, 340 are complementary and 355 are non-complementary. The variance in sequence complementarity allows us to train a machine learning model capable of predicting imperfect complements. In this dataset, the temperatures at which hybridization occurred follow a bimodal distribution, with numerous experiments conducted at 25 °C or 37 °C (Figure 1). Sequence lengths largely fall between 5 and 20 base pairs (Figure 2). Predictions within these parameter ranges will be more reliable because the training data samples them more densely.

Figure 1. Relative density of sample temperatures in the dataset. Temperatures follow a bimodal distribution, with the highest densities around 25 °C and 37 °C.
Figure 2. Relative density of sequence lengths in the dataset. Most sequences are between 5 and 20 base pairs; however, the full range of sequence lengths is much greater than that.

We hypothesized that capturing sequence parameters about both probe and template DNA could be informative to machine learning models. Thermodynamic information could be encoded into a lower-dimensional space through sequence information, and machine learning algorithms could thereby become accurate with a relatively modest dataset. We wanted to include information relating to the content of purines and pyrimidines, as well as the continuity of the sequence.

To this end, our parameters included relative GC content, AT content, GA mismatches, GT mismatches, all other mismatch combinations, relative Hamming distance, and Ratcliff-Obershelp distance. All parameters relating to relative content aimed to encode hydrogen bonding information. Hamming distance measures how different two equal-length strings are by comparing them position by position [4]. We used it to compare the reverse complement of the probe to the template to understand how different the two sequences were. We also wanted spatially encoded information to understand where mismatches occurred. Ratcliff-Obershelp distance measures the relative length of the longest unbroken substring shared between two sequences [5]. We believed Ratcliff-Obershelp distance could help the model understand where breaks in sequence matching likely occurred.
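As a rough illustration of the content-based parameters, the sketch below computes GC content and counts mismatched base pairs between a probe and a template. The function names and the convention of labeling a mismatch by its unordered base pair (for example "GT") are assumptions made for this example and may differ from our CLI. The sequences are taken from the first row of the F2RL3 results table.

```python
from collections import Counter

# Watson-Crick partner of each base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def mismatch_counts(probe: str, template: str) -> Counter:
    """Count mismatched pairs with the probe laid antiparallel on the template.

    Keys are unordered base pairs such as "AG" or "GT" (an assumed labeling);
    Watson-Crick pairs are not counted.
    """
    if len(probe) != len(template):
        raise ValueError("probe and template must have equal length")
    counts = Counter()
    # probe[i] pairs with template[-1 - i] because the strands are antiparallel.
    for p, t in zip(probe, reversed(template)):
        if COMPLEMENT[p] != t:
            counts["".join(sorted(p + t))] += 1
    return counts


probe = "GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC"
mutant_ref = "GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC"
wild_ref = "GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC"

print(gc_content(probe))
print(mismatch_counts(probe, mutant_ref))  # perfect complement: no mismatches
print(mismatch_counts(probe, wild_ref))    # a single G-T mismatch at the SNP
```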

Hamming Distance

Where D is the Hamming Distance between strings m and n. w(x,y) is a piecewise function which returns 1 if and only if the letters are the same, otherwise 0. L represents the length of the sequence.
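A minimal sketch of this comparison follows. It assumes the usual convention in which the distance is the fraction of positions that differ, that is, the sum of per-position indicators divided by L. If the indicator w counts matching letters instead, as described above, the resulting quantity is one minus the value computed here. The function names are illustrative, and the sequences come from the first row of the F2RL3 results table.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def relative_hamming(m: str, n: str) -> float:
    """Fraction of positions at which two equal-length strings differ."""
    if len(m) != len(n):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(a != b for a, b in zip(m, n)) / len(m)


probe = "GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC"
mutant_ref = "GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC"
wild_ref = "GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC"

rc = reverse_complement(probe)
print(relative_hamming(rc, mutant_ref))  # identical strings: distance 0
print(relative_hamming(rc, wild_ref))    # one differing position out of 35
```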

Ratcliff-Obershelp Distance

Where D represents the Ratcliff-Obershelp distance. L_substring is the length of the longest common substring of sequences m and n. L_m and L_n are the lengths of sequences m and n.
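The sketch below shows two ways to compute a measure of this kind with Python's standard `difflib` module. The first follows the description above and assumes the form 2 × L_substring / (L_m + L_n); that exact normalization is an assumption. The second is the `ratio()` method of `difflib.SequenceMatcher`, which applies the longest-match search recursively to the left and right of each match in the style of Ratcliff and Obershelp. Function names are illustrative.

```python
from difflib import SequenceMatcher


def longest_common_substring(m: str, n: str) -> int:
    """Length of the longest unbroken substring shared by m and n."""
    matcher = SequenceMatcher(None, m, n, autojunk=False)
    return matcher.find_longest_match(0, len(m), 0, len(n)).size


def substring_ratio(m: str, n: str) -> float:
    """Assumed form of the measure: 2 * L_substring / (L_m + L_n)."""
    return 2 * longest_common_substring(m, n) / (len(m) + len(n))


def gestalt_ratio(m: str, n: str) -> float:
    """difflib's recursive matching ratio: 2 * matches / (L_m + L_n)."""
    return SequenceMatcher(None, m, n, autojunk=False).ratio()


a, b = "AAAACCCC", "AAAAGCCC"
print(substring_ratio(a, b))  # longest shared run is "AAAA"
print(gestalt_ratio(a, b))    # also credits the shared "CCC" after the break
```

A single mismatch near the middle of a sequence shortens the longest shared run much more than one near the end, which is why this kind of measure carries positional information.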

Data is often messy. Considering the wide range of possible errors in the training data we acquired, we needed a robust processing scheme. Quantile transformation is a technique that maps potentially skewed distributions onto a normal distribution. It is a robust preprocessing technique that helps models learn from data containing outliers [14]. However, it is frequently misused, specifically when it is applied to the entire dataset before the train-test split, which leaks information about the test data into training. We were meticulous about excluding this kind of data leakage from our training pipelines. This ensures our model is independently validated and lets us be more confident in the results we achieved.
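To make the leakage point concrete, here is a simplified, standard-library stand-in for a quantile transformer. It is not the implementation used in our pipeline; the class name and the toy data are invented for illustration. The important part is the order of operations: the transformer is fitted on the training split only and then applied unchanged to the test split. A library transformer, such as scikit-learn's `QuantileTransformer`, would be used in the same order.

```python
import bisect
import random
from statistics import NormalDist


class QuantileToNormal:
    """Rank-based map of one feature onto a standard normal distribution."""

    def fit(self, values):
        # Only the fitted (training) values define the reference quantiles.
        self.reference_ = sorted(values)
        return self

    def transform(self, values):
        n = len(self.reference_)
        out = []
        for v in values:
            # Empirical CDF of v within the reference (mid-rank for ties).
            lo = bisect.bisect_left(self.reference_, v)
            hi = bisect.bisect_right(self.reference_, v)
            cdf = (lo + hi) / (2 * n)
            # Clip so the inverse normal CDF stays finite for extreme values.
            cdf = min(max(cdf, 0.5 / n), 1 - 0.5 / n)
            out.append(NormalDist().inv_cdf(cdf))
        return out


random.seed(0)
feature = [random.expovariate(1.0) for _ in range(200)]  # skewed toy feature
train, test = feature[:160], feature[160:]               # split first

scaler = QuantileToNormal().fit(train)  # fit on the training split only
train_t = scaler.transform(train)
test_t = scaler.transform(test)         # test data never influences the fit
```

Fitting on `feature` as a whole before splitting would let the test values shape the reference quantiles, which is exactly the misuse described above.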

To learn from the processed data, we used a random forest. A random forest is a robust ensemble model that averages the predictions of many decision trees [8]. Decision trees are simple models that make predictions through a series of splits on the input parameters; in a random forest, each tree is trained on a random resampling of the dataset. Since our training parameters were all scaled, we felt the random forest was a good architecture to account for class imbalance and the variance in target values.

To create a generalized model able to predict oligo sequences longer than ~20 base pairs, we needed to scale our target data. Instead of directly predicting the free energy, we predict a free energy value scaled by the sequence length. We call this target value the Nucleotide Efficiency Coefficient (NEC). It is worth noting that we also trained a separate machine learning model on the scaled learning parameters plus the length of the sequence. SHAP analysis, a game-theoretic approach to evaluating parameter importance [6, 7], revealed that sequence length was the most important parameter in this model (Figure 3). We therefore believe that NEC introduces some error, but is still the fairest way of evaluating probe sequences.
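A minimal sketch of the target scaling, assuming the NEC is simply the free energy divided by the number of nucleotides. The exact scaling used in our pipeline may differ, and the function names and the example energy are illustrative.

```python
def nec(free_energy: float, sequence: str) -> float:
    """Assumed definition: free energy per nucleotide of the sequence."""
    return free_energy / len(sequence)


def free_energy_from_nec(nec_value: float, sequence: str) -> float:
    """Invert the scaling to recover a free energy for a given sequence."""
    return nec_value * len(sequence)


oligo = "ACGTACGTAC"                 # 10 nucleotides
value = nec(-20.0, oligo)            # toy free energy, arbitrary units
print(value)                         # -2.0 per nucleotide
print(free_energy_from_nec(value, oligo))
```

Because the target is expressed per nucleotide, a model trained mostly on 5 to 20 base pair duplexes can still produce a comparable score for a longer probe.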

Figure 3. SHAP analysis of a scaled XGBoost model including sequence length shows that the highest determining factor of free energy is sequence length.

There are other programs capable of predicting the free binding energy of DNA duplexes, such as MultiRNAFold, UNAFold, and others. Those programs predict free binding energy with a Pearson correlation coefficient (r) ranging from 0.67 to 0.69 among the exact complement samples of our dataset [2]. Tulpan et al. were able to achieve an r value of 0.92 [2]. With our generalized scaled model, using mismatches and no sequence length information, we achieved an r value of 0.84. While this r value is not as high as that of Tulpan et al., we proceeded with our model because it generalizes better across sequences and is less stringent in its DNA complement requirements.
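For reference, the Pearson correlation coefficient used in these comparisons can be computed as below. The numbers in the example are toy values chosen for illustration, not predictions from our model.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


measured = [-1.0, -2.0, -3.0, -4.0]   # toy "experimental" values
predicted = [-1.2, -1.9, -3.3, -3.8]  # toy "model" values
print(round(pearson_r(measured, predicted), 3))
```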

One of the reasons we believe our model performs well is that we include information about the difference in substrings. Many previous models assume that free energy is independent of where mismatches occur in the sequence. Our model captures the SNP location dependence described in [3]. This is demonstrated in Figure 4.

Figure 4. We estimated the residual and standard deviation of the model at each position of a 20 base pair long probe by iteratively and randomly mutating 1,000 randomly generated probes. There is a higher mean residual and greater standard deviation in the regions between the ends and center of the probe.

Simulation

We aimed to design a simulation around our machine learning program that was able to take advantage of the NEC. Our algorithm is a hill climbing algorithm which simultaneously optimizes a probe and sink combination to a given SNP.

The probe and sink are fundamentally different sequences with different goals. However, their combined goal is to produce the greatest difference in fluorescence when exposed to the mutant vs. the wild type template. To that end, an optimal solution should account for both the probe and sink sequences when determining an optimal combination. We designed a loss function to capture this goal.

Loss Function

where S_w is the NEC of the sink binding to the wild type and S_m is the NEC of the sink binding to the mutant type. Correspondingly, P_w and P_m are the NEC of the probe binding to the wild type and mutant type, respectively.

We aim to minimize the loss function. However, this presents a challenge. Since the NEC is a negative number, why doesn't the loss function blow up, with three of the four parameters driven to zero (no binding) while only one sequence binds to its reference frame?

The loss function exploits the codependency of P_w and P_m, and of S_w and S_m. Since both parameters in each pair refer to the same sequence binding to very similar templates, their values are similar. The contrasting structure of the loss function therefore prevents any single term from dominating.

Now we can implement the simulation. We start by taking in two reference frames (wild and mutant) and two probes (probe and sink). We mutate the probe and sink randomly a number of times specified by the user. Then, by iteratively mutating and observing the effect on the NEC, we attempt to converge the solutions to the lowest minimum. Many samples in the simulation may end up at the same solution. We therefore have a natural false positive safety mechanism built into the simulation: predicted solutions with a higher degree of convergence are more likely to be valid solutions.
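The loop described above can be sketched as follows. This is a simplified illustration, not the code in our CLI: the function names, the default parameters, and the toy loss (which just counts differences from two fixed target strings) are all assumptions. In the real simulation, the loss is the function built from the NEC values of the probe and sink against both templates.

```python
import random
from collections import Counter

BASES = "ACGT"


def mutate(seq, rng):
    """Return a copy of seq with one random position changed to another base."""
    i = rng.randrange(len(seq))
    new_base = rng.choice([b for b in BASES if b != seq[i]])
    return seq[:i] + new_base + seq[i + 1:]


def hill_climb(probe, sink, loss, n_samples=20, n_steps=1000, seed=0):
    """Greedy hill climbing over a population of (probe, sink) pairs."""
    rng = random.Random(seed)
    # Every sample starts from a randomly mutated copy of the input pair.
    population = [(mutate(probe, rng), mutate(sink, rng)) for _ in range(n_samples)]
    for _ in range(n_steps):
        for k, (p, s) in enumerate(population):
            # Mutate either the probe or the sink of this sample.
            if rng.random() < 0.5:
                candidate = (mutate(p, rng), s)
            else:
                candidate = (p, mutate(s, rng))
            # Keep the mutation only if it lowers the loss.
            if loss(*candidate) < loss(p, s):
                population[k] = candidate
    # Samples that settle on the same pair raise that pair's frequency.
    return Counter(population)


# Toy loss for demonstration only: distance from two arbitrary target strings.
TARGET_PROBE, TARGET_SINK = "ACGTACGT", "TTTTCCCC"


def toy_loss(p, s):
    return sum(a != b for a, b in zip(p, TARGET_PROBE)) + sum(
        a != b for a, b in zip(s, TARGET_SINK)
    )


result = hill_climb(TARGET_PROBE, TARGET_SINK, toy_loss)
for pair, frequency in result.most_common(3):
    print(frequency, toy_loss(*pair), pair)
```

The returned counter plays the role of the Frequency column in the results table: the more samples that end on the same pair, the more confidence that pair deserves.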

To end the simulation, convergence must be defined. Simply put, convergence is achieved when no sample, when randomly mutated, yields a better result than its prior sequence. In other words, each of the samples has gotten stuck at the bottom of a mutation free energy well.

Of course, achieving full convergence for large simulations is extremely time consuming. Therefore, we integrate a more lenient and practical stopping rule into the simulation through the threshold parameter. The threshold is a number between 0 and 1: once that proportion of samples fails to improve in a single step, the simulation can stop. We also offer a second parameter called patience to avoid false positive convergence results. Patience is the number of consecutive steps for which the threshold must be reached in order to end the simulation.
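The stopping rule can be sketched as a small helper that is updated once per simulation step. The class name and interface are hypothetical and only illustrate how threshold and patience interact; they are not the CLI's actual API.

```python
class ConvergenceMonitor:
    """Track whether the threshold has been met for `patience` steps in a row."""

    def __init__(self, threshold: float, patience: int = 1):
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        if patience < 1:
            raise ValueError("patience must be at least 1")
        self.threshold = threshold
        self.patience = patience
        self.streak = 0  # consecutive steps that met the threshold

    def update(self, improved: list) -> bool:
        """Record one step; `improved` holds one boolean per sample.

        Returns True when the simulation may stop.
        """
        stalled_fraction = improved.count(False) / len(improved)
        if stalled_fraction >= self.threshold:
            self.streak += 1
        else:
            self.streak = 0  # an active step resets the count
        return self.streak >= self.patience


monitor = ConvergenceMonitor(threshold=0.8, patience=2)
print(monitor.update([False] * 8 + [True] * 2))  # threshold met once: keep going
print(monitor.update([False] * 9 + [True]))      # met twice in a row: stop
```

With patience set to 1, a single quiet step is enough to stop; larger values guard against a step that happens to produce few improvements by chance.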

To encourage convergence and to increase the solution space explored, we also took inspiration from genetic algorithms. In genetic algorithms, there is a small chance that unfavorable solutions are chosen for the next generation [9]. Likewise, in our simulation, there is a small chance that an unfavorable mutation is chosen. We believe that this may help our simulation explore a discontinuous solution space and reach non-obvious, but nonetheless optimal, solutions.
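One simple way to express this acceptance rule is shown below. The function name and the default probability of 0.01 are placeholders chosen for the example, not the values used in our simulation.

```python
import random


def accept_mutation(current_loss, candidate_loss, rng, p_unfavorable=0.01):
    """Decide whether a mutated sample replaces its predecessor.

    Improvements (lower loss) are always accepted; anything else is accepted
    only with the small probability p_unfavorable.
    """
    if candidate_loss < current_loss:
        return True
    return rng.random() < p_unfavorable


rng = random.Random(0)
accepted = sum(accept_mutation(-2.0, -1.0, rng, p_unfavorable=0.05) for _ in range(10_000))
print(accepted / 10_000)  # close to the chosen 5% acceptance rate
```

Accepting an occasional worse sequence lets a sample leave a shallow local minimum that a strictly greedy rule would never escape.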

Results

We have open sourced our simulation as a command line interface (CLI). We have provided documentation and a GitHub repository so that the community may take advantage of our work. Our CLI is capable of taking user probes and genes, running the simulation, ordering results, visualizing convergence, and doing reverse machine learning analysis to verify results. Preliminary results for F2RL3 can be found in our F2RL3 Probe Design Results table. The CLI is available at the Michigan iGEM 2023 GitHub repository.

F2RL3 Probe Design Results

Columns, in order (values in each row are separated by spaces): Frequency, Loss Value, Probe, Sink, Probe Mutant Reference, Probe Wild Reference, Sink Mutant Reference, Sink Wild Reference
2 -5.3022242 GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC GGGCCGGCAGGAGGTCAGCACCCGCGAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
2 -5.3022242 GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC GGGCCAGCAGGAGGTCAGCAGCCCCGAGGTTCGTC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
2 -4.896437163 GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC GGGCCAGCATGAGGGCAGCAGCCGCGAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
2 -4.896437163 GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC GGGCCAGCAGGAGGTTAGCAGCCGCGAGGTTCAGC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
2 -4.795679652 CCAGGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTC CCAGGGCCAGCAGGAGGTCAGCAGCCGCAAGGTAC GAACCTCGCGACTGCTGACCTCCTGCTGGCCCTGG GAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTGG GAACCTCGCGACTGCTGACCTCCTGCTGGCCCTGG GAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTGG
2 -4.70928649 GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC GGGCCAGCAGGAGGTCAGCAGCCGCGAAGTTCGTC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
2 -4.685741035 GGGCCAGCAGGAGGTCAGCAGTGGCGAGGTCCATC GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
2 -4.685528914 GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC GGGCCAGCAGGAGGTCAGCGGCCGAGAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
2 -4.66198346 CCAGCAGGAGGTCAGCAGTCGCGCGGTCCATCAGC CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -4.36202477 GCCAGCAGGAGGTCAGCAGTCGCGAGTTTCTTCAG GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
2 -4.36202477 CAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGCA CAGCAGGATGTCAGCAGCCGCGAGGTACATCAGCA TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG
2 -4.251653832 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGGAGGTCGGCAGCCGCGATGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -4.251653832 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGGAAGTCAGCAGCCGCGAGGGTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -4.228108378 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGGAGGCCAGCAGCCGCGAGGTTCATCATC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -4.217369952 GCCAGCAGGAGGTCAGTAGTCGCGAGGTTCAACAG GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
2 -4.217369952 CAGGAGGAGGTCAGCAGTCGCGAGGTTCAACAGCA CAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCA TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG
2 -4.200470962 CCAGCAGGAGGACAGCAGTCGCGAAGTTCATCAGC CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -4.153385881 CCAGCAGGAGCTCAGCAGTCTCGAGGTTCATCAGC CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -4.136486891 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGAATGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -4.136486891 GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA GGCCAACATGAGGTCAGCAGCCGCGAGGTTCATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
2 -4.1140778 CCAGCAGGAGGTCAGCAGTCGCGAAGTTGATCAGC CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -4.1140778 GCAGGAGGTCAGCAGTCACGAGCTTCATCAGCAGC GCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCAGC GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC
2 -4.1140778 GGGCAGCAGGAGGTCAGCAGTCACGAGGTTCATCA GGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
2 -4.1140778 GGCCAGCAGGAGGTCAGGAGTCGCGAAGTTCATCA GGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
2 -4.1140778 CCAGCAGGAGGTCAGCAGTCACCAGGTTCATCAGC CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -4.114030494 GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA GGCCAGCAGGAGGTAAGTAGCCGCGAGGTTCATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
2 -4.111681338 GCAGGAGGTCAGCAGTCGCGAGGTTCATCAGCAGC GCAGGAGGTCAGCAGCCGCGACGTTCAACAGCAGC GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC
2 -4.111681338 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCACGAGGTCAGCAGCCGCGAGGTTCATCTGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -4.092501265 CCAGCAGGAGGTCAGCAGTCGCGAGGTTAAACAGC CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -4.086850527 AGGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCAT AGGGACAGCAGGAGGTCATCAGCCGCGAGGTTCAT ATGAACCTCGCGACTGCTGACCTCCTGCTGGCCCT ATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCT ATGAACCTCGCGACTGCTGACCTCCTGCTGGCCCT ATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCT
2 -4.086850527 GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA GGCCAGCAGGAGGTCATCAGCCGCGAGGTTAATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
2 -4.064441436 GCCAGCAGGACGTCAGCAGTCGAGAGGTTCATCAG GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
2 -4.064441436 CCAGCAGGAGGTGAGCAGTCGAGAGGTTCATCAGC CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -4.064441436 GGGCCAGCAGGAGGTCAGCAGTTGCGAGGTTAATC GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
2 -4.064441436 CAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGCA AACCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCA TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG
2 -4.04883363 GGGTCAGCAGGAGGTCAGCAGTCGCGAGGTTCGTC GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
2 -4.03727578 CCGCAGGAAGTCAGCAGTCGCGAGGTTCATCAGCA CAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCA TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG
2 -4.025288176 GGGCCAGGAGGAGGTCAGCAGTCGCCAGGTTCATC GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
2 -4.025288176 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGGAGCTGAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -4.025076055 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCGGGAGGTAAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -4.025076055 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGGGGGTAAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -4.005496383 GCAGGAGGTCAGCTGTCACGAGGTTCATCAGCAGC GCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCAGC GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC
2 -3.728020518 GGGCCAGCAGGGCGTCAGCAGTCGCGAGGTTCATC GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
2 -3.656329843 GCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAG ACCAGCAGGAGGTCAGCACCCGCGAGGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
2 -3.488799552 CCAGCATGAGGTCAGCAGTCGCGAGGTTCATCATC CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -3.488799552 GCCAGCAGGAGGTCATCAGTCTCGAGGTTCATCAG GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
2 -3.471900562 GCCAGCAGGAGGTCAGCAGTCGCGATGTTCATCAA GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
2 -3.449638863 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGGAGGTCAGCATCCGCGAGGCTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -3.422264198 CCATAAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -3.422264198 CCAGCAGGAGGTCAGAATTCGCGAGGTTCATCAGC CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -3.399855107 CCAGAAGGAGGTAAGCAGTCGCGAGGTTCATCAGC CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -3.282696929 CAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGCA CAGCAGGAGGTCAGCATCGGCGAGGTTCATCAGCA TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG
3 -0.303730303 CAGGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCA CAGGGCCAGCAGGAGGTCAGCAGACGCGAGGTTCA TGAACCTCGCGACTGCTGACCTCCTGCTGGCCCTG TGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTG TGAACCTCGCGACTGCTGACCTCCTGCTGGCCCTG TGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTG
3 -0.303730303 GCAGGAGGTCAGCAGTCGCGAGGTTCATCAGCAGC GCAGGAGGTCAGCAGACGCGAGGTTCATCAGCAGC GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC
4 -0.303730303 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGGAGGTCAGCAGACGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -0.303730303 GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC GGGCCAGCAGGAGGTCAGCAGACGCGAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
5 -0.303730303 GCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAG GCCAGCAGGAGGTCAGCAGACGCGAGGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
9 -0.303730303 GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA GGCCAGCAGGAGGTCAGCAGACGCGAGGTTCATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
305 -0.094 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
219 -0.094 GCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAG GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
185 -0.094 GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA GGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
127 -0.094 GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
114 -0.094 GCAGGAGGTCAGCAGTCGCGAGGTTCATCAGCAGC GCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCAGC GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC
88 -0.094 AGGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCAT AGGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCAT ATGAACCTCGCGACTGCTGACCTCCTGCTGGCCCT ATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCT ATGAACCTCGCGACTGCTGACCTCCTGCTGGCCCT ATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCT
61 -0.094 CAGGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCA CAGGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCA TGAACCTCGCGACTGCTGACCTCCTGCTGGCCCTG TGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTG TGAACCTCGCGACTGCTGACCTCCTGCTGGCCCTG TGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTG
3 -0.094 GGAGGTCAGCAGTCGCGAGGTTCATCAGCAGCATG GGAGGTCAGCAGCCGCGAGGTTCATCAGCAGCATG CATGCTGCTGATGAACCTCGCGACTGCTGACCTCC CATGCTGCTGATGAACCTCGCGGCTGCTGACCTCC CATGCTGCTGATGAACCTCGCGACTGCTGACCTCC CATGCTGCTGATGAACCTCGCGGCTGCTGACCTCC
3 -0.094 CAGGAGGTCAGCAGTCGCGAGGTTCATCAGCAGCA CAGGAGGTCAGCAGCCGCGAGGTTCATCAGCAGCA TGCTGCTGATGAACCTCGCGACTGCTGACCTCCTG TGCTGCTGATGAACCTCGCGGCTGCTGACCTCCTG TGCTGCTGATGAACCTCGCGACTGCTGACCTCCTG TGCTGCTGATGAACCTCGCGGCTGCTGACCTCCTG
6 -0.094 CAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGCA CAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCA TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG
2 -0.094 CCAGGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTC CCAGGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTC GAACCTCGCGACTGCTGACCTCCTGCTGGCCCTGG GAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTGG GAACCTCGCGACTGCTGACCTCCTGCTGGCCCTGG GAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTGG
2 -0.047 GCCAGCAGGAGGTCAGTAGTCGCGAGGTTCATCAG GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
2 -0.047 GCCAGCAGTAGGTCAGCAGTCGCGAGGTTCATCAG GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
2 -0.047 GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC GGGCCAGCAGAAGGTCAGCAGCCGCGAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
3 -0.047 GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA GGCCAGCAGGAGGTCAGCCGCCGCGAGGTTCATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
2 -0.047 GCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAG GCCAGCAGGAGGTCAGCAGCCGCGACGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
2 -0.047 GCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAG GCCAGCAGGAGGTCAGCAGGCGCGAGGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
2 -0.047 GCCAGCAGGAGGTCAGCAGTCGCGAGTTTCATCAG GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
4 -0.047 GCCAGCAGGAGGTCAGCAGGCGCGAGGTTCATCAG GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
3 -0.047 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGGAGGTCAGCAGGCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
3 -0.047 GGGCCAGCAGGAGGTCAGCAGTCGCGAGGCTCATC GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
2 -0.047 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGGAGGTCAGGAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -0.047 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGGTGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -0.047 CCAGCAGGAGGGCAGCAGTCGCGAGGTTCATCAGC CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -0.047 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGAAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -0.047 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCACGAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
3 -0.047 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGGAGGTCAGCAGCCGCGAGTTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
3 -0.047 GCCAGGAGGAGGTCAGCAGTCGCGAGGTTCATCAG GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
2 -0.047 GCAGGAGGTCAGCAGTCGCGAGGTTCATCAGCAGC GCAGGAGGTCAGCAGCCGCGAGGTTCATCACCAGC GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGACTGCTGACCTCCTGC GCTGCTGATGAACCTCGCGGCTGCTGACCTCCTGC
3 -0.047 AGGGCCAGCAGGAGGTCAGCAGGCGCGAGGTTCAT AGGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCAT ATGAACCTCGCGACTGCTGACCTCCTGCTGGCCCT ATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCT ATGAACCTCGCGACTGCTGACCTCCTGCTGGCCCT ATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCT
2 -0.047 GGGCCAGCAGGAGGTCAGCAGTCGGGAGGTTCATC GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
2 -0.047 GGCCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
2 -0.047 GTCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAG GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
2 -0.047 GCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAG GCCAGCAGGAGGTCAGCAGCCGCGAGGTGCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
2 -0.047 GGACCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC GGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
2 -0.047 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGGAGGTCAGCCGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -0.047 AGGGCGAGCAGGAGGTCAGCAGTCGCGAGGTTCAT AGGGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCAT ATGAACCTCGCGACTGCTGACCTCCTGCTGGCCCT ATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCT ATGAACCTCGCGACTGCTGACCTCCTGCTGGCCCT ATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCT
2 -0.047 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCAGGAGGTCAGCAGCCGCGAGGTTGATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -0.047 CAGGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCA CAGGGCCAGCAGGAGGTCAGAAGCCGCGAGGTTCA TGAACCTCGCGACTGCTGACCTCCTGCTGGCCCTG TGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTG TGAACCTCGCGACTGCTGACCTCCTGCTGGCCCTG TGAACCTCGCGGCTGCTGACCTCCTGCTGGCCCTG
2 -0.047 CCAGCAGGAGGTCAGCCGTCGCGAGGTTCATCAGC CCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -0.047 GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCAGCA GGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
2 -0.047 CCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCAGC CCAGCCGGAGGTCAGCAGCCGCGAGGTTCATCAGC GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGACTGCTGACCTCCTGCTGG GCTGATGAACCTCGCGGCTGCTGACCTCCTGCTGG
2 -0.047 GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC GGGCCAGCAGGAGGTCAGCAGCCGCTAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
2 -0.047 GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA GGCCAGCAGGAGGTCAGCAGCCGCGATGTTCATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
2 -0.047 CAGCAGGAGGTCAGCAGTCCCGAGGTTCATCAGCA CAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCA TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG
2 -0.047 GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA GGCCAGCAGGAGGTCAGCAGGCGCGAGGTTCATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
2 -0.047 GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA GGCCAGCAGGATGTCAGCAGCCGCGAGGTTCATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
2 -0.047 GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCCTCA GGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
2 -0.047 GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA GGCCAGCAGGAGGTCAGCAACCGCGAGGTTCATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
2 -0.047 GGCCAGCAGGAGGTCAGCAGGCGCGAGGTTCATCA GGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
2 -0.047 CAGCAGGAGGTCAGCAGTCGCGAGGTTCATTAGCA CAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCA TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG
2 -0.047 AGCAGGAGGTCAGCTGTCGCGAGGTTCATCAGCAG AGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCAG CTGCTGATGAACCTCGCGACTGCTGACCTCCTGCT CTGCTGATGAACCTCGCGGCTGCTGACCTCCTGCT CTGCTGATGAACCTCGCGACTGCTGACCTCCTGCT CTGCTGATGAACCTCGCGGCTGCTGACCTCCTGCT
2 -0.047 GGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA GACCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
2 -0.047 GGGCAGCAGGAGGTCAGCAGTCGCGAGGTTCATCA GGCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCA TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGACTGCTGACCTCCTGCTGGCC TGATGAACCTCGCGGCTGCTGACCTCCTGCTGGCC
2 -0.047 GGGCCAGCAGGAGGTCAGCAGTCGCGAGGTTCATC GGGCCACCAGGAGGTCAGCAGCCGCGAGGTTCATC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGACTGCTGACCTCCTGCTGGCCC GATGAACCTCGCGGCTGCTGACCTCCTGCTGGCCC
3 0.164230303 CAGCAGGAGGTCAGCAGACGCGAGGTTCATCAGCA CAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAGCA TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGACTGCTGACCTCCTGCTG TGCTGATGAACCTCGCGGCTGCTGACCTCCTGCTG
2 0.209730303 GCCAGCAGGAGGTCAGCAGACGCGAGGTTCATCAG GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC
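For readers who want to analyze rows like the ones above outside the CLI, the following is a minimal sketch of one way to load them. It assumes each row is whitespace-delimited with an integer, a numeric value, and six equal-length sequences, as the rows above appear to be. The names `SolutionRow`, `count`, `score`, `sequences`, `parse_row`, and `hamming` are placeholders chosen for illustration; they are not the tool's own column names or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SolutionRow:
    count: int            # placeholder name for the leading integer field
    score: float          # placeholder name for the second, numeric field
    sequences: List[str]  # the six sequence fields, kept in file order


def parse_row(line: str) -> SolutionRow:
    """Split one whitespace-delimited row into typed fields."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    return SolutionRow(int(fields[0]), float(fields[1]), fields[2:])


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Example: the last row listed above.
example = (
    "2 0.209730303 "
    "GCCAGCAGGAGGTCAGCAGACGCGAGGTTCATCAG "
    "GCCAGCAGGAGGTCAGCAGCCGCGAGGTTCATCAG "
    "CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC "
    "CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC "
    "CTGATGAACCTCGCGACTGCTGACCTCCTGCTGGC "
    "CTGATGAACCTCGCGGCTGCTGACCTCCTGCTGGC"
)
row = parse_row(example)
print(row.score, hamming(row.sequences[0], row.sequences[1]))
```

Keeping the sequences in a list rather than in named attributes avoids guessing what each column represents; once the column headers are known, the list can be unpacked into named fields.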

Demo

Finally, we produced a demonstration video to guide users through the CLI tool. The video is available on our 'Michigan iGEM' YouTube page. It assumes minimal knowledge of command-line interfaces and walks users through every step of the simulation: downloading the CLI, running simulations, and analyzing the generated solutions.

References

  1. Hyman, L. B., Christopher, C. R., & Romero, P. A. (2022). Competitive SNP-LAMP probes for rapid and robust single-nucleotide polymorphism detection. Cell Reports Methods, 2(7), 100242. https://doi.org/10.1016/j.crmeth.2022.100242
  2. Tulpan, D., Andronescu, M., & Leger, S. (2010). Free energy estimation of short DNA duplex hybridizations. BMC Bioinformatics, 11(1). https://doi.org/10.1186/1471-2105-11-105
  3. de Bruin, D., Bossert, N., Aartsma-Rus, A., & Bouwmeester, D. (2018). Measuring DNA hybridization using fluorescent DNA-stabilized silver clusters to investigate mismatch effects on therapeutic oligonucleotides. Journal of Nanobiotechnology, 16(1). https://doi.org/10.1186/s12951-018-0361-2
  4. Sun, Y., Aljawad, O., Lei, J., & Liu, A. (2012). Genome-scale ncRNA homology search using a Hamming distance-based filtration strategy. BMC Bioinformatics, 13(S3). https://doi.org/10.1186/1471-2105-13-s3-s12
  5. Maucec, M., Kacic, Z., & Horvat, B. (2004). Modelling highly inflected languages. Information Sciences, 166(1–4), 249–269. https://doi.org/10.1016/j.ins.2003.12.004
  6. Lundberg, S., & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. NIPS. https://papers.nips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf
  7. Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., & Lee, S.-I. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 56–67. https://doi.org/10.1038/s42256-019-0138-9
  8. (May 2023). Random Forest Pruning Techniques: A Recent Review. Operations Research Forum. https://advance-lexis-com.proxy.lib.umich.edu/api/document?collection=news&id=urn:contentItem:689F-6MT1-F0C0-31MJ-00000-00&context=1516831
  9. Carr, J. (2014). An Introduction to Genetic Algorithms. ts, Walla, Washington.
  10. Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del Río, J. F., Wiebe, M., Peterson, P., … Oliphant, T. E. (2020). Array programming with NumPy. Nature, 585(7825), 357–362. https://doi.org/10.1038/s41586-020-2649-2
  11. Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95. https://doi.org/10.1109/mcse.2007.55
  12. Waskom, M. (2021). Seaborn: Statistical data visualization. Journal of Open Source Software, 6(60). https://doi.org/10.21105/joss.03021
  13. Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., ... & Varoquaux, G. (2013). API design for machine learning software: experiences from the scikit-learn project. arXiv preprint arXiv:1309.0238.
  14. Sklearn. (n.d.). Sklearn.preprocessing.QuantileTransformer. scikit-learn. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html
  15. Oda, M., & Nakamura, H. (2000). Thermodynamic and kinetic analyses for understanding sequence-specific DNA recognition. Genes to Cells, 5(5), 319–326. https://doi.org/10.1046/j.1365-2443.2000.00335.x