We calculated a distance matrix for the collected homologous sequences and, as an evaluation metric, extracted the distance values only for the sequences the gRNA fits. The magnitude of this distance-score matters, but so does its significance: does a given distance-score actually reflect descent from a shared common ancestor? Inspired by methods for estimating the confidence of branches in phylogenetic trees, we divided the distance-matrix values into two groups, those for sequences the gRNA fits and those for sequences it does not, and statistically tested whether the two distributions differ. If the distance distributions differ significantly between the two groups, one can postulate that the difference arises because one group evolved from a common ancestor while the other did not.
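The grouping step can be sketched as follows. This is a minimal illustration under assumptions the text does not state: a gRNA "fits" a sequence when its target occurs verbatim on either strand (PAM and mismatches are ignored), and a distance belongs to the "fits" group when both sequences of the pair fit, with every other pair going to the "does not fit" group. The helper names `revcomp`, `grna_fits`, and `split_distances` are hypothetical.

```python
from itertools import combinations

# Complement table for uppercase DNA; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def grna_fits(seq: str, grna: str) -> bool:
    """ASSUMED fit criterion: the gRNA target occurs verbatim on either strand.
    Alignment gaps are removed first; PAM and mismatches are ignored."""
    s = seq.upper().replace("-", "")
    g = grna.upper()
    return g in s or revcomp(g) in s


def split_distances(names, dist, seqs, grna):
    """Split the upper triangle of `dist` (indexed like `names`) into
    inside  = pairs where BOTH sequences fit the gRNA, and
    outside = every other pair (hypothetical grouping rule)."""
    fits = {n for n in names if grna_fits(seqs[n], grna)}
    inside, outside = [], []
    for i, j in combinations(range(len(names)), 2):
        if names[i] in fits and names[j] in fits:
            inside.append(dist[i][j])
        else:
            outside.append(dist[i][j])
    return inside, outside


if __name__ == "__main__":
    names = ["a", "b", "c"]
    seqs = {"a": "GGTCTTTAAGCGATAATTATACGG",
            "b": "TCTTTAAGCGATAATTATAC",
            "c": "ACGTACGT"}
    dist = [[0.0, 0.01, 0.2], [0.01, 0.0, 0.3], [0.2, 0.3, 0.0]]
    print(split_distances(names, dist, seqs, "TCTTTAAGCGATAATTATAC"))
    # ([0.01], [0.2, 0.3])
```

Whether the second group should instead contain only pairs in which neither sequence fits is a design choice the text leaves open; changing the `else` branch is enough to switch rules.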
The distance matrix was computed from an alignment file of sequences gathered by a BLAST search targeting the core glucosylase gene.
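The text does not name the distance measure. As an assumption, a simple p-distance (the fraction of differing sites, skipping columns where either sequence has a gap) over a FASTA alignment could be computed like this; `read_fasta`, `p_distance`, and `distance_matrix` are hypothetical helpers, and the matrix actually used may come from a different substitution model.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {name: sequence}; name is the header's first word."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def p_distance(a: str, b: str) -> float:
    """ASSUMED metric: proportion of differing sites between two aligned
    sequences, ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else float("nan")


def distance_matrix(aligned: dict[str, str]):
    """Symmetric matrix of pairwise p-distances, rows ordered like `names`."""
    names = list(aligned)
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = p_distance(aligned[names[i]], aligned[names[j]])
            dist[i][j] = dist[j][i] = d
    return names, dist
```

For example, `distance_matrix(read_fasta(">s1\nACGT-A\n>s3\nTCGA-A\n"))` compares five ungapped columns, two of which differ, giving a distance of 0.4.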
Results for one gRNA candidate:
Contains substring (gRNA fits):
  Mean: 0.001551310394346494
  Standard deviation: 0.0033599778116293237
Does not contain substring (gRNA does not fit):
  Mean: 0.0060427413411937875
  Standard deviation: 0.008760222510708869
F-test for variance: statistic = 57.61857103415576, p-value = 8.626561392219624e-14
t-test for means: statistic = -3.386874224556088, p-value = 0.0014825895221861304
Results for another gRNA candidate:
Contains substring (gRNA fits):
  Mean: 0.00019916747993386802
  Standard deviation: 0.0005428100492883209
Does not contain substring (gRNA does not fit):
  Mean: 0.019962965062449198
  Standard deviation: 0.01930261649221101
F-test for variance: statistic = 691.6574467531636, p-value = 3.1831150037514676e-108
t-test for means: statistic = -8.984205877104541, p-value = 1.284330533959997e-13
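Summary statistics of this kind can be computed with the Python standard library. The sketch below assumes the t-test is Welch's (unequal variances), which the text does not state; `summarize` and `welch_t` are hypothetical names. P-values additionally need the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. Note that `statistics.stdev` uses the n - 1 denominator while `numpy.std` defaults to n, so recomputed standard deviations can differ slightly depending on the convention used.

```python
import math
import statistics


def summarize(xs):
    """Mean and sample standard deviation (n - 1 denominator)."""
    return statistics.mean(xs), statistics.stdev(xs)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples whose variances may differ."""
    na, nb = len(a), len(b)
    sa = statistics.variance(a) / na  # squared standard error of mean(a)
    sb = statistics.variance(b) / nb  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df
```

Passing the "fits" distances as `a` gives a negative statistic whenever that group has the smaller mean distance, which matches the sign of both reported t statistics.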
For many gRNA candidates, we rejected the null hypothesis that "the distance distributions of the population the gRNA fits and the population it does not fit are indistinguishable from those of a random selection." In both results above, the sequences containing the target substring have a smaller mean distance and a smaller standard deviation than those that do not, and every p-value is below 0.01. For instance, consider the average DNA distance-score for TCTTTAAGCGATAATTATAC. If the distance distribution differs between the population a gRNA fits and the population it does not, this suggests that the gRNA fits a population that evolved from a common ancestor.
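The null hypothesis as phrased, that the fitting population is indistinguishable from a random selection, can also be tested directly with a permutation test. This is a swapped-in technique, not the F-test and t-test reported above: it repeatedly draws random sets of sequences of the same size as the fitting set and counts how often their within-set distances are at least as much smaller than the remaining distances as observed. Because it permutes sequence labels rather than individual distances, it accounts for the fact that pairwise distances sharing a sequence are not independent. The grouping rule (both sequences in the set versus all other pairs) and the names `split` and `label_permutation_test` are hypothetical.

```python
import math
import random
from itertools import combinations


def _mean(xs):
    return math.fsum(xs) / len(xs)


def split(dist, members):
    """Hypothetical grouping rule: distances for pairs with both indices in
    `members` go to `inside`; all other upper-triangle pairs go to `outside`."""
    inside, outside = [], []
    for i, j in combinations(range(len(dist)), 2):
        if i in members and j in members:
            inside.append(dist[i][j])
        else:
            outside.append(dist[i][j])
    return inside, outside


def label_permutation_test(dist, fit_idx, n_perm=5000, seed=0):
    """One-sided permutation test on sequence labels.

    Statistic: mean(outside) - mean(inside), i.e. how much closer the fitting
    sequences are to each other than the remaining pairs are.
    Null: the fitting set behaves like a random set of the same size."""
    n, k = len(dist), len(fit_idx)
    if not 2 <= k < n:
        raise ValueError("need at least 2 fitting and 1 non-fitting sequence")
    rng = random.Random(seed)
    inside, outside = split(dist, set(fit_idx))
    observed = _mean(outside) - _mean(inside)
    hits = 0
    for _ in range(n_perm):
        members = set(rng.sample(range(n), k))
        ins, outs = split(dist, members)
        if _mean(outs) - _mean(ins) >= observed:
            hits += 1
    # Add-one correction: the observed labelling counts as one permutation,
    # so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Each permutation costs O(n^2) for n sequences, so a few thousand permutations remain cheap for alignments of a few hundred homologs. The smallest attainable p-value is bounded by the number of distinct subsets, which matters when only a handful of sequences are available.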