Microcystis aeruginosa is a naturally competent [1], non-model cyanobacterium that is principally responsible for toxic harmful algal blooms (HABs) in freshwater. M. aeruginosa secretes microcystin, a potent hepatotoxin and allelopathic inhibitor that threatens the utility and biodiversity of freshwater ecosystems globally [2]. Initially, our team sought to leverage the natural competence of M. aeruginosa to selectively disrupt the production of microcystin in transformed cells. However, to disrupt microcystin production at the scale of a HAB, we recognized a greater need to advance genetic tractability in non-model organisms like M. aeruginosa. Restriction-modification (R-M) systems are a major barrier to the domestication of non-model species because their restriction enzymes target short nucleotide sequences in incoming DNA [3]. We hypothesized that transformation efficiency in non-model organisms could be improved by avoiding species-specific restriction sites during plasmid synthesis. To this end, we developed the Chameleon project. Our work builds on the Stealth program [4], which predicts putative restriction sites by identifying short nucleotide sequences that are underrepresented in a specific organism's genome. Chameleon removes putative restriction sites from protein-coding regions of plasmids through synonymous codon optimization. The following DBTL cycles were aimed at validating the use of Chameleon in future bioengineering projects by transforming M. aeruginosa with modified and unmodified plasmids.
Design: M. aeruginosa is naturally competent [1]. Natural transformation involves localizing naked plasmid DNA and cells on a millipore filter over solid, nonselective media (see Journal: Protocols). After 1 day, the filter is transferred to solid, selective media to isolate natural transformants. This method does not form distinct colony forming units (CFU); instead, transformed cells form a lawn over the millipore filter. Because we could not quantify transformation efficiency as CFU per ug of DNA plated with this method, we designed a protocol based on serial plasmid dilutions. By plating a constant cell density with a different plasmid dilution on each millipore filter, we expected to quantify transformation efficiency as the lowest plasmid concentration needed to form a visible lawn. We reasoned that if transformation efficiency improved with a modified plasmid, then the modified plasmid should form a visible lawn at a lower concentration than the unmodified plasmid.
Build: For this purpose, we ordered pSHDY (Addgene #137661): a broad-host-range, high-copy plasmid that has been engineered for expression in cyanobacteria. We performed three cycles of natural transformation with M. aeruginosa aimed at identifying the concentration of pSHDY that would not form a visible lawn. We based our stock pSHDY concentration (50 ng/uL) on a protocol devised by a former iGEM team (SCCHK 2019), and made twenty 1:10 serial dilutions of the stock in our final cycle.
Test: We plated 100 uL of cells (OD730 = 0.4) and 40 uL of plasmid on each membrane filter, numbered 1-20, on BG11 plates with 50 ug/mL chloramphenicol (Fig. 1). Filter 1 was plated with 40 uL of pSHDY stock, and filter 20 was plated with 40 uL of a 10^-20 dilution of pSHDY stock. The plate labeled + control shows the effect of plating 40 uL of stock with 100 uL or 10 uL of cells, while the plate labeled - control shows the effect of plating 100 uL or 10 uL of cells without plasmid.
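The amount of plasmid reaching each filter follows from the stock concentration, the plated volume, and the dilution factor. The sketch below works through that arithmetic. The plasmid length of 10,000 bp is a placeholder assumption, not the actual size of pSHDY, and 650 g/mol per base pair is the usual approximation for double-stranded DNA, so the copy numbers are order-of-magnitude estimates only.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # approximate average mass of one dsDNA base pair

STOCK_NG_PER_UL = 50.0       # stock pSHDY concentration (from the text)
VOLUME_UL = 40.0             # plasmid volume plated per filter (from the text)
PLASMID_LENGTH_BP = 10_000   # ASSUMPTION: placeholder, not the real pSHDY length


def plasmid_mass_ng(dilution_exponent: int) -> float:
    """Mass of plasmid (ng) on a filter plated with a 10^-n dilution of stock."""
    return STOCK_NG_PER_UL * VOLUME_UL * 10.0 ** (-dilution_exponent)


def plasmid_copies(dilution_exponent: int,
                   length_bp: int = PLASMID_LENGTH_BP) -> float:
    """Expected number of plasmid molecules on the filter at that dilution."""
    mass_g = plasmid_mass_ng(dilution_exponent) * 1e-9
    moles = mass_g / (length_bp * BP_MASS_G_PER_MOL)
    return moles * AVOGADRO


if __name__ == "__main__":
    for n in (0, 5, 10, 15, 20):
        print(f"10^-{n}: {plasmid_mass_ng(n):.3e} ng, "
              f"{plasmid_copies(n):.3e} copies")
```

Under these assumptions, the expected number of plasmid molecules per filter drops below one at around the 10^-12 dilution, so the most dilute filters would be expected to carry essentially no plasmid.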
Learn: We observed a lawn of cells at every dilution of pSHDY at the time of taking this image, which was 10 days after transferring the filters to selective media. We recognized that a 10^-20 dilution of pSHDY stock should not have yielded such a robust natural transformation of M. aeruginosa, and that the no-plasmid control exhibited more growth than we would expect on selective media. We attributed these results to the bacteriostatic mechanism of action of chloramphenicol [5]: chloramphenicol inhibits bacterial growth rather than killing bacteria directly. The duration of the M. aeruginosa cell cycle is about 3-4 days, so we reasoned that chloramphenicol selection would not be effective for quantifying transformation efficiency by plating serial plasmid dilutions. After 3 natural transformation cycles, our team concluded that a serial plasmid dilution with chloramphenicol selection was not a viable assay for comparing transformation efficiency between modified and unmodified plasmids in M. aeruginosa, especially given our project's time constraints. Moving forward, we decided to transform M. aeruginosa by electroporation because the protocol had been validated by a previous iGEM team (SCCHK 2019). This approach allows for chloramphenicol selection in liquid cultures, so we decided to quantify transformation efficiency by comparing hemocytometer counts of untransformed cultures to those of cultures transformed with modified and unmodified plasmid. We hypothesized that cultures transformed with modified plasmid would have greater hemocytometer counts than those transformed with unmodified plasmid.
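Hemocytometer counts convert to cell densities with a fixed chamber factor. The following sketch assumes a standard improved Neubauer chamber, in which each large square holds 1e-4 mL. The counts are invented for illustration and are not measured data, and the function names are hypothetical.

```python
from statistics import mean

# Volume over one large square of an improved Neubauer chamber:
# 1 mm x 1 mm x 0.1 mm = 0.1 uL = 1e-4 mL (ASSUMPTION: standard chamber).
SQUARE_VOLUME_ML = 1e-4


def cells_per_ml(square_counts, dilution_factor=1.0):
    """Cell density from per-square counts and the sample dilution factor."""
    if not square_counts:
        raise ValueError("at least one square count is required")
    return mean(square_counts) * dilution_factor / SQUARE_VOLUME_ML


def fold_change(treated_counts, reference_counts, dilution_factor=1.0):
    """Ratio of two culture densities counted at the same dilution."""
    treated = cells_per_ml(treated_counts, dilution_factor)
    reference = cells_per_ml(reference_counts, dilution_factor)
    return treated / reference


if __name__ == "__main__":
    # Hypothetical counts from four large squares per culture.
    modified = [52, 48, 55, 49]
    unmodified = [31, 28, 35, 30]
    print(f"modified:   {cells_per_ml(modified):.2e} cells/mL")
    print(f"unmodified: {cells_per_ml(unmodified):.2e} cells/mL")
    print(f"ratio:      {fold_change(modified, unmodified):.2f}")
```

Comparing ratios rather than raw counts keeps the comparison independent of the dilution used for counting, provided both cultures are counted at the same dilution.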
Design: In order to apply Stealth to M. aeruginosa and to generate a plasmid modified by the Chameleon project, it was necessary to sequence the genome of the UTEX 2385 M. aeruginosa strain. Several coding sequences of this strain had been published, but a complete genome was not available. To assemble a full genome, we designed protocols to extract and purify M. aeruginosa DNA to conduct Nanopore sequencing [6].
We decided to use a MinION device from Oxford Nanopore along with R9.4.1 flow cells and the SQK-LSK112 DNA library prep kit. Our initial materials and methodology were heavily influenced by available resources.
Build:
DNA Extraction:
To extract Microcystis DNA, it was necessary to lyse the cell without compromising the integrity of the DNA. The UTEX 2385 cell wall consists of peptidoglycan and lipopolysaccharide layers. We believed that enzymatic digestion of peptide bonds would be effective in gently lysing the cell without fragmenting the DNA. Following extraction, the DNA was purified through ethanol precipitation (see Journal: Protocols).
Sequencing:
The SQK-LSK112 kit was used to prepare a library, which was loaded onto two R9.4.1 flow cells. Due to reduced pore occupancy in one of our flow cells, we ran two MinION devices in parallel, aiming to obtain more comprehensive data for genome assembly. The minimum read length considered for both runs was set to 1000 bp. Reads were collected over 72 hours, and we performed a wash and reload step after the first 24 hours to improve pore efficiency. Raw signals collected from the MinION were translated into nucleotide sequences using the basecalling software Dorado 0.3.2 [7].
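A minimum read length can also be applied after basecalling. The sketch below is a plain-Python illustration of a 1000 bp length filter and an N50 summary on four-line FASTQ records. It is not the MinKNOW or Dorado implementation, and the function names and toy reads are hypothetical.

```python
import io
from typing import Iterable, Iterator, List, TextIO, Tuple

MIN_READ_LENGTH = 1000  # bp, the threshold stated in the text

Record = Tuple[str, str, str]  # (name, sequence, quality)


def parse_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return                      # end of file
        if not header.strip():
            continue                    # skip blank lines between records
        seq = handle.readline().strip()
        handle.readline()               # '+' separator line
        qual = handle.readline().strip()
        yield header.strip()[1:], seq, qual


def filter_by_length(records: Iterable[Record],
                     min_length: int = MIN_READ_LENGTH) -> List[Record]:
    """Keep only reads at least min_length bases long."""
    return [r for r in records if len(r[1]) >= min_length]


def n50(lengths: Iterable[int]) -> int:
    """Length L such that reads of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    return 0


if __name__ == "__main__":
    toy = ("@read_a\n" + "A" * 1500 + "\n+\n" + "I" * 1500 + "\n"
           "@read_b\n" + "C" * 400 + "\n+\n" + "I" * 400 + "\n")
    kept = filter_by_length(parse_fastq(io.StringIO(toy)))
    print([name for name, _, _ in kept], n50(len(s) for _, s, _ in kept))
```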
Assembly:
Following base-calling, we utilized Flye 2.9.2 [8] to assemble and align the reads, and Bandage 0.9.0 [9] to visualize the complete assembly.
Test: To validate that the resulting genomic assembly represented M. aeruginosa, we used RNAmmer [10] to extract genes that encode 16S ribosomal RNA. We identified 11 contigs that contained 16S ribosomal RNA genes. These contigs were then compared to sequences in the rRNA/ITS databases using NCBI BLASTN.
Learn: Of the 11 contigs that contained ribosomal RNA genes, 3 had direct hits to M. aeruginosa. Interestingly, only 2 of these 3 contigs made it into the Flye assembly; these are depicted in dark orange in Figure 1. The remaining contigs had hits to the genera Gemmatimonas, Blastomonas, Hydrogenophaga and Stenotrophobacter, which are colored in gray in Figure 1. While we could verify the identity of 2 contigs in the M. aeruginosa assembly, the full assembly was composed of 7 contigs. It was clear that we needed a definitive method to distinguish contigs that truly belonged to M. aeruginosa to ensure confidence in the sequence we used for the Stealth analysis. Ultimately, this analysis confirmed that our UTEX 2385 culture was xenic and that we successfully extracted M. aeruginosa DNA. However, our efforts only resulted in a partial assembly of the UTEX 2385 genome: by combining data from both runs, we assembled approximately 4 megabases of the expected 6 megabases of the M. aeruginosa genome using Flye [8].
Design: To verify that the Nanopore-generated contig assembly belonged to M. aeruginosa, we utilized Cluster-K: a novel software tool written by our principal investigator that groups genetic sequences on a three-dimensional graph by considering their unique tetramer signature (Fig. 1). Cluster-K samples all of the tetramers that exist in a genetic sequence and constructs a frequency matrix composed of 256 elements (4^4, one per possible tetramer). The frequency matrix generated for any given sequence is characteristic of that sequence, and in some sense can be conceived as a genetic signature. In addition to the frequency matrix, Cluster-K considers GC content and codon usage to determine a sequence's signature, and signatures become more distinct as the length of the sequence increases. From the 256 elements that describe a collection of sequences, Cluster-K graphically depicts the three dimensions that capture the greatest variance within the collection. The software employs Principal Component Analysis, a dimension-reduction method that projects the data onto the directions of greatest variance, and outputs a three-dimensional diagram with clustered contigs. These results can indicate whether a given group of contigs belongs to the same species.
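The idea of a tetramer signature can be illustrated in a few lines. The sketch below is not Cluster-K's code, which is not shown in this section. It only counts overlapping tetramers, normalizes them to frequencies, and ranks the 256 tetramers by their variance across a set of sequences. GC content, codon usage, and the PCA projection are deliberately left out, and the toy contigs are invented.

```python
from itertools import product
from statistics import pvariance

# All 4^4 = 256 possible tetramers, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {t: i for i, t in enumerate(TETRAMERS)}


def tetramer_frequencies(sequence):
    """Return a 256-element vector of overlapping tetramer frequencies."""
    seq = sequence.upper()
    counts = [0] * len(TETRAMERS)
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])   # windows with N or other symbols are skipped
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(TETRAMERS)


def top_variance_tetramers(signatures, n=3):
    """Names of the n tetramers whose frequency varies most across sequences."""
    variances = [pvariance(column) for column in zip(*signatures)]
    ranked = sorted(range(len(TETRAMERS)), key=lambda i: variances[i],
                    reverse=True)
    return [TETRAMERS[i] for i in ranked[:n]]


if __name__ == "__main__":
    # Hypothetical toy "contigs" with different compositions.
    contigs = {
        "contig_a": "ATATATTTAAATATTAATTA" * 20,
        "contig_b": "GCGCGGCCGCGGGCCCGCGC" * 20,
        "contig_c": "ATGCATGCCGTAGCTAGCAT" * 20,
    }
    signatures = [tetramer_frequencies(s) for s in contigs.values()]
    print(top_variance_tetramers(signatures))
```

Ranking single tetramers by variance is a simplification of PCA, which would instead combine all 256 frequencies into new axes.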
Build: We employed Cluster-K to analyze the assembled contigs, resulting in the generation of a 3D graph that revealed eleven distinct clusters of correlated contigs (Figure 1).
Test: We compared the Cluster-K output to the Bandage-visualized Flye assembly to validate contigs that belong to the M. aeruginosa genome.
Learn: We determined that the Cluster-K data reinforces the Bandage-visualized Flye assembly (Figure 1, 2) and identifies contigs that are potentially part of the M. aeruginosa gDNA but do not contain a 16S rRNA gene, such as contigs 7, 10, 12, 18, 198, and 199 (Figure 3).
Design: Starting from the pSHDY plasmid (Addgene #137661), we added an insert in silico containing the CaMV35S core promoter, the eGFP gene, the T7 terminator, and an RP4 mobilization gene; the insert is depicted in dark red in Figure 1. We named the new plasmid pSPDY and designed fragments to assemble two pSPDY constructs using Golden Gate: one that was modified by the Chameleon project to remove putative restriction sites (Fig. 2), and one that was unmodified (Fig. 1). The modified and unmodified pSPDY plasmids were identical in all non-protein-coding regions. Protein-coding regions on the modified plasmid were codon-optimized by Chameleon for codon usage in our M. aeruginosa UTEX 2385 partial genome assembly. Additionally, Chameleon removed putative, Stealth-identified restriction sites from protein-coding regions through synonymous codon optimization. This process required that modified pSPDY be assembled from IDT gBlocks gene fragments with Golden Gate, so we designed flagged primers to introduce ligation sites compatible with the PaqCI Type IIS restriction enzyme. We also designed sequencing primers that would amplify across each ligation site to verify Golden Gate assembly.
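Two routine checks apply to any Golden Gate design of this kind: fragments should contain no internal recognition site for the Type IIS enzyme, and the four-base fusion sites should be unique, non-palindromic, and not reverse complements of one another. The sketch below illustrates both checks, assuming the PaqCI recognition sequence CACCTGC. The fragment and overhang sequences are hypothetical and are not the ones used for pSPDY.

```python
PAQCI_SITE = "CACCTGC"  # PaqCI recognition sequence (cuts outside the site)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(fragment, site=PAQCI_SITE):
    """Return sorted (position, strand) pairs for the site on either strand."""
    fragment = fragment.upper()
    hits = []
    for strand, pattern in (("+", site), ("-", reverse_complement(site))):
        pos = fragment.find(pattern)
        while pos != -1:
            hits.append((pos, strand))
            pos = fragment.find(pattern, pos + 1)
    return sorted(hits)


def check_overhangs(overhangs):
    """Return a list of human-readable problems with a set of fusion sites."""
    problems = []
    seen = set()
    for overhang in overhangs:
        overhang = overhang.upper()
        rc = reverse_complement(overhang)
        if overhang == rc:
            problems.append(f"{overhang} is palindromic")
        elif overhang in seen:
            problems.append(f"{overhang} is duplicated")
        elif rc in seen:
            problems.append(f"{overhang} is the reverse complement "
                            "of an earlier overhang")
        seen.add(overhang)
    return problems


if __name__ == "__main__":
    # Hypothetical fragment and fusion sites, for illustration only.
    print(find_sites("ATGGCACACCTGCTTGACGT"))
    print(check_overhangs(["AATG", "GCTT", "CGAA", "CATT"]))
```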
Build: gBlocks Gene Fragments were ordered from IDT for each of the Golden Gate fragments to be assembled. Since these gene fragments did not include PaqCI binding sites and fusion sites, they were first amplified with the flagged primers. The resulting amplicons were the fragments used in the Golden Gate reaction. Plasmids were assembled from these fragments by Golden Gate assembly, then transformed into TOP10 E. coli and plated on chloramphenicol selective media for replication before miniprep. Positive and negative controls were run alongside this process: the negative control was untransformed TOP10 E. coli, and the positive control was TOP10 E. coli transformed with pSHDY, which confers resistance to chloramphenicol just as the pSPDY plasmids would and was already known to be functional in E. coli.
Test: Putative pSPDY E. coli transformants were transferred to liquid cultures and viewed under a fluorescence microscope to check for eGFP expression. This check was necessary, particularly for putative unmodified pSPDY transformants, because the original pSHDY plasmid was used as a template in one of the PCR reactions preceding Golden Gate assembly. Observed transformants could therefore have carried pSHDY rather than pSPDY, and only pSPDY transformants would express eGFP. To the same end, plasmid from putative pSPDY transformants was miniprepped. The miniprepped plasmid was run directly on an agarose gel against pSHDY to compare plasmid sizes (pSPDY was expected to be larger due to the insert) and was also used as the template in PCR reactions with the sequencing primers to verify each of the Golden Gate ligations.
Learn: Successful Golden Gate assembly and transformation into TOP10 E. coli of our pSPDY plasmids required several attempts. Thanks to our controls, sequencing primers, agarose gels, fluorescence microscopy, and submission of miniprepped plasmids for Sanger sequencing, we were able to identify the cause of most of these failures. In some cases, our cells were not competent, which was made evident by the absence of growth in the positive control. In other cases, the pSHDY template plasmid was carried through PCR and Golden Gate assembly and was transformed into TOP10 E. coli instead of pSPDY. We detected this through the lack of fluorescence, the absence of expected amplicon bands on agarose gels following PCR with sequencing primers, and unexpectedly small plasmid bands on agarose gels of miniprepped plasmid. For unclear reasons, incorrect Golden Gate assembly occurred in at least one case, in which Sanger sequencing revealed that the fragments had assembled in an unintended order.
Design: The design phase of the Chameleon project began after we identified the need for a solution to the barrier imposed by restriction-modification systems, a need that reaches past our project goals and into the goals of our field of research. At its core, the project is integrated with Stealth, a program developed and published by our PI, David Bernick. Stealth yields putative restriction sites that, if avoided in a genetic construct, enhance transformation efficiency in non-model organisms [4]. Building on this advancement, we set out to develop an automated software pipeline that would take nothing more than a genomic sequence of the intended host (partial or complete) and a plasmid annotated as a GenBank record, and optimize the plasmid for transformation into the intended host. We prioritized efficiency, simplicity, and accessibility throughout software design.
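As a rough illustration of what searching for underrepresented palindromic motifs can look like, the sketch below compares observed k-mer counts with the count expected from base composition alone and keeps palindromic k-mers that fall well below expectation. This is a deliberately simplified stand-in and is not Stealth's statistical model; the thresholds `max_ratio` and `min_expected`, the function names, and the toy genome are all assumptions.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


def underrepresented_palindromes(genome, k=6, max_ratio=0.2, min_expected=5.0):
    """Return (kmer, observed, expected) for depleted palindromic k-mers.

    Expected counts assume independent bases (a zero-order model), which is
    an ASSUMPTION made for simplicity here.
    """
    genome = genome.upper()
    n_windows = len(genome) - k + 1
    if n_windows <= 0:
        return []
    base_freq = {b: genome.count(b) / len(genome) for b in "ACGT"}
    observed = Counter(genome[i:i + k] for i in range(n_windows))
    hits = []
    for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
        if not is_palindromic(kmer):
            continue
        expected = float(n_windows)
        for base in kmer:
            expected *= base_freq[base]
        # Skip k-mers too rare by chance for their absence to be informative.
        if expected >= min_expected and observed[kmer] / expected <= max_ratio:
            hits.append((kmer, observed[kmer], expected))
    return sorted(hits, key=lambda h: h[1] / h[2])


if __name__ == "__main__":
    toy_genome = "ACGT" * 500   # hypothetical toy input, not a real genome
    for kmer, obs, exp in underrepresented_palindromes(toy_genome, k=4):
        print(kmer, obs, round(exp, 1))
```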
Build: To meet our aim, we identified a systematic approach toward plasmid optimization. Our pipeline starts with the Stealth program, which identifies underrepresented sequences in any given genome and saves those that are palindromic (see Software). The Chameleon project developed by our team predicts open reading frames (ORFs) and derives a codon usage frequency profile from the host's genome. Concurrently, Chameleon analyzes the input GenBank plasmid record to identify the regions of the construct that may be altered, i.e., non-overlapping protein-coding regions. We refer to this process as "unpacking"; it is an essential step in the pipeline because it avoids introducing nucleotide changes that alter plasmid function. We recognized that codon usage is uniquely skewed in different organisms, so we remove putative restriction sites through stochastic, synonymous codon optimization with respect to the host's endogenous codon usage. In this way, we ensure that we edit plasmids in a way that mirrors the codon frequency statistics of the host organism. Coding sequences must retain their reading frames during substitution with synonymous codons. A sliding window approach is employed to achieve this, acting in conjunction with a method that generates every possible codon permutation. The window is sized to handle the longest motif to be screened and no larger, which preserves the stochastic usage of codons. Codon permutations are then filtered and collapsed based on the number of motifs present. The remaining permutations are selected on the basis of codon usage frequencies and deviation from the seed, which is minimized to maintain a codon distribution that reflects the host's tRNA pool. The project was built to contain several proofreading functions to ensure that computation and processing proceed as intended.
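The sliding-window substitution described above can be sketched in a simplified form. The code below is an illustration under stated assumptions and is not Chameleon's implementation: it uses the standard genetic code, treats the input as a single in-frame coding sequence, accepts an optional `usage` dictionary of host codon weights (hypothetical values), and omits the seed-deviation step. For each window it enumerates all synonymous codon combinations, keeps those with the fewest motif occurrences, and draws one at random weighted by codon usage.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def translate(cds):
    """Translate an in-frame uppercase ACGT coding sequence."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def count_motifs(seq, motifs):
    """Count (possibly overlapping) occurrences of all motifs in seq."""
    return sum(seq.startswith(m, i) for m in motifs for i in range(len(seq)))


def remove_motifs(cds, motifs, usage=None, rng=None):
    """Greedy sliding-window removal of motifs by synonymous substitution."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    rng = rng or random.Random()
    usage = usage or {}                      # codon -> relative host usage
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Smallest number of codons that can span the longest motif in any frame.
    window = (max(map(len, motifs)) + 4) // 3
    for start in range(len(codons)):
        end = min(start + window, len(codons))
        left = "".join(codons[max(0, start - window):start])
        right = "".join(codons[end:end + window])
        current = codons[start:end]
        baseline = count_motifs(left + "".join(current) + right, motifs)
        if baseline == 0:
            continue
        options = [SYNONYMS[CODON_TABLE[c]] for c in current]
        scored = [(count_motifs(left + "".join(combo) + right, motifs), combo)
                  for combo in product(*options)]
        best = min(score for score, _ in scored)
        if best >= baseline:
            continue                         # no synonymous change helps here
        candidates = [combo for score, combo in scored if score == best]
        weights = []
        for combo in candidates:
            weight = 1.0
            for codon in combo:
                weight *= usage.get(codon, 1.0)
            weights.append(max(weight, 1e-12))
        codons[start:end] = rng.choices(candidates, weights=weights, k=1)[0]
    return "".join(codons)


if __name__ == "__main__":
    cds = "ATGGGATCCGAATTCTAA"               # toy sequence, for illustration
    motifs = ["GGATCC", "GAATTC"]
    edited = remove_motifs(cds, motifs, rng=random.Random(0))
    print(edited, count_motifs(edited, motifs), translate(edited))
```

Because every accepted change lowers the motif count within a context that covers all motifs touching the window, the total count never increases, and translation is preserved because only synonymous codons are substituted.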
Test: A wide variety of test cases were employed throughout the build process. The Chameleon project is the culmination of several key modules that each perform a discrete computation and build on one another, so every component must work as intended on its own. To serve our project needs, the unmodified pSHDY backbone was used as input to the pipeline function of the Chameleon project to generate a modified pSPDY construct that contains a minimal number of motifs that Stealth identified as underrepresented. The Chameleon project successfully removed 76% of motif occurrences (Fig. 1 and 2) while optimizing codon usage and without altering the amino acid sequence (Fig. 3).
Learn: Our initial design strategy led to several recursive sub-DBTL cycles and resulted in the Chameleon project functioning as intended. The Chameleon project is able to produce a modified plasmid that contains a minimal number of Stealth-identified palindromic motifs while retaining functionality and enhancing transformability [4]. Our initial design principles, including modularity (to enable scalability and flexibility), generalizability (to impart a broad scope of applicability), and overall robustness, allowed the structure and functionality of the pipeline to grow throughout the entire DBTL cycle. By the end of the cycle, we made the project available for anyone to conveniently download via pip and provided thorough documentation of the project. As an open source project, our source code is available for anyone to use. The TABI team plans to continue development of the Chameleon project post-Jamboree (2023).