Design

Overview

 Our project is organized around five key components that aim to develop CoPlat into an efficient platform for producing and demonstrating functional proteins, and to implement an artificial intelligence model to predict and validate potential adhesive proteins. Each component includes different designs and experiments, all contributing to the overall goal of the project.


Construction of CoPlat by mussel and barnacle proteins

In the first phase of the project, our goal was to select the most suitable adhesive proteins for the core of the CoPlat platform. We chose mussel and barnacle proteins, which are known for their strong adhesion in nature. The core design concept of CoPlat is the fusion of CsgA with an adhesive protein: because of the properties of CsgA, the resulting adhesive platform can anchor on the membrane, as shown in Figure 1.

Figure 1: E. coli producing CoPlat, with CoPlat anchored on the membrane.

  • CsgA: Enhancing extracellular functionality

    CsgA is a key component of our design and is critical to the success of the platform. It carries the fused adhesive protein out of the cell, which significantly improves adhesion.

  • Mussel and barnacle proteins: Exploring versatility

    We designed a series of adhesive fusion proteins: the mussel proteins CsgA+Mfp3 (BBa_K4854014) and CsgA+Mfp5 (BBa_K4854015), and the barnacle proteins CsgA+Bamcp20k-1 (BBa_K4854016), CsgA+cp19k (BBa_K4854019), CsgA+Mrcp20k (BBa_K4854020) and CsgA+Aacp19k (BBa_K4854021). Our goal was to determine which adhesive proteins are best suited as core components of CoPlat.

CsgA+Mfp3 (BBa_K4854014): Mussel foot protein Mfp3 is an adhesive protein from the Mediterranean mussel (Mytilus galloprovincialis). Mussels secrete specialized proteins to adhere to surfaces in turbulent environments. [1]
CsgA+Mfp5 (BBa_K4854015): Mussel foot protein Mfp5 is an adhesive protein from the Mediterranean mussel (Mytilus galloprovincialis). Mussels secrete specialized proteins to adhere to surfaces in turbulent environments. [2]
CsgA+Bamcp20k-1 (BBa_K4854016): Barnacle cement protein Bamcp20k-1 is an adhesive protein from the striped barnacle (Amphibalanus amphitrite). At the cyprid stage, before the barnacle matures, the cyprid releases cement protein to anchor itself to substrates such as rocks. [3]
CsgA+cp19k (BBa_K4854019): Barnacle cement protein cp19k is an adhesive protein from the barnacle Balanus albicostatus. At the cyprid stage, before the barnacle matures, the cyprid releases cement protein to anchor itself to substrates such as rocks. [4]
CsgA+Mrcp20k (BBa_K4854020): Barnacle cement protein Mrcp20k is an adhesive protein from the barnacle Megabalanus rosa. At the cyprid stage, before the barnacle matures, the cyprid releases cement protein to anchor itself to substrates such as rocks. [3]
CsgA+Aacp19k (BBa_K4854021): Barnacle cement protein Aacp19k is an adhesive protein from the striped barnacle (Amphibalanus amphitrite). At the cyprid stage, before the barnacle matures, the cyprid releases cement protein to anchor itself to substrates such as rocks. [5]
Figure 2. The BioBrick of CsgA+Mfp3

This composite part contains the lac promoter (BBa_R0010), an RBS (BBa_B0034), CsgA (BBa_K4854013), a GS linker (BBa_K4854012) and Mfp3 (BBa_K4854000).

Figure 3. The BioBrick of CsgA+Mfp5

This composite part contains the lac promoter (BBa_R0010), an RBS (BBa_B0034), CsgA (BBa_K4854013), a GS linker (BBa_K4854012) and Mfp5 (BBa_K4854001).

Figure 4. The BioBrick of CsgA+Bamcp20k-1

This composite part contains the lac promoter (BBa_R0010), an RBS (BBa_B0034), CsgA (BBa_K4854013), a GS linker (BBa_K4854012) and Bamcp20k-1 (BBa_K4854002).

Figure 5. The BioBrick of CsgA+cp19k

This composite part contains the lac promoter (BBa_R0010), an RBS (BBa_B0034), CsgA (BBa_K4854013), a GS linker (BBa_K4854012) and cp19k (BBa_K4854005).

Figure 6. The BioBrick of CsgA+Mrcp20k

This composite part contains the lac promoter (BBa_R0010), an RBS (BBa_B0034), CsgA (BBa_K4854013), a GS linker (BBa_K4854012) and Mrcp20k (BBa_K4854006).

Figure 7. The BioBrick of CsgA+Aacp19k

This composite part contains the lac promoter (BBa_R0010), an RBS (BBa_B0034), CsgA (BBa_K4854013), a GS linker (BBa_K4854012) and Aacp19k (BBa_K4854007).
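
All six composite parts above share the same layout: lac promoter, RBS, CsgA, GS linker, then the adhesive protein. The following is a minimal sketch of that layout as a data structure, using only the part IDs listed in the text. The names `BACKBONE`, `ADHESIVE_PARTS`, `composite_layout` and `describe` are hypothetical and chosen for illustration; they are not part of any iGEM Registry tooling.

```python
# Shared upstream elements, in order, as listed in the part descriptions above.
BACKBONE = [
    ("lac promoter", "BBa_R0010"),
    ("RBS", "BBa_B0034"),
    ("CsgA", "BBa_K4854013"),
    ("GS linker", "BBa_K4854012"),
]

# Basic part IDs of the adhesive proteins, copied from the text.
ADHESIVE_PARTS = {
    "Mfp3": "BBa_K4854000",
    "Mfp5": "BBa_K4854001",
    "Bamcp20k-1": "BBa_K4854002",
    "cp19k": "BBa_K4854005",
    "Mrcp20k": "BBa_K4854006",
    "Aacp19k": "BBa_K4854007",
}


def composite_layout(adhesive_name):
    """Return the ordered (name, part ID) list for a CsgA+adhesive composite."""
    part_id = ADHESIVE_PARTS[adhesive_name]
    return BACKBONE + [(adhesive_name, part_id)]


def describe(adhesive_name):
    """Render the layout as a readable upstream-to-downstream string."""
    layout = composite_layout(adhesive_name)
    return " -> ".join(f"{name} ({pid})" for name, pid in layout)


if __name__ == "__main__":
    for name in ADHESIVE_PARTS:
        print(describe(name))
```

Keeping the shared elements in one list makes it easy to see that only the last element changes between the six constructs.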

If you want to see more detailed descriptions, please click on our Parts link.
In summary, the first part of our project was inspired by the natural adhesive properties of mussels and barnacles, which guided our selection of candidate adhesive proteins for CoPlat. In addition, we placed CsgA at the front end of each BioBrick so that live E. coli can continuously produce the adhesive proteins. This design turned CoPlat into an efficient production platform, which was critical to the success of our project.


Proof of CoPlat through functional tests

In this part of the project, we conducted a series of experiments to verify which of the candidate proteins actually show adhesive properties.

1. Flushing Test
In this experiment, we coated slides with adhesive proteins, washed them with ddH2O, and then observed them under a microscope. Our hypothesis is that E. coli remaining attached to the slide surface after rinsing with ddH2O indicates the production of CoPlat.

2. Rheometer
After the flushing test, we will use a rheometer to assess the degree of protein adhesion, thus further confirming our experimental hypothesis. We will adjust the instrument settings to obtain relevant data.

3. Modified ELISA
As a follow-up, we will use a modified ELISA to confirm that the adhesive protein binds to the antibody, corroborating the results of the previous tests. We use PB buffer to wash away non-specific binders and then add TMB reagent to detect whether the protein has bound to the antibody (a blue color change indicates successful binding). Finally, we evaluate the results by measuring the OD630 value.

If you want to see more detailed descriptions, please click on our Experiments link.
To view the results of the experiment, please click on our Results link.
These experimental phases were designed to prove the adhesive properties of the adhesive proteins and to ensure the utility of CoPlat.


Prediction and Validation of Potential Adhesive Proteins

In the third part of the project, we aim to expand CoPlat's application range and turn it into a versatile platform. To accomplish this, we used machine learning techniques to find "potential adhesive proteins".

1. Collection of protein database
First, we searched UniProt's function and keyword fields for terms such as "adhesive," "cohesive," and "sticky" to collect protein data and expand our database.
Next, we analyzed and organized these entries into the positive data of our project database. We then collected negative data according to the lengths of the positive sequences, which we used to test the stability of our model.
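
One way to read "negative data according to the length of the positive data" is length matching: for each positive sequence, pick a negative candidate of similar length. The sketch below illustrates that reading only. The function name `length_matched_negatives`, the 10% tolerance and the toy sequences are assumptions made for illustration, not the team's actual procedure or data.

```python
import random


def length_matched_negatives(positives, candidates, tolerance=0.1, seed=0):
    """Pick one unused candidate of similar length for each positive sequence.

    tolerance is the allowed relative length difference (an assumed default).
    Positives with no candidate inside the tolerance are skipped.
    """
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)  # avoid always picking candidates in input order
    used = set()
    negatives = []
    for pos in positives:
        limit = tolerance * len(pos)
        for i, cand in enumerate(pool):
            if i not in used and abs(len(cand) - len(pos)) <= limit:
                used.add(i)
                negatives.append(cand)
                break
    return negatives


if __name__ == "__main__":
    # Toy sequences only; real input would be UniProt sequences.
    positives = ["A" * 100, "C" * 200]
    candidates = ["G" * 95, "T" * 210, "K" * 500]
    print([len(s) for s in length_matched_negatives(positives, candidates)])
```

Matching lengths in this way keeps a classifier from separating the two classes on sequence length alone.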

2. Establishment and Testing of Machine Learning Prediction Model
To reach this objective, we needed a classifier that identifies potential adhesive proteins and thereby broadens the range of proteins that can be used in CoPlat. We employed multiple protein sequence encoding methods, including AAPC, AAindex, and ESM, to transform sequences into vector representations so that machine learning models could be tested with limited data.
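
To make the idea of turning a sequence into a vector concrete, here is a minimal sketch of two simple composition-based encodings: single amino acid composition (20 values) and adjacent amino acid pair composition (400 values). It assumes that "AAPC" refers to amino acid pair composition; the function names are hypothetical, and the AAindex and ESM encodings are not shown because they need external data or models.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """20-dimensional vector: fraction of each standard amino acid."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues) or 1  # avoid division by zero for empty input
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def pair_composition(seq):
    """400-dimensional vector: fraction of each adjacent residue pair."""
    clean = "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(len(clean) - 1):
        counts[clean[i:i + 2]] += 1
    n = max(len(clean) - 1, 1)
    return [counts[p] / n for p in PAIRS]


if __name__ == "__main__":
    toy = "AAGY"  # toy sequence, not a real protein
    print(len(aa_composition(toy)), len(pair_composition(toy)))
```

Both encodings give fixed-length vectors regardless of sequence length, which is what classifiers such as SVM or Random Forest require as input.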
Then, we adopted three machine learning models, Gradient Boosting, Random Forest, and SVM, each coupled with the three encoding methods. These models can distinguish between adhesive and non-adhesive proteins and provide a probability score.
To ensure the accuracy and fairness of the models, we chose SVM as the final model and used balanced training and the jackknife (leave-one-out) method to prevent bias and assess accuracy. Stability tests and overfitting tests further validated the reliability of the model.
Finally, we applied the tested and analyzed models to datasets with different characteristics, such as specific GO domains, sequence lengths, specific motifs, etc., to achieve accurate prediction of adhesive proteins.
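
The balanced training and jackknife evaluation mentioned above can be sketched as follows. Each sample is held out once, the remaining data is downsampled so both classes are the same size, and a classifier trained on that balanced set predicts the held-out sample. To keep the example dependency-free, a simple nearest-centroid classifier stands in for the SVM; this swap, the function names and the toy data are all assumptions for illustration.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balance(X, y, rng):
    """Downsample the larger class so both classes have equal size."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    k = min(len(pos), len(neg))
    keep = rng.sample(pos, k) + rng.sample(neg, k)
    return [X[i] for i in keep], [y[i] for i in keep]


def jackknife_accuracy(X, y, seed=0):
    """Leave-one-out accuracy; needs at least two samples per class."""
    rng = random.Random(seed)
    correct = 0
    for held_out in range(len(X)):
        train_X = X[:held_out] + X[held_out + 1:]
        train_y = y[:held_out] + y[held_out + 1:]
        bal_X, bal_y = balance(train_X, train_y, rng)
        # Stand-in classifier (not an SVM): assign to the nearer class centroid.
        pos_c = centroid([x for x, lab in zip(bal_X, bal_y) if lab == 1])
        neg_c = centroid([x for x, lab in zip(bal_X, bal_y) if lab == 0])
        closer_to_pos = sq_dist(X[held_out], pos_c) < sq_dist(X[held_out], neg_c)
        pred = 1 if closer_to_pos else 0
        correct += pred == y[held_out]
    return correct / len(X)


if __name__ == "__main__":
    # Toy 2-D features; real input would be encoded protein sequences.
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
         [1.0, 1.0], [0.9, 1.0], [1.0, 0.9], [1.1, 1.0]]
    y = [0, 0, 0, 1, 1, 1, 1]
    print(jackknife_accuracy(X, y))
```

Balancing inside each fold, rather than once up front, keeps the held-out sample from influencing which training samples are kept.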

3. Software Interface
We have developed a tool, "FoxyProt," for identifying proteins with adhesive properties. To increase accessibility, we built a graphical user interface for it. Key steps of the software include:

A. Model export: We used the joblib module to export the trained machine learning models, so that future teams can easily load and use them without retraining.

B. Graphical user interface: We used the Tkinter library to create a graphical user interface that allows users to input data, make predictions, and visualize results easily.

C. Current functionality: Users can select a FASTA file; the sequences are then embedded as features, and our SVM model classifies each protein and outputs its probability of being an adhesive protein.
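
The FASTA-to-probability flow described in step C can be sketched without any dependencies. This is not the FoxyProt source: `parse_fasta` and `predict_all` are hypothetical names, and the encoder and probability function are passed in by the caller. In the real tool they would presumably be the feature embedding and the exported SVM model.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def predict_all(records, encode, predict_proba):
    """Apply a caller-supplied encoder and probability function to each record."""
    return [(name, predict_proba(encode(seq))) for name, seq in records]


if __name__ == "__main__":
    demo = ">toy1 example\nACDE\nFGHI\n>toy2\nkkll\n"
    records = parse_fasta(demo)
    # Placeholder encoder and scorer, for demonstration only.
    print(predict_all(records, len, lambda n: n / 10))
```

Passing the encoder and model in as arguments keeps file parsing separate from prediction, so either side can be replaced without touching the other.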

This part used machine learning to predict and validate potential adhesive proteins. We expanded the database, encoded the sequences with multiple methods, built machine learning models, and provided future teams with an easy-to-use tool and interface for identifying adhesive proteins.


Design and Functional Test of Potential Adhesive Proteins

We finally selected the following four potential adhesive proteins and planned to conduct experiments to confirm our predictions:

  • CsgA+ecpA (BBa_K4854022)

    ecpA is a fimbrial (pilus) protein from the bacterium Eikenella corrodens. It belongs to the pilin family of proteins and has membrane-penetrating properties. [6]

    Figure 8. The BioBrick of CsgA+ecpA

    This composite part contains the lac promoter (BBa_R0010), an RBS (BBa_B0034), CsgA (BBa_K4854013), a GS linker (BBa_K4854012) and ecpA (BBa_K4854008).

  • CsgA+Nid1 (BBa_K4854023)

    Nidogen-1 (Nid1) is a sulfated glycoprotein from the rat (Rattus norvegicus). It is distributed in the basement membrane and may play a role in the interaction between cells and the extracellular matrix. [7]

    Figure 9. The BioBrick of CsgA+Nid1

    This composite part contains the lac promoter (BBa_R0010), an RBS (BBa_B0034), CsgA (BBa_K4854013), a GS linker (BBa_K4854012) and Nid1 (BBa_K4854009).

  • CsgA+epd2 (BBa_K4854024)

    Ependymin-2 (epd2) is a glycoprotein from rainbow trout (Oncorhynchus mykiss). It may play a role in neuroplasticity and axonal regeneration. [8]

    Figure 10. The BioBrick of CsgA+epd2

    This composite part contains the lac promoter (BBa_R0010), an RBS (BBa_B0034), CsgA (BBa_K4854013), a GS linker (BBa_K4854012) and epd2 (BBa_K4854010).

  • CsgA+zig-4 (BBa_K4854025)

    zig-4 is a two-Ig-domain protein from Caenorhabditis elegans that plays a role in maintaining axon position. [9]

    Figure 11. The BioBrick of CsgA+zig-4

    This composite part contains the lac promoter (BBa_R0010), an RBS (BBa_B0034), CsgA (BBa_K4854013), a GS linker (BBa_K4854012) and zig-4 (BBa_K4854011).

If you want to see more detailed descriptions, please click on our Parts link.
At this stage, we again used the same three methods (a flushing test, a rheometer, and a modified ELISA) to check our predictions of potential adhesive proteins. This helped to confirm whether our model had successfully identified potential adhesive proteins.
If you want to see more detailed descriptions, please click on our Experiments link.
To view the results of the experiment, please click on our Results link.
Our goal is to extend the CoPlat platform to more types of adhesive proteins, thereby increasing its versatility and usefulness. By selecting potential adhesive proteins and planning their experimental validation, we have not only broadened the functionality of CoPlat but also provided a useful tool for future iGEM teams.


Validation of CoPlat's Modularity with Functional Proteins

Our goal was to validate the modularity of CoPlat, that is, to determine whether it can bind functional proteins as we expected, as shown in Figure 12.

Figure 12: E. coli producing CoPlat with functional proteins, anchored on the membrane.

To test our hypothesis, we used green fluorescent protein (GFP) as an example of a functional protein. GFP was chosen because its fluorescence can be used to test CoPlat's functionality, so that our experimental aims can be clearly verified.
To accomplish this goal, we designed two parts: CsgA+Mfp5+GFP (BBa_K4854026) and GFP+CsgA+Mfp5 (BBa_K4854027). These two parts add green fluorescent protein (GFP) to the C-terminus and N-terminus of CsgA+Mfp5 (BBa_K4854015), respectively.

Figure 13. The BioBrick of CsgA+Mfp5+GFP

This composite part contains the lac promoter (BBa_R0010), an RBS (BBa_B0034), CsgA (BBa_K4854013), a GS linker (BBa_K4854012), Mfp5 (BBa_K4854001), a flexible linker (BBa_K105012) and GFP (BBa_E0040).

Figure 14. The BioBrick of GFP+CsgA+Mfp5

This composite part contains the lac promoter (BBa_R0010), an RBS (BBa_B0034), GFP (BBa_E0040), a flexible linker (BBa_K105012), CsgA (BBa_K4854013), a GS linker (BBa_K4854012) and Mfp5 (BBa_K4854001).

If you want to see more detailed descriptions, please click on our Parts link.
In this experimental phase, in addition to the three previously mentioned methods, we introduced a fluorescence test that evaluates the functionality of GFP by comparing fluorescence intensity.
If you want to see more detailed descriptions, please click on our Experiments link.
To view the results of the experiment, please click on our Results link.
These two parts let us test whether the modular structure of CoPlat can properly incorporate functional proteins and keep them active. This phase of experimentation helps validate the design of CoPlat, check that it can fulfill our goals, and provide evidence for future applications.


Summary

Our project is organized into five key components, ranging from adhesive protein selection to validation of the modular structure, and aims to improve the versatility and usability of the CoPlat platform. We experimentally tested the adhesion properties of adhesive proteins, extended the protein database, built machine learning models to predict adhesive proteins, and provided a user-friendly tool. Ultimately, our project will help future iGEM teams characterize adhesive proteins more easily and expand the application areas of CoPlat.