Motivation
Protein databases typically annotate only the major features and functions of each protein, so some functions may go unrecorded or remain undiscovered. Our aim was to identify proteins with general adhesion capabilities. To do this, we developed “FoxyProt”, a tool that classifies potentially adhesive proteins. We also gave it a user-friendly graphical interface so that future iGEM teams can use it easily.
Set Up
Model Exportation
To keep our machine learning model accessible and reusable, we export it with the joblib module in Python. Joblib serializes Python objects efficiently, which makes it well suited to models built with libraries such as scikit-learn (sklearn). A .joblib file preserves the fitted model's state, parameters, and structure in a single file, so future iGEM teams can load and use the model without retraining it from scratch. They can also test other functions by replacing the .joblib file with a model they trained themselves.
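The export and reload round trip described above can be sketched as follows. This is a minimal illustration, not FoxyProt's actual training code: the toy data, the `SVC` settings, and the file name `adhesion_svm.joblib` are all placeholders chosen for the example.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.svm import SVC

# Toy data standing in for protein embeddings (assumption: 4-dimensional
# vectors with binary non-adhesive / adhesive labels).
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-1.0, 0.3, (20, 4)), rng.normal(1.0, 0.3, (20, 4))])
y_train = np.array([0] * 20 + [1] * 20)

model = SVC(kernel="rbf").fit(X_train, y_train)

# Serialize the fitted model; the file name is a hypothetical placeholder.
model_path = os.path.join(tempfile.mkdtemp(), "adhesion_svm.joblib")
joblib.dump(model, model_path)

# Later (for example inside the GUI), reload the model without retraining.
restored = joblib.load(model_path)
same_predictions = bool((restored.predict(X_train) == model.predict(X_train)).all())
```

Because joblib files are pickle-based, they should only be loaded from trusted sources, and a replacement model is best exported with the same scikit-learn version that will load it.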
Graphical User Interface
To make the project interactive and easy to use, we developed a graphical user interface (GUI) with Tkinter, the GUI library included in Python's standard library. Tkinter lets us build windows, dialogs, buttons, menus, and other graphical elements, and it runs on Windows, macOS, and Linux. Through the GUI, users can input data, run predictions, and view the results of our machine learning model without writing any code.
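A small Tkinter window of this kind could look like the sketch below. The layout, the widget choices, and the names `format_score`, `build_window`, and `score_file` are assumptions made for illustration; the real FoxyProt interface may be organized differently.

```python
def format_score(entry_id, score):
    """Return the one-line text shown to the user for a scored protein."""
    return f"{entry_id}: adhesion score = {score:.3f}"


def build_window(score_file):
    """Create the main window.

    `score_file` is a hypothetical callback that takes a FASTA path and
    returns a list of result lines to display.
    """
    # Imported here so format_score stays usable on machines without a display.
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.title("FoxyProt (sketch)")
    output = tk.Text(root, height=10, width=60)

    def on_browse():
        path = filedialog.askopenfilename(
            title="Choose a FASTA file",
            filetypes=[("FASTA files", "*.fasta *.fa *.faa"), ("All files", "*.*")],
        )
        if not path:  # the user cancelled the dialog
            return
        output.delete("1.0", tk.END)
        output.insert(tk.END, "\n".join(score_file(path)))

    tk.Button(root, text="Browse...", command=on_browse).pack(padx=10, pady=10)
    output.pack(padx=10, pady=(0, 10))
    return root
```

Calling `build_window(my_scoring_function).mainloop()` would open the window, where `my_scoring_function` stands for whatever code reads the file and scores its proteins.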
If you want to try FoxyProt, please visit our GitHub page or our website.
Current Functions
Input
Through the user interface, you can browse your files and select a FASTA file.
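Once a file is selected, its records have to be read. The sketch below shows one way to parse FASTA text with only the standard library; the function name `read_fasta` and the toy records are assumptions for the example, and FoxyProt may rely on a different parser.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)


# Made-up records, used only to show the expected shape of the result.
example = io.StringIO(">sp|P00000|TOY_PROT toy protein\nMKT\nAYI\n>Q00000\nGGS\n")
records = list(read_fasta(example))
```

For a file on disk, the same function can be used as `with open(path) as handle: records = list(read_fasta(handle))`.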
Embedding
After reading the entry ID from the FASTA file, FoxyProt looks it up in the ESM2 embedding table to obtain the protein's feature vector (its embedding).
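The lookup step might be sketched as below. Two things here are assumptions, since the page does not specify them: that headers follow the UniProt style `db|ACCESSION|NAME` (falling back to the first word otherwise), and that the embedding table is stored as CSV rows of the form `entry_id,v1,v2,...`.

```python
import csv
import io


def entry_id_from_header(header):
    """Extract an entry ID from a FASTA header (without the leading '>')."""
    tokens = header.split()
    if not tokens:
        return ""
    parts = tokens[0].split("|")
    # UniProt-style headers look like "sp|ACCESSION|NAME"; otherwise use the token.
    return parts[1] if len(parts) >= 3 else tokens[0]


def load_embedding_table(handle):
    """Read rows of 'entry_id,v1,v2,...' into a dict of float lists."""
    return {row[0]: [float(v) for v in row[1:]] for row in csv.reader(handle) if row}


# Toy three-dimensional table; real ESM2 embeddings are much longer vectors.
table = load_embedding_table(io.StringIO("P00000,0.1,-0.2,0.3\nQ00000,0.0,0.5,-0.5\n"))
entry_id = entry_id_from_header("sp|P00000|TOY_PROT toy protein")
embedding = table.get(entry_id)  # None if the protein is not in the table
```

Returning `None` for unknown IDs lets the caller tell the user that a protein is not covered by the precomputed table instead of failing silently.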
Classifying
The trained SVM model classifies each protein entered by the user based on the embedding features that correspond to its entry ID.
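A hedged sketch of this scoring step follows. It assumes `model` is a fitted scikit-learn classifier that exposes `decision_function`, as `sklearn.svm.SVC` does. Whether FoxyProt reports this decision value or a calibrated probability is not stated here, so the choice of `decision_function` and the name `score_entries` are assumptions.

```python
def score_entries(model, embedding_table, entry_ids):
    """Score every entry ID that has an embedding.

    Returns a pair: a dict {entry_id: score} and a list of IDs that were
    not found in the embedding table.
    """
    found = [eid for eid in entry_ids if eid in embedding_table]
    missing = [eid for eid in entry_ids if eid not in embedding_table]
    if not found:
        return {}, missing
    features = [embedding_table[eid] for eid in found]
    # For an SVM, decision_function gives a signed value whose magnitude
    # grows with the distance from the separating surface.
    raw_scores = model.decision_function(features)
    return {eid: float(s) for eid, s in zip(found, raw_scores)}, missing
```

With a model reloaded from a .joblib file, a call would look like `scores, missing = score_entries(model, table, ids)`; positive scores fall on the side of the positive (adhesive) class, provided the model was trained with that labeling.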
Output
The user interface displays the adhesion score produced by the SVM model.