The pursuit of innovation in the realm of engineering, while ensuring it's responsible and good for the world.
Our engineering team collaborated with an outstanding professor, Professor Pokrath. Professor Pokrath is an honorable member of King Chulalongkorn Memorial Hospital, one of the most prestigious hospitals in Thailand. He is also the instructor of our iGEM team. Professor Pokrath pointed our team to a problem: the time-consuming process of finding the specific T-cell DNA sequence that matches the desired neoantigen for each patient.
We started our design process by determining the most suitable UI for hospital use. The UI we designed is derived directly from Professor Pokrath’s requirements for the protein-sequencing algorithm we are creating. The main features of the web app let users add tasks, specify a task name and HLA type, and upload four DNA sequences. The task list table shows the details of every task, and users can open detailed information by clicking a "result" button. After the user interface, we designed the UX, or workflow, of the neoantigen prediction system.
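The task form described above can be sketched as a small data model. This is only an illustration, not our actual schema: the class name `PredictionTask`, its field names, and the `status` default are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List

REQUIRED_SEQUENCE_FILES = 4  # the form asks for four DNA sequence uploads


@dataclass
class PredictionTask:
    """One row of the task list (hypothetical field names)."""
    name: str
    hla_type: str
    sequence_files: List[str] = field(default_factory=list)
    status: str = "pending"  # assumed initial state for a new task

    def validation_errors(self) -> List[str]:
        """Return human-readable problems; an empty list means the task is valid."""
        errors = []
        if not self.name.strip():
            errors.append("task name is required")
        if not self.hla_type.strip():
            errors.append("HLA type is required")
        if len(self.sequence_files) != REQUIRED_SEQUENCE_FILES:
            errors.append(
                f"expected {REQUIRED_SEQUENCE_FILES} sequence files, "
                f"got {len(self.sequence_files)}"
            )
        return errors
```

Checking the form before a task is queued keeps incomplete uploads from reaching the long-running pipeline.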
Our system starts by collecting patient sample data for analysis and comparison with the patient's normal (non-tumor) genome. Then, we align the sequences and identify the alignment for both DNA and RNA. The HLA type is then identified, since HLA is a receptor that the immune system requires; it consists of 6 different types that bind to different antigens. After that, we use a method called variant calling to store sequence variations and identify the antigen parts that could be specific to tumor cells. To account for cases where a mutation in cancer cells creates new neoantigens, we screen the neoantigens and use the results from HLA typing and variant calling to determine a specific neoantigen sequence.
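The order of the workflow above matters because neoantigen screening depends on the outputs of both HLA typing and variant calling. A minimal sketch of that dependency check follows; the stage names and the input and output labels are illustrative assumptions, not identifiers from our pipeline.

```python
from typing import List, Tuple

# (stage name, inputs it needs, outputs it produces); all labels are illustrative.
Stage = Tuple[str, List[str], List[str]]

STAGES: List[Stage] = [
    ("collect_samples", [], ["tumor_reads", "normal_reads"]),
    ("align", ["tumor_reads", "normal_reads"], ["alignments"]),
    ("hla_typing", ["alignments"], ["hla_alleles"]),
    ("variant_calling", ["alignments"], ["variants"]),
    ("neoantigen_screening", ["hla_alleles", "variants"], ["neoantigens"]),
]


def check_order(stages: List[Stage]) -> List[str]:
    """Return stage names in run order, raising if a stage lacks an input."""
    available = set()
    order = []
    for name, needs, produces in stages:
        missing = [item for item in needs if item not in available]
        if missing:
            raise ValueError(f"{name} is missing inputs: {missing}")
        available.update(produces)
        order.append(name)
    return order
```

Calling `check_order(STAGES)` returns the five stage names in sequence, while moving the screening stage ahead of variant calling raises a `ValueError`.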
Our building phase proceeded systematically thanks to our sprint planning technique. We divided the software development process into many small intervals, usually two weeks to a month long, and shared this plan with team members through Figma. Our development process consisted of data preparation, neoantigen prediction service development, a demo of the service, validation of the software’s performance, and troubleshooting. Service development is where we spent the most effort, so we describe it in more detail. We divided the work into three main sections: web app development, backend development, and database management. We used a pipeline called NextNEOpi for database management for our sequencing algorithm. We encountered various types of problems during the development process.
The NextNEOpi problem primarily revolves around hardware constraints: the pipeline needs high-memory multi-core servers. To address this, our team initially attempted to use LANTA, the Thai supercomputer; however, its compute nodes lacked an internet connection, which led to extended processing times, manual file downloads, and failures in specific processes. We then decided to use AWS EC2 for computing tasks, benefiting from its robust server capabilities and reliable internet connection. This allowed us to run the sample test data successfully, although handling larger datasets effectively required adjustments to resource allocation.
At the same time, our web development unit worked on the project's web application, which included frontend and backend development. The front end of our website was created with React and MaterialUI, while the back end was created with FastAPI and MongoDB Atlas.
The front end integrates well with the FastAPI backend. Axios is used to call the APIs, and MongoDB is used to store and retrieve data. The functions of our web app were discussed in our design phase. We faced a few challenges during the front-end development of our peptide prediction website. React with TSX was challenging because our developer found it difficult to find resources on TypeScript. We also made a few changes to the UI to make it as user-friendly as possible. In addition, we encountered time zone differences, which made communication between the front end and back end difficult. Despite these challenges, our collaborative efforts are steering us towards the successful completion of the web app.
On the backend side, we started with FastAPI development; the backend connects and sends requests to our database in MongoDB Atlas. We used Docker to connect with our NextNEOpi pipeline. Amazon Web Services was also used to deploy our web service.
As a test, we ran two types of data through our software. The first is sample data acquired from the NextNEOpi website, and the second is current patient data acquired from the breast cancer department of King Chulalongkorn Memorial Hospital. The result from the NextNEOpi pipeline consists of five main folders: analyses, pipeline information, supplemental, documentation, and neoantigens. We focused our analysis on the neoantigens folder. It contains the different HLA restrictions and a list of all of the patient’s possible HLA phenotypes. The foreign antigen from the tumor is presented on MHC class I as an endogenous peptide. The predicted result is a long list of possible neoantigen peptides with their affinity for all the possible MHC class I phenotypes in the patient’s body. Screening is performed after ranking by success rate, prediction confidence, binding coefficient (IC50), and other features.
The output also includes the run log and the prediction results for each step of neoantigen screening, stored in the analyses and pipeline information folders, respectively.
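The ranking by binding coefficient mentioned above can be sketched as a filter followed by a sort, using the fact that a lower IC50 means stronger binding. This is a simplified illustration that ranks on IC50 alone; the cutoff value, the function name, the dictionary keys, and the example rows are all assumptions, and the rows are made up rather than taken from our runs.

```python
from typing import Dict, List

IC50_CUTOFF_NM = 500.0  # assumed cutoff for the example, not a pipeline setting


def rank_candidates(candidates: List[Dict], cutoff_nm: float = IC50_CUTOFF_NM) -> List[Dict]:
    """Keep candidates at or below the cutoff and sort the strongest binder first."""
    binders = [c for c in candidates if c["ic50_nm"] <= cutoff_nm]
    return sorted(binders, key=lambda c: c["ic50_nm"])


# Made-up rows for illustration only; these are not results from our pipeline.
toy_candidates = [
    {"peptide": "candidate_1", "allele": "HLA-A*02:01", "ic50_nm": 820.0},
    {"peptide": "candidate_2", "allele": "HLA-A*02:01", "ic50_nm": 35.5},
    {"peptide": "candidate_3", "allele": "HLA-B*07:02", "ic50_nm": 240.0},
]

ranked = rank_candidates(toy_candidates)
```

In a fuller version, the sort key would combine the other ranking features as well, such as prediction confidence.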
In short, our test with the data acquired from the NextNEOpi website was successful. However, we ran into problems in the test with actual data, and some remain unsolved. The main one is that the LANTA supercomputer has inadequate internet speed for our needs, as our patient’s resource files are around 150 times larger than the test data from the NextNEOpi website. In particular, there are large resource files from databases such as iedb and vep_cache, as well as Singularity container images. We tried to solve this problem by manually downloading the large resource files and adding them to the working directory. However, we still could not complete the pipeline, because some processes need an internet connection and fail when they run on the LANTA compute nodes, which have none.
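When resources are staged by hand for a node without internet access, a quick check before launching can show which ones are still missing. The sketch below assumes a hypothetical directory layout; the relative paths and the function name are placeholders, and the real locations depend on the pipeline configuration.

```python
from pathlib import Path
from typing import Iterable, List

# Illustrative relative paths; the real layout depends on the pipeline configuration.
REQUIRED_RESOURCES = ["resources/iedb", "resources/vep_cache", "singularity_images"]


def missing_resources(workdir: str, required: Iterable[str] = REQUIRED_RESOURCES) -> List[str]:
    """Return the required paths that do not exist under the working directory."""
    root = Path(workdir)
    return [rel for rel in required if not (root / rel).exists()]
```

An empty list means every listed resource was found. This check only covers files we know about in advance; it cannot detect processes that try to download something at run time.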
To solve the problem, we switched to an AWS EC2 instance (16 cores, 64 GB). With it, we managed to finish the pipeline using the test data provided in the pipeline’s repository. With the much larger real data, we have not yet completed the pipeline, although most of its processes run. Therefore, we need to configure process resource usage in the pipeline, run it with the large data, and debug repeatedly until the entire pipeline runs.
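One part of configuring resource usage is making sure no single process asks for more than the instance can provide. The following sketch illustrates that check against the 16-core, 64 GB instance; the process names and requested values are hypothetical, and the actual settings belong in the pipeline's own configuration rather than in Python.

```python
from typing import Dict

INSTANCE_LIMITS = {"cpus": 16, "memory_gb": 64}  # the EC2 instance size mentioned above


def cap_requests(
    requests: Dict[str, Dict[str, int]],
    limits: Dict[str, int] = INSTANCE_LIMITS,
) -> Dict[str, Dict[str, int]]:
    """Clamp each process's request so it never exceeds the instance limits."""
    return {
        name: {key: min(value, limits[key]) for key, value in req.items()}
        for name, req in requests.items()
    }


# Hypothetical process names and requests, for illustration only.
requested = {
    "alignment": {"cpus": 32, "memory_gb": 48},
    "variant_calling": {"cpus": 8, "memory_gb": 128},
}

capped = cap_requests(requested)
```

Capping only prevents requests that can never be scheduled; a process that truly needs more memory than the instance has would still fail and would call for a larger instance.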
Our engineering process is not yet complete. Our software has taken much longer than expected because of the many problems we are still debugging. We discussed our overall plan, and what should have happened, in our design section. Our current work is solving the problems that arise when running with data acquired from actual patients, as discussed in the testing phase. Our aim is to complete the software and test it with actual patient data. We are determined to reach our goal and make protein sequencing for DC vaccines a less time-consuming and energy-consuming process for doctors across Thailand.