Model

iDRPro-SC: Identifying DNA-Binding Proteins and RNA-Binding Proteins based on Subfunction Classifiers

Confirming the specificity of our DNA aptamer for the CEA protein was an essential step. Although the aptamer should be specific because it was selected through the SELEX process, we still had to check whether other DNA or RNA molecules, or other proteins, bind strongly to CEA.

The first step was to check whether CEA has a binding affinity for nucleic acid-type molecules. We therefore ran iDRPro-SC, a classifier model that predicts from a protein’s FASTA sequence whether it is a DNA-binding protein or an RNA-binding protein, on CEA and on the CEA binding domain [1].

The iDRPro-SC result shows that neither CEA nor its binding domain is predicted to bind nucleic acid-type molecules in general. This supports the diagnostic specificity of our DNA aptamer for CEA, because viral or human DNA is unlikely to interfere with the binding between the aptamer and CEA.

HDock

What is HDock?

HDock is a computer program used to predict how proteins interact with each other and with DNA/RNA molecules. It does this by combining two different methods: one that uses existing templates as a guide, and another to find the best possible docking configurations.

HDOCK Server

Procedure for predicting binding affinity

In our project, we used HDock to predict the affinity between the CEA N domain and the six DNA aptamers reported in "Blocking the attachment of cancer cells in vivo with DNA aptamers displaying anti‐adhesive properties against the carcinoembryonic antigen" [2].

The paper describes six short, 74‐base-long DNA aptamer sequences, identified by SELEX, that specifically recognize a recombinant protein corresponding to residues 1–132 of mature CEA (referred to as the CEA N domain). The six candidate aptamers are listed below, together with cApt, a seventh sequence that we docked alongside them:

N54: GACGATAGCGGTGACGGCACAGACG-TCCCGCATCCTCCGCCGTGCCGACCC-GTATGCCGCTTCCGTCCGTCGCTC
N56: GACGATAGCGGTGACGGCACAGACG-CCCCAGGAAGAACCTACTCACTGATC-GTATGCCGCTTCCGTCCGTCGCTC
N57: GACGATAGCGGTGACGGCACAGACG-CACCGCTCTTATGCCACCATTTTCAC-GTATGCCGCTTCCGTCCGTCGCTC
N59: GACGATAGCGGTGACGGCACAGACG-CCGCTACCCCATCCACGCCAATCCC-GTATGCCGCTTCCGTCCGTCGCTC
N65: GACGATAGCGGTGACGGCACAGACG-CCCGCATCCTCCGCCGTGCTGACCCC-GTATGCCGCTTCCGTCCGTCGCTC
N71: GACGATAGCGGTGACGGCACAGACG-AAATCCCCGCTGAATTACCACTTTAC-GTATGCCGCTTCCGTCCGTCGCTC
cApt: GACGATAGCGGTGACGGCACAGACG-AGTCAGTCAGTCAGTCAGTCAGTCA-GTATGCCGCTTCCGTCCGTCGCTC

Our goal is to select the DNA aptamer with the highest predicted binding affinity for the CEA N domain. We plan to use the selected aptamer in an EMSA to verify the binding affinity experimentally.

HDock requires two input files: receptor file and ligand file.

We used the PDB file of the CEA N domain from the RCSB PDB as the receptor. Here is the PDB file of the CEA N domain:

[2qsq.pdb]
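Before submitting the receptor, it can help to check which chains the structure file contains and how many residues each has. The following is a minimal sketch, assuming a standard fixed-column PDB file; the helper name `chain_residue_counts` is our own and is not part of HDock.

```python
from collections import defaultdict


def chain_residue_counts(pdb_text):
    """Count distinct residues per chain from the ATOM records of a PDB file."""
    residues = defaultdict(set)
    for line in pdb_text.splitlines():
        # Skip non-ATOM records and lines too short to hold the residue fields.
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        chain_id = line[21]            # column 22: chain identifier
        res_seq = line[22:27].strip()  # columns 23-27: residue number + insertion code
        residues[chain_id].add(res_seq)
    return {chain: len(ids) for chain, ids in residues.items()}


# Example call, assuming 2qsq.pdb is in the working directory:
#     with open("2qsq.pdb") as handle:
#         print(chain_residue_counts(handle.read()))
```

If the file holds more than one chain, this kind of check makes it easier to decide which chain to submit as the receptor.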

As the ligand, we used a FASTA file for each of the seven DNA sequences (N54, N56, N57, N59, N65, N71, cApt) in turn, because HDock accepts only FASTA or PDB files as input. Here is the code we used to convert each DNA sequence into a FASTA file:

def convert_to_fasta(input_file, output_file):
    # Read the raw sequence from a plain-text file
    with open(input_file, 'r') as f:
        lines = f.readlines()
    # Write a FASTA header line followed by the sequence
    with open(output_file, 'w') as f:
        sequence_id = "DNA aptamer name"
        f.write(f'>{sequence_id}\n')
        f.write(''.join(lines).strip() + '\n')

convert_to_fasta('DNA aptamer name.txt', 'DNA aptamer name.fasta')
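The same conversion can also be done in memory. The sketch below is a hedged variant, not the script we ran: it builds one FASTA record per aptamer and strips the hyphens, which in the listing above appear to separate the constant flanking regions from the variable region. The function name `fasta_record` is hypothetical.

```python
def fasta_record(name, sequence):
    """Return a single FASTA record for one aptamer.

    Whitespace and hyphens are removed so that only nucleotide letters
    remain on the sequence line.
    """
    cleaned = "".join(sequence.split()).replace("-", "").upper()
    return f">{name}\n{cleaned}\n"


aptamers = {
    "N56": "GACGATAGCGGTGACGGCACAGACG-CCCCAGGAAGAACCTACTCACTGATC-GTATGCCGCTTCCGTCCGTCGCTC",
    "cApt": "GACGATAGCGGTGACGGCACAGACG-AGTCAGTCAGTCAGTCAGTCAGTCA-GTATGCCGCTTCCGTCCGTCGCTC",
}

for name, sequence in aptamers.items():
    # Each record could be written to its own file, e.g. f"{name}.fasta".
    print(fasta_record(name, sequence), end="")
```

Keeping one record per file matches the way each aptamer was submitted separately as the ligand.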

Result (CEA N domain-DNA aptamer affinity prediction)

📢 Limitation: Please note that HDock is a computational tool that estimates binding from the input protein structure and DNA sequence alone. Consequently, the predicted binding models may differ from the actual binding pattern.

Figure 1 is the visualized docking result of the CEA N domain and N56 DNA aptamer. The result visualizes the top 10 binding models. The following description explains the meanings of the Docking score and Confidence score:

👉 Docking score: a more negative docking score means a more probable binding model

Confidence score:
- confidence score > 0.7: the two molecules are very likely to bind
- 0.5 < confidence score < 0.7: the two molecules may bind
- confidence score < 0.5: the two molecules are unlikely to bind

Here are the docking scores and confidence scores of the seven DNA sequences:

	N54	N56	N57	N59	N65	N71	cApt
Docking score	-219.07	-226.66	-218.33	-226.18	-219.93	-217.00	-221.87
Confidence score	0.7992	0.8225	0.7968	0.8211	0.8020	0.7925	0.8081

Of the seven sequences, N56 has the most negative docking score (-226.66) and the highest confidence score (0.8225). This suggests that N56 is the most probable binder of the CEA N domain. Consequently, we decided to use the N56 DNA aptamer in our experiment.
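The confidence scores in the table can be reproduced from the docking scores with a logistic function. The sketch below assumes the empirical relation that, to our understanding, the HDOCK server help gives: confidence = 1 / (1 + exp(0.02 × (docking score + 150))). The function and variable names are our own.

```python
import math


def confidence_score(docking_score):
    """Logistic mapping from docking score to confidence score (assumed form)."""
    return 1.0 / (1.0 + math.exp(0.02 * (docking_score + 150.0)))


# Docking scores copied from the table above.
docking_scores = {
    "N54": -219.07, "N56": -226.66, "N57": -218.33, "N59": -226.18,
    "N65": -219.93, "N71": -217.00, "cApt": -221.87,
}

# Rank from the most negative (most favorable) docking score upward.
ranking = sorted(docking_scores, key=docking_scores.get)
for name in ranking:
    score = docking_scores[name]
    print(f"{name:>4}  {score:8.2f}  {confidence_score(score):.4f}")
```

Because the mapping is monotonic, ranking by docking score and ranking by confidence score give the same order.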

Mathematical estimation of aptamer-target binding affinity

What is Kd?

Aptamers bind their targets through various interactions, such as hydrogen bonding, van der Waals interactions, electrostatic interactions, and hydrophobic interactions. The strength of this binding is referred to as binding affinity, and it is the most important factor that defines an aptamer’s performance [3].

The dissociation constant, Kd, is one numerical measure of an aptamer’s binding affinity. Kd is defined as the product of the concentrations of the free target protein (T) and the free aptamer (A), divided by the concentration of the complex (TA), when the reaction reaches equilibrium.

Where:

[T] = concentration of the free target protein
[A] = concentration of the free aptamer
[TA] = concentration of the aptamer-target complex

T + A ⇌ TA

When the reaction is at equilibrium, the dissociation constant can be calculated as follows, where [A]_t = [A] + [TA] is the total aptamer concentration:

$$K_d = \frac{[T]\,[A]}{[TA]}$$

$$K_d = \frac{\left([A]_t - [TA]\right)[T]}{[TA]}$$

A higher Kd value indicates weaker binding: at equilibrium, the complex tends to dissociate into free target protein and free aptamer. A lower Kd indicates tighter binding, with the equilibrium favoring the aptamer-target complex.
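As a worked example of the definition, the equilibrium concentrations can be computed directly from the total concentrations and Kd. Substituting [A] = [A]_t - [TA] and [T] = [T]_t - [TA] into Kd = [T][A]/[TA] gives a quadratic in [TA]. The sketch below solves it; the function name and the input values (arbitrary units) are illustrative assumptions.

```python
import math


def equilibrium_complex(a_total, t_total, kd):
    """Equilibrium [TA] for T + A <-> TA, from total concentrations and Kd.

    Solves [TA]^2 - (a_total + t_total + kd) * [TA] + a_total * t_total = 0
    and returns the smaller root, which is the physically meaningful one.
    """
    b = a_total + t_total + kd
    return (b - math.sqrt(b * b - 4.0 * a_total * t_total)) / 2.0


a_total, t_total, kd = 20.0, 5.0, 0.78  # illustrative values, arbitrary units
ta = equilibrium_complex(a_total, t_total, kd)
a_free, t_free = a_total - ta, t_total - ta
print(f"[TA] = {ta:.3f}, [A] = {a_free:.3f}, [T] = {t_free:.3f}")
print(f"check: [T][A]/[TA] = {a_free * t_free / ta:.3f}")
```

The final line recomputes Kd from the solved concentrations, which is a quick consistency check on the algebra.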

ODE for aptamer-protein interaction

A system of ordinary differential equations (ODEs) can be solved numerically in MATLAB to plot concentration against time. From the binding reaction, we set up three ODEs, together with an expression for the fraction of bound aptamer, f_a:

$$\frac{d[A]}{dt} = K_d\,[TA] - \frac{1}{K_d}\,[A][T]$$

$$\frac{d[T]}{dt} = K_d\,[TA] - \frac{1}{K_d}\,[T][A]$$

$$\frac{d[TA]}{dt} = \frac{1}{K_d}\,[A][T] - K_d\,[TA]$$

$$f_a = \frac{[TA]}{[A]_t} = \frac{B_{max}\,[T]}{K_d + [T]}$$
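The three ODEs can also be integrated without MATLAB. The following is a minimal Python sketch using a hand-written classical Runge-Kutta (RK4) step; it is not the solver we used (that was `ode45`), and the function names and step size are our own assumptions. It keeps the rate terms exactly as written above, with Kd as the dissociation rate and 1/Kd as the association rate.

```python
def rhs(state, kd):
    """Right-hand side of the three ODEs; state = (A, T, TA)."""
    a, t, ta = state
    flux = kd * ta - (1.0 / kd) * a * t  # net dissociation rate, as written above
    return (flux, flux, -flux)


def rk4_step(state, kd, dt):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = rhs(state, kd)
    k2 = rhs(shifted(state, k1, dt / 2), kd)
    k3 = rhs(shifted(state, k2, dt / 2), kd)
    k4 = rhs(shifted(state, k3, dt), kd)
    return tuple(
        s + dt / 6 * (p + 2 * q + 2 * r + w)
        for s, p, q, r, w in zip(state, k1, k2, k3, k4)
    )


def simulate(a0, t0, ta0, kd, t_end, dt=1e-3):
    """Integrate from time 0 to t_end and return the final (A, T, TA)."""
    state = (a0, t0, ta0)
    for _ in range(round(t_end / dt)):
        state = rk4_step(state, kd, dt)
    return state


a_total = 20.0
a, t, ta = simulate(a0=a_total, t0=5.0, ta0=0.0, kd=0.78, t_end=5.0)
print(f"A = {a:.4f}, T = {t:.4f}, TA = {ta:.4f}, f_a = {ta / a_total:.4f}")
```

One consequence of this parametrization is worth noting: at steady state Kd·[TA] = (1/Kd)·[A][T], so the ratio [A][T]/[TA] settles at Kd² rather than Kd. Choosing rate constants whose ratio k_off/k_on equals Kd would make the simulated equilibrium agree with the definition of Kd given earlier.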

MATLAB code for plot visualization

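% Main function: set the parameters, integrate the ODEs, and plot the result
function parameter
Kd = 0.78;   % dissociation constant
At = 20;     % total aptamer concentration

A0 = 20;     % initial free aptamer
T0 = 5;      % initial free target
TA0 = 0;     % initial complex

y0 = [A0; T0; TA0];
tspan = [0, 5];

[t, y] = ode45(@(t, y) f(t, y, Kd), tspan, y0);

% The bound fraction is an algebraic quantity, so it is computed from the
% solution instead of being integrated as a fourth state variable
f_a = y(:,3) / At;

plot(t, y(:,1), 'b', t, y(:,2), 'r', t, y(:,3), 'g', t, f_a, 'y')
xlabel('time')
ylabel('concentration')
legend('Aptamer', 'Target', 'Complex', 'Fraction')
end

% Right-hand side of the ODE system; y = [A; T; TA], matching the order of y0
function dydt = f(~, y, Kd)
A = y(1);
T = y(2);
TA = y(3);

dAdt = Kd*TA - (1/Kd)*A*T;
dTdt = Kd*TA - (1/Kd)*A*T;
dTAdt = -Kd*TA + (1/Kd)*A*T;
dydt = [dAdt; dTdt; dTAdt];
end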

function createfigure(X1, YMatrix1)
%CREATEFIGURE(X1, YMatrix1)
%  X1:  vector of plot x data
%  YMATRIX1:  matrix of plot y data

% Create figure
figure1 = figure;

% Create axes
axes1 = axes('Parent',figure1);
hold(axes1,'on');

% Create multiple line objects using matrix input to plot
plot1 = plot(X1,YMatrix1,'Parent',axes1);
set(plot1(1),'DisplayName','Aptamer','Color',[0 0 1]);
set(plot1(2),'DisplayName','Target','Color',[1 0 0]);
set(plot1(3),'DisplayName','Complex','Color',[0 1 0]);
set(plot1(4),'DisplayName','Fraction','Color',[1 1 0]);

% Create ylabel
ylabel('concentration');

% Create xlabel
xlabel('time');

box(axes1,'on');
hold(axes1,'off');
% Create legend
legend(axes1,'show');
end

Binding site prediction

Previous research on aptamer binding has typically focused on the general binding state, providing limited information. Our team aimed to advance beyond this by identifying not only the binding state but also the specific binding sites between DNA and proteins. To enhance the prediction accuracy, we employed two models utilizing different methods, allowing for cross-validation.

Generating the PDB file of the N56 DNA aptamer

First, we generated a PDB file of the selected DNA aptamer, N56, with DNA structure prediction and generation software, because a PDB file is required to predict the aptamer’s structure and its binding with CEA, our target protein [4].

[N56_aptamer.pdb]

1) DP-Bind (Protein-DNA binding site prediction)

DP-Bind is a model designed to predict the DNA-binding sites on proteins [5]. However, it is important to note that these predictions are only candidates. Our approach therefore goes beyond selecting candidates by a fixed criterion (a binding score over 0.9). We collaborated with the wet lab to obtain 3D images of the protein-DNA binding complex using PAD. This collaboration allows us to compare the locations of the in-silico binding site candidates with the actual 3D binding images, and so to pinpoint the binding site between the protein and the DNA aptamer.
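The score criterion mentioned above can be sketched as a simple filter over per-residue scores. The input layout, a list of (position, residue, score) tuples, is a hypothetical stand-in and not DP-Bind's actual output format, and the example scores are placeholders, not results for CEA.

```python
def high_confidence_positions(predictions, threshold=0.9):
    """Return residue positions whose binding score exceeds the threshold."""
    return [pos for pos, _residue, score in predictions if score > threshold]


def contiguous_runs(positions):
    """Group positions into (start, end) runs of consecutive residues."""
    runs = []
    for pos in sorted(positions):
        if runs and pos == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], pos)  # extend the current run
        else:
            runs.append((pos, pos))        # start a new run
    return runs


# Placeholder scores for illustration only, not DP-Bind output for CEA.
example = [(1, "K", 0.95), (2, "L", 0.97), (3, "T", 0.40), (4, "I", 0.92)]
hits = high_confidence_positions(example)
print(hits, contiguous_runs(hits))
```

Grouping the hits into runs makes it easier to compare stretches of sequence with regions of a 3D structure.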

Results

2) FTMove: A Web Server for Detection and Analysis of Cryptic and Allosteric Binding Sites by Mapping Multiple Protein Structures

FTMove is an automated web server that identifies all structures of a protein available in the PDB, runs mapping on them, and combines the results to form binding hot spots and binding sites [6]. Although FTMove and DP-Bind use different prediction methods, we identified a common binding site candidate, labeled Binding Site 005, which is shown in cobalt blue in the image.

Results



These are the PDB files of the binding site candidates.
    
[binding_site.000.pdb]
[binding_site.001.pdb]
[binding_site.002.pdb]
[binding_site.003.pdb]
[binding_site.004.pdb]
[binding_site.005.pdb]

3) Cross-validation results

We identified six binding sites with FTMove and checked whether each corresponded to the parts of the CEA sequence that DP-Bind predicted to bind DNA. Ultimately, we selected Binding Site 005 as the most likely binding site between CEA and our DNA aptamer N56. Comparing this with the wet lab’s 3D image of the N56-CEA binding complex confirmed our prediction.
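One way to make this comparison explicit is to score each structural site by how many of its residues were also predicted from sequence. The sketch below is an illustration under assumptions: the function name, the site names, and the residue numbers are all hypothetical placeholders, not our actual FTMove or DP-Bind results.

```python
def rank_sites_by_overlap(site_residues, predicted_residues):
    """Rank candidate sites by the fraction of their residues also predicted to bind DNA.

    site_residues maps a site name to a set of residue numbers;
    predicted_residues is the set of residue numbers from the sequence-based predictor.
    """
    scores = {}
    for site, residues in site_residues.items():
        shared = residues & predicted_residues
        scores[site] = len(shared) / len(residues) if residues else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder residue numbers for illustration only.
sites = {"site_A": {10, 11, 12, 40}, "site_B": {70, 71, 72, 73}}
predicted = {11, 12, 13, 70}
for site, fraction in rank_sites_by_overlap(sites, predicted):
    print(f"{site}: {fraction:.2f}")
```

A site that ranks highest by this measure is only a candidate; agreement with experimental structural data remains the deciding evidence.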
