RETRACTED ARTICLE: Drug discovery through Covid-19 genome sequencing with siamese graph convolutional neural network

Pati, Soumen Kumar; Gupta, Manan Kumar; Banerjee, Ayan; Shai, Rinita; Shivakumara, Palaiahnakote

doi:10.1007/s11042-023-15270-8

RETRACTED ARTICLE: Drug discovery through Covid-19 genome sequencing with siamese graph convolutional neural network

Published: 10 May 2023

Volume 83, pages 61–95, (2024)
Cite this article

Download PDF

Multimedia Tools and Applications Aims and scope Submit manuscript

RETRACTED ARTICLE: Drug discovery through Covid-19 genome sequencing with siamese graph convolutional neural network

Download PDF

Soumen Kumar Pati¹,
Manan Kumar Gupta¹,
Ayan Banerjee²,
Rinita Shai³ &
…
Palaiahnakote Shivakumara⁴

1711 Accesses
2 Altmetric
Explore all metrics

This article was retracted on 24 August 2024

This article has been updated

Abstract

After several waves of COVID-19 led to a massive loss of human life worldwide due to the changes in its variants and the vast explosion. Several researchers proposed neural network-based drug discovery techniques to fight against the pandemic; utilizing neural networks has limitations (Exponential time complexity, Non-Convergence, Mode Collapse, and Diminished Gradient). To overcome those difficulties, this paper proposed a hybrid architecture that will help to repurpose the most appropriate medicines for the treatment of COVID-19. A brief investigation of the sequences has been made to discover the gene density and noncoding proportion through the next gene sequencing. The paper tracks the exceptional locales in the virus DNA sequence as a Drug Target Region (DTR). Then the variable DNA neighborhood search is applied to this DTR to obtain the DNA interaction network to show how the genes are correlated. A drug database has been obtained based on the ontological property of the genomes with advanced D3Similarity so that all the chemical components of the drug database have been identified. Other methods obtained hydroxychloroquine as an effective drug which was rejected by WHO. However, The experimental results show that Remdesivir and Dexamethasone are the most effective drugs, with 97.41 and 97.93%, respectively.

Predicting Antiviral Drugs for COVID-19 Treatment Using Artificial Intelligence Based Approach

Generating novel molecule for target protein (SARS-CoV-2) using drug–target interaction based on graph neural network

Article 18 December 2021

A separable temporal convolutional networks based deep learning technique for discovering antiviral medicines

Article Open access 22 August 2023

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

The world faced a huge spread of various unknown coronavirus variants. After 21st June 2021, nearly 50,000 new coronavirus cases broke out, and almost 18.1 Cr cases were confirmed, causing 39.2 L deaths worldwide [16, 36]. Due to deplorable hospital infrastructure and treatment facilities, various art-of-war (and drug reutilizing) have been used, but the contamination rate is escalating worldwide. In October 2020, the Food and Drug Administration (FDA) accepted the therapeutic drug Remdesivir to heal COVID-19 patients [38, 50]. This drug can be used in treating adults and children ages 12 or more. It has also been discovered that people recover much faster after taking Remdesivir. On the other hand, WHO recommended using either ibuprofen or acetaminophen to reduce aches and fever temperature affected by the coronavirus infection; however, the situation is still out of control. Some proper drugs are genuinely needed in this condition, which helps people cure and play a significant role in vaccine production. However, for drug reusing, detailed knowledge is needed of the DNA sequences of this virus.

Genome sequencing [53] is functionally dealt with high dimensional DNA nucleotide data for disease classification with abstract analysis, which helps in drug discovery to fight against it. But, the information is compassionate, costly to obtain, and one of the most difficult to use without proper domain knowledge. For this reason, most researchers avoid sequence data and use conventional trial and error tactics for inefficient drug discovery as well as the waste of time. However, to increase the speed of the drug discovery process, the sequence data should be used in a more straightforward form, and automation techniques should be utilized. Figure 1 shows the sequence of the SARS-COV-2 (Fig. 1a) and MERS-COV-2 (Fig. 1b) and their difference (marked in red color). It can be observed that the data contains strings of alphabets corresponding to the amino acids that are very much time consumable to work with it.

Recently, the advancement in the fields of machine learning [14, 44] has made a significant impact in the field of drugs and vaccine discovery, although only a very few works contain genome sequencing (~11%). In recent times, most researchers focused on Covid-19 detection through the chest X-Ray [23] by circumventing the conventional RAD or RT-PCR test. With the identification of the disease, its healing pathfinding is an essential task. So, this is the key area that needs lots of improvement. Also, most research directly involves the graph convolutional neural network (GCNN) [57] for drug discovery. It is a sequential model, and the inputs should be in adjacency matrix form. So, much time has been wasted in preprocessing, and if the raw data is used to form the adjacency matrix without proper amplicon sequencing, the efficiency will be compromised. On the other hand, some works used RNN [11, 25] and LSTM [15, 52] to fulfill the above goal. This kind of neural network is effective in sequencing, but it has a shallow success rate in drug prediction. Some researchers used generative adversarial networks (GAN) [13, 55], which can generate drug databases as their own in the generator part and validate ROI in the discriminator part. This is the most adopted technique in automated drug discovery yet suffers from its efficiency if ROIs contain the missing link. Consequently, a simplified end-to-end model should be obtained to tackle the current pandemic situation. Table 1 obtained a brief comparative study to show how existing methodologies perform on the above depicted Covid-19 genome sequences.

Table 1 Motivational Study

Full size table

From the above discussion, it can be concluded that the previous drug discovery techniques propose hydroxychloroquine and similar substances, which WHO already obsoletes. So, a novel drug discovery technique has to come out. SGCNN is obtained, which predicted Remdesivir, Ivermectin-like substances globally accepted to fight against the Covid-19 genome.

Most of the state-of-the-art approaches mentioned in Section 2 have an excess false positive rate and predicted drugs are already rejected by the WHO as they are using conventional sequential methods with convolutional layers (suffers with vanishing gradient problem) or following the generative approaches (random initialization distorted the original DNA sequence connection which results in bad interaction network). This motivates us to obtain a parallel matching system that can superimpose the DNA interaction and Drug interaction for effective drug reproduction. Among all the existing networks siamese network holds this property. Besides that, as drug reproduction needs relational connectivity to check, among all the current models, GNN holds this property. Moreover, the prone of overfitting has been avoided by constraining the number of parameters with minimized operational frequency at each, and gradient vanishing has been overcome with re-normalization. So to take advantage of all the contributions in a single End-to-End networks SGCNN has been proposed where the convolution layer of the conventional Siamese is replaced by the GNN layers.

1.1 Objective and proposed outcome

Besides the above motivation, most of the state-of-the-art methods are highly expensive in terms of their computational cost. Besides that, they are very unstable (especially GAN) due to random initialization. All the networks are suffered to guess the drugs if the proper amino acid definition of the ROIs is missing. The novel architecture has been proposed to recognize the potential drugs and their targets for COVID-19 treatment from genome sequencing knowledge to overcome the stated shortcomings. The proposed network (SGCNN) has the potential to tackle the imbalance dataset, which is essential in drug discovery. Not only that, it can be trained with minimal data without compromising accuracy. Besides that, the knowledge of neural graph networks helps to obtain the relationships between the genome sequence and drug molecules. That’s why, firstly, a comprehensive analysis of the sequences has been done to discover the noncoding proportion, density of genes, and profusions in functional sets of genes utilizing optimized next gene sequencing (oNGS). This investigation finds the distinctive areas of the virus genome (target region) for necessary drug development. Here, the VDNS obtained the DNA interaction network of the obtained DTR to report the gene correlation, as the drug chemical environment entirely depends on this interaction property. This information is fed in the proposed model (SGCNN), where the graph convolutional neural network (GCNN) tackles the DNA interaction network. Also, the Advanced D3Similarity investigation is executed to study the probably confined modification of a drug candidate and its influences on the identified area. Then, these two information gain is used to train the proposed Siamese architecture for drug mapping, and a gradient-based optimization is performed to predict the drugs for the genome validation. The performance of the method (SGCNN) is compared with the current state-of-the-art methods and concludes that the SGCNN obtains higher precision in drug discovery than the rest of the methods. At last, it achieves with this knowledge that the drugs should be measured as a study for better treatment of COVID-19. In order to summarize, the key contribution of the method is listed below.

The existing next gene sequencing has been optimized to extract the drug target region from the sequence data by liganding the long fragment with Tagmentation.
The paper proposed a novel variable DNA neighborhood search algorithm to obtain a DNA interaction network from the drug target region.
According to our best knowledge, for the first time, D3 Law has been utilized for molecule dockerization, which is faster and more effective in obtaining drug databases based on molecular reactivity.
Replace the CNN layers of the Siamese network with GNN to obtain SGCNN for repurposed drug evaluation.

1.2 Organization of the paper

The rest of the article is coordinated into the accompanying segments. Section 2 portrays the literature review of the corresponding work. In Section 3, the proposed strategy is introduced and discussed thoroughly. The experimental results and comparison is introduced in Section 4, and Section 5 makes the inference.

2 Literature review

On 3rd April 2021, the World Health Organization acknowledged that the second wave of the COVID-19 pandemic would be more lethal than the previous one. Until then, eminent scientists have proposed many computational and examination strategies to repurpose the current medications strategies. Among them, [34] obtained a detailed review of the existing deep learning paradigm and the fundamental method of COVID-19 drug discovery for future works. This work creates a base for the comparative study of various deep learning strategies to achieve the desired goal. The paper [21] performs immune mapping with Dense Neural Network (DenseNet121), which works on annotated phenotypes, so without a proper biomedical guide, the experiment is impossible to achieve. Not only that, the importance and key features of the DenseNet121 have no adequately discussed. The article [47] proposed a phenotype compound screening with a graph neural network. Although the system has high throughput, it suffered from a vanishing gradient descent problem as well, as the difference between the training and testing curve shows the data is surely overfitted. The paper [30] utilized a multimodal restricted Boltzman machine to discover the highlights and the particular likelihood of circulation of every datum of drug repurposing. However, the sequence data does not have a modal structure. So the experiments cannot be recreated on the genome sequencing dataset. The paper [22] proposed PolypharmDB, which is a responsive drug repurposing approach but mainly focused on the protein-chain reaction. This is an effective drug repurposing technique but requires human trial, which is a time-consuming solution. In the meantime, the article [1] utilized bidirectional convolutional long short term memory (Bi-ConvLSTM) with DeepH-DTA to predict the drug-target interactions, which is a very bulky model having exponential time complexity, and the concept of DeepH-DTA is somehow theoretical and difficult to implement in real-life applications. The paper [24] reported that zafirlukast is a repurposing candidate with gradient boosted tree ensemble, a supervised technique that works only on the labeled drugs screening. Later on, Cheung et al. [19] showed the graph neural network has an important potential ineffective drug discovery of COVID-19. However, data preprocessing is needed to prepare the data to feed into this model. These works are primarily involved in existing drug classification, but in order to fight against COVID, we need to propose new drugs. The work [3] obtained an ensemble docking pipelining approach for drug repurposing, which can only be computed using a supercomputer. The paper [26] obtained a new training paradigm for COVID-19 drug discovery with Stochastic Multi-view learning, which cannot take care of the exploding gradient problem. The method [39] utilized a 1D-convolutional architecture to obtain the drug-target interaction (DTI) based on the Kinase Inhibitor Bioactivity (KIBA) scores. This approach, as mentioned earlier, works only for the S-glycoprotein of the SARS-COV2 sequence, and convolutional layers are not effective for repurposing tasks. In 2020, the articles [31] advanced the GNN-based approach with less shot link prediction to reduce its feature complexity. Although it performs significantly well to get rid of the vanishing gradient problem still suffers from exploding gradient problem. The method [18] used generative adversarial networks (GAN) to achieve the target-specific drug discovery with optimal time complexity, but the model is volatile due to random initialization. Additionally, the paper [8] designed a Monte-Carlo-based QASR with molecular docking and drug screening for repurposing, which is an exact cost expensive approach. This work is very effective in this scenario; however, molecular docking is a very time-consuming technique. The work [43] used algebraic topology with the deep convolutional neural network (DCNN) to discover the protease inhibitor. For this purpose, they study 137 protein crystal structures with useful biomarkers. But how the topological information has been fed into DCNN has not been answered. Later, [28] utilized graph machine learning to develop a drug discovery framework. But the experiments are complicated to reconstruct for real-life applications. Also, [64] obtained a knowledge graph completion to accomplish the same goal. The paper [56] extended this work with knowledge graph reconstruction for drug purposing report generation. However, the knowledge they generate a knowledge graph for the interaction and then reconstruct it for drug repurposing. These two downstream tasks should be possessed simultaneously. A few days ago, the paper [51] explored the Siamese network for drug discovery with the help of one-shot learning for the first time. It is the most advanced approach to date. However, the convolutional layers in the Siamese network (SN) are not suitable for accurate prediction as GNN does.

From the above discussion, it can be observed that Graph neural network and Siamese network is the most potential framework for drug repurposing. In contrast, the rest of the framework is bulky and not efficient. However, GNN and SN have their drawbacks. To overcome this scenario, the convolutional layers of SN have been changed with graph convolutional layers to utilize the potential of both networks. Not only that, but this article also works on the preprocessing and drug-targeted interaction (DTI) steps to obtain an optimal end-to-end framework with minimal runtime complexity, which depicts the novelty of the proposed approach.

3 Drug discovery architecture

The key objective of the proposed work is to develop a novel end-to-end model for effective drug repurposing for Covid-19 diseases. The main challenge of the proposed work is the presence of identifying the drug target region from the long DNA sequence. As a result, it affects accurate drug repurposing as the features of DNA sequences may overlap with the garbage features during data transformation, which leads to poor drug discovery. In the same way, the loss of DNA information leads to obtaining a poor chemical environment, and if the obtained environment is not stable, the repurposed drug might be harmful to human beings. At the same time, developing individual modules for each of the subtasks which are not compatible with each other is not advisable because of error-prone. To overcome those uncertainties, this section describes the proposed SGCNN architecture with oNGS and gene interaction network to guess the most appropriate chemical constituents for COVID-19 from the genome sequencing knowledge. The SGCNN has several definite layers executed similarly, and the proper drugs can be attained for the experimental target genome with high precision after passing over all the layers. Consequently, the computational power can be increased in terms of accuracy and computational complexity (time). Figure 2 shows the generalized view of the overall methodology of this paper.

As depicted in Fig. 2, the experimental dataset first went through the oNGS techniques that obtained the optimal targeted region for drug discovery. This targeted region then goes through two processes. On the one hand, it has gone through the VDNS to obtain the DNA interaction network suitable to feed in SGCNN. On the other hand, it went through the AD3 Similarity to understand the appropriate chemical environment for drug repurposing. Both this knowledge is provided in the two parallel paths of SGCNN to propose an effective drug for the given Covid-19 genome.

3.1 Optimized next gene sequencing

The information advancement stage optimizes the genome sequencing step to make the process faster and cheaper. However, this section utilized the synthesized information from the NCBI databank (MT050943, NC_045512.2, and MT012098) to obtain the dug target region in the DNA sequence. Figure 3 gives a short thought regarding this oNGS procedure.

Here, the DNA sequence is firstly shaved with 10-Kb fragments, and ligation is applied to them to mark the ligand to keep track of the ‘long fragment’ adapted non-overlapping sequence. The obtained fragments are adulterated into the 384-well plate, and each plate contains 3000 significant molecules related to their chemical structure. An ensuing ‘Tagmentation’ procedure has been performed with Illumina’s Nextera Technology [42, 60]. Here, the plate items are additionally divided, and non-overlapping sequences are added all the while. Later on, the segmented DNA fragments are again amplified, a unique index (barcode) has been added, and the following P5 and P7 adapter sequences are tagged in every well of the 384-well plate. And these yields are sequenced as a classical Illumina library and pooled to prepare the data for emulsion-based sequence clusterization. To implement this, the chromium machine algorithm [45] has been utilized by PCR reagents to partition the plates into 1 million micelles (‘GEMs’) beside the barcoded primers. In every GEMs, more modest pieces of DNA are enhanced from the first fragments, everyone with a barcode tag recognizing the original GEMs. The final fragments are re-pooled, aligned, and linked together to form the final optimal drug target reads.

Whereas the NGS methodology can obtain states of up to 2–300 base pairs of DNA sequence in 1 hour, the oNGS can generate reads of up to 10Kb-1 Mb DNA sequence in the same duration. Not only that, it is less biased and not affected by the lack of PCR amplification in the case of homogeneous and heterogeneous genome coverage, which depicts the novelty of the proposed optimization.

3.2 Variable DNA neighborhood search (VDNS)

The target region obtained through oNGS is unstructured and impossible to feed in the network containing graph convolutional layers. To convert this data into suitable graph adjacency, form a novel neighborhood search technique has been utilized here.

To circumvent the loss of data, it is determined to hold the ceaseless qualities in the neighborhood search stage and to utilize nearby inquiry calculations appropriate for constant improvement issues. Here, six local search equations are utilized (LS₁, ⋯, LS₆) to obtain a DNA interaction network. All the local search algorithms are solicited to a definite number of iterations (num), and all equations are used for different purposes. Equation (1) has been applied to transfigure the arrangement inside the sequence space. Equation (2) joins the current arrangement with the global best DNA molecule, and Eq. (3) consolidates the current arrangement with the nearby best DNA molecule. Equation (4) blends the current arrangement, the global best fit, and the neighborhood best DNA molecule. Equation (5) and Eq. (6) are sorts of hybrids with the neighborhood best and the global best molecule, individually. The techniques are portrayed with the accompanying individual conditions.

$$L{S}_1(x)=\frac{x_{ij}\left({s}_1+1\right)-{x}_{ij}\left({s}_1\right)}{\mathit{\operatorname{ran}}{d}_5}$$

(1)

$$L{S}_2(x)=\frac{x_{ij}\left({s}_1+1\right)-\left(1-\mathit{\operatorname{ran}}{d}_6\right)\bullet {x}_{ij}\left({s}_1\right)}{\mathit{\operatorname{ran}}{d}_6\times gbes{t}_j}$$

(2)

$$L{S}_3(x)=\frac{x_{ij}\left({s}_1+1\right)-\left(1-\mathit{\operatorname{ran}}{d}_7\right)\bullet {x}_{ij}\left({s}_1\right)}{\mathit{\operatorname{ran}}{d}_6\times lbes{t}_{ij}}$$

(3)

$$L{S}_4(x)=\frac{x_{ij}\left({s}_1+1\right)-\left(1-\mathit{\operatorname{ran}}{d}_8\right)\bullet {x}_{ij}\left({s}_1\right)}{\mathit{\operatorname{ran}}{d}_8\bullet \mathit{\operatorname{ran}}{d}_9\bullet gbes{t}_j+\mathit{\operatorname{ran}}{d}_8\bullet \left(1-\mathit{\operatorname{ran}}{d}_9\right)\ast lbes{t}_{ij}}$$

(4)

$$L{S}_5\left({x}_{ij}\left({s}_1+1\right)\right)=\left\{\begin{array}{c} lbes{t}_{ij},\kern0.5em if\ \mathit{\operatorname{ran}}{d}_{10}<0.1\\ {}{x}_{ij}\left({s}_1\right),\kern2.75em Otherwise\end{array}\right.$$

(5)

$$L{S}_5\left({x}_{ij}\left({s}_1+1\right)\right)=\left\{\begin{array}{c} gbes{t}_j,\kern0.5em if\ \mathit{\operatorname{ran}}{d}_{11}<0.1\\ {}{x}_{ij\left({s}_1\right)},\kern3.25em Otherwise\end{array}\right.$$

(6)

Where the local iterated search number is s₁, rand₅ is an arbitrary number in [−1, 1], and rand₆ to rand₁₁ are the arbitrary numbers situated in the interval [0, 1]. Algorithm 1 provides a detailed idea of the obtained VDNS.

Now a modified sequence technique of the VDNS algorithm has been taken. This Sequential version is the model of the classic VDNS algorithm, and in every iteration, all the local searches are arranged sequentially. The novelty of this design is that it can successfully remove the random initialization problem of the generator part of the GAN model [62], and the worst-case time complexity of the proposed algorithm is O(n³) Which has one of the fastest convergence rates for DNA interaction networks.

3.3 Obtaining drug database with advanced D3Similarity (AD3)

AD3 looks through the area of changes of a solitary medication applicant based on the sub-atomic properties and bioactivity of the DNA interaction network in the above segment. It is a blend of the administered and solo techniques where the regulated strategy creates the substance change, and unaided strategies track down all potential sets between bunches of synthetic mixtures dependent on protein structure. Figure 4 shows the mean engaged with advanced D3Similarity.

D3 Law

This law is also known as the 3rd Density law [32]. It depicted the fact that to obtain the best possible similar DNA pair, their cubic affinity should tend to one when they are substituted through property optimization to obtain the chemical structure of the drugs.

Dockerization

It is a measurement to look through synthetic likeness and foresee its action. The tokenization of the two synthetic compounds is the most significant base that shows up in both.

Steps involved in AD3

To discover the capability of the AD3 calculation (Algorithm 2), all atoms of the DNA corrosive construction of the objective area have been removed that compute their fondness utilizing D3 Law (Eq. (7)). If it gets higher partiality, continue to the subsequent stage where it computes the design activity similarity index (DASI) in Eq. (8).

$${D}_{d_1,{d}_2}={\left|{\log}_2\left({D}_{d_1}\right)-{\log}_2\left({D_d}_2\right)\right|}^3$$

(7)

Where, ${D}_{d_1,{d}_2}$ is the general proclivity index somewhere in the range of d₁ and d₂ structure, and ${D}_{d_i}=\frac{\omega_i}{\tau_i}$ is the affinity of the ${D}_{d_i}$ atom. ω_i and τ_i represent the angular quantum and spin quantum of the DNA helix structure.

$$DAS{I}_{d_1,{d}_2}=\frac{\left|{\textrm{Ac}}_{d_1}-{\textrm{Ac}}_{d_2}\right|}{1-{D}_{d_1,{d}_2}}$$

(8)

Where, ${\textrm{Ac}}_{d_1}$and ${\textrm{Ac}}_{d_2}$ are the activity of the d₁ and d₂ structure, respectively, in the DNA interaction network of the target region attained in Section 3.2. In light of the above realities, the AD3 calculation is designed in Algorithm 2.

Then, the main work was done for the drug revelation. The data is currently taken care of in the Siamese Graph Convolutional Neural Network (SGCNN) in Section 3.4 to foresee the antiviral medications appropriate to fight against the COVID-19 pandemics.

3.4 Siamese graph convolutional neural network (SGCNN)

This paper obtained a twofold graph convolutional siamese network for predicting the most significant drug along with their effectivity percentage against the COVID-19 virus. As the input data contains Spatio-temporal correlations [5,6,7], the primary thought behind this plan is that the appearance highlights (i.e., GCNN) prepared in a closeness learning issue, and the semantic highlights (i.e., Siamese Network) prepared in drug characterization issue supplement one another. It like ought to be together considered for adequate drug disclosure.

3.4.1 Network architecture

The overall network design of the proposed SGCNN work is depicted in Fig. 5. The Network has two sets of inputs, namely, the targeted drug database (y) and the searched DNA interaction network (x) and their compliments (y’ and x’ respectively). Here, both y’ and x’ have a dimension (w_s × h_s), and y and x have a size of (w_t × h_t) [w_t < w_s; h_t < h_s]. It may be looked at as an assortment of competitor interaction network x in the inquiry district with a similar chemical environment as y.

SA-Siam comprises the graph convolutional neural network (GCNN) and the Siamese Network (S-Net). The yield of each branch is a reaction map demonstrating the likeness between target y and DNA fix x inside the inquiry district. The two branches are independently prepared and concatenated to obtain the desired final output at testing time.

3.4.2 The GCNN

The primary goal of this section is to generate an optimum node g_v representation by aggregation of the ROI features (g_x) and neighbor’s features (g_u) where g_u ∈ N(g_v). In this paper, five graph convolutional layers have been stacked to extract high-level drug information with node classification (to obtain the optimum number of drugs) and graph classification (to obtain the reaction map). To begin with a GCNN, the spectral convolution of the input graph should be defined using Eq. (9).

$${g}_{\theta}\ast {g}_x=U\ast {g}_{\theta}\ast {U}^T\ast {g}_x$$

(9)

Where g_θ is the filter applied to a convolutional layer, and U is the normalized eigenvector graph Laplacian matrix (NEGLM) which can be obtained using Eq. (10).

$${L}_g={I}_N-{D}_L^{\frac{1}{2}}{A}_{ij}{D}_U^{\frac{1}{2}}=U\ast \lambda \ast {U}^T$$

(10)

Here, λ is the diagonal matrix of eigenvalues, I_N is the identity matrix of N^th order, A_ij is the input interaction matrix, and D_L and D_U are the lower and upper diagonal matrix of A_ij, respectively. Using fact, the spatial convolution equation can be simplified as Eq. (11).

$${g}_{\theta}\ast {g}_x={\sum}_{k=0}^N{\theta}_k^{\prime }{T}_k\left({L}^{\prime}\right){g}_x$$

(11)

Where T_k is the Chebyshev Polynomial and ${L}^{\prime }=2\ast \frac{L_g\hbox{--} {I}_N}{\lambda_{max}}$ .

As mentioned, the GCNN model is constituted by stacking convolution layers in Eq. (11). Here, the number of convolution layers is restricted with k = 5, and meanwhile, approximate λ_max = 10 is taken, and then overall spatial convolution has been obtained using Eq. (12).

$${g}_{\theta}\ast {g}_x={\sum}_{k=0}^3{\theta}_k^{\prime}\ast {\left({L}_g-{I}_N\right)}^k\ast {g}_x$$

(12)

Furthermore, the prone of overfitting has been avoided by constraining the number of parameters with minimized operational frequency at each layer. But in Deep Neural Network (DNN), the repetitive operation may result in numerical instability and gradient vanishing. To overcome this drawback, re-normalization (using Eq. (13)) has been applied to Eq. (6), and Eq. (12) is readjusted with their weights accordingly.

$${L}_g={I}_N+{D}_L^{\frac{1}{2}}{A}_{ij}{D}_U^{\frac{1}{2}}\to {D_L}^{\prime \left(\frac{1}{2}\right)}{A}_{ij}{D}_U^{\prime \frac{1}{2}}$$

(13)

Where D_L^′ and ${D}_U^{\prime }$ are the differential lower and upper diagonal matrix, respectively.

This definition upscales the ROI features (g_x) (where x ∈ R^N ∗ C, N is the number of nodes, and C is the dimensional Eigenvector). Now in these upscaled features, F filters have been applied with C channels. So Eq. (13) is more filtered and downsampled in Eq. (14).

$${G}_Z={D_L}^{\prime \left(\frac{1}{2}\right)}{A}_{ij}{D}_U^{\prime \frac{1}{2}}\ast {g}_x\ast {g}_{\theta }$$

(14)

Where θ ∈ R^C ∗ F is the filter parameter, and G_Z ∈ R^N ∗ F is the gene expression matrix after the convolution layer. The overall complexity of this approach is the order of O(FC). With this, the GNN concludes its operation, and this article gets an optimal number of genes caused to the disease. The overall GCNN network architecture is shown in Fig. 6.

It is a five-layered graph convolutional neural organization where the acquired quality similitude framework feeds the information layer (i.e., Gconv1). It comprises of covered-up layer (i.e., Gconv2, Gconv3, and Gconv4) and one yield layer (Gconv5). ReLu has been utilized as an initiation work for quality advancement in all middle the road layer. Three pooling layers (Pool1, Pool2, Pool3, and Pool4) have been utilized to give a disconnected type of portrayal to decrease the computational expense by diminishing the number of boundaries to learn and giving fundamental interpretation invariance to the inside portrayal. After Gconv5, a sigmoid capacity to breakdowns the quality of portrayals. A Multi-layer perceptron (MLP) [54] with softmax enactment has followed the readout capacity to assemble a start-to-finish network for improvement. After that, GCNN returns the ideal number of quality drug products with their perfect way portrayals.

3.4.3 The S-Net

The S-Net [41] takes (y’ and x’) as information. The proposed method straightforwardly utilizes a CNN trained from scratch in the drug arrangement task as S-Net and fixes every one of its boundaries during preparation and testing. It helps S-Net to yield highlights from the last two convolutional layers (conv4 and conv5 in the method) since they give various degrees of reflection, and the low-level highlights are not extricated.

The highlighted features from various convolutional layers have a unique spatial goal. For the sake of simplicity, it indicates the concatenated staggered includes by C_s(∙). To make the semantic highlights reasonable for the relationship activity, a combination module is embedded, executed by (1 × 1) ConvNet, after highlight extraction. The combination is performed inside highlights of a similar layer. The element vector for search locale x after combination can be composed as S(C_s(x)).

The objective handling is somewhat unique. S-Net takes y’ as the objective info. y’ has y in its middle and contains the setting data of the objective. Since S-Net is completely convolutional, it can, without much of a stretch, get C_s(y) from C_s(y^′) by an essential yield activity. The consideration module takes C_s(y^′) as info and yields the channel loads ε. The subtleties of the consideration module are remembered for the following subsection (Subsection 3.4.4). Channel loads increase the highlights before they are melded by (1 × 1) ConvNet. Thus, the reaction map from the semantic branch can be composed as an eq. (15).

$${h}_s\left({y}^{\prime },{x}^{\prime}\right)={C}_s\left(S\right(\varepsilon \bullet {\textrm{C}}_{\textrm{s}}(y),S\left({\textrm{C}}_{\textrm{s}}(x)\right)$$

(15)

Where ε has similar measurement as the number of channels in C_s(y) and (·) is the insightful component increase.

In the semantic branch, it just trains the combination module and the channel consideration module. With preparing sets (y_i, x_i) and ground-truth reaction map Z_i, the semantic branch is streamlined by limiting the accompanying calculated loss function L(·) depicted in Eq. (16).

$$\underset{\varnothing_{\textrm{s}}}{\min}\frac{1}{\delta}\sum\nolimits_{i=1}^{\delta }L\left\{{h}_s\left({y}_i^{\prime },{x}_i^{\prime };{\varnothing}_{\textrm{s}}\right),{Z}_i\right\}$$

(16)

Where ∅_s means the teachable boundaries, δ is the number of preparing tests.

The general warmth map is processed as the weighted normal of the warmth maps from the two branches depicted as eq. (17) during testing time.

$$h\left({y}^{\prime },{x}^{\prime}\right)=\lambda {h}_s\left(y,x\right)+\left(1-\lambda \right){h}_s\left({y}^{\prime },{x}^{\prime}\right)$$

(17)

Where λ is the weighting boundary to adjust the significance of the two branches, practically, λ can be assessed from an approval set. The situation of the most significant worth in h(y^′, x^′) proposes the middle area of the followed target. It also utilized multi-scale contributions to manage scale changes. It is observed that utilizing three scales finds some kind of harmony between execution and speed.

3.4.4 The attention network

The consideration module is acknowledged by channel-wise activities. Figure 7 depicts the consideration cycle for the i^th channel. Take a deformable convolution highlight map; for instance, the spatial measurement is utilized in this execution. It partitions the element map into (3 × 3) matrices, and the middle matrix with size (6 × 6) relates to the following objective y. Max pooling is performed inside every lattice, and afterward, a two-layer multi-facet perceptron (MLP) is utilized to create a coefficient for this channel. At long last, a Sigmoid capacity with predisposition is utilized to produce the previous yield weight ε_i. The MLP module shares loads across channels separated from the equivalent convolutional layer.

Note that consideration is just added to the objective preparation. Every one of the initiations in the i-th channel for the objective fix is increased by ε_i. Therefore, this module is passed just a single time for the principal edge of the following grouping. The computational overhead is unimportant. As all the necessary step has been discussed, Fig. 8 obtained a brief example to understand the flow of the overall methodology.

After playing out every one of the means recorded over this test finished up with the most critical medications for novel COVID-19 treatment, in the following segment, every outcome is acquired relating to this philosophy, discuss their significance, and make a liberal correlation with a comparative procedure to demonstrate its effectiveness.

4 Result Analysis

The trial is completed on three openly accessible (benchmarks) with comprehensive genome information, which is obtained from the NCBI databank presented by the Indian Government (MT050943 [48], and MT012098 [61]) and the Government of China (NC_045512.2 [59]). This genome information contains all critical genomes gathered from Riboviria, Pisuviricota, Pisoniviricetes, Orthornavirae, Nidovirales, Coronaviridae, Cornidovirineae, Sarbecovirus, Orthocoronavirinae, and Betacoronavirus, which are straightly indistinguishable. The highlights of these data are recorded in Table 2.

Table 2 Summary of the dataset

Full size table

It is observed that the dataset submitted by two different governments experimented on the different biological processes as well as followed different assembly methods, but their cellular components are the same. This shows the COVID-19 viruses attacked a specific kind of protein body which helps in effective vaccine development in the further processes.

4.1 Experimental setup

The proposed system is executed utilizing the RStudio IDE, Jupyter Notebook, and Spyder (Anaconda-based IDE) committed to running the R programming and python programming individually. To oNGS Jupyter Notebook has been utilized. Also, for the DNA association organization, RStudio has been used. It can run on the work area (Linux, Windows, and Mac) or in a program connected to RStudio Server Pro/RStudio Server (Ubuntu, Red Hat, and SUSE Linux). In this paper, the proposed technique and the comparative approaches and investigations are acted in Ubuntu-based OS with 16GB Ram and Intel i7 processor. All the exhibition examination is figured nervous with the bio conductor, NumPy, pandas, Keras, and a few other python and R bundles.

4.2 Implementation details

In our method, we set the maximal number of epochs as 10,000 for the training process with the Adam optimizer and set the initial learning rate and the weight decay to 0.005 and 0.0001, respectively. We also applied a dropout ratio of 0.2 in front of each layer for all datasets. In all experiments, the proposed method with the PyTorch framework employs a graph convolutional neural network (GCN) with hidden layers as the backbone network. Additionally, we re-implement all the comparison methods based on the public codes provided by the authors and use the same backbone networks as our method. The hyperparameter λ in eq. (17) is set to 0.5, and∅_s is set to 0.1; for balancing the magnitudes of L_g and G_Z in Eqs. (13) and (14), respectively, the parameter ε was set from 5 to 25 unless otherwise stated (in eq. (15)). The hyperparameter ${\theta}_k^{\prime }$ in (12) was set from 0.2 to 0.8 unless otherwise stated. For a fair comparison, we obtained the author-verified codes for all comparison methods and searched the best hyperparameters via a grid search to achieve their best performance. All the experiments are tested by masking some nodes of the interaction network to validate the model performance; as the input data is in the relational form, it is impossible to apply a conventional train test split. [46].

4.3 Performance evaluation

This section is dedicated to depicting all the qualitative results described in the drug discovery methodology. It discusses the usefulness of every section and the overall benefits obtained by the proposed method.

4.3.1 Optimized next gene sequencing

A bunch of appropriation learning has been made to approve the predominant variety for every amplicon, wherever it characterizes the no. of indistinguishable peruses and gets the drug target region. Figure 9 a, b and c show the drug targeted region of MT050493, MT012098, and NC_045512.2, respectively.

The above course shows that the number of genomes can fluctuate among the designated grouped amplicons yet develop associated genome bunches. Nonetheless, it will keep up with the predominant sequencing methodically between the groups, not just that the non-covering district displays this inconstancy issue. This isn’t reasonable for additional examination as it shows the drug target region yet is incapable of producing any data about DNA neighborhood just as their genomic interaction. For this reason, the load of groups is associated with the information on the DNA interaction network for further analysis.

From Fig. 9, it can be concluded that the optimization of next gene sequencing with liganding and tagmentation obtained the optimal drug target region. Here, optimization is important as DNA sequences is a Big categorical data, and if we fail to obtain the optimal target region, we have to deal with garbage information in the later stage which increases the time complexity and leads to poor accuracy. As depicted in Fig. 9, oNGC successfully surpassed the existing DNA transform algorithms.

4.3.2 DNA interaction network

The targeted region obtained through oNGS is clustered and not reasonable for additional examination. This progression readies the information through Subsection 3.1 in a structure to play out the AD3 investigation and feed the information into the SGCNN model. In reality, it acquires a communication network from all the genome bunches by dissecting the qualities of their particular DNA gatherings. Figure 10 a, b and c show the aftereffects of the DNA association organization of MT050493, MT012098, and NC_045512.2, respectively. The objective locale acquired in this segment is utilized for the Ad3 examination and drug prediction in the further segments.

From Fig. 10 a, it can be concluded that the DNA of the targeted region are strongly connected to each other, so each of them has a potential contribution to the disease. Not only that, now this network can be depicted as an adjacency matrix where (1/0) signifies the occurrence and non-occurrence of the interaction, which is more reasonable to feed in a graph convolutional structure.

Figure 10 a, b and c prove the fact that the targeted region obtained through oNGS is highly accurate, as all the genomes in the DNA interaction network are strongly connected to each other. Now it should proceed to the next step to obtain the chemical structure of the effective drugs through Advanced D3Similarity (AD3).

4.3.3 Advanced D3Similarity

The fundamental goal of this section is to get the synthetic climate reasonable for the medication expectation and discover the core similarity and habit-forming properties among them, and record all medications having a similar core to get the last medication dataset appropriate to take care of into the AI module. Table 3 shows the AD3 examination results, and it shows the characterization of 48 medications having 18 comparable compound conditions dependent on the D3Similarity score. It also gets the medication posting into the medication dataset dependent on bioactivity and sub-atomic reactivity yield score. As the medication dataset is acquired in this progression and target locale got in the past advance (subsection 4.2.2) and then fed both the information into SGCNN for different expectation.

Table 3 Drug Database obtained through AD3 Similarity

Full size table

From Table 3, it can be concluded that the obtained chemical environment is highly effective as the yield score are very high (>85). This is another practical step of drug repurposing because these obtained chemical environments are fed as an input to the SGCNN module. Here, most of the existing approaches failed, and their yield score is very low, ~65%, due to wrong selection of the DTR or random initialization. Based on these chemical environments, the effective drugs and their accuracy have been reported in the following subsections.

4.3.4 Drug repurposing with SGCNN

For the medication genome collaboration expectation, the SGCNN has been misused where the objective locale is specified as a contribution to the type of nearness grid, (1/0) addresses the occurrence and non-occurrence of the grouped genome communication and medication information base are known as additional information. An angle-based enhancement has been executed where the connection factor is acquired as an inclination to decrease the results of the medications. The portrayal of the anticipated medications, their compound property, and their importance got in Table 3. The ROC curve for all the datasets is shown in Fig. 11. In Table 4 and Fig. 11, it tends to be presumed that the utilized SGCNN is ~96% exact for the acquired target regions. It’s anticipated that Temsirolimus, Deutetrabenazine, Sirolimus, Everolimus, Bremelanotide, Pheneticillin, Latamoxef, Bacampicillin, Cefazolin, Cefoxitin, Enoxacin, Colchicine, Meropenem, Ticarcillin, Mezlocillin, Cefuroxime, and so on and the combination of Nafamostat and Camostat, Sarilumab and Tocilizumab, Dexamethasone and Remdesivir are achieved as the most appropriate medications for the treatment of COVID-19 patients.

Table 4 Drug Repurposing Description

Full size table

The drug reported from Temsirolimus to Cefuroxime belongs to the antibiotic category, and the rest of the drugs are extracted from the steroid group; both are essential to defeating the COVID-19 virus. Now the ROC curve of the model (SGCNN) is shown in Fig. 10.

From Fig. 11, it is shown that the proposed SGCNN method achieves more on the ROC based on the True Positive Rate and False Positive Rate for NC_045512.2 dataset (indicated by the blue colored line). Also, From Table 4 and Fig. 11, it tends to be inferred that the proposed procedure works as per the proposed objective of this paper. However, to get the liberality of this methodology, an ablation study (shown in Table 5), the best results are marked with bold text in Table 5 and a definite similar investigation have been acquired in Subsection 3.2 and Subsection 3.3, respectively.

Table 5 Ablation Study

Full size table

4.4 Ablation study

To prove the effectiveness of every step, a detailed evaluation has been performed based on several statistical parameters (accuracy (Acc), recall (R), precision (P), and F1-score (F1)). Firstly, the evaluation of SGCNN has been performed. After that, the one-by-one layer was added, and their statistical performance was reported in Table 4. This experiment is repeated for each of the experimental data. Here only the oNGS + SGCNN and the overall methodology have been reported as the DNA interaction network, and AD3 are auxiliary steps that help to improve the overall methodology; however, if some researchers want to use publicly available drug database, they can skip the AD3 though the result will be compromised. Similarly, if someone is least interested to know about the interaction, they will not utilize Subsection 3.2. Nevertheless, the oNGS cannot be skipped as the genome sequencing data cannot be readable through any kind of neural network.

Table 5 depicts the bare necessity of every methodological step reported in Section 3. Now a detailed comparative study has been stated in the following subsections.

4.5 Comparative study

From the above outcomes, it is currently a setup reality that the proposed technique can battle the unusual COVID-19 pandemic by foreseeing appropriate antiviral medications. Nevertheless, as referenced to mark it nonexclusive and check its curiosity, a substantial correlation is acquired in part with the current artificial intelligence-based medication revelation methods. The equivalent data and a similar exploratory arrangement have been utilized to get this comparative investigation. This comparative study comprises several outcomes. Firstly, the potential of the proposed neural network compared with state-of-art neural networks, namely, ANN [29], CNN [10], RNN [12], LSTM [20], AutoEncoder [37], Deep Belief (DBN) [2], GAN [27], and GNN [63] used in drug discovery based on MT050493 benchmark, with several neural network parameters (Peak Signal to Noise Ratio (PSNR), Structural Similarity Index (SSiM) and, Feature Similarity Index (FSiM)) in Table 6. Low PSNR, high FSiM, and high SSiM values are always desirable for better outcomes. The calculative steps of PSNR, SSiM, and FSiM have been depicted in Eqs. 18, 19, and 20, respectively.

$$PSNR=20\bullet {\log}_{10} MA{X}_I-10\bullet {\log}_{10} MSE$$

(18)

$$SSiM\left(I,{I}_w\right)=\frac{\left(\left(2\mu {\mu}_w+{C}_1\right)\left(2\sigma \left(I,{I}_w\right)+{C}_2\right)\right)}{\left({\mu}^2+{\mu}_w^2+{C}_1\right)\left(\sigma {(I)}^2+{\sigma}_w{(I)}^2+{C}_2\right)}$$

(19)

$$FSiM=\prod\nolimits_{i=1}^w1-{\left(\frac{\left|{I}_i-{I}_{w_i}\right|}{\mid {I}_i+{I}_{w_i}\mid}\right)}^{\frac{I_i}{I_{w_i}}}$$

(20)

Table 6 Comparative Study based on Neural Network parameters

Full size table

Where I is the confidence score obtained through SGCNN, I_w is the weighted confidence score, μ depicts the mean and σ depicts the standard deviation respectively variables, C₁ and C₂ are the similar constants of I and I_w, respectively.

From Table 6, the SGCNN achieved low PSNR and high FSiM (marked by bold font). But, this method is unable to beat the performance of GAN based on SSiM as it has a generator model. Later, a detailed comparative study among the SGCNN and the methods [2, 10, 12, 20, 27, 29, 37, 63] was shown based on four well-known statistical parameters (Acc, P, R, and F1). Table 7 reported the performance evaluation of the proposed and compared methods where a higher value represents the best performance. Besides that, a comparative study has been obtained based on the convergence rate of various types of neural networks to obtain faster outputs. Figure 12 shows the performance of various neural networks based on their convergence rate on MT050493 benchmark.

Table 7 Comparative Study of overall methodology with state-of-the-art approaches

Full size table

It can be observed that the GNN layers converge in very fewer epochs compared to the convolution and fully connected layers, which makes the justifiable replacement of the CNN layers of the Siamese network with the GNN one.

From Table 7, the result shows that the method SGCNN outperformed (marked by bold font) the state-of-the-art methods [4, 9, 17, 33, 35, 40, 49, 58]. Lastly, a detailed comparison based on training time (hour) and response time (second) has been reported in Table 8. Not only that, as predicted, drug classification is an essential downstream task, a comparative study of all the comparative approaches based on their ROC curve obtained in Fig. 13.

Table 8 Comparative Study based on training and response time

Full size table

From this discussion, it can be concluded that SGCNN is the most stable classifier than the rest, which supports the superiority of this method.

From the above discussion, it may be inferred that the proposed system (SGCNN) is nonexclusive and give a preferable outcome over the current methods [4, 9, 17, 33, 35, 40, 49, 58]. That likewise takes less time (~6 hours) to train the data, whereas the other approaches require more hours to train the datasets. The paper Ray et al., [49] takes less time to train the datasets (MT050493, and MT012098) and Martinez [40] for NC_045512.2 because it can create information on its own, but SGCNN needs a couple of information to prepare the datasets. Other than that, it has a fast reaction time, likewise in drug predicting (~1.5 sec). So considering all the above realities, this novel proposed technique has the enormous possibility to battle against the recent COVID-19 pandemic.

5 Conclusion

A novel construction has been proposed to track down the most appropriate antiviral medications to battle against the COVID-19 pandemic. This type of infection is presently spread globally, and as of now, 10 million individuals kicked the bucket because of this intense respiratory sickness. A few analysts are included to discover an immunization against this infection, yet till now, any effective results are not getting.

Here, the proposed method attempts to foresee antiviral medications through computerized reasoning procedures based on the information on genome sequencing. The genome datasets are sequenced through oNGS to acquire drug target region genome bunches. At that point, from these bunches, an element removal has been done, and from these highlights, a DNA interface network has been built to make a remarkable objective area. An AD3 investigation has been achieved to get the comparing substance conditions and get a medication information base dependent on core comparability. Finally, the objective region and medication information base feed the SGCNN to foresee the medications to treat the sickness shaped by COVID-19. The model has ~96% exactness and exceptionally less preparing time(~6 hours) and fast reaction time(~1.5 s) and achieves better than present methodologies.

Be that as it may, some sections should be refreshed later on. Till now, it can foresee the medication and combination of a few medicines to battle against the sickness. Yet, it is incapable of anticipating the synthetic composition (in mg/L) of the blend required for the ideal treatment. The future work is right now toward this path.

Data availability

The datasets analysed during the current study are available in the (MT050943 [48], MT012098 [61] and NC_045512.2 [59]) repository, [www.ncbi.nlm.nih.gov/nuccore/MT050493, www.ncbi.nlm.nih.gov/nuccore/MT012098 and 32015508. https://www.ncbi.nlm.nih.gov/nuccore/1798174254].

Change history

24 August 2024
This article has been retracted. Please see the Retraction Notice for more detail: https://doi.org/10.1007/s11042-024-20150-w

References

Abdel-Basset M, Hawash H, Elhoseny M, Chakrabortty RK, Ryan M (2020) DeepH-DTA: deep learning for predicting drug-target interactions: a case study of COVID-19 drug repurposing. IEEE Access 8:170433–170451
Article Google Scholar
Abdulrahman SA, Salem ABM (2020) A efficient deep belief network for Detection of Corona Virus Disease COVID-19. Fusion Pract, Appl
Google Scholar
Acharya A, Agarwal R, Baker MB, Baudry J, Bhowmik D, Boehm S, Byler KG et al (2020) Supercomputer-based ensemble docking drug discovery pipeline with application to COVID-19. J Chem Inf Model 60(12):5832–5852
Article Google Scholar
Alakus TB, Turkoglu I (2021) A Novel Protein Mapping Method for Predicting the Protein Interactions in COVID-19 Disease by Deep Learning. Interdiscip Sci: Comput Life Sci 13(1):44–60
Article Google Scholar
Ali A, Zhu Y, Zakarya M (2021) A data aggregation based approach to exploit dynamic spatio-temporal correlations for citywide crowd flows prediction in fog computing. Multimed Tools Appl 80(20):31401–31433
Article Google Scholar
Ali A, Zhu Y, Zakarya M (2021) Exploiting dynamic spatio-temporal correlations for citywide traffic flow prediction using attention based neural networks. Inf Sci 577:852–870
Article MathSciNet Google Scholar
Ali A, Zhu Y, Zakarya M (2022) Exploiting dynamic spatio-temporal graph convolutional neural networks for citywide traffic flows prediction. Neural Netw 145:233–247
Article Google Scholar
Amin Abdul SK, Ghosh K, Gayen S, Jha T, (2020) Chemical-informatics approach to COVID-19 drug discovery: Monte Carlo based QSAR, virtual screening and molecular docking study of some in-house molecules as papain-like protease (PLpro) inhibitors. J Biomol Struct Dyn, pp. 1–10
Amin S, Ghosh K, Singh S, Qureshi IA, Jha T, Gayen S (2022) Exploring naphthyl derivatives as SARS-CoV papain-like protease (PLpro) inhibitors and its implications in COVID-19 drug discovery. Mol Divers 26(1):215–228
Article Google Scholar
Aslan MF, Unlersen MF, Sabanci K, Durdu A, (2021) CNN-based transfer learning–BiLSTM network: A novel approach for COVID-19 infection detection. Appl Soft Comput, 98, pp.106912
Bahri S, Kdayem M and Zoghlami N, (2021) Long Short-Term Memory based RNN for COVID-19 disease prediction. In 2021 22nd IEEE International Conference on Industrial Technology (ICIT), Vol. 1, pp. 901–906
Bandyopadhyay SK and Dutta S, (2020). Machine learning approach for confirmation of covid-19 cases: Positive, negative, death and release. MedRxiv
Banerjee A, Pati SK (2022) Predicting Antiviral Drugs for COVID-19 Treatment Using Artificial Intelligence Based Approach. In: Nayak J, Naik B, Abraham A (eds) Understanding COVID-19: The Role of Computational Intelligence. Studies in Computational Intelligence, vol 963. Springer, Cham. https://doi.org/10.1007/978-3-030-74761-9_11
Chapter Google Scholar
Banerjee A, Ghosh C, Mandal SN (2022) Analysis of V-Net Architecture for Iris Segmentation in Unconstrained Scenarios. SN COMPUT SCI 3:208. https://doi.org/10.1007/s42979-022-01113-0
Article Google Scholar
Banerjee A, Maji D, Datta R, Barman S, Samanta D, Chattopadhyay S (2022) SHUBHCHINTAK. Multimed Tools Appl:1–27. https://doi.org/10.1007/s11042-022-13539-y
Barnes SJ, Diaz M, Arnaboldi M, (2021) Understanding panic buying during COVID-19: A text analytics approach, Expert Systems with Applications, Volume 169, 114360, ISSN 0957–4174, https://doi.org/10.1016/j.eswa.2020.114360
Bhati AP, Wan S, Alfè D, Clyde AR, Bode M, Tan L, Titov M et al. (2021) Pandemic Drugs at Pandemic Speed: Accelerating COVID-19 Drug Discovery with Hybrid Machine Learning-and Physics-based Simulations on High Performance Computers. arXiv preprint arXiv:2103.02843.
Chenthamarakshan V, Das P, Padhi I, Strobelt H, Lim KW, Hoover B, Hoffman SC, Mojsilovic A, (2020) Target-specific and selective drug design for covid-19 using deep generative models.
Cheung M and Moura JM, (2020) Graph Neural Networks for COVID-19 Drug Discovery. In: 2020 IEEE International Conference on Big Data (Big Data) (pp. 5646-5648)
Chimmula VKR, Zhang L, (2020) Time series forecasting of COVID-19 transmission in Canada using LSTM networks. Chaos, Solitons Fractals, 135, pp.109864
Cuccarese MF, Earnshaw BA, Heiser K, Fogelson B, Davis CT, McLean PF, Gordon HB, Skelly KR, Weathersby FL, Rodic V, Quigley IK, (2020) Functional immune mapping with deep-learning enabled phenomics applied to immunomodulatory and COVID-19 drug discovery. bioRxiv
Dar'ya SR, MacKinnon SS, Landon M, Windemuth A, Kurji N and Shahani V, (2020) PolypharmDB, a deep learning-based resource, quickly identifies repurposed drug candidates for COVID-19
de Moura J, Novo J and Ortega M, (2022) Fully automatic deep convolutional approaches for the analysis of COVID-19 using chest X-ray images. Appl Soft Comput, 115, p.108190
Delijewski M, Haneczok J, (2021) AI drug discovery screening for COVID-19 reveals zafirlukast as a repurposing candidate. Medicine in Drug Discovery, 9, pp.100077
Du B, Liu Z, Luo F, (2022) Deep multi-scale attention network for RNA-binding proteins prediction, Information Science,582, pp. 287–301. https://doi.org/10.1016/j.ins.2021.09.025
Galindez G, Matschinske J, Rose TD, Sadegh S, Salgado-Albarrán M, Späth J, Baumbach J, Pauling JK (2021) Lessons from the COVID-19 pandemic for advancing computational drug repurposing strategies. Nature Comput Sci 1(1):33–41
Article Google Scholar
Gan R, Rosoman NP, Henshaw DJ, Noble EP, Georgius P, Sommerfeld N (2020) COVID-19 as a viral functional ACE2 deficiency disorder with ACE2 related multi-organ disease. Med Hypotheses 144:110024
Article Google Scholar
Gaudelet T, Day B, Jamasb AR, Soman J, Regep C, Liu G, Hayter JB, Vickers R, Roberts C, Tang J, Roblin D, (2020) Utilising graph machine learning within drug discovery and development. arXiv preprint arXiv:2012.05716
Hasan N, (2020) A methodological approach for predicting COVID-19 epidemic using EEMD-ANN hybrid model. Internet of things, 11, pp.100228
Hooshmand SA, Ghobadi MZ, Hooshmand SE, Jamalkandi SA, Alavi SM, Masoudi-Nejad A, (2020) A multimodal deep learning-based drug repurposing approach for treatment of COVID-19. Mol Divers, pp.1–14
Ioannidis VN, Zheng D, Karypis G, (2020) Few-shot link prediction via graph neural networks for covid-19 drug-repurposing. arXiv preprint arXiv:2007.10261
Ionov NS, Pogodin PV, Poroikov VV (2020) Assessing the Prediction Quality of the Anti-SARS-CoV-2 Activity Using the D3Targets-2019-nCoV Web Service. Biomed Chem: Res Meth 3(4):e00140–e00140
Google Scholar
Jacobs AS, Moon T, McLoughlin K, Jones D, Hysom D, Dong H, Gyllenhaal AJ, et al. (2020) Enabling rapid COVID-19 small molecule drug design through scalable deep learning of generative models. The International Journal of High Performance Computing Applications (2020): 10943420211010930.
Jamshidi MB, Lalbakhsh A, Talla J, Peroutka Z, Roshani S, Matousek V, Roshani S, Mirmozafari M, Malek Z, La Spada L, Sabet A, (2021) Deep Learning Techniques and COVID-19 Drug Discovery: Fundamentals, State-of-the-Art and Future Directions. Emerging Technologies During the Era of COVID-19 Pandemic, 348, pp. 9
Kumari M and Subbarao N, (2021) Deep learning model for virtual screening of novel 3C-like protease enzyme inhibitors against SARS coronavirus diseases. Comput Biol Med, 132, p.104317.
Lazarus JV, Ratzan SC, Palayew A, Gostin LO, Larson HJ, Rabin K, Kimball S, El-Mohandes A (2020) A global survey of potential acceptance of a COVID-19 vaccine. Nat Med:1–4
Li D, Fu Z, Xu J (2021) Stacked-autoencoder-based model for COVID-19 diagnosis on CT images. Appl Intell 51(5):2805–2817
Article Google Scholar
Liang C, Tian L, Liu Y, Hui N, Qiao G, Li H, Shi Z, Tang Y, Zhang D, Xie X, and Zhao X, (2020) A promising antiviral candidate drug for the COVID-19 pandemic: A mini-review of remdesivir. Eur J Med Chem, pp.112527
Majumdar S, Nandi SK, Ghosal S, Ghosh B, Mallik W, Roy ND, Biswas A, Mukherjee S, Pal S, Bhattacharyya N (2021) Deep learning-based potential ligand prediction framework for COVID-19 with drug–target interaction model. Cogn Comput 2021:1–13
Google Scholar
Martinez MA (2022) Efficacy of repurposed antiviral drugs: Lessons from COVID-19, Drug Discovery Today, Volume 27, Issue 7, Pages 1954–1960, ISSN 1359–6446, https://doi.org/10.1016/j.drudis.2022.02.012
Meng Y, Deng Z, Zhao K, Xu Y, Liu H (2021) Hierarchical correlation siamese network for real-time object tracking. Appl Intell 51(6):3202–3211
Article Google Scholar
Nag S, Dalgaard MD, Kofoed PE, Ursing J, Crespo M, Andersen LOB, Aarestrup FM, Lund O, Alifrangis M (2017) High throughput resistance profiling of Plasmodium falciparum infections based on custom dual indexing and Illumina next generation sequencing-technology. Sci Rep 7(1):1–13
Article Google Scholar
Nguyen DD, Gao K, Chen J, Wang R, Wei GW (2020) Unveiling the molecular mechanism of SARS-CoV-2 main protease inhibition from 137 crystal structures using algebraic topology and deep learning. Chem Sci 11(44):12036–12046
Article Google Scholar
Oktay T, Yoğurtçuoğlu E, Sarıkaya RN, Karaca AR, Kömürcü MF, Sayar A, (2021) Multimodel anomaly detection on Spatio-temporal logistic datastream with open anomaly detection architecture, Expert Systems with Applications, Volume 186, 2021, 115755, ISSN 0957–4174, https://doi.org/10.1016/j.eswa.2021.115755.
Paul R, Turzo AK and Bosu A, (2021) A Dataset of Vulnerable Code Changes of the Chromium OS project. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion) pp. 244-245
L. Peng et al. Reverse Graph Learning for Graph Neural Network. In: IEEE Transactions on Neural Networks and Learning Systems, https://doi.org/10.1109/TNNLS.2022.3161030
Pham TH, Qiu Y, Zeng J, Xie L, Zhang P (2021) A deep learning framework for high-throughput mechanism-driven phenotype compound screening and its application to COVID-19 drug repurposing. Nat Mac Int 3(3):247–257
Article Google Scholar
Potdar V, Choudhary M, Yadav PD, Shete-Aich A (2020) Maximum Containment Laboratory, National Institute of Virology, Pashan, Pune, Maharashtra 411021, India. www.ncbi.nlm.nih.gov/nuccore/MT050493
Ray S, Lall S, Mukhopadhyay A, Bandyopadhyay S, Schönhuth A (2020) Predicting potential drug targets and repurposable drugs for covid-19 via a deep generative model for graphs. arXiv preprint arXiv:2007.02338
Sainz-Pardo JL, Valero J (2021) COVID-19 and other viruses: Holding back its spreading by massive testing, Expert Systems with Applications, Volume 186, 115710, ISSN 0957–4174, https://doi.org/10.1016/j.eswa.2021.115710
Savioli N, (2020) One-shot screening of potential peptide ligands on HR1 domain in COVID-19 glycosylated spike (S) protein with deep siamese network. arXiv preprint arXiv:2004.02136.
Shahid F, Zameer A and Muneeb M, (2020) Predictions for COVID-19 with deep learning models of LSTM, GRU and Bi-LSTM. Chaos, Solitons Fractals, 140, pp.110212
Smith TR, Patel A, Ramos S, Elwood D, Zhu X, Yan J, Gary EN, Walker SN, Schultheis K, Purwar M, Xu Z (2020) Immunogenicity of a DNA vaccine candidate for COVID-19. Nat Commun 11(1):1–13
Article Google Scholar
Song Y and Sun W, (2021) PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration. In International Conference on Machine Learning, pp. 9801-9811)
Waheed A, Goyal M, Gupta D, Khanna A, Al-Turjman F and Pinheiro PR, (2020) Covidgan: data augmentation using auxiliary classifier gan for improved covid-19 detection. IEEE Access, 8, pp.91916–91923
Wang Q, Li M, Wang X, Parulian N, Han G, Ma J, Tu J, Lin Y, Zhang H, Liu W, Chauhan A, (2020) COVID-19 literature knowledge graph construction and drug repurposing report generation. arXiv preprint arXiv:2007.00576.
Wang SH, Govindaraj VV, Górriz JM, Zhang X, Zhang YD (2021) Covid-19 classification by FGCNet with deep feature fusion from graph convolutional network and convolutional neural network. Inform Fusion 67:208–229
Article Google Scholar
Wang B, Ding Y, Zhao P, Li W, Li M, Zhu J, Ye S (2022) Systems pharmacology-based drug discovery and active mechanism of natural products for coronavirus pneumonia (COVID-19): An example using flavonoids. Comput Biol Med 143:105241
Article Google Scholar
Wu F, Zhao S, Yu B, ChenYM, WangW, Song ZG, Hu Y, Tao ZW, Tian JH, Pei YY, Yuan ML, Zhang YL, Dai FH, Liu Y, Wang QM, Zheng JJ, Xu L, Holmes EC, Zhang YZ (2020) A new coronavirus associated with human respiratory disease in China, Nature 579 (7798), pp. 265–269 (2020) 32015508. https://www.ncbi.nlm.nih.gov/nuccore/1798174254
Xia D, Jiang S, Yang N, Hu Y, Li Y, Li H, Wang L (2021) Discovering spatiotemporal characteristics of passenger travel with mobile trajectory big data. Phys A: Statist Mechan Appl 578:126056
Article Google Scholar
Yadav PD, Potdar V and Abraham P (2020) Maximum Containment Laboratory, National Institute of Virology, Pashan, Pune, Maharashtra 411021, India. www.ncbi.nlm.nih.gov/nuccore/MT012098.
Yang R, Liu H, Bai C, Wang Y, Zhang X, Guo R, Wu S, Wang J, Leung E, Chang H, Li P, (2020) Chemical composition and pharmacological mechanism of Qingfei Paidu Decoction and Ma Xing Shi Gan Decoction against Coronavirus Disease 2019 (COVID-19): in silico and experimental study. Pharmacol Res, 157, pp.104820
Yu X, Lu S, Guo L, Wang SH, Zhang YD (2021) ResGNet-C: A graph convolutional neural network for detection of COVID-19. Neurocomputing 452:592–605
Article Google Scholar
Zhang R, Hristovski D, Schutte D, Kastrin A, Fiszman M, Kilicoglu H, (2021) Drug repurposing for COVID-19 via knowledge graph completion. J Biomed Inform, 115, pp.103696

Download references

Acknowledgments

The authors would like to thank anonymous reviewers for their valuable comments.

Author information

Authors and Affiliations

Department of Bioinformatics, Maulana Abul Kalam Azad University of Technology, Haringhata, West Bengal, 741249, India
Soumen Kumar Pati & Manan Kumar Gupta
Department of Computer Science & Engineering, Jalpaiguri Governmemt Engineering College, Jalpaiguri, West Bengal, 735102, India
Ayan Banerjee
Department of Mathematics, Behala College, Calcutta University, Kolkata, West Bengal, 700060, India
Rinita Shai
Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Palaiahnakote Shivakumara

Authors

Soumen Kumar Pati
View author publications
You can also search for this author in PubMed Google Scholar
Manan Kumar Gupta
View author publications
You can also search for this author in PubMed Google Scholar
Ayan Banerjee
View author publications
You can also search for this author in PubMed Google Scholar
Rinita Shai
View author publications
You can also search for this author in PubMed Google Scholar
Palaiahnakote Shivakumara
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Soumen Kumar Pati or Ayan Banerjee.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest in this paper.

Ethical approval

This article does not contain any studies with human participants performed by any authors.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article has been retracted. Please see the retraction notice for more detail: https://doi.org/10.1007/s11042-024-20150-w

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pati, S.K., Gupta, M.K., Banerjee, A. et al. RETRACTED ARTICLE: Drug discovery through Covid-19 genome sequencing with siamese graph convolutional neural network. Multimed Tools Appl 83, 61–95 (2024). https://doi.org/10.1007/s11042-023-15270-8

Download citation

Received: 16 April 2022
Revised: 23 September 2022
Accepted: 06 April 2023
Published: 10 May 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s11042-023-15270-8

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

RETRACTED ARTICLE: Drug discovery through Covid-19 genome sequencing with siamese graph convolutional neural network

Abstract

Similar content being viewed by others

Predicting Antiviral Drugs for COVID-19 Treatment Using Artificial Intelligence Based Approach

Generating novel molecule for target protein (SARS-CoV-2) using drug–target interaction based on graph neural network

A separable temporal convolutional networks based deep learning technique for discovering antiviral medicines

1 Introduction

1.1 Objective and proposed outcome

1.2 Organization of the paper

2 Literature review

3 Drug discovery architecture

3.1 Optimized next gene sequencing

3.2 Variable DNA neighborhood search (VDNS)

3.3 Obtaining drug database with advanced D3Similarity (AD3)

D3 Law

Dockerization

Steps involved in AD3

3.4 Siamese graph convolutional neural network (SGCNN)

3.4.1 Network architecture

3.4.2 The GCNN

3.4.3 The S-Net

3.4.4 The attention network

4 Result Analysis

4.1 Experimental setup

4.2 Implementation details

4.3 Performance evaluation

4.3.1 Optimized next gene sequencing

4.3.2 DNA interaction network

4.3.3 Advanced D3Similarity

4.3.4 Drug repurposing with SGCNN

4.4 Ablation study

4.5 Comparative study

5 Conclusion

Data availability

Change history

24 August 2024

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding authors

Ethics declarations

Conflict of interest

Ethical approval

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation