Molecular interactions of some phenolics with 2019-nCoV and related pathway elements

: As of June 2021, the novel coronavirus disease (SARS-CoV-2) resulted in 180 million cases worldwide and resulted in the death of approximately 4 million people. However, an effective pharmaceutical with low side effects that can be used in the treatment of SARS-CoV-2 infection has not been developed yet. The aim of this computational study was to analyze the interactions of twenty-two hydroxycinnamic acid and hydroxybenzoic acid derivatives with the SARS-CoV-2 receptor binding domain (RBD) and host organism's proteases, transmembrane serine protease 2 (TMPRSS2), and cathepsin B and L (CatB/L). According to the RBCI analysis, the ligands with the highest affinity against 4 enzymes in the molecular docking study were determined as 1-caffeoyl-β -D-glucose, rosmarinic acid, 3-p -coumaroylquinic acid and chlorogenic acid. It has also been observed that these compounds interacted more strongly with spike RBD, CatB and CatL enzymes. Although the top-ranked ligand, 1-caffeoyl-β -D-glucose, violated the drug-likeness criteria at 1 point (NH or OH>5) and ADMET in terms of AMES toxicity, the second top-ranked ligand rosmarinic acid neither violated drug-likeness nor exhibited incompatibility in terms of ADMET. In conclusion, with its anti-inflammatory properties, rosmarinic acid can be considered and further investigated as a plant-based pharmaceutical that can offer a treatment option in SARS-CoV-2 infection. However, our findings should be supported by additional in vitro and in vivo studies.


INTRODUCTION
Corona Virus Disease 2019 , which spread all over the world within months after emerging in China, infected more than 180 million people as of June 2021 and caused the death of approximately four million people (Wu et al., 2020;Zhu et al., 2020;Worldometers.info, 2021).The World Health Organization (WHO) defined the microorganism causing this disease as the new type of Severe Acute Respiratory Syndrome Corona Virus 2 (2019-nCoV) (Wu et al., 2020;Zhou et al., 2020;Zhu et al., 2020).
It has been determined that the genome of 2019-nCoV consists of a single-stranded RNA molecule with a length of approximately 30.000 nucleotides.As a result of genome analysis, genes encoding the four main components of the virus (nucleocapsid protein, envelope protein, membrane protein and spike glycoprotein) were identified (Wu et al., 2020).Among these proteins, especially the spike glycoprotein is the basic structural molecule that enables the virus to bind to receptors on the host cell surface.Therefore, it is one of the most targeted molecules in treatment development strategies against COVID-19 (Luan et al., 2020).
Angiotensin converting enzyme 2 (ACE2) is the most critical host component that plays a role in the binding of 2019-nCoV to the host cell.The virus binds to ACE2 via spike glycoprotein, thus ACE2 functions as the key for virus entry into the cell (Li et al., 2003;Li et al., 2005).The majority of researchers trying to develop an effective drug against COVID-19 have turned to targeting the 2019-nCoV-ACE2 interaction (Letko et al., 2020;Zhou et al., 2020).Focusing on elucidating the details of the binding interaction between the spike glycoprotein and ACE2, researchers found contact between some amino acids of the viral component in question (Leu455, Phe486, Glu493, Ser494, Asp501 and Tyr505) and mammalian ACE2 (Zakaryan et al., 2017;Andersen et al., 2020).
Of course, ACE2 is not the only molecule mediating the entry of 2019-nCoV into the host cell.In addition to this molecule, transmembrane protease, serine 2 (TMPRSS2) and cathepsins (CatB and L) were also found to assist in the entry of 2019-nCoV into the cell.TMPRSS2 is a serine protease and cleaves the spike glycoprotein from the S1/S2 site, thus facilitating virus entry and activation (Hoffmann et al., 2020).Cathepsins (CatB and CatL) are lysosomal endopeptidases that function in the host cell (Huang et al., 2006;Sudhan & Siemann, 2015;Hoffmann et al., 2020).
In this study, it was aimed to determine the interaction of some hydroxybenzoic and hydroxycinnamic acids with 2019-nCoV spike glycoprotein and host proteases (TMPRSS2, CatB and CatL).

Protein retrieval and energy minimization of the Spike, TMPRSS2, CatB and CatL proteins using Nanoscale Molecular Dynamics (NAMD)
Detailed information on protein retrieval and energy minimization of the spike and other target proteins are given in the Supplementary file.

Molecular docking analyses of phenolic acids
Detailed information on molecular docking analyses of phenolic acids is given in the Supplementary file.

Calculation of relative binding capacity index (RBCI)
Detailed information on calculation of RBCI is given in the supplementary file (Figure 2 and Table S1).

Drug-likeness, ADMET profile and target prediction
Detailed information on drug-likeness, ADMET profile and target prediction is given in the Supplementary file.

Network pharmacology analysis
Detailed information on network pharmacology analysis is given in the Supplementary file.

Molecular docking results of phenolic acids
In this study, the molecular level interactions of 22 phenolic acids based on molecular docking analyses with target proteins spike glycoproteins, TMPRSS2, CatB and CatL were calculated; however, based on the RBCI analysis, the non-bonded interactions of only 4 compounds (1caffeoyl-β-D-glucose, rosmarinic acid, 3-p-coumaroylquinic acid and chlorogenic acid) determined as 'hit' as a result of RBCI analysis are given (Tables S2, S3, S4 and S5).On the other hand, the binding free energies (ΔG, kcal/mol) and calculated inhibition constants (Ki, µM) of the 22 phytochemicals are given in Table 1.

RBCI values of phenolic acids
In this study, molecular scale interactions between 22 different phenolic acids and 4 different protein targets were investigated using molecular docking analysis.As a result of molecular docking calculations, the binding free energy values (kcal/mol) of each phytochemical against 4 proteins in question were used to calculate the relative binding capacity index (RBCI).The details of this method have been explained in detail in the 'Materials and methods' section.Using the RBCI value, the efficiency of phytochemicals on different proteins can be calculated using different data sets at the same time (Figure 2 and Table S1).
As a result of the RBCI analysis we applied, it was determined that 1-caffeoyl-β-D-glucose, rosmarinic acid, 3-p-coumaroylquinic acid and chlorogenic acid were the 'hit' compounds among 22 phytochemicals.Top ranked poses of 1-caffeoyl-β-D-glucose, rosmarinic acid, 3-pcoumaroylquinic acid and chlorogenic acid on the RBD of spike glycoprotein, TMPRSS2, CatB and CatL are presented in Figures 3, 4, 5, and 6, respectively.RBCI analysis also revealed that isovanillic acid, benzoic acid and salicylic acid were the ligands with the weakest affinity for 4 different protein targets.

Pharmacokinetic properties of phenolic acids
Drug-likeness, ADMET and target profiles of analyzed phenolic acids against spike glycoprotein, TMPRSS2, CatB and CatL are given in Tables 2 and 3, respectively.Except for chicoric acid, all phenolic acids were determined to obey the Lipinski's rule-of-five.Chicoric acid violates this rule because it has N or O > 10, or NH or OH > 5.It was determined that no ligand other than ferulic acid, p-coumaric acid, cinnamic acid, 4-hydroxybenzoic acid, benzoic acid and salicylic acid could pass blood-brain barrier (BBB).It has also been determined that no compound other than chicoric acid is a substrate of P-glycoprotein (P-gp).Protocatechuic acid and gallic acid were observed to inhibit CYP3A4, while no other compounds showed an inhibitory effect on cytochrome enzymes.While 1-caffeoyl-β-D-glucose, 6-o-p-coumaroyl-Dglucose and 1-o-feruloyl-β-D-glucose show mutagenicity in the AMES bacterial test system, no other phenolic acid studied showed AMES toxicity.On the other hand, only pcoumaroyltartaric acid and N-caffeoyl-L-aspartic acid showed hepatotoxic effect data among 22 ligands studied.The LD50 doses of the molecules in the rat ranged from 1.737 mol/kg to 2.811 mol/kg.While the most toxic molecule in the rat was 3-p-coumaroylquinic acid (1.737 mol/kg), the phytochemical with the least toxicity was rosmarinic acid (2.811 mol/kg).

Target predictions and network pharmacology of hit phenolic acids
In this study, the screening of hit phytochemicals against possible intracellular targets was performed using Swiss Target Prediction (http://www.swisstargetprediction.ch/) and STITCH 5.0 (http://stitch.embl.de/)public databases.
Intracellular targets of chlorogenic acid generally consist of proteases (32%), various enzymes (26%) and lyases (10%) (Figure 7D).Although chlorogenic acid shows a statistically insignificant level of probability (p = 0.09 -0.33) in terms of interaction with the proteases and lyases, its interaction with various intracellular enzymes appears slightly significant (p = 0.77 -0.80).The STITCH platform was used to predict putative targets of hit phenolic acids in the human proteome.Thus, the direct interactions of 1-caffeoyl-β-D-glucose, rosmarinic acid, 3-pcoumaroylquinic acid and chlorogenic acid with different proteins were mapped (Figure 8).Prior to mapping target-component interactions, the minimum required interaction score was set to a high confidence score which was ≥ 0.7.A high confidence score indicates a strong interaction between hit phytochemicals and protein(s).According to the targets-components interaction network in Figure 8, it can be observed that 2 ligands other than rosmarinic acid and chlorogenic acid do not directly interact with the human proteome.Rosmarinic acid interacts with FOS, IL2, LCK, CCR3 and IKBKB proteins, while chlorogenic acid interacts with NINJ1, CASP3, DNMT1, MAPK8 and HMGB1 proteins.For rosmarinic acid, it is noteworthy that kinases (LCK, IKBKB) play a central role among protein targets predicted by both SwissTargetPrediction and STITCH platforms (Figures 7A, 7B, 7C, 7D and Figure 8).

DISCUSSION and CONCLUSION
Although the novel coronavirus disease (SARS-CoV-2), which emerged in Wuhan in 2019, affected the whole world as a pandemic in a short time, an effective drug molecule has not yet been found in the treatment of this virus.Therefore, a safe orally administered pharmaceutical would be a breakthrough in the treatment of SARS-CoV-2.In this context, organic molecules of plant origin always maintain their potential among possible treatment options against viral diseases (Chavez et al., 2006;Ruibo et al., 2017;Taguchi et al., 2017;Wu et al., 2017;Jahan & Onay, 2020;Piccolella et al., 2020;Cano-Avendaño et al., 2021).On the other hand, although herbal compounds are biologically diverse and accessible, drug discovery is an arduous process and molecular docking-based bioinformatics approach has become an increasingly important tool in this field in recent years (Meng et al., 2011).
In the present study, the molecular interactions of twenty-two polyphenolic compounds, derivatives of hydroxycinnamic acid and hydroxybenzoic acid (Table 1, Figure 1), with the receptor binding domain (RBD) of SARS-CoV-2 spike glycoprotein, TMPRSS2, CatB and CatL were analyzed.According to the RBCI analysis we applied, it was determined that 1-Caffeoyl-β-D-glucose was the phytochemical with the highest affinity (binding free energy) against all target proteins (Table 1).There is no docking or molecular dynamic study of 1-Caffeoyl-β-D-glucose with the four target proteins (spike RBD, TMPRSS2, CatB and CatL) in the literature.Therefore, our results regarding this ligand constitute the first record in terms of literature.In our study, the second top-ranked ligand, rosmarinic acid, showed a high binding affinity for spike, TMPRSS2, CatB and CatL proteins (-6,13, -5,10, -5,44 and -6,21 kcal/mol, respectively) (Table 1).Although there is no other study showing the binding affinity of rosmarinic acid against spike protein, this ligand was found to show a strong binding (MolDock score of -89.17) against the host cell furin (a protein convertase) enzyme that the spike protein of SARS-CoV-2 uses to enter the cell using the Molegro Virtual Docker program (Kumar Verma et al., 2021).On the other hand, in support of our study, it was reported in a different study using the Glide XP docking program that rosmarinic acid showed a reasonably high binding affinity (-5.6 kcal/mol) for the TMPRSS2 enzyme (Coban et al., 2021).However, the molecular docking study of rosmarinic acid regarding CatB and CatL enzymes has not been found in the literature.Regarding 3-p-coumaroylquinic acid (Table 1), the third promising ligand in this study, no molecular docking studies on spike, TMPRSS2, CatB or CatL enzymes were found in the literature.However, it has been reported that 3-p-coumaroylquinic acid showed a very favorable binding affinity (-7.2 kcal/mol) against non-structural protein 10 (nsp10) of SARS-CoV-2 in molecular docking simulation (Mohammad et al., 2021).This result supports our finding we obtained with SARS-CoV-2 spike glycoprotein.In a molecular docking study conducted with the post fusion core of the SARS-CoV-2 spike S2 subunit, spike glycoprotein open state and spike glycoprotein closed state structures, chlorogenic acid, our 4th top-ranked ligand (Table 1), has been reported to bind to these different spike protein states with high affinities (-101.663, -108,993 and -92,121 MolDock scores) (Adem et al., 2021).These results are also in agreement with our study.
In this present study, 1-caffeoyl-β-D-glucose, rosmarinic acid, 3-p-coumaroylquinic acid, and chlorogenic acid, which we determined as the first four top-ranked ligands, have druglikeness properties according to Lipinski's Rule of Five (1-caffeoyl-β -D-glucose 1 violation [NH or OH>5]; chlorogenic acid 1 violation [NH or OH>5]).6-tri-o-caffeoyl-β-Dglucopyranose (TCGP), an isomer of 1-caffeoyl-β-D-glucose isomer, has been reported to effectively inhibit the entry of HIV-1 Env pseudovirus into host cells (IC50 = 5.5 µg/mL) (Dong et al., 2013).Therefore, although the spatial arrangement of the atoms of these two isomers are different, it can be expected that these two close isomers will show similar antiviral activity.It has been also shown that rosmarinic acid and chlorogenic acid have wide-ranging pharmacological effects on the inflammatory response, tumor occurrence and development, antioxidant, antimicrobial, anti-inflammatory and hepatoprotective effects in the host organism (Maalik et al., 2016;Guan et al., 2021).In general, it is obvious that the ligands in our study have various pharmacological effects (Table 2).However, considering the ADMET profiles, 1caffeoyl-β-D-glucose was positive in the bacterial gene mutation test (AMES test), indicating that 1-caffeoyl-β-D-glucose may be a potential mutagen (Table 3).
Although our top-four ligands generally interact with cellular enzymes, lyases, proteases and kinases according to SwissTarget analysis (Figures 7A,7B,7C and 7D), based on probability analysis, only rosmarinic acid showed potential for interaction (p = 0.96) with its putative targets (lyases, various enzymes, proteases and kinases).Interestingly, it should be noted that rosmarinic acid showed an anti-inflammatory effect by inhibiting the PI3K-Akt pathway stimulated by IL2 in mice gastric cells (Nam et al., 2020) and by inhibiting the kinase IKK-β (inhibitor of nuclear factor kappa-B kinase subunit beta) activity in human dermal fibroblasts (Lee et al., 2006).This reported that inhibitory potential of rosmarinic acid on interleukins and kinases in the inflammatory pathway is in high agreement with the 'targets-components analysis' we performed on the STITCH public server.Based on this analysis, putative targets of rosmarinic acid include IKK-β (IKBKB), CCR3 and IL2, which play central roles in the inflammatory pathway (Figure 8).
In this study, it was investigated whether 22 polyphenolic phytochemicals (hydroxycinnamic acid and hydroxybenzoic acid derivatives) have effective pharmaceutical properties in the fight against SARS-CoV-2 by determining their binding affinities against the spike glycoprotein, TMPRSS2, CatB and CatL via molecular docking method.Considering the molecular docking scores in combination with the drug-likeness and intracellular target predictions, it can be suggested that rosmarinic acid may be the most promising ligand among these 22 phytochemicals.As known, SARS-CoV-2 viral replication causes an aggressive inflammation (cytokine storm) in patients infected with this virus, and increased plasma levels of IL-2, IL-7, IL-10, GCSF, IP-10, MCP-1, MIP-1A and TNF-α are frequently observed in affected individuals (Fu et al., 2020).Therefore, based on its additional anti-inflammatory effects, we believe that further molecular optimization and investigation of the efficacy of rosmarinic acid in pre-clinical in vitro and in vivo studies may be promising in drug development efforts against SARS-CoV-2.

Structural optimization of ligands
The Automatic Topology Builder (ATB) online server was utilized for geometry optimization of all the ligands using density functional theory DFT/B3LYP/6-31G* basis set.Following geometry optimization, the energetically minimized 3D ligand structures generated by the ATB server was exported in pdb file format for further use in molecular docking analyzes (Malde et al., 2011).

Homology modeling of TMPRSS2
The crystallographic structure of human TMPRSS2 enzyme has not been resolved until today, therefore, a homology model was generated for this enzyme to utilize in further molecular docking and molecular dynamics analyses.The amino acid sequence of TMPRSS2 was downloaded from UniProtKB (https://www.uniprot.org/uniprot/O15393). Template search for TMPRSS2 catalytic domain was performed against the SWISS-MODEL template library with BLAST and HHBlits.BLAST was used to search the TMPRSS2 catalytic domain target sequence against the primary amino acid sequence in the SMTL (Remmert et al., 2012).
ProMod3 was used to carry out model building for TMPRSS2 catalytic domain based on the target-template alignment.The rest of the procedure was carried out as previously described (Guex et al., 2009).
The model quality (global and per-residue) of TMPRSS2 obtained was evaluated with the QMEAN scoring function (Studer et al., 2020).A near-zero QMEAN score was a good value in terms of the quality of the fit between model structure and the experimental structure.According to the QMEAN score, however, scores of 4.0 and below indicated that the model was of poor quality.Therefore, among the top 5 TMPRSS2 models we obtained as a result of homology modeling, we determined the 5ce1.1.A (model 06) model as the target structure in the molecular docking analysis.
In addition, whether our model has an energetically favorable conformation, we generated a Ramachandran plot (Figure S1) using the PROCHECK web server (Laskowski et al., 1993).Also, ERRAT web-based tool (Figure S2) was also deployed to calculate the overall quality factor (OQF) for non-bonded atomic interactions (Colovos and Yeates, 1993).

Protein retrieval and energy minimization of the Spike, TMPRSS2, CatB and CatL proteins using Nanoscale Molecular Dynamics (NAMD)
The purpose of performing energy minimization (optimization) of proteins is the physical significance of the obtained 3D structures: the optimized structures often resemble to their native conformation as they found in nature.Thus, this process searches to find the lowest energy conformation of the proteins in question.The spike glycoprotein was retrieved by removing the ACE2 subunit from the angiotensin-converting enzyme 2 -2019nCoV RBD complex in Discovery Studio Visualizer v16.This model was downloaded from "https://swiss model.expasy.org/interactive/HLkhkP/models/03"(PDB ID: model_03.pdb)(Camacho et al., 2009;Remmert et al., 2012).Since, the crystallographic structure of human TMPRSS2 enzyme has not still been defined, using SWISS-MODEL, we generated a homology model of this enzyme to utilize in further molecular docking analyses.The details of the homology modeling procedure can be found in the study of Istifli et al. (2020).The PROCHECK web server was utilized whether our generated model has an energetically favorable conformation (Figure S1) and the ERRAT web-based tool was also employed to calculate the protein overall quality factor (OQF) (Figure S2) (Colovos & Yeates, 1993;Laskowski et al., 1996).The crystal structures of CatB (PDB ID: 1GMY) and CatL (PDB ID: 2YJ9) enzymes were retrieved from Protein Data Bank.
Water molecules, co-crystallized ligands and non-interacting ions were removed from proteins before energy minimization.During the protein energy minimization step, the atom types and electrical charges of the spike glycoprotein, TMPRSS2, CatB and CatL were fixed using CHARMM22_PROT force field and Gasteiger-Marsili charges in the Vega ZZ software (Pedretti et al., 2004).Next, for the energy minimization of all the proteins using NAMD, the parameters were loaded from a template file.The number of time steps (number of minimization steps) were set to 10.000 and CHARMM22_PROT was set as the force field.When the energy minimization was completed, the 3D structures corresponding to the last minimization step of all proteins were saved as the lowest energy conformation.
Docking calculations were performed using 100 genetic algorithm (GA) runs, an initial population of 150 individuals, max.number of 3.000.000energy evaluations, and a max.number of 27.000 generations.The mutation and crossover rates were set as default values, 0.02 and 0.8, respectively.After 100 independent docking runs, all the ligand binding modes (conformations) were clustered and ranked on the basis of the most negative free energy of binding (kcal/mol).The best poses of receptor-ligand pre-reactive complexes obtained by AutoDock 4.2.6 were visualized and examined with BIOVIA Discovery Studio Visualizer v16.

Calculation of relative binding capacity index (RBCI)
In this study, the relative binding capacity index (RBCI) was applied to statistically rank the activity potentials of phytochemicals using the binding free energy values obtained from the binding analysis (Figure 2 and Table S1) (Istifli et al., 2020).Using RBCI, it is possible to compare statistically relevant data with different scientific meanings.Since the binding energies of ligands are different for each protein, phytochemicals can only be ranked in terms of their potential at this parameter if they are performed in the light of their binding energies only to one protein.However, sequencing based on only one of these proteins cannot represent the full activity potential of these molecules.The most common method used to calculate the interaction between each receptor and ligand is the "central bias" in which components are ranked according to the mean value of each component.
If the values (binding free energy) in each data set are converted into standard scores, it is possible to compare them with each other.Arithmetic mean and standard deviation values were calculated for each protein by using the binding free energies of the ligands.Raw standard scores were obtained by subtracting the binding free energies of each protein for each ligand from this arithmetic mean and then dividing by the standard deviation value (see equation given below) (Sharma, 1995).The RBCI values of each phytochemical were then calculated by taking the average of these standard scores obtained separately for each protein target.
Although RBCI is a relative measure and does not represent the specific binding capacities of the components, it makes it possible to rank components reasonably based on their binding free energy values.Therefore, it can be used as an integrated approach to evaluate the molecular interaction of the components, considering all parameters.

Drug-likeness, ADMET profile and target prediction
The determination of drug-likeness, ADMET and target profiles of promising hit compounds in structure-based drug design studies is important in terms of reducing their side effects on the target organism.In this study, web-based SwissTargetPrediction and pkCSM tools were used to investigate such effects of analyzed phenolic acids (Pires et al., 2015;Daina et al., 2019).

Figure 1 .
Figure 1.Chemical structures of the phenolic acids.

Figure 8 .
Figure 8. Targets-components analysis (chemical-protein interactions) of top 4 hit phytochemicals performed via the STITCH platform (http://stitch.embl.de).Note the explicit role of rosmarinic acid in the targets-components network.

Table 1 .
Free energy of binding and calculated inhibition constant values of the compounds.

Table 2 .
Drug-likeness properties of docked phenolic acids.

Table S2 .
Molecular interactions between the phenolic acids and RBD of the spike glycoprotein of 2019-nCoV.

Table S5 .
Molecular interactions between the phenolic acids and CatL.