SARS CoV-2 SPIKE GLYCOPROTEIN MUTATIONS AND CHANGES IN PROTEIN STRUCTURE

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS CoV-2) is a single-stranded, positive-polarity RNA virus with high virulence. The spike (S) glycoprotein is the outermost component of the SARS CoV-2 virion and mediates entry of the virus into the cell via the angiotensin-converting enzyme 2 (ACE2) receptor. ACE2 plays an important role in the regulation of human blood pressure by converting the vasoconstrictor angiotensin II to the vasodilator angiotensin (1-7). In this study, the changes that mutations in Asian isolates may cause in the S glycoprotein structure were analyzed and modeled to contribute to drug and vaccine targeting studies. Genome, proteome and mutation analyses were performed using bioinformatics tools (MAFFT, MegaX, PSIPRED, MolProbity, PyMOL). Protein modelling was performed using ProMod3. We detected 26 mutations in the S glycoprotein. The changes that these mutations cause in the general topological and conformational structure of the S glycoprotein may affect the virulence of SARS CoV-2. The mutations were found to convert the receptor binding domain (RBD) from the down conformation to an up-like conformation. The conformational change occurring after mutation in the RBD may increase receptor affinity. These findings could be beneficial for disease prevention and for drug/vaccine development against SARS CoV-2.


Introduction
SARS CoV-2 is a single-stranded RNA virus with positive polarity (Baltimore 1971, Perlman 2020). Between December 2019 and June 6, 2020, COVID-19, the disease caused by SARS CoV-2, affected 6.8 million people in approximately 179 countries and caused 398 thousand deaths. Asia is one of the major centres affected by the disease, with 1.3 million cases and 33,791 deaths (Worldometer 2020). The etiology of the disease, which first appeared in Hubei province of China, has not been fully clarified (Bogoch et al. 2020). Although the origin of the virus is not known for certain, evidence suggests that it may have a bat origin. The disease manifests with nonspecific symptoms such as cough, fever and fatigue. Subsequently, shortness of breath and acute respiratory distress can lead to the need for mechanical ventilation and to multiple organ failure.

E. Akbulut
The SARS CoV-2 genome is a single-stranded RNA of 29,903 nucleotides containing 12 protein-encoding regions. Similar to other betacoronaviruses, SARS CoV-2 has a long ORF1ab polyprotein at the 5′ end, followed by four major structural proteins: the spike surface glycoprotein, small envelope protein, matrix protein and nucleocapsid protein (Phan 2020). RNA virus genomes have high mutation potential (Drake 1993). In RNA viruses, mutations are thought to be the basis for adaptation and escape from the host cell's immune response (Kuljića & Budisin 1992). In some cases mutations weaken or completely eliminate the pathogenic effects of a virus; in other cases they may have the opposite effect and increase the severity of the infection (Conenello et al. 2007, Zhang et al. 1998). To enter host cells, coronaviruses first bind to a cell surface receptor for viral attachment, subsequently enter endosomes, and eventually fuse viral and lysosomal membranes (Shang et al. 2020a). The trimeric S glycoprotein plays an important role in the entry of the virus into the cell by binding to the ACE2 receptor of the host cell (Gallagher & Buchmeier 2001). ACE2 cleaves angiotensin II to angiotensin (1-7), which exerts vasodilating, anti-inflammatory, and anti-fibrotic effects through binding to the Mas receptor (Bourgonje et al. 2020). The stability of the linkage between the S glycoprotein and ACE2 is closely related to the severity of the viral infection. The S glycoprotein is a critical determinant of viral host range and tissue tropism and a major inducer of host immune responses. It contains a large ectodomain, a single-pass transmembrane anchor, and a short intracellular tail (Li 2016). The S glycoprotein binds to ACE2 on the host cell surface through its S1 subunit and then fuses viral and host membranes through its S2 subunit.
In addition to S glycoprotein activity, transmembrane protease serine 2 (TMPRSS2), lysosomal proteases and Neuropilin 1 (NRP1) activities are also important in the entry of the virus into the cell (Daly et al. 2020, Hoffman et al. 2020).
Clarification of the virus-host interactions will provide significant opportunities not only for the elucidation of the pathogenesis of the disease but also for the development of antiviral drugs (Mahajan et al. 2018).
Mutations in the spike glycoprotein can cause conformational and structural changes that affect its binding affinity for the receptor. The receptor binding domain, through which the virus interacts with the ACE2 receptor, can fold independently of the rest of the spike protein, which makes structural changes in this region even more important for antiviral drug design studies.
In this study, the S glycoprotein structure of 128 SARS CoV-2 Asian isolates was analyzed for possible mutational changes in the secondary, tertiary and quaternary structure to contribute to target identification studies involved in drug and vaccine design.

Sequence Data
Nucleotide and protein sequence information for 128 isolates from the Asian continent was obtained from the NCBI Virus database (NCBI 2019). The accession code of the reference spike glycoprotein is YP_009724390.

Bioinformatic Tools
The protein sequences of the 128 isolates were aligned with the FFT-NS-i algorithm of the MAFFT (v7.463) multiple sequence alignment program (Carroll et al. 2007, Katoh 2002, Katoh et al. 2018). The BLOSUM 80 scoring matrix was chosen for the amino acid sequences (Mount 2008), and the gap opening penalty was set to 2.0. The mutated residues were analyzed with the MegaX bioinformatic workbench (Kumar et al. 2018). The MolProbity tool was used for structural validation and model quality assessment (covalent geometry, torsion angles, optimized hydrogen placement and all-atom contact analysis) of the wild type and variant spike proteins (Chen et al. 2010). Physicochemical properties of the wild type and variant spike proteins were estimated with the ProtParam tool from the ExPASy portal (Wilkins et al. 1999). Secondary structure components (random coils, beta strands, alpha helices) of the spike protein were identified using the PSIPRED web server (Buchan et al. 2013). Three-dimensional models of the wild type and variant spike proteins were generated by homology modeling using Swiss-Model (Waterhouse et al. 2018). 3D structure alignments were performed with PyMOL (version 2.3.4, Schrödinger).
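Substitutions written in the notation used in this paper (for example 367V>F) can be read off an alignment column by column. The following is a minimal sketch of that idea, not the MegaX procedure used here: it assumes the reference and one isolate are already aligned to equal length with '-' as the gap character, and the function name call_substitutions and the toy sequences are hypothetical.

```python
def call_substitutions(ref_aligned, var_aligned):
    """List substitutions as '<reference position><reference residue>><variant residue>'.

    Positions are 1-based and counted on the ungapped reference sequence.
    """
    if len(ref_aligned) != len(var_aligned):
        raise ValueError("aligned sequences must have equal length")
    calls = []
    ref_pos = 0  # position in the ungapped reference
    for ref_res, var_res in zip(ref_aligned, var_aligned):
        if ref_res == "-":
            continue  # insertion relative to the reference: no reference position
        ref_pos += 1
        if var_res in ("-", "X"):
            continue  # deletion or ambiguous residue: not reported as a substitution
        if ref_res != var_res:
            calls.append(f"{ref_pos}{ref_res}>{var_res}")
    return calls


# Toy aligned pair (hypothetical, not real spike data)
reference = "MFVFL-VLLP"
variant = "MFVFLAVVLP"
print(call_substitutions(reference, variant))  # prints ['7L>V']
```

Counting positions on the ungapped reference keeps the numbering comparable across isolates even when an isolate carries an insertion.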

Template Search
Template search with BLAST (Basic Local Alignment Search Tool) and HHblits (Hidden Markov Model (HMM)-based lightning-fast iterative sequence search) was performed against the SWISS-MODEL template library (SMTL). The target sequence was searched with BLAST against the primary amino acid sequences contained in the SMTL (Camacho et al. 2009). A total of 170 templates were found. An initial HHblits profile was built using the procedure outlined in Remmert et al. (2012), followed by 1 iteration of HHblits against the NCBI non-redundant protein sequence database (NR20). The obtained profile was then searched against all profiles of the SMTL. A total of 685 templates were found. The structure with SMTL ID 6vxx was selected as the template.

Model Building
Models were built based on the target-template alignment using ProMod3. Coordinates which were conserved between the target and the template were copied from the template to the model. Insertions and deletions were remodelled using a fragment library. Side chains were then rebuilt. Finally, the geometry of the resulting model was regularized by using a force field (Guex et al. 2009).

Model Quality Estimation
The global and per-residue model quality was assessed using the QMEAN (Qualitative Model Energy Analysis) scoring function and MolProbity workbench (Benkert et al. 2011).

Oligomeric State Conservation
The quaternary structure annotation of the template was used to model the target sequence in its oligomeric form. The method is based on a supervised machine learning algorithm, Support Vector Machines (SVM), which combines interface conservation, structural clustering, and other template features to provide a quaternary structure quality estimate (QSQE) (Bertoni et al. 2017). The QSQE score is a number between 0 and 1, reflecting the expected accuracy of the interchain contacts for a model built based on a given alignment and template. Higher numbers indicate higher reliability. This complements the GMQE (Global Model Quality Estimation) score which estimates the accuracy of the tertiary structure of the resulting model.

Results
In this study, a total of 26 mutations were detected in the S glycoprotein of SARS CoV-2 in Asian isolates (Table 1). Most of these mutations were found in the structurally and functionally important regions of the S glycoprotein, which function in virus-host interaction. Fourteen mutations were detected in the N-terminal domain and three in the C-terminal domain of the S glycoprotein. The mutations 367V>F, 408R>I and 519H>Q were detected in the receptor binding domain, where the S glycoprotein interacts with the ACE2 receptor of the host cell (Fig. 1).
The 791T>I mutation was detected in the fusion peptide region, which plays an important role in viral fusion and disruption of the membrane integrity of the host cell (Fig. 2). The 8L>V mutation was detected in the signal peptide sequence involved in the translocation of the viral protein. The 930A>V mutation was observed in the heptad repeat (HR) 1 domain. HR1 promotes fusion of the host cell membrane with the viral envelope by bringing the fusion peptide closer to the C-terminal region of the ectodomain, the part of a membrane protein that extends into the extracellular space (Lu et al. 2008). Ectodomains are usually the parts of proteins that initiate contact with surfaces and trigger signal transduction.
The residues at which mutations were detected were substituted using the MegaX software to obtain the mutant spike protein sequence. Protein 3D structures were then modelled with the Swiss-Model web service using the wild type and variant amino acid sequences (Fig. 3).
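Building a variant sequence from a list of substitutions amounts to replacing single residues at known positions. The sketch below illustrates this in plain Python rather than reproducing the MegaX workflow; the helper name apply_substitutions and the ten-residue toy sequence are hypothetical, and the check that the expected wild type residue is present at each position is an added safeguard, not something described in the text.

```python
import re

# Notation used in this paper: position, wild type residue, '>', variant residue.
MUTATION_PATTERN = re.compile(r"^(\d+)([A-Z])>([A-Z])$")


def apply_substitutions(sequence, mutations):
    """Return a copy of `sequence` with each substitution (e.g. '8L>V') applied."""
    residues = list(sequence)
    for mutation in mutations:
        match = MUTATION_PATTERN.match(mutation.replace(" ", ""))
        if match is None:
            raise ValueError(f"unrecognised mutation: {mutation!r}")
        position = int(match.group(1))  # 1-based
        wild_type, replacement = match.group(2), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position out of range: {mutation!r}")
        if residues[position - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {position}")
        residues[position - 1] = replacement
    return "".join(residues)


# Toy sequence (hypothetical, far shorter than the spike protein)
wild_type_fragment = "MFVFLVLLPL"
print(apply_substitutions(wild_type_fragment, ["8L>V", "3V>A"]))  # prints MFAFLVLVPL
```

Rejecting a substitution whose wild type residue does not match the sequence catches off-by-one numbering errors before a model is built from the wrong sequence.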
Structural differences between the wild type and variant were analyzed using bioinformatics tools. Model quality was assessed by QMEAN and MolProbity. The MolProbity score was 1.38 for the variant and 1.42 for the wild type, whereas the score of the structure used as a template (6vxx) was 2.8. The MolProbity scores of the models were thus lower than that of the template, suggesting that the quality of the models was better than that of the average structure at this resolution (Davis et al. 2007). The results in Fig. 4 suggest that the protein models produced have an acceptable polypeptide backbone and acceptable phi (φ) and psi (ψ) torsion angles for the alpha-helix and beta-strand regions (Lovell et al. 2003). The scores obtained are considered to be within the appropriate limits in terms of model quality (Benkert et al. 2011).
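The phi and psi angles mentioned above are dihedral angles over four consecutive backbone atoms: phi over C(i-1), N(i), CA(i), C(i) and psi over N(i), CA(i), C(i), N(i+1). As an illustration only, and not MolProbity's implementation, a dihedral angle can be computed from four Cartesian coordinates as sketched below; the function name dihedral_degrees and the toy coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond (cis = 0, trans = 180)."""
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (hypothetical): four points with a quarter turn about the z axis
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Using atan2 rather than acos keeps the sign of the angle, which is needed to tell apart the regions of phi/psi space that validation tools assess.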

Secondary structure predictions were performed using the PSIPRED workbench, which predicted that the wild type glycoprotein has 65 helices and 20 beta strands, while the variant glycoprotein has 65 helices and 22 beta strands (Fig. 5). The mutations therefore appear to cause changes in the conformational and topological structure of the spike glycoprotein (RMSD value: 0.047) (Fig. 6).
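The RMSD quoted above summarizes how far corresponding atoms lie from each other after two structures have been superposed. A minimal sketch of the calculation follows; it assumes the two coordinate lists are already superposed and paired atom by atom (the superposition itself, which PyMOL performs during structure alignment, is not reproduced), and the function name rmsd and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        # squared distance between the paired atoms
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(coords_a))


# Toy coordinates (hypothetical): every atom is displaced by one unit along z
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model_a, model_b))  # prints 1.0
```

Because the value is an average over all paired atoms, a small global RMSD can still hide a larger local rearrangement confined to one region.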
Structural analysis revealed a tryptophan-rich conserved region (1208-YIKWPWYIWL-1219), called the proximal transmembrane region, in the S2 subunit of the S glycoprotein in both the wild type and variant models. The last five residues of this region, which are conserved in other coronavirus species, are located in the transmembrane domain and are responsible for viral infectivity (Lu et al. 2008).

Discussion
In our study, changes in the secondary (Fig. 5) and quaternary (Fig. 6) structure of the spike glycoprotein caused by the 26 mutations seen in SARS CoV-2 Asian isolates were modeled. Mutation-derived changes in the secondary structure of the wild type S glycoprotein were observed that may affect its topological and conformational structure. The formation of two additional beta strands in the variant structure at the S glycoprotein receptor binding site may increase the structural stability of the binding site. Coronaviruses enter the cell in two stages. In the first stage, the virus recognises the host cell ACE2 receptor for viral binding, and in the second stage, the viral and host membranes merge (Li 2016). Receptor recognition and binding is therefore the first stage of viral infection, and it influences host cell type and tissue tropism (Li et al. 2006). The affinity of the SARS CoV-2 S glycoprotein for the human ACE2 (hACE2) receptor is thought to be associated with the severity of the disease and the spread of the virus. The high spreading rate and virulence of SARS CoV during its first occurrence in 2002 were not seen at the same severity when it reappeared in 2004, and this has been associated with decreased receptor affinity of the spike glycoprotein (Kan et al. 2005, Walls et al. 2020). While 24 residues in the SARS CoV S glycoprotein interact with 53 residues in hACE2, we identified that 37 residues in the SARS CoV-2 S glycoprotein interact with 77 residues in hACE2. Considering the metabolic function of hACE2, this strong interaction of the SARS CoV-2 S glycoprotein with hACE2 may have important consequences for the prognosis of the disease.
SARS CoV-2-mediated down-regulation of ACE2 causes hyperinflammation through dysregulation of the renin-angiotensin-aldosterone system, attenuation of the Mas receptor, increased activation of (des-Arg9)-bradykinin, and activation of the C5a and C5b-9 complement system (Mahmudpour et al. 2020). Hyperinflammation results in severe lung tissue damage and loss of lung function (Gustine & Jones 2020).
In addition, apart from the relationship between the S glycoprotein and the ACE2 receptor, host gender, comorbidity, immunosuppressive states and ACE2 polymorphism are thought to play a role in the different mortality rates observed in different populations (Bosso et al. 2020). In a study of the Indian population, Srivastava et al. (2020) stated that the ACE2 rs228566 A>G polymorphism increased ACE2 expression by up to 50%, and this was associated with the relatively low mortality and morbidity rates in India. ACE2 has two forms in the body. The full-length form with a transmembrane domain contains 808 amino acids, while the soluble form contains 740 amino acids (Marquez et al. 2020). Interaction of circulating soluble ACE2 with SARS CoV-2 can limit the interaction of the virus with membrane-bound ACE2 and prevent the virus from entering the cell. In this context, increased ACE2 expression may limit viral infection (Kruse 2020, Khan et al. 2017). On the other hand, reduced S glycoprotein receptor affinity may help normalize the peptide-receptor relationship and thus the conversion of angiotensin II to angiotensin (1-7). Angiotensin (1-7) may limit the inflammatory response by inducing the conversion of proinflammatory des-Arg9-bradykinin (1-8) to bradykinin (1-7) (Santos et al. 2019). Thus, the protein structure formed after mutations in the S glycoprotein is not the sole determinant of host-virus interaction and the development of viral infection; many biological pathways and factors also affect this process.
Two conformational states are observed in the S glycoprotein receptor binding site. One is the down conformation, in which the receptor binding site is hidden or masked, and the other is the up conformation, in which the receptor binding site is accessible and the structure is less stable (Gui et al. 2017, Walls et al. 2017, Wrapp et al. 2020). The findings of this study indicate that the changes in the hACE2 binding region of the S glycoprotein transform the spike protein binding region from the down conformation to an up-like conformation (Fig. 6). Mutations that occur in the receptor binding domain are known to affect S glycoprotein receptor affinity (Jia et al. 2020, Kim et al. 2019). The transformation observed here may result in an increase in S protein receptor affinity, viral infection severity, and transmission. These data support the view that the enhanced affinity of SARS CoV-2 for the receptor increases the severity and spread of the disease. He et al. (2020) emphasised that the affinity of the SARS CoV-2 spike glycoprotein for the hACE2 receptor is higher than that of the SARS CoV spike glycoprotein and that this may be associated with the severity of infection. Mutations in the spike glycoprotein may result in tropism of the virus for new hosts or receptors, increasing or decreasing virulence (Shang et al. 2020b). The 367V>F, 408R>I, and 519H>Q mutations detected in the receptor binding domain may affect the binding dynamics. The topological and conformational changes that these mutations cause in coils (Mason & Arndt 2004), which have important structural roles in binding, can affect the clamping of the three receptor-binding S1 structures that sit on the homo-trimeric S2 stem.
The 510D>G and 529I>T mutations in the MERS CoV spike glycoprotein were reported to reduce receptor affinity and may play a role in reducing the virulence of the disease (Kim et al. 2016). Changes in the receptor binding domain of SARS CoV-2 may lead to similar results, but also to the opposite: mutation can enhance affinity and virulence. An increase in SARS CoV-2 S glycoprotein receptor affinity increases the rate of human-to-human transmission of the virus (Shang et al. 2020c).
The mutation in the HR1 region (930A>V) involved in the fusion process caused minor changes in the conformational structure of HR1. The HR regions play important roles in membrane fusion and viral entry. Chan et al. (2006) reported that 927P, 941P, 955P, and 1165P mutations in the HR region of the SARS CoV S glycoprotein resulted in inhibition of membrane fusion and impairment of viral entry. Cavallo and Oliva (2020) noted that the 929S>I, 939S>F and 936D>Y mutations seen in the HR region of the SARS CoV-2 S glycoprotein reduced the stability of the post-fusion structure. The 936D>Y mutation was found to cause the loss of the 936D-1185R salt bridge, a strong inter-monomer interaction, and to weaken the post-fusion assembly (Cavallo & Oliva 2020).
The 791T>I mutation was detected in the fusion peptide region, which is involved in the entry of viruses into the cell, destabilization of the host cell membrane, fusion pore formation and fusion of the viral envelope with cellular membranes (Ou et al. 2016, Peisajovich & Shai 2003). Many studies have shown that changes caused by inhibition or mutation in the fusion peptide prevent fusion (Duffus et al. 1995). It has also been shown that the signal peptide can contribute to the severity of infection through viral protein translocation. The 8L>V mutation detected in the signal peptide region may affect the translocation properties of the viral protein structure and therefore the virulence of SARS CoV-2 (Fig. 2).

Mutations in the S glycoprotein, which is an important structural target for treatment and prophylaxis, will affect not only virulence but also the validity and stability of the vaccine to be developed. The change in antigenic structure after the mutations detected in influenza virus necessitates periodic adjustments to the influenza vaccine (Yang et al.). It has been shown that the influenza A (H1N1) virus mutations 127D>E, 191L>I, 222D>G/N and 223Q>R caused changes in the antigen and decreased receptor affinity. In vaccine validity studies conducted with hepatitis B virus (HBV), a DNA virus with a lower mutation risk than RNA viruses, a significant decrease in the sensitivity of the vaccine used in hepatitis B prophylaxis against variant forms has been reported (Torresi 2008, Hsu et al. 2004). In the study conducted by Kamili (2010), the HBV mutations 173V>L, 180L>M, 204M>V and 145G>R were observed to have a negative effect on the protective properties of the vaccine, and the vaccine did not provide prophylaxis against variant forms. SARS CoV-2 mutations may lead to one of two outcomes for the COVID-19 vaccine to be developed: either a new vaccine will be required for each new mutation, as in seasonal influenza, or the vaccine will remain valid because the mutations do not change the immune response. Analysis and modeling of mutation data will contribute to the determination of the correct target structure, the interpretation of changes in the viral proteome, and preventive/therapeutic approaches.

Conclusion
In conclusion, the mutations were found to convert the receptor binding site from the down conformation to an up-like conformation. The conformational change occurring after mutation in the RBD may increase receptor affinity. The changes that these mutations cause in the general topological and conformational structure of the S glycoprotein may affect its function and thus the virulence of the virus. These findings could be beneficial for disease prevention and for drug/vaccine development against SARS CoV-2.