Examination of Substrate Specificity of the First Adenylation Domain in mcyA Module Involved in Microcystin Biosynthesis

The cyanotoxin microcystin (MC) is a secondary metabolite synthesized by nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) enzymes. It has many isoforms and the mechanism of its diversity is not well understood. One of the MC synthetase genes, mcyA, codes for the McyA module containing two adenylation (A) domains. The first domain, McyA-A1, generally binds to L-serine (L-Ser). Then the N-methyl transferase (NMT) domain converts L-Ser into N-methyldehydroalanine (Mdha), which usually occupies position 7 on the MC molecule. However, various other amino acids (AAs) might also be present at this position. In this study, bioinformatic analyses of selected cyanobacteria were performed to understand whether genetic information in the first adenylation domain of mcyA could explain incorporation of different AAs at position 7 of the MC molecule. Binding pocket signatures of McyA-A1 and putative activated AAs were determined via various bioinformatics tools. Maximum likelihood phylogenetic trees of full length mcyA, mcyA-A1 and 16S rRNA genes were prepared in Mega 6. Phylogenetic analysis of mcyA-A1 nucleotide sequences was in agreement with the predictions of activated AAs by McyA-A1. In comparison with the 16S rRNA and full length mcyA gene trees, mcyA-A1 phylogenetic trees suggested horizontal transfer of the A domain in either Planktothrix agardhii (Gomont) Anagnostidis & Komárek or Planktothrix rubescens (De Candolle ex Gomont) Anagnostidis & Komárek strains. Predictions of activated AAs were generally in agreement with the chemically determined position 7 AAs. However, there were exceptions suggesting the multispecificity of the first A domain of McyA in some cyanobacteria.


INTRODUCTION
There are approximately 500 natural amino acids [1], which can be classified by their R groups, by their stereochemistry (D- and L-form) and as proteinogenic or nonproteinogenic. The 20 proteinogenic amino acids are all L-form and are incorporated through the protein synthesis pathway involving mRNA, tRNA and ribosomes. Nonproteinogenic amino acids are D-form or β-amino acids and synthesized by either non-ribosomal peptide
Cyanobacteria can produce a variety of toxic secondary metabolites (cyanotoxins), which are generally subdivided into neurotoxins, hepatotoxins, cytotoxins and dermatotoxins. Cyanotoxins are usually synthesized nonribosomally [2,3]. Microcystin (MC) is also a nonribosomally synthesized cyanotoxin, which shows hepatotoxic and neurotoxic effects by inhibiting the eukaryotic serine/threonine protein phosphatases PP1 and PP2A [4,5]. Its general structure is cyclo(-D-Ala¹-X²-D-MeAsp³-Z⁴-Adda⁵-D-Glu⁶-Mdha⁷), where X and Z are various L-amino acids (Figure 1), D-Glu is D-iso-glutamic acid, Mdha is N-methyl-dehydroalanine, D-Ala is D-alanine, D-MeAsp is D-erythro-β-methyl aspartic acid and Adda is (2S,3S,4E,6E,8S,9S)-3-amino-9-methoxy-2,6,8-trimethyl-10-phenyldeca-4,6-dienoic acid. Microcystin producing species are found in various genera including Microcystis, Anabaena, Nostoc, Planktothrix (Oscillatoria), Phormidium, Fischerella and Hapalosiphon [6,7]. Sequencing of MC biosynthesis genes was initially completed for Microcystis aeruginosa PCC 7806 [4], followed by Planktothrix agardhii NIVA CYA 126/8 [8] and Anabaena sp. 90 [9]. Within the MC synthetase operon, NRPS modules are encoded by mcyA, -B and -C; PKS modules are encoded by mcyD; hybrid NRPS/PKS modules are encoded by the mcyG and mcyE genes. The mcyJ, -F, -H, -I, -L and -T genes do not have a modular structure and generally contain only a single domain (Table 1). A module is a functional unit in NRPS or PKS and each module contains active units called domains.
Adenylation (A), condensation (C) and peptidyl carrier protein (PCP) domains are the core domains in NRPS modules. The A domain recognizes the corresponding AA and activates it by using ATP. The PCP domain tethers the activated residue through a thiol (-SH) group. The C domain forms the peptide bond between two activated residues tethered in this way. In addition, some modules may contain modifying domains called tailoring enzymes, such as epimerization (E) and N-methyl transferase (NMT) domains [10].
There are two A domains in McyA and the order of the domains is A1-NMT-PCP-C-A2-E in most MC producing species; however, the NMT domain may not always be present (Table 1). Within these domains, the A1 domain usually activates L-Serine. Then, the NMT domain transfers the methyl group from S-Adenosyl-L-methionine (SAM) to L-Serine, followed or preceded by a dehydration reaction, to form Mdha at position 7 of the MC molecule. Various AAs can be incorporated in all 7 positions [11,12].
The substrate specificity of A domains is important for understanding the structure of secondary metabolites, including MC. It is usually determined by identifying the binding pocket residues of the A domain via bioinformatics analyses. Ten residues constitute the binding pocket of an A domain and are numbered as positions 235, 236, 239, 278, 299, 301, 322, 330, 331 and 517 according to their location in the GrsA module of Bacillus brevis (GenBank: CAA33603.1) [13,14]. Position 235 is generally occupied by Asp (Asp235) and position 517 generally by a Lys residue (Lys517). These residues are highly conserved; Asp235 binds to the -NH2 group and Lys517 binds to the -COOH group of activated AAs. The remaining eight residues in the binding pocket can be variable and this variability is thought to be an important factor for A domain substrate specificity. It is suggested that A domain substrate specificity is associated with interactions between the R groups of the activated AAs and the R groups of these eight residues [15,16].
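The numbering scheme described above can be illustrated with a minimal sketch. It assumes that a query A domain has already been aligned to a GrsA reference sequence and that both aligned rows are available as gapped strings; the function name `pocket_signature` and the `ref_start` parameter are hypothetical, and this is not the online tool used in this study.

```python
# Binding-pocket positions, numbered as in the text (GrsA numbering).
POCKET_POSITIONS = (235, 236, 239, 278, 299, 301, 322, 330, 331, 517)


def pocket_signature(ref_aligned, query_aligned, ref_start=1,
                     positions=POCKET_POSITIONS):
    """Return the query residues aligned to the given reference positions.

    ref_aligned, query_aligned: two rows of an alignment (equal length,
    '-' marks a gap).
    ref_start: reference number of the first non-gap reference residue
    (hypothetical parameter, useful when only a fragment was aligned).
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned rows must have equal length")
    wanted = set(positions)
    found = {}
    ref_pos = ref_start - 1
    for ref_res, query_res in zip(ref_aligned, query_aligned):
        if ref_res == "-":
            continue  # insertion in the query: no reference number here
        ref_pos += 1
        if ref_pos in wanted:
            found[ref_pos] = query_res
    # '?' marks requested positions that lie outside the aligned region.
    return "".join(found.get(p, "?") for p in positions)


# Toy alignment (invented, not real GrsA or McyA sequence data).
toy_signature = pocket_signature("ACD-EFG", "AC-WEYG", positions=(2, 3, 4, 5))
print(toy_signature)
```

In the toy call, reference positions 2 to 5 correspond to alignment columns 2, 3, 5 and 6, so a gap in the query at a pocket position is reported as '-' rather than silently skipped.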
There are nearly 280 variants of MCs [17] and the mechanisms that cause this diversity are not well understood. As more sophisticated techniques are used, both environmental samples and cyanobacterial cultures are found to produce more diverse and novel MC analogues [18]. Current data suggest that this diversity is based on either genetic diversity coded in the MC synthetase genes (e.g. insertions or deletions within various mcy genes) or the biochemical status of MC producing cells [12]. For instance, point mutations of a few critical AAs in the A domain of GrsA were reported to change the substrate specificity of this domain [13].
To our knowledge, there are no detailed bioinformatics studies regarding the first A domain of mcyA. Therefore, the main purpose of this study was to investigate whether genetic information in this domain could explain incorporation of different AAs at position 7 of the MC molecule. This is important since the toxicity of MCs might change with incorporation of various AAs. Additionally, mechanisms explaining A domain AA selectivity will help in combinatorial biosynthesis of new NRPS and PKS enzymes [5]. For this purpose, ten MC producing cyanobacteria with sequenced MC synthetase genes were selected for bioinformatics analyses (i) to predict the binding pocket residues of the A1 domain in McyA (McyA-A1); (ii) to determine the AAs activated by McyA-A1 based on the binding pocket residues; (iii) to investigate concordance between activated AAs and chemically determined position 7 AAs of MCs reported in the literature for the studied strains; (iv) to investigate concordance between activated AAs and phylogenetic clustering of mcyA-A1 sequences.

MATERIAL and METHODS
A detailed literature search was conducted to identify the cyanobacterial strains best characterized in terms of MC variant production and to record the reported variation at position 7 of the MC molecule (Table 2). Ten cyanobacteria with sequenced MC synthetase genes and detailed MC characterizations were selected (Table 2) based on literature and GenBank searches. Full length mcyA and 16S rRNA gene nucleotide sequences of these cyanobacteria were downloaded from NCBI (National Center for Biotechnology Information) (https://www.ncbi.nlm.nih.gov) using Mega 6 [20]. Alignment was performed using Muscle [21] as implemented in Mega 6 [20], using codons for mcyA and nucleotides for 16S rRNA genes. Since mcyA is composed of domains, the borders of the first adenylation domain (mcyA-A1) in mcyA were determined using the NCBI conserved domain program [9]. The determined mcyA-A1 domains were then extracted from the full length mcyA alignment and a new codon-based alignment was performed after addition of the B. brevis GrsA domain nucleotide sequence. The nucleotide sequence alignment was converted to an AA alignment and binding pocket residues of McyA-A1 were predicted by comparison to the B. brevis GrsA domain AA sequence and by using an online tool [14]. Predictions of the AAs activated by the A domains were also performed using the same online tool.
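The conversion of a codon-based nucleotide alignment into an AA alignment can be sketched as follows. This is a minimal illustration rather than the Mega 6 implementation: it assumes the standard codon assignments, an in-frame alignment whose gaps occur as whole codons ('---'), and a hypothetical function name `translate_codon_alignment`.

```python
from itertools import product

# Standard codon assignments, listed in T, C, A, G order for each codon
# position ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon_alignment(nt_aligned):
    """Translate one row of a codon alignment, keeping gap columns."""
    seq = nt_aligned.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("codon alignment length must be a multiple of 3")
    protein = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        if codon == "---":
            protein.append("-")  # a whole-codon gap becomes one AA gap
        else:
            # 'X' for ambiguous bases or partially gapped codons
            protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


# Toy sequence (invented): Met, Asp, gap, Lys, stop.
print(translate_codon_alignment("ATGGAT---AAATAA"))
```

Because every row is translated column by column, the AA alignment keeps the column correspondence of the nucleotide alignment, which is what allows the GrsA-based pocket positions to be read off the McyA-A1 rows.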
In order to construct phylogenetic trees, the best-fitting models among 24 different nucleotide substitution models were estimated in Mega 6 [20] using an automatic neighbor-joining tree. Models with the lowest Bayesian Information Criterion scores were assumed to describe the substitution patterns best. These models were used to construct phylogenetic trees of the full length mcyA, mcyA adenylation domain (mcyA-A1) and 16S rRNA gene nucleotide sequences of the investigated cyanobacteria using Mega 6 [20]. The phylogenetic tree of the 16S rRNA genes of the selected cyanobacteria (10 nucleotide sequences and a total of 1350 nucleotide positions) was constructed with the maximum likelihood (ML) method using the Hasegawa, Kishino, and Yano (HKY) model [22]. A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories (+G, parameter = 0.3331)). The unrooted phylogenetic tree for the full length mcyA gene (10 nucleotide sequences and 7077 nucleotide positions) was constructed with the ML method using the general time reversible (GTR) model [23]. A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories (+G, parameter = 0.8351)). The phylogenetic tree for the mcyA-A1 domain nucleotide sequences (11 sequences and 1152 nucleotide positions) was constructed with the ML method based on the GTR model [23]. A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories (+G, parameter = 1.1089)). For all phylogenetic trees, 1000 bootstrap replicates were used to test the stability of monophyletic groups. GrsA [24] and 16S rRNA gene nucleotide sequences of Bacillus brevis (GenBank: X15577.1) were chosen as the outgroups for the mcyA-A1 and 16S rRNA gene-based phylogenetic trees, respectively.
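Model selection by the lowest Bayesian Information Criterion can be sketched with the usual definition BIC = -2 ln L + k ln n, where ln L is the maximized log-likelihood, k the number of free parameters and n the sample size. The sketch below takes n as the number of alignment sites, which is an assumption, and the log-likelihoods and parameter counts are invented placeholders, not values from this study.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: -2 ln L + k ln n."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def best_model(fits, n_sites):
    """Return (name of the lowest-BIC model, all BIC scores).

    fits maps a model name to (maximized log-likelihood, free parameters).
    """
    scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Hypothetical fits for two candidate models on a 1152-site alignment.
example_fits = {
    "simpler model": (-6050.0, 24),
    "richer model": (-6040.0, 30),
}
chosen, all_scores = best_model(example_fits, n_sites=1152)
print(chosen, all_scores)
```

The k ln n term penalizes additional parameters, so a richer model is preferred only when its gain in log-likelihood outweighs the penalty.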

RESULTS and DISCUSSION
Binding pocket analyses of McyA-A1 sequences were performed to predict which AA was incorporated at position 7 of the MC molecule. The predicted binding pocket residues and the predicted AAs potentially activated by McyA-A1 are summarized in Table 2. Of the ten cyanobacteria, eight were predicted to activate L-Ser and the two Planktothrix rubescens strains were predicted to activate L-Thr. Binding pocket signatures for all Microcystis strains
Among the potentially L-Thr activating P. rubescens strains, NIVA CYA 98 was reported to produce Dha⁷ containing MCs [25] and PCC 7821 was reported to produce both Dhb⁷ and Mdha⁷ containing MCs [11,25] (Table 2). Three different residues, namely MeSer⁷, Dha⁷ and Mdha⁷, were reported for Anabaena sp. 90, M. viridis NIES 102 and M. aeruginosa PCC 7806 [12,27]. While production of Mdha⁷ was reported for Planktothrix agardhii NIVA CYA 126/8 [12], Microcystis aeruginosa NIES 843 and Fischerella sp. CENA 161 [3,6], Dha⁷ was the only AA residue reported for M. aeruginosa K-139 [12]. Among the species investigated in this study, Mdhb⁷ formation was only reported for Nostoc sp. 152 [27], which also produced Dha⁷, MeSer⁷ and Mdha⁷ [12,25] containing MCs (Table 2).
In most MC producing cyanobacteria, the McyA-A1 domain activates L-Ser. In the case of L-Thr activation by the McyA-A1 domain, Dhb⁷ formation may occur in species lacking the NMT domain, as expected for the P. rubescens strains in this study (Table 1). However, in previous LC/MS analyses of P. rubescens NIVA CYA 98, only Dha⁷ formation was reported [7,25], which indicated L-Ser activation instead of L-Thr and conflicted with the binding pocket analysis results in this study (Table 2). Similarly, Kurmayer et al. [11] reported that P. rubescens PCC 7821 produced MCs containing both Mdha⁷ (5%) and Dhb⁷ (95%), which indicated activation of both L-Ser and L-Thr. These results contradicted the proposition that polar AA activating A domains such as McyA-A1 had higher selectivity than hydrophobic AA activating A domains [11,28].
A similar situation was observed for the McyA-A1 domain in Nostoc sp. 152, which was predicted to activate L-Ser. MCs containing Mdha⁷, MeSer⁷, Dha⁷ [25,28] and Mdhb⁷ [27] were reported for Nostoc sp. 152 (Table 2). Mdhb⁷ formation (probably a minor variant [29]) indicates L-Thr activation and an active NMT domain, since Mdhb is a methylated form of Dhb. Apparently, L-Thr and L-Ser are simultaneously activated by McyA-A1 in Planktothrix rubescens PCC 7821 and Nostoc sp. 152. This multispecificity suggests that the McyA-A1 domain has some flexibility to activate various AAs during MC production.
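The comparison between predicted and reported position 7 residues can be restated compactly. The sketch below is only a tabulation of the values quoted in the text above (not a re-analysis of Table 2); the residue-to-precursor mapping follows the reasoning in this section, and the three category labels are hypothetical names chosen for illustration.

```python
# Precursor implied by each reported position 7 residue, following the
# reasoning in the text (Dha, Mdha, MeSer from L-Ser; Dhb, Mdhb from L-Thr).
PRECURSOR = {
    "Dha": "L-Ser", "Mdha": "L-Ser", "MeSer": "L-Ser",
    "Dhb": "L-Thr", "Mdhb": "L-Thr",
}

# strain: (AA predicted from the binding pocket, reported residues)
STRAINS = {
    "P. rubescens NIVA CYA 98": ("L-Thr", ["Dha"]),
    "P. rubescens PCC 7821": ("L-Thr", ["Dhb", "Mdha"]),
    "Nostoc sp. 152": ("L-Ser", ["Mdhb", "Dha", "MeSer", "Mdha"]),
    "Anabaena sp. 90": ("L-Ser", ["MeSer", "Dha", "Mdha"]),
    "M. viridis NIES 102": ("L-Ser", ["MeSer", "Dha", "Mdha"]),
    "M. aeruginosa PCC 7806": ("L-Ser", ["MeSer", "Dha", "Mdha"]),
    "P. agardhii NIVA CYA 126/8": ("L-Ser", ["Mdha"]),
    "M. aeruginosa NIES 843": ("L-Ser", ["Mdha"]),
    "Fischerella sp. CENA 161": ("L-Ser", ["Mdha"]),
    "M. aeruginosa K-139": ("L-Ser", ["Dha"]),
}


def classify(predicted, observed):
    """Compare the predicted AA with the precursors implied by the MCs."""
    implied = {PRECURSOR[residue] for residue in observed}
    if implied == {predicted}:
        return "concordant"    # only the predicted AA is implied
    if predicted in implied:
        return "mixed"         # predicted AA plus another precursor
    return "discordant"        # predicted AA not implied at all


for strain, (predicted, observed) in STRAINS.items():
    print(f"{strain}: {classify(predicted, observed)}")
```

Under this tabulation, seven strains are concordant, P. rubescens PCC 7821 and Nostoc sp. 152 are mixed, and P. rubescens NIVA CYA 98 is the single discordant case.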
The aforementioned A domain flexibility has been reported for various secondary metabolites. For example, the A domain involved in the biosynthesis of maremycin from Streptomyces sp. was predicted to activate Cys based on the specificity-conferring code; however, in vitro and in vivo studies showed the activation of Cys, Me-Cys and Ser. It should be noted that the in vitro catalytic efficiencies of the A domain towards Cys and Me-Cys were comparable and 4 times higher than that towards Ser, in agreement with the observation of the minor product maremycin D incorporating Ser in vivo [30]. In a comparable situation, the L-Phenylalanine activating A domain for the antibiotic tyrocidine A was shown to be flexible for various AAs in vitro [31]. The specificity of this A domain towards L-Phe in vivo suggested that a gatekeeping function of the condensation (C) domain might be in place [31]. In fact, Meyer et al. [3] demonstrated the flexibility of A domains and the gatekeeping and specificity-regulatory role of the C domains of the mcyB and mcyC genes of the microcystin synthetases in Microcystis spp. Their in vitro experiments showed that the predicted AAs were used by A domains when the C domain was present. On the other hand, minor amounts of unanticipated AAs were incorporated in the MC molecules, probably due to leaky control by the C domain [3], which could explain the minor incorporation of unpredicted AAs by McyA-A1 (e.g. Mdha⁷ in P. rubescens PCC 7821). However, it does not explain the sole production of Dha⁷ containing MCs in P. rubescens NIVA CYA 98, in which L-Thr activation was predicted (Table 2).
Anabaena sp. 90, Microcystis spp., Planktothrix agardhii NIVA CYA 126/8 and Fischerella sp. CENA 161 investigated in this study were predicted to activate L-Ser (Table 2). Previous LC/MS analyses of these species reported MCs containing Mdha⁷, MeSer⁷ and Dha⁷, supporting the activation of L-Ser by McyA-A1 [19]. MeSer⁷ was suggested to form through an incomplete dehydration reaction involving the condensation domain of mcyA [18]. On the other hand, formation of Dha⁷ might have resulted from transient inactivity of the NMT domain or SAM limitation during MC biosynthesis [12].
Phylogenetic analysis of the 16S rRNA gene of the investigated cyanobacteria clearly separated strains belonging to Nostocales, Oscillatoriales and Chroococcales with high bootstrap support (Figure 2). A similar tree, with high bootstrap support, was obtained when the full length mcyA nucleotide sequences were analyzed (Figure 3). However, P. agardhii NIVA CYA 126/8 did not cluster with the P. rubescens strains when the mcyA-A1 domain was analyzed (Figure 4). Clustering of the other strains was in agreement with the clusters obtained in the 16S rRNA and mcyA gene phylogenetic analyses.
In general, the phylogenies of the full length mcyA gene and mcyA-A1 domain sequences followed that of the 16S rRNA gene of the analyzed strains. The exception was the clustering of the P. agardhii NIVA CYA 126/8 mcyA-A1 domain sequence with Nostocales and Chroococcales sequences, albeit with low bootstrap support. This suggests that either P. agardhii or P. rubescens acquired its A1 domain via horizontal gene transfer after the separation of the two species [5,11]. Phylogenies of the mcyA condensation domains of the investigated strains had a similar topology to the full length mcyA and 16S rRNA gene trees (data not shown), further strengthening the idea that some Planktothrix strains might have acquired their A domains via horizontal transfer.
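The kind of incongruence described here, a group of strains that forms a clade in one gene tree but not in another, can be checked mechanically. The sketch below assumes rooted trees written as nested tuples of leaf names; the two toy topologies and the short leaf labels are hypothetical stand-ins and do not reproduce Figures 2 to 4.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def is_monophyletic(tree, taxa):
    """True if some node of the rooted tree has exactly `taxa` as leaves."""
    taxa = frozenset(taxa)

    def walk(node):
        if leaves(node) == taxa:
            return True
        if isinstance(node, str):
            return False
        return any(walk(child) for child in node)

    return walk(tree)


# Hypothetical toy topologies: "Pa" stands for a P. agardhii strain,
# "Pr1" and "Pr2" for P. rubescens strains, "Nos" and "Mic" for strains
# from other orders.
species_like_tree = ((("Pa", "Pr1"), "Pr2"), ("Nos", "Mic"))
domain_like_tree = (("Pr1", "Pr2"), (("Pa", "Nos"), "Mic"))

planktothrix = {"Pa", "Pr1", "Pr2"}
print(is_monophyletic(species_like_tree, planktothrix))
print(is_monophyletic(domain_like_tree, planktothrix))
```

A check of this kind only flags topological disagreement; whether it reflects horizontal transfer still depends on branch support, which was low for the placement discussed above.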

CONCLUSION
The phylogenetic clustering of A1 domains is in agreement with the prediction of the AAs activated by these domains. The L-Thr activating P. rubescens A1 domains are clearly separated from the L-Ser activating domains. These results suggest that phylogenetic analyses of A domains might provide information on the AAs activated by these domains. On the other hand, MC congener analysis of these strains shows that binding pocket or phylogenetic analysis alone may not explain the AA flexibility of A domains in some strains. Further work involving heterologous expression of the mcyA-A1 and -C domains together or separately, followed by protein purification and in vitro substrate specificity assays, might help to explain the variation of position 7 AAs in MCs.

Declaration of Conflicting Interests and Ethics
The authors declare no conflict of interest. This research study complies with research publishing ethics. The scientific and legal responsibility for manuscripts published in IJSM belongs to the author(s).