In silico Characterization of bHLH Transcription Factor Genes in the Genome of Rainbow Trout (Oncorhynchus mykiss)

The importance of seafood in human nutrition has become better understood as views on nutrition have changed worldwide. Rainbow trout (Oncorhynchus mykiss), which is native to North America, is one of the most commonly cultivated species in the world. Transcription factors are a group of proteins containing different functional components that carry out various activities. The basic helix-loop-helix (bHLH) domain is a highly conserved amino acid motif that characterizes a family of transcription factors. In the current study, the bHLH gene family in the rainbow trout genome was identified for the first time using bioinformatics tools. In total, 441 bHLH genes (OmybHLH) were identified in the rainbow trout genome, and the physicochemical properties of the encoded proteins were determined. Chromosome 7 carried the highest number of genes, with 29 OmybHLH genes. Thirty-eight OmybHLH genes had no intronic regions. OmybHLH proteins were divided into 4 main groups in the phylogenetic tree, consistent with their motif content. The common biological function of OmybHLH proteins was the regulation of biological processes, and their mode of action was binding activity. The OmybHLH gene family in rainbow trout and the bHLH gene family in Atlantic salmon (SsabHLH) had 95 orthologous gene relationships, and the average separation time of those orthologous genes was found to be 298 million years ago (MYA). Almost all members of the OmybHLH protein family were dominated by the α-helix motif, which is a stable conformation. Identification of the bHLH proteins and evaluation of their properties in rainbow trout can open new perspectives for aquaculture applications and for achieving better yield in fish culture using genetic data.


Introduction
Fish meat is a highly nutritious food that is rich in protein and unsaturated fatty acids and contains essential amino acids such as methionine and lysine (İzci et al., 2009). Unsaturated fatty acids, which have benefits such as lowering blood cholesterol levels and preventing cardiovascular diseases, are abundant in fatty fish. Iron, phosphorus and calcium are also abundant (Sayılı et al., 1999; Kocaman and Sayılı, 2014). The importance of seafood in human nutrition has become better understood as views on nutrition have changed worldwide. Fish farming has gained importance because of the growing population's need for animal protein (Doğan and Güven, 2005).
Initially, carp was preferred because it is easy to cultivate; over time, the cultivation of economically valuable species such as sea bass, sea bream and trout gained importance (Kocaman and Sayılı, 2014). Rainbow trout (Oncorhynchus mykiss), which is native to North America, is one of the most commonly cultivated species in the world. Rainbow trout tolerates high temperatures, adapts well to environmental conditions, is easy to feed, grows well and has a shorter incubation period at higher temperatures than other trout species; these properties make it easy to adapt to culture conditions (Aydın, 2009). Trout, one of the most important freshwater fish, has become an important alternative to marine fish in the market in terms of both production volume and consumer preference (Yiğit and Aral, 1999).
Transcription factors are groups of proteins containing different functional components that carry out various activities such as DNA binding, activation, phosphorylation and protein oligomerization. The basic helix-loop-helix (bHLH) domain is a highly conserved amino acid motif that characterizes a family of transcription factors. The bHLH domain includes two regions: one consists of about 10-15 predominantly basic amino acids (the basic region), and the other consists of about 40 amino acids that form two α-helices separated by a loop of variable length (the helix-loop-helix region) (Jones, 2004; Pires and Dolan, 2010). bHLH proteins are characterized by highly conserved regions for protein-protein interaction and DNA binding (Murre et al., 1989; Atchley et al., 1999). These transcription factors act in many processes, such as neurogenesis, myogenesis, cell lineage determination, sex determination, cell proliferation and differentiation, in organisms ranging from plants to mammals (Atchley et al., 1999).
Many bHLH proteins have been detected in organisms ranging from the yeast Saccharomyces cerevisiae to zebrafish and human (Robinson et al., 2000; Ledent et al., 2002; Wang et al., 2009); however, to our knowledge, there is no study on the bHLH gene family in rainbow trout. In the current study, the bHLH gene family in the rainbow trout (Oncorhynchus mykiss) genome was identified for the first time using bioinformatics tools. The aim was to determine the properties of the bHLH proteins in rainbow trout, such as their chromosomal localization, motif regions, homology models and gene structure, their orthologous relationships with Atlantic salmon (Salmo salar), and their predicted roles, molecular functions and cellular localization.
MapChart software (Voorrips, 2002) was used to show the chromosomal distribution of the bHLH genes. The Gene Structure Display Server (Hu et al., 2014) was used to determine the exon-intron structure of the rainbow trout bHLH genes.

Phylogenetic Tree Construction and Determination of Conserved Motifs
The ClustalW algorithm was used for protein sequence alignment, and the phylogenetic tree was constructed by the Maximum Likelihood method with bootstrap analysis of 1000 replicates in the MEGA X program (https://www.megasoftware.net/) (Kumar et al., 2018). The MEME Suite, a motif discovery tool, was used to identify sequence motifs, which are short, recurring sequence patterns that are assumed to have a biological function (Bailey et al., 2015). The motif length was set to between 4 and 20 aa.
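The bootstrap step mentioned above can be illustrated with a minimal sketch. It is not the MEGA X implementation; it only shows the general idea of resampling alignment columns with replacement to produce replicate alignments, from each of which a tree would then be inferred. The toy alignment and the function name `bootstrap_alignment` are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: alignment columns resampled with replacement."""
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns, with replacement.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


# Hypothetical toy alignment of three short protein sequences (gaps as '-').
toy_alignment = ["MKRA-HLE", "MKRAQHLE", "MRRA-HIE"]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
```

The bootstrap support of a branch is then the share of the 1000 replicate trees in which that branch is recovered; the tree inference itself is outside the scope of this sketch.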

Gene Ontology Analysis
Functional analysis of the rainbow trout bHLH proteins was performed using Blast2GO software (Conesa et al., 2005). The analysis was carried out in three steps: first, a BLASTp search against the NCBI database was performed; second, the BLASTp results were mapped to gene ontology terms (MAPPING); and third, an information file about the sequences was prepared (ANNOTATION). At the end of these analyses, the bHLH proteins in rainbow trout were described in three categories: predicted molecular functions, cellular locations and biological functions.

Rainbow Trout and Atlantic Salmon (Salmo salar)
The amino acid sequences of the bHLH proteins in rainbow trout and Atlantic salmon were aligned with the BLASTp algorithm. From the BLAST results, genes meeting the ≥50% similarity condition and an expectation value (E-value) of e⁻⁵⁰ were selected. Orthologs of the bHLH genes in rainbow trout and Atlantic salmon were aligned using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) software (Li, 2003). The synonymous (Ks) and non-synonymous (Ka) substitution rates of the aligned orthologous sequences were calculated with the PAL2NAL program (http://www.bork.embl.de/pal2nal/) (Suyama et al., 2006).
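The hit selection described above can be sketched as a simple filter over BLAST tabular output. This is an illustration under stated assumptions, not the authors' script: it assumes the results were saved in the standard 12-column tabular format, that the similarity threshold refers to the percent identity column, and that the E-value criterion means at most 1e-50. The gene identifiers in the example rows are hypothetical.

```python
import csv
import io

# Column names of the standard 12-column BLAST tabular output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, min_identity=50.0, max_evalue=1e-50):
    """Keep (query, subject) pairs passing both thresholds (assumed cut-offs)."""
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        if float(hit["pident"]) >= min_identity and float(hit["evalue"]) <= max_evalue:
            kept.append((hit["qseqid"], hit["sseqid"]))
    return kept


# Hypothetical example rows: only the first passes both thresholds.
example = io.StringIO(
    "OmybHLH-01\tSsabHLH-A\t92.5\t300\t22\t0\t1\t300\t1\t300\t1e-120\t410\n"
    "OmybHLH-02\tSsabHLH-B\t45.0\t280\t150\t4\t1\t280\t1\t276\t1e-80\t250\n"
    "OmybHLH-03\tSsabHLH-C\t88.0\t90\t10\t0\t1\t90\t1\t90\t3e-20\t95\n"
)
kept = filter_hits(example)
```

In this toy input the second row fails the identity threshold and the third fails the E-value threshold, so only the first pair is retained.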

Prediction of the Three-Dimensional Structure of bHLH Proteins in Rainbow Trout
The three-dimensional structures of the bHLH proteins were predicted with the PHYRE2 (Protein Homology/analogY Recognition Engine; http://www.sbg.bio.ic.ac.uk/phyre2) server (Kelley et al., 2015). A BLASTp search was performed to find similar sequences, and the best result was determined based on the three-dimensional structures in the Protein Data Bank (PDB).

Rainbow Trout Genome
After multiple searches to identify bHLH genes in the rainbow trout genome, 441 bHLH genes were identified and named OmybHLH-01 to OmybHLH-441 according to their chromosomal localization. The lengths of their protein products ranged between 79 and 2435 amino acids (aa). The shortest protein was OmybHLH-52 and the longest was OmybHLH-262. The isoelectric points (pI) of the proteins varied between 4.49 and 11.02; 257 OmybHLH proteins were acidic and 184 were basic. The percentage of acidic OmybHLH proteins was 58.28%. The molecular weights of the proteins ranged from 9051.53 Da to 259871.6 Da. In addition, 97.96% of the OmybHLH proteins were unstable (Supplementary Material 1).
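The acidic/basic split reported above can be reproduced with a short calculation. The sketch below assumes that a protein is counted as acidic when its pI is below 7 and basic when it is above 7; the source does not state the cut-off, so `neutral_pi` is an assumption, and the short pI list is hypothetical apart from the two reported extremes.

```python
def summarize_pi(pi_values, neutral_pi=7.0):
    """Count acidic and basic proteins and return the percentage of acidic ones."""
    if not pi_values:
        raise ValueError("pi_values must not be empty")
    acidic = sum(1 for pi in pi_values if pi < neutral_pi)
    basic = sum(1 for pi in pi_values if pi > neutral_pi)
    percent_acidic = round(100 * acidic / len(pi_values), 2)
    return acidic, basic, percent_acidic


# Hypothetical pI values (4.49 and 11.02 are the reported minimum and maximum).
toy_summary = summarize_pi([4.49, 6.20, 8.90, 11.02])

# Consistency check with the reported counts: 257 acidic out of 257 + 184 proteins.
reported_percent_acidic = round(100 * 257 / (257 + 184), 2)
```

With the reported counts, the acidic share works out to the 58.28% given in the text.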
Previous studies have reported the numbers of bHLH genes in other organisms, for example 183 in the rice (Oryza sativa) genome (Buck and Atchley, 2003; Wei and Chen, 2018). The numbers of bHLH genes found in those organisms are close to each other, whereas the number of bHLH genes in the rainbow trout genome is higher. This may be because bHLH proteins in rainbow trout are involved in many key developmental processes, and many bHLH proteins are required to regulate these processes. Moreover, wide and detailed searches and high-resolution genome assemblies may contribute to these high numbers.
MapChart software was used to show the genomic distribution of the bHLH genes on the rainbow trout chromosomes. The highest number of genes was carried by chromosome 7, with 29 OmybHLH genes. This was followed by the 8th, 5th, 4th and 3rd chromosomes with 26, 25 and 24 OmybHLH genes, respectively. The lowest number of genes was located on chromosome 29, with 2 OmybHLH genes (Figure 1). In addition, 11 OmybHLH genes were mapped at the scaffold level (Figure 2).

Phylogenetic Tree and Conserved Motifs Analysis
Phylogenetic trees are tree-like diagrams that show the relatedness between species and how long ago they diverged in the evolutionary process. A phylogenetic tree was drawn to evaluate the evolutionary relationships of the OmybHLH proteins of rainbow trout. After visualization of the constructed tree with the Interactive Tree Of Life (iTOL) online tool (Letunic and Bork, 2019), the OmybHLH proteins were divided into 4 main groups (Class I, Class II, Class III, Class IV). There were 63 proteins in Class I, 128 in Class II, 122 in Class III and 128 in Class IV; Class II and Class IV thus contained an equal number of OmybHLH proteins (Figure 3). The phylogenetic tree of the OmybHLH proteins may provide information about protein sequence similarities, differences and ancestry.
Protein sequence motifs are often used in genome research to identify and classify proteins, to determine specific binding sites and to find functional regions in proteins. The conserved motif structures in the OmybHLH proteins of rainbow trout were determined, and detailed results are shown in Supplementary Material 3. According to the MEME Suite analysis, 12 conserved motifs were defined for the 441 OmybHLH proteins. The motif analysis supported the grouping in the phylogenetic tree: proteins with similar motif compositions were generally placed in the same cluster.
Studies on the relationships and evolution of bHLH protein families in different organisms have shown that atonal bHLH protein homologs in organisms such as mice, chickens and humans have been conserved throughout evolution and that their coding regions are quite similar (Ben Arie, 1996). In addition, a phylogenetic analysis of 9 species of land plants indicated that the bHLH proteins found in these plants derive from a common ancestor (Pires and Dolan, 2010). It has also been concluded that the motifs of bHLH proteins found in fungi are conserved at the aa sequence level and are related to the bHLH proteins of group B in animals (Sailsbery et al., 2012).

Gene Ontology Analysis
Gene ontology is one of the bioinformatics tools used to reveal the features and functions of gene products in a species. Blast2GO software was used to define the cellular localization, biological processes and molecular functions of the OmybHLH proteins (Figure 4). According to the analysis results, OmybHLH proteins have biological functions such as regulation of biological processes, roles in metabolic, cellular and developmental processes, multicellular organismal processes and response to stimulus. Consistent with these results, the literature reports that bHLH proteins act as positive regulators in biological processes (Norton, 2000) and as negative regulators (Benezra et al., 1990; Norton, 2000; Perk et al., 2005), promote myogenic cell proliferation and differentiation, and are involved in determining flexibility in skeletal muscles and in responding to mechanical or neuronal stimuli (Voytik et al., 1993; Molkentin and Olson, 1996; Puri and Sartorelli, 2000; Walters et al., 2000; Perry et al., 2001; Pownall et al., 2002; Buckingham et al., 2003; Ishido et al., 2004; Tapscott, 2005; Legerlotz and Smith, 2008). They are also involved in the development of the nervous system and function in neuronal cell development and differentiation in the brain (Campuzano, 1985; Guillemot et al., 1993; Guillemot, 1995; Yasunami et al., 1996; Borges et al., 1997; Miyata et al., 1999; Olson et al., 2001). Moreover, they play an important role in embryonic development and embryonic cell differentiation (Malecki et al., 1999; Norton, 2000).
When the cellular locations of the OmybHLH proteins were evaluated, they were found to be distributed across different parts of the cell, covering the categories cell, intracellular part and cellular anatomical entity. Studies in the literature have reported bHLH proteins in embryonic cells (Nambu et al., 1991; Muralidhar et al., 1993), muscle cells (Perry et al., 2001; Tapscott, 2005), nerve cells (Guillemot, 2007; Jahan et al., 2010), blood cells (Drake et al., 1997; Gering et al., 1998), and melanocytes and mast cells (Hodgkinson et al., 1993; Steingrimsson et al., 2004; Levy et al., 2006). The mode of action of the OmybHLH proteins was found to be binding activity. In addition, these proteins had transcription regulator activity, which is consistent with their roles. Different studies in the literature have likewise shown that members of the bHLH protein family exert their molecular functions through binding (Ledent and Vervoort, 2001; Ledent et al., 2002; Berkes and Tapscott, 2005; Murre, 2019).
According to the current results, OmybHLH proteins have significant roles in regulatory and developmental processes in many cell types. The high number of genes in this protein family in rainbow trout may be explained by the multiple significant roles of bHLH proteins in different cell types. Gene ontology analysis of the OmybHLH proteins can provide a framework for understanding the importance of these multifunctional proteins in the organism and their predicted roles and localization in cells.

Orthologous Genes between Rainbow Trout and Atlantic Salmon (Salmo salar)
Genes that are found in different organisms but share the same ancestral origin and have structural and functional similarities, having separated from each other during speciation, are called orthologous genes. Orthologous gene analysis was performed by comparing the bHLH gene sequences of rainbow trout and Atlantic salmon, and the separation times were determined. According to the results, the OmybHLH gene family in rainbow trout and the bHLH gene family in Atlantic salmon (SsabHLH) had 95 orthologous gene relationships, and the ratio of non-synonymous (Ka) to synonymous (Ks) substitutions (Ka/Ks) was 0.26. The average separation time of those orthologous genes was found to be 298 million years ago (MYA) (Figure 5, Supplementary Material 4). According to these results, it can be estimated that there is a high degree of differentiation between the two fish species.
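The quantities reported above can be related with a brief sketch. A Ka/Ks ratio below 1 is conventionally read as purifying selection, and a divergence time is commonly estimated as T = Ks / (2λ), where λ is the synonymous substitution rate per site per year. The source does not state the formula or the value of λ it used, so both the formula choice and every numeric input below are assumptions made for illustration only.

```python
def ka_ks_ratio(ka, ks):
    """Ratio of non-synonymous (Ka) to synonymous (Ks) substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    return ka / ks


def selection_regime(ratio):
    """Conventional reading of Ka/Ks: <1 purifying, >1 positive, =1 neutral."""
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


def divergence_time_mya(ks, rate_per_site_per_year):
    """T = Ks / (2 * lambda), converted from years to million years (MYA)."""
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return ks / (2 * rate_per_site_per_year) / 1e6


# Hypothetical inputs: Ka and Ks chosen so that Ka/Ks = 0.26, as reported in
# the text; the rate of 1e-9 is a placeholder, not a value from the study.
ratio = ka_ks_ratio(0.13, 0.5)
regime = selection_regime(ratio)
t_mya = divergence_time_mya(0.5, 1e-9)
```

Because T scales inversely with λ, the estimated separation time depends strongly on the substitution rate chosen for the lineage.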

Prediction of the Three-Dimensional Structure of bHLH Proteins in Rainbow Trout
Homology modelling is based on sequence similarity to one or more proteins whose structure is known, and high sequence similarity between proteins supports the hypothesis that they derive from the same ancestor. The three-dimensional structures of the OmybHLH proteins were predicted by homology modelling with the Phyre2 server. The modelling was carried out at a confidence level of >90% in intensive mode. The similarity level ranged between 4% and 100%, and 12 OmybHLH proteins showed similarity over 90% (58, 145, 359, 370, 371, 372, 373, 380, 391, 424, 433) (Figure 6). According to the predicted three-dimensional structures, almost all members of the protein family were dominated by the α-helix motif, which is a stable conformation formed by the rotation of the alpha carbon atoms of 4 to 50 amino acids in a spiral shape. This structure consists of two helices, which include the basic region and the variable loop region (Pires and Dolan, 2010). The amino-terminal end of the structure allows DNA binding, while the carboxy-terminal end provides dimerization. When the modelled three-dimensional structures of the OmybHLH proteins are evaluated, the two amphipathic α-helices (H1, H2) produced by the dimerization component are present in almost all of these protein structures, which is consistent with the literature (Murre et al., 1989; Atchley et al., 1999).

Conclusion
This study evaluated the bHLH proteins, a significant transcription factor family in eukaryotes, in rainbow trout. Properties of this protein family, such as chromosomal localization, motif regions, gene structure, predicted three-dimensional structures, orthologous relationships with the Atlantic salmon (Salmo salar) bHLH proteins, and predicted biological roles, molecular functions and cellular localization, were determined by in silico analysis using bioinformatics tools. Identification of the bHLH proteins and evaluation of their properties in rainbow trout can open new perspectives for aquaculture applications and for achieving better yield in fish culture using genetic data. In addition, the defined genetic data can be used in healthy and high-yield fish farming, and the findings can be applied to other economically important fish species in aquaculture. In this respect, the current study provides valuable data on OmybHLH proteins with potential use in aquaculture applications.

Authors' Contributions
Author YCA designed the study, YCA and GD wrote the first draft of the manuscript, GD performed and managed the bioinformatics analyses. Both authors read and approved the final manuscript.

Conflict of Interest
The authors declare that there is no conflict of interest.

Ethical Approval
For this type of study, formal consent is not required.