Metagenomic analysis of the microbial community in Çal Cave soil to elucidate biotechnological potential

Turkey has a large number of karstic caves that have not yet been explored and whose microbial diversity has not been determined. The biodiversity of these caves has not yet been systematically characterised from a molecular point of view. Çal Cave, located in Trabzon, is one of the important karstic caves. In this study, the microbial diversity of Çal Cave was investigated for the first time using a metagenomic approach in order to assess the biotechnological potential of its gene sources. Detailed taxonomic classification was carried out by sequencing all environmental genomes rather than a specific marker gene such as 16S rRNA, which targets only prokaryotes. According to the taxonomic analysis, bacteria account for 98%, eukaryotes for 2%, archaea for 0.3% and viruses for 0.01% of the microbial diversity in Çal Cave soil. The results show that, across the 31 bacterial phyla represented in the Çal Cave soil sample, Actinobacteria accounted for 65% and Proteobacteria for 31%. Among these, Streptomyces was identified as the most dominant bacterial genus. Within the 2% eukaryotic population, the largest phylum was Ascomycota, and its most common representative in the soil sample was Sordariomycetes. Of the archaea, 77% were determined to be Halobacteria. Caudovirales emerged as the most common virus class living in Çal Cave soil. No specific classification could be made for 91.61% of the total reads. Taking all classified and unclassified data together, the results confirm that Çal Cave harbours a very large microbial biodiversity that could not be identified by classical microbiology techniques, and that this diversity provides a promising gene source for the discovery of new enzymes and bioactive compounds for use in biotechnological applications.


Introduction
Investigation of novel species of microorganisms is an important strategy for the discovery of new bioactive compounds with antibiotic, antimetabolite or antitumor activity, and of genes that encode industrially important proteins (Ghosh et al., 2017). Caves, which are extreme environments in terms of microbial diversity, are ideal habitats in which to search for novel microorganisms and, consequently, new compounds and proteins. Caves represent unique ecosystems with extreme conditions such as darkness, nutrient limitation, low oxygen levels, high humidity, low temperature and high concentrations of minerals (Grothet et al., 1999; Schabereiter-Gurtner et al., 2003; Zhou et al., 2007). Because of these harsh conditions, caves contain rich and largely undiscovered microbial diversity, and cave-dwelling microorganisms have unique properties that can be explored for novel enzymes and bioactive substances (Oliveira et al., 2017; Riquelme et al., 2017; Wiseschart et al., 2018). However, caves have not yet received the attention they deserve, and studies on cave microbial diversity are typically based on traditional cultivation methods, which can identify only an estimated 1% of cave microbial flora (Cheeptham, 2012).
Most cave microorganisms cannot be cultured and isolated because their original living conditions cannot be reproduced in the laboratory (Ghosh et al., 2017). Recent studies have shown that, owing to next-generation sequencing technology, metagenomics, which is a culture-independent approach, provides a wealth of information on the whole microbial diversity of cave habitats as well as on novel genes and their functional properties (Rastogi and Sani, 2011; Mendoza et al., 2016; Jones et al., 2016; Katz et al., 2016). The literature shows that metagenomic mining of novel genes from cave habitats is still in its infancy, and most of the studies reported so far have relied on sequencing 16S rRNA gene clone libraries to profile microbial diversity. The gene encoding 16S rRNA contains nine hypervariable regions (V1-V9) interspersed with evolutionarily conserved nucleotide sequences. Individual regions or the complete 16S rRNA sequence have recently been used for discriminating and analysing the bacterial diversity of several caves. D'Auria et al. (2018) reported seven phyla (Proteobacteria, Firmicutes, Chloroflexi, Chlorobi, Bacteroidetes, Actinobacteria and Acidobacteria) dwelling in the Villa Luz caves, in the southern Mexican state of Tabasco, by reading bacterial 16S rRNA gene sequences spanning the V1-V3 hypervariable regions. Wiseschart et al. (2018) investigated the bacterial diversity and secondary metabolite potential of Manao-Pee Cave in Thailand by comparing 16S rRNA sequences. They showed that Actinobacteria strongly dominated Manao-Pee Cave soil and that the cave holds a promising wealth of microbial-derived bioactive compounds. In a different study, De Mandal et al. (2017) identified a bacterial community dominated by Actinobacteria and Proteobacteria in five unnamed caves in Northeast India by analysing amplicons of the V3 hypervariable region of the 16S rRNA gene.
Turkey has a great number of karstic caves compared to European countries. Because most of these caves have not yet been systematically studied from the molecular point of view, characterising the biodiversity of the microorganisms dwelling in them could represent an opportunity to develop biotechnological applications. Çal Cave in Trabzon, Turkey is one of the most important karstic caves. Until now, no microbial investigation of Çal Cave has been reported using either culture-dependent or culture-independent approaches. Therefore, in this study, we conducted a metagenomic analysis to explore the microbial diversity of Çal Cave for the first time and to assess its potential as a gene source for new enzymes and bioactive compounds.
In contrast to the cave studies mentioned above, the present study provides detailed taxonomic profiling not only of bacteria and archaea but also of fungi, algae, viruses and protozoa, because the sequencing approach does not depend on amplification of the taxonomic marker gene 16S rRNA, which targets only prokaryotes. In addition, complete next-generation sequencing of the metagenome extracted from the microbial community provides data on the relative abundance of microbial species and allows more reliable quantification of the microorganisms, since all the genomic information is assessed rather than a single common marker such as 16S rRNA. Sequencing the genomes of all microorganisms in the sample, instead of a specific marker locus, also yields bioinformatic data that can be mined in future studies.

Cave description and sample collection
Çal Cave is located in the Çal countryside at an altitude of 1154 m, 5 km southwest of Düzköy town, Trabzon, Turkey (40° 51′ 55.1592″ N, 39° 22′ 45.4368″ E). The entrance of the cave is about human height, and inside the cave the height reaches 25-30 m in some sections. The cave is estimated to be approximately 4 km long and is considered among the longest caves in Turkey. Çal Cave has an underground water channel with a small stream flowing through it. The average temperature of Çal Cave varies between 12 and 15 °C, and its humidity is high owing to the underground water (Zaman et al., 2011). For this study, cave soil samples were taken from the cave floor under aseptic conditions using sterile 50 ml Falcon tubes. The soil samples were obtained approximately 100-200 m from the cave entrance, at a depth of 0-5 cm, in the dark zone directly at the base of the cave wall. The sampling location was also chosen as a place protected from human contact. The samples were handled only with sterile stainless steel spoons and were not touched by ungloved hands. Samples were stored at 4 °C prior to DNA extraction.

DNA extraction for metagenomic analysis
Metagenomic DNA from the cave soil samples was extracted according to a modified version of the SDS-based method of Zhou et al. (2007). A 5 g soil sample was mixed with 13.5 ml of DNA extraction buffer (100 mM Tris-HCl [pH 8.0], 100 mM sodium EDTA [pH 8.0], 100 mM sodium phosphate [pH 8.0], 1.5 M NaCl, 1% CTAB) and 100 µl of proteinase K (10 mg/ml) in sterile 50 ml Falcon tubes by horizontal shaking at 225 rpm for 30 min at 37 °C. Following shaking, 1.5 ml of 20% SDS was added, and the sample was incubated at 65 °C for 2 h with gentle end-over-end inversions every 15 to 20 min. The supernatant was collected after centrifugation at 6,000 × g for 10 min at room temperature and transferred into 50 ml centrifuge tubes. The soil pellets were extracted two more times by adding 4.5 ml of the extraction buffer and 0.5 ml of 20% SDS, as before. Supernatants from the three extraction cycles were combined and mixed with an equal volume of chloroform:isoamyl alcohol (24:1, vol/vol). The aqueous phase was recovered by centrifugation and precipitated with 0.6 volume of isopropanol at room temperature for 1 h. The pellet of crude nucleic acids was obtained by centrifugation at 16,000 × g for 20 min at room temperature, washed with cold 70% ethanol, and resuspended in sterile deionised water to give a final volume of 50 µl. The integrity of the total DNA (5 µl) was analysed on a 1% agarose gel with SYBR® Safe DNA gel stain. The 260/280 ratio of the extracted metagenomic DNA was measured using a NanoDrop® ND-1000 Spectrophotometer (ThermoFisher Scientific Inc., Milan, Italy).

DNA sequence analysis
Total DNA concentration was adjusted to 200 ng µl⁻¹ and the sample was sequenced using Illumina HiSeq (2 × 150 bp) chemistry at GATC Biotech AG, Germany (INVIEW Metagenome Explore). Paired-end Illumina reads were merged using PEAR 0.9.6 (Paired-End reAd mergeR; Zhang et al., 2014). Low-quality base calls were removed before further bioinformatic processing: using a sliding-window approach, bases were trimmed from the 3' and 5' ends when the average Phred quality was below 15. Finally, after removal of host sequence reads, only intact mate pairs (forward and reverse reads) were used for the next analysis step. The most abundant sequence in each Operational Taxonomic Unit (OTU) was selected as the representative sequence and used for taxonomic assignment with the BLAST algorithm (Altschul et al., 1990).
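The end-trimming step described above can be illustrated with a minimal Python sketch. It is not the pipeline used by the sequencing provider: the window size of 4 bases, the Phred+33 encoding and the function names `phred_scores` and `trim_ends` are assumptions introduced here for illustration; only the mean-quality threshold of 15 comes from the text.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def trim_ends(seq, quals, window=4, min_mean=15):
    """Trim both read ends until a window with mean quality >= min_mean is reached.

    window=4 is a hypothetical choice; min_mean=15 is the threshold from the text.
    Returns (trimmed_sequence, trimmed_qualities); both are empty if nothing passes.
    """
    quals = list(quals)
    n = len(seq)
    if n < window:
        # Read shorter than one window: judge it as a single window.
        keep = n > 0 and sum(quals) / n >= min_mean
        return (seq, quals) if keep else ("", [])

    # 5' end: drop the leading base while the leading window is below threshold.
    start = 0
    while start + window <= n and sum(quals[start:start + window]) / window < min_mean:
        start += 1
    if start + window > n:
        return "", []  # no window anywhere in the read reached the threshold

    # 3' end: drop the trailing base while the trailing window is below threshold.
    end = n
    while end - window >= start and sum(quals[end - window:end]) / window < min_mean:
        end -= 1
    return seq[start:end], quals[start:end]


# Toy read: three Q2 bases ('#') on each side of five Q30 bases ('?').
read = "ACGTACGTACG"
qualities = phred_scores("###?????###")
trimmed_seq, trimmed_quals = trim_ends(read, qualities)
print(trimmed_seq, trimmed_quals)
```

Because the criterion is a window mean rather than a per-base cutoff, one or two low-quality bases can survive at each end when their neighbours score highly. Dedicated trimmers such as Trimmomatic (its `SLIDINGWINDOW` step) implement related but not identical rules.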

Taxonomic profiling
The total number of high-quality reads in our sample was 29,355,096 (99.0%) (Table 1). After host sequence reads had been screened out, the non-host reads were subjected to taxonomic profiling using Kraken (Wood and Salzberg, 2014) with the MiniKraken reference database. As a result, 8.39% (2,462,126) of the total reads (29,355,096) were classified into a specific kingdom and 91.61% (26,892,970) of the total reads were unclassified. The taxonomic profiling results produced by Kraken were used to generate interactive plots with Krona (Ondov et al., 2011) for intuitive exploration of the relative abundances and confidences within the complex hierarchies of metagenomic classifications.
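As a quick consistency check on the read counts, the classified and unclassified fractions can be recomputed from the total and unclassified counts. The sketch below assumes that the classified count equals the total minus the unclassified reads; the function name `classification_summary` is hypothetical.

```python
def classification_summary(total_reads, unclassified_reads):
    """Derive classified read count and percentages from the two reported counts."""
    classified = total_reads - unclassified_reads
    return {
        "classified": classified,
        "pct_classified": 100 * classified / total_reads,
        "pct_unclassified": 100 * unclassified_reads / total_reads,
    }


# Read counts reported for the Çal Cave soil sample.
summary = classification_summary(total_reads=29_355_096, unclassified_reads=26_892_970)
print(summary["classified"])
print(round(summary["pct_classified"], 2), round(summary["pct_unclassified"], 2))
```

Under that assumption the classified reads number 2,462,126, or about 8.39% of the total, with 91.61% unclassified.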
Taxonomic analysis revealed that the Çal Cave soil sample contained 98% Bacteria, 2% Eukaryota, 0.3% Archaea, and 0.01% Viruses. Figure 1 shows the taxonomic distribution of microbial diversity detected across all domains.
Actinobacteria, the predominant phylum in our sample, is found at relatively high abundance in cave habitats (De Mandal et al., 2017; D'Auria et al., 2018). Within Actinobacteria, the most prevalent taxa were Pseudonocardiaceae (26%), Propionibacteriales (17%), Corynebacteriales (14%), Streptomycetaceae (14%), Micrococcales (9%), Mycobacterium (6%) and Micromonosporaceae (5%). Among them, the most dominant were the genus Streptomyces (12%) (Figure 2) and the species Kribbella flavida (10%). Actinobacteria and the genus Streptomyces are well known as potential sources of bioactive compounds, including antibiotics, antimetabolites and antitumor agents (Ghosh et al., 2017). Yücel and Yamaç (2010) isolated Streptomyces species from soil samples of 19 different karstic caves in Turkey using classical culturing techniques and recorded the antimicrobial activity of the extracted compounds. In contrast, by using the metagenomic approach, we report here crucial information about the gene sources of not only the culturable but also the unculturable Actinobacteria species in Çal Cave.
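The percentages in this paragraph appear to be shares within Actinobacteria, whereas the Conclusion quotes genus-level shares of the whole microbial population. A short sketch shows how the two can be related by multiplying the nested shares. It assumes that the genus figures are expressed relative to Actinobacteria, that Actinobacteria make up 65% of Bacteria (the value given in the abstract), and that Bacteria make up 98% of the classified community; `share_of_whole` is a hypothetical helper.

```python
def share_of_whole(*nested_shares):
    """Multiply a chain of within-parent shares to get the share of the whole community."""
    result = 1.0
    for share in nested_shares:
        result *= share
    return result


BACTERIA_IN_COMMUNITY = 0.98       # domain-level share reported in the text
ACTINOBACTERIA_IN_BACTERIA = 0.65  # phylum-level share from the abstract

# Assumption: 12% and 10% are shares within Actinobacteria.
streptomyces_share = share_of_whole(BACTERIA_IN_COMMUNITY, ACTINOBACTERIA_IN_BACTERIA, 0.12)
kribbella_share = share_of_whole(BACTERIA_IN_COMMUNITY, ACTINOBACTERIA_IN_BACTERIA, 0.10)

print(f"Streptomyces: {100 * streptomyces_share:.1f}% of the classified community")
print(f"Kribbella:    {100 * kribbella_share:.1f}% of the classified community")
```

Under these assumptions the products are about 7.6% and 6.4%, which is in line with the rounded 8% and 6% quoted for Streptomyces and Kribbella in the Conclusion.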
In the bacterial population, Rhodococcus fascians, which is reported to have a role in the calcite biomineralisation process (Rusznyák et al., 2011), was found at a high ratio. In addition, we detected several important antibiotic-producing species: Kutzneria albida (1.5%), Saccharopolyspora erythraea (1.4%), Saccharothrix espanaensis (1.2%), and Amycolatopsis mediterranei (1.5%). There were also a number of chemoautotrophs, such as those involved in manganese oxidation (e.g., Geobacter), ammonium oxidation (Bacillus and Nitrospira) and sulfur oxidation (Paracoccus and Thiobacillus). These bacteria are linked with the formation of speleothems.

Diversity of eukaryota
Taxonomic analysis revealed that Fungi (44%) and Apicomplexa (30%) were the two most abundant groups of Eukaryota. Ascomycota (80%) was the most abundant phylum among the Fungi. Studies of the mycofloral diversity of cave environments are fewer than those focusing on bacterial diversity, and they have likewise reported Ascomycota as the most abundant phylum in cave samples (Ghosh et al., 2017). The remaining fungal diversity was classified as Basidiomycota (19%), Microsporidia (0.4%) and Chytridiomycetes (0.1%).
The reads corresponding to the Ascomycota phylum were distributed among five fungal taxa: Sordariomycetes (44%), Eurotiomycetes (22%), Dothideomycetes (14%), Leotiomycetes (12%), and Saccharomycetales (7%). The largest class, Sordariomycetes, is widespread in terrestrial and aquatic environments and includes pathogens of plants, arthropods and mammals. Several studies indicate that fungi from Sordariomycetes are industrially important gene sources. Ramakrishnan et al. (2018) recently reported two type III polyketide synthases from microorganisms belonging to Sordariomycetes (Sordaria macrospora and Chaetomium thermophilum) and showed that these enzymes have high catalytic efficiency for the synthesis of novel polyketide scaffolds with promising biological activity. The other notable fungal group is Eurotiales (class Eurotiomycetes), known as a producer of secondary metabolites. In particular, Aspergillus nidulans and Aspergillus terreus have been predicted to have approximately 50-70 biosynthetic gene clusters for polyketide synthases (PKS), non-ribosomal peptide synthetases (NRPS), terpene cyclases and prenyl transferases (Inglis et al., 2013).
The second most common group of Eukaryota in the sample, Apicomplexa (30%), are known as parasites of all marine and terrestrial vertebrates, including humans. Among them, the most abundant species was Hammondia hammondi (78% of Apicomplexa). Apicomplexans have significant research potential because it is estimated that only 0.1% of the 1-10 million Apicomplexan species have been described to date (Morin-Adeline et al., 2011).

Diversity of archaea
Of the Archaea, which account for 0.3% of the total microbial diversity in the sample, 94% were classified as Euryarchaeota. Within the Euryarchaeota phylum, the classes Halobacteria and Methanomicrobia accounted for 77% and 16%, respectively.
Halobacteria, which are extremely halophilic archaea, produce biologically active compounds in response to environmental changes (UV radiation, temperature anomalies, oxidative stress, lack of nutrients, oxygen availability or dehydration) and to interaction with other microorganisms. These biologically active compounds are of widespread interest in several biotechnology industries (Kalenov et al., 2018). Sorokin et al. (2018) reported that strains of extremely halophilic Euryarchaea use insoluble celluloses and cellobiose as their carbon and energy source.
Figure 2. Species diversity of Streptomyces: 36 species were identified as belonging to Streptomyces, which is the predominant genus in the Çal Cave soil sample. The plot was produced using Krona (Ondov et al., 2011).

Diversity of viruses
The most abundant group of viruses dwelling in Çal Cave was the order Caudovirales (40%), which comprises tailed viruses infecting bacteria and archaea (King et al., 2011). Herpesvirales accounted for up to 13% of all the viruses. All members of the order Herpesvirales detected belonged to the family Herpesviridae, which contains the mammal, bird and reptile viruses (Davison et al., 2009). Two other groups of viruses were Phycodnaviridae (13%), which infect algae, and Invertebrate Iridovirus (13%), which has probably been transmitted by endoparasitic wasps or parasitic nematodes (King et al., 2011). The taxa represented below 10% included Baculoviridae (3%) and Inovirus (3%). Rhodobacter Phage and Human endogenous retrovirus accounted for 4% of reads.

Conclusion
Each cave has a unique biological habitat and, consequently, a microbial diversity shaped by its own physical and chemical properties. Therefore, every cave offers its own source of industrially relevant molecules. This study presents, for the first time, a detailed identification of the microbial diversity of Çal Cave, including microorganisms from all domains, by virtue of the whole-genome sequencing metagenomic approach. Classified reads showed that the cave soil was dominated by Actinobacteria and Proteobacteria. Eukaryota, Archaea and Viruses were poorly represented in the cave habitat. Regardless of origin, the most frequently detected genera in the cave soil sample were Streptomyces, Kribbella, Mycobacterium and Amycolatopsis, which accounted for 8%, 6%, 4% and 4% of the whole microbial population, respectively. Besides the classified diversity, 91.61% of the total reads could not be assigned to any specific kingdom. This indicates that a vast microbial biodiversity exists in Çal Cave and, as it could not be identified with classical microbiology techniques, that this habitat provides a promising source of novel enzymes and bioactive compounds. Our future research will build on the characterisation of these potential genes and correlate them with their biological activity.