Exact Microsatellite Density Differences among Capsicum Tissues and Development Stages

Differences in the density and position of microsatellites in genomes may indicate that microsatellites play important roles in genetic development and regulation of gene expression. However, few or no studies have examined microsatellite density differences among tissues or development stages. In the present study, exact microsatellite densities and motifs among 7 different tissues and development stages were determined using Capsicum annuum L. expressed sequence tags (ESTs), which were reassembled into in silico libraries. Results indicated that densities of exact microsatellites (1 to 6 bp repeats) differed significantly between housekeeping and tissue specific ESTs of anther, flower bud and placenta, being low in comparison to those of leaf, fruit, early root and hairy root. Further analyses also indicated that the exact microsatellite density of anther and placenta was significantly lower, while that of flower bud, early root and hairy root was significantly higher. There were density differences in mono-, di-, tri- and hexa-nucleotides between housekeeping and tissue specific ESTs, whereas density differences in tetra- and penta-nucleotides were not statistically significant. Overall, the results of the present study indicated that, since microsatellite densities differed between housekeeping and tissue specific genes, genes containing microsatellites may differ among tissues and development stages.


Introduction
Microsatellites and minisatellites found in plant and animal genomes have traditionally been thought of as functionally unimportant, but they have been commonly used as genetic markers. Microsatellite DNA motifs consist of one to six bases, which are repeated several times. The repeats can be exact (perfect) tandem repeats, repeats interrupted by several non-repeat nucleotides (inexact or imperfect), or compound repeats (Bilgen et al 2004). Microsatellite repeat variations in plant species have been extensively used as markers of choice in genetic research since they exhibit a high level of polymorphism within species and are inherited in a codominant fashion, which discriminates homozygous from heterozygous individuals (Karaca et al 2002; Karaca et al 2004; Tyrka et al 2008; Ince et al 2009a; Ince et al 2010a).
Recent studies have shown that densities of microsatellites are considerably higher than would be predicted purely on the grounds of base composition in many organisms (Bilgen et al 2004; Ince et al 2009b; Polat et al 2010). Including or excluding mononucleotide repeats greatly affects the microsatellite density reported for a genome. For instance, the human genome contains approximately one million mononucleotide repeats that are longer than 9 bp (Cohen et al 2004). However, reported microsatellite densities remain difficult to compare across the literature, because some studies exclude mononucleotides and in other studies the repeat length limit is set as low as 5 bp or as high as 10 bp (Chambers & MacAvoy 2000; Ellegren 2004; Karaca & Ince 2011). Regardless of the definition of microsatellites, studies in animal genomes have shown that microsatellites play an active role in gene regulation, development and evolution (Li et al 2004; Kashi & King 2006). However, there is limited information about microsatellite density differences among plant tissues and development stages, as well as between tissue specific and housekeeping genes.
This study was undertaken to identify exact microsatellite density differences among Capsicum annuum L. tissues and development stages, as well as between genes specific to a tissue and genes with housekeeping functions. To investigate these density differences, in silico databases were constructed and used in this study.

ESTs
A total of 116,535 Capsicum annuum L. ESTs from the National Center for Biotechnology Information at http://www.ncbi.nlm.nih.gov/ containing 129,149,486 base pairs of nucleotide information were initially used. Keyword Finder and Organism Miner (Ince et al 2008) were used to obtain ESTs specific to each of the anther, hairy root, early root, leaf, young fruit, placenta and flower bud libraries. A total of 20,738 ESTs containing 9.93 megabases were selected from the database based on the library identification number (Lib ID) and assembled into contiguous sequences (contigs) using Sequencher software (Gene Codes, Ann Arbor, MI). Contig assembly parameters were set to a minimum overlap of 50 bases and a 95% identity match.

Microsatellite analyses
Microsatellites in each dataset were identified using the Tandem Repeats Analyzer 1.5 (TRA 1.5) program (Bilgen et al 2004). In the present study, microsatellites were defined as sequences containing a minimum of 18, 9, 7, 5, 5 and 4 perfect (exact) repeats for mono-, di-, tri-, tetra-, penta- and hexa-nucleotides, respectively. These repeat numbers were chosen since they are commonly used in other plant species (Karaca et al 2005; Li et al 2004; Lawson & Zhang 2008).
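The repeat thresholds above can be illustrated with a minimal sketch. This is not the TRA 1.5 program; it is a simplified regular-expression scan written for illustration, and the names `MIN_REPEATS`, `is_primitive` and `find_exact_microsatellites` are hypothetical. It assumes that a motif which is itself a repetition of a shorter unit (for example "ATAT") should be reported under the shorter motif only.

```python
import re

# Minimum number of exact repeats per motif length, as stated in the text:
# mono 18, di 9, tri 7, tetra 5, penta 5, hexa 4.
MIN_REPEATS = {1: 18, 2: 9, 3: 7, 4: 5, 5: 5, 6: 4}


def is_primitive(motif):
    """Return True if the motif is not a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_exact_microsatellites(seq):
    """Return sorted (start, motif, repeat_count) tuples for exact repeats.

    Simplified illustration: matches are non-overlapping within one motif
    length, so borderline repeats adjacent to another repeat may be missed.
    """
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One motif of length k followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Made-up sequence for illustration only (not a Capsicum EST).
example = "GGG" + "A" * 18 + "C" + "AG" * 9 + "G" + "CTT" * 7 + "GG"
print(find_exact_microsatellites(example))
```

Each tuple gives the 0-based start position, the repeat motif and the number of exact copies; a run shorter than the threshold for its motif length is not reported.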

Statistical analysis
Chi-square (χ²) goodness-of-fit tests with 1 degree of freedom were applied to test whether microsatellite densities were significantly different within and between the datasets mentioned above. The expected number of microsatellites was calculated as
$$E_i = N \times \frac{L_i}{L}$$
where $E_i$ is the expected number of microsatellites in a dataset; $N$ is the total number of microsatellites in the two different datasets; $L$ is the total length in base pairs of the two datasets; and $L_i$ is the length in base pairs of the dataset under investigation (Lawson & Zhang 2008).
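The test described above can be sketched as follows. The sketch assumes that the expected count for each dataset is the total microsatellite count scaled by that dataset's share of the total length, which is what the symbol definitions imply, and that the statistic is the standard sum of (observed - expected)² / expected. The function names and the example counts are hypothetical and are not data from this study.

```python
import math


def expected_counts(n_total, lengths):
    """Expected microsatellite count per dataset: N * L_i / L."""
    total_len = sum(lengths)
    return [n_total * li / total_len for li in lengths]


def chi_square_density_test(counts, lengths):
    """Chi-square goodness-of-fit test for two datasets (1 degree of freedom).

    counts  -- observed microsatellite counts in the two datasets
    lengths -- lengths of the two datasets in base pairs
    Returns (chi2, p_value).
    """
    n_total = sum(counts)
    expected = expected_counts(n_total, lengths)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical example: two datasets of equal length with 30 and 70 repeats.
chi2, p = chi_square_density_test([30, 70], [1_000_000, 1_000_000])
print(chi2, p)
```

With equal dataset lengths the expected counts are equal, so any imbalance in observed counts contributes directly to the statistic; with unequal lengths the longer dataset is expected to hold proportionally more microsatellites.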

Construction of in silico databases
All the EST sequences given in Table 1 were assembled into 22 in silico libraries (Figure 1). These in silico libraries consisted of singletons (S), consensus mutual (CM) and consensus specific (CS) sequences for each of the seven cDNA sets, and were used to investigate microsatellite density differences among the genes specific to tissues and development stages (Table 2). The classification of sequences in Table 2 was obtained from the analyses summarized in Figure 1. Tissue specific singletons (S) and contigs (CS) were defined as those ESTs that had no homolog among other ESTs. On the other hand, singletons and contigs with homology to other ESTs were considered non-tissue specific (CM). For instance, a total of 510 anther ESTs were divided into CS, CM and S. The anther specific sequences consisted of 20 Type I A0 consensus sequences (CS) and 355 Type I A0 singletons (S). CS and S of anther were present only in anther tissues, while anther CM consisted of Type II A consensus sequences (135) that were also present in some other tissues and development stages.
Tissue and development stage specific in silico libraries are shown in Figure 1. This scheme represents 22 in silico libraries (numbered 0 to 15). For instance, the anther in silico libraries contain ESTs specifically expressed in anther (indicated as A0), while numbers 2, 3, 5, 6, 7, 8, 9, 14 and 15 represent ESTs that are also expressed in other tissues or development stages. The A0 in silico anther library contained a total of 375 ESTs, which are the combination of the 20 anther CS and 355 anther S shown in Table 2. On the other hand, the remaining 135 ESTs were those collected from all possible combinations of the seven tissues or development stages.
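The S, CS and CM classification described above can be sketched in a few lines. This is a simplified illustration, not the procedure used with Sequencher: it assumes each assembled cluster is represented only by the tissue labels of its member ESTs, and it treats every unassembled EST as a singleton. The function names and the example clusters are hypothetical.

```python
from collections import defaultdict


def classify_cluster(member_tissues):
    """Classify one assembled cluster from the tissues of its member ESTs.

    'S'  -- singleton: a single EST that assembled with nothing else
    'CS' -- consensus specific: contig whose members all come from one tissue
    'CM' -- consensus mutual: contig with members from two or more tissues
    """
    if len(member_tissues) == 1:
        return "S"
    return "CS" if len(set(member_tissues)) == 1 else "CM"


def build_libraries(clusters):
    """Group cluster names by (set of tissues, class) into in silico libraries."""
    libraries = defaultdict(list)
    for name, member_tissues in clusters.items():
        key = (frozenset(member_tissues), classify_cluster(member_tissues))
        libraries[key].append(name)
    return dict(libraries)


# Hypothetical clusters for illustration only.
clusters = {
    "contig_1": ["anther", "anther"],
    "contig_2": ["anther", "leaf"],
    "est_1": ["anther"],
}
for (tissues, cls), names in build_libraries(clusters).items():
    print(sorted(tissues), cls, names)
```

Under these assumptions, the anther specific library (A0 in the text) would correspond to the union of the anther-only CS and S groups, while contigs shared with other tissues fall into CM groups keyed by their tissue combination.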

Microsatellite densities among tissues and development stages
As shown in Table 3, the exact microsatellite density of tissue specific sequences consisting of singletons and consensus sequences (TS + CS) for a tissue or development stage was compared with that of all sequences, which numbered 13,261 (Total). Exact microsatellite densities between leaf and other tissues and between fruit and other tissues were not statistically different. These findings indicated that microsatellite-containing ESTs in leaf and fruit are also expressed in other tissues. Flower bud, hairy root and early root ESTs had higher microsatellite densities, whereas anther and placenta ESTs had lower microsatellite densities (Table 3).

Exact microsatellite densities
Mononucleotide repeat densities were statistically different in all tissues except fruit and leaf. Di-nucleotide repeat density was significantly lower in flower bud, whereas it was significantly higher in leaf. There were no significant differences in tri- and hexa-nucleotide repeat densities among the tissues. Placenta had a higher tetra-nucleotide density but a lower mono-nucleotide density. The penta-nucleotide density of early root was significantly higher than that of the other tissues. Based on the in silico studies, we observed that mono-nucleotide densities were higher or lower in many tissues. Among all 6 microsatellite motif classes, mono-nucleotides differed the most, followed by di-, tetra- and penta-nucleotide repeats. On the other hand, tri- and hexa-nucleotide repeats were randomly distributed among all the tissues and development stages (Table 3).
To investigate whether exact microsatellite density differences existed between tissue specific (TS) and housekeeping (HS) genes or gene segments (ESTs), comparison analyses were performed and are shown in Table 4.
Results indicated that genes specifically expressed in anther, flower bud and placenta had a lower microsatellite density than expected, while the other tissues had the expected microsatellite densities. Among the microsatellite motifs, densities of mononucleotides differed significantly between tissue specific and housekeeping genes in anther and placenta ESTs (Table 4). Dinucleotide microsatellite density was significantly low in early root tissue specific ESTs, and trinucleotide microsatellite density was also low in anther tissue specific ESTs. Leaf specific ESTs contained more hexanucleotide microsatellites than housekeeping ESTs. Flower bud housekeeping ESTs contained more hexanucleotides than flower bud specific ESTs (Table 4). There were no statistically significant differences between tissue specific and housekeeping ESTs for tetra-nucleotides and penta-nucleotides.

Microsatellite densities between tissue specific and housekeeping ESTs
To date, limited research has addressed variation in microsatellite density among tissues, populations and species in plants. In a previous study using a total of 16 cDNA samples obtained from different pepper tissues at different developmental stages, it was observed that some types of microsatellite-containing genes showed differential expression patterns (Ince et al 2010b). In this study, the use of in silico databases clearly showed that some types of microsatellites were differentially expressed among tissues and that there were microsatellite density differences between tissue specific and housekeeping genes. Lawson & Zhang (2008), based on in silico analyses in mouse and human, indicated that microsatellite densities of housekeeping genes were about 1.7 times higher than those of tissue-specific genes and also showed that microsatellite domain contents were different between housekeeping and tissue-specific genes. In this study we observed that microsatellite density differences between housekeeping and tissue specific genes were also present in a plant species. Furthermore, this study also clearly showed the existence of microsatellite density differences among tissues and development stages.
Among the microsatellite motifs, tri- and hexa-nucleotide motifs in plants (Bilgen et al 2004) and in human (Karaca et al 2005) have been shown to occur more often than other repeat types, excluding mononucleotides. The more frequent occurrence of trinucleotides indicates that genes with trinucleotide repeats may play significant roles in the maintenance of cellular physiology. For example, expansion of CAG trinucleotide repeats is associated with Huntington's disease and spinocerebellar ataxia (SCA) in humans. In another example, changes in GCG repeat number cause oculopharyngeal muscular dystrophy in humans (Yamada et al 2002; Krol et al 2007; Pizzi et al 2007). These examples indicate that some diseases may occur as a result of trinucleotide repeat variations. In the present study it was observed that tri-nucleotide repeat densities were not statistically different among the tissues and development stages.
In this study, a total of 22 in silico libraries were constructed using publicly available cDNA libraries. Using these libraries, genic microsatellite and single nucleotide markers can be identified. ESTs have been used intensively to develop microsatellite primer pairs in Capsicum and other species (Ince et al 2010a; Polat et al 2010). However, EST-based microsatellites show a lower level of polymorphism than genomic microsatellites (Blair et al 2011). The level of polymorphism in EST-based microsatellites could be improved by the use of the CAPS-microsatellite technique (Ince et al 2010c).

Conclusion
In this study it was observed that microsatellite densities differed among tissues and development stages as well as between tissue-specific and housekeeping ESTs. Although the number of ESTs studied is relatively low to represent the whole Capsicum genome, this is the first study to investigate densities of microsatellite motifs distributed among 22 in silico libraries. Although the numbers of ESTs were low for some microsatellite motifs, these findings may indicate that densities of microsatellites were higher than would be predicted purely on the grounds of base composition. Tissue specific microsatellites with known function can be effectively used in genetic studies in plants. In the present study we also demonstrated that cDNA libraries could be reassembled to construct tissue and development stage specific in silico libraries, which could be used in gene identification and annotation studies as well as in the identification of single nucleotide polymorphisms.