INVESTIGATION OF THE DEPENDENCE STRUCTURE IN SEISMIC HAZARD ANALYSIS: AN APPLICATION FOR TURKEY
SERPIL ÜNAL

This study uses earthquake occurrence data (Richter magnitude 4 or greater, years 1901-2014) for two areas: North Anatolia, bounded by latitudes 39.5°-42° N and longitudes 26°-45° E, and West Anatolia, bounded by latitudes 36°-39.5° N and longitudes 26°-31° E. The aim is to model the dependence structure of the Semi-Markov model via conditional copulas, in which the copula is parametric and its parameter varies with a covariate. The approach rests on the assumption that successive earthquakes in the same structural discontinuity should not be independent events and that the occurrence of earthquakes should be influenced by the elapsed time between them. The results for these regions of high seismicity show that the strength of dependence between the time elapsed since the previous seismic event and the magnitude of the next seismic event varies significantly with the magnitude of the previous seismic event, and that a parametric linear form adequately characterizes the copula parameter.


Introduction
Turkey lies on the Alpine-Himalayan (Mediterranean) seismic belt, one of the important seismic belts of the world, and its tectonic framework shows abundant evidence of past and continuing mobility. The tectonic disequilibrium of Turkey is reflected by numerous active faults. The North Anatolian Fault (NAF) is the most important and active of these. The NAF was reactivated on 17 August ($M_w = 7.4$) and 12 November ($M_w = 7.2$) 1999 with two destructive earthquakes in the eastern Marmara region, as a result of the westward-migrating series of large earthquakes in the 20th century. West Anatolia is also among the significant mechanisms of active tectonics in Turkey. The area has been exposed to continuous earthquakes due to its highly complex tectonic setting, and it remains a region with a high potential for earthquakes in the future.
Nowadays it is accepted that it is impossible to know where or when earthquakes will occur, or what their magnitudes will be, and impossible to prevent these devastating natural events. However, statistical studies in geophysics, geology and earthquake engineering show that the parameters of possible earthquakes and the severity of the ground motions they create can only be estimated probabilistically. In other words, since earthquake parameters are random and there are various uncertainties (such as deficiencies in the earthquake records), probabilistic methods are seen as the most appropriate approach to seismic hazard estimation. The Semi-Markov model, one of the most commonly used probabilistic models, is based on the assumption that although the earthquakes are dependent on the space dimension, they are independent from the unit-time on the time dimension. According to the model, the magnitude of an earthquake depends on the magnitude of the previous earthquake and the time interval between them. This may indicate that a long period of seismic quiescence may end with an earthquake of large magnitude [6,7,8,33].
Long-term or short-term prediction of earthquakes has been a real challenge in science for decades, and several papers have appeared for and/or against the ability to do so. These papers vary with respect to the time horizon of their prediction, the magnitude range that can be predicted, the area covered and, of course, the method upon which the prediction is based. Parallel to this, there are also some theories that try to provide information towards the prediction challenge. One such theory, rather controversial, is the so-called 'seismic gap' theory, which relates the time elapsed from the previous seismic event to the magnitude of the next seismic event. According to this theory, there is a positive relation, which implies that an area without seismicity for a period of time has an increased probability of a major earthquake. If such a relationship exists, then by modeling these two parameters we can predict (in a statistical sense) the size of a future earthquake [27].
Whenever we are dealing with the issue of modeling dependencies among random variables, copula models come into play. Copulas are functions which link multivariate distribution functions to their one-dimensional marginal distribution functions. Their use in several scientific fields has a long history: in economics [13]; in survival analysis [14,15,20,29,34]; in finance [10,12]; in insurance [18,24]; and in geology [25,27]. Although copulas have been used in the applied statistical literature for many years, covariate adjustment for copulas has been considered only recently. In [31], Patton proposed the concept of the conditional copula, in which the correlation is affected by covariate(s). To our knowledge, conditional copulas have so far been applied only in finance [9,23,30,31,32] and in survival analysis [2,3,4,19].
In the present paper, based on the seismic gap theory, we aim to estimate the seismic hazard by modeling the dependence structure of the Semi-Markov model with a conditional copula, based on the assumption that successive earthquakes in the same structural discontinuity should not be independent events and that the occurrence of earthquakes should be influenced by the elapsed time between them.

Unconditional and conditional copulas
A copula is a function which joins or couples a multivariate distribution function to its one-dimensional marginal distribution functions. More formally, the following definition can be given.
The informal and formal definitions are connected by the following theorem, which also elucidates the role that copulas play in the relationship between multivariate distribution functions and their univariate margins.
Theorem 2. (Sklar's theorem) Let $H$ be a joint distribution function with margins $F_1$ and $F_2$. Then there exists a copula $C$ such that for all $x_1, x_2$ in $\mathbb{R}$,

$$H(x_1, x_2) = C\left(F_1(x_1), F_2(x_2)\right). \quad (1)$$

2.1. Estimation (Inference functions for margins method). The log-likelihood function is given by equation (2). In the first stage of the method, the parameters of the marginal distributions $F_i$ are estimated, and in the second stage, the copula parameters are estimated conditional on the previous estimates of the marginal distributions. In each stage the maximum likelihood method is used [30].
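As a rough illustration of the two-stage idea, the following Python sketch fits the margins first and the copula parameter second. The exponential margins, the FGM family, the grid search and all function names are assumptions chosen to keep the example short and dependency-free; they are not the margins or the estimator settings used in this study.

```python
import math
import random


def sample_fgm(theta, n, rng):
    """Draw n pairs (u, v) from an FGM copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        u, w = rng.random(), rng.random()
        a = theta * (1.0 - 2.0 * u)
        # Solve v + a*v*(1 - v) = w for v, using the numerically stable root.
        v = 2.0 * w / ((1.0 + a) + math.sqrt((1.0 + a) ** 2 - 4.0 * a * w))
        pairs.append((u, v))
    return pairs


def exp_cdf(x, rate):
    return 1.0 - math.exp(-rate * x)


def ifm_fit(xs, ys):
    # Stage 1: maximum likelihood for each exponential margin (rate = 1/mean).
    rate_x = len(xs) / sum(xs)
    rate_y = len(ys) / sum(ys)
    # Stage 2: plug in the fitted margins and maximise the copula log-likelihood.
    us = [exp_cdf(x, rate_x) for x in xs]
    vs = [exp_cdf(y, rate_y) for y in ys]

    def copula_loglik(theta):
        # FGM copula density: 1 + theta * (1 - 2u) * (1 - 2v).
        return sum(math.log(1.0 + theta * (1.0 - 2.0 * u) * (1.0 - 2.0 * v))
                   for u, v in zip(us, vs))

    grid = [k / 100.0 for k in range(-99, 100)]
    theta_hat = max(grid, key=copula_loglik)
    return rate_x, rate_y, theta_hat


# Synthetic data (hypothetical): FGM dependence with theta = 0.6,
# exponential margins with rates 2.0 and 0.5.
rng = random.Random(1)
pairs = sample_fgm(0.6, 4000, rng)
xs = [-math.log(1.0 - u) / 2.0 for u, _ in pairs]
ys = [-math.log(1.0 - v) / 0.5 for _, v in pairs]
rate_x, rate_y, theta_hat = ifm_fit(xs, ys)
print(rate_x, rate_y, theta_hat)
```

A grid search is used in the second stage only because the FGM parameter is one-dimensional and bounded; a numerical optimizer would be the usual choice for other families.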
2.2. Goodness of fit test for copulas. Given that we have followed the IFM approach to estimate the parameters of a set of copulas, if the number of estimated parameters is the same across all maximum likelihood functions and the same data are used in the estimation process for each model specification, an obvious criterion is to compare the maximized value of the log-likelihood function $l(\hat{\vartheta})$ [17].
2.3. Model selection. In [5], Allcroft and Glasbey proposed a bootstrap method for testing the hypotheses

$$H_0^{(k)}: \text{the } k\text{th copula is the correct model}, \quad k = 1, \dots, m. \quad (3)$$

Since the joint distributions of the log-likelihoods are approximately multivariate normal based on the central limit theorem, the Mahalanobis squared distance is appropriate for making the above comparisons. This distance between the vector of log-likelihoods $l$ at the original data and the vector of average log-likelihoods $\bar{l}_k$ at the data simulated from the $k$th copula is obtained as

$$D_k^2 = (l - \bar{l}_k)^{\top} S^{-1} (l - \bar{l}_k), \quad k = 1, \dots, m, \quad (4)$$

where $S$ is the sample covariance matrix of these log-likelihoods. Based on the normality of the log-likelihoods, the quantity $D_k^2$ follows an $mF_{m,B-1}$ distribution under the null hypothesis that the $k$th copula distribution is correct [27,28].
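The distance computation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the bootstrap log-likelihood vectors are made-up numbers, and the linear system is solved with a small Gaussian elimination so that the example needs nothing beyond the standard library.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def sample_covariance(rows):
    n, m = len(rows), len(rows[0])
    mu = mean_vector(rows)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(m)] for i in range(m)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (aug[i][m] - s) / aug[i][i]
    return x


def mahalanobis_sq(observed, simulated):
    """Squared Mahalanobis distance of `observed` from the simulated vectors."""
    mu = mean_vector(simulated)
    cov = sample_covariance(simulated)
    diff = [o - c for o, c in zip(observed, mu)]
    return sum(d * w for d, w in zip(diff, solve(cov, diff)))


# Hypothetical log-likelihood vectors for m = 2 candidate copulas:
# one vector from the original data, four from bootstrap samples.
simulated = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
observed = (3.0, 1.0)
d2 = mahalanobis_sq(observed, simulated)
print(d2)
```

In practice the number of bootstrap samples would be far larger than four; the tiny data set here only keeps the arithmetic easy to check by hand.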
When a covariate is added to the model of the dependence between two or more random variables, it is necessary to model conditional copulas whose copula parameter varies according to the values of the measured covariate.
with the following properties: 1) For all $u, v \in [0,1]$ and all $x \in \mathcal{X}$, where $\mathcal{X}$ is the support of $X$. Then, there exists a unique conditional copula $C(\cdot, \cdot \mid x)$, whenever $F_{1|X}(\cdot \mid x)$ and $F_{2|X}(\cdot \mid x)$ are continuous in $y_1$ and $y_2$, for all $x \in \mathcal{X}$, such that

$$H_X(y_1, y_2 \mid x) = C\left(F_{1|X}(y_1 \mid x), F_{2|X}(y_2 \mid x) \mid x\right). \quad (5)$$

Conversely, if we let $F_{1|X}(\cdot \mid x)$ be the conditional distribution of $Y_1 \mid X = x$, $F_{2|X}(\cdot \mid x)$ be the conditional distribution of $Y_2 \mid X = x$ and $\{C(\cdot, \cdot \mid x)\}$ be a family of conditional copulas measurable in $x$, then the function $H_X(\cdot, \cdot \mid x)$ defined in (5) is a conditional bivariate distribution function with conditional marginal distributions $F_{1|X}(\cdot \mid x)$ and $F_{2|X}(\cdot \mid x)$ [31,32].
For most copula families, the form of the calibration function $\eta(\cdot)$ characterizing the underlying dependence structure is difficult to discern by inspection. For a nonparametric approach to estimating this target function, see [1,2,3].
2.5. Choice of bandwidth (Leave-one-out cross-validation). The choice of the bandwidth parameter is an important issue in local estimation. A bandwidth that is too small yields an estimator with smaller bias but greater variance, and the calibration function is undersmoothed. A bandwidth that is too large yields smaller variance but larger bias, and an oversmoothed calibration function. For the choice of the optimum bandwidth, see [2,3].
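The bias-variance trade-off in the bandwidth can be seen in a small leave-one-out experiment. The sketch below deliberately swaps in a simpler setting: a local constant (Nadaraya-Watson) regression with an Epanechnikov kernel and a squared prediction error, rather than the local copula likelihood of [2,3]. The data, the candidate bandwidths and the function names are all hypothetical.

```python
import math


def epanechnikov(t):
    return 0.75 * (1.0 - t * t) if abs(t) < 1.0 else 0.0


def loo_cv_error(xs, ys, h):
    """Mean squared leave-one-out prediction error of a local constant fit."""
    total = 0.0
    for i, (x0, y0) in enumerate(zip(xs, ys)):
        num = den = 0.0
        for j, (x, y) in enumerate(zip(xs, ys)):
            if j == i:
                continue  # leave the i-th observation out
            w = epanechnikov((x - x0) / h)
            num += w * y
            den += w
        if den == 0.0:
            return math.inf  # bandwidth too small: no neighbours in the window
        total += (y0 - num / den) ** 2
    return total / len(xs)


def choose_bandwidth(xs, ys, candidates):
    return min(candidates, key=lambda h: loo_cv_error(xs, ys, h))


# Hypothetical data: a smooth curve observed on an even grid.
xs = [i / 49.0 for i in range(50)]
ys = [math.sin(2.0 * math.pi * x) for x in xs]
candidates = [0.01, 0.05, 0.1, 0.3, 1.0]
best_h = choose_bandwidth(xs, ys, candidates)
print(best_h)
```

The same loop structure carries over to the copula setting by replacing the squared error with the cross-validated criterion of the chosen method.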
2.6. Conditional copula selection. Suppose we have a (finite) set of candidate families $\mathcal{C} = \{C_q : q = 1, \dots, Q\}$ with copula parameter functions $\theta_q(\cdot)$, $q = 1, \dots, Q$, from which we want to choose the one that best represents the data at hand. Since the estimation depends on the bandwidth parameter $h$, these estimates are denoted by $\hat{\theta}_{hq}(X)$, $q = 1, \dots, Q$. Hence, each family provides a candidate model, and we face the task of choosing the family whose representative best fits the data. For the selection of the copula family, see [2,3].

2.7. Generalized likelihood ratio test for copula functions. Suppose that $\{(U_{11}, U_{21}, X_1), \dots, (U_{1n}, U_{2n}, X_n)\}$ is a random sample from the conditional copula model (6). The null hypothesis of interest restricts the space of calibration functions to a subspace $\mathfrak{f}$ that is fully specified parametrically, such as the set of all linear functions on $\mathcal{X}$. Then we are interested in testing

$$H_0: \eta(\cdot) \in \mathfrak{f} \quad \text{versus} \quad H_1: \eta(\cdot) \notin \mathfrak{f}. \quad (7)$$

Testing a parametric null hypothesis versus a nonparametric alternative hypothesis, e.g. (7), falls within the scope of the generalized likelihood ratio test (GLRT), whose asymptotic distribution was explored by Fan et al. in [16]. In this test, the log-likelihood function is evaluated both under the null hypothesis and under the alternative hypothesis. The difference between the two log-likelihoods allows us to evaluate the evidence in the data in favor of (or against) the null model. Hence, the GLRT statistic is given by this difference. Since nonparametric maximum likelihood estimators are difficult to obtain and may not even exist, Fan et al. [16] suggested using any reasonable nonparametric estimator under the alternative model. In particular, using a local polynomial estimator to specify the alternative model in a number of hypothesis testing problems, Fan et al. [16] showed that the null distribution of the GLRT statistic follows asymptotically a chi-square distribution with a number of degrees of freedom independent of the nuisance parameters. Namely, the degrees of freedom depend on the range $|\mathcal{X}|$ of the covariate $X$, the bandwidth and the kernel. For simplicity, for the Epanechnikov kernel, some values used in identifying the degrees of freedom are tabulated in [37].

Application to earthquake data
This study investigates the following two regions of Turkey, delimited by considering the earthquake zones map in the geographic information system, the seismic activity maps of Turkey and its vicinity in the Integrated Homogeneous Earthquake Catalog, and Turkey's fault lines (North Anatolia, Eastern Anatolia, Western Anatolia) [35,36]: Region 1, if latitude $\ge 39.5°$; Region 2, if latitude $< 39.5°$ and longitude $\le 31°$. Using the earthquake occurrence data of these regions (Richter magnitude 4 or greater, years 1901-2014), the aim is to model the dependence structure of the Semi-Markov model via conditional copulas in which the copula is parametric and its parameter varies with the covariate, based on the assumption that successive earthquakes in the same structural discontinuity should not be independent events and that the occurrence of earthquakes should be influenced by the elapsed time between them.
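The region rule and the construction of the successive-event variables can be sketched as follows. The outer limits of the boxes are taken from the abstract; treating the boundaries as inclusive, the record layout and every value in the small catalog are assumptions made only for illustration.

```python
def assign_region(latitude, longitude):
    """Return 1 or 2 for the two study regions, or None outside them."""
    if 39.5 <= latitude <= 42.0 and 26.0 <= longitude <= 45.0:
        return 1  # North Anatolia
    if 36.0 <= latitude < 39.5 and 26.0 <= longitude <= 31.0:
        return 2  # West Anatolia
    return None


def successive_pairs(events):
    """Turn time-sorted (time, magnitude) events into triples
    (previous magnitude, elapsed time, next magnitude)."""
    return [(m_prev, t_next - t_prev, m_next)
            for (t_prev, m_prev), (t_next, m_next) in zip(events, events[1:])]


# Hypothetical records: (time in years, latitude, longitude, magnitude).
catalog = [
    (1901.2, 40.7, 30.0, 5.1),
    (1903.9, 37.0, 27.4, 4.6),
    (1904.5, 40.1, 33.2, 4.0),
    (1905.0, 38.0, 40.0, 4.8),
    (1907.3, 40.9, 27.8, 3.5),
]

# Keep Region 1 events with magnitude 4 or greater, then pair successive events.
region1 = [(t, m) for t, lat, lon, m in catalog
           if m >= 4.0 and assign_region(lat, lon) == 1]
pairs1 = successive_pairs(region1)
print(pairs1)
```

Each triple corresponds to one observation of the previous magnitude, the elapsed time and the next magnitude, which are the three variables modeled in the rest of this section.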
In the modeling process with the conditional copula, firstly, the conditional marginals are estimated by the following equations:

$$U_1 = P\left(Y_{i-1} \le y \mid X_{t_{i-1}} = x\right), \quad (12)$$

$$U_2 = P\left(X_{t_i} \le x_1 \mid X_{t_{i-1}} = x\right), \quad (13)$$

where $X_{t_{i-1}}$, $X_{t_i}$ and $Y_{i-1} = t_i - t_{i-1}$ are respectively the earthquake magnitude at time $t_{i-1}$, the earthquake magnitude at time $t_i$ and the elapsed time between successive earthquakes. The first step in specifying the conditional marginal distributions given by equations (12) and (13) is to find the marginal distributions of the elapsed time between successive earthquakes and of the earthquake magnitude at time $t_i$. The distributions determined are reported in Table 3 (the marginal distributions of the elapsed time between successive earthquakes and the earthquake magnitudes).

The next and final step in specifying the conditional marginal distributions is the selection of the most appropriate copulas to describe the dependence between the variables $X_{t_{i-1}}$ and $Y_{i-1}$, and between $X_{t_{i-1}}$ and $X_{t_i}$. The copula parameter estimates, the log-likelihood values $l(\hat{\vartheta})$ and the simulation results for each region are reported in Table 4 (Region 1) and Table 5 (Region 2).

For Region 1, the goodness-of-fit measures hint towards the conclusion that the Tawn copula with $\hat{\theta} = 0.1750$ best represents the dependence structure between $X_{t_{i-1}}$ and $Y_{i-1}$ ($p = 0.2768$). Also, the Clayton copula with $\hat{\theta} = 0.8511$ best represents the dependence structure between $X_{t_{i-1}}$ and $X_{t_i}$ ($p = 0.0413$).

For Region 2, the goodness-of-fit measures hint towards the conclusion that the Gumbel copula with $\hat{\theta} = 1.0362$ best represents the dependence structure between $X_{t_{i-1}}$ and $Y_{i-1}$ ($p = 0.1296$). Also, the Frank copula with $\hat{\theta} = 2.5237$ best represents the dependence structure between $X_{t_{i-1}}$ and $X_{t_i}$ ($p = 0.0828$).

After estimating the conditional marginals, the calibration function must be estimated by the local copula-likelihood method in order to estimate the functional relationship between the copula parameter and the covariate. For this purpose, firstly, in the local constant estimation ($p = 0$), under the considered copula families, the optimum bandwidths for each region are chosen as reported in Table 6. For comparison, we also perform the global estimations for each region with constant forms, obtaining $\tilde{a}_{01} = 0.449375$ and $\tilde{a}_{02} = 1.164563$. Then, under the chosen copula, for each region, we perform the generalized likelihood ratio test to check whether the earthquake magnitude at time $t_{i-1}$ has a significant effect on the strength of dependence. The results for each region are as follows.

For Region 1, under the FGM copula, using the optimum bandwidth, we obtain the degrees of freedom of the chi-square distribution as 15.71. The test statistic 35.197 yields a p-value of 0.003735. Thus, we conclude that the effect of $X_{t_{i-1}}$, the earthquake magnitude at time $t_{i-1}$, on the strength of dependence between $X_{t_i}$, the earthquake magnitude at time $t_i$, and $Y_{i-1}$, the elapsed time between successive earthquakes, is statistically significant.
For Region 2, under the Galambos copula, using the optimum bandwidth, we obtain the degrees of freedom of the chi-square distribution as 7.139. The test statistic 27.6269 yields a p-value of 0.000257. Thus, we conclude that the effect of $X_{t_{i-1}}$, the earthquake magnitude at time $t_{i-1}$, on the strength of dependence between $X_{t_i}$, the earthquake magnitude at time $t_i$, and $Y_{i-1}$, the elapsed time between successive earthquakes, is statistically significant.
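The p-values above can be checked against the chi-square tail probability. The sketch below implements the regularized upper incomplete gamma function with the Python standard library only; the function names are illustrative. Rounding the degrees of freedom to the nearest integer (15.71 to 16 and 7.139 to 7) is an assumption about how the reported values were obtained, made because the tail probabilities then come out close to the reported 0.003735 and 0.000257.

```python
import math


def gammaincc(a, x):
    """Regularised upper incomplete gamma function Q(a, x), a > 0, x >= 0."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion of the lower function P(a, x); return 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(1000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz method) for Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefactor)


def chi2_sf(stat, df):
    """Upper tail probability of a chi-square variable with df degrees of freedom."""
    return gammaincc(df / 2.0, stat / 2.0)


# Test statistics and degrees of freedom as reported for the local constant case.
for label, stat, df in [("Region 1", 35.197, 15.71), ("Region 2", 27.6269, 7.139)]:
    print(label, chi2_sf(stat, df), chi2_sf(stat, round(df)))
```

The first printed value for each region uses the fractional degrees of freedom and the second uses the rounded value, so the two conventions can be compared directly.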
Since, for regions 1 and 2, we conclude that the effect of $X_{t_{i-1}}$ on the strength of dependence between $X_{t_i}$ and $Y_{i-1}$ is statistically significant, for these regions, under the chosen copulas, we perform the generalized likelihood ratio test to check whether the functional relationship between the covariate and the calibration function is linear when the calibration function is approximated by local linear estimates. In the local linear estimation ($p = 1$), the optimum bandwidths are chosen as reported in Table 7. Our copula selection method chooses the FGM family for Region 1, which has the minimum cross-validated prediction error with value 209.1614, and the Galambos family, with value 312.6570, for Region 2. For comparison, we also perform the parametric estimations for Region 1 and Region 2 with linear forms, obtaining $\tilde{a}_{01} = 0.666436$ and $\tilde{a}_{11} = -0.047099$; $\tilde{a}_{02} = -2.676213$ and $\tilde{a}_{12} = 0.325290$, respectively. According to the generalized likelihood ratio test, the results for Region 1 and Region 2 are as follows.

For Region 1, under the FGM copula, using the optimum bandwidth, we obtain the degrees of freedom of the chi-square distribution as 9.4331. The test statistic 20.31946 yields a p-value of 0.016043. Thus, we conclude that the linear effect of $X_{t_{i-1}}$, the earthquake magnitude at time $t_{i-1}$, on the strength of dependence between $X_{t_i}$, the earthquake magnitude at time $t_i$, and $Y_{i-1}$, the elapsed time between successive earthquakes, is statistically significant. So, for Region 1, the copula parameter is calculated to be $\hat{\theta}_1(x_{t_{i-1}}) = \sin(0.666436 - 0.047099\,x_{t_{i-1}})$.

For Region 2, under the Galambos copula, using the optimum bandwidth, we obtain the degrees of freedom of the chi-square distribution as 5.9442. The test statistic 16.81059 yields a p-value of 0.010004. Thus, we conclude that the linear effect of $X_{t_{i-1}}$, the earthquake magnitude at time $t_{i-1}$, on the strength of dependence between $X_{t_i}$, the earthquake magnitude at time $t_i$, and $Y_{i-1}$, the elapsed time between successive earthquakes, is statistically significant. So, for Region 2, the copula parameter is calculated to be $\hat{\theta}_2(x_{t_{i-1}}) = \exp(-2.676213 + 0.325290\,x_{t_{i-1}})$.

According to the results obtained for regions 1 and 2, since the effect of $X_{t_{i-1}}$ on the strength of dependence between $X_{t_i}$ and $Y_{i-1}$ is statistically significant, the dependence structure of the Semi-Markov model will be modeled with a conditional copula. For Region 1, the copula that best represents the dependence structure between $X_{t_{i-1}}$ and $X_{t_i}$ among the candidate families is the Clayton (unconditional) copula with $\hat{\theta} = 0.8511$, and $X_{t_i} \sim \text{Gen. Pareto}(k = 0.2371, \sigma = 0.7564, \mu = 3.9296)$, so the conditional marginal distribution can be written as equation (14), where $A = (F(x))^{-0.8511} + (F(x_1))^{-0.8511} - 1$. Additionally, for Region 1, the copula that best represents the dependence structure between $X_{t_{i-1}}$ and $Y_{i-1}$ is the Tawn (unconditional) copula with $\hat{\theta} = 0.175$, with $X_{t_i} \sim \text{Gen. Pareto}(k = 0.2371, \sigma = 0.7564, \mu = 3.9296)$ and $Y_{i-1} \sim \text{Weibull}(\alpha = 0.3910, \beta = 0.0396, \gamma = 0)$, so the conditional marginal distribution can be written as equation (15), where $B = \exp(\cdot)$. With the conditional marginal distributions given by equations (14) and (15), $P(Y_{i-1} \le y \mid X_{t_{i-1}} = x) = U_1$ and $P(X_{t_i} \le x_1 \mid X_{t_{i-1}} = x) = U_2$, and with the FGM (conditional) copula with $\hat{\theta}_1(x) = \sin(0.666436 - 0.047099x)$ best representing the dependence structure between $X_{t_i}$ and $Y_{i-1}$, the conditional joint distribution can be written as equation (16).

Similarly, for Region 2, the copula that best represents the dependence structure between $X_{t_{i-1}}$ and $X_{t_i}$ is the Frank (unconditional) copula with $\hat{\theta} = 2.5237$, and $X_{t_i} \sim \text{Gen. Pareto}(k = 0.1455, \sigma = 0.5741, \mu = 3.9454)$, so the conditional marginal distribution can be written as equation (17), where $N = \exp(-2.5237F(x))$ and $E = \exp(-2.5237F(x_1))$. Additionally, for Region 2, the copula that best represents the dependence structure between $X_{t_{i-1}}$ and $Y_{i-1}$ is the Gumbel (unconditional) copula with $\hat{\theta} = 1.0362$, with $X_{t_i} \sim \text{Gen. Pareto}(k = 0.1455, \sigma = 0.5741, \mu = 3.9454)$ and $Y_{i-1} \sim \text{Weibull}(\alpha = 0.4237, \beta = 0.0232, \gamma = 0)$, so the conditional marginal distribution can be written as equation (18), where $J = (-\ln(F(x)))^{1.0362} + (-\ln(G(y)))^{1.0362}$. With the conditional marginal distributions given by equations (17) and (18), $P(Y_{i-1} \le y \mid X_{t_{i-1}} = x) = U_1$ and $P(X_{t_i} \le x_1 \mid X_{t_{i-1}} = x) = U_2$, and with the Galambos (conditional) copula with $\hat{\theta}_2(x) = \exp(-2.676213 + 0.325290x)$ best representing the dependence structure between $X_{t_i}$ and $Y_{i-1}$, the conditional joint distribution can be written as equation (19), where $I$ is an exponential term in $-\ln(u_1)$.

As we have already mentioned, it is accepted that it is impossible to know where or when earthquakes will occur, or what their magnitudes will be. However, the equations obtained above show that the parameters of possible earthquakes and the severity of the ground motions they create can be estimated probabilistically.
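To make the role of the two conditional copulas concrete, the sketch below evaluates the standard FGM and Galambos copula distribution functions with a parameter that depends on the previous magnitude. The copula formulas are the textbook forms of these families, not equations copied from this paper; the coefficient signs follow the reading of the fitted calibration functions adopted here; the inputs `u1`, `u2` and the magnitude 5.0 are hypothetical.

```python
import math


def fgm_cdf(u1, u2, theta):
    """FGM copula C(u1, u2) = u1*u2*(1 + theta*(1 - u1)*(1 - u2)), |theta| <= 1."""
    return u1 * u2 * (1.0 + theta * (1.0 - u1) * (1.0 - u2))


def galambos_cdf(u1, u2, theta):
    """Galambos copula for 0 < u1, u2 < 1 and theta > 0."""
    s = (-math.log(u1)) ** (-theta) + (-math.log(u2)) ** (-theta)
    return u1 * u2 * math.exp(s ** (-1.0 / theta))


# Copula parameters at a previous magnitude of 5.0 (assumed input).
x_prev = 5.0
theta1 = math.sin(0.666436 - 0.047099 * x_prev)   # Region 1, FGM
theta2 = math.exp(-2.676213 + 0.325290 * x_prev)  # Region 2, Galambos

# u1, u2 stand for the conditional marginal probabilities U_1 and U_2.
u1, u2 = 0.5, 0.5
joint1 = fgm_cdf(u1, u2, theta1)
joint2 = galambos_cdf(u1, u2, theta2)
print(theta1, joint1)
print(theta2, joint2)
```

Both joint probabilities lie above the independence value `u1 * u2`, which reflects the positive dependence implied by the two fitted parameters at this magnitude.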

Result
Turkey is a country in which earthquake hazard is extremely high in terms of geological, historical and instrumental earthquake activity, since it lies on the Alpine-Himalayan (Mediterranean) seismic belt, one of the important seismic belts of the world. In the two destructive earthquakes of Marmara, August 17, 1999, and Düzce, November 12, 1999, thousands of people died, tens of thousands were wounded and hundreds of thousands of buildings were destroyed. Past experience indicates that we will face destructive earthquakes of this type in the future. For this reason, with the statistical analysis and predictions carried out in this study, we have tried to show that the casualties and damage resulting from earthquakes in Turkey, an earthquake zone, could be prevented to some extent.
In this study, on the basis of the seismic gap theory, the aim was to model the dependence structure of the Semi-Markov model via conditional copulas. According to the results obtained for regions 1 and 2, which have high seismicity, we conclude that the variation in the strength of dependence between the time elapsed from the previous seismic event and the magnitude of the next seismic event at different magnitudes of the previous seismic event is highly significant, and that the copula parameter is adequately characterized by a parametric linear form, namely the FGM (conditional) copula parameter $\hat{\theta}_1(x_{t_{i-1}}) = \sin(0.666436 - 0.047099\,x_{t_{i-1}})$ for Region 1 and the Galambos (conditional) copula parameter $\hat{\theta}_2(x_{t_{i-1}}) = \exp(-2.676213 + 0.325290\,x_{t_{i-1}})$ for Region 2. According to these results, when the 1999 Marmara earthquake of magnitude 7.4 that occurred in region 1 is considered, the probability of an earthquake with a magnitude greater than 7 in region 1 in the next 20 years is insignificant. Similarly, when the 2017 Bodrum earthquake of magnitude 6.5 that occurred recently in region 2 is considered, the probability of an earthquake with a magnitude greater than 7 in region 2 in the next 10 years is insignificant. Furthermore, these conclusions support the use of the Semi-Markov model in the previous studies [6,7,8] to obtain the earthquake occurrence probabilities in regions 1 and 2.
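How the strength of dependence changes with the previous magnitude can be tabulated directly from the two fitted forms. In the sketch below, the coefficient signs follow the reading of the calibration functions adopted here, the magnitude grid is arbitrary, and the conversion to Kendall's tau uses the standard FGM identity tau = 2*theta/9, which is a textbook fact rather than a result of this paper.

```python
import math


def theta_region1(x):
    """FGM parameter; the sine link keeps it inside [-1, 1]."""
    return math.sin(0.666436 - 0.047099 * x)


def theta_region2(x):
    """Galambos parameter; the exponential link keeps it positive."""
    return math.exp(-2.676213 + 0.325290 * x)


def kendall_tau_fgm(theta):
    """Kendall's tau of an FGM copula with parameter theta."""
    return 2.0 * theta / 9.0


magnitudes = [4.0, 5.0, 6.0, 7.0, 7.4]
table = [(m, theta_region1(m), kendall_tau_fgm(theta_region1(m)), theta_region2(m))
         for m in magnitudes]
for m, t1, tau1, t2 in table:
    print(f"M={m:.1f}  theta1={t1:.4f}  tau1={tau1:.4f}  theta2={t2:.4f}")
```

Over this magnitude range the Region 1 parameter decreases and the Region 2 parameter increases with the previous magnitude, which is the pattern the linear calibration forms encode.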

Figure 1. Separation of regions of Turkey.

If $F_1$ and $F_2$ are continuous, then $C$ is unique; otherwise, $C$ is uniquely determined on $\mathrm{Ran}F_1 \times \mathrm{Ran}F_2$. Conversely, if $C$ is a copula and $F_1$ and $F_2$ are distribution functions, then the function $H$ defined by (1) is a joint distribution function with margins $F_1$ and $F_2$ [26].

Table 1. Some one-parameter families of copulas

Table 6. The optimum bandwidths in the local constant estimation

In the local copula-likelihood method, another stage is the selection of the most appropriate conditional copula for each region under the optimum bandwidths. Our copula selection method chooses the FGM family for Region 1, which has the minimum cross-validated prediction error with value 209.4950, and the Galambos family, with value 312.5947, for Region 2.

Table 7. The optimum bandwidths in the local linear estimation