EVALUATION OF EMPIRICAL MODELLING TECHNIQUES FOR DETERMINING THE SEDIMENT AMOUNT IN RIVERS

Sediment transport processes in rivers have long been an important research topic. The amount of sediment transported by a river is strongly related to its flow and sediment concentration. This study aims to demonstrate this relationship and to calculate the sediment amount using four different modelling techniques: MLR, PLS, SVM, and ANN. Flow, sediment concentration, and sediment amount data for the Göksu River, located in the Eastern Mediterranean region of Turkey, are used as input data for the models. The aim of this study is to evaluate the effectiveness of the ANN modelling technique in predicting the amount of sediment transported by river flow. Fifty percent of the data are used as the training set for model development, and the remaining data are used as the test set for model validation. The performance of the four models is evaluated using the determination coefficient of the test set ($r^2_{pred}$). The results show that ANN is the most effective method ($r^2_{pred}$ = 0.94), followed by SVM ($r^2_{pred}$ = 0.72). MLR and PLS are the least effective methods for determining the sediment amount in the Göksu River ($r^2_{pred}$ = 0.67). Therefore, different ANN configurations are studied to identify the most effective method for predicting the sediment amount in the river.


INTRODUCTION
The sediment transport of streams is a complex phenomenon that has been the subject of research for many years because of its importance in planning the management of water resources. Process-based numerical models based on the relation between sediment concentration values and streamflow data have been widely used to predict sediment amount (Engelund and Fredsoe, 1976; Dietrich et al., 1999; Nelson et al., 2006; Jarritt and Lawrence, 2007; Kettner and Syvitski, 2008). However, a river system is a complex network with various physical and morphologic dynamics, so modelling such systems requires detailed spatial and temporal data. For this reason, a simpler, user-friendly approach is preferable for modelling sediment transport in rivers.
Empirical modelling is an alternative method for estimating the sediment amount in rivers, in which regression techniques are used to fit the measured data. Such methods make it easier to control the data inputs and to identify irrelevant variables, and they provide a flexible approach for producing reasonable solutions from small data sets (Abrahart and White, 2001). Different regression models have been studied in the literature for modelling sediment transport in rivers. For example, Sinnakaudan et al. (2006) developed a model to estimate the total bed material load for rivers in Malaysia using Multiple Linear Regression (MLR) analysis. Shi et al. (2013) used Partial Least Squares (PLS) regression to explore the relationship between landscape characteristics and sediment amount. Kisi (2012) investigated the ability of the Least Squares Support Vector Machine (LSSVM) to model the discharge-suspended sediment relationship.
Artificial Neural Network (ANN) is an alternative data-driven modelling approach that has been widely applied in a variety of areas, especially in recent decades. Recent studies show that ANN has become an effective methodological approach for modelling sediment transport. Abrahart and White (2001) compared ANN and MLR techniques using small data sets and suggested that ANN was able to overcome the limitations of the MLR method. Tayfur (2002) modelled sheet sediment transport using ANN and tested its performance against the most commonly used physically-based models, whose transport capacities were based on flow velocity, shear stress, stream power, and unit stream power. The results revealed that ANN performed as well as the physically-based models in simulating nonsteady-state sediment loads from different slopes. Yitian and Gu (2003) applied ANN to model daily and annual sediment discharges in the Yangtze River and Dongting Lake, China. The comparison of predicted and observed data demonstrated that ANN was a powerful tool for real-time prediction of flow and sediment transport in a complex river network. Arı Güner et al. (2013) applied the ANN method to model longshore sediment transport (LST) in Karaburun, Turkey, and evaluated the accuracy of the ANN predictions against measured values. They also compared ANN with two well-known empirical formulas (CERC and Kamphuis) and a numerical model (LITPACK). According to their results, ANN ranked second only to the most successful method, the Kamphuis formula, in estimating LST rates and provided a practical and accurate determination of the LST rate for most regions.
This paper aims to develop four different regression models (MLR, PLS, SVM, and ANN) and to test their performance in estimating the sediment amount in the Göksu River. In addition, the effect of different ANN network topologies is studied, and the best configuration for predicting river sediment amount is identified. The goal is to propose an effective and simple regression model that can provide a reliable alternative to more complicated process-based models for estimating the sediment amount in the study area.

Data Requirements
The classical and most commonly used method for estimating sediment amount is based on the relation between measured suspended sediment concentration and measured water discharge, which can be represented by the following formula:

$$Q_s = k \, Q_w \, C_s \qquad (1)$$

where $Q_s$ is the sediment amount (ton/day), $Q_w$ is the flow rate (m³/s), $C_s$ is the sediment concentration (ppm), and $k$ is a coefficient.
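As a worked illustration of the rating relation above, the sketch below assumes that a concentration in ppm can be read as mg/L (a common approximation for dilute suspensions). Under that assumption the unit conversion alone fixes $k = 86\,400 \times 10^{-6} = 0.0864$ when $Q_w$ is in m³/s and $Q_s$ is in ton/day. The function name `sediment_amount` and the input values are hypothetical, not taken from the EIE records.

```python
# Sediment amount from the rating relation Q_s = k * Q_w * C_s.
# Assumption: C_s in ppm is treated as mg/L (dilute suspension), so
# 1 m^3/s * 1 mg/L = 1 g/s = 86 400 g/day = 0.0864 ton/day.
SECONDS_PER_DAY = 86_400
K_TON_PER_DAY = SECONDS_PER_DAY * 1e-6  # about 0.0864


def sediment_amount(q_w: float, c_s: float, k: float = K_TON_PER_DAY) -> float:
    """Return Q_s (ton/day) for flow q_w (m^3/s) and concentration c_s (ppm)."""
    if q_w < 0 or c_s < 0:
        raise ValueError("flow and concentration must be non-negative")
    return k * q_w * c_s


# Hypothetical example: 50 m^3/s carrying 400 ppm of suspended sediment
print(round(sediment_amount(50.0, 400.0), 2))  # 1728.0
```

Keeping `k` as a parameter leaves room for other unit conventions, in which case the default no longer applies.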
The data for the Göksu River, including river flow, sediment concentration, and sediment amount, were obtained from the Turkish General Directorate of Electrical Power Resources Survey and Development Administration (EIE). A total of 493 records of daily flow and monthly sediment concentration between 1999 and 2010 are entered into the regression models as independent variables, while the monthly sediment amount is used as the dependent variable.

Regression Models
Molegro Data Modeller (MDM) software is used to estimate the sediment amount by applying four different regression models: MLR, PLS, SVM, and ANN. In addition, three different ANN network topologies are assessed to determine the best configuration for predicting the river's sediment amount.
The MLR model assumes that the dependent variable $y$ is a linear function of the independent variables $x_i$, which can be written as:

$$y = c_0 + \sum_{i=1}^{n} c_i x_i \qquad (2)$$

where the $c_i$ are the regression coefficients of the linear model (MDM User Manual, 2013).
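A minimal sketch of fitting this linear model by ordinary least squares, in plain Python so that it has no dependencies. It solves the normal equations $(X^\top X)\,c = X^\top y$ by Gauss-Jordan elimination; this is a textbook approach and not necessarily how MDM computes the coefficients. The helper names and toy data are hypothetical: the targets are generated from $y = 2 + 3x_1 + 0.5x_2$, so the fit should recover those coefficients.

```python
# Minimal MLR sketch (hypothetical helper names): ordinary least squares for
# y = c0 + c1*x1 + ... + cn*xn via the normal equations (X^T X) c = X^T y.

def fit_mlr(rows, y):
    """Return [c0, c1, ..., cn] for predictor rows (sequences of x_i) and targets y."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    n, p = len(X), len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y]
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict_mlr(coefs, row):
    """Evaluate the fitted linear model for one predictor row."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


# Hypothetical noise-free data generated from y = 2 + 3*x1 + 0.5*x2
rows = [(1, 10), (2, 4), (3, 7), (4, 1), (5, 9)]
y = [2 + 3 * a + 0.5 * b for a, b in rows]
coefs = fit_mlr(rows, y)
print([round(c, 6) for c in coefs])  # [2.0, 3.0, 0.5]
```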
In PLS, a smaller set of factors called latent components is extracted from the set of available descriptors (independent variables $x_i$), which models the dependent variable $y$. PLS regression creates latent components from the independent variables $x_i$ while taking the dependent variable $y$ into account (MDM User Manual, 2013).
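To make the idea of latent components concrete, here is a sketch of the NIPALS algorithm for PLS regression with a single response (PLS1), in plain Python. Each weight vector is proportional to $X^\top y$ of the (deflated) centred data, which is how the dependent variable is taken into account when the components are built. This generic textbook formulation is assumed for illustration and is not MDM's documented implementation; the function names and data are hypothetical.

```python
# Minimal PLS1 sketch (NIPALS) for a single dependent variable y.
# Hypothetical helper names; not MDM's implementation.

def _mean(v):
    return sum(v) / len(v)


def fit_pls1(rows, y, n_components=1):
    n, m = len(rows), len(rows[0])
    x_mean = [_mean([r[j] for r in rows]) for j in range(m)]
    y_mean = _mean(y)
    X = [[r[j] - x_mean[j] for j in range(m)] for r in rows]  # centred descriptors
    yc = [v - y_mean for v in y]                               # centred response
    comps = []
    for _ in range(n_components):
        w = [sum(X[i][j] * yc[i] for i in range(n)) for j in range(m)]  # X^T y
        norm = sum(v * v for v in w) ** 0.5
        if norm == 0:
            break                                  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(X[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # loadings
        q = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next latent component
        X = [[X[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def predict_pls1(model, row):
    """Project a new row onto each component in turn, deflating as in training."""
    x_mean, y_mean, comps = model
    x = [a - b for a, b in zip(row, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(a * b for a, b in zip(x, w))
        y_hat += q * t
        x = [a - t * b for a, b in zip(x, p)]
    return y_hat


# Hypothetical noise-free data generated from y = 2 + 3*x1 + 0.5*x2
rows = [(1, 10), (2, 4), (3, 7), (4, 1), (5, 9)]
y = [2 + 3 * a + 0.5 * b for a, b in rows]
model = fit_pls1(rows, y, n_components=2)
print(round(predict_pls1(model, (6, 2)), 6))  # 21.0
```

With as many components as linearly independent descriptors, PLS reproduces the ordinary least-squares fit, so on this noise-free data the prediction for $(6, 2)$ matches the generating relation. The practical benefit of PLS appears when fewer components than descriptors are kept.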
SVM was originally formulated for linear classification. To illustrate the idea, MDM considers objects of different types positioned on a 2D plane and a classifier capable of predicting the type of an object from its position in the plane. If the data are linearly separable, there are several possible lines that divide the plane into regions according to object class. Support vector machines try to find the maximum-margin separating hyperplane, which in 2D corresponds to the line with the widest margin between the classes (MDM User Manual, 2013). The same maximum-margin principle is extended to regression, which is how SVM is used in this study.
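The maximum-margin idea can be sketched with a linear soft-margin SVM trained by full-batch sub-gradient descent on the hinge loss. Minimising $\lVert w \rVert$ widens the margin $2/\lVert w \rVert$, which is the widest-border criterion described above. The solver, learning rate, regularisation weight `lam`, and toy points are assumptions for illustration; MDM's SVM solver and its regression variant are not reproduced here.

```python
# Linear soft-margin SVM sketch: minimise
#   (lam / 2) * ||w||^2 + (1 / n) * sum_i max(0, 1 - y_i * (w . x_i + b))
# by full-batch sub-gradient descent (hypothetical settings, not MDM's solver).

def decision(w, b, x):
    """Signed score w . x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=2000):
    w = [0.0] * len(points[0])
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]      # gradient of the regulariser
        grad_b = 0.0                         # bias is not regularised
        for x, y in zip(points, labels):
            if y * decision(w, b, x) < 1:    # point violates the margin
                grad_w = [g - y * xi / n for g, xi in zip(grad_w, x)]
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def classify(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1


# Hypothetical, linearly separable 2D points
pos = [(2, 2), (3, 3), (2, 3), (3, 2)]
neg = [(-2, -2), (-3, -3), (-2, -3), (-3, -2)]
w, b = train_linear_svm(pos + neg, [1] * len(pos) + [-1] * len(neg))
width = 2 / sum(wi * wi for wi in w) ** 0.5  # margin width 2 / ||w||
print([classify(w, b, p) for p in pos + neg])  # [1, 1, 1, 1, -1, -1, -1, -1]
print(round(width, 2))
```

For these symmetric toy points the widest separating line is $x_1 + x_2 = 0$, with $(2, 2)$ and $(-2, -2)$ on the margin, so the computed width should come out close to $4\sqrt{2} \approx 5.66$.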
ANN consists of input, hidden, and output neurons arranged in layers. The neural network is constructed by assigning each independent variable to a neuron in the input layer. Each input is connected to a number of neurons, which constitute the hidden layer (Van Maanen et al., 2010). The network is first trained, whereby the error between the target and computed values in each output neuron is minimized by adjusting the weights and biases through a training algorithm. During training, each connection multiplies the output of a neuron by a weight before it enters the connected neuron. The combination of the weighted inputs can be expressed as (Tayfur, 2002):

$$net_j = \sum_{i} w_{ij} \, x_i + b_j$$

where $net_j$ is the summation of the weighted inputs for the $j$th neuron, $x_i$ is the input from the $i$th neuron to the $j$th neuron, $w_{ij}$ is the weight from the $i$th neuron in the previous layer to the $j$th neuron in the current layer, and $b_j$ is the threshold value, also called the bias, associated with node $j$. The sigmoid function is used as the activation function; it determines how strongly a neuron is activated and produces the output that is passed to the neurons of the next layer as input. The sigmoid function is given as (Tayfur, 2002):

$$f(net_j) = \frac{1}{1 + e^{-net_j}}$$

In this study, flow rate and sediment concentration are entered into the ANN model as the input layer, and the hidden layer is connected to the output layer, which is trained to estimate the dependent variable: sediment amount. The number of hidden layers and the number of neurons in each hidden layer are adjusted by testing the different network configurations given in Figure 1.
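The weighted sum and sigmoid activation described above can be sketched as a tiny feed-forward network in plain Python, trained by standard back-propagation on a squared-error loss. The layout (2 inputs, 3 hidden neurons, 1 output), the initial weight range, the learning rate, and the toy data (normalised flow and concentration with their product as target) are all assumptions for illustration; they are not the MDM settings used in this study.

```python
import math
import random


def sigmoid(net):
    """Logistic activation f(net) = 1 / (1 + exp(-net))."""
    return 1.0 / (1.0 + math.exp(-net))


class TinyANN:
    """Hypothetical 2-3-1 network: inputs -> one sigmoid hidden layer -> sigmoid output."""

    def __init__(self, n_in=2, n_hidden=3, seed=0):
        rng = random.Random(seed)
        # Assumed initial weight range [-0.5, 0.5]; w_hidden[j][i] links input i to hidden neuron j
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b_hidden = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_out = rng.uniform(-0.5, 0.5)

    def forward(self, x):
        # net_j = sum_i w_ij * x_i + b_j, passed through the sigmoid for each hidden neuron
        hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                  for ws, b in zip(self.w_hidden, self.b_hidden)]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        """One back-propagation update that reduces 0.5 * (target - output)^2."""
        hidden, out = self.forward(x)
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas use the output weights before they are updated
        delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(self.w_out, hidden)]
        self.w_out = [w - lr * delta_out * h for w, h in zip(self.w_out, hidden)]
        self.b_out -= lr * delta_out
        for j, d in enumerate(delta_hidden):
            self.w_hidden[j] = [w - lr * d * xi for w, xi in zip(self.w_hidden[j], x)]
            self.b_hidden[j] -= lr * d

    def mse(self, data):
        return sum((t - self.forward(x)[1]) ** 2 for x, t in data) / len(data)


# Hypothetical normalised samples: (flow, concentration) -> flow * concentration
data = [((q / 4, c / 4), (q / 4) * (c / 4)) for q in range(5) for c in range(5)]
net = TinyANN()
before = net.mse(data)
for _ in range(1000):
    for x, t in data:
        net.train_step(x, t)
after = net.mse(data)
print(after < before)  # True
```

Inputs and targets are scaled to [0, 1] because a sigmoid output neuron cannot produce values outside that range.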

RESULTS AND DISCUSSION
MLR, PLS, SVM, and ANN analyses are applied to investigate the relationship between the dependent variable and the independent variables (descriptors) and to predict the sediment amount in the Göksu River. Given the available field data, the models are validated by comparing predicted and observed sediment amounts. For all regression models, MDM divides the existing database into two subsets: 50% of the data are used for training, and the other 50% are used for prediction and validation. The same training and prediction sets are used to generate all models. The regression results for MLR, PLS, and SVM are illustrated in Figures 2a, 2b, and 2c, respectively. The SVM outcomes fit the observed values better, whereas more outliers are observed in the MLR and PLS results.
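The 50/50 split and the comparison metric can be sketched as follows. How MDM assigns records to the two subsets is not stated, so a seeded random shuffle is assumed here. Likewise, $r^2_{pred}$ is computed as the squared Pearson correlation between observed and predicted values on the prediction set, which is one common definition; other works use one minus the ratio of the prediction error sum of squares to the total sum of squares. The function names are hypothetical.

```python
import random


def split_half(records, seed=42):
    """Shuffle and split records 50/50 into training and prediction sets (assumed random split)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def r2_pred(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


train, test = split_half(range(493))
print(len(train), len(test))  # 246 247
print(r2_pred([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

Because the squared correlation ignores systematic bias (predictions that are exactly twice the observations still score 1.0), it is worth reporting alongside error-based measures.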
Outliers are observations that have large residual values; they may originate from errors or from initially accepting marginal or unacceptable data (Sinnakaudan et al., 2006). Parameter settings for SVM are given in Table 1. The same procedure is followed to develop the ANN model, using the configuration given in Figure 1a to predict the sediment amount in the Göksu River. The determination coefficient of the prediction set ($r^2_{pred}$) is used to compare the performance of the four models and to select the best method; the model with the maximum $r^2_{pred}$ value is selected for further analysis. The model results reveal that ANN is the most effective method for estimating the sediment amount in the Göksu River. Previous studies have also shown that ANN is a powerful tool for predicting flow and sediment transport in river systems and can overcome the limitations of other regression methods and physically-based models. Two further network topologies are applied to determine the best configuration: one includes a second hidden layer with two neurons, and the other a second hidden layer with four neurons. Initial weight range values between 0.2 and 0.8 are tested in the ANN model. The best regression outcome is obtained for a weight value of 0.5 ($r^2_{pred}$ = 0.94), so this value is retained for all ANN configurations. Parameter settings of the models and the outcomes are given in Table 2 and Figure 3, respectively. Overall statistics of the four models are given in Table 3. The results show that increasing the number of neurons in the second hidden layer does not have a significant influence on the regression outcomes.
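Outliers in the sense described above can be flagged from the residuals of any of the fitted models. The sketch below standardises the residuals and flags those lying more than three standard deviations from their mean; the three-sigma threshold, the function name, and the data are assumptions, since the study does not state a numerical criterion.

```python
import statistics


def flag_outliers(observed, predicted, threshold=3.0):
    """Indices whose standardised residual |(r - mean) / sd| exceeds `threshold` (assumed 3)."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean = statistics.mean(residuals)
    sd = statistics.stdev(residuals)
    if sd == 0:
        return []  # identical residuals: nothing stands out
    return [i for i, r in enumerate(residuals) if abs(r - mean) / sd > threshold]


# Hypothetical observed/predicted sediment amounts with one badly predicted record
observed = [100.0 + 5 * i for i in range(20)]
predicted = list(observed)
predicted[7] += 40.0
print(flag_outliers(observed, predicted))  # [7]
```

With very few records a three-sigma rule cannot fire at all (a single deviating point in a sample of ten reaches at most about 2.85 standard deviations), so the threshold should be chosen with the sample size in mind.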

Figure 3: Predicted vs. observed sediment amounts for (a) ANN (3-0), (b) ANN (3-2), and (c) ANN (3-4) models
In addition to $r^2_{pred}$, Spearman's rank correlation coefficient (rho) and the Nash-Sutcliffe efficiency coefficient (NS) are also calculated. NS is defined as one minus the sum of the squared differences between the predicted and observed values, normalized by the variance of the observed values during the period under investigation (Krause et al., 2005), i.e. $NS = 1 - \sum_i (O_i - P_i)^2 / \sum_i (O_i - \bar{O})^2$, where $O_i$ and $P_i$ are the observed and predicted values and $\bar{O}$ is the mean of the observed values. According to the overall statistics given in Table 3, ANN (3-0) can be suggested as the most reliable model among the four regression techniques and the different ANN configurations.
It is important to define the applicability domain of the proposed models for future applications on different data scales. The applicability domain is the structural space, knowledge, or information on which the training set of the model has been developed and within which the model is applicable for making predictions for new data points (Roy et al., 2015). The model results reveal that 92% of the predicted values of ANN (3-0) fall within the applicability domain of the proposed model.
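One simple way to operationalise an applicability domain is to standardise each descriptor with the training-set mean and standard deviation and to accept a new point only if every standardised value lies within a fixed limit. The sketch below uses that rule with an assumed limit of 3; it is an illustration, not necessarily the criterion behind the 92% figure reported here, and the function names and data are hypothetical.

```python
import statistics


def fit_domain(train_rows):
    """Per-descriptor mean and sample standard deviation of the training set."""
    cols = list(zip(*train_rows))
    return [(statistics.mean(c), statistics.stdev(c)) for c in cols]


def in_domain(domain, row, limit=3.0):
    """True if every standardised descriptor |(x - mean) / sd| is within the limit."""
    return all(abs(x - m) / s <= limit for x, (m, s) in zip(row, domain))


def fraction_in_domain(domain, rows, limit=3.0):
    """Share of rows that fall inside the applicability domain."""
    return sum(in_domain(domain, r, limit) for r in rows) / len(rows)


# Hypothetical training descriptors: (flow, concentration)
train = [(q, c) for q in (10, 20, 30, 40, 50) for c in (100, 200, 300)]
domain = fit_domain(train)
new_points = [(35, 250), (200, 250)]  # the second flow lies far outside the training range
print(fraction_in_domain(domain, new_points))  # 0.5
```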

CONCLUSION
The aim of the present study is to model the sediment amount in the Göksu River with different black-box models, using water discharge and sediment concentration as input data. For this purpose, four regression techniques (MLR, PLS, SVM, and ANN) are applied to develop the models, and their performance is evaluated using the determination coefficient of the prediction set ($r^2_{pred}$). The ANN model gives the most reliable predictions among the regression models tested, with an $r^2_{pred}$ value of 0.94, followed by SVM ($r^2_{pred}$ = 0.72). MLR and PLS are the least effective techniques ($r^2_{pred}$ = 0.67) for estimating the sediment amount in the Göksu River. Further analysis of the ANN method is carried out for different configurations: ANN (3-0), ANN (3-2), and ANN (3-4). According to the $r^2_{pred}$ values given in Table 2, increasing the number of neurons in the second hidden layer does not have a significant influence on model outcomes.
Widely used process-based models rely on the relationship between water discharge and sediment concentration, as well as on the topographical and geomorphologic properties of rivers. However, the spatial heterogeneity of river systems limits the available field data and prevents an accurate and reliable estimation of the sediment amount. For this reason, simpler approaches have been investigated in the literature for modelling sediment transport in rivers. This paper focuses on four empirical models that provide quick simulations with minimal data requirements. The ANN (3-0) model may be used as an effective alternative to process-based models for estimating the sediment amount in rivers.