A novel randomized recurrent artificial neural network approach: recurrent random vector functional link network

Abstract: The random vector functional link (RVFL) network has successfully been employed in many applications since 1989. RVFL has a single-hidden-layer feedforward structure that also has direct links between the input layer and the output layer. Although RVFL provides nonlinearity, high generalization capacity, and fast training, the literature shows that higher nonlinearity can be obtained by adding recurrent feedback to an artificial neural network. In this paper, the recurrent type of RVFL (R-RVFL), which has both outer and inner feedbacks, is proposed. In order to evaluate and validate the proposed approach, a total of 109 public datasets were employed. The obtained results showed that R-RVFL can be employed successfully in terms of the achieved success rates.

The motivation behind this paper is to extend RVFL into a recurrent form. The proposed recurrent RVFL (R-RVFL) was tested on 109 datasets grouped into classification (31 datasets), regression (31 datasets), and time series (47 datasets) datasets. The obtained success rates showed that the proposed approach can be employed in classification and regression tasks. The rest of the paper is organized as follows. The list of employed benchmark datasets and their sources, the proposed approach, and the applied procedure are presented in Section 2. The obtained results are given and discussed in Section 3. Finally, Section 4 concludes this study.

Traditional random vector functional link network
The structure of the RVFL artificial neural network is shown in Figure 1 [3].

[Figure 1: schematic showing inputs, direct links, enhancement nodes, output weights, and outputs.]
Figure 1. General structure of the basic form of RVFL.
As seen in Figure 1, the output of RVFL is a summation of the direct links from the inputs and of the nonlinearly mapped inputs. The output of RVFL, y, can be calculated as follows:

y = \sum_{i=1}^{n} \beta_i x_i + \sum_{j=1}^{m} \beta_{n+j} \, g\left( \sum_{i=1}^{n} a_{i,j} x_i + b_j \right)  (1)

Here, x, n, m, g(·), a_{i,j}, β_j, and b_j are the input, the number of attributes, the number of enhancement nodes, the activation function (any piecewise differentiable function), the weights in the hidden layer, the weights in the output layer, and the biases in the enhancement nodes, respectively. In RVFL, the weights in the hidden layer and the biases in the enhancement nodes are assigned randomly. Therefore, the output of the j-th enhancement node can be calculated by the following equation:

h_j = g\left( \sum_{i=1}^{n} a_{i,j} x_i + b_j \right)  (2)

Finally, the weights and biases in the output layer are calculated by the Moore-Penrose pseudoinverse or by ridge regression [3].
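As a concrete illustration, the RVFL training procedure described above (random hidden-layer weights, enhancement nodes, and closed-form output weights via ridge regression) can be sketched in NumPy. The function names and parameter defaults below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_rvfl(X, Y, m=50, ridge=1e-3):
    """Train an RVFL network sketch: random hidden weights, closed-form output weights.

    X: (N, n) inputs, Y: (N, o) targets, m: number of enhancement nodes.
    """
    n = X.shape[1]
    A = rng.uniform(-1.0, 1.0, size=(n, m))   # random hidden-layer weights a_ij
    b = rng.uniform(-1.0, 1.0, size=m)        # random enhancement-node biases b_j
    H = np.tanh(X @ A + b)                    # enhancement-node outputs g(.)
    D = np.hstack([X, H])                     # direct links + enhancement nodes
    # Ridge regression for the output weights beta
    # (approaches the Moore-Penrose solution as ridge -> 0)
    beta = np.linalg.solve(D.T @ D + ridge * np.eye(D.shape[1]), D.T @ Y)
    return A, b, beta

def predict_rvfl(X, A, b, beta):
    """Evaluate the trained RVFL on new inputs."""
    D = np.hstack([X, np.tanh(X @ A + b)])
    return D @ beta
```

Because the direct links feed the inputs straight into the output layer, a purely linear target is recovered almost exactly even with few enhancement nodes.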

Novel proposed recurrent random vector functional link network
In R-RVFL, inner and outer feedbacks were added to the structure of RVFL, as seen in Figure 2. In addition to the weights in the hidden layer and the biases in the enhancement nodes, the weights of the inner (k) and outer (K) feedbacks are also assigned randomly. The output of R-RVFL for sample s, y_s, can be calculated as follows (see Figure 2):

y_s = \sum_{i=1}^{n} \beta_i \, g(x_{i,s}) + \sum_{j=1}^{m} \beta_{n+j} \, h_{j,s} + \sum_{p=1}^{l} \beta_{n+m+p} \, K_p \, y_{s-p} + b  (3)

where l, k, and K are the number of outer feedback connections and the inner and outer feedback weights, respectively. The calculations are done sequentially, and the output of each enhancement node for sample s is updated according to the previous output of that enhancement node:

h_{j,s} = g\left( \sum_{i=1}^{n} a_{i,j} x_{i,s} + k_j \, h_{j,s-1} + b_j \right)  (4)

Furthermore, as seen in Figure 2, context neurons were added to the structure of the RVFL network. The input of each context neuron is a specific delay of the whole network output, weighted by the corresponding outer feedback weight; these context neurons act like additional input neurons. Moreover, traditional RVFL has neither an activation function on the direct links nor a bias in the output layer; in this version, as seen in Figure 2, activation functions were also employed on the direct links.
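The sequential forward pass described above can be sketched as follows, assuming the feedback arrangement reconstructed from the text: one inner-feedback weight per enhancement node, and l context neurons that hold delayed network outputs weighted by the outer-feedback weights. All names and the zero-initialization of the feedback states are illustrative assumptions.

```python
import numpy as np

def rrvfl_forward(X, A, b, beta, k, K, bias=0.0, g=np.tanh):
    """Sequential forward pass of an R-RVFL-style network (illustrative sketch).

    X: (N, n) samples in temporal order; A: (n, m) random hidden weights;
    b: (m,) biases; k: (m,) inner-feedback weights; K: (l,) outer-feedback weights;
    beta: output weights over [direct links, enhancement nodes, context neurons].
    """
    N, n = X.shape
    m = A.shape[1]
    l = K.shape[0]
    h_prev = np.zeros(m)      # previous enhancement-node outputs (inner feedback)
    context = np.zeros(l)     # delayed network outputs (outer feedback)
    y = np.zeros(N)
    for s in range(N):
        # Inner feedback: each node also sees its own previous output scaled by k_j.
        h = g(X[s] @ A + k * h_prev + b)
        # Direct links pass through the activation as well, as in the proposed R-RVFL;
        # context neurons enter like extra inputs, weighted by K.
        features = np.concatenate([g(X[s]), h, K * context])
        y[s] = features @ beta + bias
        h_prev = h
        context = np.roll(context, 1)
        context[0] = y[s]     # store the newest output as the shortest delay
    return y
```

Note that, unlike plain RVFL, the samples can no longer be processed as one batch matrix product: each output depends on the previous enhancement-node states and delayed outputs, which is why the computation is sequential.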

Applied methodology
In this study, the results achieved by the proposed R-RVFL were compared with the results obtained by an ANN trained by backpropagation, an RNN, and RVFL. The applied methodology can be summarized in three steps.
1st step: Normalizing the employed dataset into the range of -1 to 1.
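The normalization in the 1st step can be sketched as a per-column min-max scaling into [-1, 1]; the helper name is illustrative.

```python
import numpy as np

def scale_to_unit_range(X):
    """Min-max scale each column of X into the range [-1, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return 2.0 * (X - lo) / (hi - lo) - 1.0
```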
2nd step: Determining the optimal network parameters. The parameters used in the trials are given in Table 1. The parameters were selected according to the highest mean test success rate achieved in cross-validation.

Accuracy and root mean square error (RMSE) were used as the success rates for classification and regression datasets, respectively. These validation metrics are calculated as follows:

\mathrm{Accuracy} = \frac{1}{N} \sum_{s=1}^{N} [f_s = y_s], \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{s=1}^{N} (f_s - y_s)^2}

where f, y, and N are the desired value, the obtained value, and the number of observations (samples) in the dataset, respectively.
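The two metrics can be computed directly in NumPy; the function names are illustrative.

```python
import numpy as np

def accuracy(f, y):
    """Classification accuracy: fraction of samples where prediction equals target."""
    f, y = np.asarray(f), np.asarray(y)
    return np.mean(f == y)

def rmse(f, y):
    """Root mean square error between desired values f and obtained values y."""
    f, y = np.asarray(f, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.mean((f - y) ** 2))
```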
3rd step: Employing the ANN, RNN, RVFL, and R-RVFL on each dataset. Two different test procedures were employed. The first is 5-fold cross-validation: all classification, regression, and time series datasets were classified/estimated according to 5-fold cross-validation [14]. The second is Monte Carlo cross-validation: only the time series datasets were employed in this procedure, according to the training-test partitions given in Table 2 [15]. In both procedures, the same data partitions were used for each employed method.
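The two partitioning schemes can be sketched as follows. The exact training-test ratios of Table 2 are not reproduced, and the helper names are illustrative; the second helper only shows an order-preserving split, as the text requires for the time series datasets.

```python
import numpy as np

def kfold_indices(n_samples, n_folds=5, seed=0):
    """Shuffled k-fold cross-validation index partitions (first test procedure)."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(idx, n_folds)

def ordered_split(n_samples, train_ratio):
    """Order-preserving train/test split for time series (second test procedure)."""
    cut = int(n_samples * train_ratio)
    return np.arange(cut), np.arange(cut, n_samples)
```

Reusing the same seed and the same partition indices for every compared method reproduces the "same data partitions" condition stated above.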

Utilized datasets
In order to validate the proposed approach, 109 benchmark datasets were employed. The datasets can be divided into three groups: classification, regression, and time series datasets. The employed financial time series include EUR/USD, US 30Y T-Bond, Euro Bund, Japan Govt. Bond, crude oil, natural gas, gold, copper, and US wheat.

Results and discussion
In this section, first, the effect of the parameters of the R-RVFL will be investigated. Later, results achieved by R-RVFL will be compared with results obtained by ANN, RNN, and RVFL.

Analysis of proposed approach
In order to analyze the properties of the proposed approach, the Lithuanian, forest fire, and Dow 30 datasets were used, and the effect of the parameters of R-RVFL was assessed according to the mean accuracies obtained in cross-validation. First, R-RVFL was assessed with respect to the inner and outer feedbacks, and the obtained mean accuracies are summarized in Table 3. Note that the datasets marked with an asterisk were estimated according to Monte Carlo cross-validation based on the order of the samples, while the others were classified/estimated according to 5-fold cross-validation.

As seen in Table 3, in the Lithuanian dataset the inner feedbacks are as important as the outer feedbacks. On the other hand, in the forest fire, Dow 30, and Dow 30* datasets, the proposed approach does not gain any extra knowledge from the inner feedbacks (i.e. there is no requirement for inner feedback). However, the findings in Table 3 show that using both inner and outer feedbacks boosts the overall success, and this combination is the main difference from the structure of RVFL. This may be explained by the features of the cascaded control scheme [17]: disturbances can be eliminated faster, the controllability of the inputs is increased, and time delay effects are reduced [18].

Furthermore, the obtained success rates are related to the number of context neurons, which equals the number of outer feedbacks, as summarized in Table 4. As seen in Table 4, no correlation between the obtained success rates and the number of context neurons was found. Therefore, these results show that the optimum number of context neurons must be determined by trials or perhaps by expert opinion. For each of the employed datasets, using context neurons increases the accuracy, but increasing the number of context neurons did not yield a further increase in accuracy.
It can be said that the number of context neurons may depend on the characteristics of the dataset. Moreover, the effect of the output bias, the direct link, and the activation function on the direct link was tested, and the obtained success rates are given in Table 5.
As seen in Table 5, the output bias does not boost the success rate in the employed datasets. On the other hand, using an activation function on the direct link increases the success of the RVFL. Moreover, having both the direct link and its activation function in the network structure increases the success of the R-RVFL. Consequently, the results reported in Table 5 suggest that the direct link and the activation function on the direct link boost the success of R-RVFL. This result supports the finding of Zhang and Suganthan that direct links have a high impact on increasing the success of the RVFL [3]. Although a boosting effect of the output bias was not observed, using the output bias did not decrease the success of the proposed approach.

Obtained general results
Before the validation of R-RVFL, the parameters of ANN, RNN, RVFL, and R-RVFL were determined by trials, and the optimum parameters were selected based on the obtained test success rates. As an example, the RMSEs obtained for the forest fire dataset by RNN (the same parameters were obtained by ANN), RVFL, and R-RVFL are shown in Figure 3 (note that #CN in Figure 3 refers to the number of context neurons). As seen in Figure 3, the obtained RMSEs were highly dependent on the structure of the network.

Furthermore, in the ANN, RNN, and RVFL, the optimum number of neurons in the hidden layer and the optimal activation function must be determined. In R-RVFL, the number of context neurons must be optimized as well, so the number of required trials in the optimization of R-RVFL is higher than in ANN, RNN, and RVFL. Still, the number of parameters that must be optimized in R-RVFL, as in the other employed RNN and RVFL methods, is smaller than in the ANN trained by backpropagation, a gradient-based method with additional parameters such as the learning rate, the maximum number of epochs, and the stopping criteria [2,3].

After the optimization of each employed method, each dataset was classified/estimated based on the cross-validation strategies. The success rates obtained for each dataset are reported in the Appendix (see Tables A1-A4), and the mean success rates are reported in Table 6. As seen in Table 6, the mean success rates achieved by R-RVFL are higher than those obtained by the other employed methods. Furthermore, a lower mean RMSE was obtained when estimating the time series datasets according to the order of the samples, and this result supports the literature findings [10,11,19].
Consequently, as seen in this table, higher success rates were obtained by the recurrent form of the RVFL (R-RVFL) compared with RVFL. The main reason for this success is the feedback connections [10,11].
These feedback connections (i.e. context neurons, delays) act as dynamic memory [8], and this dynamic memory yields a higher modeling capability [8,9,12]. Additionally, Alanis et al. stated that, because of this property, even difference equations that cannot be modeled by feedforward ANNs can be modeled [12]. Furthermore, in [13], it was reported that recurrent links enhance the ability to map nonlinear dynamics and especially to model nonlinear real-time variables.

In order to investigate the optimized network complexities, the mean number of neurons in the hidden layer, the mean number of context neurons, and the most common (mode) activation functions are given in Table 7. As seen in Table 7, the mean number of hidden-layer neurons required by R-RVFL is lower than that required by the other employed methods. Furthermore, no correlation between the optimum activation function and the employed method could be found. Additionally, the computational costs of R-RVFL and the other employed methods are given in Table 8 in terms of mean process time (s).
Although the mean number of required hidden-layer neurons of R-RVFL is lower than that of the other methods (see Table 7), the processing times of R-RVFL in both the training and test stages are higher than those of the other employed methods (see Table 8). The reason for this is the sequential computation algorithm of R-RVFL, which is a consequence of the recurrent links. Furthermore, as explained before, extra trials are required in order to optimize R-RVFL. Even though R-RVFL requires a higher processing time, its computational cost is still in an acceptable range compared with the results obtained by the ANN (see Table 8). The reason for this is explained in the literature: some parameters are assigned randomly and the others are calculated analytically instead of being optimized by tuning (e.g., backpropagation, Levenberg-Marquardt methods) [4].

Conclusion
According to the results obtained in this study, the recurrent links boosted the network performance but also increased the computational cost. The reason for the higher accuracies is addressed in the literature: the context neurons, which can be associated with memory, provide higher nonlinearity. This increases the adaptability of the machine learning method, so even dynamic systems can be modeled. Furthermore, based on the relationship between control systems and machine learning methods, this study showed that higher accuracies can be obtained by a recurrent model based on cascaded control systems (using inner and outer feedbacks together) compared to traditional recurrent models that use only outer feedbacks.