Machine Coded Compact Genetic Algorithms For Real Parameter Optimization Problems

In this paper, we extend the Compact Genetic Algorithm (CGA) to real-valued optimization problems by dividing the total search process into three stages. In the first stage, an initial vector of probabilities is generated; its elements are the probabilities of each bit being 1, depending on the bit locations defined by the IEEE-754 standard. In the second stage, a CGA search is applied to the objective function using the same encoding scheme. In the last stage, a local search is applied using the result obtained in the previous stage as the starting point. A simulation study is performed on a set of well-known test functions to measure the performance differences. Simulation results show that the improvement in search capability is significant for many test functions across many dimensions and different levels of difficulty.


Introduction
Genetic Algorithms (GAs) are search and optimization techniques that mimic natural selection and the principles of genetics (Holland, 1975; Goldberg and Holland, 1988; Sastry, 2014). In GAs, a population of random solutions is generated and assigned fitness values. A fitness value is a measure of the quality of a candidate solution. Well-known genetic operators such as crossover and mutation are applied to selected candidate solutions with higher fitness values to generate new population members called offspring. After many steps, the generated population is expected to have a higher average fitness than the populations of former iterations (Goldberg, 1989).
Estimation of Distribution Algorithms (EDAs) form another family of GAs in which a vector of probabilities, rather than a population of candidate solutions, is used to generate candidate solutions by sampling (Pelikan et al., 2015). PBIL (Population-Based Incremental Learning) is an early member of the EDA family that consists of creating a population of candidate solutions by sampling and updating the vector of probabilities using some of the best solutions (Baluja, 1994). The vector of probabilities is initially created as [0.5, 0.5, ..., 0.5]. The ith element of the vector represents the probability P(x_i = 1), where x_i is the ith bit of the candidate solution, i = 1, 2, ..., m, and m is the number of elements. The best k solutions are selected from the population of size n to update the vector of probabilities. The aim of the update process is to move the probabilities towards the best solutions so that better solutions are generated in the following iterations. After many steps, the elements of the probability vector are expected to approach either zero or one. The final solution is a bit string which is considered the optimum.
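The PBIL update described above can be sketched as follows; the function name `pbil_step`, the OneMax toy fitness, and the learning-rate value `mu` are illustrative assumptions, not part of the original algorithm description.

```python
import random

def pbil_step(prob, popsize, n_best, mu, fitness):
    """One PBIL iteration: sample a population from the probability
    vector, then shift the vector towards the best solutions."""
    # Sample popsize candidate bit strings from the probability vector.
    population = [[1 if random.random() < p else 0 for p in prob]
                  for _ in range(popsize)]
    # Keep the n_best candidates with the highest fitness.
    best = sorted(population, key=fitness, reverse=True)[:n_best]
    # Move each probability towards the mean bit value of the best solutions.
    return [(1 - mu) * p + mu * sum(b[i] for b in best) / n_best
            for i, p in enumerate(prob)]

# Toy usage: maximize the number of 1 bits (OneMax) on 8 bits.
prob = [0.5] * 8
for _ in range(300):
    prob = pbil_step(prob, popsize=20, n_best=2, mu=0.1, fitness=sum)
```

After many iterations the probabilities drift towards 1 on this toy problem, illustrating the convergence behaviour described in the text.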
Compact Genetic Algorithms (CGAs) are another branch of the EDA family (Harik et al., 1999). CGAs are compact in the sense that they are not based on a population and require less computer memory to run. This property of CGAs makes hardware implementations possible in devices with low resources (Aporntewan and Chongstitvatana, 2001). In each step of the algorithm, two candidate solutions are sampled using the vector of probabilities. Depending on the fitness values, the better candidate solution is labeled as the winner. If the ith gene of the winner is 1, then the ith element of the probability vector is moved towards 1 by an amount of 1/popsize, where i = 1, 2, ..., m, m is the chromosome length, and popsize is the population size. If the ith gene of the winner is 0, then the amount of mutation is negative, that is, the ith element of the probability vector is moved towards 0. These operations are repeated until all elements of the probability vector are either 0 or 1. If the popsize parameter is large, the amount of mutation is low, and more computation time is needed to obtain a fully converged probability vector. When popsize is small, the update steps are large and the convergence rate is high, but the result is generally a local optimum because the search space is not well explored. Since the crossover probability, mutation probability, population size, number of generations, and crossover and mutation types are not needed, CGAs are practically parameterless. The popsize parameter only controls the mutation of the probability elements and does not really define the number of candidate solutions as in GAs.
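The CGA update rule described above can be sketched in a few lines; the helper name `cga`, the iteration cap, and the OneMax toy fitness are our own illustrative choices, not part of the original specification.

```python
import random

def cga(fitness, n_bits, popsize, max_iter=200000):
    """Compact GA sketch: keep only a probability vector; in each step,
    sample two candidates and move the vector towards the winner."""
    prob = [0.5] * n_bits
    for _ in range(max_iter):
        a = [1 if random.random() < p else 0 for p in prob]
        b = [1 if random.random() < p else 0 for p in prob]
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        for i in range(n_bits):
            if winner[i] != loser[i]:
                # Move the i-th probability towards the winner's bit by
                # 1/popsize; the "mutation" is negative when the bit is 0.
                step = 1.0 / popsize if winner[i] == 1 else -1.0 / popsize
                prob[i] = min(1.0, max(0.0, prob[i] + step))
        if all(p in (0.0, 1.0) for p in prob):
            break
    return [int(round(p)) for p in prob]

# Toy usage: maximize the number of 1 bits (OneMax) on 8 bits.
solution = cga(sum, n_bits=8, popsize=100)
```

Note how no population is stored: only the probability vector and two sampled candidates live in memory at any time, which is the property that makes CGAs attractive for low-resource hardware.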
Classical GAs and CGAs represent the search space using bits. In addition, both GAs and CGAs have been extended to use other encoding systems such as integer encoding, floating-point or real-valued encoding, permutation encoding, and machine coding. PBIL and CGA were mainly developed for the binary encoding of variables.
Since it is possible to encode real values as bits, these algorithms can also be applied to real-valued optimization problems. Besides this, some newer sampling schemes are based on sampling values from probability distributions and mutating the distribution parameters during the iterations (Sebag and Ducoulombier, 1998; Mininno et al., 2008).
In this paper, we devise a new CGA-based algorithm for real-valued optimization problems. The encoding of variables is binary, but the IEEE-754 transformation is used to separate the sign, exponent, and mantissa parts of a real value as stored in computer memory. The algorithm starts with an adjusted probability vector, in which the elements represent the probabilities of bits having the value 1 depending on the locations of the bits in the IEEE-754 standard. After obtaining the adjusted vector, the usual CGA search is performed. Finally, a local search is applied to obtain more precise solutions.
In Section 2, we present the algorithm in detail. In Section 3, an example demonstrates the results of each phase applied to a well-known test function. In Section 4, we perform a simulation study to measure the performance differences between the original and the extended algorithms. Finally, in Section 5, we conclude.

The Algorithm
The extended algorithm is based on three steps. In the first step, an initial vector of probabilities is generated using the IEEE-754 encoded bits of the variables; this vector does not necessarily have 0.5 in each element. In the second step, a CGA search is performed using the same encoding scheme for real values. In the last step, a local search is applied to obtain more precise solutions. These steps are defined in Sections 2.1, 2.2, and 2.3.

Encoding of variables
Digital computers store and represent data using bits. Since bits are digits in base 2, it is straightforward to express integer numbers by combining many bits. Representing rational numbers is also possible using a finite number of bits. However, the representation of real or irrational numbers requires a discretization process. Goldberg (1991) emphasizes that the success of a GA search is related to the building blocks represented by bits, as proved in the Schemata theorem. Since there is no distinction between the phenotype and the genotype of variables, real-valued GAs tend to stall in later iterations.
IEEE-754 is a standard for encoding and decoding real numbers using a fixed number of bits in computer memory (IEEE, 2008). In this standard, the bits of a 32-bit floating-point number are divided into three parts. The first part is 1 bit long and defines the sign of the number. The following 8 bits form the exponent part, and the remaining 23 bits form the mantissa. A 32-bit floating-point number is then defined as (-1)^s x 2^(e-127) x 1.m, where the sign bit s is zero if the number is positive, e is the value of the exponent field, and m is the mantissa. Table 1 shows an example of how the bit representation changes when a single digit is changed. Since there are 2^32 possible representations, the first bit divides the total number of possibilities by 2. The table also shows that, in some cases, the exponent part remains the same when a small change occurs in the number. In contrast, the numbers 12345.6789 and 02345.6789 differ in several places in both the exponent and the mantissa parts even though they differ in a single digit. Consequently, numbers sampled in a predefined range have patterns in the sign, exponent, and mantissa parts.
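The kind of decomposition shown in Table 1 can be reproduced with a short snippet; the helper name `float_to_bits` is ours, and the standard-library `struct` module is used to access the raw single-precision bit pattern.

```python
import struct

def float_to_bits(x):
    """Return the 32-bit IEEE-754 pattern of x as a string of 0s and 1s."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return format(word, "032b")

# Compare two numbers that differ in a single decimal digit, as in Table 1.
for value in (12345.6789, 2345.6789):
    bits = float_to_bits(value)
    sign, exponent, mantissa = bits[0], bits[1:9], bits[9:]
    print(value, sign, exponent, mantissa)
```

Running this shows that the single-digit change alters both the exponent field and the mantissa, which is exactly the pattern the text describes.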
Using machine-based transformations as the encoding scheme is not new in the evolutionary optimization context; for example, Budin et al. (2010) applied genetic operators directly on the machine representation of floating-point chromosomes. When values are sampled in a range lying symmetrically around zero, the sign bit is 1 with a probability of about 0.5 because half of the values lie above zero. The interesting part is the exponent, as the values in the predefined range mostly have 1 in the first bit, whereas the second bit is generally zero. Note that in the mantissa part, most of the bits can be either 0 or 1 with a probability of 50%, except the first one, since the probability of the first bit being 1 is 36%. In CGAs, the initial elements of the probability vector are all set to 0.5. As mentioned before, this assumption is not really needed, and some parts of the search space do not need to be explored. In addition, some bits can be either 0 or 1, but the probabilities of having 0 or 1 are not equal in some cases. In the first part of the devised algorithm, the adjusted probability vector is generated before the genetic search in order to prevent wandering around unnecessary parts of the search space. When the range of a variable is defined as -inf < x < inf, all elements of the probability vector are 0.5; in this special case, the proposed algorithm does not start with an initial probability generating process.

Generating initial probability vector
As mentioned in Section 2.1, the proposed algorithm is based on the IEEE-754 transformation of real values into 32 bits. While any bit of a real value x can be 1 with probability 1/2 when the range is -inf < x < inf, the probabilities for some bits can be different in a narrower range. For instance, P(b_1 = 1) is always zero for the range 10 <= x <= 100, whereas P(b_1 = 1) is 1 for any range in which both bounds are negative, where b_i is the ith bit of the IEEE-754 representation.
The proposed algorithm estimates the probability vector by computing the empirical probabilities P(b_i = 1). Algorithm 2.2 shows the whole process for a single variable in pseudo-code; the process can be repeated for the other variables in the multivariate case. The algorithm generates a probability vector of size 32 for a single real variable. If the bounds of the variable are defined as [minval, maxval], B values are sampled from a Uniform distribution with parameters minval and maxval. The encode() function takes a real number as its argument and returns the IEEE-754 representation. In iteration i, the bit vector is appended as the ith row of the result matrix. Finally, the column means of the matrix are returned. The parameter B can be selected manually; in Section 4, we use B = 10^5.
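A sketch of Algorithm 2.2 in Python follows; the function name `initial_probs` is ours, and the paper's encode() step is realized with the standard `struct` module rather than an explicit matrix of bit rows.

```python
import random
import struct

def initial_probs(minval, maxval, B=10_000):
    """Estimate P(b_i = 1) empirically for x ~ Uniform(minval, maxval)
    encoded with 32-bit IEEE-754, as in Algorithm 2.2 (sketch)."""
    counts = [0] * 32
    for _ in range(B):
        x = random.uniform(minval, maxval)
        # encode(): reinterpret the float's memory as a 32-bit integer.
        (word,) = struct.unpack(">I", struct.pack(">f", x))
        for i in range(32):
            # Bit 0 is the sign, bits 1-8 the exponent, bits 9-31 the mantissa.
            counts[i] += (word >> (31 - i)) & 1
    # Column means of the implicit B x 32 bit matrix.
    return [c / B for c in counts]

probs = initial_probs(10.0, 100.0)
print(probs[:9])  # sign and exponent bits are strongly biased in this range
```

For a strictly positive range such as [10, 100], the sign-bit probability is exactly 0, and the leading exponent bit is always 1, matching the biases discussed in Section 2.1.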

The hybrid compact genetic search
The devised method performs a genetic search using the algorithm given in Algorithm 2.3. The algorithm uses the vector of probabilities generated by Algorithm 2.2. In each step, two candidate solutions are sampled using the probabilities. Assuming the goal function is to be minimized, the winner is the candidate chromosome with the lower cost value. These parts of the genetic search are almost the same as in CGAs, except for the decode() part. decode() receives n x 32 bits as input and returns a vector of real values decoded using the IEEE-754 transformation.

Algorithm 2. Machine Coded Compact Genetic Algorithm
Note that the function decode() uses the single-precision version of IEEE-754, which spans 32 bits in computer memory. The double-precision version of the standard represents a wider range of numbers as it spans 64 bits; however, working with longer bit strings reduces performance drastically.
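A minimal single-precision decode() along these lines might look as follows; this is a sketch, with the function operating on a flat list of 0/1 values as Algorithm 2.3 implies.

```python
import struct

def decode(bits):
    """Decode a chromosome of n*32 bits (a flat list of 0s and 1s) into
    n single-precision reals via the IEEE-754 layout."""
    values = []
    for k in range(0, len(bits), 32):
        word = 0
        for bit in bits[k:k + 32]:
            word = (word << 1) | bit
        # Reinterpret the 32-bit pattern as a float; some random bit
        # patterns decode to inf or NaN, which the caller must tolerate.
        (x,) = struct.unpack(">f", struct.pack(">I", word))
        values.append(x)
    return values

# Usage: the IEEE-754 pattern of 1.0 followed by the pattern of -1.0.
one = [int(c) for c in "00111111100000000000000000000000"]
minus_one = [int(c) for c in "10111111100000000000000000000000"]
print(decode(one + minus_one))  # [1.0, -1.0]
```

Swapping `">f"`/`">I"` for `">d"`/`">Q"` and 32 for 64 would give the double-precision variant mentioned above, at the cost of chromosomes twice as long.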
In GAs, and more generally in evolutionary optimization algorithms, genetic operators perform the search through exploration and exploitation (Chen et al., 2009). After applying a genetic operator, a new solution can be created in a different region of the search space, one that may contain the global optimum. On the other hand, a newly generated solution produced from two of the best solutions in the population can fall at a location close to the global optimum. In short, the processes of searching new areas and performing local fine-tuning are executed in parallel, and the balance between these two vital tasks must be calibrated.
In some cases, a genetic search can terminate by reporting a good solution that is not the global optimum, because of the lack of a lucky mutation or crossover operation, or because of overshooting due to badly chosen adaptive probabilities. In other words, a GA search can find a good solution around the global optimum, and a local fine-tuning operation may be required to reach the ideal solution.
Hybridization of search algorithms is applied in several ways by combining at least two optimization algorithms. Gonçalves et al. (2015) improved the result obtained by a genetic algorithm using a local search tool to prevent getting stuck at a local optimum. Kim et al. (2007) combined a genetic algorithm with a particle swarm optimization tool at run time to search for the global optimum of multimodal functions. Arakaki and Usberti (2018) hybridized a genetic algorithm with a local search procedure. In our proposed method, we apply a Hooke-Jeeves local search to improve the result obtained by the CGA defined in Algorithm 2. The whole hybrid algorithm is given in Algorithm 3.
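The local-search stage can be illustrated with a simplified, exploratory-moves-only variant of the Hooke-Jeeves pattern search; the full method also makes pattern (acceleration) moves, and the parameter values below are illustrative assumptions.

```python
def hooke_jeeves(f, x0, step=1.0, eps=1e-6, shrink=0.5):
    """Simplified Hooke-Jeeves-style pattern search: try coordinate
    moves of size +/-step around the base point; if no move improves f,
    shrink the step until it falls below eps."""
    base = list(x0)
    fbase = f(base)
    while step > eps:
        improved = False
        for i in range(len(base)):
            for d in (step, -step):
                trial = list(base)
                trial[i] += d
                ftrial = f(trial)
                if ftrial < fbase:
                    base, fbase = trial, ftrial
                    improved = True
                    break  # accept the first improving move on axis i
        if not improved:
            step *= shrink  # refine the mesh around the current base
    return base, fbase

# Usage: fine-tune a rough starting point on a smooth bowl.
x, fx = hooke_jeeves(lambda v: (v[0] - 3) ** 2 + (v[1] + 1) ** 2, [0.0, 0.0])
```

Because the method only compares function values, it needs no derivatives, which is the property that makes it suitable for the non-differentiable objectives mentioned in the conclusion.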

Algorithm 3. Machine Coded Compact Genetic Algorithm with Hybridization
The algorithm given in Algorithm 3 is based on three steps. The n-variable objective function defined as f : R^n -> R is transformed using the IEEE-754 standard and redefined as f : B^(n x 32) -> R, where B = {0, 1} is the binary space. In the first stage, defined in Algorithm 2.2, the initial vector of probabilities is generated. Depending on the ranges of the variables, some probabilities in this vector equal 0, some equal 1, and the rest lie in between. If a probability is either zero or one, the corresponding bit is effectively fixed, the search space is divided, and the remaining effort is spent on the other elements of the vector. The resulting probabilities are then used in the MCCGA (Machine-coded Compact Genetic Algorithm) stage given in Algorithm 2.3. This stage is almost the same as the original CGA except for the initial vector of probabilities and the decoder function. In this stage, the vector of probabilities is used to generate new chromosomes by sampling, and the decoder is applied to evaluate the objective function through the binary representation standard. After all probability elements have converged to either 0 or 1, the stage terminates. The final n x 32 bits are decoded into real values, and a Hooke-Jeeves search is started from them. The reported solution is expected to be the global optimum.

An Example
The Chichinadze function is defined as f(x, y) = x^2 - 12x + 11 + 10cos(pi*x/2) + 8sin(5*pi*x) - (1/5)^(1/2) exp(-(y - 0.5)^2 / 2). Since the selected range is a considerably small zone of the whole floating-point representation space, some bits of the encoded variables tend to take the value of either 0 or 1 with higher probabilities.
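Assuming the commonly cited form of the function given above, it can be evaluated directly; the minimizer coordinates in the comment are the values usually reported in the test-function literature.

```python
import math

def chichinadze(x, y):
    """Commonly cited form of the Chichinadze test function."""
    return (x ** 2 - 12.0 * x + 11.0
            + 10.0 * math.cos(math.pi * x / 2.0)
            + 8.0 * math.sin(5.0 * math.pi * x)
            - (1.0 / 5.0) ** 0.5 * math.exp(-0.5 * (y - 0.5) ** 2))

# The literature reports the global minimum near (5.90133, 0.5).
print(chichinadze(5.90133, 0.5))
```

The oscillating cosine and sine terms produce many local minima along x, which is what makes this function a useful stress test for the probability-vector initialization.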

Simulations
We perform a simulation study to compare the search capabilities of the CGA and the developed algorithm. We use a suite of test functions reported in Mishra (2006). This set of test functions is used to measure the performance differences of well-known optimization techniques in the literature. The simulations are repeated 1000 times for each configuration. Since the popSize parameter affects the performance, it is set to 10, 20, 50, 100, and 200. For the subset of functions defined for n >= 2 variables, simulations are performed for n = 2, n = 10, and n = 25. Simulation results for n = 2 are reported in Tables 3-6.
The table columns give the value of the popSize parameter, the arithmetic means and standard deviations of the objective values reported by the algorithms, and the p-values. We applied a two-sample Wilcoxon (Mann-Whitney) test for independent samples to test the equality of the location parameters of two populations. Small p-values indicate that we can safely reject the null hypothesis H0: theta_1 = theta_2 in favor of H1: theta_1 != theta_2, where theta_1 and theta_2 are the location parameters of the distributions of the corresponding objective values. NA values are generated when the algorithms produce exactly the same results, and NAs can be interpreted as a p-value of 1. Tables 3-6 show that almost all of the p-values are small, which can be accepted as general evidence of the performance inequality of the methods. The CGA outperforms the developed algorithm for the Cross-leg table function. As the popSize parameter increases, the performance also increases, but the improvement is more apparent for the developed algorithm. Interestingly, for some functions, for example the Bird function, the average performance decreases as popSize is increased, but the standard deviation also increases. This means the performance fluctuates, yet solutions close to the global optimum are still reported in some runs. This exception also indicates that increasing popSize does not necessarily yield a better reported solution after an unlucky random seed is chosen.
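The comparison described above can be reproduced with a rank-sum test. The following is a self-contained sketch using the normal approximation without tie correction; a library implementation (e.g. SciPy's `mannwhitneyu`) would normally be preferred over this hand-rolled version.

```python
import math

def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney (Wilcoxon rank-sum) p-value via the
    normal approximation, without tie correction -- a sketch, not a
    replacement for a full statistical library implementation."""
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    # Rank sum of sample a (1-based ranks; averaged ranks for ties omitted).
    r1 = sum(i + 1 for i, (_, src) in enumerate(pooled) if src == 0)
    u = r1 - n1 * (n1 + 1) / 2.0
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))

# Usage: clearly separated samples of objective values give a small p-value.
p = mann_whitney_p([1.0, 1.2, 0.9, 1.1, 1.05], [9.0, 9.2, 8.9, 9.1, 9.05])
print(p)
```

A small p-value, as in this usage example, corresponds to rejecting H0 and concluding that the two algorithms differ in location, which is how the tables in this section are read.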

Conclusion
Each single bit of an IEEE-754 encoded real value has a different impact depending on its location. As a result, some bits tend to be zero or one when a variable is defined in a narrower range. Assigning 0.5 to each element of the initial vector of probabilities, as in CGAs, does not account for these biases. In this paper, we suggest generating the initial vector of probabilities depending on the locations of the bits encoded by the 32-bit IEEE-754 standard. This special binary coding scheme has been used before and has proved successful in many works. In the second stage of the extension, a usual CGA search is applied to the objective function using the same encoding scheme. In order to improve the solutions obtained by the CGA, the Hooke-Jeeves algorithm is applied using the reported result as the starting point. Another local search method could be used instead; however, the Hooke-Jeeves algorithm has many benefits, including applicability to non-differentiable functions. Given a good starting point, the algorithm performs a fine-tuning operation to obtain a solution closer to the global optimum. We performed a simulation study using a set of well-known test functions to measure the performance differences. The simulation results show that the hybridized and machine-coded CGA outperforms the classical CGA.