Performance Evaluations for OpenMP Accelerated Training of Separable Image Filters

Image filtering with two-dimensional convolution is one of the most widespread image processing applications. Determining the weights of an image filter is important for the success of the filtering operation. Heuristic algorithms such as genetic algorithms provide an efficient way of training these types of filters. Because repeated image filtering operations are computationally expensive, training may take hours on a single core. OpenMP (Open Multi-Processing) provides an efficient interface for utilizing the computing power of multicore processors. In this study, OpenMP-accelerated training of separable filters, a subclass of convolution filters, has been implemented based on genetic algorithms. Comparative speed-up results are presented for various image sizes and filter kernel sizes. The effects of the population size of the genetic algorithm and the number of working cores are also investigated.


Introduction
Image filters are widely used operators in image processing applications such as image enhancement, image smoothing, edge detection and noise elimination [1]. Linear filtering using two-dimensional convolution or correlation is one of the main filtering operations. It is realized by applying the filtering kernel, a matrix of weights, to each pixel of the input image. The size of the matrix can be 3×3, 5×5 or larger, such as 21×21. If the kernel can be expressed as the product of a column vector and a row vector (that is, if it has rank one), it is called a separable filter, and this form reduces the number of multiplication and addition operations. The values of the filter weights are determined according to the desired behaviour of the filter. Weights can readily be obtained using different analytical techniques [2][3]. In another approach, the kernel weights can be trained using original and noisy image samples [4][5][6][7]. Heuristic algorithms provide an efficient way of computing the filter kernel weights [8][9][10]. One of the well-known heuristic algorithms is the genetic algorithm, which has proved its efficiency in various studies. In this work, the genetic algorithm is selected to train the weights of the separable filter. In the application of genetic algorithms, a fitness function is used to define the problem. In the present case, the fitness function is formed from the mean squared error between the original image and the filtered noisy image. Computing the fitness value requires intensive multiplication and addition operations. Furthermore, the computation time depends on the number of weights as well as the image size. The fitness function is called at each iteration of the genetic algorithm. This significantly slows down the process and makes such applications impractical. One way to accelerate the process is to utilize the computational power of a multicore processor.
For this purpose, OpenMP provides a useful tool for efficient use of the cores of a multicore processor. OpenMP helps distribute the computational load over a defined number of threads. In the present study, OpenMP is utilized to accelerate the computation of the fitness function. The fitness function is computed for every individual in the population, and these computations can be carried out independently on the processor cores. In the experiments, an eight-core computer is used and the results are reported against the number of cores to show its effect. Various filter kernel sizes, image sizes and population sizes are also used in the experiments to show the efficiency of the OpenMP-based acceleration.

Separable Image Filters
Separable image filters are applied in a slightly different way from non-separable filters. An example of a non-separable 3×3 image filter is shown in Fig. 1. The filter kernel has a total of 9 weights, so the genetic algorithm must train 9 weights. As the filter kernel grows, the number of weights, and hence the training time of the genetic algorithm, increases: a 5×5 kernel has 25 weights, a 7×7 kernel has 49 weights, a 9×9 kernel has 81 weights, and so on. In the separable image filter, the 3×3 kernel of Fig. 1 is replaced by the horizontal and vertical vectors shown in Fig. 2. The product of these vectors gives the filter kernel shown in Fig. 1. The separable filtering process is divided into two stages. In the first stage, the noisy image is filtered using one of the vectors. In the second stage, the image resulting from the first stage is filtered using the other vector. The advantage of the separable filter over the non-separable filter is the reduced number of weights. For example, a 5×5 separable filter has 10 weights to be trained, whereas a 5×5 non-separable filter has 25.
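The two-stage process described above can be sketched in C as follows. This is a minimal illustration, not the implementation used in the study: the function names (outer_product, filter_2d, filter_separable, pixel_or_zero), the row-major double image layout, and the zero padding at the image borders are all assumptions made for the example. It shows that an n×n kernel built as the product of a column and a row vector gives the same output as the two one-dimensional passes, while only 2n weights (instead of n²) have to be stored and trained.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns the pixel at (y, x), or 0 outside the image (zero padding: an assumption). */
double pixel_or_zero(const double *img, int h, int w, int y, int x)
{
    return (y < 0 || y >= h || x < 0 || x >= w) ? 0.0 : img[y * w + x];
}

/* Builds the n x n kernel as the product of a column vector and a row vector. */
void outer_product(const double *col, const double *row, int n, double *kernel)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            kernel[i * n + j] = col[i] * row[j];
}

/* Non-separable filtering (correlation form): n*n multiplications per pixel. */
void filter_2d(const double *in, double *out, int h, int w,
               const double *kernel, int n)
{
    assert(n % 2 == 1);              /* odd kernel size so that it has a centre */
    int r = n / 2;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    sum += kernel[i * n + j]
                         * pixel_or_zero(in, h, w, y + i - r, x + j - r);
            out[y * w + x] = sum;
        }
}

/* Separable filtering: 2*n multiplications per pixel. Returns 0 on success. */
int filter_separable(const double *in, double *out, int h, int w,
                     const double *col, const double *row, int n)
{
    assert(n % 2 == 1);
    int r = n / 2;
    double *tmp = malloc((size_t)h * w * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    /* Stage 1: filter every image row with the horizontal (row) vector. */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++)
                sum += row[j] * pixel_or_zero(in, h, w, y, x + j - r);
            tmp[y * w + x] = sum;
        }

    /* Stage 2: filter the stage-1 result with the vertical (column) vector. */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++)
                sum += col[i] * pixel_or_zero(tmp, h, w, y + i - r, x);
            out[y * w + x] = sum;
        }

    free(tmp);
    return 0;
}
```

With zero padding, both routines compute the same sum for every pixel, so their outputs agree up to floating-point rounding; other border policies may differ near the image edges.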

Genetic Algorithms
The genetic algorithm is a search and optimization method based on natural selection [7][11][12]. Genetic algorithms randomly generate multiple candidate solutions. Bad solutions are eliminated in the next generation, so the best solutions emerge as good solutions are carried over to subsequent generations. A genetic algorithm applies selection, crossover, mutation and fitness calculations to a candidate population that is initially formed randomly. Pseudocode illustrating the operation of genetic algorithms is shown in Fig. 3. In the present case, the most computationally intensive part is calculating the value of the fitness function, because of the image filtering operations involved.
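The loop of selection, crossover, mutation and fitness calculation described above can be sketched in C as follows. This is only an illustrative skeleton under stated assumptions, not the algorithm of Fig. 3: tournament selection, one-point crossover, the 5% mutation rate, elitism, the small reproducible random generator, and all names (ga_run, ga_seed, ga_rand01, ga_tournament, sphere_error) are hypothetical choices made for the example. The function sphere_error is a cheap stand-in for the image-based fitness, and lower fitness is treated as better.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef double (*fitness_fn)(const double *genes, int n_genes);

/* Small linear congruential generator so that runs are reproducible (assumption). */
static unsigned int ga_state = 12345u;
void ga_seed(unsigned int s) { ga_state = s; }
double ga_rand01(void)
{
    ga_state = ga_state * 1664525u + 1013904223u;
    return (ga_state >> 8) / 16777216.0;          /* value in [0, 1) */
}

/* Stand-in fitness: squared distance of the weights from 0.5 (hypothetical). */
double sphere_error(const double *genes, int n_genes)
{
    double e = 0.0;
    for (int k = 0; k < n_genes; k++)
        e += (genes[k] - 0.5) * (genes[k] - 0.5);
    return e;
}

/* Tournament selection of size 2: the individual with the lower error wins. */
int ga_tournament(const double *fit, int pop)
{
    int a = (int)(ga_rand01() * pop);
    int b = (int)(ga_rand01() * pop);
    return fit[a] <= fit[b] ? a : b;
}

/* Runs the GA; writes the best individual to best[] and returns its fitness
   (or -1.0 if memory allocation fails). */
double ga_run(fitness_fn f, int n_genes, int pop, int generations, double *best)
{
    assert(n_genes > 0 && pop >= 2 && generations >= 0);
    size_t total = (size_t)pop * n_genes;
    double *cur = malloc(total * sizeof *cur);
    double *next = malloc(total * sizeof *next);
    double *fit = malloc((size_t)pop * sizeof *fit);
    double best_fit = -1.0;
    if (cur == NULL || next == NULL || fit == NULL)
        goto done;

    for (size_t i = 0; i < total; i++)            /* random initial population */
        cur[i] = 2.0 * ga_rand01() - 1.0;

    for (int g = 0; g <= generations; g++) {
        int best_i = 0;
        for (int i = 0; i < pop; i++) {           /* fitness calculation */
            fit[i] = f(&cur[i * n_genes], n_genes);
            if (fit[i] < fit[best_i])
                best_i = i;
        }
        best_fit = fit[best_i];
        memcpy(best, &cur[best_i * n_genes], (size_t)n_genes * sizeof *best);
        if (g == generations)
            break;

        memcpy(next, best, (size_t)n_genes * sizeof *best);   /* elitism */
        for (int i = 1; i < pop; i++) {
            const double *p1 = &cur[ga_tournament(fit, pop) * n_genes];
            const double *p2 = &cur[ga_tournament(fit, pop) * n_genes];
            int cut = (int)(ga_rand01() * n_genes);           /* one-point crossover */
            for (int k = 0; k < n_genes; k++) {
                double gene = k < cut ? p1[k] : p2[k];
                if (ga_rand01() < 0.05)                       /* mutation */
                    gene += 0.2 * (ga_rand01() - 0.5);
                next[i * n_genes + k] = gene;
            }
        }
        double *swap = cur; cur = next; next = swap;
    }
done:
    free(cur);
    free(next);
    free(fit);
    return best_fit;
}
```

In this sketch the inner loop that fills fit[] is the part that corresponds to the expensive image filtering step, since it evaluates one independent fitness value per individual.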

OpenMP (Open Multi Processing)
OpenMP is an application programming interface (API) that enables parallel computing on multicore processors. On multicore processor architectures, OpenMP distributes the computations evenly across all cores [13].
The OpenMP architecture is shown in Fig. 4. OpenMP consists of compiler directives, a runtime library and environment variables. Programmers mark the code that is to run concurrently by inserting special compiler directives into it, for instance "#pragma omp parallel". In this study, the "for" block that calculates the fitness values was parallelized. The operation diagram of OpenMP is shown in Fig. 5. OpenMP designates one of the threads as the main thread, and the main thread distributes the tasks in equal amounts to the other threads. Because this study was developed in the C programming language, "#include <omp.h>" is added to the project in order to use the OpenMP functions [15]. Fig. 6 shows, on the flow chart, the part of the genetic algorithm that is parallelized. This block calculates the fitness function values and contains computationally intensive mathematical operations. Therefore, the parallelization is performed here.
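A parallelized fitness loop of the kind described above might look like the following sketch. It is an assumption-laden illustration rather than the code of the study: the names (train_data, mse_fitness, evaluate_population) are hypothetical, and for brevity the fitness filters a one-dimensional signal with a single weight vector and zero padding instead of running both stages of the separable image filter. The fitness is the mean squared error between the filtered noisy signal and the original. Each loop iteration writes only its own fit[i], so the iterations are independent and can be split across threads.

```c
#include <assert.h>

/* Training pair shared (read-only) by all threads; names are hypothetical. */
typedef struct {
    const double *original;
    const double *noisy;
    int length;
} train_data;

/* Mean squared error between the original signal and the noisy signal
   filtered with the given weights (zero padding at the ends). */
double mse_fitness(const double *weights, int n_weights, const train_data *data)
{
    assert(n_weights % 2 == 1);
    int r = n_weights / 2;
    double err = 0.0;
    for (int x = 0; x < data->length; x++) {
        double y = 0.0;
        for (int j = 0; j < n_weights; j++) {
            int s = x + j - r;
            if (s >= 0 && s < data->length)
                y += weights[j] * data->noisy[s];
        }
        double d = y - data->original[x];
        err += d * d;
    }
    return err / data->length;
}

/* Evaluates every individual of the population. The directive asks OpenMP to
   divide the iterations in equal chunks among the threads; without OpenMP
   support the pragma is ignored and the loop simply runs serially. */
void evaluate_population(const double *pop, int pop_size, int n_weights,
                         const train_data *data, double *fit)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < pop_size; i++)
        fit[i] = mse_fitness(&pop[i * n_weights], n_weights, data);
}
```

With GCC the sketch is compiled with the -fopenmp flag; the number of threads can then be chosen through the OMP_NUM_THREADS environment variable or a call to omp_set_num_threads(), which is one way the number of working cores could be varied between runs.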

Experimental Results and Discussion
The experimental studies were carried out on a server computer with the Windows Server 2012 Essentials™ 64-bit operating system, dual Quad-Core AMD Opteron™ 2378 2.40 GHz processors and 18 GB of RAM. The algorithm is written in the C programming language. In the experiments, images of 256×256, 512×512 and 1024×1024 pixels are used [16]. Filter masks of sizes 3×3, 5×5 and 7×7 are used for the image filtering process. The developed algorithm was run 10 times for each image and, at the end of the runs, these [...] Table 1 and Table 2 show the computational times for the 3×3 window using populations of 100 and 200, respectively. Table 3 and Table 4 show the computational times for the 5×5 window using populations of 100 and 200, respectively. Table 5 and Table 6 show the computational times for the 7×7 window using populations of 100 and 200, respectively. All results show that the computational time decreases significantly as the number of cores increases. Figures 7a to 7f show graphical comparisons of the results. The best acceleration rates are obtained for the 256×256 image.
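The acceleration rate is not defined explicitly in the text. Assuming it is the usual parallel speed-up, it and the corresponding efficiency can be written as below, where T_1 is the computational time on one core and T_p the time on p cores (symbols introduced here only for illustration).

```latex
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}
```

Under this definition, ideal scaling corresponds to S(p) = p, that is, E(p) = 1.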

Conclusion
In this study, OpenMP-accelerated training of separable image filters was analysed. In the experiments, various kernel sizes, image sizes and population sizes were tested. According to the results, doubling the population size increases the speed-up values on average. Increasing the kernel size does not change the results much. In general, the results show significant acceleration over single-core running times. In future studies, results will be obtained on a machine with more cores to determine the efficiency limit with respect to the number of cores.