PARALLEL MESHLESS RADIAL BASIS FUNCTION COLLOCATION METHOD FOR NEUTRON DIFFUSION PROBLEMS

: The meshless global radial basis function (RBF) collocation method is widely used to model physical phenomena in science and engineering. The method produces highly accurate solutions with an exponential convergence rate. However, due to the global approximation structure of the method, dense node distributions lead to long computation times and hinder the applicability of the technique. In order to overcome this issue, this study proposes a parallel meshless global RBF collocation algorithm. The algorithm is applied to 2-D neutron diffusion problems. The multiquadric is used as the RBF. The algorithm is developed with Mathematica and eight virtual processors are used in calculations on a multicore computer with four physical cores. The method provides accurate numerical results in a stable manner. Parallel speedup increases with the number of processors up to five and seven processors for external and fission source problems, respectively. The speedup values are limited by the constrained resource sharing of the multicor e computer’s memory. On the other hand, significant time savings are achieved with parallel computation. For the four-group fission source problem, when 4316 interpolation nodes are employed, the utilization of seven processors instead of sequential computation decreases the computation time of the meshless approach by 716 s.


INTRODUCTION
Meshless methods emerged in the 1970s for solving astrophysics problems (Lucy, 1977).Since their appearance in the literature, these methods were used to numerically solve partial differential equations (PDE) encountered in different branches of physics and became an alternative to classical mesh-based approaches such as the finite element method (FEM).The fundamental difference between meshless and mesh-based methods is that the discretization nodes of a meshless approach can be generated without any preliminary definition.This paved the way for the development of more flexible algorithms than mesh-based methods.
Based on their formulation, meshless techniques are classified into three groups (Liu, 2010): (i) Within the strong-form methods, the shape functions are substituted into the PDE and boundary conditions.The RBF collocation method, which was first developed for the solution of hydrodynamics problems by Kansa (1986), is the most prominent member of this class.(ii) On the other hand, the governing equation is multiplied by a weight function and the resulting expression is integrated over the entire problem region when a weak formulation is adopted.The integration weakens the continuity requirements of both the shape and weight functions.The radial version of point interpolation (Liu and Gu, 2005), Element free Galerkin (EFG) (Belytschko et al., 1994), and meshless local Petrov-Galerkin (MLPG) (Atluri and Zhu, 1998) methods are the most well-known weak-form based meshless approaches.(iii) Finally, within the hybrid methods, strong and weak-form algorithms are used together for different subregions of the problem to benefit from the advantageous characteristics of the two main classes.
Weak-form methods are inherently more stable than strong-form methods.Duan (2008) carried out a comparative analysis between RBF based strong-and weak-form meshless algorithms and showed that the use of RBFs in a weak formulation rather than the strong-form approach reduces the solution matrix's condition number by one order of magnitude.On the other hand, the strong-form RBF collocation method is an exponentially convergent method and can yield highly accurate solutions with fewer discretization nodes than other meshless and meshbased methods (Li et al., 2003).Another important advantage of strong-form methods is that a background mesh is not required since there is no integration in their formulation procedures.Therefore, strong-form methods are truly meshless methods.
In the global RBF collocation approach, the computation times increase rapidly with the number of discretization nodes due to the fully populated structure of the collocation matrix.In this context, the use of parallel algorithms is a must to avoid long computation times.There exist several studies on the parallel implementations of meshless approaches.In one of the earliest works, the free mesh method was used in a parallel manner to solve an incompressible fluid flow problem (Shirazaki and Yagawa, 1999).Induced damage simulations of composites were carried out with the parallelized smoothed particle hydrodynamics (SPH) by Medina and Chen (2000).Parallel SPH was also used to model fluid flow problems with a high speedup performance (Ferrari et Crespo et al., 2015).The parallel reproducing kernel particle method (RKPM) was used to model structural mechanics problems (Danielson et al., 2000) and supersonic flow over a NACA airfoil (Günther et al., 2000).Essential boundary conditions were implemented on a single processor by Danielson et al. (2000) and Günther et al. (2000).A hierarchical enrichment method was proposed by Zhang et al. (2002) for the complete parallelization of the RKPM to carry out 3-D CFD studies.In another work, the parallel version of the partition of unity approach was used to solve elliptic PDEs (Griebel and Schweitzer, 2003).
3-D transient heat conduction problems were solved with a coupled method of fundamental and particular solutions approach by Ingber et al. (2004) where parallel domain decomposition was applied to stabilize the interpolation matrix and decrease the computation time.The EFG method was applied to heat transfer (Singh and Jain, 2005a) and fluid dynamics (Singh and Jain, 2005b) in a parallel manner.The computational efforts showed that parallelization improved the speedup and efficiency of calculation as the number of discretization nodes increased, where speedup is the ratio of wall clock times of sequential to parallel computation and computational efficiency is the ratio of speedup to number of processors.However, the efficiency decreased as more processors were employed in computations.Parallel RKPM was used for bulk forming simulation by Hu et al. (2007a), where the Taylor bar, back extrusion analysis, and wheel forging analysis problems were solved with the meshless method.Unlike the general trend of monotonic decrease with increasing processor number, the parallel computation efficiency had its lowest value when 16 processors were used in calculations for all the cases.The back extrusion and wheel forging cases were also studied with the parallel radial point interpolation approach (Hu et al., 2007b), and the results indicated that the computation efficiency decreased with the number of processors for the wheel forging problem, while a minimum efficiency was obtained with 32 processors for the back extrusion analysis.
The complexity of the parallel MLPG method were studied and compared with FEM and finite difference method (FDM) by Trobec et al. (2009).The 2-D time-dependent diffusion equation was solved, and parallel computation efficiencies of 70-80% were found in the numerical experiments.A parallel RBF interpolation algorithm having () complexity (i.e., the calculation load increases linearly with the number of interpolation nodes) and storage requirement was proposed by Yokota et al. (2010).Gaussian RBF was chosen as the radial function and using a supercomputer the authors reported parallel efficiencies of 84% and 79% for strong and weak scalings, respectively.A node pair-wise approach was proposed by Karatarakis et al. (2013) to decrease the computation time for constructing the stiffness matrix of the EFG.2and 3-D elasticity problems were solved in a parallel manner and significant speedup values were found in the numerical experiments.RBF collocation method was employed locally to model twophase incompressible fluid flow (Kelly et al., 2014).The results of sequential and parallel computations showed that speedup factors improved as the interpolation nodes were refined.The meshless finite point method (FPM) was compared with the FEM for modeling 3-D aerodynamics by Ortega et al. (2014).The FPM was parallelized and simulations were carried out on a multicore computer.The speedup of parallel computations was limited due to the constant memory bandwidth and forced sharing of processor resources.
The local RBF collocation method was used in a parallel manner to solve a natural convection problem (Kosec et al., 2014).The simulations showed that the computation time could be decreased significantly with the parallel implementation of the meshless approach.In another CFD application, a parallel least squares fit based meshless method was used to solve compressible flow problems (Ma et al., 2014), where Hilbert space filling curves were used to enhance the algorithm for randomly distributed nodes.Later, the same researchers used the parallel meshless dynamic point cloud method to model compressible flow problems with moving boundaries (Ma et al., 2015).The results of Ma et al. (2014) and Ma et al. (2015) showed that the computation time and speedup depend strongly on the type of processor.The approach presented by Ma et al. (2014) was further improved with an implicit method and extended to solve 3-D cases by Zhang et al. (2018a).
In order to model elasticity problems with meshless and hybrid FEM-meshless methods, a parallel algorithm is presented by Ullah et al. (2016).Parallelization of the methods decreased the computation time, however, the computation efficiency decreased as the number of processors increased.Inviscid compressible flow was modeled with the meshless weighted least squares curve fit (WLSCF) method by Cao et al. (2019).A parallel algorithm was used with multi-layered point reordering to deal with the negative impact of irregularly distributed nodes on the memory of the processor unit.In another work, the parallel meshless WLSCF method was used to model turbulent flows (Zhang et al., 2020), where high levels of speedup were obtained for 2-and 3-D problems.A functional programming based parallel algorithm for meshless methods was presented by Barbosa et al. (2021).The MLPG method was chosen as the meshless approach, and it was found that an increment in number of processors improved the speedup, while it had a negative impact on the computation efficiency.A Poisson disc sampling based parallel domain discretization approach was proposed by Depolli et al. (2022) for the hybrid RBF-FDM and meshless methods.In a recent work, the discrete least squares meshless method was used within a parallel computation strategy for the numerical solution of PDEs (Sefidgar et al., 2022).Poisson equation, dam-break, and sediment transport problems were studied, and it was found that the parallel computation efficiency decreased with an increasing number of processors.Finally, the meshless solutions of neutron diffusion and transport equations were presented by Rokrok et al.The literature review showed that there exist no works on parallelization of the global RBF collocation.Therefore, the novelties of this paper can be expressed as follows:  In this study, a parallel meshless global RBF collocation algorithm is proposed for the first time. The method is used to solve the 2-D multigroup neutron diffusion problems.To the best of author's knowledge, the parallel global RBF collocation method is implemented to model neutron diffusion for the first time. Three problems are solved with the developed parallel technique, and the impacts of the number of interpolation nodes and processors on the speedup and computational efficiency of the algorithm are investigated in detail.

MATERIAL AND METHODS
In this section, first, the meshless global RBF collocation method and its application to the multigroup neutron diffusion equation will be described.Then, information on the parallel implementation of the proposed algorithm within Mathematica will be provided.

Global RBF Collocation Solution of the Neutron Diffusion Equation
Consider a PDE and its boundary conditions in operator form: 1),  is a partial differential operator,  is a partial differential and/or algebraic operator,  is the dependent variable,  and  are driving functions, and  = (, ) are spatial Cartesian coordinates.In order to solve Eq. ( 1) with RBF collocation, firstly, interpolation nodes with   members on the domain and   members on the boundary are generated as where  and  are the sets of interpolation nodes for  and , respectively.When the problem includes Neumann type boundary conditions, method's accuracy can be enhanced by creating interpolation nodes outside the problem region, and solving the PDE on the boundary (Fedoseyev et al., 2002).This approach is used in this study and , with   members, represents the interpolation nodes generated outside the problem domain.Hence the total number of interpolation nodes is   =   +   +   .The nodes can have a uniform or random distribution.
Following the generation of the interpolation nodes, the field variable of the PDE is approximated with a finite series of RBFs: In Eq. ( 3),  is the RBF and the coefficients   ,  = 1, … ,   are calculated at the end of the numerical solution.Approximating the field variable of Eq. ( 1) with Eq. ( 3), and collocating at   gives ∑   (  ) where   = (  ),   = (  ),   = (  ) and the elements of the collocation matrix , and the vectors  and  are determined by The solution of Eq. ( 5) yields the coefficients   ,  = 1, … ,   , and thus the numerical result.Interpolation nodes can be used as the collocation nodes or a different set of nodes can be generated for collocation.In this work, the same nodes are utilized for both interpolation and collocation.The collocation matrix  is a fully populated matrix in the global RBF collocation method.By defining a support domain for all the interpolation nodes that cover the nearest neighbours of   , the approach can be localized.The localization of the method leads to a sparse collocation matrix with higher stability, however, it also degrades the accuracy and the convergence characteristics of the global method.
Although various radial functions have been proposed, the most widely utilized radial function for RBF collocation is the multiquadric function, which was first proposed by Hardy (1971) for approximating geographic surfaces.Multiquadric is defined by In Eq. ( 7),  is the shape parameter and has a substantial impact on the numerical solution.Theoretically, it is shown by Madych (1992) that for function interpolation, the numerical error would vanish as  → ∞ if the calculation could be performed with infinite precision arithmetic.In practice, increasing  decreases the approximation error and leads to a faster convergence, however, it also has an adverse effect on the stability.In fact, the shape parameter has an optimum value that provides a balance between accuracy and stability.
The neutron diffusion equation is a PDE that governs the interaction of neutrons with various materials in a nuclear reactor core.Modeling this behavior in an accurate and computationally efficient manner has a crucial role in the design of nuclear power plants.Within the multi-energy group approach having  energy groups, the operators, field variable, and right-hand side functions of Eq. ( 1) take the following form for the time-independent neutron diffusion equation in 2-D rectangular geometry: where   , is the neutron flux distribution,   is the diffusion constant, Σ , is the removal cross section,   is the fission spectrum function,   is the number of neutrons released per fission for group , Σ , ′ → is a group-to-group scattering cross section representing the probability that a neutron would scatter from group  ′ down to group ,  is the multiplication factor, ∇ 2 is the Laplacian,   ⁄ represents the derivative in the normal direction, and  is the iteration index.  and   represent the Dirichlet and Neumann type boundaries, respectively.Two types of problems are encountered in nuclear reactor physics.If there is no fissile material (i.e.nonmultiplying medium), the neutrons are supplied by an external source,   , and the neutron diffusion equation is solved directly with  =   .On the other hand, if the medium is multiplying, then the coupled  PDEs are solved with fission source iteration (Tanbay and Ozgener, 2014).

Parallel Computation with Mathematica
The parallel meshless RBF collocation method is implemented with an in-house code that is newly developed with Mathematica.Calculations are performed with a personal computer having a multicore Intel i5 1.60GHz processor with 4 cores and 8 threads.The processor architecture is x86-64, and the random access memory capacity is 8GB.
Mathematica uses the Wolfram Symbolic Transfer Protocol (WSTP) for processor communication.Resources are shared through a virtual shared memory and the message passing is based on the WSTP.By default, the number of processors utilized in calculations is equivalent to the number of physical cores.However, additional virtual processors can be activated by using the built-in function LaunchKernels[  ] where   represents the desired number of virtual processors (kernels in Mathematica's terminology).Since the processor has 8 threads, in this work 8 kernels are used in calculations.
RBF collocation solution of a PDE consists of three main steps: 1) Generating the discretization nodes of Eq. ( 2) utilized for interpolation and collocation 2) Calculating the elements of the collocation matrix,   and the elements of the source vector,   of Eq. ( 6) 3) Solving Eq. ( 5) to obtain   The generation of the discretization nodes and the calculation of   and   with the developed parallel algorithm are performed with the built-in function ParallelTable.The arguments computed with the ParallelTable function are distributed to the kernels with another built-in function, DistributeDefinitions.The linear system is solved with the LinearSolve function, which employs efficient algorithms and does not have an effect on the parallel computation efficiency.

RESULTS AND DISCUSSION
The performance of the parallel RBF collocation algorithm is investigated through the solution of three test cases.A single energy group external source case, and two multigroup fission source cases are considered.Dirichlet type boundary conditions exist at  ∈ (, ) ∪ (, ), while Neumann type boundary conditions exist at  ∈ (0, ) ∪ (, 0) for all the problems.Uniformly distributed interpolation nodes with a spacing of ℎ are used, and the members of  are located at a distance of ℎ from the boundaries.Parallel speedup and efficiency are defined as while the accuracy of fission source problems is examined with the relative error in multiplication factor where  and  refer to numerical and analytical, respectively.The performance of the method can be studied with both constant and node number dependent shape parameter strategies.With the variable shape parameter strategy,  is decreased with the number of nodes as  =    1 2 ⁄ ⁄ .Decreasing  with   has a stabilizing effect on the method and it was efficiently used for neutron transport equation (Tanbay and Ozgener, 2020).

External Source Problem
The following trigonometric source is considered for the external source case which yields an analytical neutron flux distribution expressed by where Σ  = Σ  .The geometric and nuclear parameters are chosen as  = 0.25, Σ  = 1.43676  −1 , and  = 0.0177764 (Tanbay and Ozgener, 2014).Figure 1 shows the variation of RMS and maximum errors in neutron flux with the number of interpolation nodes on a semi-logarithmic scale.These results are obtained with a constant shape parameter of  = √0.01.The convergence curves indicate that the parallel RBF collocation algorithm produces accurate flux distributions in a stable manner.

Two-group Fission Source Problem
For the two-group problem, the domain size is  = 0.25 and the nuclear constants are presented in Table 1, which yield a multiplication factor of   = 1.96413.A convergence parameter of 10 −6 is used for the iterative solution.Figure 3

Fission Source Problem
The third and final case is a four-group problem for which  = 0.5.The nuclear parameters are provided in Table 2, where  and Σ have dimensions of  and  −1 , respectively.These data yield an exact multiplication factor of   = 0.87227.The numerical solutions are obtained with 13 iterations where the iteration convergence parameter is 10 −6 .The parallel approach produces accurate and stable results as illustrated with the convergence curve in Figure 5, which is obtained with  =    1 2 ⁄ ⁄ .

Figure 5: Convergence of the four-group fission source problem
The wall clock time, speedup, and computational efficiency of the parallel algorithm for the four-group case are presented in Table 3 for three sets of interpolation nodes.The speedup of the method improves as   increases except at   = 8 where overhead occurs due to communication.Although the speedup is limited by the multicore processor, the results show the significance of parallel computation.For instance, the wall clock time of the RBF collocation decreased by 716 s for   = 4356 when 7 virtual processors are utilized instead of sequential computation.

CONCLUSIONS
The global radial basis function collocation method is an exponentially convergent and highly accurate technique to solve partial differential equations.However, the fully populated collocation matrix of the approach increases the computation time in a fast manner as the interpolation node set gets denser.Therefore, parallelization of the method is imperative for cases that require a large number of discretization nodes, such as multi-region configurations where the dependent variable may change sharply between different regions, to reflect the physical phenomena accurately.
In this work, a parallel meshless global RBF collocation algorithm is developed and it is utilized for the solution of the 2-D neutron diffusion equation.Multiquadric is employed as the RBF and three problems, including a one-group external source case, a two-group fission source case and a four-group fission source case, are solved with the proposed algorithm.The speedup and computation efficiency are used as the criteria to assess the performance of parallelization while maximum and RMS errors in neutron flux distribution and relative error in multiplication factor are considered to measure accuracy.The RBF collocation method produces accurate and stable numerical solutions.Eight virtual processors are used in the numerical experiments, and the results show that the speedup of the method improves up to five and seven processors for the one-group external and multigroup fission source problems, respectively.Although speedup values are limited by the memory constraint of the multicore computer, the parallel approach provides significant time savings for CPU-intensive problems.
In recent years, graphics processing units (GPU) become indispensable hardware elements for modeling complex physical phenomena.Parallelization of meshless methods with the aid of GPU has resulted in high levels of speedup and efficiency in computational fluid dynamics applications.Therefore, future works can be performed on implementing the parallel global RBF collocation method with GPU clusters.High performance computation with this parallel meshless technique has the potential to produce exceptionally accurate solutions for 3-D, multi-region nuclear reactor physics problems.

Figure 1 :
Figure 1: Variation of   and   with   for the external source case Speedup and efficiency of parallel computation of the one-energy group external source problem are presented in Figure 2 for four sets of interpolation nodes.Speedup of the computation improved with the number of processors up to   = 5 for all values of   .The communication between the processors becomes apparent for   ≥ 6 which leads to an asymptotic behavior of the speedup.Similar to the findings of Ortega et al. (2014), the speedup is limited due to the fact that a multicore computer was used for the calculations.The efficiency of the parallel algorithm decreases as   increases, and such a trend is frequently observed in parallel implementations of numerical methods.

Figure 2 :
a. Speedup and b. efficiency of parallel computation for the external source case demonstrates the variation of   with   where  =    1 2 ⁄ ⁄ , and the results are found with 29 iterations.The error values clearly display the accuracy and stability of the meshless algorithm.

Figure 3 :
Figure 3: Convergence of the two-group fission source problem