Comparison of point cloud filtering methods with data acquired by photogrammetric method and RGB-D sensors


INTRODUCTION
3D point clouds (PCs) play an important role in creating and rendering solid models of physical objects. PC processing is an active research field because PCs are used in many applications, such as 3D reconstruction (Ahmadabadian et al. 2019), environmental mapping, signal processing (Aghababaee et al. 2019), object recognition (Garcia-Garcia et al. 2018), pose estimation (Vock et al. 2019), and drainage network determination (Gunen et al. 2019). 3D reconstruction applications are increasing as the cost of computing platforms falls and 3D capture systems improve. Various technologies based on different principles have been developed for acquiring highly accurate PCs of the physical structure of objects. Despite these advances, PCs suffer from noise caused by instantaneous changes in atmospheric physical parameters and by noise sources inherent in the 3D capture method and equipment used. Therefore, in order to produce high-accuracy digital models of physical objects, the various noise types that contaminate PCs should be filtered (Hou et al. 2012; Narváez and Narváez 2006).
RGB-D sensors, photogrammetric methods, and Terrestrial Laser Scanners (TLS) are the most commonly used PC acquisition methods and can be divided into two groups: active and passive methods. TLS and RGB-D sensors are active methods, whereas optical photogrammetric methods are passive (Oliveira et al. 2014). These three methods have different technical structures. A TLS measures 3D spatial coordinates in a local coordinate system from the direction and distance of the object being measured. A TLS can also capture millions of points per second and efficiently generate a 3D PC of a large area in a short period. Although TLSs produce PCs with high accuracy and precision, they have a high investment cost.
Photogrammetric methods determine the parallax between corresponding points in images of the scene and use it, together with the extrinsic and intrinsic orientation parameters, to obtain the spatial coordinates of those points. Photogrammetric methods have been rapidly increasing in popularity due to progress in imaging technology and software (Ulvi 2018). RGB-D sensors are compact systems consisting of an infrared camera and an RGB camera. RGB-D sensors can therefore provide texture, as some TLS and photogrammetric methods do, as well as a depth map. RGB-D sensors are widely used in the production of indoor maps, particularly because of their programmable structure and low cost (Amenta, 1999; Hoppe et al. 1992; Tölgyessy and Hubinský 2011).
PCs inevitably contain noise and outliers due to sensor limitations, the imperfect nature of the instruments, scene artifacts, inadequate ambient conditions, and systematic errors. Whatever system produced the PC, the sampled discrete information should be processed to remove the noise (Wolff et al. 2016). The raw PC should be filtered to enable further analysis and processing. In addition, PC filtering is employed to preserve existing details expressed by the PC, such as edge features, and to obtain the smooth surfaces that are required to produce realistic digital models of physical objects (Cai et al., 2019). A raw PC is very difficult for humans to recognize and interpret. For this reason, PCs can be converted, using mesh models, into solid model surfaces made up of the differential surface elements of the mesh and the edges of those surfaces.
In recent years, many 3D filtering methods have been developed for denoising PCs. In the literature, noise suppression has generally followed two approaches: processing the data as a PC, and processing the data as differential surface elements (Fleishman et al. 2003). Both approaches exploit the topological relationships between vertices and their neighborhoods, and both move vertices according to certain criteria. Most of these methods are applied to meshes; fewer are applied directly to PCs. PC filtering methods can be broadly divided into neighborhood-based, statistics-based, and projection-based methods. Neighborhood-based methods, which use similarity information between a point and its neighbors, are the most widely used because they are effective and simple (Han et al., 2017).
Gaussian Filtering (GF) computes the Euclidean distances between the point of interest and its neighbors. The point is then filtered using Gaussian weights derived from these distances (Adams et al. 2009; Wirjadi and Breuel 2005). In Median Filtering, the neighbors of the point of interest are determined by distance; the median point is then projected onto a local plane and filtering is performed. The Moving Least Squares method is based on solving for the parameters of localized polynomial surfaces fitted to local measurement values. Average filtering is generally based on computing the normalized mean of the local normal vectors of the points adjacent to the point of interest and then moving that point towards the local surface defined by the adjacent points (Gunen, 2017). Shepard Inverse Distance Weighting (IDW) filtering, one of the basic methods for filtering PCs, projects each point in the PC using a limited number of selected neighboring points. Fluctuating surfaces can be defined by the tensor products of the basis functions (Babak and Deutsch 2009; Lu and Wong 2008). Plane-based filtering methods, such as Singular Value Decomposition (SVD) Based Plane Fitting, produce fast results; however, they are not robust to noisy data. Evolutionary computing methods generally give better results than classical local plane fitting tools, such as the least-squares method, in solving the best plane fitting problem (Gunen, 2017; Kurban, 2014).
Evolutionary computation methods are stochastic search methods that are used effectively in solving different types of problems. The fact that they produce more successful results than classical methods in solving complex problems such as PC filtering motivated the design of a new evolutionary computation-based 3D spatial filter.
The PC datasets used in this paper were produced using a TLS, an RGB-D sensor, and the photogrammetric method. The PC produced by the TLS was taken as the reference data because of its inherent properties. The Shepard Inverse Distance Weighting Method (IDW), the Gaussian Filtering Method (GF), the Singular Value Decomposition Based Filtering Method (SVD), and the Optimization Based Plane Fitting Method using the Backtracking Search Optimization Algorithm (BSA) were applied, with different numbers of neighbors, to filter the PCs produced by the RGB-D sensor and the photogrammetric method.
The rest of this paper is organized as follows: Section 2 presents Data Collection, Section 3 presents Material and Methods, and Section 4 gives Experimental Results and Discussion.

DATA COLLECTION
In this section, general 3D data capture principles of the TLS, RGB-D, and Photogrammetric techniques have been analyzed comparatively.

Terrestrial Laser Scanner
In recent years, laser technology has reached a very advanced level. Capable of capturing thousands of points per second, TLSs can produce data at the desired quality and speed, from small objects to large areas, by day or by night. In addition to the brand and model of a TLS device using Light Detection and Ranging (LIDAR) technology, the resolution and quality parameters used in scanning affect the spatial coordinates of the PC (Sevgen 2019; Yu et al. 2004). TLSs are expensive because of their high equipment requirements, the need for specialized software knowledge, and the need for skilled employees (Yu et al. 2004). Since the PC of an object is created in more than one session, the sessions must be transformed into a global or common local coordinate system. The random sample consensus (RANSAC)-based Iterative Closest Point (ICP) method is generally used for registration or geo-referencing of the PCs from different sessions (Altuntas, 2015). It selects random points on the different PCs so that corresponding points can be searched for, and finalizes the registration according to the determined criteria (Altuntas, 2015). In addition to being fast and reliable, the RANSAC-based method is preferred because it works by sampling large data sets.
A Faro Focus3D X130 TLS was used to obtain the PCs because it offers versatile measurement, a wide range of solutions, and colored PCs. Its light weight, integrated structure, advanced distance measurement capability, and intuitive operating system make it suitable for work requiring precision. It can scan 976,000 points per second at ranges of up to 130 meters, and its integrated camera captures the scanned scene as 70 MP 8-bit RGB images. Each model used in the application was scanned in six different sessions from various directions and heights (URL, 2019).

Photogrammetric Methods
The photogrammetric method acquires a 3D PC from sequential, overlapping 2D images. Multiple images obtained from different angles are used to produce 3D information (Javernick et al. 2014; Tercan, 2017). There are several methods for producing a PC from satellite, aerial, and close-range images with multiple views. Structure from Motion (SfM) is a method that provides high success and accuracy. SfM is a remote sensing method that produces the 3D spatial coordinates of objects using the color information of randomly ordered multiple-view images. PC production begins with an optimal measurement design. In other words, to obtain the best results as quickly as possible, it is necessary to understand the system and technique used to capture the images (Li et al. 2012; Ulvi 2018). Much of the software uses key points in multiple images to determine the relative orientation of the camera. The key points are usually determined with the Scale-Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF) local feature detectors (Juan and Gwon 2009). Corresponding points are then matched from the key points by methods such as the RANSAC algorithm. Key point determination is very sensitive to noise; therefore, the results depend on the spatial and radiometric resolution of the images. These points are also necessary for constructing the epipolar geometry. After the epipolar geometry is constructed, the orientation of the sequential cameras relative to each other is computed and a dense point cloud is determined up to scale. When capturing images of the object, the direction of the lighting and the overlap rate between stereo image pairs affect the data quality (Xiang and Cheong 2003).
In cases where the image overlap rates are too low and there are extreme differences between the image scales, SfM may not produce a sufficient result (Doğan and Yakar 2018; Javernick et al. 2014). For better image matching, scene images should be captured with a fixed lens at as high a spatial resolution as possible. A Sony Alpha ILCE-A6000, a semi-professional mirrorless camera with a fixed lens, was used to capture the images. It is a compact system that can shoot at a resolution of 6000x4000 and has a 24.3 megapixel CMOS sensor measuring 23.5x15.6 mm. In addition, its advanced image processor and autofocus (AF) system produce images with less aliasing in moving scenes. 109 images were used to produce Model 1, as seen in Figure 1.b, and 118 images were used to produce Model 2, as seen in Figure 1.e.

RGB-D sensor
The use of RGB-D sensors in 3D reconstruction applications in computer graphics and computer vision has grown rapidly in recent decades. The RGB-D sensor, originally developed for human-computer interaction, is now used by different disciplines together with its Software Development Kit (SDK). It has attracted great research attention because of its low cost, easy accessibility, efficiency in 3D reconstruction, and use in Simultaneous Localization and Mapping (SLAM) applications (Stückler et al. 2015). The RGB-D sensor, a time-of-flight-based depth camera, consists of an infrared (IR) depth sensor, an IR emitter, and an RGB camera. These lightweight sensors provide color and depth per pixel at sufficient resolution. Red, green, and blue CMOS sensors are used for the RGB imagery. The depth map is produced by the IR camera, in which the distance between the object and the sensor is recorded as a pixel value on a pseudo-distance scale. Producing a depth map is essential because the depth information recorded in its pixels is the basis of the PC. Since sequential still images are captured in SLAM applications, various methods have been developed for producing a PC or model simultaneously. Two main methods, image-based and shape-based, are used to generate PCs with an RGB-D sensor (Nyarko et al. 2018). The PC produced from the depth map of each sequential frame has its own local coordinate system. In the shape-based method, PC registration is performed using RANSAC-based ICP between sequential PCs because of the efficiency and reliability of that method. In the image-based method, pose estimation is performed with the help of epipolar geometry, which forms the basis of photogrammetry. To do this, key points are first determined in the sequential images, and the corresponding points are then determined by RANSAC-based methods. The pose estimation process is completed using these corresponding points.
In both methods, because processing is simultaneous, rapid movement of the sensor or sudden displacement of the object prevents the calculation of the homography between the PCs and causes mismatches (Stückler et al. 2015). PCs obtained with RGB-D sensors contain noise that depends on the texture of the object surface, the lighting conditions, the viewing angle, sensor limitations, and the distance to the object. Therefore, filters such as the Kalman filter are adapted to the sensors, or the PC generated from the depth map is filtered to remove potential noise (Jia et al. 2019). In this paper, a Kinect 2 RGB-D sensor is used. This sensor can capture 30 frames per second at a 1920x1080 spatial resolution. The sensor, which operates effectively for SLAM at distances between 0.5 and 4.5 meters, can produce a depth map at 512x424 spatial resolution.

MATERIAL and METHODS
The noise level of PCs significantly affects the accuracy of reconstructed models. In order to increase the model accuracy, a controlled filtering process should be used. Filtering may remove noisy points from the PC entirely, or it may correct or suppress the noise in the points that represent the PC. This paper focuses on increasing the quality of the models obtained from different methods and on removing noisy data from the PC. The IDW, GF, SVD Based Plane Fitting, and BSA-driven Optimization Based Plane Fitting methods were used to remove noise.
In practice, the test models (Model 1 and Model 2) in Figure 1 obtained using TLS are treated as error-free reference data, on the assumption that they contain little noise because they were captured at close range. The PCs of the models obtained from the photogrammetric method and the RGB-D sensor were filtered, and the results were then compared with the reference data. The same lighting conditions were provided while the models were obtained with the different methods. Since each model is produced in its own local coordinate system, the models were brought into the same coordinate system using the RANSAC-based ICP method. In Model 1, the photogrammetric method produced 124,211 points, TLS produced 259,726 points, and the RGB-D sensor produced 100,038 points. In Model 2, the photogrammetric method produced 354,254 points, TLS produced 444,404 points, and the RGB-D sensor produced 184,768 points. Although both models were recorded with the three methods under equal lighting conditions in the laboratory environment, the methods produced different colors due to system characteristics.

Gaussian Filtering Method
With the development of the computational capabilities of computers, Gaussian filters, which require high computational power, can be applied to PCs. The Gaussian filter is a low-pass filter that uses Gaussian functions to produce its result. Although Gaussian filters cause loss of detail in the data, they are fast and simple. There are also filters derived from the Gaussian filter, such as the bilateral filter, which were developed to limit the loss of data caused by the Gaussian filter. Let k be the set of nearest neighboring points of the vertex $f(x, y, z)$ (Adams et al., 2009; Tercan 2018; Wirjadi and Breuel 2005). The Euclidean distances between this vertex and its nearest neighbors are calculated using Equation (1).
The distance values calculated using Equation (1) are used to compute the Gaussian weights in Equation (2). As the $\sigma$ value in Equation (2) changes, the resulting solid model changes. Although determining this value requires expertise and experience, a visually suitable value can be estimated by applying statistical tests. In this paper, the optimum $\sigma$ value was selected experimentally as 0.4 mm. The corresponding vertex is filtered by fusing the positions of the k vertices with the weights obtained from Equation (2), as expressed in Equation (3). Better results in 3D Gaussian filtering may be obtained by changing the confidence interval used in Equation (3) (Adams et al., 2009; Tercan, 2018; Wirjadi and Breuel, 2005). When the Gaussian filtering results in Figure 4 and Figure 5 are examined, the photogrammetric data show more detail and more noise than the RGB-D data. This is because the RGB-D data have lower accuracy and density than the photogrammetric data. Similarly, the filtered photogrammetric data are closer to the reference data than the filtered RGB-D data.
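The weighting scheme described in this subsection can be sketched in a few lines of Python. This is only an illustration: it assumes the common kernel form w_i = exp(-d_i^2 / (2 sigma^2)), which may differ from the exact form of Equations (1) to (3), and the function name `gaussian_filter_point` and the brute-force neighbor search are hypothetical choices made for the example.

```python
import math

def gaussian_filter_point(point, cloud, k=10, sigma=0.4):
    """Gaussian-weighted mean of the k nearest neighbors of `point`.

    Assumption: weights are exp(-d^2 / (2 * sigma^2)), with sigma in the
    same units as the coordinates (0.4 mirrors the value used in the text).
    """
    # Euclidean distance from the vertex to every candidate neighbor
    neighbors = sorted((math.dist(point, q), q) for q in cloud)[:k]
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d, _ in neighbors]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed (neighbors far away relative to sigma):
        # leave the vertex where it is.
        return tuple(point)
    # Fuse the neighbor positions with the normalized weights
    return tuple(
        sum(w * q[axis] for w, (_, q) in zip(weights, neighbors)) / total
        for axis in range(3)
    )
```

A smaller `sigma` concentrates the weight on the closest neighbors and preserves more detail, while a larger value smooths more strongly.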

The Shepard Inverse Distance Weighting Method
The Shepard Inverse Distance Weighting (IDW) method is based on giving more weight to close neighbors than to distant ones. When a point selected within the PC is filtered, using points closer to that point increases the quality of filtering, on the principle that a point is more strongly influenced by what is close to it. The IDW method, which is deterministic, is used to suppress peak and pit noise. However, as the number of iterations increases, this filter tends to distort the natural form of the model and causes shrinkage in the data. The weighting strategy used in the IDW filtering method is defined using Equation (4). Here, d is the Euclidean distance between the vertex to be filtered and its neighbor, and w is the weight value. The p in Equation (4) is known as the power parameter and is usually taken as 2. The Euclidean distances between the point to be filtered and its neighboring points are calculated using Equation (5). The points are then weighted according to the calculated distances (Babak and Deutsch, 2009; Lu and Wong, 2008).
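As a minimal sketch of inverse distance weighting for a single vertex, the example below uses the standard Shepard weights w_i = 1 / d_i^p and a normalized weighted mean. The function name `idw_filter_point`, the brute-force neighbor search, and the choice to skip zero-distance points (so the vertex itself is not counted as its own neighbor) are assumptions made for illustration and are not taken from the paper.

```python
import math

def idw_filter_point(point, cloud, k=10, power=2):
    """Inverse-distance-weighted mean of the k nearest neighbors of `point`."""
    # Distances to all candidates; zero-distance points are skipped so the
    # vertex does not dominate its own estimate (an assumption of this sketch).
    candidates = [(math.dist(point, q), q) for q in cloud]
    neighbors = sorted(c for c in candidates if c[0] > 0.0)[:k]
    if not neighbors:
        return tuple(point)
    # Shepard weights: closer neighbors receive larger weights
    weights = [1.0 / d ** power for d, _ in neighbors]
    total = sum(weights)
    return tuple(
        sum(w * q[axis] for w, (_, q) in zip(weights, neighbors)) / total
        for axis in range(3)
    )
```

With `power=2`, a neighbor at half the distance receives four times the weight, which is why larger power values make the result depend almost entirely on the nearest points.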

Singular Value Decomposition Based Plane Fitting Method
The SVD method yields a matrix factorization that is used in many areas, such as dimensionality reduction, plane fitting, and feature extraction. Using the SVD method, an (n, n) matrix X can be factorized as in Equation (7). A PC contains 3D spatial information, and the third principal component has the lowest variance. That is, the third eigenvector is approximately the normal direction of the local plane obtained from the nearest neighborhood of the filtered point (Golub and Reinsch, 1971; Kurban 2014).
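For orientation, the standard form of the factorization and the resulting plane normal can be written as follows. The notation is assumed here rather than taken from the paper: $q_i$ are the k neighbors, $\bar{q}$ is their centroid, and X is taken to be the k-by-3 matrix of centered neighbors, which is one common setup and may differ from the (n, n) formulation mentioned above.

```latex
\[
X =
\begin{bmatrix}
(q_1 - \bar{q})^{T} \\
\vdots \\
(q_k - \bar{q})^{T}
\end{bmatrix}
\in \mathbb{R}^{k \times 3},
\qquad
X = U \Sigma V^{T},
\qquad
V = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix},
\quad
\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge 0
\]
\[
n \approx v_3,
\qquad
n^{T} (r - \bar{q}) = 0
\]
```

Under this setup, $v_3$ is the direction of least variance of the neighborhood, so it serves as the estimated normal $n$ of the local plane passing through the centroid.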

Optimization Based Plane Fitting Method
Plane fitting is a well-studied problem in the literature. In the plane fitting problem, a vector, p, is formed from the nearest neighbors of each point to be filtered, and a local plane is fitted to these points. The point to be filtered is then projected onto the local plane. The parameters representing the plane can be calculated with the least-squares method, the SVD method, the Levenberg-Marquardt method, or evolutionary computation tools (Bellekens et al. 2014; Civicioglu, 2013; Civicioglu et al. 2020; Gunen et al. 2020). The objective function used to obtain the coefficients of the local plane represented by the p vector is given in Equation (9).
In order to obtain the projected point, $r_{(x_0, y_0, z_0)}$, on the local plane, the parametric equation can be used. Here, $r_{(u, v, w)}$ is the point to be filtered. The parametric equation can be converted to Equation (11).
From here, Equation (12) is obtained.
The distance between the points $r_{(u, v, w)}$ and $r_{(x_0, y_0, z_0)}$ is calculated using Equation (15).
Then the coordinates of the projected point $r_{(x_0, y_0, z_0)}$ are obtained.
The steps of the Optimization Based Plane Fitting solution are given below, where $ax + by + cz + d = 0$ with $c = 1$ represents the local plane.
1. Set the vector, p, which consists of the neighboring points of the point to be filtered.
Evolutionary computation methods can provide more consistent solutions for plane fitting problems than classical methods. They are also used to solve non-linear, non-differentiable complex problems, and they are not easily trapped in local solutions (Tercan et al. 2020). In this paper, the Backtracking Search Optimization Algorithm (BSA) was used to solve for the parameters of the local plane.
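The projection of a point onto the local plane described in this subsection can be sketched as follows. This uses the standard orthogonal projection onto the plane ax + by + cz + d = 0 along its normal; the function name `project_onto_plane` and its argument layout are hypothetical, and the intermediate steps are not claimed to match the paper's numbered equations.

```python
import math

def project_onto_plane(point, plane):
    """Orthogonally project `point` = (u, v, w) onto a*x + b*y + c*z + d = 0.

    Returns the projected point (x0, y0, z0) and the point-to-plane distance.
    """
    u, v, w = point
    a, b, c, d = plane
    norm_sq = a * a + b * b + c * c
    residual = a * u + b * v + c * w + d
    # Parameter of the line through the point along the plane normal
    t = -residual / norm_sq
    projected = (u + a * t, v + b * t, w + c * t)
    distance = abs(residual) / math.sqrt(norm_sq)
    return projected, distance
```

For example, with the plane z = 0 written as `(0, 0, 1, 0)`, the point `(1, 2, 3)` is moved straight down to `(1, 2, 0)`, and the reported distance is 3.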
BSA (Civicioglu, 2013) is an evolutionary search algorithm developed to solve real-valued optimization problems. Compared to various evolutionary algorithms, BSA produces simpler results for problems such as surface fitting. It has a single control parameter, the mixrate, which controls the crossover process, and its problem-solving performance is not overly sensitive to the initial value of this parameter. When creating new populations, it uses crossover and mutation operators as in the classical differential search algorithm. The search strategy and boundary control that it uses when creating a new population give it very powerful exploration and exploitation capabilities (Civicioglu, 2013). In this experiment, the dimension of the pattern matrix was set to 50. The stopping conditions are given below:
1. Stop when the maximum number of iterations, 500, is reached.
2. Stop if a better solution has not been obtained in the last 20 function evaluations.
3. Stop if the absolute value of the solution obtained by the algorithm is less than $10^{-16}$.
The BSA pseudo code is given in Figure 2.
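To make the optimization setup more concrete, the sketch below shows one plausible objective for the plane a*x + b*y + z + d = 0 (that is, c = 1) together with a check of the three stopping conditions listed in this section. It is not the BSA itself and is not claimed to match Equation (9): the sum of squared orthogonal distances is an assumed objective, and the names `plane_fit_objective` and `should_stop` are hypothetical.

```python
def plane_fit_objective(params, neighbors):
    """Sum of squared orthogonal distances from the neighbors to the plane
    a*x + b*y + z + d = 0 (assumed objective, with c fixed to 1)."""
    a, b, d = params
    norm_sq = a * a + b * b + 1.0
    return sum((a * x + b * y + z + d) ** 2 for x, y, z in neighbors) / norm_sq

def should_stop(iteration, evals_since_improvement, best_value,
                max_iter=500, patience=20, tol=1e-16):
    """Return True when any of the three stopping conditions is met."""
    return (
        iteration >= max_iter                   # 1. iteration budget used up
        or evals_since_improvement >= patience  # 2. no recent improvement
        or abs(best_value) < tol                # 3. objective is effectively zero
    )
```

An optimizer such as BSA would repeatedly propose candidate `(a, b, d)` triples, score them with the objective, and call a check like `should_stop` after each iteration.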

EXPERIMENTAL RESULTS and DISCUSSION
Geodetic measurement systems, by their nature, produce noisy data of various types and amplitudes, which are unpredictable. Where noise cannot be avoided for physical reasons, the most important ways of achieving reliable measurements are to produce statistical measures based on multiple observations or to filter the available measurements. Post-process filtering is more suitable because repeated measurement is not always possible. The data generated by the photogrammetric method and the RGB-D sensor were filtered to take the noise level of each instrument into account. The filtered data were compared with the TLS data, and the average error was computed for each filtering method while the number of neighbors was varied. Figure 3.a and Figure 3.b show the Model 1 results for the photogrammetric method and the RGB-D sensor, respectively. When the two figures are examined together, the average filtering error for the RGB-D sensor is greater. The Gaussian filter was the least successful method on the photogrammetric data, whereas the SVD method was the worst on the RGB-D data, because the planes created in the SVD method are highly affected by noise. Figure 3.c and Figure 3.d show the Model 2 results for the photogrammetric method and the RGB-D sensor, respectively. As in Model 1, the filtering results of the RGB-D sensor produced a higher average error in Model 2. The Gaussian filter was the worst on the photogrammetric data, and the IDW method produced very similar results. The Gaussian filter was also the worst on the RGB-D data, followed by the SVD method. In both test models, across the different numbers of neighbors, the Optimization Based Plane Fitting method gave the smallest error. The magnitude of the average error differs unpredictably between the models because the noise level varies in each model. Overall, Gaussian filtering yielded the highest average error. PCs from the different methods were registered using RANSAC-based ICP.
After registration, the Euclidean distances between the filtered data and the reference data were calculated and the distance values were clustered. The cluster labels allow visual evaluation of the differences between the filtered and reference data. The distance values, which serve as cluster labels, were mapped to the pseudo colors red, turquoise, purple, white, and yellow. In this way, it can be seen which vertices moved in the filtered point cloud. Red shows the vertices that moved least, and yellow shows the vertices that moved most. In this paper, the numbers of nearest neighbors used for visual representation were determined experimentally as 10, 20, and 30. Using a larger number of neighboring vertices causes loss of detail in the data, and using fewer vertices does not generate enough information to compare the filtered results. Different numbers of neighbors (10, 20, 30, 50, and 100) were used for better expression of the graph data in Figure 3. Figure 4 shows the filtering results for Model 1 according to the number of neighbors, together with the solid models of these results. Figure 5 shows the corresponding filtering results and solid models for Model 2.
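The comparison against the reference data can be illustrated with a short sketch. It assumes that the two clouds are already registered and that the error of a filtered point is its distance to the nearest reference point; the paper does not spell out the exact metric, so this is one plausible reading. The function names are hypothetical, and the brute-force search is only practical for small clouds.

```python
import math

def nearest_distances(filtered, reference):
    """Distance from each filtered point to its nearest reference point."""
    return [min(math.dist(p, r) for r in reference) for p in filtered]

def mean_error(filtered, reference):
    """Average of the nearest-point distances (assumed error measure)."""
    distances = nearest_distances(filtered, reference)
    return sum(distances) / len(distances)
```

The per-point values from `nearest_distances` are the kind of quantity that could then be clustered and mapped to pseudo colors for visual inspection.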
Error values were calculated by comparing the PCs captured with the RGB-D and photogrammetric methods with the reference data. The calculated error values were assigned as pseudo colors to the point cloud of the system that generated them, and the mesh surface was then formed. When the filtering results are examined together with the error amounts and the uncolored solid models, it can be said that the filtering results provide values that approximate the reference data. As the number of neighbors increased, both the closure of data gaps and the level of detail decreased. As the sigma value was changed in the Gaussian filtering technique, the surface smoothness changed; the most appropriate value was experimentally determined to be 0.4 mm. The effect of this change on the result can be examined in a further study. Increasing the power parameter in the IDW filter may impair the quality of the result. The SVD and Optimization Based Plane Fitting filters work differently from the others because they fit the surface parameters and then project the point onto the surface. Moreover, there are discontinuous transitions in some places on the surface. The Optimization Based Plane Fitting filter was found to produce better-quality solutions than the SVD-based method. Free-form surfaces can be used instead of planes to increase surface continuity in projection-based filtering.

CONCLUSION
PCs suffer from unpredictable and uncontrollable noise types with variable amplitude, due to the general error characteristics of the data capture environment and of the hardware used to capture the data. The noise that degrades the quality of PCs must be suppressed using filtering methods such as the spatial filtering techniques discussed in this paper. Where noise cannot be avoided for physical reasons, the most important ways of achieving reliable measurements are to produce measures based on multiple observations or to filter the available measurements. In this paper, test model PCs were obtained using a Terrestrial Laser Scanner, the photogrammetric method, and an RGB-D sensor. The obtained point clouds were filtered, with the number of neighbors at three different levels, using the Shepard Inverse Distance Weighting method, the Gaussian Filtering method, Singular Value Decomposition Based Filtering, and Optimization Based Plane Fitting. The Backtracking Search Optimization Algorithm, which searches for the best values of the parameters of a system or model under different conditions, was used to determine the local plane parameters. In this way, the success of the measurement systems as well as the success of the filtering methods was examined. Although the proposed method provides effective results, it does not make sense to compare it in terms of CPU time consumption, because evolutionary computation-based methods generally work more slowly than classical methods. Based on the statistical and visual results obtained, the Optimization Based Plane Fitting Method using the Backtracking Search Optimization Algorithm gave the best result.
In future studies, detailed comparisons will be made with evolutionary computation-based methods that use different strategies to filter PCs.

ACKNOWLEDGMENT
This paper forms part of Mehmet Akif Günen's master's thesis and was supported by the projects Erciyes University BAP FYL-2013-4330 and Tubitak 115Y235.