Nonlinear Feature Extraction for Hyperspectral Images

Abstract: In this study, nonlinear dimension reduction methods are applied to a hyperspectral image in order to increase classification accuracy in the feature extraction step. In addition, image segmentation is performed by taking the spatial structure of hyperspectral images into account while passing from the high-dimensional space to a low-dimensional space. These results are compared with the segmentation obtained by using single pixels without spatial context. The advantages of applying the dimension reduction techniques to neighboring pixels for hyperspectral image segmentation are demonstrated in the experimental results section.


Introduction
With the advance of remote sensing technology in recent years, hyperspectral image scanners have become very popular in many scientific areas such as geosciences and medicine. Compared with multispectral images, hyperspectral images contain much richer data. However, the presence of hundreds of adjacent spectral bands negatively affects pattern recognition algorithms. For this reason, the most important problems in hyperspectral image processing are dimension reduction and feature extraction. The two are closely related by nature: by extracting new features from the original spectral bands while reducing dimension, classification and segmentation algorithms from machine learning and pattern recognition can be applied more effectively in a lower-dimensional space. Dimension reduction can be realized in two ways: feature selection or feature extraction. In feature selection methods, only a small subset of highly informative bands is chosen. In feature extraction methods, high-dimensional data are mapped onto a lower-dimensional space using linear or nonlinear functions of the original bands. In this study, dimension reduction, band selection, and especially feature extraction are performed. Furthermore, class analysis is attempted by exploiting the features extracted from the hyperspectral image together with the class labels. There are many methods for feature extraction. The oldest, and hence the best known, is the linear technique named PCA. However, linear techniques are not sufficient for complex data; nonlinear techniques are needed to classify it, and in recent years many nonlinear techniques have been used for this purpose.
In addition to nonlinear techniques that preserve global characteristics, such as Kernel PCA, Isomap, and Diffusion Maps, we have also used nonlinear techniques that preserve local characteristics, such as LLE, Hessian LLE, Laplacian Eigenmaps, and LLC [9]. PCA is also included to compare against the nonlinear techniques. As a second step, 5, 10, 15, 20, and 50 spatial neighborhoods were formed for the data sets produced by these feature extraction methods, and the images were classified. Each data sample is a pixel, represented as a vector with one value per spectral band. Once the class of a pixel is determined, the pixels next to it tend to belong to the same class; these pixels form its spatial neighborhood [1]. The spatial neighborhood of a pixel is obtained by taking the n×n window around it in the image. Proximity within a spatial neighborhood is measured with the Euclidean distance: two spatially close data points are similar, and the probability that they belong to the same class is very high; as the distance increases, the similarity within the n×n neighborhood decreases. The second section presents the linear and nonlinear techniques, the third section describes the data set and the proposed approach, and the last section shows the experimental results and the advantages of this study, where classification performance is maintained or improved.
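The n×n spatial-neighborhood feature described above can be sketched as follows. The toy data cube, the window size, and the clipping at image borders are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
cube = rng.normal(size=(32, 32, 10))   # toy hyperspectral cube: rows x cols x bands

def window_features(cube, r, c, n=3):
    """Stack the n x n spatial neighborhood around pixel (r, c) into one vector."""
    h = n // 2
    patch = cube[max(r - h, 0):r + h + 1, max(c - h, 0):c + h + 1, :]
    return patch.reshape(-1)

v = window_features(cube, 5, 5, n=3)
print(v.shape)  # (90,): 3x3 neighborhood x 10 bands
```

For border pixels the window is clipped, so the feature vector is shorter there; padding would be an alternative design choice.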

Methodology
In the real world, one struggles with very large amounts of data, and correctly classifying it requires considerable memory and computation. Reducing the data makes it possible to process it efficiently and at lower cost. To perform this reduction with as little information loss as possible, dimension reduction methods that decrease the number of bands are needed. This is an especially important step for processing high-dimensional data. Feature extraction is performed on the bands with linear or nonlinear methods; in this study, it is performed using the class information [14]. The dimension reduction methods used in this study are presented below.

Diffusion Maps
The Diffusion Maps technique is based on a Markov chain. The diffusion distance between two points x and y is interpreted as a random-walk transition probability. The edge weights of the data graph are calculated with the Gaussian kernel function [2],

w(x_i, x_j) = exp(−‖x_i − x_j‖² / σ),   (1)

and row-normalizing these weights gives the transition probabilities p(x_i, x_j) = w(x_i, x_j) / Σ_k w(x_i, x_k). The method combines all paths along the graph and uses the diffusion distance to reduce dimension; the low-dimensional embedding is obtained from the leading eigenvalues and eigenvectors of the transition matrix, discarding the trivial first eigenvector (eigenvalue 1).

___________________________________________________________
1 Computer Engineering Department, Electric-Electronic Faculty, Yildiz Technical University, Esenler, 34220, İstanbul/Turkey
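A minimal numerical sketch of these steps (Gaussian weights, Markov normalization, spectral embedding); the random stand-in data, the σ choice, and the diffusion time t are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 toy spectral signatures, 5 bands

# Gaussian kernel weights, as in Eq. (1)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
sigma = np.median(sq)                  # heuristic bandwidth (assumption)
W = np.exp(-sq / sigma)

# Row-normalize to a Markov transition matrix P
P = W / W.sum(axis=1, keepdims=True)

# Eigendecomposition; skip the trivial first eigenvector (eigenvalue 1)
vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
t = 2                                  # diffusion time (assumption)
Y = (vals.real[order[1:3]] ** t) * vecs.real[:, order[1:3]]
print(Y.shape)  # (100, 2)
```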

Kernel PCA
Kernel PCA (KPCA) is a nonlinear extension of the PCA method: while PCA is a linear technique, KPCA generalizes it through a kernel. The data dimension is reduced using the kernel matrix K, whose entries for data points x_i and x_j are κ(x_i, x_j). By centering the kernel matrix, its top d eigenvectors can be found, and these give the embedding. The essential choice in KPCA is the kernel function, which can be a linear, Gaussian, or polynomial kernel. KPCA gives quite successful results in face detection and speech recognition.
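A compact sketch of KPCA with a Gaussian kernel; the random data and the bandwidth heuristic are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 4))           # toy data: 80 points, 4 bands

# Gaussian (RBF) kernel matrix
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / sq.mean())

# Center the kernel matrix in feature space
n = K.shape[0]
one = np.ones((n, n)) / n
Kc = K - one @ K - K @ one + one @ K @ one

# Top-d eigenvectors of the centered kernel give the embedding
vals, vecs = np.linalg.eigh(Kc)
d = 2
idx = np.argsort(-vals)[:d]
Y = vecs[:, idx] * np.sqrt(np.abs(vals[idx]))
print(Y.shape)  # (80, 2)
```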

Principal Component Analysis (PCA)
PCA is the most popular orthogonal linear transformation. It represents the data in a low-dimensional space along the directions of largest variance; high-variance directions are preferred to low-variance ones. By computing the covariance matrix of the data matrix X, one finds the linear mapping that maximizes the retained variance: W consists of the eigenvectors of the covariance matrix with the largest eigenvalues, and the transformation is Y = XW. PCA preserves the Euclidean distances between data points x_i and x_j as well as possible.
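The eigendecomposition route to PCA can be sketched directly in NumPy; the random stand-in data and the choice of three components are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))         # toy data: 200 pixels, 10 bands

Xc = X - X.mean(axis=0)                # center the data
C = np.cov(Xc, rowvar=False)           # 10x10 covariance matrix
vals, vecs = np.linalg.eigh(C)         # eigenvalues in ascending order
W = vecs[:, ::-1][:, :3]               # top-3 eigenvectors (largest variance)
Y = Xc @ W                             # project to 3 dimensions
print(Y.shape)  # (200, 3)
```

By construction, the first projected coordinate carries at least as much variance as the second.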

Isomap
Isomap is a low-dimensional embedding method that replaces the Euclidean distance used in the multidimensional scaling algorithm with the geodesic distance on a weighted neighborhood graph. The geodesic distance is the shortest-path distance; to compute it, the neighborhoods of all data points x_i (i = 1, 2, ..., n) must be found, and the distance is calculated on the neighborhood graph between the data points. Choosing the neighborhood parameter is therefore also important in Isomap. Each data point is connected to its nearest neighbors in Euclidean distance in the high-dimensional space, and the shortest path between two points is found with Dijkstra's algorithm.
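A sketch using scikit-learn's Isomap on random stand-in data; the neighborhood size and target dimension are illustrative assumptions:

```python
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy data: 100 points, 5 bands

# Geodesic distances come from shortest paths (Dijkstra) on the
# k-nearest-neighbor graph; MDS on those distances gives the embedding.
Y = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(Y.shape)  # (100, 2)
```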

Local Linear Embedding (LLE)
LLE is a method similar to Isomap; however, unlike Isomap, it only preserves the local characteristics of the data graph. In LLE, each data point x_i is reconstructed as a linear combination of its k nearest neighbors, with weights w_ij chosen so that the neighborhood relations are preserved; the weights are invariant to translation, rotation, and scaling. The weight matrix W = (w_ij) minimizes the reconstruction cost ε(W) = Σ_i ‖x_i − Σ_j w_ij x_j‖². After the weights are calculated, the data are mapped to the low-dimensional space while preserving the local neighborhoods.
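A sketch using scikit-learn's LLE implementation on random stand-in data; the parameter choices are assumptions:

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy data: 100 points, 5 bands

# Reconstruction weights are fit on each point's 10 nearest neighbors,
# then a 2-D embedding preserving those weights is computed.
Y = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)
print(Y.shape)  # (100, 2)
```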

Hessian LLE
Hessian LLE uses sparse matrix techniques, as LLE does. It builds the data graph using the k nearest neighbors, measures the curvature of the manifold through a Hessian estimator H, and maps the data to the low-dimensional space. For each data point, approximate local tangent coordinates on the manifold are obtained by computing the eigenvectors of the local covariance matrix, and the Hessian is estimated in these coordinates. The embedding minimizes the curviness measured by the H matrix; the matrix assembled from the local Hessian estimates is orthonormalized to determine the final eigenvectors. Compared with LLE and Laplacian Eigenmaps it is the slowest and performs worst on sparse data, but it is successful on non-convex manifolds.
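scikit-learn exposes Hessian LLE as a variant of its LLE estimator; note that it requires n_neighbors > d(d+3)/2 for a d-dimensional embedding. The data and parameters below are illustrative assumptions:

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy data: 100 points, 5 bands

# Hessian eigenmaps variant; 10 neighbors satisfies the constraint for d=2.
hlle = LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                              method='hessian', eigen_solver='dense')
Y = hlle.fit_transform(X)
print(Y.shape)  # (100, 2)
```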

Laplacian Eigenmaps
Similar to LLE, Laplacian Eigenmaps passes to the low-dimensional space by preserving the local characteristics of the manifold. The method first creates a graph G from the k nearest neighborhoods and assigns edge weights: the weight between data points x_i and x_j is calculated with a Gaussian kernel. The cost function of the low-dimensional representation penalizes large distances between points y_i and y_j whose originals x_i and x_j are close, weighted by w_ij; this cost function is minimized using spectral graph theory. With M the (diagonal) degree matrix and W the weight matrix, the graph Laplacian is L = M − W; its eigendecomposition yields the low-dimensional embedding.
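scikit-learn implements Laplacian Eigenmaps as SpectralEmbedding; a sketch on random stand-in data with assumed parameters:

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # toy data: 100 points, 5 bands

# Builds a k-NN affinity graph, forms the graph Laplacian L = M - W,
# and embeds via its smallest non-trivial eigenvectors.
Y = SpectralEmbedding(n_components=2, n_neighbors=10).fit_transform(X)
print(Y.shape)  # (100, 2)
```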

Locally Linear Coordination (LLC)
LLC performs a global alignment of locally fitted linear models. A mixture of factor analyzers is estimated with the EM (Expectation Maximization) algorithm; factor analyzer m gives a local low-dimensional representation of each data point, together with its responsibility for that point. A weight matrix W, computed as in LLE, is then used to align the local representations into a single global low-dimensional coordinate system.

Multidimensional Scaling (MDS)
MDS is a technique based on a similarity (dissimilarity) matrix. It places the points in the low-dimensional space so that their pairwise distances match the original distances between the N data points as closely as possible; these distances indicate how much the objects resemble each other. There are many MDS algorithms, and they are classified according to the type of the input matrix.
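Classical (metric) MDS, one member of this family, can be sketched directly from the squared-distance matrix via double centering; the random data are a stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))           # toy data: 60 points, 8 bands

# Squared-distance (dissimilarity) matrix
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)

# Double centering: B = -1/2 * J D2 J, with J the centering matrix
n = D2.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ D2 @ J

# Embedding from the top eigenpairs of B
vals, vecs = np.linalg.eigh(B)
idx = np.argsort(-vals)[:2]
Y = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))
print(Y.shape)  # (60, 2)
```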

Experimental Study and Results
In this study the Salinas data set, captured in 224 bands by the AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) sensor, has been used. The Salinas data set offers high spatial resolution and was collected over Salinas Valley, California. It is composed of 512 lines of 217 samples and contains 16 classes in total.
In this study, in order to avoid processing all of the data at first, representative spectral signatures were extracted. These spectral signatures are intended to represent the spectral range as well as possible. The nonlinear feature extraction techniques were applied not to the whole data set but to these spectral signatures, and feature vectors of 10 and 15 dimensions were obtained with the dimension reduction techniques used in this study.

Furthermore, RBF and kNN interpolation were used to project the whole data set. With RBF interpolation, the reduced data set was learned with an artificial neural network, and each pixel was mapped to the low-dimensional space through the trained network. With kNN interpolation, the embedding of each pixel is computed from its k closest spectral signatures: the distances between the pixel and the spectral signatures are found, and, choosing k = 9, the feature vector is obtained from the embeddings that the nonlinear dimension reduction method produced for the k spectral signatures closest to the pixel. The purpose here is to approximate the value that would be obtained with the nonlinear projection itself. This procedure needs only the spectral signatures and provides dimension reduction for the whole image.

The experimental results for the Salinas data set are reported for 10 and 15 dimensions: the 224-band data set was reduced to 10 and 15 bands. Segmentation and classification accuracy were measured using 5, 10, 15, 20, and 50 neighborhoods. Segmentation results obtained using single pixels, as well as segmentation results obtained by taking neighborhoods into account, were evaluated for the 16 classes. The results of the linear method PCA are also displayed for comparison. The results of the study are shown in the following tables.
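The kNN out-of-sample step described above can be sketched as follows. The random signatures and embeddings are stand-ins, and the inverse-distance weighting is an assumption, since the paper does not specify how the neighbor weights are normalized:

```python
import numpy as np

rng = np.random.default_rng(0)
signatures = rng.normal(size=(50, 224))   # representative spectral signatures
embedded   = rng.normal(size=(50, 10))    # their 10-D nonlinear embeddings
pixel      = rng.normal(size=(224,))      # a new pixel to project

k = 9
d = np.linalg.norm(signatures - pixel, axis=1)
nn = np.argsort(d)[:k]                    # the k closest signatures

# Inverse-distance weights, normalized to sum to one (assumption)
w = 1.0 / (d[nn] + 1e-12)
w /= w.sum()

# Out-of-sample embedding = weighted mean of the neighbors' embeddings
y = w @ embedded[nn]
print(y.shape)  # (10,)
```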
The results are presented in Table I and Table II.

Conclusion
In this study, spatial coherence has been introduced into dimensionality reduction, and its contribution to image segmentation has been shown. Spatial coherence is introduced by comparing individual pixels based on the neighborhood structure of their nonlinearly reduced representations. Using spatial coherence, we have also tried to increase the classification success of the dimension reduction techniques. Reduction to 10 and 15 bands while considering 10 to 20 neighbors gave the best classification results; the classification accuracy decreased when too few or too many bands were kept. Choosing an appropriate number of spatial neighbors is therefore important for obtaining the best representation.