Investigating the Effects of Facial Regions to Age Estimation

Aging causes evident changes in human facial appearance. Real-world age progression of the human face is individual and depends on many factors such as genetics, lifestyle, eating habits, facial expressions, and climate. The wide variation in facial appearance across individuals affects age estimation performance. Discovering how much aging information each facial region contains is therefore an important issue in automatic age estimation, because the regions that carry the most aging information can be used for more accurate estimates. In this context, this paper investigates the age estimation performance of facial regions (eye, nose, mouth and chin, cheeks, and sides of mouth). For this purpose, an age estimation method is designed that estimates the age of a subject from texture features extracted from facial regions. In this method the facial images are warped into the mean shape, so that variations of head pose and scale are eliminated and the texture information of the facial images is aligned. Then holistic and spatial texture features are extracted from the facial regions using the Local Phase Quantization (LPQ) texture descriptor, which is robust to blur, illumination and expression variations. After a low-dimensional representation of these features is obtained, a linear aging function is learned using multiple linear regression. In the experiments the FGNET and PAL databases are used to evaluate the age estimation accuracy of each facial region (eye, nose, mouth and chin, cheek, and sides of mouth) separately. The results show that the eye region carries the most significant information for age estimation. The mouth and chin region and the cheek region are also effective in predicting age. The results further show that using spatial texture features enhances the discriminative power of the texture descriptor and thus increases the estimation accuracy.


Introduction
Age estimation is the process of estimating the age or age group of an individual from his or her facial information. During the aging process, evident changes occur in human facial appearance. These variations are individual and are affected by factors such as race, genetics, living conditions, eating habits, and the frequency of facial expressions. This makes age estimation much harder than other facial image processing problems. As a result, the accuracy of age estimation systems is still insufficient, and even human skill at estimating age is limited. In this context, discovering the amount of aging information contained in each facial region, and improving the accuracy of age estimation systems by using the regions that carry the most aging information, is an important issue in this field.
In this paper we investigate the age estimation performance of the following facial regions: eye, nose, mouth and chin, cheek, and sides of mouth. For this purpose we designed an age estimation system that uses the texture features of these facial regions to estimate the age of a subject. The block diagram of the system is shown in Fig. 1. In our method the input images are normalized so that shape variations such as scale and head pose are eliminated and the facial texture is aligned. Then Local Phase Quantization (LPQ) is used to extract holistic and spatial texture features from the facial regions. After feature extraction, dimensionality reduction is performed with Principal Component Analysis (PCA). Finally an aging function is learned using Multiple Linear Regression (MLR) for age estimation. The feature extraction, dimensionality reduction and aging function learning steps are performed separately for each facial region to discover the amount of aging information contained in that region.
The rest of the paper is organized as follows. A survey of age estimation methods is given in Section 2. The proposed age estimation approach is explained in Section 3. In Section 4 the experimental results on various databases are reported and analyzed. Finally, the conclusions are outlined in Section 5.
The age image representation techniques can be grouped under five topics. Anthropometric models rely on facial geometry. In these models distances, and the ratios of these distances, are calculated using the fiducial points on the facial images. As these geometric features can only deal with young ages, wrinkle features are used together with geometric features to strengthen the classification performance for older ages [1][2][3]. Active Appearance Model (AAM) based age estimation methods incorporate shape and appearance information together. For this reason AAMs are frequently used in age estimation methods [4][5][6]. In some studies AAM features are extracted as global facial features and fused with local facial features for efficient age estimation [7]. In the Aging Pattern Subspace method, the sequence of an individual's aging face images is used to model the aging process [8]. Age manifold methods, in contrast, use the images of different individuals at different ages to learn the common aging pattern. They utilize manifold embedding techniques to discover the aging trend in a low-dimensional space [9,10]. Appearance models focus mainly on aging-related facial feature extraction using various texture descriptors such as local binary patterns, Gabor filters, and histograms of gradients [7,11-13].
The studies mentioned above generally use the whole face in the feature extraction phase. Few works in the literature examine the age estimation performance of individual facial regions or parts. Lanitis [14] investigated the significance of facial parts in age estimation. In the experiments, the age estimation performances of the whole face (including the hairline), the internal face, the upper part of the face and the lower part of the face are calculated. AAMs are used to represent the shape and appearance of the facial parts with model parameters. The results showed that the upper part of the face gives a lower age estimation error than the other parts. El-Dib and Onsi [15] used bio-inspired features to analyze different facial parts: eye wrinkles (covering the eyes and the area under the eyes), internal face and whole face. They built six support vector regression models and one support vector machine model to estimate the age of a subject. Their results showed that the eye region, covering the eyes and the area under the eyes, contains the most important aging features compared with the others.

Image Normalization
In order to eliminate shape variations such as head pose, scale, and size, and to align the texture information, image normalization is applied to the facial images. Image normalization is performed by warping the facial images into the mean shape obtained from the training set. The facial images in the training set are labelled with 68 landmark points as shown in Fig. 2-a. The mean shape is obtained by taking the mean of the coordinates of these points over all training images. Then all the images are warped into the mean shape using Delaunay triangulation (Fig. 2-b) and affine transformation, so that the landmark points of all images match the mean shape and the texture information is aligned. The result of the warping process is given in Fig. 2-c. As the head poses of the facial images vary, the warped images can be inclined; therefore rotation is the last step in image normalization (Fig. 2-d).
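The two numerical ingredients of this warp, the mean shape and the per-triangle affine map, can be sketched as follows. This is a minimal illustration using NumPy, not the implementation used in the paper: the function names are hypothetical, the Delaunay triangulation and pixel resampling steps are omitted, and the mean shape is taken as the plain coordinate average described above.

```python
import numpy as np

def mean_shape(shapes):
    """Average the landmark coordinates over the training set.

    shapes: array-like of shape (num_images, num_landmarks, 2).
    """
    return np.asarray(shapes, dtype=float).mean(axis=0)

def triangle_affine(src_tri, dst_tri):
    """Return the 2x3 affine matrix A with A @ [x, y, 1] mapping each
    vertex of src_tri onto the corresponding vertex of dst_tri."""
    # Rows are [x, y, 1] for the three source vertices (3x3 system).
    src = np.hstack([np.asarray(src_tri, dtype=float), np.ones((3, 1))])
    dst = np.asarray(dst_tri, dtype=float)
    # Solve src @ X = dst for X (3x2); the affine matrix is its transpose.
    return np.linalg.solve(src, dst).T

def apply_affine(A, points):
    """Apply a 2x3 affine matrix to an array of (x, y) points."""
    pts = np.asarray(points, dtype=float)
    return pts @ A[:, :2].T + A[:, 2]
```

In a full warp, one such matrix would be computed for every triangle of the triangulation, and the pixels inside each triangle would be resampled through it.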

Facial Regions
Investigating the aging information contained in facial regions is important for the design of age estimation systems. The accuracy of such systems can be improved by using the facial regions that carry the most aging information. For this purpose the facial images are divided into regions and the age estimation algorithm is applied to these regions separately. In this study the age estimation performances of the eye (190 × 55), nose (55 × 70), mouth and chin (85 × 70), cheek (65 × 60) and side of mouth (40 × 45) regions are determined. The facial regions used in the experiments are shown in Fig. 3.

Feature Extraction with LPQ
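LPQ is a blur-insensitive texture descriptor based on the blur invariance property of the Fourier phase spectrum [16]. In this method LPQ codes are computed in local image windows using the discrete Fourier transform (DFT), and the results are presented as a histogram.
The spatially invariant blurring of an image can be expressed as a convolution between the image and the point spread function (PSF). In the frequency domain this is equal to $G(\mathbf{u}) = F(\mathbf{u}) \cdot H(\mathbf{u})$, where $G(\mathbf{u})$, $F(\mathbf{u})$ and $H(\mathbf{u})$ are the DFTs of the blurred image, the original image and the PSF, respectively. Considering the phase of the spectrum we have $\angle G(\mathbf{u}) = \angle F(\mathbf{u}) + \angle H(\mathbf{u})$. If we assume that the blur PSF $h(\mathbf{x})$ is centrally symmetric, $h(\mathbf{x}) = h(-\mathbf{x})$, its Fourier transform is always real valued, so that $\angle H(\mathbf{u}) = 0$ for $H(\mathbf{u}) \geq 0$ and $\angle H(\mathbf{u}) = \pi$ for $H(\mathbf{u}) < 0$. This means that $\angle G(\mathbf{u}) = \angle F(\mathbf{u})$ for all $H(\mathbf{u}) \geq 0$. In other words the phase of the observed image $\angle G(\mathbf{u})$ is invariant to centrally symmetric blur at the frequencies where $H(\mathbf{u})$ is positive.
If the $N \times N$ neighborhood around a pixel $\mathbf{x}$ is denoted as $\mathcal{N}_{\mathbf{x}}$, the two-dimensional (2-D) DFT of $\mathcal{N}_{\mathbf{x}}$ is defined by

$$F(\mathbf{u}, \mathbf{x}) = \sum_{\mathbf{y} \in \mathcal{N}_{\mathbf{x}}} f(\mathbf{x} - \mathbf{y})\, e^{-j 2 \pi \mathbf{u}^T \mathbf{y}} = \mathbf{w}_{\mathbf{u}}^T \mathbf{f}_{\mathbf{x}},$$

where $\mathbf{w}_{\mathbf{u}}$ is the basis vector of the 2-D DFT at frequency $\mathbf{u}$, and $\mathbf{f}_{\mathbf{x}}$ is the vector containing all $N^2$ pixels in $\mathcal{N}_{\mathbf{x}}$. Only the complex coefficients at $\mathbf{u}_1 = [a, 0]^T$, $\mathbf{u}_2 = [0, a]^T$, $\mathbf{u}_3 = [a, a]^T$, $\mathbf{u}_4 = [a, -a]^T$ are considered in LPQ, where $a$ is a scalar frequency below the first zero crossing of $H(\mathbf{u})$ that satisfies $H(\mathbf{u}) \geq 0$. For each pixel position this results in a vector given by

$$\mathbf{G}_{\mathbf{x}} = \left[ \mathrm{Re}\{F(\mathbf{u}_1, \mathbf{x})\}, \ldots, \mathrm{Re}\{F(\mathbf{u}_4, \mathbf{x})\}, \mathrm{Im}\{F(\mathbf{u}_1, \mathbf{x})\}, \ldots, \mathrm{Im}\{F(\mathbf{u}_4, \mathbf{x})\} \right]^T,$$

where $\mathrm{Re}\{\cdot\}$ and $\mathrm{Im}\{\cdot\}$ return the real and imaginary parts of a complex number, respectively. Then $\mathbf{G}_{\mathbf{x}}$ is computed for all image positions, i.e., $\mathbf{x} \in \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$, and the resulting vectors are quantized using a simple scalar quantizer,

$$q_j = \begin{cases} 1, & g_j \geq 0 \\ 0, & \text{otherwise,} \end{cases}$$

where $g_j$ is the $j$th component of $\mathbf{G}_{\mathbf{x}}$. The quantized coefficients are represented as integer values between 0 and 255 using the binary coding in (5), and the histogram of these integer values is used as a feature vector.

$$b = \sum_{j=1}^{8} q_j\, 2^{j-1} \qquad (5)$$

The LPQ texture descriptor represents the input image as a histogram of 256 bins. In this holistic representation, the histogram is produced without taking into account the spatial information of the pixels. The discriminative power of the texture descriptor can be enhanced by using spatial histograms, which are produced by concatenating the local histograms extracted from small image blocks.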
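The LPQ computation described above can be sketched directly in NumPy. This is a minimal, unoptimized illustration rather than the authors' code: the function names are hypothetical, the frequency is assumed to be a = 1/N (a common choice that the text does not specify), a uniform window is assumed, the first image axis is treated as the first frequency coordinate, and only pixels whose full window lies inside the image are coded.

```python
import numpy as np

def lpq_codes(image, win=5):
    """Return the 8-bit LPQ code of every pixel with a full win x win window."""
    img = np.asarray(image, dtype=float)
    r = (win - 1) // 2
    a = 1.0 / win  # assumed scalar frequency, not given in the text
    freqs = [(a, 0.0), (0.0, a), (a, a), (a, -a)]  # u1..u4
    out_r, out_c = img.shape[0] - 2 * r, img.shape[1] - 2 * r
    coeffs = []
    for u_row, u_col in freqs:
        acc = np.zeros((out_r, out_c), dtype=complex)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                # f(x - y) for the offset y = (dy, dx), for all pixels at once
                patch = img[r - dy:r - dy + out_r, r - dx:r - dx + out_c]
                acc += patch * np.exp(-2j * np.pi * (u_row * dy + u_col * dx))
        coeffs.append(acc)
    # G_x: four real parts followed by four imaginary parts
    parts = [c.real for c in coeffs] + [c.imag for c in coeffs]
    codes = np.zeros((out_r, out_c), dtype=np.int64)
    for j, g in enumerate(parts):
        # q_j = 1 if g_j >= 0, weighted by 2^j for j = 0..7
        codes += (g >= 0).astype(np.int64) << j
    return codes

def lpq_histogram(image, win=5):
    """Holistic LPQ descriptor: 256-bin histogram of the codes."""
    return np.bincount(lpq_codes(image, win).ravel(), minlength=256)
```

Published LPQ implementations often add a decorrelation step before quantization; that step is not described in this section and is left out here.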

Dimensionality Reduction
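In order to find a lower-dimensional subspace of the extracted features and to obtain the significant features for age estimation, dimensionality reduction is performed using PCA. PCA finds the embedding that maximizes the projected variance, given by $\mathbf{w}_{opt} = \arg\max_{\|\mathbf{w}\| = 1} \mathbf{w}^T S \mathbf{w}$, where $S = \sum_{i=1}^{m} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T$ is the scatter matrix, $\mathbf{x}_i$ is the $i$th feature vector with $\mathbf{x}_i \in \mathbb{R}^D$, and $\bar{\mathbf{x}}$ is the mean of the feature vectors. By solving this problem, a set of $d \leq D$ eigenvectors associated with the $d$ largest eigenvalues of $S$ is obtained. Dimensionality reduction is then performed by projecting all samples onto the projection subspace using $\mathbf{y}_i = W^T \mathbf{x}_i$ with $\mathbf{y}_i \in \mathbb{R}^d$, where the columns of $W$ are these eigenvectors.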
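A minimal sketch of this projection, assuming NumPy and hypothetical function names, is given below. It builds the scatter matrix explicitly and keeps the eigenvectors of the d largest eigenvalues. One assumption beyond the text: the mean is subtracted before projecting, which differs from the written projection only by a constant offset.

```python
import numpy as np

def pca_fit(X, d):
    """Return the feature mean and the D x d projection matrix W.

    X: array-like of shape (num_samples, D), one feature vector per row.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    S = centered.T @ centered             # scatter matrix (D x D)
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1][:d] # indices of the d largest eigenvalues
    return mean, eigvecs[:, order]

def pca_project(X, mean, W):
    """Project samples onto the subspace: y = W^T (x - mean)."""
    return (np.asarray(X, dtype=float) - mean) @ W
```

For long spatial histograms the D x D scatter matrix becomes large; a singular value decomposition of the centered data gives the same subspace and may be preferable in practice.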

Age Estimation
After finding the low-dimensional representation of the features, the age estimation problem is recast as a multiple linear regression, $L = YB + e$, where $Y$ is the data matrix, $B$ is the unknown parameter vector, $L$ is the age label vector and $e$ is the error vector. In the learning phase the unknown parameters are estimated by means of least squares or robust regression. The regression function used in this study is a linear function given by $\hat{l} = \hat{\beta}_0 + \hat{\beta}_1^T \mathbf{y}$, where $\hat{l}$ is the estimate of age, $\hat{\beta}_0$ is the offset, $\hat{\beta}_1$ is the weight vector containing the coefficients for each value in the feature vector, and $\mathbf{y}$ is the low-dimensional representation of the extracted feature vector.
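The least-squares variant of this aging function can be sketched with NumPy as follows. The function names are hypothetical, and the robust regression alternative mentioned above is not covered.

```python
import numpy as np

def fit_aging_function(Y, ages):
    """Estimate the offset b0 and weight vector b1 by ordinary least squares.

    Y: array-like of shape (num_samples, d), low-dimensional features.
    ages: array-like of shape (num_samples,), age labels.
    """
    Y = np.asarray(Y, dtype=float)
    # Prepend a column of ones so the first coefficient acts as the offset.
    design = np.hstack([np.ones((Y.shape[0], 1)), Y])
    coef, *_ = np.linalg.lstsq(design, np.asarray(ages, dtype=float), rcond=None)
    return coef[0], coef[1:]

def predict_age(y, b0, b1):
    """Evaluate the linear aging function b0 + b1^T y."""
    return b0 + np.asarray(y, dtype=float) @ b1
```

Passing a matrix with one feature vector per row to `predict_age` returns one age estimate per row.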

Experiments and Results
In this paper the FGNET and PAL databases are used to evaluate the age estimation performances of the facial regions. The FG-NET database [17] is composed of 1,002 images retrieved from real-life albums of 82 subjects, and thus includes variations of head pose, occlusion, illumination, facial expression, etc. The age range in this database is 0-69 years, but the images are not uniformly distributed over the ages. This can be a disadvantage for the estimation accuracy.
The PAL aging database [18] contains 580 images of different persons taken under natural lighting conditions using a digital camera. The images include various expressions such as neutral faces, anger, sadness or smiling. The ages in this database range from 18 to 93 years and are also not uniformly distributed.
The age estimation performance of the system is evaluated using n-fold cross validation. In this method all the samples are randomly partitioned into n equal-sized subsamples. Then one subsample is used as the test set and the remaining n-1 subsamples are used as the training set. This procedure is repeated n times, until each of the subsamples has been used once as the test set. The average of all n estimations is then taken as the system performance. In our experiments we set n=3.
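The partitioning step of this protocol can be sketched in plain Python as follows. The function name and the fixed seed are illustrative assumptions; when the number of samples is not divisible by n, the fold sizes differ by at most one.

```python
import random

def n_fold_splits(num_samples, n=3, seed=0):
    """Yield (train_indices, test_indices) for each of the n folds."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)       # random partition of the samples
    folds = [indices[k::n] for k in range(n)]  # n nearly equal-sized subsamples
    for k in range(n):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

The system performance is then the average of the error measured on the n test folds.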
The performance is measured using the Mean Absolute Error (MAE) metric, given as

$$\mathrm{MAE} = \frac{1}{N_t} \sum_{i=1}^{N_t} \left| \hat{l}_i - l_i \right| \qquad (6)$$

where $\hat{l}_i$ is the estimated age of the $i$th test sample, $l_i$ is the real age of the $i$th test sample, and $N_t$ is the total number of test samples.
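As a small worked example of the metric in (6), the following plain-Python sketch (hypothetical function name, inputs given as lists) computes the MAE for a handful of made-up ages.

```python
def mean_absolute_error(estimated_ages, true_ages):
    """Mean absolute difference between estimated and real ages."""
    if len(estimated_ages) != len(true_ages) or len(true_ages) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(e - t) for e, t in zip(estimated_ages, true_ages))
    return total / len(true_ages)
```

For estimates of 20, 35 and 50 years against real ages of 22, 30 and 50, the absolute errors are 2, 5 and 0, so the MAE is 7/3 years.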
In the experiments the age estimation performances of the facial regions are calculated using the features extracted with the LPQ texture descriptor and regression. LPQ is computed on 5 × 5 local image windows. The LPQ histograms are first extracted from the whole facial region, resulting in the holistic description of that region. Then the region is divided into n × n blocks, and LPQ histograms are extracted from each block and concatenated to obtain the spatial description of the region. The spatial features are used to enhance the discriminative power of the texture descriptor. The spatial LPQ histograms are extracted for n=2, 3, 4, 5, 6 and age estimation is performed using these features. The effect of using the spatial texture features of the facial regions for age estimation on the FGNET and PAL databases is given in Fig. 4. The figure shows that the holistic feature representation is not encouraging, and that increasing the number of blocks in the spatial feature representation generally increases the estimation accuracy of the facial regions.
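The block-wise concatenation can be sketched as follows, assuming NumPy and a 2-D array of per-pixel integer codes in the range 0-255 as input. The function name is hypothetical, and the handling of regions whose size is not divisible by n (block edges rounded down) is an assumption, since the text does not specify it.

```python
import numpy as np

def spatial_histogram(codes, n, bins=256):
    """Concatenate the code histograms of an n x n grid of blocks."""
    codes = np.asarray(codes, dtype=np.int64)
    # Block boundaries along each axis; edges are rounded down to integers.
    row_edges = np.linspace(0, codes.shape[0], n + 1).astype(int)
    col_edges = np.linspace(0, codes.shape[1], n + 1).astype(int)
    hists = []
    for i in range(n):
        for j in range(n):
            block = codes[row_edges[i]:row_edges[i + 1],
                          col_edges[j]:col_edges[j + 1]]
            hists.append(np.bincount(block.ravel(), minlength=bins))
    return np.concatenate(hists)
```

With 256 bins the spatial feature vector has 256·n² entries, which is one reason a dimensionality reduction step is applied before regression; n = 1 reduces to the holistic histogram.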
The age estimation accuracies of the whole face and the facial regions are tabulated in Table 1. The results show that the age estimation accuracies of the whole face and the eye region are close to each other. These results indicate that the eye region carries the most significant information for age estimation. The mouth and chin region and the cheek region are also more effective than the remaining regions in predicting age.

Conclusion
In this paper the age estimation performances of the facial regions are investigated. For this purpose the facial image is divided into five regions: eye, nose, mouth and chin, cheek, and side of mouth. Then LPQ is used to extract features from these regions. In the feature extraction phase the holistic representation of a region is obtained by applying the texture descriptor to the whole region. The spatial representation is obtained by dividing the region into a number of blocks, extracting features from these blocks and concatenating them. The age estimation accuracies of the facial regions are evaluated separately, and the experiments on the FGNET and PAL databases have shown that most of the aging information is contained in the eye region. Moreover, the age estimation accuracy of the eye region is close to that of the whole face.
As the estimation accuracy of the whole face is better than that of the individual regions, determining the weights of these regions in age estimation, and thus achieving better estimation accuracy, is our future work.

Figure 2. Image normalization: a) facial image labeled with 68 landmark points, b) Delaunay triangulation, c) the image after warping into the mean shape, d) rotated image.

Figure 4. The effect of using spatial texture features of facial regions for age estimation: a) FGNET database, b) PAL database.
Age estimation systems generally consist of age image representation and age estimation modules. The aim of the age image representation module is to extract shape-based or texture-based features from the facial images. Then either classification techniques are used to classify the facial images into multiple age groups, or regression techniques are applied to estimate the age of the subjects.

Table 1. MAEs of facial regions for the FGNET and PAL databases.