Multivariate Multiple Regression Analysis Based on Principal Component Scores to Study Relationships between Some Pre- and Post-slaughter Traits of Broilers

The main purpose of this study is to show how multivariate multiple linear regression analysis (MMLR) based on principal component scores can be used to investigate the relations between two data sets (i.e. the pre- and post-slaughter traits of Ross 308 broiler chickens). Principal component analysis (PCA) was applied to the predictor variables to avoid the multicollinearity problem. According to the results of the PCA, only the first three of the 7 principal components (PC1, PC2 and PC3), which had eigenvalues greater than 1 and together explained 89.45 % of the variation, were selected for the MMLR analysis. The first three principal component scores were then used as predictor variables in the MMLR. The results of the MMLR analysis showed that shank width, breast circumference and body weight had a similar linear effect on predicting the post-slaughter traits (P=0.746). As a result, animals with high values of shank width, breast circumference and body weight would probably also be expected to have high values of the post-slaughter traits, namely heart weight, liver weight, gizzard weight and hot carcass weight.


Introduction
Information on the relations between two data or variable sets (e.g. pre- and post-slaughter traits) is quite important (Mendeş et al 2005). In practice, canonical correlation analysis (CCA) is commonly used to investigate the relations between two data sets (Akbaş & Takma 2005; Mendeş & Akkartal 2007; Sousa et al 2007). Multivariate multiple linear regression analysis (MMLR) can also be used for this purpose (Lutz & Eckert 1994). In animal science, however, there are no studies in the literature using MMLR, although it is an alternative to CCA. It is therefore useful to show how MMLR can be used to investigate the relations between two data sets.
It is well known that in the presence of multicollinearity, the standard errors of the parameter estimates can be quite high, resulting in unstable estimates of the regression model. Hence, multicollinearity between the predictor variables can lead to incorrect identification of the most important predictors (Sharma 1996; Thompson et al 2001; Hoe & Kim 2004). Several researchers (Sharma 1996; Çamdeviren et al 2005; Sousa et al 2007; Mendeş 2009) have reported that principal component analysis (PCA) is one approach to avoiding this problem. The use of PCA to avoid multicollinearity has recently increased with the availability of statistical packages such as SAS, Statistica, SPSS and NCSS (Fievez et al 2003; Liu et al 2004; Raick et al 2006; Posta et al 2007).
The main purpose of this study is to show how multivariate multiple linear regression analysis (MMLR) based on principal component scores can be used to investigate the relations between two data sets (i.e. the pre- and post-slaughter traits of Ross 308 broiler chickens).

Materials
A total of 51 male Ross 308 broiler chickens were used. The following pre-slaughter traits were measured when the birds were six weeks old: body weight (BW, g), shank width (SHW, mm), shank length (SHL, mm), breast bone length (BBL, mm), breast width (BRW, mm), breast circumference (BRC, cm) and body length (BL, cm). The birds were slaughtered in the 6th week of the investigation. The post-slaughter traits were hot carcass weight (HCW, g), heart weight (HW, g), liver weight (LW, g) and gizzard weight (GW, g).

Multivariate Multiple Linear Regression Analysis (MMLR)
Multivariate multiple linear regression analysis (MMLR) was used to investigate the relations between the pre- and post-slaughter traits of broiler chickens. Like CCA, MMLR enables us to investigate the relations between two data sets (Baldava 2004; Akkartal et al 2009). In this study, the first data set was formed by the pre-slaughter traits (independent or predictor variables) and the second by the post-slaughter traits (dependent variables). MMLR is similar to multiple linear regression analysis (MLR), except that it has more than one dependent variable. Computationally, MMLR gives the same coefficients, standard errors, t- and p-values and confidence intervals as would be obtained by fitting a separate MLR to each dependent variable (Breiman & Friedman 1997; Baldava 2004). However, since there is more than one dependent variable and the dependent variables are generally correlated, MMLR is more appropriate than separate MLRs for investigating the relations between two data sets (Goldwasser & Fitzmaurice 2006). The hypothesis tested by a multivariate regression is that the set of independent variables has a joint linear effect on the set of dependent variables. Hence, the null hypothesis is that all slope coefficients are simultaneously zero.
In MMLR, q dependent variables (Y1, Y2, …, Yq) are predicted through linear relationships with r independent variables (X1, X2, …, Xr); that is, we are interested in how the set of dependent variables relates to the set of independent variables. The statistical model for MMLR is therefore:

$$\mathbf{Y}_{n \times q} = \mathbf{X}_{n \times (r+1)}\,\mathbf{B}_{(r+1) \times q} + \mathbf{E}_{n \times q} \qquad (1)$$

where Y contains the n observations on the q dependent variables, X is the design matrix of rank r+1 whose first column is the vector 1, B is the matrix of parameters to be estimated and E is the matrix of residuals (Monge 1977; Lutz & Eckert 1994; Khattree & Naik 1999; Keskin et al 2008).
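The least-squares estimate of B is (X'X)^{-1}X'Y, and each column of this estimate is exactly the ordinary MLR fit for one dependent variable, which is why MMLR and separate MLRs give identical coefficients. The following minimal NumPy sketch checks this numerically; the function name `fit_mmlr` and the simulated data are hypothetical and are not the broiler measurements.

```python
import numpy as np

def fit_mmlr(X_pred, Y):
    """Least-squares fit of the MMLR model Y = XB + E.

    X_pred: (n, r) predictor matrix without the intercept column.
    Y:      (n, q) matrix of dependent variables.
    Returns B_hat with shape (r + 1, q); row 0 holds the intercepts.
    """
    n = X_pred.shape[0]
    X = np.column_stack([np.ones(n), X_pred])   # design matrix, first column = vector 1
    # B_hat = (X'X)^{-1} X'Y, solved without forming the inverse explicitly
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Simulated (hypothetical) data: n = 51 birds, r = 3 predictors, q = 4 responses
rng = np.random.default_rng(0)
n, r, q = 51, 3, 4
X_pred = rng.normal(size=(n, r))
B_true = rng.normal(size=(r + 1, q))
X = np.column_stack([np.ones(n), X_pred])
Y = X @ B_true + 0.1 * rng.normal(size=(n, q))

B_joint = fit_mmlr(X_pred, Y)

# Separate MLR fits, one dependent variable at a time
B_separate = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(q)]
)
print(np.allclose(B_joint, B_separate))  # True
```

What the joint formulation adds is not different coefficients but the ability to test hypotheses about all dependent variables at once while accounting for their correlations.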
In practice, MMLR models often include a large number of independent (predictor) variables (X's), some of which may be only weakly correlated with the dependent variables (Y's) or may be redundant because they are highly correlated with other independent variables (the multicollinearity problem). The use of redundant predictors can be harmful, since the potential gain in accuracy from including them is outweighed by the inaccuracy of estimating their proper contribution to the prediction (Spark et al 1985). In particular, multicollinearity among the columns of X can have a significant impact on the quality and stability of the fitted model (Johnson 1991). One approach to avoiding this problem is PCA.
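Multicollinearity is commonly diagnosed with the variance inflation factor, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors; VIF values above 10 are a common warning sign. A minimal sketch, assuming NumPy and using the identity that the VIFs are the diagonal elements of the inverse correlation matrix of the predictors (the function name `vif` and the data are hypothetical):

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (n, p).

    Uses VIF_j = 1 / (1 - R_j^2) = [C^{-1}]_jj, where C is the
    correlation matrix of the predictors.
    """
    C = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(C))

# Hypothetical predictors: x3 is almost x1 + x2, so all three are collinear
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + x2 + 0.05 * rng.normal(size=100)
X = np.column_stack([x1, x2, x3])
v = vif(X)  # all three values lie far above the usual cutoff of 10
```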

Principal Component Analysis (PCA)
PCA is an appropriate multivariate technique for reducing the dimension of a data set consisting of a large number of interrelated variables while retaining as much as possible of the variation present in the data set (Sharma 1996; Özkan & Mendeş 2004). This is achieved by transforming the set of original variables into a new set of uncorrelated variables, the principal components (PCs), which are ordered so that the first few retain most of the variation present in all of the original variables (Jolliffe 2002). Since detailed information about PCA is given by Johnson & Wichern (1982), Sharma (1996) and Tabachnick & Fidell (2001), it is not repeated here. In matrix notation, the basic equation of PCA is:

$$\mathbf{Y} = \mathbf{W}'\mathbf{X} \qquad (2)$$

where X is the matrix containing the p original variables and W is the matrix of weights containing the standardized weights ($w_{ij}$) of each variable in the PCs. The magnitudes of the coefficients give the contributions of each variable to that component.
PCA is sensitive to the measurement units of the variables. Because the variables in this study were measured in different units, the correlation matrix of the variables (C) was used to obtain the eigenvalues and the weights of the variables.
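Running PCA on the correlation matrix is equivalent to standardizing each variable to unit variance before extracting the components, so grams and millimetres no longer compete on scale. The sketch below assumes NumPy; the function name `pca_correlation` and the example data are hypothetical. It returns the eigenvalues in descending order, the weight matrix W (one column per PC) and the PC scores, where each row of the scores is W' applied to one standardized observation, matching equation (2) with observations stored as rows.

```python
import numpy as np

def pca_correlation(X):
    """PCA of the correlation matrix of X (n observations, p variables).

    Returns eigenvalues (descending), weights W (p x p, column k = PC k)
    and PC scores (n x p) computed from the standardized variables.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each variable
    C = np.corrcoef(X, rowvar=False)                   # correlation matrix C
    eigvals, W = np.linalg.eigh(C)                     # C is symmetric
    order = np.argsort(eigvals)[::-1]                  # largest variance first
    eigvals, W = eigvals[order], W[:, order]
    scores = Z @ W                                     # row i = W' z_i
    return eigvals, W, scores

# Hypothetical data on very different scales (not the broiler measurements)
rng = np.random.default_rng(2)
size = rng.normal(size=60)
X = np.column_stack([
    2000 + 150 * size + 30 * rng.normal(size=60),   # e.g. a weight in grams
    10 + 0.5 * size + 0.3 * rng.normal(size=60),    # e.g. a width in millimetres
    rng.normal(size=60),                            # an unrelated trait
])
eigvals, W, scores = pca_correlation(X)
```

The eigenvalues sum to p (the trace of the correlation matrix), and the covariance matrix of the scores is diagonal with the eigenvalues on the diagonal, which is why PC scores remove multicollinearity among the predictors.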
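Bartlett's sphericity test is used to test the null hypothesis that the correlation matrix is an identity matrix, i.e. that all correlations are zero; rejecting this hypothesis indicates that the variables are sufficiently correlated for PCA to be useful. Therefore, the applicability of PCA was assessed with Bartlett's sphericity test, whose statistic is computed as:

$$\chi^2 = -\left[(n-1) - \frac{2p+5}{6}\right]\ln\lvert\mathbf{C}\rvert \qquad (3)$$

where p is the number of variables (equal to the number of components), n is the number of observations in the sample and C is the correlation matrix. Under the null hypothesis, the statistic approximately follows a chi-square distribution with p(p−1)/2 degrees of freedom.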
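Bartlett's statistic can be computed directly from the determinant of the correlation matrix. A minimal sketch, assuming NumPy and SciPy (`scipy.stats.chi2`); the function name and the simulated data are hypothetical. With n = 51 birds and p = 7 pre-slaughter traits the test has 7·6/2 = 21 degrees of freedom, and the last line evaluates the p-value for the statistic of 50.77 reported in the Results, which falls below 0.001.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Bartlett's test that the correlation matrix of X (n, p) is the identity."""
    n, p = X.shape
    C = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(C)                  # ln|C|, numerically stable
    statistic = -((n - 1) - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) / 2
    return statistic, df, chi2.sf(statistic, df)      # upper-tail p-value

# Hypothetical data: 51 observations on 7 strongly intercorrelated variables
rng = np.random.default_rng(3)
base = rng.normal(size=(51, 1))
X = base + 0.5 * rng.normal(size=(51, 7))
stat, df, p_value = bartlett_sphericity(X)
print(df)  # 21.0

# p-value for the statistic reported in the Results (n = 51, p = 7)
p_reported = chi2.sf(50.77, 21)
```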
In this study, only the three PCs with eigenvalues greater than 1 were retained out of the 7 PCs (Kaiser 1960), so only the scores of these three PCs were used in the subsequent analysis. The coefficient of determination (R²), the Durbin-Watson statistic (DW) and the residual mean square error (RMSE) were used as goodness-of-fit criteria. The SAS V8 (SAS Institute Inc., 1999) and NCSS (Hintze 2001) statistical packages were used for the statistical analyses.
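The three goodness-of-fit criteria can be computed from the residuals of each fitted dependent variable. A minimal NumPy sketch with a hypothetical function name; RMSE is taken here as the square root of the residual sum of squares divided by its degrees of freedom, which is an assumption about the exact definition the packages used. The Durbin-Watson statistic is close to 2 when successive residuals are uncorrelated and is only meaningful when the observations have a natural order.

```python
import numpy as np

def goodness_of_fit(y, y_hat, n_params):
    """Return (R^2, Durbin-Watson, RMSE) for one fitted response.

    n_params: number of estimated coefficients, including the intercept.
    Assumed definition: RMSE = sqrt(SSE / (n - n_params)).
    """
    y = np.asarray(y, dtype=float)
    e = y - np.asarray(y_hat, dtype=float)      # residuals
    sse = np.sum(e ** 2)
    r2 = 1 - sse / np.sum((y - y.mean()) ** 2)
    dw = np.sum(np.diff(e) ** 2) / sse          # successive-difference ratio
    rmse = np.sqrt(sse / (len(y) - n_params))
    return r2, dw, rmse

# Hypothetical toy values: residuals are -0.1, 0.1, -0.2, 0.2
r2, dw, rmse = goodness_of_fit([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8], n_params=2)
```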

Results and Discussion
We observed that when the raw data were used for the multivariate multiple linear regression analysis (MMLR), multicollinearity was present, with high variance inflation factor values (VIF>10.0; Table 1). Johnson (1991) and Sharma (1996) reported that the prediction of the dependent variable(s) in regression analysis is strongly affected by multicollinearity among the independent variables. It is well known that in the presence of multicollinearity, the standard errors of the parameter estimates can be quite high, resulting in unstable estimates of the regression model. In this case, using MMLR to investigate the relationships between two data sets cannot give reliable results. Sharma (1996) and Sousa et al (2007) reported that one approach to avoiding this problem is principal component analysis (PCA). Therefore, principal component scores were used as independent variables in the MMLR model to avoid multicollinearity and obtain more reliable parameter estimates. The results of the PCA are given in Tables 2, 3 & 4. First, Bartlett's sphericity test, which tests the null hypothesis that the correlation matrix is an identity matrix (i.e. that all correlations are zero), was used to verify the applicability of PCA. The Bartlett's sphericity statistic was 50.77 (P<0.001), indicating that PCA is applicable to our data set. In principal component analysis, one of the most commonly used criteria for deciding how many components to retain is the eigenvalue-one criterion, also known as the Kaiser criterion (Kaiser 1960), under which any component with an eigenvalue greater than one is retained and interpreted. According to the results of the PCA (Table 2), only the first three of the 7 principal components (PC1, PC2 and PC3), which had eigenvalues greater than one, were selected for the MMLR analysis.
Because eigenvalues represent variances and each standardized variable contributes a variance of one to the principal component extraction, a component with an eigenvalue less than one accounts for less variance than a single original variable and is therefore not considered significant. Thus, the first three principal components provide adequate information about the data for most purposes. Together they explained 89.45 % of the variation (Table 2). PC1 is a measure of SHW, BRC and BW, PC2 is a measure of SHL and BRW, and PC3 is a measure of BBL and BL across the birds. PC1 accounts for 43.00 % of the total variance, while PC2 and PC3 account for 25.22 % and 21.24 %, respectively.
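Since the eigenvalues of a p × p correlation matrix sum to p, each component's share of the total variance is its eigenvalue divided by p. A minimal NumPy sketch of the Kaiser rule (the function name and the 4-variable eigenvalues are hypothetical), followed by the eigenvalues implied by the percentages reported above for p = 7: about 3.01, 1.77 and 1.49. The remaining four eigenvalues then sum to about 0.74, so none of them can exceed one, which is consistent with retaining three components.

```python
import numpy as np

def kaiser_summary(eigvals):
    """Apply the eigenvalue-one (Kaiser) rule to correlation-matrix eigenvalues.

    Returns the number of retained components, each component's share of the
    total standardized variance, and the cumulative share.
    """
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    p = len(eigvals)                      # trace of a p x p correlation matrix is p
    proportion = eigvals / p              # each standardized variable contributes 1
    keep = int(np.sum(eigvals > 1.0))     # components with eigenvalue greater than one
    return keep, proportion, np.cumsum(proportion)

# Hypothetical eigenvalues of a 4 x 4 correlation matrix (they sum to 4)
keep, proportion, cumulative = kaiser_summary([2.0, 1.2, 0.5, 0.3])

# Eigenvalues implied by the shares reported for PC1 to PC3 with p = 7 traits
pct = np.array([43.00, 25.22, 21.24])
implied = pct / 100 * 7                   # approximately 3.01, 1.77 and 1.49
remaining = 7 - implied.sum()             # total variance left for PC4 to PC7
```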
Loadings, which are the simple correlations between the original and the new variables, indicate the extent to which the original variables are influential or important in forming the new variables (Sharma 1996); they are given in Table 3. Loadings marked in bold indicate the highest correlations between the variables and the selected components (the higher the loading, the more influential the variable is in forming the principal component scores). As can be seen from Table 3, SHW, BRC and BW have the highest correlations with PC1. PC1 (score 1) had a positive impact on the heart weight (HW), liver weight (LW), gizzard weight (GW) and hot carcass weight (HCW) of the birds, and 91.63 % of the variation in these post-slaughter traits could be explained by the first scores (Table 4).
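For a PCA of the correlation matrix, the loading of variable j on component k equals its weight multiplied by the square root of the component's eigenvalue, w_jk·√λ_k, and this is exactly the correlation between the variable and the component score. A minimal NumPy sketch on hypothetical data (three variables sharing a common factor plus one unrelated variable) that computes loadings this way and checks them against the directly computed correlations:

```python
import numpy as np

# Hypothetical data: three variables driven by one common factor, one unrelated
rng = np.random.default_rng(4)
common = rng.normal(size=80)
X = np.column_stack([
    common + 0.4 * rng.normal(size=80),
    common + 0.4 * rng.normal(size=80),
    common + 0.4 * rng.normal(size=80),
    rng.normal(size=80),
])

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized variables
C = np.corrcoef(X, rowvar=False)
eigvals, W = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, W = eigvals[order], W[:, order]
scores = Z @ W

# Loading of variable j on component k: w_jk * sqrt(lambda_k)
loadings = W * np.sqrt(eigvals)

# The same numbers as simple correlations between variables and PC scores
p = X.shape[1]
corr = np.array([[np.corrcoef(X[:, j], scores[:, k])[0, 1] for k in range(p)]
                 for j in range(p)])
print(np.allclose(loadings, corr))  # True
```

The three variables that share the common factor load heavily on PC1, analogous to SHW, BRC and BW in Table 3.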
The multivariate multiple linear regression analysis based on PC scores showed that the overall model was statistically significant (Wilks' Lambda=0.00754; P=0.0351). Therefore, the MMLR model based on principal component scores for investigating the relations between the pre- and post-slaughter traits can be written as:

$$(\mathrm{HW}+\mathrm{LW}+\mathrm{GW}+\mathrm{HCW}) = 0.95725\,\mathrm{PC1} + 0.12509\,\mathrm{PC2} - 0.16849\,\mathrm{PC3}$$

with an R² value of 96.04 %.
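The overall test of the null hypothesis that all slope coefficients are zero can be sketched with Wilks' Lambda, Λ = |E|/|E + H|, where E is the residual sum-of-squares-and-cross-products (SSCP) matrix of the full model and E + H is the SSCP matrix of the dependent variables about their means; small values of Λ are evidence against the null hypothesis. This minimal sketch assumes NumPy and SciPy and uses Bartlett's chi-square approximation with q·r degrees of freedom (12 for 3 PC scores and 4 traits); statistical packages commonly report an F approximation instead, so p-values need not match exactly. The function name and the simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def wilks_overall_test(X_pred, Y):
    """Wilks' Lambda test of H0: all slope coefficients in Y = XB + E are zero.

    X_pred: (n, r) predictors without intercept; Y: (n, q) responses.
    Returns (Lambda, chi-square statistic, df, p-value) using Bartlett's
    chi-square approximation.
    """
    n, r = X_pred.shape
    q = Y.shape[1]
    X = np.column_stack([np.ones(n), X_pred])
    B_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    resid = Y - X @ B_hat
    E = resid.T @ resid                        # error SSCP matrix (full model)
    Yc = Y - Y.mean(axis=0)
    T = Yc.T @ Yc                              # SSCP about the means = E + H
    lam = np.linalg.det(E) / np.linalg.det(T)
    stat = -(n - 1 - (q + r + 1) / 2) * np.log(lam)
    df = q * r
    return lam, stat, df, chi2.sf(stat, df)

# Simulated data with the dimensions of this study: 3 PC scores, 4 traits
rng = np.random.default_rng(6)
n, r, q = 51, 3, 4
X_pred = rng.normal(size=(n, r))
B_slopes = np.array([[1.0, 0.5, 0.0, 0.8],
                     [0.0, 1.0, 0.5, 0.0],
                     [0.5, 0.0, 1.0, 0.3]])   # hypothetical true slopes
Y = X_pred @ B_slopes + rng.normal(size=(n, q))
lam, stat, df, p_value = wilks_overall_test(X_pred, Y)
```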

Table 1-Multivariate multiple linear regression analysis results based on raw data
Stepwise regression analysis was performed to determine which PCs contributed to the variation in the set of dependent (post-slaughter) variables. The stepwise regression analysis revealed that only PC1, which is composed mainly of SHW, BRC and BW, made a significant contribution. Hence, the final MMLR model can be written as:

$$(\mathrm{HW}+\mathrm{LW}+\mathrm{GW}+\mathrm{HCW}) = 0.95725\,\mathrm{PC1}$$

with an R² value of 91.63 %.
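Because PC scores are centred and mutually uncorrelated, dropping PC2 and PC3 leaves the PC1 coefficient unchanged, which is why 0.95725 appears in both models, and R² splits additively across the components. The minimal NumPy sketch below checks both properties for a single response on simulated data; the helper `ols` and all data are hypothetical.

```python
import numpy as np

# Hypothetical data: PC scores computed from correlated raw predictors
rng = np.random.default_rng(5)
raw = rng.normal(size=(51, 3)) @ rng.normal(size=(3, 3))    # correlated predictors
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
eigvals, W = np.linalg.eigh(np.corrcoef(raw, rowvar=False))
scores = Z @ W[:, ::-1]                                      # PC1 in column 0
y = 2.0 * scores[:, 0] + 0.3 * scores[:, 1] + rng.normal(size=51)

def ols(Xp, y):
    """Return (intercept and slopes, R^2) for y regressed on the columns of Xp."""
    X = np.column_stack([np.ones(len(y)), Xp])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return b, 1 - e @ e / np.sum((y - y.mean()) ** 2)

b_full, r2_full = ols(scores, y)          # PC1 + PC2 + PC3
b_pc1, r2_pc1 = ols(scores[:, :1], y)     # PC1 only
print(np.isclose(b_full[1], b_pc1[1]))    # True: PC1 slope unchanged
```

With orthogonal predictors, stepwise selection therefore reduces to ranking the components by the R² each explains on its own.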
The results of the MMLR analysis show that SHW, BRC and BW have a similar linear effect on predicting the post-slaughter traits (P=0.746). The loadings of SHW (0.868), BRC (0.892) and BW (0.849) were also similar to each other. These results suggest that the three pre-slaughter traits SHW, BRC and BW vary together. Hence, animals with high values of SHW, BRC and BW would probably also be expected to have high values of the post-slaughter traits HW, LW, GW and HCW (especially HCW and HW). PC1, which is formed mainly by SHW, BRC and BW, accounts for 91.63 % of the total variance in the post-slaughter traits.

Conclusion
Multivariate multiple linear regression analysis can be a useful way of demonstrating the relationships between two variable sets in animal studies as well as in other areas of research. In view of these results, it is possible to suggest that the most important pre-slaughter traits influencing the changes in the post-slaughter traits (especially heart weight and hot carcass weight) were shank width, breast circumference and body weight. Moreover, changes in shank width, breast circumference and body weight contributed more to changes in heart weight and hot carcass weight than to changes in liver weight and gizzard weight.