Association Rules Analysis for Continuous Chicken Egg Traits Dataset
Year 2024, 296 - 304, 09.12.2024
Figen Ceritoğlu, Zeynel Cebeci
Abstract
This study applies the Apriori association rule algorithm to 14 continuous egg quality traits recorded from 4320 eggs of three commercial white-laying chicken lines. All continuous data were discretized using the Equal-Width-Interval method, with the number of intervals obtained from the Rice formula. Association rule analysis of the discretized dataset yielded a total of 349 rules consisting of 3 and 4 items. According to the top five rules by support and confidence, important associations were found among certain value ranges of egg weight, egg width, egg length, shell thickness, and shell breaking strength, relative to the other traits. Appropriate biological and economic interpretation of the obtained rules may contribute to poultry industry practice.
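The discretization and rule-scoring steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the common form of the Rice formula, k = ceil(2 * n^(1/3)), and the function names and toy egg measurements are hypothetical. It only computes support and confidence for one hand-picked rule, not a full Apriori search.

```python
import math


def rice_bins(n):
    """Number of intervals by the Rice rule (assumed form): ceil(2 * n^(1/3))."""
    return math.ceil(2 * n ** (1 / 3))


def equal_width_labels(values, k):
    """Map each value to the 0-based index of one of k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum value would land on index k, so clamp it into the last interval.
    return [min(int((v - lo) / width), k - 1) for v in values]


def support_confidence(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    a = set(antecedent)
    both = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence


# Toy, made-up measurements (not data from the study).
weights = [55.0, 58.0, 61.0, 64.0, 67.0, 70.0]
lengths = [52.0, 53.0, 55.0, 56.0, 58.0, 59.0]

k = rice_bins(len(weights))
w_bins = equal_width_labels(weights, k)
l_bins = equal_width_labels(lengths, k)

# Each egg becomes a transaction of (trait, interval) items.
transactions = [{("weight", w), ("length", l)} for w, l in zip(w_bins, l_bins)]

# Score one illustrative rule: heaviest interval -> longest interval.
support, confidence = support_confidence(
    transactions, [("weight", k - 1)], [("length", k - 1)]
)
print(k, w_bins, l_bins, support, confidence)
```

With n = 4320 eggs, this form of the Rice rule gives 33 intervals per trait; the study's exact rounding convention is not stated in the abstract, so that count should be read as an assumption.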