Improving Fish Weight Estimation with Quantile and Box-Cox Transforms: Comparative Machine Learning Models

Hatice Esen; Havvanur Taşdelen; Sefa Kucuk; Işıl Karabey Aksakallı

Research Article

Year 2025, Volume: 16 Issue: 3, 581 - 597

Abstract

Estimating fish weight with machine learning helps ensure that fish are fed appropriately, reduces labor, prevents physical harm to the fish, and saves time. In this study, Quantile and Box-Cox transformations are applied to improve the accuracy of fish weight predictions. These transformations correct the asymmetric distribution of the data, enabling machine learning algorithms to generalize more effectively and produce more accurate results. CatBoost, Random Forest, Polynomial Regression, and Support Vector Regression were evaluated for fish weight estimation both before and after applying the transformations. The experimental results show that both the Quantile and Box-Cox transformations reduce model error rates, particularly by normalizing the dataset distribution. Notably, the models trained on untransformed data show markedly lower error once a transformation is applied. Without transformation, the lowest Mean Absolute Error (MAE), 14.002, was obtained with the CatBoost model. After applying the Quantile transformation, the MAE decreased to 0.0171, while the Box-Cox transformation resulted in an MAE of 0.3302. Although both transformations reduce error, the Quantile transformation has the more substantial impact on fish weight estimation. These findings underscore the importance of data transformations in the preprocessing stage and indicate that transformation techniques are as crucial as selecting the appropriate machine learning model.
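The before-and-after comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic (weight proportional to length cubed with multiplicative noise), a single scikit-learn `RandomForestRegressor` stands in for the four models, and the transformations are applied to the target through `TransformedTargetRegressor`. The abstract does not state on which scale the reported MAE values were computed, so the sketch reports the MAE both in original units and in transformed units.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PowerTransformer, QuantileTransformer

# Synthetic stand-in data (assumption, not the paper's dataset):
# weight grows roughly with length cubed, with multiplicative noise.
rng = np.random.default_rng(0)
length_cm = rng.uniform(10.0, 60.0, size=300)
weight_g = 0.01 * length_cm**3 * rng.lognormal(mean=0.0, sigma=0.1, size=300)
X = length_cm.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, weight_g, test_size=0.25, random_state=0
)

# Target transformations to compare; Box-Cox requires strictly positive values.
transformers = {
    "none": None,
    "quantile": QuantileTransformer(
        n_quantiles=100, output_distribution="normal", random_state=0
    ),
    "box-cox": PowerTransformer(method="box-cox"),
}

mae_original = {}     # MAE in grams, after inverting the transformation
mae_transformed = {}  # MAE in transformed units (only for transformed targets)

for name, transformer in transformers.items():
    regressor = RandomForestRegressor(n_estimators=100, random_state=0)
    if transformer is None:
        regressor.fit(X_train, y_train)
        mae_original[name] = mean_absolute_error(y_test, regressor.predict(X_test))
        continue

    # Fits the transformer on y_train, trains on the transformed target,
    # and inverts the transformation automatically in predict().
    model = TransformedTargetRegressor(regressor=regressor, transformer=transformer)
    model.fit(X_train, y_train)
    mae_original[name] = mean_absolute_error(y_test, model.predict(X_test))

    # Same model, scored in transformed units instead of grams.
    y_test_t = model.transformer_.transform(y_test.reshape(-1, 1)).ravel()
    mae_transformed[name] = mean_absolute_error(
        y_test_t, model.regressor_.predict(X_test)
    )

for name in transformers:
    print(name, "MAE (grams):", round(mae_original[name], 3))
for name, value in mae_transformed.items():
    print(name, "MAE (transformed units):", round(value, 4))
```

Fitting the transformer on the training split only, as `TransformedTargetRegressor` does here, keeps test data out of the preprocessing step. Errors measured in transformed units and errors measured in grams are on different scales, which is why the sketch keeps the two dictionaries separate.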

Keywords

Machine learning, Quantile transformation, Box-Cox transformation, Fish weight estimation

Improving Fish Weight Estimation with Quantile and Box-Cox Transformations: A Comparative Study of Machine Learning Models

Abstract

Estimating fish weight with machine learning (ML) ensures that fish are fed only as much as they need, while reducing labor, preventing harm to the fish, and saving time. In this study, Quantile (QT) and Box-Cox (BCT) transformations, which improve the data distribution, are applied to increase the accuracy of fish weight estimation. By correcting the asymmetric distribution of the data, these transformations enable ML algorithms to generalize better and produce more accurate predictions. For fish weight estimation, the CatBoost, Random Forest, Polynomial Regression, and Support Vector Regression (SVR) methods are compared before and after transformation. The experimental results show that both QT and BCT are effective in lowering the models' error rates, particularly by bringing the dataset distribution closer to normal. In particular, the models trained without transformation show a marked reduction in error once a transformation is applied. Without transformation, the best Mean Absolute Error (MAE), 14.0020, is obtained with the CatBoost method. The MAE falls to 0.0171 when QT is applied and to 0.3302 when BCT is applied. While both transformations reduce the MAE, QT has the more pronounced effect on fish weight estimation. These findings show that transformations have an important place in the data preprocessing stage and that, alongside choosing the right machine learning model, data transformation techniques also matter for fish weight estimation.

There are 33 citations in total.

Details

Primary Language	English
Subjects	Image Processing, Machine Learning (Other)
Journal Section	Articles
Authors	Hatice Esen (ORCID 0009-0008-9805-829X); Havvanur Taşdelen (ORCID 0009-0000-8222-2093); Sefa Kucuk (ORCID 0000-0002-0279-3185); Işıl Karabey Aksakallı (ORCID 0000-0002-4156-9098)
Early Pub Date	September 30, 2025
Publication Date	October 4, 2025
Submission Date	April 11, 2025
Acceptance Date	August 26, 2025
Published in Issue	Year 2025 Volume: 16 Issue: 3

Cite

IEEE	H. Esen, H. Taşdelen, S. Kucuk, and I. Karabey Aksakallı, “Improving Fish Weight Estimation with Quantile and Box-Cox Transforms: Comparative Machine Learning Models”, DUJE, vol. 16, no. 3, pp. 581–597, 2025.
