TY - JOUR
T1 - Improving Fish Weight Estimation with Quantile and Box-Cox Transforms: Comparative Machine Learning Models
TT - Quantile ve Box-Cox Dönüşümleri ile Balık Ağırlığı Tahmininin İyileştirilmesi: Makine Öğrenimi Modellerinin Karşılaştırmalı Bir Çalışması
AU - Karabey Aksakallı, Işıl
AU - Esen, Hatice
AU - Taşdelen, Havvanur
AU - Kucuk, Sefa
PY - 2025
DA - September
Y2 - 2025
JF - Dicle Üniversitesi Mühendislik Fakültesi Mühendislik Dergisi
JO - DUJE
PB - Dicle Üniversitesi
WT - DergiPark
SN - 1309-8640
SP - 581
EP - 597
VL - 16
IS - 3
LA - en
AB - Fish weight estimation using machine learning ensures that fish are fed appropriately, reduces labor, prevents physical harm to the fish, and saves time. In this study, Quantile and Box-Cox transformations are applied to improve the accuracy of fish weight predictions. These transformations correct the asymmetric distribution of the data and enable machine learning algorithms to generalize more effectively and produce more accurate results. CatBoost, Random Forest, Polynomial Regression, and Support Vector Regression methods were evaluated for fish weight estimation both before and after applying the transformations. The experimental results show that both the Quantile and Box-Cox transformations effectively reduce model error rates, particularly by normalizing the dataset distribution. Notably, models trained on untransformed data show markedly lower error rates once a transformation is applied. The lowest Mean Absolute Error (MAE) without transformation was obtained using the CatBoost model, yielding a value of 14.002. After applying the Quantile transformation, the MAE decreased to 0.0171, while the Box-Cox transformation resulted in an MAE of 0.3302. Although both transformations contribute to error reduction, the Quantile transformation has a more substantial impact on fish weight estimation.
These findings underscore the importance of data transformations in the preprocessing stage and highlight that transformation techniques are as crucial as selecting the appropriate machine learning model.
KW - Machine learning
KW - Quantile transformation
KW - Box-cox transformation
KW - Fish weight estimation
N2 - Balık ağırlığının makine öğrenimi (ML) ile tahmini balıkların ihtiyacı kadar yemlenmesini sağlarken iş gücünü azaltmakta, balıkların zarar görmesini önlemekte ve zamandan da tasarruf sağlamaktadır. Bu çalışmada balıkların ağırlık tahmin doğruluğunu artırmak için, veri dağılımını iyileştiren Quantile (QT) ve Box-Cox (BCT) dönüşümleri uygulanmaktadır. Bu dönüşümler, verinin asimetrik dağılımını düzelterek ML algoritmalarının daha iyi genelleme yapmasını ve daha doğru tahminler üretmesini sağlamaktadır. Balık ağırlığı tahmini için CatBoost, Random Forest, Polynomial Regression ve Destek Vektör Regresyon (SVR) yöntemleri, dönüşüm öncesi ve sonrası olmak üzere karşılaştırılmıştır. Deneysel sonuçlar hem QT hem de BCT’nin, özellikle veri kümesinin dağılımını daha normal bir hale getirerek modellerin hata oranlarını düşürmede etkili olduğunu ve genel olarak hata oranlarını azalttığını göstermektedir. Özellikle dönüşümsüz modellerde, dönüşüm uygulandıktan sonra belirgin şekilde hata oranlarında azalma elde edilmektedir. Dönüşüm uygulanmadan en iyi Ortalama Mutlak Hatası (MAE) değeri 14.0020 ile CatBoost yöntemi ile elde edilmektedir. QT uygulandığında MAE değeri 0.0171’e, BCT uygulandığında ise 0.3302’ye düşmektedir. Her iki dönüşüm de MAE değerini azaltırken, QT'nin balık ağırlığı tahmini üzerinde daha belirgin şekilde etkisi olduğu görülmektedir. Bu bulgular, dönüşümlerin veri ön işleme aşamasında önemli bir yere sahip olduğunu ve doğru makine öğrenimi modelini seçmenin yanı sıra, veri dönüşüm tekniklerinin de balık ağırlık tahmininde önemli olduğunu ortaya koymaktadır.
CR - [1] S. B. Gutzmann, E. E. Hodgson, D. Braun, J. W. Moore, and R. A.
Hovel, “Predicting fish weight using photographic image analysis: a case study of broad whitefish in the lower Mackenzie River watershed,” Arct. Sci., vol. 8, no. 4, pp. 1356–1361, Dec. 2022. Accessed on: Jun. 30, 2025. doi: 10.1139/AS-2021-0017.
CR - [2] D. A. Konovalov, A. Saleh, D. B. Efremova, J. A. Domingos, and D. R. Jerry, “Automatic weight estimation of harvested fish from images,” in Proc. 2019 Digital Image Computing: Techniques and Applications (DICTA), Dec. 2019, doi: 10.1109/DICTA47822.2019.8945971.
CR - [3] R. Islamadina, N. Pramita, F. Arnia, and K. Munadi, “Estimating fish weight based on visual captured,” in Proc. 2018 Int. Conf. Inf. Commun. Technol. (ICOIACT), vol. 2018-January, pp. 366–372, Apr. 2018, doi: 10.1109/ICOIACT.2018.8350762.
CR - [4] S. Suwannakhun and P. Daungmala, “Estimating pig weight with digital image processing using deep learning,” in Proc. 14th Int. Conf. Signal Image Technol. Internet Based Syst. (SITIS), pp. 320–326, Jul. 2018, doi: 10.1109/SITIS.2018.00056.
CR - [5] R. A. Peterson and J. E. Cavanaugh, “Ordered quantile normalization: a semiparametric transformation built for the cross-validation era,” J. Appl. Stat., vol. 47, no. 13–15, pp. 2312–2327, Nov. 2020, doi: 10.1080/02664763.2019.1630372.
CR - [6] B. Peng, R. K. Yu, K. L. DeHoff, and C. I. Amos, “Normalizing a large number of quantitative traits using empirical normal quantile transformation,” BMC Proc., vol. 1, no. S1, pp. 1–5, Dec. 2007, doi: 10.1186/1753-6561-1-S1-S156.
CR - [7] G. D. Rayner and H. L. MacGillivray, “Weighted quantile-based estimation for a class of transformation distributions,” Comput. Stat. Data Anal., vol. 39, no. 4, pp. 401–433, Jun. 2002, doi: 10.1016/S0167-9473(01)00090-1.
CR - [8] T. Zhang and B. Yang, “Box–Cox transformation in big data,” Technometrics, vol. 59, no. 2, pp. 189–201, Apr. 2017, doi: 10.1080/00401706.2016.1156025.
CR - [9] J. W. Osborne, “Improving your data transformations: Applying the Box–Cox transformation,” Pract.
Assess. Res. Eval., vol. 15, no. 1, Jan. 2010, doi: 10.7275/QBPC-GK17.
CR - [10] Fish Market. Accessed: Feb. 26, 2025. [Online]. Available: https://www.kaggle.com/datasets/vipullrathod/fish-market.
CR - [11] Models and test, we have used. [Online]. Available: http://jse.amstat.org/datasets/fishcatch.txt.
CR - [12] J. Hu and S. Szymczak, “A review on longitudinal data analysis with random forest,” Brief Bioinform., vol. 24, no. 2, pp. 1–11, Mar. 2023, doi: 10.1093/bib/bbad002.
CR - [13] J. T. Hancock and T. M. Khoshgoftaar, “CatBoost for big data: an interdisciplinary review,” J. Big Data, vol. 7, no. 1, pp. 1–45, Dec. 2020, doi: 10.1186/s40537-020-00369-8.
CR - [14] A. V. Dorogush, V. Ershov, and A. Gulin, “CatBoost: Gradient boosting with categorical features support,” arXiv preprint, Oct. 2018. Accessed: Mar. 1, 2025.
CR - [15] A. Parmar, R. Katariya, and V. Patel, “A review on random forest: An ensemble classifier,” in Lecture Notes on Data Engineering and Communications Technologies, vol. 26, pp. 758–763, 2019, doi: 10.1007/978-3-030-03146-6_86.
CR - [16] F. Kazemi, N. Asgarkhani, T. Shafighfard, R. Jankowski, and D. Y. Yoo, “Machine-learning methods for estimating performance of structural concrete members reinforced with fiber-reinforced polymers,” Arch. Comput. Methods Eng., vol. 32, no. 1, pp. 571–603, Jan. 2024, doi: 10.1007/s11831-024-10143-1.
CR - [17] S. von Bülow, G. Tesei, and K. Lindorff-Larsen, “Machine learning methods to study sequence–ensemble–function relationships in disordered proteins,” Curr. Opin. Struct. Biol., vol. 92, p. 103028, Jun. 2025, doi: 10.1016/j.sbi.2025.103028.
CR - [18] B. Liu, Y. Yu, Z. Liu, Z. Cui, and W. Tian, “Prediction of CO₂ solubility in aqueous amine solutions using machine learning method,” Sep. Purif. Technol., vol. 354, p. 129306, Feb. 2025, doi: 10.1016/j.seppur.2024.129306.
CR - [19] P. D’Orazio and A. D. Pham, “Evaluating climate-related financial policies’ impact on decarbonization with machine learning methods,” Sci.
Rep., vol. 15, no. 1, p. 1694, Dec. 2025, doi: 10.1038/s41598-025-85127-7.
CR - [20] N. Tuerxun et al., “Accurate estimation of jujube leaf chlorophyll content using optimized spectral indices and machine learning methods integrating geospatial information,” Ecol. Inform., vol. 85, p. 102980, Mar. 2025, doi: 10.1016/j.ecoinf.2024.102980.
CR - [21] K. Bogner, F. Pappenberger, and H. L. Cloke, “Technical note: The normal quantile transformation and its application in a flood forecasting system,” Hydrol. Earth Syst. Sci., vol. 16, no. 4, pp. 1085–1094, 2012, doi: 10.5194/hess-16-1085-2012.
CR - [22] M. Buchinsky, “Quantile regression, Box–Cox transformation model, and the U.S. wage structure, 1963–1987,” J. Econom., vol. 65, no. 1, pp. 109–154, Jan. 1995, doi: 10.1016/0304-4076(94)01599-U.
CR - [23] X. Xie, Y. Mei, B. Gu, and W. He, “Changing Box–Cox transformation parameter as an early warning signal for abrupt climate change,” Clim. Dyn., vol. 60, no. 11–12, pp. 4133–4143, Jun. 2023, doi: 10.1007/s00382-022-06563-z.
CR - [24] J. Nagendra et al., “Evaluation of surface roughness of novel Al-based MMCs using Box–Cox transformation,” Int. J. Interact. Des. Manuf., vol. 18, no. 5, pp. 3369–3382, Jul. 2024, doi: 10.1007/s12008-023-01561-9.
CR - [25] A. A. Al Abbasi, M. J. Alam, S. Saha, I. A. Begum, and M. F. Rola-Rubzen, “Impact of rural transformation on rural income and poverty for sustainable development in Bangladesh: A moments-quantile regression with fixed-effects models approach,” Sustain. Dev., vol. 33, no. 2, pp. 2951–2974, Apr. 2024, doi: 10.1002/sd.3276.
CR - [26] H. S. Dhiman, D. Deb, and J. M. Guerrero, “Hybrid machine intelligent SVR variants for wind forecasting and ramp events,” Renew. Sustain. Energy Rev., vol. 108, pp. 369–379, 2019.
CR - [27] N. A. Terrault and T. I. Hassanein, “Management of the patient with SVR,” J. Hepatol., vol. 65, no. 1, pp. 120–129, 2016.
CR - [28] S. M. Robeson and C. J. Willmott, “Decomposition of the mean absolute error (MAE) into systematic and unsystematic components,” PLoS One, vol. 18, no. 2, p. e0279774, 2023.
CR - [29] J. Gao, “R-Squared (R2)–How much variation is explained?,” Res. Methods Med. Health Sci., vol. 5, no. 4, pp. 104–109, 2024.
CR - [30] N. Tengtrairat, W. L. Woo, P. Parathai, D. Rinchumphu, and C. Chaichana, “Non-intrusive fish weight estimation in turbid water using deep learning and regression models,” Sensors, vol. 22, no. 14, p. 5161, 2022.
CR - [31] M. Hamzaoui, M. O. E. Aoueileyine, L. Romdhani, and R. Bouallegue, “Optimizing XGBoost performance for fish weight prediction through parameter pre-selection,” Fishes, vol. 8, no. 10, p. 505, 2023.
CR - [32] M. Mots'oehli, A. Nikolaev, W. B. IGede, J. Lynham, P. J. Mous, and P. Sadowski, “Fishnet: Deep neural networks for low-cost fish stock estimation,” in Proc. 2024 IEEE Int. Conf. Omni-layer Intelligent Systems (COINS), pp. 1–7, Jul. 2024.
CR - [33] T. Zhang, Y. Yang, Y. Liu, C. Liu, R. Zhao, D. Li, and C. Shi, “Fully automatic system for fish biomass estimation based on deep neural network,” Ecol. Inform., vol. 79, p. 102399, 2024.
CR - [34] S. J. Rani, I. Ioannou, R. Swetha, R. D. Lakshmi, and V. Vassiliou, “A novel automated approach for fish biomass estimation in turbid environments through deep learning, object detection, and regression,” Ecol. Inform., vol. 81, p. 102663, 2024.
UR - https://dergipark.org.tr/tr/pub/dumf/issue//1674123
L1 - https://dergipark.org.tr/tr/download/article-file/4764486
ER -