Research Article

Multi-Parametrized Cross Validation Method based on Order, Size, Weight, and Missing Data Robustness: A New Cross Validation Model MP-OSW-CV

Year 2025, EARLY VIEW, 1 - 1
https://doi.org/10.2339/politeknik.1672706

Abstract

Unless separate test data are provided, the performance of a classifier is estimated using cross-validation methods. Existing models focus on different parameters such as the number of folds, balanced class distribution, and test data size. However, these models cannot overcome the bias-variance tradeoff. The choice of the number of folds and the test size is typically arbitrary. This study proposes a multi-parametrized cross-validation method (MP-OSW-CV) based on instance order, test size, weight, and missing data robustness. The method has four parameters: order, size, weight, and missing data. First, it divides the dataset into parts according to the data indices and, instead of selecting random samples from the whole dataset, selects an equal number of random samples from each part. Second, the test fold size is varied. The accuracy results obtained with different test sizes are reflected in the overall performance either with equal weights or with one of two weighting schemes computed in inverse proportion. Finally, if missing data robustness is to be analyzed, the training set size is determined by the last parameter after the test set has been created. The proposed method is compared with conventional methods on several datasets from the UCI ML Repository. Our findings show that MP-OSW-CV yields the smallest error deviations relative to the reference error. By producing splits that better represent the dataset, MP-OSW-CV allowed the model to be evaluated more reliably. This underlines the importance of considering order and test size jointly for reliable model evaluation.
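The weighting step can be illustrated with a minimal sketch. The abstract does not give the weight formulas, so the function name `aggregate_accuracy` and the single inverse scheme shown here (weights proportional to 1 / test size, then normalized) are assumptions for illustration only and may differ from the two inverse schemes used in the paper.

```python
def aggregate_accuracy(acc_by_test_size, scheme="equal"):
    """Combine accuracies measured at several test sizes into one score.

    acc_by_test_size maps a test size (a fraction or a count) to the
    accuracy obtained with that size. The "inverse" scheme is an assumed
    form (weight proportional to 1 / test size), not the paper's formula.
    """
    sizes = list(acc_by_test_size)
    if scheme == "equal":
        raw = [1.0 for _ in sizes]
    elif scheme == "inverse":
        raw = [1.0 / size for size in sizes]
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(raw)
    # Normalize so that the weights sum to 1.
    weights = [w / total for w in raw]
    return sum(w * acc_by_test_size[s] for w, s in zip(weights, sizes))
```

With accuracies of 0.90 at a test size of 0.1 and 0.80 at a test size of 0.2, the equal scheme returns the plain mean, while the assumed inverse scheme gives the smaller test size twice the weight of the larger one.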

References

  • [1] Khan S.S., Karg M.E., Kulić D., and Hoey J., “Detecting falls with X-Factor Hidden Markov Models”, Applied Soft Computing, 55:168–177, (2017).
  • [2] Wang Z., Yang D., Cai M., and Liu H., “Resistant effect of HLA-DRB1∗09 on hepatitis C virus infection identified by a new cross-validation method”, Clinical Laboratory, 62:1367–1370, (2016).
  • [3] Agyapong D., Propster J.R., Marks J., and Hocking T.D., “Cross-validation for training and testing co-occurrence network inference algorithms”, BMC Bioinformatics, 26:74, (2025).
  • [4] Ablain M., Lalau N., Meyssignac B., Fraudeau R., Barnoud A., Dibarboure G., Egido A., and Donlon C., “Benefits of a second tandem flight phase between two successive satellite altimetry missions for assessing instrumental stability”, Ocean Science, 21:343–358, (2025).
  • [5] Kaneko H. and Funatsu K., “Automatic determination method based on cross-validation for optimal intervals of time difference”, Journal of Chemical Engineering of Japan, 46:219–225, (2013).
  • [6] Tamaddoni-Nezhad A., Milani G.A., Raybould A., Muggleton S., and Bohan D.A., “Construction and validation of food webs using logic-based machine learning and text mining”, Advances in Ecological Research, 49:225–289, (2013).
  • [7] Yaghini M., Khoshraftar M.M., and Fallahi M., “A hybrid algorithm for artificial neural network training”, Engineering Applications of Artificial Intelligence, 26(1):293–301, (2013).
  • [8] Zhao Q., Li W., Li C., Chu P.W., Kornak J., Lang T.F., Fang J., and Lu Y., “A statistical method (cross-validation) for bone loss region detection after spaceflight”, Australasian Physical & Engineering Sciences in Medicine, 33(2):163–169, (2010).
  • [9] Li X., Su H., and Chu J., “Multiple models soft-sensing technique based on online clustering arithmetic”, Journal of Chemical Industry and Engineering (China), 58(11):2834–2839, (2007).
  • [10] Zhang H. and Zou G., “Cross-validation model averaging for generalized functional linear model”, Econometrics, 8(1):7, (2020).
  • [11] Wang Y., Khodadadzadeh M., and Zurita-Milla R., “Spatial+: A new cross-validation method to evaluate geospatial machine learning models”, International Journal of Applied Earth Observation and Geoinformation, 121:103364, (2023).
  • [12] Yi L. and Moon J., “Accurately disentangling core and winding losses in experimental, in-situ magnetic loss measurement for power electronic circuits and applications”, IEEE Journal of Emerging and Selected Topics in Power Electronics, 12(5):5568–5578, (2024).
  • [13] Yang Z. and Zhou J., “Novel cross-validity criteria and statistical index in non-Gaussian space”, 2022 7th International Conference on Control, Robotics and Cybernetics (CRC), 53–58, (2022).
  • [14] Youndjé É., “Bandwidth choice for the smooth Kaplan–Meier estimator when the censoring variable can be discontinuous”, South African Statistical Journal, 56(1):53–67, (2022).
  • [15] Mayne G.C., Woods C.M., Dharmawardana N., Wang T., Krishnan S., Hodge J.C., Foreman A., Boase S., Carney A.S., Sigston E.A.W., Watson D.I., Ooi E.H., and Hussey D.J., “Cross-validated serum small extracellular vesicle microRNAs for the detection of oropharyngeal squamous cell carcinoma”, Journal of Translational Medicine, 18:280, (2020).
  • [16] Zhang Y., Zhao T., Zhang H., Xia C., and Cao G., “Study of mechanomyography-based wrist movement classification with repeatedly wearing a signal acquisition armband”, 2022 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC), 771–775, (2022).
  • [17] Moran B.M. and Anderson E.C., “Bayesian inference from the conditional genetic stock identification model”, Canadian Journal of Fisheries and Aquatic Sciences, 76(4):551–560, (2019).
  • [18] Li G., Huang J.Z., and Shen H., “Exponential family functional data analysis via a low-rank model”, Biometrics, 74(4):1301–1310, (2018).
  • [19] Piette E.R. and Moore J.H., “Improving machine learning reproducibility in genetic association studies with proportional instance cross-validation (PICV)”, BioData Mining, 11:6, (2018).
  • [20] Kilpatrick A.J., Ćwiek A., and Kawahara S., “Random forests, sound symbolism and Pokémon evolution”, PLOS ONE, 18(1):e0279350, (2023).

Multi-Parametrized Cross Validation Method based on Order, Size, Weight, and Missing Data Robustness: A New Cross Validation Model MP-OSW-CV


Abstract

Unless separate test data are provided, the performance of a classifier is estimated using cross-validation methods. Existing models focus on different parameters, such as the number of folds, balanced class distribution, and test data size. However, these models cannot overcome the bias-variance tradeoff. In this paper, we propose a multi-parametrized cross-validation method based on the order of an instance, test fold size, weight, and missing data robustness (MP-OSW-CV). The method has four parameters: order, size, weight, and missing data. First, it divides the dataset into parts according to the data indices and draws an equal number of random samples from each part instead of selecting random samples from the whole dataset. Second, the test fold size is varied. The accuracy results obtained with different test sizes are reflected in the overall performance either with equal weights or with one of two types of inversely proportional weights. Finally, if missing data robustness is to be analyzed, the training set size is determined by the last parameter after the test fold has been created. The proposed method is compared with conventional methods on several datasets from the UCI ML Repository. MP-OSW-CV generated more representative data splits, leading to more dependable model assessments.
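The order and missing data parameters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names, the `n_parts` and `train_fraction` parameters, and the use of contiguous index blocks as the "parts" are all hypothetical choices made for this example.

```python
import numpy as np

def order_based_test_indices(n_samples, n_parts, test_size, seed=None):
    """Draw an equal number of test indices from each contiguous index block."""
    if test_size % n_parts != 0:
        raise ValueError("test_size must be divisible by n_parts")
    rng = np.random.default_rng(seed)
    per_part = test_size // n_parts
    # Assumption: the "parts" are contiguous blocks in the dataset's row order.
    parts = np.array_split(np.arange(n_samples), n_parts)
    chosen = [rng.choice(part, size=per_part, replace=False) for part in parts]
    return np.sort(np.concatenate(chosen))

def reduced_train_indices(n_samples, test_idx, train_fraction=1.0, seed=None):
    """Keep only a fraction of the non-test rows as the training set."""
    rng = np.random.default_rng(seed)
    remaining = np.setdiff1d(np.arange(n_samples), test_idx)
    keep = int(round(train_fraction * remaining.size))
    return np.sort(rng.choice(remaining, size=keep, replace=False))

# Example: 100 rows, 5 blocks of 20 rows, 4 test rows drawn from each block.
test_idx = order_based_test_indices(100, n_parts=5, test_size=20, seed=0)
train_idx = reduced_train_indices(100, test_idx, train_fraction=0.5, seed=0)
```

Drawing from every block means the test fold spans the whole index range, and lowering `train_fraction` after the test fold is fixed gives one way to probe how a classifier behaves when part of the training data is unavailable.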

There are 20 citations in total.

Details

Primary Language English
Subjects Machine Learning (Other)
Journal Section Research Article
Authors

Alican Doğan 0000-0002-0553-2888

Early Pub Date September 23, 2025
Publication Date October 14, 2025
Submission Date April 9, 2025
Acceptance Date July 29, 2025
Published in Issue Year 2025 EARLY VIEW

Cite

APA Doğan, A. (2025). Multi-Parametrized Cross Validation Method based on Order, Size, Weight, and Missing Data Robustness: A New Cross Validation Model MP-OSW-CV. Politeknik Dergisi, 1-1. https://doi.org/10.2339/politeknik.1672706
AMA Doğan A. Multi-Parametrized Cross Validation Method based on Order, Size, Weight, and Missing Data Robustness: A New Cross Validation Model MP-OSW-CV. Politeknik Dergisi. Published online September 1, 2025:1-1. doi:10.2339/politeknik.1672706
Chicago Doğan, Alican. “Multi-Parametrized Cross Validation Method Based on Order, Size, Weight, and Missing Data Robustness: A New Cross Validation Model MP-OSW-CV”. Politeknik Dergisi, September (September 2025), 1-1. https://doi.org/10.2339/politeknik.1672706.
EndNote Doğan A (September 1, 2025) Multi-Parametrized Cross Validation Method based on Order, Size, Weight, and Missing Data Robustness: A New Cross Validation Model MP-OSW-CV. Politeknik Dergisi 1–1.
IEEE A. Doğan, “Multi-Parametrized Cross Validation Method based on Order, Size, Weight, and Missing Data Robustness: A New Cross Validation Model MP-OSW-CV”, Politeknik Dergisi, pp. 1–1, September 2025, doi: 10.2339/politeknik.1672706.
ISNAD Doğan, Alican. “Multi-Parametrized Cross Validation Method Based on Order, Size, Weight, and Missing Data Robustness: A New Cross Validation Model MP-OSW-CV”. Politeknik Dergisi. September 2025. 1-1. https://doi.org/10.2339/politeknik.1672706.
JAMA Doğan A. Multi-Parametrized Cross Validation Method based on Order, Size, Weight, and Missing Data Robustness: A New Cross Validation Model MP-OSW-CV. Politeknik Dergisi. 2025;:1–1.
MLA Doğan, Alican. “Multi-Parametrized Cross Validation Method Based on Order, Size, Weight, and Missing Data Robustness: A New Cross Validation Model MP-OSW-CV”. Politeknik Dergisi, 2025, pp. 1-1, doi:10.2339/politeknik.1672706.
Vancouver Doğan A. Multi-Parametrized Cross Validation Method based on Order, Size, Weight, and Missing Data Robustness: A New Cross Validation Model MP-OSW-CV. Politeknik Dergisi. 2025:1-.