Makine Öğrenmesinde Rastgele Oran ve Sıralı Küme Örneklemesi Yöntemlerinin Doğrusal Regresyon Modellerine Etkisi

Sena Aslan; Tuğba Yıldız

doi:10.21205/deufmd.2022247004

Research Article

Year 2022, Volume: 24 Issue: 70, 29 - 36, 17.01.2022

Cited By: 3

Effect of Random Rate and Ranked Set Sampling Methods on Linear Regression Models in Machine Learning

Abstract

In its simplest definition, machine learning means teaching human characteristics and behaviors to a computer. Machine learning algorithms learn by examining the examples given to them and gain the ability to generalize from those examples. The portion of the data used to teach the model is called the training set, and the portion used to test how well it has learned is called the test set. In existing machine learning studies, the data set is split at a random rate chosen by the user. In this study, graduate application data for students in India, created with reference to the graduate admission criteria of the University of California, were split by both the random rate method and ranked set sampling (RSS), and linear regression models were built on the resulting training sets. The data set splitting methods were then compared on the test sets using the root mean square error (RMSE) of the models. With the RSS method, the principal components, partial least squares and ridge regression models reached lower error values than with the random rate method, except in a single case. For all linear regression models except the elastic net regression model, the RSS method gave better results than the random rate method.
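The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it (`rss_split`, `random_split`, `fit_ridge`, `rmse`) is hypothetical. It assumes that units are ranked on the response variable, that unselected units of each ranked set return to the pool, and that the data are synthetic rather than the admission data used in the study. Ridge regression stands in for the several linear models the paper compares.

```python
import numpy as np


def rss_split(y, set_size, cycles, rng):
    """Pick set_size * cycles training indices by ranked set sampling on y."""
    remaining = np.arange(len(y))
    train = []
    for _ in range(cycles):
        for rank in range(set_size):
            # Draw one random set of `set_size` units from the pool.
            chosen = rng.choice(remaining, size=set_size, replace=False)
            # Rank the set by the response and keep the unit at position `rank`.
            picked = chosen[np.argsort(y[chosen])[rank]]
            train.append(picked)
            # Assumption: only the picked unit leaves the pool.
            remaining = remaining[remaining != picked]
    train = np.array(train)
    test = np.setdiff1d(np.arange(len(y)), train)
    return train, test


def random_split(n, train_size, rng):
    """Plain random split with a user-chosen training size."""
    perm = rng.permutation(n)
    return perm[:train_size], perm[train_size:]


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef


def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic stand-in data: 400 rows, 7 predictors, linear signal plus noise.
    X = rng.normal(size=(400, 7))
    y = X @ rng.normal(size=7) + rng.normal(scale=0.5, size=400)

    splits = {
        "RSS": rss_split(y, set_size=5, cycles=64, rng=rng),  # 320 train, 80 test
        "random": random_split(len(y), 320, rng),
    }
    for name, (train, test) in splits.items():
        coef, intercept = fit_ridge(X[train], y[train], alpha=1.0)
        print(name, rmse(y[test], X[test] @ coef + intercept))
```

A single run like this says little on its own. Repeating both splits over many seeds and comparing the RMSE distributions would be closer in spirit to the comparison the abstract reports.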

Keywords

Machine Learning, Ranked Set Sampling, Linear Regression Models, RMSE

References

Samuel, A.L. 1959. Some Studies in Machine Learning Using the Game of Checkers, IBM Journal of Research and Development, Vol. 3, p. 211. DOI: 10.1147/rd.33.0210
McIntyre, G.A. 1952. A Method of Unbiased Selective Sampling Using Ranked Sets, Australian Journal of Agricultural Research, Vol. 3, p. 385. DOI: 10.1071/AR9520385
Halls, L.S., Dell, T.R. 1966. Trial of Ranked Set Sampling for Forage Yields, Forest Science, Vol. 12, p. 24. DOI: 10.1093/forestscience/12.1.22
Evans, M.J. 1967. Application of Ranked Set Sampling to Regeneration Surveys in Areas Direct-Seeded to Longleaf Pine, Louisiana State University, School of Forestry and Wildlife Management, Master's Thesis, Baton Rouge.
Takahasi, K., Wakimoto, K. 1968. On Unbiased Estimates of the Population Mean Based on the Sample Stratified by Means of Ordering, Annals of the Institute of Statistical Mathematics, Vol. 21, p. 250. DOI: 10.1007/BF02911622
Dell, T.R., Clutter, J.L. 1972. Ranked Set Sampling Theory with Order Statistics Background, Biometrics, Vol. 28, p. 550. DOI: 10.2307/2556166
Martin, W.L., Sharik, T.L., Oderwald, R.G., Smith, D.W. 1980. Evaluation of Ranked Set Sampling for Estimating Shrub Phytomass in Appalachian Oak Forests, Publication Number FWS-4-80, School of Forestry and Wildlife Resources, Virginia Polytechnic Institute and State University, Blacksburg.
Stokes, S.L. 1980. Estimation of Variance Using Judgement Ordered Ranked Set Samples, Biometrics, Vol. 36, p. 36. DOI: 10.2307/2530493
Ridout, M.S., Cobby, J.M. 1987. Ranked Set Sampling with Non-random Selection of Sets and Errors in Ranking, Applied Statistics, Vol. 36, p. 146. DOI: 10.2307/2347546
Stokes, S.L., Sager, T.W. 1988. Characterization of a Ranked Set Sample with Application to Estimating Distribution Functions, Journal of the American Statistical Association, Vol. 83, pp. 376-377. DOI: 10.2307/2288852
Patil, G.P., Sinha, A.K., Taillie, C. 1994. Ranked Set Sampling, A Handbook of Statistics, Vol. 12, p. 180. DOI: 10.1016/S0169-7161(05)80007-0
Patil, G.P., Sinha, A.K., Taillie, C. 1997. Ranked Set Sampling, Coherent Rankings and Size-Biased Permutations, Journal of Statistical Planning and Inference, Vol. 63, pp. 311-324. DOI: 10.1016/S0378-3758(97)00030-X
Montgomery, D.C., Peck, E.A., Vining, G.G. 2013. Linear Regression Analysis. 5th ed. John Wiley & Sons. 687 p.
Bursa, N. 2019. Bağımsız Bileşenler Analizi ile Çoklu Bağlantı Sorununa Bir Yaklaşım. Hacettepe Üniversitesi, Fen Bilimleri Enstitüsü, PhD Thesis, 124 p., Ankara.
Wold, H. 1985. Partial Least Squares, Encyclopedia of Statistical Sciences, Vol. 6, pp. 581-591. DOI: 10.1002/0471667196.ess1914
Bulut, Y.M. 2011. Çoklu İç İlişki Durumunda Kısmi En Küçük Kareler Regresyonu ve Alternatif Yöntemlerle Karşılaştırılması. Eskişehir Osmangazi Üniversitesi, Fen Bilimleri Enstitüsü, Master's Thesis, 92 p., Eskişehir.
Hoerl, A.E., Kennard, R.W. 1970. Ridge Regression: Biased Estimation for Nonorthogonal Problems, Technometrics, Vol. 12, pp. 55-67. DOI: 10.2307/1271436
Gupta, P. 2017. Regularization in Machine Learning. http://www.towardsdatascience.com/regularization-in-machine-learning-76441ddcf (Accessed: 03.04.2021)
Tibshirani, R. 1996. Regression Shrinkage and Selection Via the Lasso, Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 58, pp. 267-288. DOI: 10.1111/j.2517-6161.1996.tb02080.x
Frank, I.E., Friedman, J.H. 1993. A Statistical View of Some Chemometrics Regression Tools, Technometrics, Vol. 35, pp. 109-135. DOI: 10.2307/1269656
Zou, H., Hastie, T. 2005. Regularization and Variable Selection Via the Elastic Net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 67, pp. 301-320. DOI: 10.1111/j.1467-9868.2005.00503.x
Bengio, Y., Grandvalet, Y. 2004. No Unbiased Estimator of the Variance of K-fold Cross-Validation, Journal of Machine Learning Research, Vol. 5, pp. 1089-1105.
Chai, T., Draxler, R.R. 2014. Root Mean Square Error (RMSE) or Mean Absolute Error (MAE), Geoscientific Model Development Discussions, Vol. 7, pp. 1525-1534. DOI: 10.5194/gmd-7-1247-2014

There are 23 citations in total.

Details

Primary Language	Turkish
Journal Section	Research Article
Authors	Sena Aslan 0000-0001-5312-3993 Tuğba Yıldız 0000-0002-8552-2806
Publication Date	January 17, 2022
Published in Issue	Year 2022 Volume: 24 Issue: 70

Cite

APA	Aslan, S., & Yıldız, T. (2022). Makine Öğrenmesinde Rastgele Oran ve Sıralı Küme Örneklemesi Yöntemlerinin Doğrusal Regresyon Modellerine Etkisi. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen Ve Mühendislik Dergisi, 24(70), 29-36. https://doi.org/10.21205/deufmd.2022247004
AMA	Aslan S, Yıldız T. Makine Öğrenmesinde Rastgele Oran ve Sıralı Küme Örneklemesi Yöntemlerinin Doğrusal Regresyon Modellerine Etkisi. DEUFMD. January 2022;24(70):29-36. doi:10.21205/deufmd.2022247004
Chicago	Aslan, Sena, and Tuğba Yıldız. “Makine Öğrenmesinde Rastgele Oran Ve Sıralı Küme Örneklemesi Yöntemlerinin Doğrusal Regresyon Modellerine Etkisi”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen Ve Mühendislik Dergisi 24, no. 70 (January 2022): 29-36. https://doi.org/10.21205/deufmd.2022247004.
EndNote	Aslan S, Yıldız T (January 1, 2022) Makine Öğrenmesinde Rastgele Oran ve Sıralı Küme Örneklemesi Yöntemlerinin Doğrusal Regresyon Modellerine Etkisi. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 24 70 29–36.
IEEE	S. Aslan and T. Yıldız, “Makine Öğrenmesinde Rastgele Oran ve Sıralı Küme Örneklemesi Yöntemlerinin Doğrusal Regresyon Modellerine Etkisi”, DEUFMD, vol. 24, no. 70, pp. 29–36, 2022, doi: 10.21205/deufmd.2022247004.
ISNAD	Aslan, Sena - Yıldız, Tuğba. “Makine Öğrenmesinde Rastgele Oran Ve Sıralı Küme Örneklemesi Yöntemlerinin Doğrusal Regresyon Modellerine Etkisi”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi 24/70 (January 2022), 29-36. https://doi.org/10.21205/deufmd.2022247004.
JAMA	Aslan S, Yıldız T. Makine Öğrenmesinde Rastgele Oran ve Sıralı Küme Örneklemesi Yöntemlerinin Doğrusal Regresyon Modellerine Etkisi. DEUFMD. 2022;24:29–36.
MLA	Aslan, Sena and Tuğba Yıldız. “Makine Öğrenmesinde Rastgele Oran Ve Sıralı Küme Örneklemesi Yöntemlerinin Doğrusal Regresyon Modellerine Etkisi”. Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen Ve Mühendislik Dergisi, vol. 24, no. 70, 2022, pp. 29-36, doi:10.21205/deufmd.2022247004.
Vancouver	Aslan S, Yıldız T. Makine Öğrenmesinde Rastgele Oran ve Sıralı Küme Örneklemesi Yöntemlerinin Doğrusal Regresyon Modellerine Etkisi. DEUFMD. 2022;24(70):29-36.

Cited By

Farklı regresyon modelleriyle kestirilen zenit troposferik gecikmelerin değerlendirilmesi

Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi

https://doi.org/10.28948/ngumuh.1088375

Determination of Solar Chimney Inlet Temperature by Regression Methods

Journal of Testing and Evaluation

https://doi.org/10.1520/JTE20220594

YAPAY ZEKÂ UYGULAMASI İLE GÜNEŞ PANELİ SİSTEMİ ENERJİ ÜRETİMİ TAHMİNİ

Yalvaç Akademi Dergisi

https://doi.org/10.57120/yalvac.1543369
