Research Article

Performance analysis of set partitioning formulations on the rule extraction from random forests

Year 2021, Volume: 27 Issue: 4, 513 - 519, 20.08.2021

Abstract

Random Forests is a machine learning algorithm frequently used for classification and regression problems in different domains. Although they achieve high accuracy, their interpretability is quite low compared to their building blocks, single decision trees. Starting from the fact that each member of a Random Forest is a decision tree, we propose different set partitioning formulations to extract interpretable if-then rules from Random Forests. The results of our experiments on classification and regression datasets frequently used in the literature show that the original set partitioning model formulation can significantly reduce the number of rules while keeping accuracy at acceptable levels. To reduce the number of extracted rules further, we propose a modification to the objective function of the problem. With this modification, we observe a further reduction in the number of extracted rules while accuracy stays at the same levels. Although the set partitioning problem is NP-hard, we find the optimal solution within twenty minutes for most datasets.
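For reference, the core model behind the formulations discussed above is the classical set partitioning problem [18]. In the rule-extraction setting it can be sketched as follows; the notation here is assumed for illustration, with I indexing training samples, J indexing candidate rules harvested from the forest's trees, a_ij = 1 if rule j covers sample i (0 otherwise), c_j the cost of selecting rule j, and x_j = 1 if rule j is kept:

```latex
\min \sum_{j \in J} c_j x_j
\qquad \text{subject to} \qquad
\sum_{j \in J} a_{ij} x_j = 1 \quad \forall i \in I,
\qquad x_j \in \{0, 1\} \quad \forall j \in J
```

The equality constraints force each training sample to be covered by exactly one selected rule; the paper's proposed objective-function modification is not reproduced here, as the model above is only the classical baseline.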

References

  • [1] Boulesteix AL, Janitza S, Kruppa J, König IR. “Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics”. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(6), 493-507, 2012.
  • [2] Masetic Z, Subasi A. “Congestive heart failure detection using random forest classifier”. Computer Methods and Programs in Biomedicine, 130, 54-64, 2016.
  • [3] Jog A, Carass A, Roy S, Pham DL, Prince JL. “Random forest regression for magnetic resonance image synthesis”. Medical Image Analysis, 35, 475-488, 2017.
  • [4] Belgiu M, Drăguţ L. “Random forest in remote sensing: A review of applications and future directions”. ISPRS Journal of Photogrammetry and Remote Sensing, 114, 24-31, 2016.
  • [5] Baydogan MG, Runger G, Tuv E. “A bag-of-features framework to classify time series”. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11), 2796-2802, 2013.
  • [6] Breiman L. “Random forests”. Machine Learning, 45(1), 5-32, 2001.
  • [7] Mashayekhi M, Gras R. “Rule extraction from random forest: the RF + HC methods”. Canadian Conference on Artificial Intelligence, Halifax, NS, Canada, 2-5 June 2015.
  • [8] Mashayekhi M, Gras R. “Rule extraction from decision trees ensembles: new algorithms based on heuristic search and sparse group lasso methods”. International Journal of Information Technology & Decision Making, 16(6), 1707-1727, 2017.
  • [9] Liu S, Patel RY, Daga PR, Liu H, Fu G, Doerksen RJ, Chen Y, Wilkins DE. “Combined rule extraction and feature elimination in supervised classification”. IEEE Transactions on Nanobioscience, 11(3), 228-236, 2012.
  • [10] Adnan MN, Islam MZ. “Forex++: A new framework for knowledge discovery from decision forests”. Australasian Journal of Information Systems, 2017. https://doi.org/10.3127/ajis.v21i0.1539
  • [11] Phung LTK, Chau VTN, Phung NH. “Extracting rule RF in educational data classification: from a random forest to interpretable refined rules”. 2015 International Conference on Advanced Computing and Applications (ACOMP), Ho Chi Minh City, Vietnam, 23-25 November 2015.
  • [12] Meinshausen N. “Node harvest”. The Annals of Applied Statistics, 4(4), 2049-2072, 2010.
  • [13] Friedman JH, Popescu BE. “Predictive learning via rule ensembles”. The Annals of Applied Statistics, 2(3), 916-954, 2008.
  • [14] Deng H. “Interpreting tree ensembles with inTrees”. International Journal of Data Science and Analytics, 7(4), 277-287, 2019.
  • [15] Wang S, Wang Y, Wang D, Yin Y, Wang Y, Jin Y. “An improved random forest-based rule extraction method for breast cancer diagnosis”. Applied Soft Computing, 86, 105941, 1-18, 2020.
  • [16] Marsten RE, Shepardson F. “Exact solution of crew scheduling problems using the set partitioning model: Recent successful applications”. Networks, 11(2), 165-177, 1981.
  • [17] Baldacci R, Christofides N, Mingozzi A. “An exact algorithm for the vehicle routing problem based on the set partitioning formulation with additional cuts”. Mathematical Programming, 115(2), 351-385, 2008.
  • [18] Garfinkel RS, Nemhauser GL. “The set-partitioning problem: Set covering with equality constraints”. Operations Research, 17(5), 848-856, 1969.
  • [19] Dua D, Graff C. “UCI Machine Learning Repository”. http://archive.ics.uci.edu/ml (08.07.2020).
  • [20] Carnegie Mellon University. “StatLib-Datasets Archive”. http://lib.stat.cmu.edu/datasets/boston (08.07.2020).
  • [21] Woods KS, Doss CC, Bowyer KW, Solka JL, Priebe CE, Kegelmeyer Jr WP. “Comparative evaluation of pattern recognition techniques for detection of microcalcifications in mammography”. International Journal of Pattern Recognition and Artificial Intelligence, 7(6), 1417-1436, 1993.
  • [22] Liaw A, Wiener M. “Classification and Regression by randomForest”. R News, 2(3), 18-22, 2002.
  • [23] R Foundation for Statistical Computing. “R: A language and environment for statistical computing”. https://www.R-project.org/ (08.07.2020).
  • [24] Gurobi Optimization LLC. “Gurobi Optimizer Reference Manual”. http://www.gurobi.com (08.07.2020).
  • [25] Lewis M, Kochenberger G, Alidaee B. “A new modeling and solution approach for the set-partitioning problem”. Computers & Operations Research, 35(3), 807-813, 2008.
  • [26] Rasmussen MS. Optimisation-Based Solution Methods for Set Partitioning Models. PhD Thesis, Technical University of Denmark, Kgs. Lyngby, Denmark, 2011.
  • [27] RuleExtractionfromRFs. “Example Scripts for the Manuscript”. https://github.com/mertedali/RuleExtractionfromRFs (25.10.2020).

Performance analysis of set partitioning formulations on the rule extraction from random forests

Year 2021, Volume: 27 Issue: 4, 513 - 519, 20.08.2021

Abstract

Random Forest is a machine learning algorithm widely used for classification and regression problems across different domains. Although Random Forests are generally accurate, their interpretability is low compared to their building blocks, single decision trees. Using the fact that each member of a Random Forest is a decision tree, we propose different set partitioning formulations to extract interpretable if-then rules from Random Forests. Our experiments on well-known classification and regression datasets show that the original set partitioning model formulation significantly reduces the number of rules while keeping accuracy at acceptable levels. We also propose a modification to the problem's objective function that aims to reduce the number of extracted rules further. With this modification, we observe a further reduction in the number of extracted rules while accuracy stays nearly the same. Although the set partitioning problem is NP-hard, we obtain optimal solutions for most datasets within twenty minutes.
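As a rough illustration of the pipeline the abstract describes, the sketch below selects a minimum subset of if-then rules that covers every training sample exactly once, i.e. it solves a tiny set partitioning instance by brute force. The rules and data are made up for illustration; the paper's actual implementation harvests rules from a Random Forest trained with R's randomForest package and solves the integer program with Gurobi [22, 24], not by enumeration.

```python
# Toy sketch: choose a minimum set of if-then rules that covers every
# training sample exactly once (a set partitioning problem). Rules and
# samples are hypothetical; a real pipeline would harvest one rule per
# leaf of each tree in a trained Random Forest.
from itertools import combinations

# Hypothetical rules: a predicate over a feature vector (x1, x2) plus the
# class the rule predicts.
rules = [
    (lambda x: x[0] <= 2,              "A"),  # rule 0
    (lambda x: x[0] > 2 and x[1] <= 5, "B"),  # rule 1
    (lambda x: x[0] > 2 and x[1] > 5,  "B"),  # rule 2
    (lambda x: x[1] <= 5,              "A"),  # rule 3 (overlaps rules 0-1)
]

samples = [(1, 3), (4, 2), (5, 8), (2, 7)]

# Coverage sets: cover[j] holds the indices of samples satisfied by rule j.
cover = [{i for i, s in enumerate(samples) if pred(s)} for pred, _ in rules]

def min_partition(cover, n_samples):
    """Smallest subset of rules covering each sample exactly once (brute force)."""
    for k in range(1, len(cover) + 1):
        for subset in combinations(range(len(cover)), k):
            hit = [0] * n_samples
            for j in subset:
                for i in cover[j]:
                    hit[i] += 1
            if all(h == 1 for h in hit):  # exact cover: partition constraint
                return subset
    return None

best = min_partition(cover, len(samples))
print(best)  # → (0, 1, 2): rule 3 is redundant and gets dropped
```

Brute force is exponential and only serves to make the partition constraint concrete; the equality ("covered exactly once") rather than inequality ("covered at least once") is what distinguishes set partitioning from set covering [18].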

There are 27 citations in total.

Details

Primary Language English
Subjects Engineering
Journal Section Research Article
Authors

Mert Edalı

Publication Date August 20, 2021
Published in Issue Year 2021 Volume: 27 Issue: 4

Cite

APA Edalı, M. (2021). Performance analysis of set partitioning formulations on the rule extraction from random forests. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, 27(4), 513-519.
AMA Edalı M. Performance analysis of set partitioning formulations on the rule extraction from random forests. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. August 2021;27(4):513-519.
Chicago Edalı, Mert. “Performance Analysis of Set Partitioning Formulations on the Rule Extraction from Random Forests”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 27, no. 4 (August 2021): 513-19.
EndNote Edalı M (August 1, 2021) Performance analysis of set partitioning formulations on the rule extraction from random forests. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 27 4 513–519.
IEEE M. Edalı, “Performance analysis of set partitioning formulations on the rule extraction from random forests”, Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, vol. 27, no. 4, pp. 513–519, 2021.
ISNAD Edalı, Mert. “Performance Analysis of Set Partitioning Formulations on the Rule Extraction from Random Forests”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 27/4 (August 2021), 513-519.
JAMA Edalı M. Performance analysis of set partitioning formulations on the rule extraction from random forests. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. 2021;27:513–519.
MLA Edalı, Mert. “Performance Analysis of Set Partitioning Formulations on the Rule Extraction from Random Forests”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, vol. 27, no. 4, 2021, pp. 513-9.
Vancouver Edalı M. Performance analysis of set partitioning formulations on the rule extraction from random forests. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. 2021;27(4):513-9.
