Research Article

Performance analysis of set partitioning formulations on the rule extraction from random forests

Volume: 27 Number: 4 August 20, 2021
  • Mert Edalı

Abstract

Random Forest is a widely used machine learning algorithm for classification and regression problems across many domains. Although Random Forests are generally accurate, they are less interpretable than their building blocks, single decision trees. Because each member of a Random Forest is a decision tree, we propose several set partitioning formulations to extract interpretable if-then rules from Random Forests. Our experiments on well-known classification and regression datasets show that the original set partitioning formulation significantly reduces the number of rules while keeping accuracy at acceptable levels. We also propose a modification to the problem's objective function that aims to reduce the number of extracted rules further. With this modification, the number of extracted rules decreases further while the accuracy values stay nearly the same. Although the set partitioning problem is NP-hard, we obtain optimal solutions for most datasets within twenty minutes.
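
The abstract does not spell out the formulations, so the following is only a minimal sketch of the generic set partitioning idea, not the paper's model. It assumes each candidate rule (a tree leaf) is represented by the set of training samples it covers and carries a cost, and it selects the cheapest subset of rules that covers every sample exactly once. The function name `solve_set_partitioning`, the toy rules, and the cost values are all hypothetical, and exhaustive search stands in for the integer programming solver a real study would use.

```python
from itertools import combinations


def solve_set_partitioning(n_samples, rules, costs):
    """Exhaustively find the cheapest rule subset covering each sample exactly once.

    rules: list of sets of sample indices (hypothetical leaf coverage).
    costs: list of per-rule costs (hypothetical values).
    Returns (best_cost, best_choice), or (None, None) if no partition exists.
    """
    universe = set(range(n_samples))
    best_cost, best_choice = None, None
    for k in range(1, len(rules) + 1):
        for choice in combinations(range(len(rules)), k):
            covered = [s for j in choice for s in rules[j]]
            # Exact cover: no sample is repeated and none is missing.
            if len(covered) == n_samples and set(covered) == universe:
                cost = sum(costs[j] for j in choice)
                if best_cost is None or cost < best_cost:
                    best_cost, best_choice = cost, choice
    return best_cost, best_choice


# Toy instance (assumed data): six samples, seven candidate rules.
rules = [{0, 1, 2}, {3, 4, 5}, {0, 1}, {2, 3}, {4, 5}, {2}, {3}]
costs = [3, 1, 1, 3, 1, 1, 1]

best_cost, best_choice = solve_set_partitioning(6, rules, costs)
print(best_cost, best_choice)
```

The search enumerates every subset of rules, which is exponential in the number of rules and only practical for toy instances; this is consistent with the problem being NP-hard, as the abstract notes.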

Details

Primary Language

English

Subjects

Engineering

Journal Section

Research Article

Authors

Mert Edalı
Türkiye

Publication Date

August 20, 2021

Submission Date

July 1, 2020

Published in Issue

Year 2021 Volume: 27 Number: 4

APA
Edalı, M. (2021). Performance analysis of set partitioning formulations on the rule extraction from random forests. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, 27(4), 513-519. https://izlik.org/JA22WJ24FL