Research Article

Performance Comparison of Association Rule Algorithms with SPMF on Automotive Industry Data

Year 2019, Volume: 7 Issue: 3, 1985 - 2000, 31.07.2019
https://doi.org/10.29130/dubited.581931

Abstract

Recent developments in information technology allow companies to store
their data faster, more easily, and at lower cost. All transactions performed
in a company during the day (sales, current account records, invoicing, etc.)
are combined at the end of the day to form large datasets. Data mining makes it
possible to extract valuable information from these datasets, which has become
more important for companies given today's highly competitive markets. In this
study, a dataset from a company selling car maintenance and repair products in
Turkey is used. Association rules are applied to this dataset to determine
which items customers buy together. These rules, computed specifically for the
company, can be used to redefine sales and marketing strategies, to organize
storage areas efficiently, and to create sales campaigns suited to customers
and regions. The algorithms that produce these rules are also called Frequent
Itemset Mining algorithms. Eleven of the most recent of these algorithms are
applied to the dataset on the SPMF platform, and their performance is compared
in terms of memory usage and execution time for varying support values and
varying numbers of records. Three datasets covering 6 months, 12 months, and
22 months are created from the whole dataset. According to the experiments,
execution time generally increases as the support value decreases: nearly all
algorithms have higher execution times at the lowest support value, 0.1. The
dEclat_bitset algorithm is the most efficient on the 6-month and 12-month
datasets. On the 22-month dataset, Eclat appears to be the most efficient
algorithm for support values of 0.7 and 0.3, whereas dEclat_bitset appears to
be the most efficient for support values of 0.3 and 0.1.
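
To make the notions of support and frequent itemset concrete, here is a minimal Python sketch of the Eclat idea (vertical transaction-id sets intersected depth-first). It is an illustration only, not the SPMF implementation used in the study. The product names, the toy transactions, and the function name `eclat` are hypothetical, and support is assumed to be a fraction of all transactions.

```python
from collections import defaultdict


def eclat(transactions, min_support):
    """Return {frozenset(itemset): relative support} for all frequent itemsets.

    Assumption: min_support is a fraction of the number of transactions.
    """
    n = len(transactions)
    min_count = min_support * n

    # Vertical layout: map each item to the ids of the transactions containing it.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # Every (item, tids) pair in candidates is already frequent together with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids) / n
            # Intersect with the later siblings to build longer itemsets.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_count:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


# Hypothetical toy baskets; the study's real data is not reproduced here.
transactions = [
    {"oil", "oil_filter"},
    {"oil", "oil_filter", "air_filter"},
    {"oil", "wiper"},
    {"oil_filter", "air_filter"},
    {"oil", "oil_filter", "wiper"},
]

result = eclat(transactions, 0.4)
for itemset, support in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

Lowering `min_support` lets more candidates survive each intersection step, so the search visits more itemsets. This is one plausible reason for the longer execution times the abstract reports at low support values.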

References

  • [1] Gancheva, "Market basket analysis of beauty products." Master of Science in Economics and Business, Erasmus University Rotterdam, Erasmus School of Economics, Rotterdam, Netherlands, 2013.
  • [2] Fayyad, Usama, Gregory Piatetsky-Shapiro, and Padhraic Smyth. "From data mining to knowledge discovery in databases." AI magazine, vol.17, no.3, pp. 37, 1996.
  • [3] Erpolat, "Otomobil Yetkili Servislerinde Birliktelik Kurallarının Belirlenmesinde Apriori ve FP-Growth Algoritmalarının Karşılaştırılması," Anadolu Üniversitesi Sosyal Bilimler Dergisi, vol. 12, no. 1, pp. 151-166, 2012.
  • [4] Bala, A., Shuaibu, M. Z., KaramiLawal, Z., and Zakari, R. I. Y. "Performance Analysis of Apriori and FP-Growth Algorithms (Association Rule Mining)," Int. J. Computer Technology & Applications, vol.7, no.2, pp. 279-293, 2016.
  • [5] G. Yıldız Erduran, "Online müşteri şikayetlerinin veri madenciliği ile incelenmesi," Ph.D. dissertation, Department of Business Administration, Trakya Üniversitesi, Edirne, Türkiye, 2017.
  • [6] C. Aguwa, M. H. Olya, and L. Monplaisir, "Modeling of fuzzy-based voice of customer for business decision analytics," Knowledge-Based Systems, vol. 125, pp. 136-145, 2017.
  • [7] A. Griva, C. Bardaki, K. Pramatari, and D. Papakiriakopoulos, "Retail business analytics: Customer visit segmentation using market basket data," Expert Systems with Applications, vol. 100, pp. 1-16, 2018.
  • [8] M. Postigo-Boix and J. L. Melus-Moreno, "A social model based on customers' profiles for analyzing the churning process in the mobile market of data plans," Physica a-Statistical Mechanics and Its Applications, vol. 496, pp. 571-592, 2018.
  • [9] B. Doğan, A. Buldu, Ö. Demir, and B. Erol, "Sigortacılık Sektöründe Müşteri İlişki Yönetimi İçin Kümeleme Analizi." Karaelmas Fen ve Mühendislik Dergisi, vol. 8, no. 1, pp. 11-18, 2018.
  • [10] T. Bardak, Ö. Avcı, K. Kayahan, and S. Bardak, "Mobilya Alımında Geleneksel Mağaza ile Sanal Mağaza Tercihinin Veri Madenciliğine Dayalı Analizi," presented at the 6. Uluslararası Bilim, Kültür ve Spor Konferansı, Lviv, Ukraine, 2018.
  • [11] Dökeroğlu, Tansel, Zahraa Mohammed Malik MALIK, and AL-SHEHABI Shadi, "Gözetimsiz Makine Öğrenme Teknikleri ile Miktara Dayalı Negatif Birliktelik Kural Madenciliği," Düzce Üniversitesi Bilim ve Teknoloji Dergisi, vol. 6, no. 4, pp. 1119-1138, 2018.
  • [12] Bakariya, Brijesh, Ghanshyam Singh Thakur, and Kapil Chaturvedi, "An efficient algorithm for extracting infrequent itemsets from weblog," International Arab J. Information Technology, vol.16, no.2, pp. 275-280, 2019.
  • [13] A. Morais, H. Peixoto, C. Coimbra, A. Abelha, and J. Machado, "Predicting the need of Neonatal Resuscitation using data mining." Procedia computer science, vol.113, pp. 571-576, 2017.
  • [14] Anandan, B. and C. Clifton, “Differentially Private Feature Selection for Data Mining,” 18th Proceedings of the Fourth Acm International Workshop on Security and Privacy Analytics (IWSPA), 2018, pp. 43-53.
  • [15] Stokes, A., Brigante, O., Rohan, K., Kendall, G., Patel, M., Hama, B., ... & Schneider, “Long Term Lead Survival in Adult Congenital Heart Disease Patients: A Retrospective Analysis Using Clinical Correspondence Data Mining,” Heart, no. 104, pp. A22-A23, 2018.
  • [16] Idri, A., Benhar, H., Fernández-Alemán, J. L., & Kadi, “A systematic map of medical data preprocessing in knowledge discovery,” Computer Methods and Programs in Biomedicine, no. 162, pp. 69-85, 2018.
  • [17] Ionita, Irina, and Liviu Ionita, “Classification Algorithms of Data Mining Applied for Demographic Processes,” Brain-Broad Research in Artificial Intelligence and Neuroscience, vol.9, no.1, pp. 94-100, 2018.
  • [18] Hastie, T., Tibshirani, R., Friedman, J., & Franklin, J., “The elements of statistical learning: data mining, inference and prediction,” The Mathematical Intelligencer, vol.27, no.2, pp. 83-85, 2005.
  • [20] Arora, Jyoti, Nidhi Bhalla, and Sanjeev Rao, "A review on association rule mining algorithms." International journal of innovative research in computer and communication engineering, vol.1, no.5, pp. 1246-1251, 2013.
  • [21] AL-Zawaidah, Farah Hanna, Yosef Hasan Jbara, and A. L. Marwan, "An improved algorithm for mining association rules in large databases," World of Computer science and information technology journal, vol.1, no.7, pp. 311-316, 2011.
  • [22] Agrawal, Rakesh, Tomasz Imieliński, and Arun Swami, "Mining association rules between sets of items in large databases," Acm sigmod record (ACM), 1993, pp. 207-216.
  • [23] Geyer-Schulz, A. and M. Hahsler, “Evaluation of recommender algorithms for an internet information broker based on simple association rules and on the repeat-buying theory,” In proceedings WEBKDD, pp. 100-114, 2002.
  • [24] Charanjeet Kaur, "Association rule mining using apriori algorithm: a survey," International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), vol.2, no.2, pp. 2081-2084, 2013.
  • [25] J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," ACM sigmod record, vol. 29, no. 2, pp. 1-12, 2000.
  • [26] Grahne, Gösta, and Jianfei Zhu, "Efficiently using prefix-trees in mining frequent itemsets," FIMI, Vol. 90, 2003.
  • [27] Bart Goethals, "Survey on frequent pattern mining," Univ. of Helsinki, vol. 19, pp. 840-852, 2003.
  • [28] M. J. Zaki and K. Gouda, "Fast vertical mining using diffsets," Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining(ACM), 2003, pp. 326-335.
  • [29] M. Adda, L. Wu, and Y. Feng, "Rare itemset mining," Sixth International Conference on Machine Learning and Applications (ICMLA 2007), 2007, pp. 73-80.
  • [30] Pillai, Jyothi, and O. P. Vyas, "Overview of itemset utility mining and its applications," International Journal of Computer Applications, vol.5, no.11, pp. 9-13, 2010.
  • [31] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo, "Fast discovery of association rules," Advances in knowledge discovery and data mining, vol. 12, no. 1, pp. 307-328, 1996.
  • [32] Pasquier, N., Bastide, Y., Taouil, R., and Lakhal, L., "Discovering frequent closed itemsets for association rules," International Conference on Database Theory, 1999, pp 398-416.
  • [33] Lucchese, Claudio, Salvatore Orlando, and Raffaele Perego, "DCI Closed: A Fast and Memory Efficient Algorithm to Mine Frequent Closed Itemsets," FIMI, 2004.
  • [34] Grahne, Gösta, and Jianfei Zhu, "Fast algorithms for frequent itemset mining using fp-trees," IEEE transactions on knowledge and data engineering, vol.17, no.10, pp. 1347-1362, 2005.
  • [35] Dwivedi, Neha, and Srinivasa Rao Satti, "Set and array based hybrid data structure solution for Frequent Pattern Mining," 2015 Tenth International Conference on Digital Information Management (ICDIM), 2015, pp. 14-19.

Otomotiv Endüstrisi Verileri Üzerinde Birliktelik Kuralları Algoritmalarının SPMF ile Performans Karşılaştırması


Abstract

Thanks to recent developments in information technology, companies can store
their data faster and more easily at lower cost. All transactions carried out
in companies during the day (sales, current account records, invoicing, etc.)
are combined at the end of the day and form large datasets. It is possible to
obtain valuable information from these datasets through data mining. Given
today's conditions, in which market competition is high, this has become much
more important for companies. In this study, the dataset of a company selling
vehicle maintenance and service products in Turkey is used. Association Rules
are applied to this dataset to identify the products purchased together by
customers. These rules, derived specifically for a company, can be used to
determine its sales and marketing strategies, to use warehouses efficiently,
and to create suitable sales campaigns by customer or region. Association rule
algorithms can also be referred to as Frequent Itemset Algorithms. The 11 most
recent of these algorithms are applied to this dataset using the SPMF
software, and their performance is compared in terms of memory usage and
processing time for varying support values and varying record counts. The
initial dataset is split into 3 separate datasets containing 6 months,
12 months, and 22 months of records. The experimental results suggest that
processing times generally increase in inverse proportion to the support
values, because nearly all algorithms are seen to have higher processing times
for the lowest support value, 0.1. The dEclat_bitset algorithm showed the most
efficient performance on the 6-month and 12-month datasets. On the 22-month
dataset, however, the Eclat algorithm appears to be the most efficient for
support values of 0.7 and 0.3, while the dEclat_bitset algorithm appears to be
the most efficient for support values of 0.3 and 0.1.
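
The comparison described above (execution time and memory usage against varying support values) can be sketched with a small Python harness. This is a hypothetical stand-in, not the SPMF measurement procedure. The function names, the toy transactions, and the use of `tracemalloc` peak bytes as the memory metric are all assumptions. The naive miner included here does no pruning, so only its itemset count changes with support; any miner with the same `(transactions, min_support)` signature could be plugged in instead.

```python
import time
import tracemalloc
from itertools import combinations


def count_frequent_itemsets(transactions, min_support, max_size=3):
    """Naive stand-in miner: count itemsets of up to max_size items that
    reach min_support (a fraction of all transactions)."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    found = 0
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            cand = set(candidate)
            count = sum(1 for t in transactions if cand <= t)
            if count / n >= min_support:
                found += 1
    return found


def benchmark(miner, transactions, supports):
    """Run miner once per support value, recording time and peak traced memory."""
    rows = []
    for s in supports:
        tracemalloc.start()
        t0 = time.perf_counter()
        n_found = miner(transactions, s)
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        rows.append(
            {"support": s, "itemsets": n_found, "seconds": elapsed, "peak_bytes": peak}
        )
    return rows


# Hypothetical toy baskets; the study's datasets are not reproduced here.
transactions = [
    {"oil", "oil_filter"},
    {"oil", "oil_filter", "air_filter"},
    {"oil", "wiper"},
    {"oil_filter", "air_filter"},
    {"oil", "oil_filter", "wiper"},
]

rows = benchmark(count_frequent_itemsets, transactions, [0.7, 0.3, 0.1])
for row in rows:
    print(row)
```

On real data, each configuration would normally be run several times and the timings averaged, since a single measurement on a small input is dominated by noise.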

There are 33 citations in total.

Details

Primary Language English
Subjects Engineering
Journal Section Articles
Authors

Melih Nair 0000-0002-0920-2704

Fatih Kayaalp 0000-0002-8752-3335

Publication Date July 31, 2019
Published in Issue Year 2019 Volume: 7 Issue: 3

Cite

APA Nair, M., & Kayaalp, F. (2019). Performance Comparison of Association Rule Algorithms with SPMF on Automotive Industry Data. Duzce University Journal of Science and Technology, 7(3), 1985-2000. https://doi.org/10.29130/dubited.581931
AMA Nair M, Kayaalp F. Performance Comparison of Association Rule Algorithms with SPMF on Automotive Industry Data. DUBİTED. July 2019;7(3):1985-2000. doi:10.29130/dubited.581931
Chicago Nair, Melih, and Fatih Kayaalp. “Performance Comparison of Association Rule Algorithms With SPMF on Automotive Industry Data”. Duzce University Journal of Science and Technology 7, no. 3 (July 2019): 1985-2000. https://doi.org/10.29130/dubited.581931.
EndNote Nair M, Kayaalp F (July 1, 2019) Performance Comparison of Association Rule Algorithms with SPMF on Automotive Industry Data. Duzce University Journal of Science and Technology 7 3 1985–2000.
IEEE M. Nair and F. Kayaalp, “Performance Comparison of Association Rule Algorithms with SPMF on Automotive Industry Data”, DUBİTED, vol. 7, no. 3, pp. 1985–2000, 2019, doi: 10.29130/dubited.581931.
ISNAD Nair, Melih - Kayaalp, Fatih. “Performance Comparison of Association Rule Algorithms With SPMF on Automotive Industry Data”. Duzce University Journal of Science and Technology 7/3 (July 2019), 1985-2000. https://doi.org/10.29130/dubited.581931.
JAMA Nair M, Kayaalp F. Performance Comparison of Association Rule Algorithms with SPMF on Automotive Industry Data. DUBİTED. 2019;7:1985–2000.
MLA Nair, Melih and Fatih Kayaalp. “Performance Comparison of Association Rule Algorithms With SPMF on Automotive Industry Data”. Duzce University Journal of Science and Technology, vol. 7, no. 3, 2019, pp. 1985-2000, doi:10.29130/dubited.581931.
Vancouver Nair M, Kayaalp F. Performance Comparison of Association Rule Algorithms with SPMF on Automotive Industry Data. DUBİTED. 2019;7(3):1985-2000.