Research Article

Mining Top-K High Occupancy Itemsets

Year 2025, Volume: 8 Issue: 6, 1723-1730, 15.11.2025
https://doi.org/10.34248/bsengineering.1744061

Abstract

High-occupancy itemset mining aims to identify itemsets within databases whose occupancy values satisfy a specified minimum threshold set by the user. However, selecting a suitable threshold can be difficult for users. If the threshold is set too low, it can result in too many itemsets, causing inefficiencies in terms of time and memory usage during the mining process and making it harder for decision-makers to interpret the results. On the other hand, setting the threshold too high may lead to the omission of valuable itemsets. To overcome this limitation, this paper extends the classical high-occupancy itemset mining problem into the top-k high-occupancy itemset mining problem and proposes an algorithm called TKHOIM (top-k high-occupancy itemset miner) that applies three strategies to address the problem efficiently. In this approach, users can directly specify the number of itemsets to be discovered, denoted as k, without the need to define a minimum occupancy threshold. Experimental results demonstrate that TKHOIM is effective in discovering the top-k high-occupancy itemsets.
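The top-k formulation can be illustrated with a small brute-force baseline. This is only a sketch under stated assumptions, not TKHOIM itself, and it does not reproduce the paper's three strategies. The abstract does not define occupancy, so the sketch assumes that the occupancy of an itemset X is the sum of |X| / |T| over every transaction T that contains X. The function names `occupancy` and `top_k_occupancy` are hypothetical.

```python
import heapq
from itertools import combinations


def occupancy(itemset, transactions):
    """Assumed definition: sum of |X| / |T| over transactions T containing X."""
    x = frozenset(itemset)
    return sum(len(x) / len(t) for t in transactions if x <= t)


def top_k_occupancy(transactions, k):
    """Brute-force baseline: score every itemset that occurs in some transaction."""
    transactions = [frozenset(t) for t in transactions if t]

    # Enumerate all non-empty subsets of each transaction (exponential; toy data only).
    candidates = set()
    for t in transactions:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            candidates.update(combinations(items, size))

    # Min-heap of (occupancy, itemset). Once it holds k entries, heap[0][0]
    # plays the role of an internally raised minimum occupancy threshold.
    heap = []
    for cand in sorted(candidates):
        occ = occupancy(cand, transactions)
        if len(heap) < k:
            heapq.heappush(heap, (occ, cand))
        elif occ > heap[0][0]:
            heapq.heapreplace(heap, (occ, cand))

    # Highest occupancy first; ties broken by itemset for a stable order.
    return sorted(heap, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    db = [{"a", "b"}, {"a", "b", "c"}, {"a"}]
    for occ, itemset in top_k_occupancy(db, k=2):
        print(itemset, round(occ, 4))
```

The user supplies only k. The smallest occupancy currently in the heap serves as a threshold that rises as better itemsets are found, which is the general idea behind replacing a user-defined minimum occupancy with a result count.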

Ethics Statement

Ethics committee approval was not required for this study because it did not involve animals or humans.


Details

Primary Language: English
Subjects: Information Systems Development Methodologies and Practice, Decision Support and Group Support Systems
Section: Research Articles
Authors

İrfan Yıldırım 0000-0002-5635-2991

Early Publication Date: November 12, 2025
Publication Date: November 15, 2025
Submission Date: July 16, 2025
Acceptance Date: September 17, 2025
Published in Issue: Year 2025 Volume: 8 Issue: 6

Cite

APA Yıldırım, İ. (2025). Mining Top-K High Occupancy Itemsets. Black Sea Journal of Engineering and Science, 8(6), 1723-1730. https://doi.org/10.34248/bsengineering.1744061
AMA Yıldırım İ. Mining Top-K High Occupancy Itemsets. BSJ Eng. Sci. November 2025;8(6):1723-1730. doi:10.34248/bsengineering.1744061
Chicago Yıldırım, İrfan. “Mining Top-K High Occupancy Itemsets”. Black Sea Journal of Engineering and Science 8, no. 6 (November 2025): 1723-30. https://doi.org/10.34248/bsengineering.1744061.
EndNote Yıldırım İ (November 1, 2025) Mining Top-K High Occupancy Itemsets. Black Sea Journal of Engineering and Science 8 6 1723–1730.
IEEE İ. Yıldırım, “Mining Top-K High Occupancy Itemsets”, BSJ Eng. Sci., vol. 8, no. 6, pp. 1723–1730, 2025, doi: 10.34248/bsengineering.1744061.
ISNAD Yıldırım, İrfan. “Mining Top-K High Occupancy Itemsets”. Black Sea Journal of Engineering and Science 8/6 (November 2025), 1723-1730. https://doi.org/10.34248/bsengineering.1744061.
JAMA Yıldırım İ. Mining Top-K High Occupancy Itemsets. BSJ Eng. Sci. 2025;8:1723–1730.
MLA Yıldırım, İrfan. “Mining Top-K High Occupancy Itemsets”. Black Sea Journal of Engineering and Science, vol. 8, no. 6, 2025, pp. 1723-30, doi:10.34248/bsengineering.1744061.
Vancouver Yıldırım İ. Mining Top-K High Occupancy Itemsets. BSJ Eng. Sci. 2025;8(6):1723-30.