Research Article

Mining Top-K High Occupancy Itemsets

Year 2025, Volume: 8 Issue: 6, 1723 - 1730, 15.11.2025
https://doi.org/10.34248/bsengineering.1744061

Abstract

High-occupancy itemset mining identifies the itemsets in a database whose occupancy values meet a user-specified minimum threshold. Choosing a suitable threshold, however, can be difficult. A threshold that is too low yields too many itemsets, which increases the time and memory consumed by the mining process and makes the results harder for decision-makers to interpret. A threshold that is too high may omit valuable itemsets. To overcome this limitation, this paper extends the classical high-occupancy itemset mining problem to the top-k high-occupancy itemset mining problem and proposes an algorithm called TKHOIM (top-k high-occupancy itemset miner), which applies three strategies to solve the problem efficiently. With this approach, users directly specify the number of itemsets to be discovered, denoted k, and do not need to define a minimum occupancy threshold. Experimental results demonstrate that TKHOIM is effective in discovering the top-k high-occupancy itemsets.
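
To make the difference between the threshold-based and the top-k formulation concrete, the following is a minimal brute-force sketch. It is not TKHOIM and does not reproduce its three strategies, which the abstract does not describe. It assumes that the occupancy of an itemset X is the sum of |X| / |T| over all transactions T that contain X; this definition, the function names, and the example database are illustrative assumptions and are not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations
import heapq


def occupancy(itemset, transactions):
    """Assumed definition: sum of |X| / |T| over every transaction T containing X."""
    x = frozenset(itemset)
    return sum((Fraction(len(x), len(t)) for t in transactions if x <= t), Fraction(0))


def _candidates(transactions):
    """Yield (itemset, occupancy) for every itemset that occurs in some transaction."""
    db = [frozenset(t) for t in transactions if t]
    items = sorted(set().union(*db))
    longest = max((len(t) for t in db), default=0)
    # Exhaustive enumeration: exponential in the number of items, for illustration only.
    for size in range(1, longest + 1):
        for cand in combinations(items, size):
            occ = occupancy(cand, db)
            if occ > 0:
                yield cand, occ


def high_occupancy_itemsets(transactions, min_occ):
    """Threshold-based formulation: keep every itemset with occupancy >= min_occ."""
    return [(cand, occ) for cand, occ in _candidates(transactions) if occ >= min_occ]


def top_k_high_occupancy_itemsets(transactions, k):
    """Top-k formulation: the user supplies k instead of a threshold."""
    if k <= 0:
        return []
    heap = []  # min-heap of (occupancy, itemset); heap[0] is the weakest itemset kept
    for cand, occ in _candidates(transactions):
        if len(heap) < k:
            heapq.heappush(heap, (occ, cand))
        elif occ > heap[0][0]:
            # The k-th best occupancy acts as an internal threshold that only rises.
            heapq.heapreplace(heap, (occ, cand))
    ranked = sorted(heap, key=lambda entry: (-entry[0], entry[1]))
    return [(cand, occ) for occ, cand in ranked]


# Hypothetical example database of four transactions.
EXAMPLE_DB = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]

if __name__ == "__main__":
    for itemset, occ in top_k_high_occupancy_itemsets(EXAMPLE_DB, 3):
        print(itemset, occ)
```

Under the assumed definition, the top-3 itemsets of the example database are {b} with occupancy 11/6, followed by {a, b} and {a, c} with occupancy 5/3 each. The threshold-based function returns the same three itemsets only if the user happens to choose a minimum occupancy of 5/3, while a minimum occupancy of 1 returns five itemsets. Ties at the k-th position are broken arbitrarily in this sketch.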

Ethical Statement

Ethics committee approval was not required for this study because it did not involve animal or human subjects.



Details

Primary Language English
Subjects Information Systems Development Methodologies and Practice, Decision Support and Group Support Systems
Journal Section Research Article
Authors

İrfan Yıldırım 0000-0002-5635-2991

Submission Date July 16, 2025
Acceptance Date September 17, 2025
Early Pub Date November 12, 2025
Publication Date November 15, 2025
Published in Issue Year 2025 Volume: 8 Issue: 6

Cite

APA Yıldırım, İ. (2025). Mining Top-K High Occupancy Itemsets. Black Sea Journal of Engineering and Science, 8(6), 1723-1730. https://doi.org/10.34248/bsengineering.1744061
AMA Yıldırım İ. Mining Top-K High Occupancy Itemsets. BSJ Eng. Sci. November 2025;8(6):1723-1730. doi:10.34248/bsengineering.1744061
Chicago Yıldırım, İrfan. “Mining Top-K High Occupancy Itemsets”. Black Sea Journal of Engineering and Science 8, no. 6 (November 2025): 1723-30. https://doi.org/10.34248/bsengineering.1744061.
EndNote Yıldırım İ (November 1, 2025) Mining Top-K High Occupancy Itemsets. Black Sea Journal of Engineering and Science 8 6 1723–1730.
IEEE İ. Yıldırım, “Mining Top-K High Occupancy Itemsets”, BSJ Eng. Sci., vol. 8, no. 6, pp. 1723–1730, 2025, doi: 10.34248/bsengineering.1744061.
ISNAD Yıldırım, İrfan. “Mining Top-K High Occupancy Itemsets”. Black Sea Journal of Engineering and Science 8/6 (November2025), 1723-1730. https://doi.org/10.34248/bsengineering.1744061.
JAMA Yıldırım İ. Mining Top-K High Occupancy Itemsets. BSJ Eng. Sci. 2025;8:1723–1730.
MLA Yıldırım, İrfan. “Mining Top-K High Occupancy Itemsets”. Black Sea Journal of Engineering and Science, vol. 8, no. 6, 2025, pp. 1723-30, doi:10.34248/bsengineering.1744061.
Vancouver Yıldırım İ. Mining Top-K High Occupancy Itemsets. BSJ Eng. Sci. 2025;8(6):1723-30.
