HypoGAN: A mode- and boundary-aware generative oversampling framework for imbalanced tabular data
Abstract
While numerous oversampling strategies have emerged in recent years, synthesizing minority-class samples that preserve the intrinsic data structure while improving classifier performance remains challenging, especially under severe class imbalance. In this study, we propose HypoGAN, a generative oversampling framework for imbalanced numeric tabular data that integrates minority mode discovery, local minority-majority boundary information, latent-noise-driven candidate generation, and post-generation filtering to produce classification-useful minority samples. To evaluate the proposed framework, we adopted a two-phase experimental strategy: (i) simulation experiments under class imbalance ratios of 90:10, 95:5, and 98:2 with feature dimensionalities of n = 3, 5, 10, and 20, and (ii) real-world experiments on the Wisconsin Breast Cancer, Pima Indians Diabetes, and Credit Card Fraud Detection datasets. In all experiments, HypoGAN was compared with SMOTE, ADASYN, and Borderline-SMOTE within a nested train/validation/test framework with leakage protection. The results indicate that HypoGAN is a competitive oversampling framework in a range of challenging imbalance settings. In the simulation study, it achieved particularly strong performance in lower- and medium-dimensional scenarios, while remaining precision-oriented under more difficult conditions. In real-world experiments, HypoGAN remained competitive on the Wisconsin Breast Cancer and Credit Card Fraud Detection datasets, achieving F1-scores of 0.9489 and 0.9028, respectively, compared to 0.9489 and 0.9037 for SMOTE. Additional results suggest that HypoGAN’s performance is scenario-dependent, with effectiveness influenced by the dataset’s structure and the characteristics of the imbalance.
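The leakage-protected evaluation described in the abstract can be illustrated with a minimal sketch. It is not the HypoGAN generator and not the authors' code: it assumes a toy two-class Gaussian dataset at a 95:5 ratio with n = 5 features, a 60/20/20 stratified split, and a simplified SMOTE-style interpolation as the oversampler. All function names (`make_imbalanced`, `stratified_split`, `interpolate_minority`) are hypothetical. The point it shows is that synthetic minority samples are generated from the training split only, so the validation and test splits never influence them.

```python
import random

def make_imbalanced(n_total, minority_frac, n_features, seed=0):
    """Toy data (assumption): majority centred at 0, minority centred at 2."""
    rng = random.Random(seed)
    n_min = round(n_total * minority_frac)
    X, y = [], []
    for i in range(n_total):
        label = 1 if i < n_min else 0
        centre = 2.0 if label == 1 else 0.0
        X.append([rng.gauss(centre, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y

def stratified_split(y, fractions=(0.6, 0.2), seed=0):
    """Return (train, val, test) index lists that preserve class proportions."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        a = round(len(idx) * fractions[0])
        b = a + round(len(idx) * fractions[1])
        train += idx[:a]
        val += idx[a:b]
        test += idx[b:]
    return train, val, test

def interpolate_minority(X_min, n_new, k=3, seed=0):
    """Simplified SMOTE-style step: interpolate between a minority point
    and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(X_min)
        neighbours = sorted(
            (p for p in X_min if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(base, other)])
    return synthetic

# 95:5 imbalance with 5 features, one of the simulated settings in the abstract.
X, y = make_imbalanced(1000, 0.05, n_features=5)
train, val, test = stratified_split(y)

# Oversample using training rows only; validation and test stay untouched.
X_train = [X[i] for i in train]
y_train = [y[i] for i in train]
X_min = [x for x, label in zip(X_train, y_train) if label == 1]
n_new = y_train.count(0) - y_train.count(1)
X_bal = X_train + interpolate_minority(X_min, n_new)
y_bal = y_train + [1] * n_new
```

A classifier would then be fitted on `X_bal` and `y_bal`, tuned on the validation indices, and scored once on the test indices. Any other oversampler could be substituted for `interpolate_minority` without changing the split logic.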
Keywords
References
[1] N.V. Chawla, K.W. Bowyer, L.O. Hall and W.P. Kegelmeyer, SMOTE: Synthetic minority over-sampling technique, J. Artif. Intell. Res. 16, 321-357, 2002.
Details
Primary Language
English
Subjects
Adversarial Machine Learning, Classification Algorithms
Journal Section
Research Article
Authors
Olcay Alpay
*
0000-0003-1446-0801
Türkiye
Early Pub Date
June 1, 2026
Publication Date
-
Submission Date
March 3, 2026
Acceptance Date
May 30, 2026
Published in Issue
Year 2026 Number: Advanced Online Publication