Uncertainty-Gated Dual-Branch Additive–Attention Network for Robust and Calibrated Tabular Classification Under Missingness
Abstract
Tabular deep learning remains challenging in real-world settings. Many datasets mix numerical and categorical variables and contain substantial missingness. They also require reliable probability estimates and interpretability alongside strong classification performance. DA2-Net addresses this problem with a dual-branch architecture. An interpretable additive pathway models feature-wise main effects, and a selective self-attention pathway models higher-order interactions. Features are ranked by additive contribution magnitude, uncertainty, and missingness-aware scaling, and only the Top-k subset is passed to a single multi-head self-attention block. The final prediction is obtained through uncertainty-aware gated fusion. Training is further supported by sparsity, stability, and Brier-based calibration regularization. This design balances expressive interaction modeling with transparency and robustness under incomplete data. DA2-Net is evaluated on four public binary tabular benchmarks (AdultIncome, DefaultCredit, HeartDisease, and BankMarketing) under controlled Missing Completely At Random (MCAR) missingness levels of 0.0, 0.1, 0.2, and 0.3. The evaluation uses 5-fold stratified cross-validation repeated with three random seeds, which yields 15 runs per dataset and missingness condition. Eight metrics are reported: AUC, AUPRC, ACC, F1, sensitivity, specificity, Brier score, and Expected Calibration Error (ECE). With 16 dataset and missingness conditions, this gives 128 evaluation blocks in total (4 datasets × 4 missingness levels × 8 metrics). Across this benchmark, DA2-Net achieves the best overall mean rank (3.078 ± 2.044), ahead of SAINT-Lite (3.980 ± 2.624). It achieves or shares the best result in all 16 AUC blocks, 13 of 16 AUPRC blocks, 10 of 16 ACC blocks, 11 of 16 Brier blocks, and 7 of 16 ECE blocks. These results indicate that its main strength lies in robust ranking-based discrimination and strong overall probability quality under missingness.
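The selective interaction pathway and the gated fusion step can be illustrated with a minimal pure-Python sketch. The abstract states only two things here. Features are ranked by additive contribution magnitude, uncertainty, and missingness-aware scaling, and the two branches are combined by an uncertainty-aware gate. Everything else in this sketch is a hypothetical choice, not the published formulation. That includes the scoring rule |c_j| / (1 + u_j) · (1 − α·m_j), the gate g = sigmoid(b + w·(u_int − u_add)), and the names `topk_features` and `gated_fusion`. The self-attention block itself is omitted, and its output is represented by a single interaction logit `z_int`.

```python
import math
from typing import List, Sequence


def topk_features(contrib: Sequence[float], uncert: Sequence[float],
                  missing: Sequence[int], k: int,
                  alpha: float = 0.5) -> List[int]:
    """Return the indices of the k highest-scoring features (hypothetical rule)."""
    # Large additive contributions rank high. Uncertain features (u_j > 0) and
    # missing features (m_j = 1) are down-weighted.
    scores = [abs(c) / (1.0 + u) * (1.0 - alpha * m)
              for c, u, m in zip(contrib, uncert, missing)]
    # sorted() is stable, so tied scores keep the lower feature index first.
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(z_add: float, z_int: float, u_add: float, u_int: float,
                 w: float = 1.0, b: float = 0.0) -> float:
    """Fuse additive and interaction logits and return the positive-class probability."""
    # Hypothetical gate: the more uncertain the interaction branch is relative
    # to the additive branch, the closer g gets to 1 (trust the additive branch).
    g = sigmoid(b + w * (u_int - u_add))
    return sigmoid(g * z_add + (1.0 - g) * z_int)
```

Take contributions [0.8, -1.2, 0.1, 0.5] and uncertainties [0, 1, 0, 0], with the last feature missing. The scores are then 0.8, 0.6, 0.1, and 0.25, so `k=2` keeps features 0 and 1. In a full model, the selected indices would determine which feature tokens enter the self-attention block.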
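The evaluation protocol combines MCAR masking with repeated stratified cross-validation. The following is a minimal sketch of both pieces under stated assumptions. The missingness rates, fold count, seed count, and block count are taken from the abstract. Several details are not given there and are assumptions in this sketch: the concrete seed values, missing cells stored as `None`, round-robin fold assignment, and the helper names `inject_mcar` and `stratified_fold_ids`. The abstract also does not say whether masking is applied before or after splitting, so the sketch simply masks a table.

```python
import random
from typing import List, Optional, Sequence

# Counts taken from the abstract; the seed values themselves are hypothetical.
MISSING_RATES = (0.0, 0.1, 0.2, 0.3)
N_FOLDS = 5
SEEDS = (0, 1, 2)
N_DATASETS = 4
N_METRICS = 8  # AUC, AUPRC, ACC, F1, sensitivity, specificity, Brier, ECE


def inject_mcar(rows: Sequence[Sequence[float]], rate: float,
                seed: int) -> List[List[Optional[float]]]:
    """Blank each feature cell independently with probability `rate` (MCAR)."""
    # The mask ignores both the cell value and the label, which is what makes
    # the mechanism "completely at random". Missing cells become None.
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def stratified_fold_ids(labels: Sequence[int], n_folds: int,
                        seed: int) -> List[int]:
    """Assign each sample a fold id so that every fold keeps the class proportions."""
    rng = random.Random(seed)
    fold = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's samples round-robin across the folds.
        for pos, i in enumerate(idx):
            fold[i] = pos % n_folds
    return fold


RUNS_PER_CONDITION = N_FOLDS * len(SEEDS)  # 5 folds x 3 seeds = 15
N_BLOCKS = N_DATASETS * len(MISSING_RATES) * N_METRICS  # 4 x 4 x 8 = 128
```

Seeding a dedicated `random.Random` instance keeps each masking pattern reproducible for a given seed without touching the global random state.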
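The abstract reports two probability-quality metrics, the Brier score and ECE. The sketch below shows one way to compute them. The Brier score is the mean squared difference between the predicted positive-class probability and the 0/1 label. For ECE, this sketch makes three assumptions, since the abstract does not specify the binning scheme: 10 equal-width bins, binning on the positive-class probability rather than top-label confidence, and the function names.

```python
from typing import Sequence


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared error between positive-class probability and 0/1 label."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def expected_calibration_error(y: Sequence[int], p: Sequence[float],
                               n_bins: int = 10) -> float:
    """Equal-width-bin ECE on the positive-class probability (assumed variant)."""
    bins = [[] for _ in range(n_bins)]
    for yi, pi in zip(y, p):
        # Clamp so that p = 1.0 falls into the last bin instead of overflowing.
        bins[min(int(pi * n_bins), n_bins - 1)].append((yi, pi))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(pi for _, pi in members) / len(members)
        pos_rate = sum(yi for yi, _ in members) / len(members)
        # Weight each bin's |observed - predicted| gap by its share of samples.
        ece += len(members) / len(y) * abs(pos_rate - mean_prob)
    return ece
```

Both metrics are lower-is-better. A model can rank well on AUC yet still score poorly on them if its probabilities are systematically over- or under-confident.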
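The overall mean rank, such as 3.078 ± 2.044 for DA2-Net, aggregates per-block ranks across the evaluation blocks. The sketch below shows one common way to compute it. Rank 1 is best, and tied models share the average of their positions, which matches the wording "achieves or shares the best result". Lower-is-better metrics such as Brier score and ECE are ranked in ascending order. The aggregation method is otherwise an assumption, as are the sample standard deviation, tie detection by exact equality, and the names `block_ranks` and `mean_rank`. In practice, scores would usually be rounded before ranking.

```python
import statistics
from typing import Dict, List, Tuple

Block = Tuple[Dict[str, float], bool]  # (model -> score, higher_is_better)


def block_ranks(scores: Dict[str, float], higher_is_better: bool) -> Dict[str, float]:
    """Rank models within one block: 1 = best, tied models share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Advance j across every model tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for model in ordered[i:j + 1]:
            ranks[model] = shared
        i = j + 1
    return ranks


def mean_rank(blocks: List[Block]) -> Dict[str, Tuple[float, float]]:
    """Return (mean, sample standard deviation) of each model's rank across blocks."""
    per_model: Dict[str, List[float]] = {}
    for scores, higher_is_better in blocks:
        for model, r in block_ranks(scores, higher_is_better).items():
            per_model.setdefault(model, []).append(r)
    return {m: (statistics.mean(rs), statistics.stdev(rs) if len(rs) > 1 else 0.0)
            for m, rs in per_model.items()}
```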
DA2-Net also shows a favorable practical-efficiency profile in the current benchmark. It remains more compact and more efficient at inference than the main transformer-like baselines. Epoch-wise loss analysis shows stable convergence on all four datasets. The binary cross-entropy (BCE) term drives the optimization, while the auxiliary regularizers act as controlled refinements. The ablation study confirms that the interaction branch is essential. Removing it (the AdditiveOnly variant) causes the clearest degradation in both predictive and calibration metrics, whereas removing the gate or the auxiliary regularization terms leads only to minor changes. A sensitivity analysis supports the selected interaction subset size k = 10 and spline knot count K = 8 as balanced settings. Additive shape-function visualizations provide direct qualitative evidence of feature-wise interpretability.
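The training objective pairs the BCE term with sparsity, stability, and Brier-based regularizers. A minimal sketch of how such a composite loss could be assembled and logged term by term follows. The abstract names the terms but not their definitions or weights. The following are therefore all hypothetical: the L1 sparsity on additive contributions, stability as the mean squared prediction change under a caller-supplied perturbation, the weights `lam_sparse`, `lam_stab`, and `lam_brier`, and the name `composite_loss`.

```python
import math
from typing import Dict, Sequence


def composite_loss(y: Sequence[int], p: Sequence[float],
                   p_perturbed: Sequence[float],
                   contribs: Sequence[Sequence[float]],
                   lam_sparse: float = 1e-3, lam_stab: float = 1e-2,
                   lam_brier: float = 0.1, eps: float = 1e-7) -> Dict[str, float]:
    """Return each loss term and their weighted total (hypothetical terms and weights)."""
    n = len(y)
    # Clip probabilities so that log() never sees exactly 0 or 1.
    q = [min(max(pi, eps), 1.0 - eps) for pi in p]
    bce = -sum(yi * math.log(qi) + (1 - yi) * math.log(1.0 - qi)
               for yi, qi in zip(y, q)) / n
    # Sparsity: mean L1 norm of the per-sample additive contributions.
    sparsity = sum(abs(c) for row in contribs for c in row) / n
    # Stability: mean squared change in prediction under an input perturbation
    # (for example, extra MCAR masking); p_perturbed is supplied by the caller.
    stability = sum((a - b) ** 2 for a, b in zip(p, p_perturbed)) / n
    # Brier term: mean squared error between probability and label.
    brier = sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / n
    total = bce + lam_sparse * sparsity + lam_stab * stability + lam_brier * brier
    return {"bce": bce, "sparsity": sparsity, "stability": stability,
            "brier": brier, "total": total}
```

Consider labels [1, 0], predictions [0.8, 0.2], an unchanged perturbed prediction, and contributions with total absolute value 2. In that toy case BCE is about 0.223, while the weighted sparsity and Brier terms add only about 0.005. Logging each term separately per epoch makes it easy to check this qualitative pattern of BCE dominating with small regularizer contributions.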
Keywords
- Tabular Deep Learning
- Missing Robustness
- Additive Models
- Selective Self-Attention
- Probability Calibration
- Uncertainty-Gated Fusion
Details
Primary Language: English
Subjects: Deep Learning, Neural Networks
Journal Section: Research Article
Authors: Mücahit Cihan * (ORCID: 0000-0002-1426-319X), Türkiye
Publication Date: June 30, 2026
Submission Date: January 23, 2026
Acceptance Date: April 20, 2026
Published in Issue: Year 2026, Volume 13, Number 2